Data Engineer

San Francisco
Data Engineering
Full time
Introducing Crux, a network where data is simple. So businesses can get straight to the point.

We clean, normalize, and enrich datasets, delivering them delightfully through our platform in the cloud. That way, businesses can say goodbye to the burdens of data, and hello to the benefits. Our team provides data engineering support to our partners, so they have more time and energy for finding insights and creating value.

Because the future depends on data. Let’s make it delightful.  

 
The Role:
By market definition, a data engineer is responsible for creating, maintaining, and understanding data and the infrastructure that delivers it. Data engineers are the connection between smart business users and not-so-smart data repositories. With a solid command of scripting languages such as Python, R, and SQL, they can take any source of data and perform an EVL(ST) process (Extract, Validate, Load, Standardize, Transform) into the correct data store, in the form agreed upon by the data engineer and the end user. Data engineers are often responsible for the efficacy, quality, and elegance of their solutions. They are business savvy and understand, to an extent, the importance of the data they are piping in.

Responsibilities

    • Contribute to the design and development of our Python data workflow management platform
    • Design and develop tools to wrangle small and large volumes of data into cleaned, normalized, and enriched datasets
    • Build and enhance a large, scalable Big Data platform (Spark, Hadoop)
    • Refine processes for normalization and performance-tuning analytics

About You

    • You love building elegant solutions that scale
    • You bring deep experience in the architecture and development of quality backend production systems, specifically in Python
    • You love working on high-performing teams, collaborating with team members, and improving our ability to deliver delightful experiences to our clients
    • You are excited by the opportunity to solve challenging technical problems, and you find learning about data fascinating
    • You understand server, network, and hosting environments; RESTful and other common APIs; common data distribution; and hosted storage solutions


Must Have

    • 5+ years of full-time experience in a professional environment
    • Expertise in Python
    • Experience with ETL and/or other big data processes
    • Experience with at least two popular big data / distributed computing frameworks, e.g., Spark, Hive, Kafka, MapReduce, Flink
    • Experience working independently, or with minimal guidance
    • Strong problem solving and troubleshooting skills
    • Ability to exercise judgment to make sound decisions
    • Proficiency in multiple programming languages
    • Strong communication skills, interpersonal skills, and a sense of humor

Even Better

    • Data skills: SQL (RDBMS) and NoSQL databases, structured and unstructured data, BigQuery
    • Proficiency in Jupyter, C24; familiarity with ETL, CDC, and workflow tools
    • Experience working in a cloud-based environment, such as GCP or AWS


At Crux, diversity is valued, and treatment of employees and applicants is based on merit, talent, and qualification. We encourage people from underrepresented groups to apply. We believe the key to success is bringing together unique perspectives, and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. For qualified applicants with criminal histories, consideration will be consistent with the requirements of the San Francisco Fair Chance Ordinance. All your information will be kept confidential according to EEO guidelines.