Data Management Engineer

Redwood City, CA / Remote
Technology – Data
Full Time

About Citrine

At Citrine, we’re changing the way new materials are developed.

We are the industry leader in materials informatics, the application of data-driven methods to materials and chemicals development. Our platform provides data management and AI tools that help our customers rapidly develop better, more sustainable materials. Our users are scientists and engineers at large manufacturing and materials companies, as well as researchers at leading universities and government labs.

In 2020, Citrine was recognized for technology innovation by the Global CleanTech Group and was named one of the most promising AI startups by CB Insights. As a team, we are ambitious with our goals, passionate about our vision, and eager to grow and learn from each other. Our team is growing fast, and we are looking for the best to join us.

Though our technology was originally built by materials scientists, our team now consists of professionals trained in a diverse set of fields, including data science, physics, biology, and computer science. We have offices in the San Francisco Bay Area, Chicago, and Pittsburgh, and our customers include Fortune 1000 materials and product companies.

About the Role

Citrine’s Data Engineering team works closely with our customers to solve challenging problems at the intersection of materials science, data management, and machine learning. As a Data Management Engineer, you will work with customers to understand their current systems for generating, storing, and retrieving scientific data. You will create advanced application integration solutions and configure, deploy, and enhance ETL pipelines that seamlessly integrate existing customer systems with the Citrine Platform.

Data are the lifeblood of both Citrine and our customers. To our customers, their data represent not only the distilled knowledge of decades’ worth of research, but also the foundation from which they can build machine learning models of materials behavior using the Citrine Platform. The Data Management Engineer will be on the front line of our customer interactions, working with customers to determine the best path for integrating the Citrine Platform with their data sources.

Working at Citrine offers the rare opportunity to collaborate with applied scientists at the leading edge of statistical learning theory and application. Here are a few representative peer-reviewed publications describing research done at Citrine in support of the platform’s AI capabilities:

Assessing the Frontier: Active Learning, Model Accuracy, and Multi-objective Materials Discovery and Optimization (2019). https://arxiv.org/abs/1911.03224
Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery (2018). https://doi.org/10.1039/C8ME00012C
Overcoming data scarcity with transfer learning (2017). https://arxiv.org/abs/1711.05099
High-Dimensional Materials and Process Optimization Using Data-Driven Experimental Design with Well-Calibrated Uncertainty Estimates (2017). https://doi.org/10.1007/s40192-017-0098-z

Responsibilities

    • Engage directly with customers to understand their scientific and enterprise data and data systems.
    • Direct the design, development, and maintenance of data integration solutions according to customer requirements.
    • Establish best-practice protocols for data management.
    • Author, maintain, and deploy data handling scripts in customers’ cloud and on-premises infrastructure.
    • Determine how data should be integrated into the Citrine Platform to maximize its scientific and business value without incurring unnecessary ingestion effort.
    • Set up, monitor, and troubleshoot pipeline jobs and processes.

Skills and Qualifications

    • Materials Science/Chemistry/Chemical Engineering experience
    • Experience with data systems in the materials/chemicals industry
    • Experience building, scheduling, scaling, and maintaining ETL pipelines
    • Strong experience with relational databases (any vendor)
    • Strong programming skills in Python (not just scripting)
    • Excellent technical communication skills, written and verbal

Preferred Skills and Qualifications

    • Experience setting up and managing data systems in an industrial materials science/chemistry environment (ELN, LIMS, SAP).
    • Experience with pipeline automation and integration tools (Airflow, Luigi, TIBCO, etc.).
    • Experience with Java / Scala.

Equal Opportunity

Citrine Informatics is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, creed, color, or national origin.

Our Benefits (for exempt, full-time employees based within the United States)

401(k) with matching up to 4% of salary
Medical, vision, and dental insurance (we pay 100% of your premium and 75% of your dependents’ premiums)
Equity options within the company
Parental leave
Flexible PTO on top of our 14 paid company holidays (includes your birthday!)
Free financial counseling 
$250 tech allowance