Data Engineer

San Francisco
Member of Technical Staff
Data quality and quantity can make or break any machine learning application. Here at OpenAI we are looking for a data engineer to lead dataset creation, curation, and management for a wide variety of applied and research projects. You’ll be an integral part of a team of software and machine learning engineers and research scientists working on some of the most cutting-edge AI projects in the field.

You will:

    • Work with large and complex raw data sets from project start to finish
    • Develop and apply machine learning-based cleaning and curation techniques, innovating and pushing the boundaries of existing methods
    • Create, curate and sculpt new data sets and develop better systems for doing so
    • Develop and scale data architecture for your team and others
    • Design and build end-to-end data systems that can be scaled across the company
    • Work closely with ML Engineers, Software Engineers and Researchers on a daily basis

You’ll be a good fit for this role if you are:

    • Results-driven and someone who enjoys working closely with a team
    • Comfortable with and excited by working on large distributed systems
    • Excited to develop and apply new and existing techniques
    • Familiar with the basics of machine learning
    • Engaged by OpenAI’s mission of building safe and beneficial artificial general intelligence


Benefits:

    • Health, dental, and vision coverage for you and your family
    • Unlimited paid time off and generous parental leave
    • Lunch and dinner each day
    • 401(k) plan

About OpenAI

OpenAI's mission is to build safe artificial general intelligence (AGI) and to ensure that AGI's benefits are as widely and evenly distributed as possible. We expect AI technologies to be hugely impactful in the short term, but their impact will be outstripped by that of the first AGIs.

We focus on long-term research, working on problems that require us to make fundamental advances in AI capabilities. By being at the forefront of the field, we can influence the conditions under which AGI is created. As Alan Kay said, "The best way to predict the future is to invent it."

We publish at top machine learning conferences, open-source software tools for accelerating AI research, and release blog posts to communicate our research. We will not keep information private for private benefit, but in the long term, we expect to create formal processes for keeping technologies private when there are safety concerns.