Machine Learning Data Analyst

Open to Remote
About the role

Andrej Karpathy, Director of AI at Tesla, spent most of his PhD labeling a vast number of images during the construction of the ImageNet dataset. This work gave him key intuitions about the nature of image classification, enabling him to build a state-of-the-art computer vision model for its time. ImageNet went on to kickstart our current AI renaissance.

We're looking for someone to work side-by-side with our machine learning engineers and perform a similar function: we need an expert to label our most difficult and ambiguous data. They’ll also need to clearly communicate their intuitions and insights.

We’ll be building the world’s best dataset of skills, beginning with software engineering skills, so you should be intimately familiar with the world of software. This dataset could yield a lot of useful insight about the world: most people don’t know what skills they have or need, or what jobs are available to them. Developing the capability to quantify skills would unlock tools that could help solve this problem.

This is gritty work, but also one of the most important tasks at Sourceress. The quality of your work will determine the performance of our machine learning models, which are at the very core of our business.

About Sourceress

Our mission is to help people find work that matters. We believe that the world is better when people understand the opportunities available to them. Our human-assisted AI platform delivers great results to our customers (customer quote: "I'd have a panic attack if you guys stopped existing").

Because of this, we raised $3.5M from OpenAI researchers and Lightspeed Venture Partners at one of the highest-ever valuations coming out of YC. Our team members have previously sold companies and published machine learning research. The team includes Dropbox's former Chief of Staff and hails from MIT, Google, Airbnb, McKinsey, and elsewhere.

Help us create a world where all 7 billion people work at jobs that they love, do things that they’re great at, and work for companies that are solving meaningful problems.

Read more on our blog

See our values here

Responsibilities


    • You’ll spend most of your time labeling data. You’ll work alongside an existing contracting team of several dozen and be responsible for our most quantifiably difficult or ambiguous datasets.
    • You’ll work closely with machine learning engineers to refine specs or, in some cases, redesign datasets from the ground up.
    • You’ll label and categorize edge cases, which we’ll use to test future model improvements; this way we can be sure we don’t introduce performance regressions.
    • You’ll work with machine learning engineers to debug strange model behavior.
    • You’ll suggest missing model features to machine learning engineers by reflecting on your intuitions and decision-making process, improving our models' ability to pick up on signals in the data.
    • You’ll propose ways to streamline and improve the internal tools you use.

Requirements


    • Knowledge of software engineering, including its tools and libraries.
    • Grit.
    • Strong communication skills.
    • Reflective. You should be able to explain why you make decisions.
    • Excellent judgment in the face of ambiguity.
    • Ideal candidates will have good product and design skills.

This role is exceptionally remote-friendly. About half of our team is full-time remote, and our San Francisco office has “portals” (a large TV, high-quality microphone, and webcam) in every well-trafficked room. Remote team members even participate in lunch conversations and book club.

You'll also learn a lot about how a hyper-growth AI-powered startup works from the inside. You’ll gain intimate familiarity with what high-quality data looks like, what internal tooling is effective, and how to iterate on features and models, along with insights that come from working closely with machine learning engineers.