Senior MLOps Engineer

Engineering – Active Learning / Remote
Labelbox's mission is to build the best products for aligning with artificial intelligence. Real breakthroughs in AI depend on the quality of the training data. Labelbox's data engine enables organizations to dramatically improve the quality of their training data, which makes their machine learning models more accurate and performant. We are determined to build software that is more open, easier to use, and singularly focused on helping our customers get to production AI faster.

Current Labelbox customers are transforming industries within insurance, retail, manufacturing/robotics, healthcare, and beyond. Our platform is used by Fortune 500 enterprises, including Allstate, Black + Decker, Bayer, and Warner Brothers, and by AI-focused companies, including FLIR Systems and Caption Health. We are backed by leading investors, including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures (Google's AI-focused fund), Databricks Ventures, Snowpoint Ventures, and Kleiner Perkins.

About Active Learning

In machine learning, data curation is only part of the battle to create a successful model. Once data is labeled, it is used to train, validate, and test models with the goal of production deployment. During validation, an engineer may discover that a model performs well on one particular subset or class of data but struggles on another.

The mission of the Active Learning team is to enable rapid model iteration by providing tooling and workflows that surface insights into model performance once training is complete. We seek to give engineers the tools they need to validate models against ground truth data, spot model inaccuracies, identify gaps in cohorts of training data, and initiate workflows to improve that data for future model versions.

About the Role

As our Senior MLOps Engineer, you will be responsible for building our model inference and training infrastructure for both internal and external users. This includes model serving with TensorFlow Serving or TorchServe; performance optimization; monitoring, maintenance, and reporting; integration with labeling and data curation processes; development of generic training and inference services; and debugging and troubleshooting.

About You

    • Strong software engineering skills and experience working with distributed systems.
    • Experience designing microservices and data processing pipelines at scale.
    • Working machine learning knowledge of LLMs or video models.
    • Ability to modify and train open source models.
    • Experience optimizing models for production deployment (e.g., architecture modifications, quantization, or layer fusion).
    • Experience with distributed training and/or inference.
    • Previous experience deploying systems for efficient batch or online ETL.

Labelbox strives to ensure pay parity across the organization and is committed to discussing compensation transparently. The expected annual base salary range for this United States-based position is $170,000 - $215,000. This range does not include any potential equity packages or additional benefits. Exact compensation varies based on a variety of factors, including skills and competencies, experience, and geographical location.

Do great work. From anywhere.

We hire great people regardless of where they live. Work wherever you'd like; reliable internet access is our only requirement. We communicate asynchronously, work autonomously, and take ownership of our work.