Machine Learning Operations Engineer

New York, New York

AI & Data – Machine Learning / Full Time / Remote

What you’ll do:

Design, architect, and develop scalable inference infrastructure for models and services that can handle a large number of simultaneous requests
Apply engineering best practices: automation, code reviews, integration tests, performance load tests, and CI/CD
Collaborate with cross-functional teams (product, engineering, research) to solve complex engineering challenges

Requirements:

Operational experience with a production system that hosts LLMs
Experience with Google Cloud Platform (GCP)
Experience with building, deploying, and maintaining Kubernetes production clusters
Experience with deploying infrastructure as code (Terraform, Google Deployment Manager, etc.)
Strong experience with Python and/or Java/Kotlin/Rust/Go
Strong experience operating on large volumes of data in the cloud (e.g., vector search, object storage, key-value stores, relational databases)
Experience with software engineering and CI/CD best practices, and with deploying AI models and services to production

The US base salary range for this role is $136,000 - $209,000, not inclusive of equity and benefits. Our salary ranges are determined by experience, skills, qualifications, and location. The range provided on each posting reflects the minimum and maximum across all applicable locations in the United States. Your recruiter can share more information about our benefits package, equity, and sales commissions (if applicable).