Machine Learning Operations Engineer

New York, New York
AI & Data – Machine Learning / Full Time / Remote

What you’ll do:

    • Design, architect, and develop scalable inference infrastructure for models and services that can handle a large number of simultaneous requests
    • Apply engineering best practices: automation, code reviews, integration tests, performance load tests, and CI/CD
    • Collaborate with cross-functional teams (product, engineering, research) to solve complex engineering challenges

Requirements:

    • Operational experience with a production system that hosts LLMs
    • Experience with Google Cloud Platform (GCP)
    • Experience with building, deploying, and maintaining Kubernetes production clusters
    • Experience with deploying infrastructure as code (Terraform, Google Deployment Manager, etc.)
    • Strong experience with Python and/or Java/Kotlin/Rust/Go
    • Strong experience operating on large volumes of data in the cloud (e.g. vector search, object storage, key-value stores, relational databases)
    • Experience with software engineering and CI/CD best practices and deployment of AI models and services in production

The US base salary range for this role is $136,000 - $209,000, not inclusive of equity and benefits. Our salary ranges are determined by experience, skills, qualifications, and location. The range provided on each posting reflects the minimum and maximum across all applicable locations in the United States. Your recruiter can share more information about our benefits package, equity, and sales commissions (if applicable).