Software Engineer (ML Serving)

San Francisco, CA / Engineering

About the role:
Anyscale is looking to hire strong engineers to build next-generation, high-performance machine learning serving systems (both our open source libraries and our SaaS offering).

Much of the tooling used to serve ML models today is inherited from the previous generation of infrastructure, but emerging ML applications come with a new set of requirements: heavy compute demands, the need for specialized hardware, and the need to compose many different models along with business logic in a single request.

Our goal is to provide a simple but powerful set of tools to make bringing complex ML applications to production a reality.

About the ML Serving team:
The ML Serving team’s mission is to build world-class systems for serving ML models in production. Part of this work is building and maintaining the open source Ray Serve library, as well as contributing directly to the Anyscale platform used by our customers to run mission-critical applications.

Much of our work is user-facing: you’ll have the opportunity to collaborate with open source users and customers from small startups with lean ML engineering teams to industry-leading companies using Ray, such as Uber, Shopify, and ByteDance.

As part of this role, you will:

    • Build a highly available service for ML model serving.
    • Improve Ray Serve and our other libraries to make authoring next-generation production ML applications as easy as possible.
    • Improve our autoscaling capabilities to drive performance improvements and cost savings.
    • Reduce latency and improve throughput for single- and multi-model serving.

We'd love to hear from you if you have:

    • A solid background in algorithms, data structures, and system design.
    • Experience working with modern machine learning tooling (PyTorch, TensorFlow, JAX).
    • At least 1 year of relevant work experience (new grads should apply to a separate job posting).

Bonus points!

    • Experience building and maintaining an open source project.
    • Experience building and operating machine learning infrastructure in production.
    • Experience building highly available serving systems.

About Anyscale:
Anyscale provides an application development platform for developers to build distributed applications. We’re commercializing a popular open source project called Ray, which is a framework for distributed computing as well as an ecosystem of libraries for scalable machine learning. Our goal is to build a standardized platform for distributed computing. Ray was developed at UC Berkeley by Robert Nishihara and Philipp Moritz, under the guidance of Ion Stoica and Michael Jordan, and the four of them co-founded Anyscale. The company raised a $20.6M Series A and a $40M Series B from Andreessen Horowitz (a16z), NEA, Foundation Capital, Intel Capital, Ant Financial, Amplify Partners, 11.2 Capital, and The House Fund.

With Ray, we're making it easy to program at any scale (from your laptop to the datacenter) by providing easy-to-use, general-purpose, and high-performance tools. In addition, we are building a rich ecosystem of libraries (for reinforcement learning, hyperparameter search, experiment management, machine learning training, prediction serving, and more) on top of the core distributed system so that users can rapidly build sophisticated applications. Help us build the future of software development.

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.