Software Engineer (ML Ecosystem)

San Francisco, CA / Engineering / Full-time

About the role:
Anyscale is looking to hire strong engineers to develop open source machine learning libraries.

The software industry largely operates on a messy zoo of specialized distributed systems such as Spark, Horovod, and TensorFlow Serving. These systems cannot easily be composed together and used as elements of a larger application. On the Machine Learning Ecosystem team at Anyscale, we are developing a rich ecosystem that will allow developers to import powerful distributed libraries and compose them together to build new applications.

Some of this work will be released as open source within Ray, a distributed Python execution engine and an ecosystem of libraries for scalable machine learning.

About the ML Ecosystem team:
The ML Ecosystem team’s mission is to make distributed machine learning easy on Ray and Anyscale. Specifically, our team maintains and develops features for a broad range of libraries, including RaySGD (distributed deep learning), Ray Tune (distributed hyperparameter tuning), RLlib (reinforcement learning), and XGBoost-on-Ray.

Our team is the most user-facing engineering team on the open source side, collaborating with ML engineering teams at organizations like Shopify, Uber, and ByteDance.

As part of this role, you will:

    • Build elastic, scalable, fault-tolerant distributed machine learning libraries that power the next generation of machine learning platforms around the world
    • Benchmark and improve performance and scalability of different machine learning libraries
    • Work closely with other engineers developing Ray to build core abstractions and simplify machine learning services for open source users
    • Work closely with the open source community (with ML researchers, ML engineers, data scientists) to scope and build new abstractions for scalable machine learning

We'd love to hear from you if you have:

    • A solid background in algorithms, data structures, and system design
    • Experience with machine learning frameworks and libraries (PyTorch, TensorFlow)
    • At least 1 year of relevant work experience (new grads should apply to a separate job posting)

Bonus points!

    • Experience working with a cloud technology stack (AWS, GCP, Kubernetes)
    • Experience building machine learning training pipelines or inference services in a production setting
    • Experience with big data tools (Spark, Flink, Hadoop)
    • Experience building scalable and fault-tolerant distributed systems

About Anyscale:
Anyscale provides an application development platform for developers to build distributed applications. We’re commercializing a popular open source project called Ray, which is a framework for distributed computing as well as an ecosystem of libraries for scalable machine learning. Our goal is to build a standardized platform for distributed computing. Ray was developed at UC Berkeley by Robert Nishihara and Philipp Moritz, under the guidance of Ion Stoica and Michael Jordan, and the four of them co-founded Anyscale. The company has raised a $20.6M Series A and a $40M Series B from Andreessen Horowitz (a16z), NEA, Foundation Capital, Intel Capital, Ant Financial, Amplify Partners, 11.2 Capital, and The House Fund.

With Ray, we're making it easy to program at any scale (from your laptop to the datacenter) by providing easy-to-use, general-purpose, and high-performance tools. In addition, we are building a rich ecosystem of libraries (for reinforcement learning, hyperparameter search, experiment management, machine learning training, prediction serving, and more) on top of the core distributed system so that users can rapidly build sophisticated applications. Help us build the future of software development.

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.