Software Engineer, Machine Learning Systems

Menlo Park / R&D / Full-Time Employment / On-site

We are looking for a machine learning systems expert who is eager to work across the entire ML system stack, iterate fast on building new ML cloud systems, and build and own major contributions.

About the role:

ML systems engineers on our team are responsible for one or more of the following:

    • Deploying and managing high-performance compute clusters.
    • Improving inference and training performance through optimizations across the system stack: high-level mechanisms such as queuing and scheduling, mid-level optimizations within inference and training engines, and low-level optimizations targeting GPU kernel efficiency.

Qualifications:

    • Experience building and rapidly prototyping production cloud-based software
    • Demonstrated fluency with data structures, algorithms, architecture, and agile software best practices in any language
    • Experience in Python and C++/Rust
    • Understanding of the latest technologies in LLMs, like LoRA, Mamba, etc.
    • Understanding of, or willingness to learn about, the entire system stack
    • Desire to work in an inclusive and collaborative environment
    • An interest in continually learning from others, teaching others, and digging into new challenges

Nice to have:

    • Desire to create speed-of-light training and inference systems for next-generation AI
    • Deep technical expertise in machine learning systems, e.g. TinyML, Triton, CUDA, ROCm, Exo, MLIR, Halide, etc.

We believe in hiring passionate individuals who believe in the AI revolution and in making software accessible to all. If you’re excited about this role but aren’t sure your past experience aligns perfectly, we still encourage you to apply and meet with us.