Machine Learning Engineering Lead

Engineering / Full Time / On-site
We're conducting interviews on a rolling basis for this role.

As ML Engineering Lead, you will ideally have deep technical knowledge of solving practical engineering problems and running large language models on high-performance computers, as well as experience managing and scaling a team of engineers in a fast-paced environment. Understanding AI alignment and the core mission of our work at Conjecture is a significant benefit.

You will oversee the team of research engineers responsible for building our core technologies, and work alongside the CTO in making decisions on overall engineering efforts and alignment strategies. You will help improve our core infrastructure, speed up inference and builds, and extend our tooling capabilities. This role benefits from deep industry experience.

Responsibilities may include:

    • Working on large-scale ML frameworks that train models in parallel across many machines.
    • Building internal tooling for model inference, visualisation, and infrastructure.
    • Implementing new models or optimisation techniques from research papers.
    • Building large-scale datasets.
    • Managing research engineers, and training them and giving feedback through code reviews or other mechanisms.
    • Implementing and improving code review systems and best practices.

You might be a good fit for this role if:

    • You are able to solve small, isolated problems like bugs in code, as well as grapple with large meta-level problems, such as epistemic strategies and research agendas.
    • You have a deep understanding of performance in HPC workloads, have worked with large GPU clusters, and ideally have used some modern ML frameworks (e.g., PyTorch, JAX).
    • You are good at collaboration and teamwork: many of our projects are large engineering efforts that involve most or all of the team. Alongside this, you care about improving the technical skill of your team and can hire, coach and manage them.
    • You care about the impact of your work on the long-term future of humanity and about creating safe and beneficial AI.

Experience with the following would be a bonus:

    • Modern methods for distributed training of large language models (such as pipeline and data parallelism)
    • Breadth of knowledge across a few areas of systems research (networking, operating systems, programming languages)
    • Familiarity with CUDA programming and GPU internals
    • Managing ML teams, with a solid understanding of how to build a good technical culture