Interpretability Research Scientists and Research Engineers

Interpretability Team /
Full-time /
Rolling Basis Applications: Our most recent active hiring round has concluded, but we are still reviewing applications on a rolling basis. Please note that due to a high volume of very talented applicants, there may be a lag of several weeks between submission and follow-up from Apollo Research on your application. Thank you for your patience.

About The Role: We’re particularly worried about AI that is strategically deceptive, i.e. where high-level aspects of its internal cognition, such as its plans and goals, are intentionally not reflected accurately in its external behavior during observation. To address this failure mode, we want to understand how NNs “think”. In the long run, we aim to explain, and perhaps reverse engineer, arbitrary mechanisms of arbitrary neural networks.

We’re pursuing a new approach to mechanistic interpretability that we’re not yet publicly discussing due to potential infohazard concerns. However, we expect the day-to-day work of most scientists and engineers to be comparable to existing public interpretability projects such as sparse coding, indirect object identification, causal scrubbing, toy models of superposition, or transformer circuits, as well as converting research insights into robust tools that can scale to very large models.

We are currently hiring Research Scientists and Research Engineers. In practice, the boundary between these roles can be fluid: Research Scientists engage in programming, and Research Engineers participate in experimental design. The definition of these roles may evolve over time, and candidates are welcome to apply for both positions.

We welcome applicants of all races, genders, sexes, ages, abilities, religions, and sexual orientations, regardless of pregnancy or maternity, marital status, or gender reassignment.

Apollo Research is a London-based technical AI safety organization. We specialize in auditing high-risk failure modes, particularly deceptive alignment, in large AI models. Our primary objective is to minimize catastrophic risks associated with advanced AI systems that may exhibit deceptive behavior, where misaligned models appear aligned in order to pursue their own objectives. Our approach involves conducting fundamental research on interpretability and behavioral model evaluations, which we then use to audit real-world models. Ultimately, our goal is to leverage interpretability tools for model evaluations, as we believe that examining model internals in combination with behavioral evaluations offers stronger safety assurances compared to behavioral evaluations alone.

Culture: At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.

We are looking for the following characteristics in a strong candidate:

    • For all skills, we don’t require a formal background or industry experience and welcome self-taught candidates. Essential qualities for both Research Engineers and Research Scientists include:

    • Solid ML engineering: The ability to create, manipulate, and analyse modern machine learning models, especially transformers, accurately and quickly. We also value experience with NN model training (familiarity with distributed training is a bonus).
    • A good understanding of Linear Algebra: Neural networks are Linear Algebra machines, so almost all of the quantities an interpretability researcher or engineer works with daily require an intuitive understanding of Linear Algebra. We also think this is a good predictor of whether a candidate will be able to contribute to our agenda in the long run, even as the exact agenda changes.
    • Some prior experience in interpretability: This can include academic research, industry experience, or independent research as part of a program like SERI MATS, ARENA, or the AI Safety Camp, an individual grant, and more. Prior experience is not strictly necessary, but even small non-public projects help us get a better feeling for how you approach interpretability.
    • A scientific mindset: We’re looking for candidates who are genuinely trying to understand how NNs work and who are building, or already have, a solid mechanistic model of NNs. This includes good intuitions about how to design experiments and a habit of looking for ways in which your current hypotheses could be wrong.
    • Values: We’re looking for team members who thrive in a collaborative environment and are results-oriented. We believe in nurturing a workspace where you can thrive professionally while still having time outside of work for what matters to you, and we aim to accommodate changes in your life circumstances and commitments. We value diversity of thought and backgrounds, and we are strongly committed to proactively creating an inclusive culture where everyone feels comfortable and able to do their best work.

    • For Research Scientists, we are especially looking for a track record of empirical research, good Linear Algebra skills, solid writing abilities, and the ability to communicate ideas clearly.
    • For Research Engineers, we’re especially looking for people who are strong at software engineering and have hands-on experience with NNs. Basic experimental design, data analysis, and plotting skills are expected.

    • We want to emphasize that people who feel they don’t fulfill all of these characteristics but think they would be a good fit for the position nonetheless are strongly encouraged to apply. We believe that excellent candidates can come from a variety of backgrounds, and are excited to give you opportunities to shine. 


    • Target Start Date: Depends on the candidate, but within 4 months of the first interview is preferred.
    • Time Allocation: Full-time
    • Location: London office, sharing a building with the London AI safety hub (formerly the SERI MATS office). This is an in-person role. In rare situations, we may consider partially remote arrangements on a case-by-case basis.
    • Work Visas: We will sponsor UK visas for people without current work permission.
    • Rolling Basis Applications: We review applications on a rolling basis. Due to a high volume of very talented applicants, there may be a lag of several weeks between submission and follow-up from Apollo Research. Thank you for your patience.
    • Timelines: We aim to take 1 month from the first to the final interview, but this strongly depends on the candidate's and our staff's availability.

Employee Benefits

    • Private Medical Insurance
    • Flexible work hours and schedule
    • Unlimited vacation
    • Unlimited sick leave
    • Lunch, dinner, and snacks provided for all employees on work days
    • Paid work trips, including staff retreats, business trips, and relevant conferences
    • A yearly $1,000 professional development budget

Interview Process

    • Our multi-stage process includes a screening interview, a take-home test (approx. 2 hours), 3 technical interviews, and a final interview with Marius (CEO). The technical interviews are closely related to tasks the candidate would do on the job; there are no leetcode-style general coding interviews. If you want to prepare for the interviews, you can work on hands-on interpretability projects.

About The Team

    • The current Interpretability team consists of Lucius Bushnaq (Interpretability Researcher), Dan Braun (Lead Engineer), and Lee Sharkey (Research/Strategy Lead). We hope to hire at least two more people for the team. 
    • You will mostly work with the Interpretability team, but we are a small organization, so you will also interact with others, including Marius Hobbhahn (CEO), Chris Akin (COO), Mikita Balesni (Evals Researcher), Jérémy Scheurer (Evals Researcher), and our policy advisors (TBA).

Equality Statement: Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.

Questions: write to

Apollo Research is a fiscally sponsored project of Rethink Priorities

Thank you very much for applying to Apollo Research.