Evals Research Scientists, Research Engineers, and Research Lead
Rolling Basis Applications: We are currently reviewing applications on a rolling basis; our most recent active hiring round has concluded. Please note that due to a high volume of very talented applicants, there may be a lag of several weeks between submission and follow-up from Apollo Research on your application. Thank you for your patience.
About The Role: The capabilities of current AI systems are evolving at a rapid pace, presenting both great opportunities and serious challenges. The challenges could stem either from deliberate misuse or from the deployment of sophisticated yet misaligned AI models. At Apollo Research, we are especially concerned with deceptive alignment, i.e. cases where a model outwardly seems aligned but is in fact misaligned and conceals this fact from human oversight.
The potential ramifications of deceptively aligned advanced AI could be catastrophic. Additionally, deceptive alignment causes other behavioral evaluations to yield inaccurate results since the model will attempt to fool the tests. Thus, our current objective is to develop evaluations specifically tailored for deceptive alignment.
To evaluate models, we employ a variety of methods. First, we intend to evaluate model behavior using basic prompting techniques and agentic scaffolding, similar to AutoGPT. Second, we aim to fine-tune models to study their generalization capabilities and elicit their dangerous potential within a safe, controlled environment (we have several security policies in place to mitigate potential risks). At a high level, our current approach to evaluating deceptive alignment consists of breaking down the necessary capabilities and tracking how these scale with increasingly capable models. Some of these capabilities include situational awareness, stable non-myopic preferences, and particular kinds of generalization. In addition, we plan to build useful demos of precursor behaviors for further study.
We’re hiring Research Scientists, Research Engineers, and a Research Lead. Candidates are welcome to apply for multiple roles.
We welcome applicants of all races, genders, ages, religions, sexual orientations, and abilities, regardless of pregnancy or maternity status, marital status, or gender reassignment.
About Apollo Research
Apollo Research is a London-based technical AI safety organization. We specialize in auditing high-risk failure modes, particularly deceptive alignment, in large AI models. Our primary objective is to minimize catastrophic risks associated with advanced AI systems that may exhibit deceptive behavior, where misaligned models appear aligned in order to pursue their own objectives. Our approach involves conducting fundamental research on interpretability and behavioral model evaluations, which we then use to audit real-world models. Ultimately, our goal is to leverage interpretability tools for model evaluations, as we believe that examining model internals in combination with behavioral evaluations offers stronger safety assurances than behavioral evaluations alone.
Culture: At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.
We are looking for the following characteristics in a strong candidate:
- For all skills, we don’t require a formal background or industry experience and welcome self-taught candidates.
- Large Language Model (LLM) steering: The core skill of any current evals role is steering LLMs. This can take many different forms, such as:
  - Prompting: eliciting specific behavior through clever word choice.
  - Supervised fine-tuning: creating datasets and then fine-tuning models to improve a specific capability or to study aspects of learning/generalization.
  - RL(HF/AIF): using other models, programmatic reward functions, or custom reward models as a source of feedback for fine-tuning an existing LLM.
  - LLM scaffolding: chaining inputs and outputs from various models in a structured way, e.g. in an AutoGPT-style fashion.
- Empirical Research Experience: Especially for the scientist position, we’re looking for candidates with prior empirical research experience. This includes the design and execution of experiments as well as writing up and communicating findings. Ideally, this research involved working with LLMs. This experience can come from academia, industry, or independent research.
- Software Engineering: Model evaluators benefit from a solid foundation in software engineering. This can include developing APIs (ideally around LLMs or eval tasks), data science, system design, data engineering, and front-end development.
- Generalist: Most evals tasks require a wide range of skills ranging from LLM fine-tuning to developing frontend labeling interfaces. Therefore, we're seeking individuals with diverse skill sets, a readiness to acquire new skills rapidly, and a strong focus on results.
- Scientific Mindset: We think it is easy to over-interpret evals results; thus, a core skill of a good evals engineer or scientist is keeping track of potential alternative explanations for findings. Ideally, a candidate should be able to propose and test these alternative hypotheses in new experiments.
- Values: We’re looking for team members who thrive in a collaborative environment and are results-oriented. We believe in nurturing a workspace where you can thrive professionally while still having time outside of work for what matters to you, and we aim to be accommodating of changes in your life circumstances and commitments. We value diversity of thought and backgrounds, and are strongly committed to proactively creating an inclusive culture where everyone feels comfortable and can do their best work.
- For Research Scientists, we’re looking for great LLM steering, strong experimental design skills, great writing and communication skills, and a good conceptual understanding of model evaluations and deceptive alignment.
- For Research Engineers, we particularly value candidates with strong software engineering and LLM steering abilities. Basic competencies in experimental design, data analysis, and plotting skills are essential.
- For the Research Lead, we’re looking for someone with strong research experience, a proven track record in evaluating language models, and research leadership experience. This includes clear communication and expectation-setting, strong scientific writing, and a good personal fit with the existing team.
- We want to emphasize that people who don’t fulfill all of these characteristics but think they would be a great fit for the position nonetheless are strongly encouraged to apply. We understand that excellent candidates can come from a variety of backgrounds.
Logistics
- Target Start Date: Depends on the candidate, but within 4 months of the first interview is preferred.
- Time Allocation: Full-time
- Location: London office, sharing the building with the London AI safety hub (formerly the SERI MATS office). This is an in-person role. In rare situations, we may consider partially remote arrangements on a case-by-case basis.
- Work Visas: We will sponsor UK visas for people without current work permission.
- Timelines: These are rolling applications. We aim to take 1 month from the first to the final interview but it strongly depends on the candidate's and our staff's availability.
Benefits
- Private Medical Insurance
- Flexible work hours and schedule
- Unlimited vacation
- Unlimited sick leave
- Lunch, dinner, and snacks provided for all employees on work days.
- Paid work trips, including staff retreats, business trips, and relevant conferences
- A yearly $1,000 professional development budget.
Interview Process
- Our multi-stage process includes a screening interview, a take-home test (approx. 2 hours), three technical interviews, and a final interview with Marius (CEO). The technical interviews are closely related to tasks the candidate would do on the job. There are no leetcode-style general coding interviews. If you want to prepare for the interviews, we suggest working on hands-on LLM evals projects.
About the team
- The current evals team consists of Mikita Balesni and Jérémy Scheurer. Marius Hobbhahn (CEO) will be involved in an advising/guiding position. We hope to hire at least two more people for the team.
- You will mostly work with the evals team, but we are a small organization, so you will also interact with others, including Lee Sharkey (Research/Strategy Lead), Chris Akin (COO), Dan Braun (Lead Engineer), Lucius Bushnaq (Interpretability Researcher), and our policy advisors (TBA).
Equality Statement: Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.
Questions: write to firstname.lastname@example.org
Apollo Research is a fiscally sponsored project of Rethink Priorities
Thank you very much for applying to Apollo Research.