Evals Research Scientist / Engineer

London
Evals Team / Full-time / On-site
APPLICATION DEADLINE: We are reviewing applications on a rolling basis and encourage early submissions.

ABOUT APOLLO RESEARCH:

The capabilities of current AI systems are evolving at a rapid pace. While these advancements offer tremendous opportunities, they also present significant risks, such as the potential for deliberate misuse or the deployment of sophisticated yet misaligned models. At Apollo Research, our primary concern lies with deceptive alignment, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight.

Our approach involves conducting fundamental research on interpretability and behavioral model evaluations, which we then use to audit real-world models. Ultimately, our goal is to leverage interpretability tools for model evaluations, as we believe that examining model internals in combination with behavioral evaluations offers stronger safety assurances compared to behavioral evaluations alone.

In our evaluations, we focus on LM agents, i.e. LLMs with agentic scaffolding similar to AutoGPT or SWE-agent. We also fine-tune models to study their generalization capabilities and elicit their dangerous potential within a safe, controlled environment (see our security policies). We’re looking for a research scientist/engineer who is excited to work on these and similar projects.

At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.

We welcome applicants of all ethnicities, genders, sexes, ages, abilities, religions, and sexual orientations, regardless of pregnancy or maternity, marital status, or gender reassignment.

THE ROLE. The Evals team focuses on the following efforts:

    • Conceptual work on safety cases for scheming.
    • Building evaluations for deceptive alignment-related properties, such as situational awareness or deceptive reasoning. 
    • Conducting evaluations on frontier models and publishing the results either to the general public or a target audience such as AI developers or governments.
    • Creating model organisms and demonstrations of behavior related to deceptive alignment, e.g. exploring the influence of goal-directedness on scheming.
    • Building a high-quality software stack to support all of these efforts.

CANDIDATE CHARACTERISTICS. We are looking for the following characteristics in a strong candidate. For all skills, we don’t require a formal background or industry experience and welcome self-taught candidates.

    • Large Language Model (LLM) Steering: The core skill of our evals research scientist role is steering LLMs. This can take many different forms, such as:
        • Prompting: eliciting specific behavior through clever word choice.
        • LM agents & scaffolding: chaining inputs and outputs from various models in a structured way, e.g. in an AutoGPT-style fashion, so that they become more goal-directed and agentic (for illustration, a minimal sketch of such a loop appears after this list).
        • Supervised fine-tuning: creating datasets and then fine-tuning models to improve a specific capability or to study aspects of learning/generalization.
        • RL(HF/AIF): using other models, programmatic reward functions, or custom reward models as a source of feedback for fine-tuning an existing LLM.
        • Fluent LLM usage: With increasing capabilities, we can use LLMs to speed up all parts of our pipeline. We welcome candidates who have integrated LLMs into their workflow.

    • Software Engineering: Model evaluators benefit from a solid foundation in software engineering. This can include developing APIs (ideally around LLMs or eval tasks), data science, system design, data engineering, and front-end development.
    • Empirical Research Experience: We’re looking for candidates with prior empirical research experience. This includes designing and executing experiments as well as writing up and communicating the findings. Ideally, this research involved working with LLMs. The experience can come from academia, industry, or independent research.
    • Generalist: Most evals tasks require a wide range of skills ranging from LLM fine-tuning to developing frontend labeling interfaces. Therefore, we're seeking individuals with diverse skill sets, a readiness to acquire new skills rapidly, and a strong focus on results.
    • Scientific Mindset: We think it is easy to overinterpret evals results, so a core skill of a good evals engineer or scientist is keeping track of potential alternative explanations for findings. Ideally, candidates should be able to propose and test these alternative hypotheses in new experiments.
    • Values: We’re looking for team members who thrive in a collaborative environment and are results-oriented. We believe in nurturing a workplace where you can thrive professionally while still having time outside of work for what matters to you, and that accommodates changes in your life circumstances and commitments. We value diversity of thought and backgrounds and are strongly committed to proactively creating an inclusive culture where everyone feels comfortable and can do their best work.

    • If you feel you don’t fulfill all of these characteristics but think you would nonetheless be a good fit for the position, we strongly encourage you to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine.
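
For candidates less familiar with the "agentic scaffolding" mentioned under LLM Steering above, here is a minimal, hypothetical sketch of an AutoGPT-style agent loop. The call_llm helper, the tool set, and the TOOL:/FINAL: protocol are illustrative placeholders, not Apollo's actual evals stack:

```python
# Minimal, hypothetical sketch of an AutoGPT-style agent loop (illustration only).
# `call_llm` is a placeholder for any chat-completion API; the tools and the
# stopping rule are illustrative, not Apollo's actual evals stack.
from typing import Callable

def call_llm(messages: list[dict]) -> str:
    """Placeholder: swap in a real chat-completion call here."""
    return "FINAL: done"  # stub so the sketch runs end-to-end

TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(task: str, max_steps: int = 5) -> str:
    # The scaffold keeps a growing transcript and feeds tool outputs back in,
    # which is what makes the underlying LLM behave in a more goal-directed way.
    messages = [
        {"role": "system", "content": "Answer with 'TOOL: <name> <input>' or 'FINAL: <answer>'."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
        if reply.startswith("TOOL:"):
            name, _, tool_input = reply.removeprefix("TOOL:").strip().partition(" ")
            observation = TOOLS.get(name, lambda _: "unknown tool")(tool_input)
            messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "No final answer within the step budget."

if __name__ == "__main__":
    print(run_agent("What is 17 * 23?"))
```

The key idea is simply that the model's outputs are parsed, tool results are appended back into the transcript, and the loop repeats until a final answer is produced.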

LOGISTICS:

    • Start Date: Target of September/October 2024
    • Time Allocation: Full-time
    • Location: Our office is in London, and the building is shared with the London Initiative for Safe AI (LISA) offices. This is an in-person role. In rare situations, we may consider partially remote arrangements on a case-by-case basis.
    • Work Visas: We will sponsor UK visas for people who currently don’t have UK work permission.

BENEFITS:

    • Private Medical Insurance
    • Flexible work hours and schedule
    • Unlimited vacation
    • Unlimited sick leave
    • Lunch, dinner, and snacks provided for all employees on work days
    • Paid work trips, including staff retreats, business trips, and relevant conferences
    • A yearly $1,000 professional development budget

ABOUT THE EVALS TEAM: The current Evals team consists of Mikita Balesni, Jérémy Scheurer, Alex Meinke, and Rusheb Shah. Marius Hobbhahn currently leads the Evals team. We’re a small and tight-knit team, so you will likely interact with all members of the Evals team regularly. You will mostly work with the Evals team, but you will likely sometimes interact with the Interpretability team, e.g. for white-box evaluations, and with the Governance team to translate technical knowledge into concrete recommendations. You can find our full team here.

ABOUT THE INTERVIEW PROCESS: Our multi-stage process includes a screening interview, a take-home test (approx. 150 minutes), 3 technical interviews, and a final interview with Marius (CEO). The technical interviews will be closely related to the candidate's tasks on the job. There are no leetcode-style general coding interviews. If you want to prepare for the interviews, we suggest working on hands-on LLM evals projects (e.g. as suggested in our starter guide).



This role is eligible for AI Futures, a UK Government program designed to help the next generation of AI organizations attract global talent to the UK. Successful international candidates may be able to get up to £10,000 to meet relocation costs, subject to terms and conditions. Apollo Research will handle the logistics of such expenses.