(Senior) Machine Learning RE/RS

Berkeley / Open Positions – Engineering & Research / Employee / Hybrid
We are offering a $21k referral bonus for this role. You can refer people through our form, which lists the terms of this bonus.

About METR

We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment. Our work advances the science of AI measurement by understanding frontier AI systems' ability to complete complex tasks without human input, and directly executing those measurements to inform risk assessments and consensus within the AI industry, among policymakers, and among the public.

Our work has been cited by NIST, a previous US President, the UK Government, Nature, The New York Times, and Time Magazine. Through our work with leading AI labs, governments, and academia, we ensure that our insights can quickly be leveraged to promote the safe development of increasingly powerful AI systems. We believe it is robustly good for civilization to have a clear understanding of what types of danger AI systems pose and how high the risk is, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.

What We're Looking For
We’re looking for a combination of skills across “research science”, “research execution” and software engineering. You may not have all of these skills (for example, we don’t expect software engineering to be a large part of the role for narrowly focused researchers).

Research Science
- You have strong knowledge of the relevant literature and of good research practice in general.
- You have a good understanding of how particular projects fit into METR's overall mission - you are thinking about things like "how will this generalize to future models", or "how does this relate to alignment evals".
- You reliably notice important but subtle methodological limitations.
- You are undaunted by open-ended mandates - you can take a confusing or ill-posed question and produce insightful and helpful frameworks / proposals / results.
- You can write great papers.

Research Execution
- You are an experienced executor/contributor; you are familiar with patterns of successful and unsuccessful execution in frontier ML research. You are undaunted by "I've never done this before" or even "no one has done this before".
- Your total output is "team-sized" - you manage multiple people or run a project or are several times more productive as an IC than core staff.
- You are creative, ambitious and entrepreneurial. You work fast and are highly responsive and available. You can juggle many balls when it is useful.

Software Engineering
- You balance rapid prototyping with the creation of maintainable, scalable systems and make sound technical decisions.
- You lead large projects from ideation to delivery, balancing innovative ML solutions with reliable, high-quality code.
- You set high standards for system architecture, code quality, and maintainability, influencing broad software practices across the organization.

Salary range: $250,000 - $450,000 a year

Foundational evaluations research:
- Identify the biggest limitations to current understanding of frontier model capabilities and propensities
- Generate and rapidly derisk new methodologies and frameworks that can move the field forward
- Ensure these are externally valid, connecting with our threat models and helping us better predict risk
- Publish these as useful artifacts (datasets, environments, papers, model organisms) that the field can build on
- Streamline methodologies for use in evaluation sprints or live dashboards

Evaluation sprints and iteration:
- As new models are developed, partner with labs to provide external oversight so that we have the ability to “sound the alarm” if risk levels are unacceptably high
- Develop new techniques on the fly to deal with unexpected capabilities, behaviors or features of models
- Spot subtle methodological flaws or missing evidence, ensuring our evaluations are trustworthy and rigorous
- Draw conclusions about overall levels of risk, and communicate these clearly
- Anticipate the methodologies and artifacts we’ll need to assess risk from future generations of models, and turn lessons from evaluation sprints into new research directions

Advising and presenting our work (optional)
- Brief key stakeholders, including policymakers and lab leadership, on our work
- Answer questions and provide feedback on technical aspects of policy decisions, or help design governance mechanisms

For very experienced and exceptional researchers, we are open to exploring pay well above the stated salary range of $250,000 - $450,000 a year.

Our Culture

Everyone at METR is extremely smart, motivated, and mission-driven. We believe our work can meaningfully shape humanity's future for the better, and we want to be the best people in the world doing this work. We have a tight-knit, collaborative research culture rooted in truth-seeking and integrity. We're fiercely committed to producing high-quality, trustworthy science. We're honest and transparent about our results, especially when they may go against the grain. We've earned trust as reliable partners who handle confidential information with care. We maintain a low-ego, drama-free environment focused on what matters.

Hybrid Requirements: Our technical team members are in our office in Berkeley 3-5 days/week. Please let us know in your application if this is a constraint. If you lack US work authorization and would like to work in person (strongly preferred), we can likely sponsor a cap-exempt H-1B visa for this role.

We encourage you to apply even if your background may not seem like the perfect fit! We would rather review a larger pool of applications than risk missing out on a promising candidate for the position.

We are committed to diversity and equal opportunity in all aspects of our hiring process. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We welcome and encourage all qualified candidates to apply for our open positions.