Researcher: Expression of Interest

Berkeley / Open Positions: ARC Theory – Research / Employee / On-site
Please note: we paused hiring in early 2024, and expect to reopen hiring in the second half of the year. Please use this form to submit an application, which we will process once we reopen hiring.

What is ARC’s Theory team?
The Alignment Research Center (ARC) is a non-profit whose mission is to align future machine learning systems with human interests. The high-level agenda of the Theory team (not to be confused with the Evals team) is described by the report on Eliciting Latent Knowledge (ELK): roughly speaking, we’re trying to design ML training objectives that incentivize systems to honestly report their internal beliefs.

For the last year or so, we’ve mostly been focused on an approach to ELK based on formalizing a kind of heuristic reasoning that could be used to analyze neural network behavior, as laid out in our paper on Formalizing the presumption of independence. Our research has reached a stage where we’re coming up against concrete problems in mathematics and theoretical computer science, and so we’re particularly excited about hiring researchers with relevant background, regardless of whether they have worked on AI alignment before. See below for further discussion of ARC’s current theoretical research directions.

Who is ARC looking to hire?
Compared to our last hiring round, we have more of a need for people with a strong theoretical background (in math, physics or computer science, for example), but we remain open to anyone who is excited about getting involved in AI alignment, even if they do not have an existing research record.

Ultimately, we are excited to hire people who could contribute to our research agenda. The best way to figure out whether you might be able to contribute is to take a look at some of our recent research problems and directions:

- Some of our research problems are purely mathematical, such as these matrix completion problems – although note that these are unusually difficult, self-contained and well-posed (making them more appropriate for prizes).
- Some of our other research is more informal, as described in some of our recent blog posts such as Finding gliders in the game of life.
- A lot of our research occupies a middle ground between fully-formalized problems and more informal questions, such as fixing the problems with cumulant propagation described in Appendix D of Formalizing the presumption of independence.

What is working on ARC’s Theory team like?
ARC’s Theory team currently has four permanent team members (Mark Xu, Jacob Hilton, Eric Neyman and Dávid Matolcsi), alongside a varying number of temporary team members (recently anywhere from 0–3).

Most of the time, team members work on research problems independently, with frequent check-ins with their research advisor (e.g., twice weekly). The problems described above give a rough indication of the kind of research problems involved, which we would typically break down into smaller, more manageable subproblems. This work is often somewhat similar to academic research in pure math or theoretical computer science.

In addition to this, we also allocate a significant portion of our time to higher-level questions surrounding research prioritization, which we often discuss at our weekly group meeting. Since the team is still small, we are keen for new team members to help with this process of shaping and defining our research.

ARC shares an office with several other groups working on AI alignment such as Redwood Research, so even though the Theory team is small, the office is lively with lots of AI alignment-related discussion.

What are ARC’s current theoretical research directions?
ARC’s main theoretical focus over the last year or so has been on preparing the paper Formalizing the presumption of independence and on follow-up work to that. Roughly speaking, we’re trying to develop a framework for “formal heuristic arguments” that can be used to reason about the behavior of neural networks. This framework can be thought of as a confluence of two existing approaches:

- Mechanistic interpretability: uncertain and defeasible, but not machine verifiable
- Formal proof: machine verifiable, but limited to strictly certain conclusions
- Formal heuristic argument (our approach): uncertain and defeasible, and also machine verifiable
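
As a toy illustration of what "uncertain and defeasible" can mean here, the sketch below estimates E[XY] by first presuming that X and Y are independent, and then revises that estimate once an argument points out a covariance between them. This is only a minimal sketch under stated assumptions, not the framework from the paper: the setup (two fair bits, X = a, Y = a OR b) and all function names are hypothetical.

```python
from itertools import product


def expectation(f, outcomes):
    """Exact mean of f over a list of equally likely outcomes."""
    return sum(f(o) for o in outcomes) / len(outcomes)


def default_estimate(mean_x, mean_y):
    """Estimate E[XY] while presuming X and Y are independent."""
    return mean_x * mean_y


def revised_estimate(mean_x, mean_y, covariance):
    """Revise the default once an argument supplies a covariance term."""
    return mean_x * mean_y + covariance


# Hypothetical toy setup: two fair bits (a, b), with X = a and Y = a OR b.
outcomes = list(product([0, 1], repeat=2))


def x(o):
    return o[0]


def y(o):
    return o[0] | o[1]


mean_x = expectation(x, outcomes)
mean_y = expectation(y, outcomes)
true_value = expectation(lambda o: x(o) * y(o), outcomes)

# Default estimate: no argument has been noticed yet.
naive = default_estimate(mean_x, mean_y)

# An "argument": X = 1 forces Y = 1, so X and Y are positively correlated.
# In this toy the covariance is computed exactly by enumeration.
covariance = true_value - mean_x * mean_y
revised = revised_estimate(mean_x, mean_y, covariance)

print(naive, revised, true_value)
```

The default estimate is wrong in this example, but it is a reasonable guess given no further information, and it is overturned as soon as the relevant structure is pointed out.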

This research direction can be framed in a couple of different ways:
- As a formalization of mechanistic interpretability: Mechanistic interpretability is a research field seeking to reverse-engineer the weights of neural networks into human-understandable programs. A number of the field's central concepts, such as a “feature”, are currently defined informally. Putting the field onto more of a formal footing could bring clarity to the methods and goals of the field, remove the need to have humans or human-like systems in the loop, and elucidate how interpretability could be applied to solve downstream problems.
- As a way of dealing with out-of-distribution generalization failures: We think that a formal heuristic argument that explains a neural network’s training set performance could be used to flag new datapoints that trigger unusual behavior inside the model. We have been calling this approach “mechanistic anomaly detection”, since it can be thought of as a way to detect anomalies in the model’s internal activations at inference time. Further details are given in this blog post.
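
To make the idea of flagging unusual internal behavior concrete, here is a deliberately crude stand-in: a per-unit z-score outlier check on activation vectors. It is plain statistical outlier detection rather than a formal heuristic argument, and it is not the method described in the blog post. The activation values, the threshold, and all function names are hypothetical.

```python
import statistics


def fit_baseline(activations):
    """Per-unit mean and standard deviation over training-set activations."""
    columns = list(zip(*activations))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds


def anomaly_score(activation, baseline, eps=1e-8):
    """Largest absolute z-score across units for a single activation vector."""
    means, stds = baseline
    return max(
        abs(a - m) / (s + eps) for a, m, s in zip(activation, means, stds)
    )


def is_anomalous(activation, baseline, threshold=4.0):
    """Flag a datapoint whose activations fall far outside the training range."""
    return anomaly_score(activation, baseline) > threshold


# Hypothetical two-unit activations recorded on the training set.
train = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0], [1.2, 0.2], [0.8, -0.2]]
baseline = fit_baseline(train)

print(is_anomalous([1.05, 0.05], baseline))  # close to the training activations
print(is_anomalous([1.0, 3.0], baseline))    # second unit far outside its range
```

A detector like this only notices activations that are statistically unusual. The hope for mechanistic anomaly detection is stronger: to notice when the explanation of the model's training set performance no longer applies to a new input.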

Hiring process
Our current interview process involves:
- 3-hour take-home test involving math and computer science puzzles
- 30-minute non-technical phone call
- 1-day onsite interview

We will compensate candidates for their time when this is logistically possible.

Employment details
ARC is based in Berkeley, California, and we would prefer people who can work full-time from our office, but we are open to discussing remote or part-time arrangements in some circumstances. We can sponsor visas and are H-1B cap-exempt.

We are accepting applications for both visiting researcher (1–3 months) and full-time positions. The intention of the visiting researcher position is to assess potential fit for a full-time role, and we expect to invite around half of visiting researchers to join full-time. We are also able to offer straight-to-full-time positions, but we anticipate that we will only be able to do this for people with a legible research track record. We are currently only interested in applicants who can start before Dec 31, 2025.

Salaries are in the $150k–400k range for most people depending on experience.

Further information
If you have any questions about anything in this posting, please email hiring@alignment.org.

If you want to provide any feedback, you can use this form: https://forms.gle/DndeoBekS6ViyifW6