Generalist Software Engineer, Web Dev and ML focus
Berkeley
ARC Evals (now METR) – Engineering & Research / Employee / On-site
About us
ARC Evals does empirical research to determine whether frontier AI models pose a significant threat to humanity. It's robustly good for civilization to have a clear understanding of what types of danger AI systems pose and how high the risk is. You can learn more about our goals from Beth's talk.
Some highlights of our work so far:
- Establishing autonomous replication evals: Thanks to our work, it's now taken for granted that autonomous replication (the ability of a model to independently copy itself to different servers, obtain more GPUs, etc.) should be tested for. For example, labs pledged to evaluate for this capability as part of the White House commitments.
- Pre-release evaluations: We've worked with OpenAI and Anthropic to evaluate their models pre-release, and our research has been widely cited by policymakers, AI labs, and governments.
- Inspiring lab evaluation efforts: Multiple leading labs are building their own internal evaluation teams, inspired by our work.
- Early commitments from labs: Anthropic credited us for their recent Responsible Scaling Policy (RSP), and OpenAI recently committed to releasing a Risk-Informed Development Policy (RDP). These fit under the category of “evals-based governance”, wherein AI labs can commit to things like, “If we hit capability threshold X, we won’t train a larger model until we’ve hit safety threshold Y”.
We’ve been mentioned by the UK government, Obama, and others. We’re sufficiently connected to relevant parties (labs, governments, and academia) that any good work we do or insights we uncover can quickly be leveraged.
Background
We've built an interface for interacting with models and assessing their abilities on various tasks; contractors use it to write demonstration actions and to rate model-generated actions.
The basic idea:
- do tasks as a sequence of steps: at each step, the model generates a number of options for the next action, then either the model or a human chooses which action to go with
- let models do tasks in a factored-cognition style (delegating sub-tasks to other copies of the model), then view the tree of tasks to see where models struggle and where they do well
- have contractors add their own options, and rate the quality of generated options
- tag steps with what abilities or difficulties they highlight
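To make that concrete, here's a minimal sketch of the kind of data model the interface implies. All names and fields below are illustrative guesses, not our actual schema:

```python
# Illustrative sketch of the data model described above.
# Every name and field here is hypothetical, not our actual schema.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Option:
    """A candidate next action: model-generated or contractor-written."""
    text: str
    source: str                      # "model" or "contractor"
    rating: Optional[float] = None   # contractor-assigned quality rating

@dataclass
class Step:
    """One step of a task; exactly one option gets chosen and executed."""
    options: list[Option]
    chosen: int                                    # index into options
    tags: list[str] = field(default_factory=list)  # abilities/difficulties highlighted

@dataclass
class Task:
    """A task attempt. Delegated sub-tasks form a tree (factored cognition)."""
    instructions: str
    steps: list[Step] = field(default_factory=list)
    subtasks: list[Task] = field(default_factory=list)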
Job Mission:
Improve the interface for interacting with models so we can generate data, evaluate model properties, and understand our existing data better and faster.
Key Outcomes:
- Design and build new ways to interact with models and with our existing data
- Unlock discoveries through things like:
  - Improving the user experience, and therefore the speed and accuracy of data generation
  - Automating the process of prompting the model, running the model's commands, formatting tasks, delegating to a different model, etc., letting us discover properties of the model that only show up a long way into a task
  - Building tools to visualize and interact with data that let us spot and understand patterns in the model's strengths and weaknesses
  - Improving model performance on our tasks by improving the 'scaffolding' we use to help LMs accomplish them (e.g. automatically asking the model whether it's stuck and restarting if so; see the sketch after this list)
- Make our data pipeline clean and beautiful, assess data quality from our contractors, catch bugs and problems, and produce useful datasets for the alignment community.
- If we realize we need to throw lots of our work away and do something different, stick with the project and make yourself useful on whatever needs doing
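To make the automation and scaffolding points above concrete, here's a minimal sketch of the kind of agent loop involved. Everything in it is illustrative: `complete` is a hypothetical wrapper around a model API, and the stuck-check-and-restart is just one example of a scaffolding tweak.

```python
# Illustrative scaffolding loop, not our actual code.
import subprocess

def complete(prompt: str) -> str:
    """Hypothetical stand-in for whatever model-API wrapper is in use."""
    raise NotImplementedError

def run_task(instructions: str, max_steps: int = 20) -> list[str]:
    transcript = [f"Task: {instructions}"]
    for _ in range(max_steps):
        action = complete("\n".join(transcript) + "\nNext shell command, or DONE:").strip()
        if action == "DONE":
            break
        # Run the model's command and feed the output back into the transcript.
        try:
            result = subprocess.run(action, shell=True, capture_output=True,
                                    text=True, timeout=60)
            output = result.stdout + result.stderr
        except subprocess.TimeoutExpired:
            output = "(command timed out)"
        transcript.append(f"$ {action}\n{output}")
        # Scaffolding tweak from the list above: ask the model whether it's
        # stuck, and restart from a clean transcript if it says yes.
        stuck = complete("\n".join(transcript) + "\nAre you stuck? Answer yes or no:")
        if stuck.strip().lower().startswith("yes"):
            transcript = [f"Task: {instructions}"]
    return transcript
```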
Someday/maybe outcomes:
- Help us develop ideas and build tools for evaluating harder-to-define model properties related to alignment, agency and deception
Key Competencies:
Essential
- Strong coding ability: Able to rapidly prototype features and write clear, easy-to-extend code. We're currently using React, TypeScript, Python, SQL, and Flask.
- Good communication skills: Always asks for clarification if priorities are ambiguous; good at pairing and at teaching others how their code works.
- Learning, ownership, and 'scrappiness': Quick to pick up whatever skills and knowledge are required to make the project succeed. Keeps the overall aims of the project in mind. Not afraid to point out if something should be done differently, even if it's not part of their core responsibility.
Nice-to-have
- Design + data-viz skills: Generates novel ideas for improving usability and providing an interaction experience that unlocks insights about the data or the model
- ML skills: Able to do things like tune hyperparameters for a finetuning run, investigate scaling laws for a particular property of interest, or suggest techniques we could use to improve model performance on our dataset.
- Basic ML knowledge: Good understanding of how LLMs 'think', and what sorts of things will affect performance.
- Conceptual alignment thinking: Helps generate and evaluate ideas for how to probe alignment, deception, agency, and other conceptually slippery properties.
Fit
Reasons this might be a bad fit for you
- There's some uncertainty about whether we're on the right track, and we expect to pivot sometimes. If that sounds stressful or demotivating, this might not be a good fit.
- You want an experienced/senior manager (Beth has limited management experience)
- You want to work within an established process that's set by someone else. We don't have detailed processes or philosophies of development/product/project management
- You want to work on challenging algorithmic problems or gritty GPU kernels. Most of the coding you'll be doing is likely to be pretty straightforward in some sense; we'll likely only be interacting with models via APIs.
Reasons this might be an especially good fit
- You're excited about working on a relatively new team/project, growing together, and helping shape it into something really awesome
- Interacting a lot with cutting-edge models sounds interesting and fun
- You find it motivating for your work to have a pretty clear and direct story for how it's relevant to x-risk.