Software Engineer, Platform
San Francisco
Applied AI research and engineering team
Join the API team, a core group that works to bring OpenAI's technology to the world in partnership with other organizations.
We are looking for a self-starting engineer who loves building and running production systems. In this role, you will build the systems that power a breadth of production ML use cases. You’ll also work closely with machine learning researchers and directly accelerate their work, but you don't need to be a machine learning expert yourself. We value people who can quickly gain a deep technical understanding of new domains and who enjoy being self-directed and identifying the most important problems to solve.
We look for a track record of the following:
- Experience designing, implementing, and running production services.
- Comfort managing and monitoring infrastructure deployments.
- Willingness to debug problems across the stack, such as networking issues, performance problems, or memory leaks.
- Experience with high-performance computing or infrastructure orchestration tools is a bonus.
- While we don't require machine learning expertise, experience with the tools of high-performance computing, machine learning optimization, and model scaling is a bonus. This could include experience with MPI, NCCL, CUDA kernels, model and parameter sharding, InfiniBand, and GPU hardware. Ultimately, it is more important to have a willingness to explore, understand systems, and learn how it all ties together. We are constantly discovering new capabilities in our models. Turning those discoveries into safe and performant production systems requires a generalist mindset and curiosity.
You might be a good fit if you:
- Are self-directed and enjoy figuring out the most important problem to work on.
- Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
- Know your way around a Unix shell.
- Build tools to accelerate your own workflows, but only when off-the-shelf solutions would not do.
- Have been a startup founder or an early-stage engineer.
- Enjoy a fast-paced work environment with tight feedback loops.
On this team, you will be expected to balance building new components with improving the foundation on which it all runs. The codebase is currently a mix of Go and Python. Our services run on multiple Kubernetes clusters, and the clusters are versioned in Terraform. We always aim to deploy reliable systems with low error rates and redundancy built in. We’re only getting started building the API, and would like you to join to help build the rest of it!
We’re building safe Artificial General Intelligence (AGI), and ensuring it leads to a good outcome for humans. We believe that unreasonably great results are best delivered by a highly creative group working in concert. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
This position is subject to a background check for any convictions directly related to its duties and responsibilities. Only job-related convictions will be considered and will not automatically disqualify the candidate. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodations via email@example.com.
Benefits:
- Health, dental, and vision insurance for you and your family
- Unlimited time off (we encourage 4+ weeks per year)
- Parental leave
- Flexible work hours
- Lunch and dinner each day
- 401(k) plan with matching