Member of Technical Staff
Our infrastructure team manages our data center and high-performance computing clusters. This includes running and scaling Kubernetes, deploying on-prem hardware, capacity planning, and working with other teams on experiment and tooling design. See our recent blog post (https://blog.openai.com/scaling-kubernetes-to-2500-nodes/) for a sense of the kinds of challenges we solve in our day-to-day work. This position closely resembles infrastructure/DevOps at a very large-scale startup.
We look for a track record of the following:
* Experience designing, implementing, and running production services
* Comfort managing and monitoring large-scale infrastructure deployments
* Willingness to debug problems across the stack, such as networking issues, performance problems, or memory leaks
In this role, you will work closely with and directly accelerate researchers, but you don't need to become a machine learning expert yourself. We value people who can quickly gain a deep technical understanding of new domains and who enjoy being self-directed and identifying the most important problems to solve. Experience with high-performance computing or open-source contributions is a bonus.
We’re building safe Artificial General Intelligence (AGI) and ensuring it leads to a good outcome for humans. We believe that unreasonably great results are best delivered by a highly creative group working in concert.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.