Member of Technical Staff
The infrastructure team at OpenAI builds and runs the compute fabric powering OpenAI's results. Other teams build on this infrastructure for running and scaling up deep learning models, either self-serve or in collaboration with the infrastructure team. The infrastructure team runs our Kubernetes clusters, builds tools to improve experiment iteration time, helps design and run experiments across thousands of machines, and performs capacity planning and implementation. The work closely resembles infrastructure in a very large-scale startup.
We look for a track record of the following:
* Experience, designing, implementing, and running production services
* Comfort managing and monitoring large-scale cloud deployments, and writing tools and abstractions for others
* Willingness to debug problems across the stack, such as networking issues, performance problems, or memory leaks
In this role, you will work closely with and directly accelerate researchers, but don't need to become a machine learning expert yourself. We value people who can quickly obtain deep technical understanding of new domains, and enjoy being self-directed and identifying the most important problems to solve. Experience with high-performance computing or machine learning is a bonus.
We’re building safe Artificial General Intelligence (AGI), and ensuring it leads to a good outcome for humans. We believe that unreasonably great results are best delivered by a highly creative group working in concert.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.