Senior Site Reliability Engineer (SRE)
Engineering – Platform
Ambient.ai is a stealth AI company headquartered in Palo Alto on a mission to enable intelligent environments that are safe, efficient and sustainable. Our breakthrough technology combines cutting-edge deep learning with a contextual knowledge model to achieve human-like perception ability. Ambient's flagship product has been deployed by multiple Fortune 100 companies to solve a mission-critical problem in a way that has never been possible.
The company was founded in 2017 by experts in artificial intelligence from Stanford University who previously built iconic products at Apple, Google, Microsoft and Dropbox. We are a Series A company backed by Andreessen Horowitz (a16z), SV Angel, Y Combinator, and visionary angels like Jyoti Bansal, Mark Leslie and Elad Gil.
As a Site Reliability Engineer (SRE) on the Platform team, you will be responsible for building the infrastructure to improve the reliability and scalability of Ambient's core product. You'll collaborate with the rest of the engineering team to design the release and change management process, CI/CD pipelines, health monitoring systems and testing frameworks. In this role, you will:
- Design and implement process and tools for software release, deployment and change management.
- Own and maintain all operational aspects of Ambient's cloud services and edge servers, including AWS services such as Elasticsearch and Redis.
- Implement health monitoring systems and alerting using tools such as Prometheus / Grafana.
- Continuously improve the reliability of Ambient's services, and hence our quality of service (QoS), through a data-driven approach.
- Enthusiastically participate in on-call rotations.
Requirements:
- BS/MS in Computer Science or a related field, with at least 3 years of experience in DevOps / Site Reliability.
- Strong understanding of web infrastructure, such as the microservices design pattern, pub-sub systems, databases (such as MySQL), web servers (such as nginx) and popular MVC frameworks such as Django.
- Experience working in distributed systems environments and the ability to troubleshoot across the whole stack, involving various independent services.
- Strong problem-solving skills and an endless desire for automation.
- Experience with Unix systems, shell scripting, Python (or other high-level languages such as C++ / Java) and tools such as Puppet, Chef and Ansible.
- Experience with AWS and / or GCP, especially services such as EC2, S3, Elasticsearch and RDS.
At Ambient.ai, we respect and admire the builders and the creators. Send us your most incredible project; we'd love to see it.
Ambient.ai is proud to be an Equal Opportunity Employer.