Inference Operations Engineer
Palo Alto, CA
At SI we are building Amp.ai, the world’s first autonomous AI platform for self-optimizing software. The Amp.ai console offers a new paradigm within machine learning, making it seamless and accessible for applications. Our product offers unprecedented versatility and ease of deployment, and we’re looking for versatile, talented individuals who align with our culture and vision to join us in bringing this innovative technology to market.
As an Inference Operations Engineer, you will be responsible for some of the core components of our infrastructure. As we continue to scale our engineering team, you will ensure the stability and reliability of that infrastructure. This may include, but is not limited to, improving existing infrastructure, rewriting scripts, and debugging. You will design and implement the tools and procedures needed to coordinate and streamline our release processes, incident alerting, response procedures, performance and availability measurement, and monitoring systems.
- Work closely with our systems engineering team on provisioning, monitoring, alerting, and management of new infrastructure
- Manage and troubleshoot live distributed systems, in both production and non-production environments
- Ensure our cloud storage (Google Cloud) is cost-effective and our cloud infrastructure is secure
- Help set up and run quality assurance and regression testing infrastructure
- Bachelor's degree in Computer Science, Electrical Engineering, or related field
- Experience with a cloud service provider (AWS, Google Cloud, Microsoft Azure)
- Proficiency with a scripting language (Python, Bash, Ruby, etc.)
- Experience with provisioning tools, monitoring systems, and build systems
- Experience with Kafka and Elasticsearch is a plus
- Experience with security and authentication protocols, systems, and tools is a plus