Inference Operations Engineer
Palo Alto, CA
At Scaled Inference, our mission is to enable enterprises to reach their ultimate potential in the marketplace through the power of AI. Our flagship product, Amp.ai, is the first autonomous optimization platform that can drive sustained exponential business growth at any scale, offering unprecedented versatility and ease of deployment. We’re looking for versatile, talented individuals who align with our culture and vision to join us in bringing this innovative technology to market.
As an Inference Operations Engineer, you will be responsible for some of the core components of our infrastructure. As we continue to scale our growing engineering team, you will be responsible for ensuring the stability and site reliability of our infrastructure. This may include, but is not limited to, improving existing infrastructure, rewriting scripts, and debugging. You will design and execute the tools and procedures necessary to coordinate and streamline our release processes, incident alerting, response procedures, performance and availability measurement, and monitoring systems.
- Work closely with our systems engineering team on provisioning, monitoring, alerting, and management of new infrastructure
- Manage and troubleshoot live distributed systems, in both production and non-production environments
- Ensure our cloud storage (Google Cloud) is cost-effective and our infrastructure on the cloud is secure
- Help set up and run quality assurance and regression testing infrastructure
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field
- Experience with a cloud service provider (AWS, Google Cloud, Microsoft Azure)
- Proficiency with a scripting language (Python, Bash, Ruby, etc.)
- Experience with provisioning tools, monitoring systems, and build systems
- Experience with Kafka and Elasticsearch is a plus
- Experience with security and authentication protocols, systems, and tools is a plus