Senior Site Reliability Engineer

Palo Alto, CA / Engineering / Full Time

About Instrumental:
Instrumental is creating the future of manufacturing, empowering hardware companies to optimize their factories through the use of artificial intelligence. Thanks to Instrumental, companies can:
- increase yields (the percent of goods passing inspection);
- decrease dark yield (the escape of goods that should have failed inspection); and,
- trace everything (to narrowly scope recalls to only those goods affected).

Our product is delighting customers, who now have access to technology that produces results previously unattainable. The Instrumental customer list is growing across a diverse set of manufacturing applications, and we are asking you to help us scale to the needs of that entire market.

We’re a small but mighty team: we work collaboratively, support each other, and are highly energized by the opportunity to have such a large impact. We actively work to promote an inclusive environment.

About the Role:
Instrumental is seeking a Site Reliability Engineer / DevOps Engineer who will build, maintain, and optimize our infrastructure so that data collected from our systems in factories around the world stays accessible and protected.

Our cloud infrastructure is built within AWS and our in-factory systems are a distributed fleet of Linux machines.

What you bring to the table: 
- Experience with building and scaling systems in a public cloud (AWS preferred) environment
- Hands-on experience with Linux administration in a distributed systems environment with an emphasis on security
- Computer networking experience, including TCP, DNS, NAT, routing, firewalls, and VPNs
- Familiarity with orchestration and monitoring tools like Docker, Terraform, CloudWatch, and Datadog
- Ability to effectively communicate complex ideas to technical and non-technical individuals alike
- Ability to mentor others and influence architectural decisions with a focus on security, scalability, and high performance

What you can expect in this role: 
- System monitoring and maintenance for our cloud infrastructure and current in-factory systems deployed around the world
- Becoming the go-to person for technical operational questions for our equipment and network solutions
- Developing and maintaining scripts to ensure infrastructure is up, running, and remotely configurable
- Researching the problems we encounter and developing potential solutions to them, including but not limited to: AWS GovCloud, Great Firewall of China, VPN solutions, bandwidth issues, and end-to-end encryption
- Working cross-functionally with software and hardware engineering teams to identify new station and deployment requirements
- Participating in the current on-call rotation with the rest of the team
- Iterating on our security initiatives
- Automating tasks to ensure maximum uptime as our offering evolves
This position requires candidates to be a U.S. Citizen, Permanent Resident Alien, or Protected Individual per 8 U.S.C. 1324b(a)(3).