Lead/Principal DevOps Engineer
San Francisco
Software – Infrastructure Engineering
The Software Infrastructure team is responsible for developing and delivering secure, scalable, highly available services that support all of all.health's technology and services. In addition to supporting millions of user devices streaming data into our systems, we run massive-scale systems that power all.health's unique insight system and the machine learning and big data analytics platforms used to test and develop our next generation of algorithms and devices.
This position will be remote/WFH during the COVID-19 pandemic, but will be based in SF when offices reopen in 2021.
- Identify projects required to improve availability and reduce operational expense.
- Design and implement systems to safely enable all.health's continued data growth without incurring incremental operational overhead or loss of availability.
- Design and implement highly available services to reduce complexity in legacy systems and enable fault isolation and improved reliability.
- Design and implement changes to the network and system infrastructure of our cloud services to improve security, flexibility, and availability.
- Diagnose and resolve performance issues in complex systems to ensure reliable user-facing performance of the overall all.health system.
- Participate in a 24/7 on-call rotation, responsible for the stable and reliable operation of all cloud services.
- Perform postmortem analysis in response to operational incidents, analyzing system telemetry to drive continual improvement.
- Participate in and contribute to architectural discussions, reviews, improvements, and future projects.
- Increase automation and environment-agnostic infrastructure, and evaluate new tools and techniques where possible.
- BA/BS in Computer Science or equivalent experience
- Experience operating a complex, highly available system
- Experience with high-volume services
- Experience with distributed systems
- Experience with Docker, Kubernetes, Azure, and Google Cloud Platform
- Strong understanding of configuration management: Puppet, SaltStack, Ansible
- Experience with environment-agnostic, transportable applications and infrastructure
- Experience with complex deployments and data structures