Senior DevOps Engineer

San Francisco / Software – Infrastructure Engineering / Full-time

The Software Infrastructure team is responsible for developing and delivering the secure, scalable, highly available services that support all of all.health's technology and products. In addition to supporting millions of user devices streaming data into our systems, we run the massive-scale systems that power all.health's unique insight system, as well as the machine learning and big data analytics platforms used to test and develop our next generation of algorithms and devices.

Responsibilities

    • Identify projects required to improve availability and reduce operational expense.
    • Design and implement systems to safely enable all.health's continued data growth without incurring incremental operational overhead or loss of availability.
    • Design and implement highly available services to reduce complexity in legacy systems and enable fault isolation and improved reliability.
    • Design and implement changes to the network and system infrastructure of our cloud services to improve security, flexibility, and availability.
    • Diagnose and resolve performance issues in complex systems to ensure reliable user-facing performance of the overall all.health system.
    • Participate in a 24/7 on-call rotation that is responsible for the stable and reliable operation of all cloud services.
    • Perform postmortem analysis in response to operational incidents, analyzing system telemetry to drive continual improvement.
    • Participate in and contribute to architectural discussions, reviews, improvements, and future projects.
    • Increase automation and build environment-agnostic infrastructure, keeping an eye on new tools and techniques where possible.

Requirements

    • BA/BS in Computer Science or equivalent experience
    • Experience operating a complex, highly available system
    • Experience with high-volume services
    • Experience with distributed systems
    • Experience with Docker, Kubernetes, Azure, and Google Cloud Platform
    • Strong understanding of configuration management (Puppet, SaltStack, Ansible)
    • Experience with environment-agnostic, transportable applications and infrastructure
    • Experience with complex deployments and data structures

all.health has developed a comprehensive preventative and proactive healthcare platform that combines clinical-grade sensors, machine learning, patient histories, insurance claims data, and other information to provide real-time at-risk screening for several disease conditions, including acute respiratory infections such as COVID-19 and chronic conditions such as hypertension and diabetes. Contextualized 24/7 data, along with clinician input and interventions, will then be used to guide positive behavior changes. The premise is to catch various health conditions early and help reverse or manage the negative effects.