Software Engineer, Site Reliability (remote)

Santa Monica, California
Engineering
Who we are

We work at the intersection of deep technology, science, and experience design. Sensible is built to help consumers and businesses understand, plan for, and mitigate all types of climate and weather risk, from travel and events to homeownership and energy production. Our first product embeds with travel and outdoor events partners, offering their customers a guarantee against the weather. This means a customer can be confident that they will have a great time in the sun, or they'll get money back!

We recognize that we're living in a world with more climate disruption than ever before. We also believe that it is a world of unprecedented opportunity for solutions.

With rich data from satellites and other developing technologies, we have the right information, engineering, and technology to help us relate to our environment with a new kind of awareness and understanding.

Sensible is a team built on trust, feedback, and communication. We recognize that diversity of background, skills, and experiences makes stronger teams, and we are therefore an equal opportunity employer.

What you'll be working on

    • Coordinate with engineering and product leaders to maintain a working roadmap for business systems reliability and developer experience improvements and projects
    • Document and maintain SRE best practices
    • Maintain existing cloud-based infrastructure, including AWS resources and Kubernetes clusters
    • Maintain and improve monitoring, logging, and instrumentation/tracing systems
    • Implement and improve observability, alerting, and on-call systems and procedures
    • Improve and implement CI/CD practices and pipelines for deploying containerized apps
    • Improve and implement monitoring for basic cloud security concerns, including AWS/Kubernetes access management, endpoint security, and obfuscation of sensitive information

Required Qualifications

    • A bachelor's degree in a STEM-related field, or equivalent industry experience
    • Commitment to the spirit of continuous improvement
    • Flexibility around working hours in order to maintain high systems availability
    • Experience and comfort working with the following technologies or their equivalents:
        • AWS: IAM, VPC, EC2, Routing/Security, EKS, S3, ALB/NLB, RDS/Aurora
        • Kubernetes: cluster management, deployments/services/pods, autoscaling, metrics, ingress, certificate management
        • Observability: SLOs/SLAs, SLIs/KPIs, metrics
        • CI/CD: GitHub Actions or another common CI system like CircleCI, Travis CI, AWS CodePipeline, etc.
        • Programming: an imperative language like Python, Node.js, Go, Java, and/or Rust
        • Tooling: Terraform, Docker, AWS CloudFormation, Git

Desired Qualifications

    • Experience with developing custom event-based pipelines for CI/CD and/or systems automation/management
    • Experience with creating custom SlackOps integrations for systems notifications and administration
    • Demonstrated ability to create basic internal tool webapps to facilitate things like configuration management, deployments, security, and/or monitoring systems
    • Experience maintaining system reliability in high-traffic environments (10,000+ requests/minute)
    • Experience being on, maintaining, and shepherding on-call rotations