Site Reliability Engineer

Virtual / Engineering – Infrastructure / Full-Time
Are you interested in building the technical foundation of the worldwide transition to clean energy? Do you enjoy working with a highly motivated and talented team to deliver mission-critical software? Voltus is growing our Site Reliability Engineering (or “Platform”) team to help deploy, manage, troubleshoot, and enhance our Platform and tools for our internal and external customers.

As a Site Reliability Engineer, you will be responsible for deploying and maintaining our core Platform, which consists of HashiCorp’s Nomad, Consul, and Vault systems in AWS. In addition, you will help manage and maintain our monitoring systems, which currently include Prometheus and Datadog.

You will build innovative automated solutions and tools to help debug and resolve problems in production and prevent them from recurring. Further, you will use monitoring data, logs, and trend analysis to proactively seek out system weaknesses and fix them before they cause production issues.


Responsibilities

    • Keeping our core Platform (Nomad, Consul, Vault) up and running and performing optimally.
    • Working closely with internal partners and teams to ensure that we ship software that meets security, SLA, and performance requirements.
    • Writing, updating, and using documentation, including runbooks/playbooks.
    • Automating work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more.
    • Debugging complex problems across an entire stack and creating solid solutions.
    • Developing CI/CD processes to improve release cadence.

Required Skills and Attributes

    • 5 years of experience with software engineering, software development, or system operations.
    • Excellent communication skills, both verbal and written.
    • Knows their way around a Unix/Linux shell, can write shell scripts, and understands Linux internals.
    • Experience debugging complex problems.
    • Experience designing, building, and operating large-scale production systems.
    • Knows Python, Java, Go, Rust, or a similar language.
    • Understands networking and messaging, especially between services.
    • Has hands-on experience using source control (Git, GitHub) and feature branching strategies.
    • Has experience with a variety of open-source databases (MySQL, Postgres, Redis, Cassandra, etc.).
    • Intellectually curious and always wants to learn more.

Preferred Skills and Attributes

    • Experience with DevOps engineering or SRE.
    • Experience with the Hashistack (Vault, Consul, Nomad).
    • Experience with containers, such as Docker.
    • Experience with monitoring and observability tools such as Datadog, Prometheus, or similar.
    • Experience with build systems such as Make, Bazel, or similar.
    • Experience automating infrastructure, testing, and deployments using tools like Ansible, Chef, or Terraform, and the ability to explain the Infrastructure as Code paradigm.
    • Understands the difference between provisioning and configuration management.

At Voltus, we are proud to be an equal opportunity employer because we recognize that a diverse organization begins with a diverse candidate pool. This means we do not tolerate discrimination of any kind and are committed to providing equal employment opportunities regardless of your gender identity, race, nationality, religion, age, sexual orientation, veteran status, disability status, or marital status.