Senior Site Reliability Engineer (Remote)

United States /
Engineering /
Full-time
Mattermost is an open source platform for secure collaboration across the entire software development lifecycle. Hundreds of thousands of developers around the globe trust Mattermost to increase their productivity by bringing together team communication, task and project management, and workflow orchestration into a unified platform for agile software development. 

Founded in 2016, Mattermost’s open source platform powers over 800,000 workspaces worldwide with the support of over 4,000 contributors from across the developer community. The company serves over 800 customers, including the European Parliament, NASA, Nasdaq, Samsung, SAP, the United States Air Force and Wealthfront, and is backed by world-class investors including Battery Ventures, Redpoint, S28 Capital and YC Continuity. To learn more, visit www.mattermost.com.

We value high impact work, ownership, self-awareness and being focused on customer success. If these values match who you are, we hope you'll learn more about working at Mattermost and apply!

We are looking for an engineer with demonstrated experience in software development and infrastructure using Kubernetes. You will ensure the reliability and scalability of Mattermost’s new SaaS offering by building tools and deploying infrastructure and automation on Kubernetes.

Here are some of the challenges and work of the SRE team:
- Monitoring Cloud Environments at Scale with Prometheus and Thanos
- How We Use Sloth to do SLO Monitoring and Alerting with Prometheus
- Automate EKS Node Rotation for AMI Releases

Responsibilities:

    • Build services and tools to ensure the stability of Mattermost’s SaaS offering
    • Define infrastructure in code with IaC tools like Terraform
    • Write thoughtful and high-quality code in Go
    • Follow our engineering best practices, and ensure alignment with our Leadership Principles
    • Provide technical mentorship for fellow engineers
    • Develop services to handle automatic recovery from incidents and disasters
    • Automate incident or disaster simulations to identify blind spots
    • Set technical vision and innovate to be at the forefront of self-healing SaaS services
    • Implement, maintain and tune monitoring and alerting systems
    • Deploy applications to and manage Kubernetes clusters
    • Participate in our on-call rotation to respond to incidents and resolve problems

Required Background/Skills:

    • Bachelor's degree in Computer Science or a related field, or significant professional DevOps or SRE experience
    • 5+ years of previous experience as a developer or SRE with operational responsibilities
    • Proven experience responding on-call to incidents with superior knowledge of incident response processes
    • Strong skills and experience working with Kubernetes inside and out
    • Strong skills and experience working with infrastructure as code tools, such as Terraform
    • Solid programming skills and experience with or an ability to quickly become proficient in Go
    • Familiarity with container systems such as Kubernetes & Docker
    • Familiarity with GitOps and Chaos Engineering
    • Ability and willingness to be on-call

Preferences:

    • Experience with distributed application systems using HTTP, WebSockets, RPC, pub/sub, etc. at scale
    • Open source contributions to related projects
    • Knowledge of Grafana and Prometheus suite
    • Comfortable with GitHub, Jira, Jenkins, CircleCI
    • Experience with WebRTC for real-time communication architectures
    • Experience working in open source communities

Mattermost is a remote-first company with staff living and working across the globe. We are currently hiring staff in these countries/regions:

Australia - Brazil - Canada - Chile - Finland - Georgia - Germany - Greece - India - Ireland - Mauritius - Mexico - Pakistan - Philippines - Poland - South Africa - Turkey - Ukraine - Uganda - United Kingdom - United States

We are constantly working towards adding more countries/regions to this list, but first we need to make sure we are compliant with local laws and regulations, which takes time. 

Mattermost is made up of people from a wide variety of backgrounds and lifestyles. We embrace diversity and invite applications from people from all walks of life. We don't discriminate against staff or applicants based on gender identity or expression, sexual orientation, race, religion, age, national origin, citizenship, disability, pregnancy status, veteran status, or any other differences. Also, if you have a disability, please let us know if there's any way we can make the interview process better for you; we're happy to accommodate!