Senior Site Reliability Engineer (Remote)
Waterloo, ON
Mattermost is the industry’s leading open-source enterprise-grade messaging platform. Customers including Intel, Ubisoft, Samsung, Cigna, BNP, the European Commission, the Social Security Administration, and Affirm use Mattermost to enable their teams to collaborate securely and privately anywhere. Many of the world’s leading privacy-conscious enterprises, such as the US Department of Defense, use Mattermost to connect people, tools, and automation and to increase developer collaboration. Our private cloud messaging platform offers secure, configurable, highly scalable messaging using web, mobile, and desktop applications and provides deep integrations with hundreds of SaaS and on-premises tools and applications.
We value high-impact work, ownership, self-awareness, and a focus on customer success. If these values match who you are, we hope you'll learn more about working at Mattermost and apply!
We are looking for an engineer with demonstrated experience in software development and in running infrastructure on Kubernetes. You will ensure the reliability and scalability of Mattermost’s new SaaS offering by building tools, deploying infrastructure, and developing automation in Kubernetes.
- Build services and tools to ensure the stability of Mattermost’s SaaS offering
- Define infrastructure in code with Terraform and other tools
- Write thoughtful and high-quality code in Go
- Follow our engineering best practices, and ensure alignment with our Leadership Principles
- Provide technical mentorship for fellow engineers
- Develop services to handle automatic recovery from incidents and disasters
- Automate incident or disaster simulations to identify blind spots
- Set technical vision and innovate to be at the forefront of self-healing SaaS services
- Implement, maintain and tune monitoring and alerting systems
- Deploy applications to and manage Kubernetes clusters
- Participate in our on-call rotation to respond to incidents and resolve problems
- Bachelor's degree in Computer Science or a related field, or significant professional DevOps or SRE experience
- 5+ years of previous experience as a developer or SRE with operational responsibilities
- Strong experience with AWS and other cloud providers
- Deep understanding of Kubernetes, inside and out
- Proven experience responding to incidents while on call, with superior knowledge of incident response processes
- Strong skills and experience working with infrastructure as code tools, such as Terraform
- Familiarity with container systems such as Kubernetes & Docker
- Solid programming skills and experience with, or an ability to quickly become proficient in, Go
- Ability and willingness to be on-call
- Experience with distributed application systems using HTTP, WebSockets, RPC, pub/sub, etc. at scale
- Open source contributions to related projects
- Knowledge of Grafana and Prometheus
- Comfortable with GitHub, Jira, Jenkins, and CircleCI
- Experience working in open source communities
Mattermost is a remote-first company with staff living and working across the globe. We are currently hiring staff in these countries/regions:
Canada - Chile - Finland - Georgia - Germany - India - Mauritius - Philippines - Poland - South Africa - Turkey - Ukraine - United Kingdom - United States
We are constantly working towards adding more countries/regions to this list, but first we need to make sure we are compliant with local laws and regulations, which takes time.
Mattermost is made up of people from a wide variety of backgrounds and lifestyles. We embrace diversity and invite applications from people from all walks of life. We don't discriminate against staff or applicants based on gender identity or expression, sexual orientation, race, religion, age, national origin, citizenship, disability, pregnancy status, veteran status, or any other differences. Also, if you have a disability, please let us know if there's any way we can make the interview process better for you; we're happy to accommodate!