Senior SRE/DevOps Engineer
Metabase is the easiest way for people to get insights from their data, from tiny startups who get up and running quickly to major corporations with tens of thousands of users. That's why people love us.
We bring data tools with the elegance and simplicity of consumer products to the crufty world of enterprise business intelligence. We provide an opinionated open source starting point for how companies should measure, analyze and share their data.
Tens of thousands of companies use Metabase every day to answer questions about their data. While we seek to become the de facto self-managed open source analytics software for organizations everywhere, many customers want to use Metabase without worrying about the operational details of self-hosting. That’s why we recently launched Metabase Cloud, our hosted product. We’re looking for operations engineers to help build out and run this new and quickly growing service.
In this role, you will:
- Own and operate our application stack and AWS infrastructure to orchestrate and manage our hosted customer instances of Metabase
- Debug runtime issues at every level of our application and hosting stacks
- Develop and build our internal tooling and automation to manage the lifecycle of a hosted Metabase installation, from purchase to deployment, zero-downtime upgrades, and general operational health
- Continuously improve our automated deployments and testing
We're looking for someone who:
- Is thoughtful and careful
- Compulsively automates everything and documents it
- Is able to make solid technical judgements and back them up articulately
- Has at least 5 years of experience building and operating production infrastructure, ideally on public cloud
- Has strong Kubernetes and AWS experience
- Has strong experience with IaC and Terraform
- Can write high-quality, readable code in a modern language (e.g. Python or Go)
- Has experience with modern monitoring stacks (e.g. Prometheus, Grafana, Datadog)
Projects you could work on:
- Multi-region hosting
- Automate EKS cluster provisioning
- Extend our CRDs and Operators
- Improve the RDS sharding strategy for our multi-tenant platform
- Unify and improve our CI/CD platforms
- Collaborate with core application developers on changes to improve our application metrics, deployment speeds and CI integration
- Maintain our SOC 2 compliance and security posture
We're a global team (50% outside the US), fully distributed (from Thailand to California), who get things done asynchronously, with plenty of uninterrupted time, supporting each other to do the best work of our careers. We offer flexibility (define your own schedule and work from wherever you want), autonomy, and an environment that fosters growth, learning, and development. We're relentlessly user-focused and believe in building long-term value, not short-term hacks. And we raised a $30M Series B to take our approach to the next level for years to come.