Lead Site Reliability Engineer
UK London
Engineering – Infrastructure Engineering
Employee - Regular/Permanent
Hybrid
Inclusion at Bumble Inc.
Bumble Inc. is an equal opportunity employer and we strongly encourage people of all ages, people of colour, lesbian, gay, bisexual, transgender, queer and non-binary people, veterans, parents, people with disabilities, and neurodivergent people to apply. We're happy to make any reasonable adjustments that will help you feel more confident throughout the process; please don't hesitate to let us know how we can help.
In your application, please feel free to note which pronouns you use (for example: she/her, he/him, they/them).
At Bumble, Site Reliability Engineers (SREs) are responsible for ensuring the reliability, scalability and performance of software systems while bridging the gap between development, security and operations.
We proactively manage, automate, and safeguard our infrastructure to deliver a robust foundation for the business and an exceptional experience for our stakeholders.
What you'll be doing
- Design and build new tools and services from the ground up to solve complex problems
- Build automation frameworks to streamline repetitive tasks
- Design and maintain scalable, highly available and fault-tolerant systems
- Build and maintain observability tooling including logging, monitoring, tracing and alerting systems
- Develop and maintain automation tooling to reduce manual intervention
- Implement infrastructure as code (IaC) for infrastructure provisioning
- Monitor system health and performance, identifying and fixing issues
- Respond to system outages, troubleshooting root causes and implementing preventative measures
- Collaborate with engineering teams and security engineers to improve system reliability, security and performance
- Participate in on-call rotations
- Create and maintain documentation to improve knowledge sharing across teams
About you
- Excellent problem-solving and analytical skills
- Strong communication and collaboration skills are a must
- Proficiency in at least one of the Python or Golang programming languages
- Experience with CI/CD pipelines
- Strong proficiency with Kubernetes architecture
- Prior experience in SRE, system administration or DevOps roles
- Strong proficiency with Linux/Unix operating systems, including hands-on experience in configuration and troubleshooting
- Proficiency in using Puppet for configuration management, automation and system provisioning
- Hands-on experience with monitoring and observability platforms such as Grafana, Prometheus, Elasticsearch and Jaeger
- Experience with cloud architectures on platforms such as GCP or AWS
- Familiarity with SQL databases and message broker systems such as Kafka
- You are a solution-orientated professional with a passion for problem-solving
- You take pride in ensuring systems are performant, stable and efficient
- You thrive in a collaborative environment
- Continuous learning is important to you and you actively explore new tools and techniques
- You are curiosity-driven and are constantly seeking new ways to improve processes and implement modern solutions
- You are committed to ensuring quality is at the heart of every project
About Us
Bumble Inc. is the parent company of Bumble, Badoo, Fruitz and Official. The Bumble platform enables people to build healthy and equitable relationships, through kind connections. Founded by Whitney Wolfe Herd in 2014, Bumble was one of the first dating apps built with women at the centre and connects people across dating (Bumble Date), friendship (Bumble BFF) and professional networking (Bumble Bizz). Badoo, which was founded in 2006, is one of the pioneers of web and mobile dating products. Fruitz, founded in 2017, encourages open and honest communication of dating intentions through playful fruit metaphors. Official is an app for couples that promotes open and honest communication between partners and was founded in 2020.