Site Reliability Engineer - Productivity Engineering

London
Engineering – Platform / Permanent / Hybrid
We are looking for a passionate Site Reliability Engineer (SRE) to join Spotify’s Identity and Access Management (IAM) team in our Productivity Engineering Studio, the team responsible for advancing the digital workplace at Spotify. This engineer will help create a more lightweight and robust environment for managing identity and access for Spotifiers.

Our mission is to give every Spotifier the best possible user experience from the moment they open their device. By doing so, we seek to remove all barriers to collaboration, enabling every employee to work freely and securely, with an experience that makes Spotify one of the best places to work. As an SRE in the IAM space, you will combine software and systems engineering to build and run large-scale, distributed, fault-tolerant systems, with a focus on optimizing systems and on implementing and managing identity services and platforms. At Spotify, our engineers are at the forefront of the SRE profession and are empowered to work on significant projects. Their expertise has been featured in both editions of O’Reilly’s “Site Reliability Handbook.” They achieve this in an environment that encourages intellectual curiosity, problem solving, and openness, one that provides the support and mentorship needed to feel “safe to fail”, to learn, and to grow.

What You'll Do

    • Manage and improve the whole lifecycle for services—from inception and design, through deployment, operation and optimisation
    • Support the rollout of new IAM features, processes and technologies
    • Work with your teammates and stakeholders to identify and mitigate complex risks in IAM and other areas, including internal controls, and to find related opportunities for improvement
    • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
    • Participate in disaster recovery, incident resolution, capacity planning, monitoring and maintenance to ensure high availability. This includes being part of a weekly goalie and on-call rotation for the team

Who You Are

    • You are a valued teammate in a dynamic, autonomous, cross-functional agile team
    • You have experience designing, analyzing and optimizing code, and troubleshooting large-scale distributed systems
    • You have had exposure to delivering identity solutions that are functional, secure, scalable, and reliable
    • You have technical knowledge of IAM systems, protocols and standards including but not limited to: SAML, SCIM, OAuth, OIDC, LDAP and Okta. Hands-on product experience is a plus
    • You know how to write distributed, high-performance services in Java or Python and are experienced in deploying and operating services on Linux-based infrastructure in GCP, Azure or AWS. Puppet and/or Terraform experience is a plus
    • You have a systematic problem-solving approach, coupled with effective communication skills and a sense of drive
    • You love working on a team where you constantly learn, experiment, and iterate quickly

Where You'll Be

    • You'll be based in London, UK