Director, Site Reliability Engineering

Brooklyn, NY
Engineering
Full-time
Kickstarter is looking for an SRE Director to manage a team of Operations Engineers and define and grow the Site Reliability Engineering function within Kickstarter. In addition to building and leading a team of engineers to manage our infrastructure, this individual will work across the Engineering organization to provide guidance and education on topics such as availability, performance, change management, application security, monitoring, and capacity planning.

About the Team

Kickstarter’s Operations Engineering team is a small team of engineers distributed across the country. We manage Kickstarter’s infrastructure, supporting both the monolithic user-facing application that several product engineering teams contribute to and a handful of polyglot services that support that application. Our challenge is to make sure that engineers across the organization have the right tools to understand and interact with code in our production environments, from continuous delivery pipelines to alerting and monitoring.

In This Role, You Will:

    • Lead and grow a team focused on operations and infrastructure.
    • Define roadmaps and priorities for growing and scaling our AWS deployment and associated tooling.
    • Partner with other engineering leaders to guide conversations and best practices in application architecture and cloud infrastructure.
    • Advocate for performance, security, global scale, and automation.
    • Define and be held accountable for process, metrics, and SLAs for our operations and infrastructure.

About You

    • You have 6+ years of software development experience, including time spent managing an engineering team.
    • You have prior experience in DevOps, infrastructure engineering, systems engineering, or site reliability engineering.
    • You are a creative problem solver who understands and can articulate the tradeoffs involved in technical approaches, and can demonstrate pragmatism and resourcefulness when confronted with constraints.
    • You continuously demonstrate empathy with non-technical audiences and engineers whose area of interest or experience lies outside of SRE and Infrastructure.
    • You have strong opinions informed by experience on subjects like infrastructure-as-code, containerization, microservice orchestration, distributed systems, incident response, and AWS bric-a-brac, and are willing to experiment to refine those opinions.