Site Reliability Engineer
LaunchDarkly is dedicated to helping teams ship better software, faster. Developers and operations teams use our Feature Management platform to eliminate risk from their software development cycle. We serve over 100 billion feature flags daily for companies big and small.
We're based in downtown Oakland and growing quickly. Come join our talented and diverse team and work alongside alumni of Atlassian, Intercom, Google and Twitter. You'll help us tackle some of the most challenging engineering problems around, like how to deliver feature flags to hundreds of millions of users in milliseconds without breaking the bank. And the platform you help us build will help software developers everywhere sleep better at night.
At LaunchDarkly, we believe in the power of teams. We're building a team that is humble, open, collaborative, respectful and kind. A team that succeeds together.
About the Role:
As an SRE at LaunchDarkly in Oakland, you'll help us build, scale, and operate LaunchDarkly's feature management platform, improving our reliability and automation.
LaunchDarkly serves billions of feature flags every day to customers around the world. And we ingest, analyze, and query billions of events per day. We need you to keep us ahead of the growth curve by making smart investments in tools, technology and people.
You'll help us create architecture and processes that can handle the exponential growth of our product. You'll enable us to deliver at scale, with high performance and reliability.
What you'll get to do:
- Work directly with our CTO and development team to define and evolve our architecture. You'll be a core voice in every technical decision we make.
- Deploy and operate our distributed, high-throughput, real-time data analytics pipeline, implemented as a set of Go microservices.
- Evolve our monitoring and analytics infrastructure.
- Diagnose and troubleshoot services during incidents.
- Tune and manage open-source tools like Elasticsearch, Kafka, Redis, and Cassandra.
- Evolve our CI/CD pipeline to survive an ever-growing number of engineers and accommodate an increasing rate of change safely.
- Enhance the use of configuration management tools to operationalize deployments.
- Improve the reliability and efficiency of fault-tolerant distributed systems.
- Lead a team of engineers during incidents and execute a thorough incident management process.
On day one you should have:
- Experience building and operating large-scale production systems
- A track record of working collaboratively in a rapidly moving engineering team
- Strong understanding of networking technologies, plus practical experience dealing with networking issues in real-world environments
- A bias toward repeatability and eliminating human effort through software automation
- A self-starter's mindset: willingness to tackle difficult problems and work independently when necessary
- The ability to identify problems, propose solutions, gain consensus and see those solutions into production
- Strong testing background: experience building unit, integration, performance, and load tests
- Experience with real-time event logging, stats collection, and analysis
- Experience operating a large system on AWS
- Experience with Go
- Experience with commercial logging and monitoring tools
LaunchDarkly is a Feature Management Platform that serves over 100 billion feature flags daily to help software teams build better software, faster. Feature flagging is an industry-standard methodology of wrapping a new or risky section of code or infrastructure change with a flag. Each flag can easily be turned off independently of code deployment (aka "dark launching"). LaunchDarkly has SDKs for all major web and mobile platforms. We are building a diverse team so that we can offer robust products and services. Our team culture is dynamic, friendly, and supportive.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, veteran status, or disability status.