Site Reliability Engineer
SF or Remote (+/-4 hours PT)
We're on a mission to make programming more accessible by building the best, simplest, and fastest coding environment. Repl.it is a place not only to learn and practice programming but also to collaborate and ship applications. It's exciting both that we've been able to build a community of millions of users with a small team and an ambitious plan, and that we're inventing the future of programming.
Millions of people come to Repl.it to learn how to code, prototype ideas, and build applications. When we go down, it's not a mere annoyance: it determines whether thousands of students learn to code that day and whether developers' apps are up. As our founding SRE, you will not only have a real, tangible effect on people's lives, but also influence our engineering culture and how we build and scale services and, in the future, grow and lead the SRE team.
Position: Site Reliability Engineer
Roles & Responsibilities
* Build tools to reduce ops toil & babysitting
* Keep Repl.it up and fast
* Influence architecture decisions to account for availability, performance, scalability, and fault tolerance
* Identify trouble spots & single points of failure and delegate fixes to system owners
* Own and evolve our incident response practices
Requirements
* 5+ years of experience
* Systems programming experience (Go, Rust, or C/C++)
* Experience with profiling and performance optimizations
* Comfortable debugging production systems (instrumentation, monitoring, etc.)
* Experience working on large projects at scale
* Self-directed and comfortable working autonomously
* Appreciation for simplicity and pragmatism
* Experience building Platform/Infrastructure/Runtime as a Service
* Experience with distributed systems, containers, and/or filesystems
Remote (currently only open to candidates within +/-4 hours of the Pacific Time zone)
Ready to build the world's largest developer platform?