Site Reliability Engineer
Technical Operations
For businesses running dynamic and complex networks that exceed efficient human operational scale, Kentik is the provider of the only AIOps platform specifically designed for network professionals. Kentik uniquely unifies diverse data streams across cloud and traditional infrastructure to produce instant insights that accelerate network team efficiency, automate issue resolution, and create new business capabilities.
About the role
Kentik, the leading Network Analytics platform, is looking for an experienced SRE/DevOps engineer to architect, build, monitor, and operate the physical and virtual systems that power our products.
We’re looking for a self-starter who can quickly develop a clear plan and determine what’s necessary to implement and maintain it. We operate a well-organized, well-instrumented platform, with an emphasis on results rather than process. We’re expanding fast and offer enormous opportunities for employee growth.
What you'll do
- Be part of a 7-person team spread across 4 countries and 4 time zones, and participate in a follow-the-sun on-call rotation that lets you work during your preferred hours.
- Provide valuable feedback on team goals, projects, and processes. We believe in continuously improving our team.
- Manage a real-time, scalable, microservices-based infrastructure running on free software, in 6 physical locations and all 3 major clouds.
- Write design documents that introduce new architectures or propose changes to our existing infrastructure, accommodating growth and solving scaling issues, while deep-diving into diverse topics ranging from NetFlow and IP routing to MySQL replication strategies and HTTP optimization.
- Take part in handling incidents and in the root cause analysis (RCA) that follows, providing valuable input at all stages.
- Contribute code and tools, or improve and extend all kinds of existing code. We believe in the "infrastructure as code" and "cattle, not pets" principles, so everything about our infrastructure is reflected in code.
What we’re looking for
- Minimum experience: 4 years of relevant SRE or systems administration experience.
- Communication: Kentik is a remote-first company, so we're looking for a team player who is able to work and collaborate in an asynchronous environment via tools such as email, Google Docs, Slack, Zoom, and Git.
- Linux experience: You’ve managed Debian GNU/Linux or Ubuntu installations in the past.
- HTTP experience: You know what TLS, nginx, HAProxy, or HTTP headers are.
- Networking experience: Terms such as routes and iptables sound familiar.
- An urge to document code, processes, and infrastructure in runbooks and wikis.
- A preference for automating your way out of tedious and repetitive tasks. Toil is bad.
- Some familiarity with coding in Python, Ruby, or Go, while using Git to record your changes.
Desirable skills or experience
- You've worked in a microservices environment in the past and feel comfortable working with complex architectures.
- You know your way around Linux networking and have worked with policy routing or have some experience with dynamic routing. Example software: iptables, bird, FRR/Quagga, exaBGP.
- You have developed configuration management code in the past, either to configure servers or networking hardware. Example software: Puppet, Ansible, Chef.
- You’ve used or configured a CI/CD pipeline, or know how one built on a tool like Jenkins works.
- You’re familiar with a metrics platform such as Grafana or Prometheus, understand why it's needed, and would always check it to validate your assumptions.
- You’ve used or configured an ELK stack in the past.
- You have some experience with cloud infrastructure and have used tools such as Terraform or Packer.
The list is not exhaustive or in any way a checklist; we'll be happy to hear about any other skills or experience you may have!
If you find the position interesting, we’d be happy to hear from you!
Why work at Kentik?
We offer a competitive salary, first-rate benefits, and the chance to work at a fast-growing, well-funded startup that is building something special: the first AIOps platform for the network professional. Kentik is located in San Francisco's SoMa neighborhood, a hotbed of innovation and talent near Oracle Park, with team members around the world. You will have the opportunity to work alongside world-class engineers, network experts, and technology thought leaders to build the future of digital operations.