Senior Site Reliability Engineer - Core Product Team

Remote - US / Research & Development – Engineering / Full-time

At Rollbar, we help developers build better software faster and make their lives easier. We are a team of about 70 people based in San Francisco, Barcelona, and Budapest. Over 100,000 developers use our product to power all kinds of applications that affect people’s lives and livelihoods. Rollbar is used by some of the best engineering teams in the world, including Twilio, Salesforce, Zendesk, Affirm, and Twitch.

We are looking for an experienced Senior Site Reliability Engineer to join our Core Product team and help scale our systems and services to the next level. In this role, you will have a massive impact on our frontend and backend API infrastructure while making it much easier for our engineers to work with these systems.

We believe small, autonomous, distributed teams are the most effective way to move fast and build dynamic systems. Our architecture and workflows are focused on scalable, container-based microservices, short cycle times, and a highly agile culture.

Rollbar tech stack:

    • Applications:
        • Next.js Jamstack frontend running on Netlify
        • Apollo GraphQL server running on GCP
        • Python Pyramid web APIs running on GCP
    • Infrastructure: MySQL, Kafka, Elasticsearch, Redis, Memcache, Beanstalkd, Zookeeper, HashiCorp Consul, HashiCorp Vault
    • CI/CD: Docker, Docker Compose, Kubernetes (GKE), Terraform, Ansible, GitHub, GitHub Actions, CircleCI
    • Observability: Datadog, Rollbar, Grafana, Prometheus, Graphite, StatsD
    • Cloud: Google Cloud Platform

You will:

    • Work with the team as we migrate services out of our monolithic application.
    • Measure, monitor, and improve system performance, availability, and reliability for services owned by the team and, in doing so, help the team build up expertise in these areas.
    • Join our on-call rotation and eventually lead incident response.
    • Help improve the tools that Rollbar’s Web team uses to build and run its services, with a focus on CI/CD tools, automation, and scalability.
    • Help us figure out a long-term plan for our Jamstack frontend.
    • Help us improve our staging and other test environments to ensure higher quality testing of features prior to release.

You have:

    • 5+ years of experience in technical operations, DevOps, or similar roles, with a solid grounding in Computer Science fundamentals
    • Comfort coding in Bash, Python, Node, or Golang, and excitement to learn new languages and frameworks
    • Experience working with microservice architectures and operating a SaaS product in a public cloud
    • Experience with large-scale distributed systems
    • Experience with code containerization, specifically Docker
    • Passion and excitement for tracking and improving performance
    • Experience operating services running on cloud providers (GCP experience strongly preferred)

Bonus points:

    • Experience working with Terraform and Kubernetes

Benefits and perks:

    • Rapid career growth opportunities
    • Competitive salary and stock options
    • Medical, dental and vision insurance
    • Parental leave: 12 weeks
    • Generous hardware, software, and home office setup allowance
    • Fully remote work environment
    • Inclusive team-oriented culture
    • The chance to have fun while making a meaningful impact