Staff Site Reliability Engineer

San Francisco, CA / Engineering – Engineering / Full-time
Are you ready to join the API revolution?

About the role:

You will lead a team that builds and maintains workflows to automate the release, testing, and deployment of Kong Cloud products. You will improve the staging and production environments for our SaaS distributions, and you will research and build monitoring and analysis tools to optimize how we build and deploy the codebase and manage distributed systems and application resources. You will also build workflows for delivering our software to a variety of platforms, including AWS, GCP, and Azure, and use container technologies such as Docker, Swarm, Mesosphere, and Kubernetes to automate deployment and scale our products.

What you bring:

    • BS degree in Computer Science or a similar technical field of study, or equivalent practical experience
    • Minimum of 8 years of relevant work experience
    • Minimum of 3 years of leading/mentoring a team
    • Experience with continuous/rapid release engineering (CI/CD)
    • Experience with "Infrastructure as Code" and configuration management systems such as Terraform, Chef, Puppet or Ansible
    • Experience building and administering alerting and monitoring systems for API services
    • Strong knowledge of PostgreSQL/MySQL and Linux/Unix systems
    • Knowledge of one or more mainstream programming languages (Go, C/C++, Python, PHP)
    • Strong skills in network services such as DNS, TLS/SSL, HTTP
    • Experience working in a 24/7/365 service environment

Bonus Points:

    • Experience implementing secure and highly available distributed systems/microservices
    • Experience with performance analysis and debugging using tools like perf, sar, strace, dtrace
    • Experience with time series databases and monitoring tools (OpenTSDB, Graphite, Prometheus, Grafana)
    • Experience designing, implementing, managing and orchestrating container clusters