Site Reliability Engineer

San Francisco, CA
Engineering
Full-time
Are you ready to join the API revolution?

About the role:

As a Site Reliability Engineer, you will be responsible for building and maintaining workflows that automate releasing, testing and deploying Kong Cloud products. You will improve staging and production environments for SaaS distributions, and research and build monitoring and analysis tools to optimize how the codebase is built and deployed and to manage distributed systems and application resources. You will also build workflows for delivering our software to a variety of platforms, including AWS, GCP and Azure, and to container technologies like Docker, Swarm, Mesosphere and Kubernetes, in order to automate deployment and scale our products.

What you bring:

    • Minimum of 3 years of relevant work experience
    • Experience with continuous/rapid release engineering (CI/CD)
    • Experience with "Infrastructure as Code" and configuration management tools such as Terraform, Chef, Puppet or Ansible
    • Experience building and administering alerting and monitoring systems for API services
    • Strong knowledge of PostgreSQL/MySQL and Linux/Unix systems
    • Knowledge of one or more mainstream programming languages (Go, C/C++, Python, PHP)
    • Strong skills in network services such as DNS, TLS/SSL, HTTP
    • Experience working in a 24/7/365 service environment
    • BS degree in Computer Science or a similar technical field of study, or equivalent practical experience

Bonus Points:

    • Experience implementing secure and highly available distributed systems/microservices
    • Performance analysis and debugging with tools like perf, sar, strace, dtrace
    • Time series databases and monitoring tools (OpenTSDB, Graphite, Prometheus, Grafana)
    • Experience designing, implementing, managing and orchestrating container clusters