Site Reliability Engineer

Sydney / Technology – Engineering / Regular - Full Time / Hybrid
The Company  

Cover Genius is a Series E insurtech that protects the global customers of the world’s largest digital companies, including Booking Holdings (owner of Priceline, Kayak and Booking.com), Intuit, Uber, Hopper, Ryanair, Turkish Airlines, Descartes ShipRush, Zip and SeatGeek. We’re also available at Amazon, Flipkart, eBay, Wayfair and SE Asia’s largest company, Shopee. Our partners integrate with XCover, our award-winning insurance distribution platform, to embed protection for millions of customers worldwide each year.
 
Our team and products have been recognized with dozens of awards, including by the Financial Times, which ranked Cover Genius as the #1 fastest-growing company in APAC in 2020. Our diverse team, spread across 20+ countries and many language groups, commits itself to diverse cultural programs, in particular “CG Gives”, which makes social entrepreneurs out of us all and funds development initiatives in global communities.

Our People are
Bold, Authentic, Purposeful and Inspired

Our People are not
Perfect, Traditional, Complacent or Cautious

About the role:

The primary responsibility of Site Reliability Engineers (SREs) is to ensure the reliable operation of production systems. In addition, SREs work across a wide range of technical areas to automate and improve platforms and operations, including:
- Release processes
- Observability
- Security
- Core Network & Infrastructure
- Datastores & Disaster Recovery

They continually monitor the system’s health and control security, sharing ownership of production workloads with software engineering teams. Along with Software Engineers, SREs are responsible for writing and maintaining technical documentation such as tutorials, guides, and blameless post-mortems. SREs also design and create information dashboards based on logging and monitoring data. They are key team members in helping automate, scale and drive efficiency across the technology products & platforms.

Main Duties & Responsibilities:

    • Analyze, test and modify systems to improve reliability and optimize performance, particularly at an architectural/infrastructure level
    • Develop and maintain observability tooling and dashboards
    • Implement automation tools, frameworks and CI/CD pipelines to reduce toil
    • Troubleshoot production issues and coordinate with the development team to streamline code deployments
    • Apply AWS and GCP knowledge and skills to create & maintain cloud infrastructure for software projects
    • Design, develop and implement software integrations
    • Collaborate with Software Engineers and other team members with the goal of improving engineering tools, systems, procedures and data security
    • Develop and maintain design and troubleshooting documentation and runbooks
    • Optimize and control costs of the company’s computing infrastructure

To be successful in this role you will bring:

    • Understanding of SRE Principles and best practices
    • Experience using & configuring modern observability tools such as ELK/EFK, Prometheus, Grafana
    • Comfortable scripting & developing internal tooling with Bash and at least one programming language (e.g. Python, Go)
    • Experience working with infrastructure & configuration as code tools such as Terraform, CloudFormation, Chef or Puppet
    • Experience with container technology such as Docker and, ideally, with using and managing Kubernetes clusters
    • Experience working with Linux
    • Solid understanding of networking and system architecture
    • Solid understanding of how to deploy, scale and monitor web applications and databases
    • Good knowledge of AWS and/or GCP platforms and associated best practices
    • Bachelor's degree in Computer Science/Engineering or equivalent practical experience
    • Strong communication and documentation skills
    • Curious and self-motivated learner
    • Professional approach
    • Good team member
    • Organisational and time management skills
    • Excellent attention to detail
    • Positive approach to change