Senior Site Reliability Engineer

United States - Denver, CO
Product Development and Operations / Full-time / Remote

Guidewire is searching for a Sr. Site Reliability Engineer who is hungry for a rare chance to transform insurance with the industry’s leading Analytics platform. As a member of the SRE-Analytics Team, you’ll be responsible for building and evolving our SRE practice for Analytics. The Analytics team at Guidewire uses internet-scale data collection, adaptive machine learning, generative artificial intelligence (Gen AI), and insurance risk modeling capabilities to help insurers and other financial institutions model evolving risks, develop new products, and make better business decisions. This role is a great opportunity for individuals motivated by learning cutting-edge technologies and applying them to solve real-world business problems. Guidewire is the AWS for the insurance companies that use our platforms and applications. The solutions developed by you and this team will be used by hundreds of insurance companies and impact billions of dollars in annual transactions.

Downtime and failures are inevitable; what matters is how SREs deal with them. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments. Part of an SRE's responsibility is to collaborate with developers to troubleshoot and solve problems and to reduce customer impact where possible. After an incident, SREs go one step further: they document and examine what went wrong and develop measures, such as automated runbooks, to handle the issue going forward.

When on-call, you will be responsible for:

    • Responding to any critical incidents and ticket escalations.
    • Following and documenting our post-incident response/postmortem processes.
    • Executing planned patching or improving related automation.
    • Engineering to reduce toil, tune alerts, and improve documentation.

When NOT on-call, you will be responsible for:

    • Engineering to re-platform or migrate layers of our infrastructure to Kubernetes ecosystems.
    • Analyzing our AWS infrastructure and related applications/services for design and architectural opportunities to improve overall reliability and cost intelligence.
    • Creating observability patterns so that all alerts have consistent content and configuration, keeping triage short and continuously improving overall MTTR.
    • Analyzing incident data to determine the next opportunity to improve reliability.
    • Influencing engineers to improve application reliability and scalability so that applications run efficiently.
    • Documenting every action, if not captured as code, so your findings turn into repeatable actions and then into automation.
    • Improving operational processes (such as deployments and upgrades) to make them as boring as possible.

Required Skills:

    • Proven experience triaging and debugging distributed systems on cloud infrastructure.
    • Proven experience in designing and engineering CI/CD pipelines within Kubernetes (K8s) and legacy ecosystems.
    • Experience in building, deploying, and running scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud native approaches.
    • Experience in designing and engineering monitors, dashboards, and synthetic testing.
    • Experience in managing infrastructure config at scale using multiple approaches and/or tools such as GitOps, Puppet, or Ansible.
    • Good understanding of AWS cloud networking and security with hands-on experience remediating infrastructure vulnerabilities at scale.
    • Comfortable with Linux system administration, with the ability to program/script using Python, Go, Java, shell, or equivalent.
    • Good verbal and written communication skills.

Preferred Skills:

    • SRE Certified in multiple categories.
    • AWS Certified in multiple categories.
    • Experience with Datadog Cloud Monitoring.
    • Proficiency with SQL, database administration, data pipelines, performance tuning, and schema design.
    • Proficiency with multiple pipelining tools such as TeamCity, Bitbucket Pipelines, Jenkins, and GitHub Actions.
    • Familiarity with distributed data processing frameworks and services such as Hadoop, Apache Spark, and Amazon Redshift.

About Guidewire

Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers in 40 countries, from new ventures to the largest and most complex in the world, run on Guidewire.

As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1600+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of applications that accelerate integration, localization, and innovation.

For more information, please visit www.guidewire.com and follow us on Twitter: @Guidewire_PandC.

Guidewire Software, Inc. is proud to be an equal opportunity and affirmative action employer. We are committed to an inclusive workplace, and believe that a diversity of perspectives, abilities, and cultures is a key to our success. Qualified applicants will receive consideration without regard to race, color, ancestry, religion, sex, national origin, citizenship, marital status, age, sexual orientation, gender identity, gender expression, veteran status, or disability. All offers are contingent upon passing a criminal history check and other background checks where applicable to the position.

Disability Accommodations and Guidewire’s Appeals Process. Guidewire provides accommodations to the hiring process to create a fair opportunity for candidates with disabilities to contend for open positions. Accommodation requests should be directed to (650) 356-4940 or Accommodations@guidewire.com. If things do not go as hoped, we invite you to use our appeals process. Guidewire promises to independently review any denied accommodation and any decision not to offer you the position. The appeals process is the same in either case. Within five business days of receiving a notice of denial of an accommodation, or receiving a notice of your non-selection for a vacancy, call (650) 356-4940 or e-mail Accommodations@guidewire.com to make an appeal. Guidewire will assign a new decision-maker to review the request and/or hiring decision, who will then notify you in writing of a decision within 10 business days.