Site Reliability Engineer - Kafka - Guidewire Data Platform

United States - Austin, TX / Product Development
Are you passionate about solving interesting technical challenges by defining, designing, deploying, and troubleshooting cloud services, platforms, and infrastructure, always thinking about reliability, scalability, resilience, security, and performance?
The Guidewire Data Platform SRE team is responsible for maximizing the reliability, performance, and scalability of Data Platform services. Its mission is shared end-to-end ownership of the services with the Development team.
We are seeking Kafka experts to run our Big Data Platform and SaaS applications reliably.
You will work in a highly collaborative environment operating next-generation platforms and services using the latest technologies. This is an opportunity to define, build, evangelize, and optimize our SRE practices!

Note: Our hiring team is open to remote / telecommuter candidates based in time zones such as PST, MST, or CST. The incumbent will work closely with our engineering team based out of our San Mateo, CA headquarters.

Required Skills

    • Strong operational background running Kafka clusters at scale
    • Strong understanding of Kafka broker, Connect, and topic tuning and architecture
    • Strong understanding of Linux fundamentals as related to Kafka performance
    • Extensive experience with continuous deployment of cloud services
    • Prior experience running data platforms using a Big Data stack (Kafka, Hadoop, Spark, Hive) on a public cloud (AWS/Azure)
    • Good programming skills in Java, Python, or any scripting language to build tooling for self-service automation
    • Prior experience with CI/CD (TeamCity, Jenkins) for gate promotion
    • Prior experience with IaC tools like Terraform, AWS CloudFormation
    • Good problem solving and analytical skills to troubleshoot issues in a complex multi-tier environment
    • Experience in incident management (PagerDuty) and post-incident management
    • Expertise in monitoring tools (Datadog, ELK)
    • Knowledge of capacity and scalability of cloud resources
    • Knowledge of blue/green deployment and chaos engineering
    • Comfortable working with Kubernetes (AWS EKS), Docker containers
    • Eager to learn new things and passionate about technology!

What you would do

    • Maintain a 24x7 production and non-production Kafka environment with a high level of service availability.
    • Drive incidents to resolution by coordinating with multiple engineering teams
    • Partner with other development teams in defining and implementing improvements in service architecture.
    • Implement automation and orchestration for manual processes required to operate and deploy cloud services
    • Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks
    • Form and maintain relationships with internal and external partners
    • Develop deeper insights and analysis into the quality of experience for our customers
    • Build a positive work environment based on accountability, in collaboration with the engineering and operations management teams across Guidewire.

What you would need to succeed

    • BS/MS in Computer Science, Computer Engineering, Mathematics, or equivalent work experience
    • 8+ years of relevant work experience
    • You have experience solving infrastructure and application problems with software
    • You have a big-picture perspective on systems and tools
    • You can collaborate with other engineering teams to understand their systems and help improve them
    • You have strong technical knowledge of cloud infrastructure, distributed systems, networking, storage, and operating systems
    • Experience with Kafka and/or other messaging platforms
    • Experience with AWS and its native services, e.g. MSK
    • Familiarity with Agile development methodologies
Our employee culture…


About Guidewire
Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently.

Guidewire combines core, data, digital, analytics, and AI to deliver our platform as a cloud service. More than 400 insurers, including the largest and most complex in the world, run on Guidewire.

As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1000+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of add-ons that accelerate integration, localization, and innovation.

Guidewire Software, Inc. is proud to be an equal opportunity and affirmative action employer. We are committed to an inclusive workplace, and believe that a diversity of perspectives, abilities, and cultures is a key to our success. Qualified applicants will receive consideration without regard to race, color, ancestry, religion, sex, national origin, citizenship, marital status, age, sexual orientation, gender identity, gender expression, veteran status, or disability. All offers are contingent upon passing a criminal history and other background checks where it's applicable to the position.

Disability Accommodations and Guidewire’s Appeals Process. Guidewire provides accommodations to the hiring process to create a fair opportunity for candidates with disabilities to contend for open positions. Accommodation requests should be directed to (650) 356-4940. If things do not go as hoped, we invite you to use our appeals process. Guidewire promises to independently review any denied accommodation and any decision not to offer you the position. The appeals process is the same in either case. Within five business days of receiving a notice of denial of an accommodation, or receiving a notice of your non-selection for a vacancy, call (650) 356-4940 or e-mail to make an appeal. Guidewire will assign a new decision-maker to review the request and/or hiring decision, who will then notify you in writing of a decision within 10 business days.