Software Engineer, System Reliability

Tokyo / Product & Technology – Enterprise Technology / Employee / Hybrid
Woven by Toyota is the mobility technology subsidiary of Toyota Motor Corporation. Our mission is to deliver safe, intelligent, human-centered mobility for all. Through our Arene mobility software platform, safety-first automated driving technology and Toyota Woven City — our test course for advanced mobility — we’re bringing greater freedom, safety and happiness to people and society. 

Our unique global culture weaves modern Silicon Valley innovation and time-tested Japanese quality craftsmanship. We leverage these complementary strengths to amplify the capabilities of drivers, foster happiness, and elevate well-being.

TEAM
Software development in the automotive industry comes with unique challenges. Deploying artifacts to cloud, mobile devices, and vehicles, meeting vehicle security certifications, and navigating different types of testing and simulation can be overwhelming for automotive software developers. That's why the Enterprise Technology Engineering Team (EnTec) builds solutions that enhance productivity, so developers can focus on their true passion, software development, without being bogged down by setup and customization tasks.

At EnTec, our mission is to make software development more productive and efficient for Woven by Toyota and the greater Toyota organization. We use the latest technologies to help engineering teams go faster, with safety as our top priority. Our modern, agile, and transparent services are designed to bring to life Woven by Toyota's vision of "Mobility to Love, Safety to Live."

WHO WE ARE LOOKING FOR
The SRE team collaborates with the product development team and shares the same codebase, but focuses primarily on non-functional requirements. Our objective is to enhance production readiness and reliability. We are looking for an SRE lead engineer with experience managing a team of employees and contractors. A successful candidate will have a background in software engineering, DevOps, and cloud engineering. We are interested in individuals who are passionate about establishing SRE best practices, such as SLA measurement, error budgets, and automation/toil reduction, and about promoting a culture of excellence within engineering platform products. You'll report to the SRE team lead under Platform Engineering. This role is hybrid and also requires standby support on weekends, holidays, and off-hours on a rotation basis.

RESPONSIBILITIES:

    • Lead a team of SRE engineers providing 24/7 coverage in a follow-the-sun model
    • Design, develop, and deliver software systems for improved product monitoring, reliability, and development efficiency
    • Establish SLAs for platform products, implement SLI monitoring and alerting, and evangelize and implement the advanced SRE concept of error budgets
    • Participate in on-call rotations to monitor and respond to incidents, ensuring service health
    • Provide guidance on reliability practices throughout the software development lifecycle, including architecture and code reviews
    • Establish SRE best practices within product teams, including capacity planning, chaos testing, and disaster recovery drills
    • Learn from incidents through blameless post-mortems and address service reliability issues through hands-on coding
    • Enhance development and operations teams' efficiency through task automation and reducing toil

MINIMUM QUALIFICATIONS:

    • Bachelor’s degree in Computer Science, Technology, Engineering, Mathematics, or equivalent practical experience
    • 4+ years of experience in Go, Python, or a similar language
    • Proficient in data structures, algorithms, and software design
    • Experience leading teams
    • Intermediate to advanced level of expertise in public cloud technologies, Kubernetes, and Infrastructure as Code
    • Experience with APM solutions and monitoring systems such as Prometheus, Wavefront, Dynatrace, or New Relic
    • Experience automating monitoring best practices through MaC (monitoring as code)
    • Proficient in implementing SLOs, SLI measurement, error budget management, and reporting
    • Hands-on experience in automation and toil reduction solutions to improve productivity
    • Experience enhancing stability, reliability, performance, and high availability of systems through strategic planning and architecture designs
    • Business-level English skills

NICE TO HAVES:

    • Japanese language skills for interacting with customers
    • Hands-on experience in SRE best practices, including disaster recovery planning, chaos testing, capacity planning, and more
    • Proficient in production on-call, troubleshooting, and incident management
    • Previous experience in an SRE, DevOps, or Platform Engineering role
    • Professional- or Associate-level certification from one of the major cloud providers

If you are located outside of Japan, we will set up an interview over Google Hangout Meet.

WHAT WE OFFER
・Competitive Salary - Based on experience
・Work Hours - Flexible working time with no core hours
・Paid Holiday - 20 days per year (prorated)
・Sick Leave - 6 days per year (prorated)
・Holidays - Saturdays, Sundays, Japanese National Holidays, and other days defined by our company
・Japanese Social Security - all applicable (Health Insurance, Pension, Workers’ Compensation, Unemployment Insurance, and Long-term Care Insurance)
・In-house Training Program (software study/language study)

By submitting your application you agree to the following terms: https://woven.toyota/en/applicant-privacy-notice

Our Commitment
・We are an equal opportunity employer and value diversity.
・We pledge that any information we receive from you will be used ONLY for the purpose of hiring assessment.