Site Reliability Engineer, Tools Engineering
Tokyo
Product & Technology – Arene / Employee / Hybrid
About Woven by Toyota
Woven by Toyota, a part of the Toyota Group, is challenging the current state of mobility through human-centric innovation and empowering mobility transformation. Through our AD/ADAS technology, our automotive software development platform Arene OS, our mobility test course Toyota Woven City, and Toyota’s growth fund, Woven Capital, we are pioneering the movement of people, goods, information, and energy, weaving a future of enhanced safety, connectivity and well-being for all.
=========================================================================
TEAM
Arene's objective is to make vehicle programming accessible to everyone by simplifying vehicle software development and increasing deployment frequency without compromising safety and security. This will create a new market of vehicle application developers who will use software to integrate vehicles into our daily lives in innovative ways. Arene aims to significantly improve how vehicles are designed and developed, and we are collaborating closely with Toyota to achieve this goal in its next-generation vehicles.
The Arene Site Reliability Engineering (SRE) Team defines, develops, and evangelizes sound development and operational processes and tooling to ensure system stability, resilience, observability, and compliance with organizational standards. In doing so, we work directly with teams and empower them to deploy and manage their services at scale. We practice a culture of continuous improvement and blameless post-mortems.
As a member of the Arene SRE team, you will create tools and processes to simplify operations and bring visibility into anomalies and service degradations. You will work across teams in various domains and embed with teams when necessary. We support services deployed across AWS, GCP, and on-premises data centres. Tools we use include Docker, Kubernetes, Helm, Terraform, GitHub, and GCP Logging. You will report to the SRE Manager. This role is based in Japan and requires working in the office at least three days per week.
WHO ARE WE LOOKING FOR?
You are an engineer who is passionate about operational excellence and automation. You have broad knowledge across SRE domains, solid coding skills, strong drive, and experience communicating updates and resolutions to customers and other stakeholders.
RESPONSIBILITIES
- Monitoring the operational health of services across Arene and triaging incidents
- Mentoring teams in defining and creating Service Level Objectives (SLOs), Service Level Indicators (SLIs), and on-call and incident response procedures
- Designing, implementing, and maintaining tools related to observability, incident management, and deployments
- Working with our automotive and cloud software teams to improve the operational experience and system reliability
- Promoting operational best practices among teams across the organization
MINIMUM QUALIFICATIONS
- 5+ years of experience in C++, Python, Java, Go, or similar languages
- Experience with cloud-based software solutions, containerization (Docker, Kubernetes), and infrastructure as code (Terraform, CloudFormation, Azure RM, etc.)
- Strong understanding of Unix fundamentals and scripting
- Professional experience with build tools, CI/CD pipelines, and automation (such as GitHub Actions, GitLab CI, and Jenkins)
- Knowledge of observability tooling (such as Datadog, Prometheus, GCP Logging, and PagerDuty) and best practices for large, distributed systems
NICE TO HAVES
- At least 3 years of experience in an SRE role
- Experience troubleshooting and debugging complex distributed systems
- Experience writing custom SRE tooling
- Experience architecting, deploying, operating, and monitoring solutions on AWS, GCP, or Azure
- Ability to build custom Terraform modules and providers
- Experience communicating and explaining SRE concepts and practices to audiences with a broad range of technical expertise, including mentoring fellow engineers and promoting best practices to entire teams
- Knowledge of or experience with safety and security standards
=========================================================================
Important Points
・All interviews will be arranged via Google Meet, unless otherwise stated.
・The same job description is available in both English and Japanese; therefore, we kindly ask that you apply to only one version.
・We kindly request that you submit your resume in English, if possible. However, Japanese resumes are also acceptable. Please note that, depending on the English proficiency requirements of the role, we may request an English version of your resume later in the process.
WHAT WE OFFER
・Competitive Salary - Based on experience
・Work Hours - Flexible working time
・Paid Holiday - 20 days per year (prorated)
・Sick Leave - 6 days per year (prorated)
・Holidays - Saturdays and Sundays, Japanese National Holidays, and other days defined by our company
・Japanese Social Insurance - Health Insurance, Pension, Workers’ Compensation, Unemployment Insurance, and Long-term Care Insurance
・Housing Allowance
・Retirement Benefits
・Rental Car Support
・In-house Training Program (software study/language study)
Our Commitment
・We are an equal opportunity employer and value diversity.
・Any information we receive from you will be used only in the hiring and onboarding process. Please see our privacy notice for more details.