Site Reliability Engineering Manager
Toronto, Ontario, Canada
Development – Operations Engineering
TradeRev is a revolutionary vehicle appraisal and auction application. We are the market leader that has changed the way cars are sold in North America and the UK.
Why Work with Us?
Whether your Fun is in working with new technology, learning skills, collaborating with awesomely talented people or growing into a Specialist, Team Lead or People Manager, we want to feed this fire in you. If you have the work ethic, skill and motivation, we will help you find a path that leads you where your passion wants to go. So come on; let's have Fun together.
Take a look at our other benefits here: http://work.traderev.com/
Our Core Values: We are FHAB. Fun. Honest. Accountable. Brave.
As the Site Reliability Engineering Manager, you will be one of the brilliant minds behind the scalability and performance of our product. You will be responsible for creating the roadmap for the ongoing growth and maintenance of the production environment. Leading our Site Reliability Engineering team, you will own infrastructure concerns across the enterprise, as well as the stability, reliability, and scalability of the production environment. You will also play a key role in driving ongoing conversations with stakeholders on systems development, automation, and architecture.
Responsibilities:
- Supervise a team of SREs, ensuring that production applications your team supports are stable, reliable, and well-documented
- Manage and participate in incident response and on-call rotations for critical issues affecting the production environment
- Work closely with Development teams to ensure tight alignment between systems architecture, applications, deployment pipelines, security initiatives, and underlying platform architecture
- Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth
- Identify scaling bottlenecks and help TradeRev services scale to meet user demand
- Advocate for and apply best practices when it comes to availability, scalability, operational excellence, and efficiency
- Other duties as required
Skills & Qualifications:
- 5+ years of experience leading engineering teams responsible for large scale distributed systems
- 4+ years of experience in an operations, DevOps, SRE, or software engineering role
- 2+ years of experience with cloud computing and solid experience setting up and managing cloud infrastructure, preferably on AWS
- Bachelor's degree in Computer Science, Computer Engineering, or equivalent work experience
- Working knowledge of CI/CD pipelines
- Experience with configuration management and infrastructure automation tools, e.g., Ansible, Terraform, SaltStack, Puppet, or Chef
- Experience with the operational aspects of software systems such as monitoring, centralized logging, and alerting
- Passionate about operational excellence, availability, and automating away manual tasks
- Passionate about problem-solving, with strong technical communication skills and a desire to collaborate with others
We thank all applicants for their interest. Only candidates selected for an interview will be contacted.
TradeRev is an equal opportunity employer committed to diversity. TradeRev is committed to providing employment in accordance with the Ontario Human Rights Code and the Accessibility for Ontarians with Disabilities Act. Any assessment and selection materials or processes used during the recruitment process will be available in an accessible format to applicants with disabilities, upon request. If contacted for an interview, please advise Human Resources if you require disability-related accommodation.