Site Reliability Engineer
Development – Operations Engineering
TradeRev is a revolutionary vehicle appraisal and auctioning application. We are the market leader that has changed the way cars are sold in North America and the UK.
Why Work with Us?
Whether your Fun is in working with new technology, learning skills, collaborating with awesomely talented people or growing into a Specialist, Team Lead or People Manager, we want to feed this fire in you. If you have the work ethic, skill and motivation, we will help you find a path that leads you where your passion wants to go. So come on; let's have Fun together.
Take a look at our other benefits here: http://work.traderev.com/
Our Core Values: We are FHAB. Fun. Honest. Accountable. Brave.
As a Site Reliability Engineer, you will utilize your software and systems engineering background to build and run large-scale, distributed, fault-tolerant systems. Your role is to ensure that TradeRev’s systems, both internal and external-facing, are reliable and achieve maximum uptime.
Our current team focuses on optimizing existing systems, building infrastructure and eliminating work through automation. You are responsible for the big picture of how our systems relate to each other, and we use a breadth of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on manual operational work, running postmortems and proactively identifying potential outages all factor into the iterative improvement that is key to both product quality and technical standards.
Responsibilities
- Build scalable systems, using best practices around automation, pushing changes that improve reliability and velocity
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, planning and reviews
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health
- Provide mentorship and training to other team members on technologies and processes; drive education and knowledge transfer of design patterns, technical practices, and relevant technologies and tools
- Drive high standards around incident response practices and policies
Qualifications
- 4+ years of experience in an operations, DevOps, SRE, or software engineering role
- In-depth experience with cloud computing and solid experience setting up and managing cloud infrastructure
- You can write code in any language, and you’ve shipped your work to production
- Extensive experience with configuration management and infrastructure automation tools, e.g. Ansible, Terraform, SaltStack, Puppet, or Chef
- Experience with large-scale distributed systems in the cloud and concerns like load balancing and disaster recovery
- Experience with the operational aspects of software systems such as monitoring, centralized logging, and alerting
- Bachelor's degree in Computer Science or Computer Engineering
We thank all applicants for their interest. Only candidates selected for an interview will be contacted.