Platform Operations Engineer
Technology – Infrastructure
Trainline is an innovative tech business with a mission to make travel as simple, seamless and affordable as possible. We’re proud to be Europe’s leading independent train and coach platform and rank among the highest-rated travel and ticketing apps globally. Today, we offer our customers travel to thousands of destinations in and across 45 countries in Europe and beyond. That’s more than £2.7 billion in ticket sales annually, and over 80 million visits to our apps and websites each month.
Our culture is central to our success. We’re driven to sustain our phenomenal growth from recent years, and this means we’re always working closely and collaboratively to turn our ideas into reality. It’s this sense of pace, innovating and improving pretty much everything we do, that makes Trainline so exciting and unique - we truly believe our work has a genuine impact and will change travel for the better.
The Platform Operations team are responsible for the overall availability, performance and reliability of the entire Trainline platform. At peak times, over 185 people per minute are booking trains! We are a growing company that loves new technology. We run a diverse platform that is 100% hosted on AWS, utilising the best of what it has to offer. Coupled with our own tooling, this allows us to embrace Continuous Delivery, DevOps and cloud environments to their full potential. You will often find members of our leadership team, as well as our development community, speaking at meetups and conferences.
What you'll be working on...
- You will be heavily involved in major incidents relating to Production, External Test and Staging environments, from the initial event, through the rapid response and service restoration, to identifying follow-up preventative measures.
- You will be part of the team that owns all monitoring tools, ensuring that they keep pace with our rate of change and using them for everything from BAU alerting to reporting on and improving SLAs/OLAs by holding teams accountable for their service quality.
- You will take ownership of, and provide priority support to, Retailing and Fulfilment systems. This ranges from handling critical incidents to proactively working on preventative measures by learning about and questioning the status of the platform, and ensuring that issues are not forgotten after they are resolved
- You will work with DevOps Engineers in product-aligned teams to ensure applications are understood and that Continuous Delivery activities are carried out in a safe and timely manner – there is trust, but we still need eyes on the prize…
- You will use your own experience and learning to bring a fresh approach to troubleshooting and processes. We want you to think outside the box, coming up with innovative and unique solutions and raising the bar each time
- You will participate in an on-call schedule to ensure that our systems are supported at all times. You will have the freedom to suggest and push for engineering solutions to failures, taking pride in every call-out that is solved by automation rather than human intervention
- You will take a professional approach to your interactions, one that builds confidence in your abilities and those of your team
What you'll bring...
- Proven experience of being part of a team that managed operational environments and was accountable for their availability, reliability and performance
- Experience of being part of a support team in a high-pressure, fast-moving environment alongside Incident, Change and Service Desk management
- Ability to see and act upon potential issues, whether they are technical changes, processes or procedures
- A solid background in technology operations, with demonstrable ability in a range of technologies
- Very high energy and enthusiasm, with a passion for delivering awesome service
- Excellent interpersonal, relationship building and influencing skills
- Highly customer-focused, with an analytical approach to decision making and problem resolution and experience of juggling multiple tasks and priorities
- Enterprise Technology: Experience with highly available, high-transaction websites and applications within microservices architecture, clustered systems, N+1 architecture, automated deployments, disaster recovery and business continuity
- Operating Systems: Linux, Microsoft Windows Server (Including Active Directory, DNS, DHCP, IIS)
- AWS: EC2, S3, Lambda, VPC, CloudWatch, Terraform
- Automation and Scripting: TeamCity, Puppet, Consul, PowerShell, Selenium, GitHub
- Monitoring: HP BSM, SCOM, New Relic Insights/APM, InfluxDB, Elasticsearch, Kibana, Sensu, Grafana
- Production experience with frontend web services including Apache, IIS and NGINX
- Experience with e-commerce and website operations (WebOps)
- Experience with monitoring & web analytics tools
- Understanding of networking, TCP/IP, firewalls, NAT instances, NGINX load balancing and traffic management
- Firm grasp of security and its importance within a cloud environment (PCI-DSS/SecOps)
- Understanding of database technologies such as Oracle, MS SQL, DynamoDB
- Understanding of DevOps and Agile methodologies
We value open expression at Trainline. We believe it’s the diversity of experience, backgrounds and perspectives of our employees that makes us who we are. We encourage everybody to play a part in changing the way people travel across the world.