Zopa Site Reliability Engineer (Telemetry and Application Performance Monitoring)

London
Technology – Tech Ops
Full Time
At Zopa, we’re shaping the future of finance.

We offer simple loans and smart investments that help people take control of their finances and do more with their money. In the 12 years we’ve been in business, we’ve helped more than 60,000 people lend over £3 billion to 246,000 UK consumers.

And our journey’s only just beginning. In November 2016 we announced our plans to build a next generation bank so that we can bring a greater range of smart, ethical finance products to even more people.

The Role:
As a Zopa Site Reliability Engineer, you will help to remove inefficiencies in the way Zopa uses (or doesn’t use) technology to drive value for the business. You will automate consistently and help to design and support the technology infrastructure that lets Zopa grow, both in our self-hosted datacentres and in the cloud.
 
You will have a passion for real-time event management in a high-volume, transaction-based financial environment. You care about end-to-end monitoring, service-based alerting and deviation management, and you can bridge the gap between infrastructure and application monitoring to build ‘service views’: one pane of glass for business performance is the goal! Through APIs and open standards, you will ensure that our monitoring tools integrate with the others in use, providing insight and intelligence on Zopa’s customer-facing products.
 
You will work with the Software Engineering and Reliability Engineering teams and with Business Operations staff to deliver dashboards, alerting and recovery automation, minimising operational downtime and MTTR (mean time to recovery) while maximising the quality of the customer experience.
 
Day to day, you might be automating the deployment of telemetry and application monitoring infrastructure across our multi-cloud environment with Terraform, improving our application release processes and pipelines, upgrading our container platforms to the latest version of Kubernetes and making use of its newest features, or designing processes and tools that help our developers take more control of their products and the infrastructure they need.

Job Requirements:

    • Provide advice and best practice guidance to users of the APM framework and monitoring solutions.
    • Participate in the provisioning phase, ensuring that implementations of new systems and services take monitoring into account
    • Plan, develop, and test future releases of the APM framework (AppDynamics) and associated components
    • Understand the conceptual APM framework across all its dimensions (user experience, runtime architecture, business transactions, analytics and reporting)
    • Knowledge of at least one programming language and the willingness to dabble in others (Go, Python, Java)
    • Cloud-agnostic mindset, with exposure to cloud IaaS (AWS, GCP or another relevant provider)
    • Discover and automate legacy telemetry probes
    • Linux administration (CoreOS, Ubuntu)
    • Experience with immutable infrastructure
    • Linux containers and orchestration (Docker, Kubernetes, Nomad)
    • Good knowledge of the HashiCorp stack
    • Knowledge of event streaming frameworks and technology

We are committed to equality of opportunity for all staff, and we encourage applications from individuals regardless of age, disability, sex, gender, sexual orientation, pregnancy and maternity, race, religion or belief, and marriage and civil partnership status.