Monitoring Solutions Engineer

London
Technology – Tech Ops
Full Time

At Zopa, we’re shaping the future of finance.

We offer simple loans and smart investments that help people take control of their finances and do more with their money. In the 12 years we’ve been in business, we’ve helped more than 60,000 people lend over £2 billion to 246,000 UK consumers.

And our journey’s only just beginning. In November 2016 we announced our plans to build a next-generation bank so that we can bring a greater range of smart, ethical finance products to even more people.

Role Overview

As a Monitoring Solutions Engineer at Zopa, you will be responsible for the definition, configuration and support of the monitoring tools and infrastructure, providing insight and event management to the Zopa estate. You will have a passion for real-time event management in a high-volume, transaction-based finance environment. You are passionate about end-to-end monitoring, service-based alerting and deviation management, and you can bridge the gap between infrastructure and application monitoring by bringing both into ‘service views’. The goal is a single pane of glass for business performance. Through APIs and open standards, you will ensure that the monitoring tools integrate with others in use, to provide insight and intelligence on Zopa’s customer-facing products.
 
You will work in cooperation with Software Engineering, Infrastructure teams and Business Operations staff to deliver dashboards, alerting and recovery automation that minimise operational downtime and maximise customer experience.

Key responsibilities

    • Definition, implementation and support of the Zopa Systems Management Monitoring solution, based on commercial and ad hoc software tools, enabling robust event monitoring across all strategic platforms and services, databases, networks, servers and others, as well as Service Level Agreement and Business Impact monitoring
    • Ensure automatic enrolment into monitoring stack for new server platforms
    • Participate in the provisioning phase, ensuring that implementations of new systems and services take monitoring into account, and provide Operations with the documentation and procedures needed to monitor each service once it is in operation
    • Maintain health of the monitoring environment, keeping current with upgrades and patches and by troubleshooting and resolving issues with the associated tools
    • Write code to execute monitoring tasks that are out of the scope and capabilities of the monitoring systems
    • Provide intuitive and clear dashboards to internal teams, showing service status etc.
    • Automate the recovery of ‘dead’ monitoring probes
    • Participate in and support troubleshooting and analysis processes
    • Understand user requirements and propose and implement system enhancements as required to meet needs and ensure Zopa monitoring maintains industry best practice
    • Provide advice and best practice guidance to users of the APM tool and monitoring solutions to help them meet their day to day tasks and objectives with the tool
    • Manage health of APM tool to ensure that the production environment remains stable and up to date with product releases from the vendor
    • Plan, develop, and test future releases of the APM tool and associated components
    • Liaise with other teams within the Infrastructure Services organization to ensure the APM tool is integrated effectively into other IT services provided by the group
    • Be an Evangelist for the standardised and cross-business use of approved tools and practices
    • 09:00 to 17:30 weekdays, plus additional hours as required in a 24/7 environment

Requirements

    • APM expert - AppDynamics desired
    • Logstash expert - Splunk / ELK desired
    • Automated call-out experience - PagerDuty / xMatters
    • ServiceNow experience required
    • Automation tools: Chef, Jenkins, Puppet, Ansible
    • Front End development and UI experience, Dashboard creation
    • Good knowledge of AI and ML
    • Good understanding of Agile and Software engineering principles
    • Excellent written and verbal communication skills, succinct but detailed
    • Keeps a cool head under pressure
    • Excellent interpersonal and influencing skills; interacts appropriately with technical and business resources; driven but courteous
    • Understanding of Enterprise Architecture, on-prem and cloud IT environments
    • Strong analytical/fault finding/diagnostics/trouble-shooting skills
    • Methodical approach to problem solving and attention to detail
    • Flexible and ‘can-do’ attitude
    • Effective time management skills, with the ability to work on multiple tasks simultaneously, prioritise them, and cope with shifting priorities, fluctuating workloads and deadline pressures
    • Degree level education or equivalent experience
    • ITIL experienced
    • Strong technical competencies resulting from previous working experience at expert level within an IT operational or support environment
    • Experience of monitoring Amazon Web Services
    • Cloud native, containers, Kubernetes knowledge and experience desired

We are committed to equality of opportunity for all staff, and applications from individuals are encouraged regardless of age, disability, sex, gender, sexual orientation, pregnancy and maternity, race, religion or belief, and marriage and civil partnerships.