Data Engineer

London
Business Analyst
Full-time
PredictX is on a mission to give every business access to better insights and to facilitate data-driven decisions. Companies are increasingly inundated with data yet struggle to harness its benefits. We help our clients seamlessly integrate their data sources into one platform. Our technology reduces data complexity, increases data quality and gives business users across the organisation access to advanced data capabilities, so they can make faster and more reliable business decisions. We are headquartered in London, with offices in Spain and the USA.

As a Data Engineer, you will work on our data pipeline architecture, with the aim of providing clean, usable data to our Business Analysts and Data Scientists. We are looking for forward-thinking people who are keen to shape the role of the Data Engineer within PredictX. You will be instrumental in developing the scope of value that the team delivers to our business and customers. You will help define and build modular pipeline components. You will also ensure that technical documentation is created and maintained, and train our Client Analytics and Implementation teams so that they can implement and support our clients using our products.

Key Responsibilities

    • Helping to design, build, maintain and operate the data pipeline
    • Defining and building modular data pipeline components
    • Ensuring that solid development practices, such as proper use of source control, full testing processes and automated deployment mechanisms, are followed
    • Collaborating with our data scientists and business analysts to discover where business value can be found within the data we have available
    • Maintaining existing systems and supporting migration to our new data pipeline architecture
    • Training our internal teams in how to implement and support our clients
    • Proposing new datasets and helping to integrate them
    • Acting as a subject matter expert on all aspects of the data pipeline
    • Identifying potential performance issues, bottlenecks and pain points, and recommending new and creative ways of resolving them

Who you are

    • We want you to come with creativity, expertise, flexibility and drive, but above all a desire to learn and keep learning
    • We want you to want to understand the big picture and how your work makes a difference

Experience

    • 3+ years of proven experience using Python to build data pipelines, including familiarity with Python's core big data / data science libraries, e.g. pandas, PySpark, scikit-learn, etc.
    • Solid understanding of database design and SQL
    • Experience working in cross-functional agile teams, particularly teams including Data Scientists, Software Engineers and Business Analysts
    • The ability to communicate complicated technical solutions to non-technical users
    • Experience taking ownership of feature development and ongoing maintenance
    • Technical understanding of infrastructure components, their dependencies, and interactions between servers, virtual systems, networks, databases, web applications, etc.

Skills

    • Distributed data processing, for example Spark
    • NoSQL Databases, such as MongoDB or Couchbase
    • Cloud computing platforms, such as Google Cloud Platform or AWS
    • Pipeline orchestration, for example Airflow

Benefits

    • Competitive salary
    • Paid training days
    • Fruit, snacks, tea and coffee provided throughout the week
    • Monthly Pizza Friday
    • Quarterly company event
    • Subsidised gym membership
    • Subsidised health scans
    • Football and volleyball league teams