Senior Data Engineer

2017 was a great year for DueDil. We launched our new API and Pan-European data coverage, and continued to grow our customer base, building exciting new relationships and cementing long-term partnerships with some of the UK's best-known brands.
Throughout 2018, we will continue to strengthen the core capabilities integral to our mission: to establish DueDil as the world's largest source of private company information.

The Role:
Critical to achieving DueDil's vision is our ability to combine multiple disparate data sources from different providers into a unified view of companies and the people who run them. This requires us to develop web crawlers, automated matching algorithms, machine learning models and complex ETL processes to tie all these components together. As a Senior Data Engineer, you'll be expected to enhance and expand our data processing toolset to support our international expansion effort, while maintaining quality and reliability of our existing data products and services.
This will mean dealing with challenges such as order-of-magnitude increases in data volumes, assessing the quality of data from multiple suppliers, and building pipelines to match these datasets and extract valuable insight from them. You will be working in a team of experienced Data Engineers and Data Scientists, building next-generation tools and transforming the Fintech industry.

We are looking for:

    • A proven track record of leading complex ETL and data infrastructure projects
    • Demonstrable experience working with high-volume heterogeneous data using distributed systems such as Hadoop or Spark
    • Expert knowledge of one or more of the following languages: Python, Scala, Java
    • Strong understanding of data structures and algorithms
    • Deep knowledge of data modeling, data access, and data storage techniques
    • Familiarity with Unix systems, common command-line tools (e.g. grep, awk), and source control tools (e.g. git)
    • Familiarity with machine learning and statistics is a plus

You should apply if you:

    • Want to develop data-focussed products with visible and immediate impact
    • Are passionate about simple, resilient and maintainable code
    • Are able to identify and evaluate technical solutions to nebulous challenges

Our Tech Stack

    • Python
    • Scala
    • Spark
    • Hadoop
    • Elasticsearch
    • PostgreSQL
    • TinkerPop
    • Redis
    • Kafka

How to apply:

    • Your CV
    • A link to your GitHub profile (if applicable)