Data Scientist

San Francisco, CA / Engineering Team / Data Science

Deep Discovery is hiring a Data Scientist with domain experience in Anti-Money Laundering (AML) or Know Your Customer (KYC) to help us build and understand a 1.5-billion-node business graph of legitimate and illegitimate businesses.

This graph spans the globe and combines text and structured data to map relationships between people, companies and organizations, using open sources of information such as news on the web. The business graph drives our network-centric risk scoring models for the customers of global financial institutions. Such a system is called a Know Your Customer (KYC) system for Anti-Money Laundering (AML); banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines from regulatory agencies. As part of our social mission, we give free access to leading investigative journalists and anti-corruption NGOs around the world.

We are taking a network-centric approach to KYC that evaluates clients in terms of the context in which they do business. This involves several machine learning tasks: extracting knowledge graphs from news and other text; entity and identity resolution of the networks we collect about the economy; representation learning on the resulting graphs and their associated documents; and building a scoring engine that uses our business graph to create an accurate risk score. Users do not believe predictions without explanations, and the cost of errors is high, so the final machine learning component is the most critical: the system must be explainable in terms of the graphs from which we draw conclusions. We use a graph database and network visualizations to explain our risk scores.

We’re looking for a self-motivated data scientist to be the glue that holds the rest of the team together: four ML Engineers, a Data Engineer and a Visualization Engineer. You will write queries to understand identity resolution problems and turn what you find into rules for programmatic data labeling, a process known as weak supervision. You will analyze structured, semi-structured, unstructured and graph datasets. You will build APIs for web applications and dashboards with our Visualization Engineer. You will profile datasets and drive the analysis that informs other engineering work. You will perform light data engineering alongside our Data Engineer. You will sit in the middle of the team and route information to where it needs to go. You will write the specifications for the interfaces of what we build.

While being published is good, the most important thing we want in a candidate is a track record of shipping products to real customers. We have data engineers but expect you to be fairly self-supporting in carrying out your work, so generalist skills are important. Candidates without advanced degrees are welcome; experience is education.

The ideal candidate will have:

    • Early-stage startup experience
    • A track record of shipping data-driven products to market
    • Solid Python 3 skills, including object-oriented analysis and design
    • Applied Anti-Money Laundering (AML) or Know Your Customer (KYC) experience
    • Experience building machine learning solutions end to end
    • Entity resolution (ER) and identity resolution (IR) experience
    • Natural Language Processing (NLP) skills
    • Working knowledge of neural networks and embeddings
    • Plus: graph database experience
    • Plus: knowledge of Social Network Analysis (SNA)
    • Plus: network visualization experience
    • Strong plus: Snorkel or related weak supervision experience