Data Scientist

San Francisco, CA / Engineering Team / Data Science
Deep Discovery is hiring a Data Scientist with Natural Language Processing (NLP), network science, data analysis and web development experience to help build an artificial intelligence system to fight global corruption.

We are building an enormous business graph spanning the globe that combines text and structured data to map relationships between people, companies and organizations, drawing on open sources of information such as news on the web. This business graph drives our network-centric risk scoring models for the customers of global financial institutions. Such a system is called a Know Your Customer (KYC) system for Anti-Money Laundering (AML). Banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines from regulatory agencies.

Adaptations of this KYC system will also be developed into other products for government regulatory agencies, investor due diligence, supply chain risk assessment and, most importantly in support of our social mission, leading investigative journalists and anti-corruption NGOs around the world.

We are taking a network-centric approach to KYC that evaluates clients in terms of the context in which they do business. This involves several machine learning tasks: extracting knowledge graphs from news and other text, entity and identity resolution of the networks we collect about the economy, representation learning on the resulting graphs and their associated documents, and building a scoring engine that uses our business graph to create an accurate risk score. Users do not believe predictions without explanations, and the cost of errors is high, so the final machine learning component is the most critical: the system must be explainable in terms of the graphs from which we draw conclusions. We use a graph database and network visualizations to explain our risk scores.

We’re looking for a self-motivated data scientist to be the glue that holds the rest of the team together: two ML Engineers, a Data Engineer and a Visualization Engineer. You will work on entity and identity resolution problems. You will analyze structured, semi-structured, unstructured and graph datasets. You will build web applications and dashboards with our Visualization Engineer. You will profile datasets and drive the analysis that informs other engineering work. You will perform light data engineering alongside our Data Engineer. You will sit in the middle of the team and route information to where it needs to go. You will write the interface specifications for what we build.

While being published is good, the most important thing we want in a candidate is a track record of shipping products to real customers. We have data engineers, but we expect you to be fairly self-supporting in carrying out your work, so generalist skills are important. Candidates without advanced degrees are welcome; experience is education.

The ideal candidate will have:

    • Early-stage startup experience
    • A track record of shipping data-driven products to market
    • Solid Python 3 skills, including object-oriented analysis and design
    • Entity resolution (ER) and identity resolution (IR) skills
    • Natural Language Processing (NLP) skills
    • Working knowledge of various neural network architectures
    • Working knowledge of graph embeddings
    • Snorkel or related weak supervision experience
    • Schema matching or alignment experience
    • Graph database experience
    • Understanding of traditional Social Network Analysis
    • Network visualization experience