Graph ML Engineer

San Francisco, CA / Engineering Team / GNN Engineering
Deep Discovery is hiring a Machine Learning (ML) Engineer with graph neural network (GNN) experience to build a GNN representation of the business graph that incorporates text and structured representations. We look at relationships between companies on the open web and extract structured information from news and other text to build networks. These networks drive the models that generate risk scores for the customers of banks, which the banks use when conducting background checks. This is called a Know Your Customer (KYC) system for Anti-Money Laundering (AML). Banks use these systems to evaluate the risk of doing business with their clients so they don’t face stiff fines from regulatory agencies.

Adaptations of this KYC system will also be developed into other products for government regulatory agencies, investor due diligence, supply chain risk assessment, and, most importantly in support of our social mission, leading investigative journalists and anti-corruption NGOs around the world.

We are taking a network-centric approach to KYC that evaluates clients in terms of the context in which they do business. This involves several machine learning tasks: extracting knowledge graphs from news and other text; entity and identity resolution of the networks we collect about the economy; representation learning on the resulting graphs and their associated documents; and building a scoring engine that uses our business graph to create an accurate risk score. Users do not believe predictions without explanations, and the cost of errors is high, so the final machine learning component is the most critical: the system must be explainable in terms of the graphs from which we draw conclusions. We use a graph database and network visualizations to explain our risk scores.

We’re looking for a self-motivated ML Engineer who is passionate about Graph Neural Networks, who can learn the domain quickly, and who can mine the literature and customize algorithms and systems to come up with novel solutions to the problems we face in delivering a product. While being published is good, the most important thing we want in a candidate is a track record of shipping products to real customers. We have data engineers but expect you to be fairly self-supporting in carrying out your work, so generalist skills are important. Candidates without advanced degrees are welcome; experience is education.

The ideal candidate will have:

    • Early-stage startup experience
    • A track record of shipping data-driven products to market
    • Solid Python 3 skills, including object-oriented analysis and design
    • Entity resolution (ER) and identity resolution (IR) skills
    • Natural Language Processing (NLP) skills
    • Working knowledge of various neural network architectures
    • Working knowledge of Graph Neural Networks (GNNs)
    • A track record of implementing self-supervised learning
    • Working knowledge of learning with limited labels via weakly supervised learning: semi-supervised learning, weak supervision, distant supervision, active learning
    • Schema matching or alignment experience
    • Graph database experience
    • Understanding of traditional Social Network Analysis
    • Published papers in CS or other quantitative fields