Applied ML Researcher

San Mateo, CA
Machine Learning / Full Time / On-site
About Us

We are on a mission to bridge the gap between enterprise business knowledge and data, democratizing data discovery and curation to prepare organizations for the era of generative AI. Today's data tools are overly complex, poorly integrated, and siloed, forcing AI practitioners and data scientists alike to spend more time wrestling with tools, relying on tribal knowledge, and navigating data lakes than doing meaningful data science. The current landscape of data tools and processes is heavily manual and has not kept pace with the vast amount of data generated daily. With the advent of generative AI and multi-modality, this challenge has only grown more complex.

Backed by top VC funds, we are committed to making enterprise data AI-ready faster, more reliably, and with a stronger foundation of factual semantic knowledge. This leads to more accurate models, superior outcomes, and better business results. Our team of seasoned data infrastructure and machine learning experts (from LinkedIn, Visa, Truera, Hive, and Branch) has spent the past two decades building bespoke systems to solve these very challenges.

Join our growing team of ML research and data infrastructure experts. We're committed to empowering AI and data scientists to seamlessly integrate semantic learning with generative AI. Be part of our journey to shape the future of enterprise AI.

About the Role

You’ll join our applied ML research team focused on turning raw enterprise data into structured, contextualized knowledge graphs and embeddings. You’ll experiment with new approaches for distilling large models into smaller, more efficient ones; improve retrieval, ranking, and reasoning performance through feedback loops; and prototype methods that help LLMs extract and act on real-world knowledge.

We're looking for someone who thrives on iteration, cares about building with rigor, and is hungry to learn from some of the best engineers and researchers in the field.

What You’ll Be Doing

    • Prototype and refine models for extracting structured knowledge from text
    • Apply knowledge distillation techniques to compress and optimize LLMs for downstream tasks
    • Explore the use of reinforcement learning and feedback loops for improving model behavior
    • Build evaluation pipelines for entity linking, retrieval, and semantic consistency
    • Read, implement, and build upon recent research in LLM alignment, distillation, and symbolic grounding
    • Collaborate closely with infra and data engineers to scale your research into production-ready components

Prior Experience

    • 2–4 years of experience (research lab, internship, academic project, or early industry role) working in ML or NLP
    • Exposure to knowledge distillation, RLHF, or curriculum learning techniques
    • Strong Python skills and familiarity with ML frameworks like PyTorch or TensorFlow
    • Experience with language models and transformers (e.g., BERT, LLaMA, or similar)
    • Solid understanding of ML fundamentals: training pipelines, loss functions, evaluation metrics
    • A collaborative mindset and willingness to work across research and engineering teams

Nice to Have

    • Familiarity with reinforcement learning, including policy optimization or reward modeling
    • Experience with semantic representations such as knowledge graphs or entity embeddings
    • Comfort working with tools like HuggingFace Transformers, Ray, or vLLM
    • Understanding of small-model techniques (pruning, quantization, adapter layers)
    • Interest in the LLM ecosystem and techniques for model alignment or prompt tuning
    • Prior contributions to open-source projects or academic publications in ML/NLP