Senior Data Scientist (Biomedical NLP)

San Francisco / Artificial Intelligence and Data Science
Sirona’s mission is to make the highest-quality healthcare available to everyone. We’re building a diagnostic engine that learns from and empowers the world’s most accurate radiologists, but enabling diagnosticians is just the beginning. Through our end-to-end workflow, we’re working to improve the quality and efficiency of the entire episode of care, simplifying the process of medicine and allowing doctors to focus on patients. We are interdisciplinary thinkers who are passionate about technology and medicine, and we believe in the power of technology to protect and preserve human life. We’re in stealth mode and we’re hiring.

Data Scientists at Sirona are responsible for the technological foundations of our product. Senior NLP data scientists define and implement strategies to handle free-form medical text, including developing internal NLP packages and/or heavily customizing open-source solutions. They contribute to the development of ontologies and associated biomedical tooling, and support various other data science initiatives.


Requirements

    • Advanced degree in computational linguistics, informatics, or a related field
    • Experience building and using ontologies and knowledge graphs to support natural language processing, data integration and analytics
    • Deep understanding of core NLP/NLU algorithms, tasks, model architectures, and open-source resources: algorithms and architectures (CRFs, neural networks including LSTMs and dense vector embeddings, rule-based/regex approaches); tasks (named entity recognition and disambiguation, POS tagging, syntactic dependency parsing); libraries and tools (spaCy, gensim, NLTK, StanfordNLP, and annotation tools such as brat and doccano)
    • Track record building and productionizing NLP models in an industry setting
    • Expert coding ability

Preferred Requirements

    • PhD in relevant field
    • Published academic work in core NLP disciplines and/or open-source NLP software contributions
    • Experience working with medical and/or radiological text data
    • Proficiency with RDF, OWL, SPARQL or similar technologies
    • Familiarity with key biomedical informatics resources (e.g. UMLS, SNOMED CT, BioPortal, RadLex)
    • Experience leading teams and mentoring junior data scientists


Responsibilities

    • Design and implement NLP pipelines to extract structured content from free-form radiology text
    • Help define and build core ontologies
    • Define NLP methodologies, including information models and ontologies
    • Create internal tooling to facilitate NLP and other data science pipelines
    • Develop or heavily customize open-source solutions for named entity recognition, POS tagging, and syntactic dependency parsing that work on domain-specific medical data
    • Develop methods for structured descriptions of diagnostic model outputs
    • Develop tools for manual and automated label annotation of free-text radiology reports
    • Work with the annotation team to develop labeled training and test datasets
    • Build internal tools and define associated statistics to provide insight into algorithm performance
    • Work with the product development team on integration of NLU and text generation methods


Benefits

    • Stock
    • Competitive salaries
    • Paid time off
    • Medical insurance
    • 401(k)
    • Apple equipment
    • Dedicated research computing resources
    • Catered lunches and team events
    • Sponsorship for conferences