Clinical Data Scientist

San Mateo, CA
About Roam

The modern healthcare system generates enormous quantities of diverse, disconnected data. These data sets present substantial analytic challenges, but can also illuminate new avenues of inquiry that yield unprecedented improvements in global health. Roam is realizing this potential by combining our proprietary data platform with advanced machine learning, empowering life sciences companies, hospital systems, insurers, and governments to make data-driven decisions that improve patient outcomes and guide innovation.

Roam Health Knowledge Graph is the foundation of Roam's data platform and is central to all our applications. This pre-built data ontology brings the world's vast healthcare information together using a patent-pending graph architecture that structures the data while embracing the uncertainty inherent in health datasets.

Our clients generate insight from this data platform through an application suite we’ve engineered to facilitate efficient, iterative analysis of patient-level data at scale and with unprecedented depth. Analysis performed within the Roam ecosystem bypasses the inefficient data integration processes currently required to modify a research question or address a new one. Roam's technologies have been used to improve drug development, bring new drugs to market, demonstrate value to payors, and compute real-world outcomes. These sample use cases, though distinct, all bring Roam closer to achieving our mission: leverage artificial intelligence to bring about sustainable and affordable improvements in patient health.

About the Role

The clinical data scientist role focuses on leveraging Roam’s data and machine learning assets to create analyses of patient pathways through disease and treatment progression for clients and internal development efforts. Our ideal candidate will have deep familiarity with data-intensive analytics in healthcare or commercial life sciences and will understand the limitations and challenges associated with clinical data, as well as how analysis of clinical data might shape strategy within life sciences organizations. Candidates will work at the intersection of statistical and medical expertise; strong communication skills and creative approaches to visualizing findings are critical. Ideal candidates will have a portfolio of projects or research to demonstrate their capabilities.

Responsibilities
    • Work with life sciences clients, data partners, and academic medical centers to produce high-quality clinical data analysis consistent with relevant research.
    • Define use cases within our platform and identify data narratives that support them.
    • Serve as the subject matter expert to convey clinical and client considerations to our broader analytics team.
    • Explore patient journeys through tabular and graphical presentations of data.
    • Contribute to the development of data-supported intervention strategies which will improve healthcare processes and patient outcomes.
    • Propose and prototype new platform capabilities.
    • Evaluate models in the context of real-world impact and applicability.
    • Develop creative approaches to best leverage incomplete or noisy health data to address high-value analysis questions.
    • Bridge the gap between healthcare and machine learning experts for collaborative problem solving.

Required Experience
    • Bachelor's (or higher) degree with relevant coursework in health informatics, biostatistics, epidemiology, or similar.
    • 2+ years of experience working with healthcare data (e.g., EHR, clinical text, clinical trials, insurance claims, pharmacy, patient registries).
    • Ability to independently explore patient-level data using Python, R, SAS, or Matlab.
    • Experience in communicating insights and conclusions from large, complicated healthcare datasets, using a variety of statistical and analytical approaches.
    • Understanding of common healthcare concepts, including disease prevalence, risk factors, co-morbidities, medications, outcome measures, and how these are represented in (or created from) data.
    • Ability to understand scientific literature and experimental procedures, as well as the limitations and applications of this information in a clinical setting.

Beneficial Experience

    • MD, MS or PhD in a scientific field.
    • Developed statistical models using large, longitudinal healthcare datasets (such as insurance claims, EMRs, or patient registries).
    • Analyzed unstructured clinical data, e.g. clinical notes.
    • Python programming experience, with an emphasis on scientific computing libraries (e.g., pandas, scikit-learn, NumPy, SciPy, Matplotlib, seaborn).
    • A strong history of publishing academic research.