Machine Learning Engineer, Data Acquisition

Boston, MA

Engineering

Full-time

Remote

Who we are

Zus is a shared health data platform designed to accelerate healthcare data interoperability by providing easy-to-use patient data via API, embedded components, and direct EHR integrations. Founded in 2021 by Jonathan Bush, co-founder and former CEO of athenahealth, Zus partners with HIEs and other data networks to aggregate patient clinical history and then translates that history into user-friendly information at the point of care. Zus's mission is to catalyze healthcare's greatest inventors by maximizing the value of patient insights - so that they can build up, not around.

As a Machine Learning Engineer within the Data Acquisition (DA) Team, you will play a critical role in bringing your ML expertise to Zus.

The Data Acquisition team is responsible for building and running the microservices-based infrastructure that connects with external health data networks to collect information about our patients and load it into the Zus data stores at high volume. The team also supports the services that customers and internal stakeholders use to request that data. You will use your prior experience with large language models (LLMs) and MLOps to develop, deploy, and optimize solutions in collaboration with DA software engineering. You will work closely within this cross-functional team to design, implement, and scale machine learning solutions that address key business challenges.

In your role as an ML Engineer, you will conduct research to explore new methodologies and techniques and integrate them into our product offerings. You will develop prototypes to test and refine your innovations, and build feedback mechanisms that improve models with human oversight. You will work with software engineers to help deliver CI/CD pipelines and automate workflows to ensure reliable and scalable model operations. You will also present your learnings and help the team leverage these methods and techniques.

As part of our early team, you will focus on:

Algorithm & Experimentation: Formulate hypotheses, build rapid prototypes, and run statistically powered offline/online experiments to validate impact.
Evaluation & Diagnostics: Design task-specific metrics, bootstrap confidence intervals, and perform slice-based error analysis and fairness checks.
Collaboration & Product Innovation: Work cross-functionally with engineers, product managers, and stakeholders to innovate on data collection, scaling, and normalization—delivering effective and efficient ML solutions.
Model & Tooling Selection: Select and adapt the right models—LLMs or classical—balancing latency, cost, and quality, and integrate them into our Data Acquisition stack.
Data Readiness: Partner with data engineers to ensure training/eval data stay clean, versioned, and bias-checked.
End-to-End Ownership: Take prototypes to production, collaborating with engineers to deploy, monitor, and continually improve models.

You're a good fit because you have:

3+ years building and shipping ML models, including hands-on experience with LLMs or classical NLP methods in production environments.
Experience partnering with software engineers to ship, monitor, and iterate on models in production.
Proficiency in Python (must-have); Java or Go a plus.
Strong understanding of machine learning frameworks and libraries (e.g., TensorFlow, PyTorch, Scikit-learn).
Solid grasp of classical ML algorithms—their assumptions, strengths, and failure modes (e.g., tree ensembles, logistic/linear models, nearest-neighbor indexes).
Hands-on experience designing offline or online experiments: crafting task-specific metrics, computing bootstrapped confidence intervals, and conducting slice-based error analysis.
Familiarity with cloud services (e.g., AWS, GCP, Azure) and distributed computing.
Excellent analytical and problem-solving skills with a keen attention to detail.
Demonstrated curiosity—comfortable jumping into unfamiliar domains, papers, or codebases and learning fast.
Strong verbal/written skills—able to explain complex ML concepts to diverse stakeholders.
Demonstrated ability to work effectively in a collaborative team environment.

It’s a bonus if you have:

Experience directly deploying or monitoring models in production environments
Healthcare domain experience (claims, EHR, clinical NLP)
Publications, OSS contributions, or peer-reviewed talks on ML/AI
Experience designing and developing software for distributed data pipelines
Bachelor's degree in Computer Science or Statistical Science preferred; advanced degrees are a plus.

$150,000 - $190,000 a year

We will offer you…

• Competitive compensation that reflects the value you bring to the team: a combination of cash and equity

• Robust benefits that include health insurance, wellness benefits, 401(k) with a match, and unlimited PTO

• Opportunity to work alongside a passionate team that is determined to help change the world (and have fun doing it)

Please Note: Research shows that candidates from underrepresented backgrounds often don’t apply unless they meet 100% of the job criteria. While we have worked to consolidate the minimum qualifications for each role, we aren’t looking for someone who checks each box on a page; we’re looking for active learners and people who care about disrupting the current healthcare system with their unique experiences.

We do not conduct interviews by text, nor will we send you a job offer unless you have interviewed with multiple people, including the Director of People & Talent, over video. Job scams do exist, so please be careful with your personal information.
