Senior Data Engineer
San Francisco, CA
TEAM OVERVIEW
The Data Science team at Clarify Health is at the core of powering our platform: proposing, prototyping, and deploying product features powered by machine learning and statistical analysis. This is made possible by our investment in significant data assets, including longitudinal claims and clinical data for several million patient lives, paired with patient-generated data we collect through our platform.
The Senior Data Engineer will be responsible for helping to create and implement deep analytical solutions utilizing leading-edge technologies and operational excellence:
- Contributing to the design, development, and implementation of custom data models to solve real-world problems.
- Conducting machine learning on a growing database of longitudinal health information, with the goal of widening the breadth of applications while increasing the fidelity of our models in an automatable fashion
- Introducing new technologies and solutions to our data engineering processes that will facilitate machine learning at a higher scale
- Utilizing ETL tools to build and integrate production and post-production data pipelines that move data from a variety of sources into a warehouse, monitor data quality, check for errors, and conform data to standards
- Participating in data team workshops by translating business problems and sometimes complex statistical approaches into actions, balancing creativity with engineering practicality.
What we are looking for
We are a small but rapidly growing team. As such, we are ideally looking for “all-around athletes” with strong leadership, analytics, and communication skills. We are also looking for new team members who will be a solid fit with our culture and have a strong passion for impact.
The Senior Data Engineer will have:
- Demonstrable experience in applying machine learning and authoring production-level code in an industry setting, preferably with applications in the healthcare field (such as bioinformatics, biostatistics, epidemiology, economics, genomics, or public health)
- Self-sufficiency to query relational databases, research new features, and build the necessary resources where they do not already exist
- Proficiency with a variety of machine learning packages in R or Python, with a keen sense of which to select given the trade-offs among predictive performance, computation time, and model interpretability
- Familiarity with generalized linear models and healthcare-related statistical methods such as hierarchical/mixed-effects modeling, regularization, survival analysis, and propensity score matching.
- Experience integrating a wide variety of data sources, including web services, files, databases, and web pages
- BS in computer science or information technology
- 5+ years of direct data science experience
- PostgreSQL, Redshift
- R, Python, and their related configuration management tools
- ETL tools such as Airflow, Luigi, AWS Glue
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, disability, or any other federal, state, or local protected class.