Data Engineer

India Remote
Engineering – Data Engineering / Full time
At H1, we believe access to the best healthcare information is a basic human right. Our mission is to provide a platform that can optimally inform every doctor interaction globally. This promotes health equity and builds needed trust in healthcare systems. To accomplish this, our teams harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle. Visit to learn more about us.

Data Engineering has teams that are responsible for collecting, curating, normalizing and matching data from hundreds of disparate sources from around the globe. Data sources include scientific publications, clinical trials, conference presentations and claims among others. In addition to developing the necessary data pipelines to keep every piece of information updated in real-time and provide the users with relevant insights, the teams are also building automated, scalable and low-latency systems for the recognition and linking of various types of entities, such as linking researchers and physicians to their scholarly research and clinical trials. As we rapidly expand the markets we serve and the breadth and depth of data we want to collect for our customers, the team must grow and scale to meet that demand.

As a Software Engineer on the Data Engineering team, you will be key in analyzing vast amounts of data and providing user support within an AWS cloud environment. You’ll be responsible for writing production-grade pipelines using big data technologies and data wrangling to support our internal product. You’ll manage projects across all stages, including application deployment, to deliver the best scalable, stable, and high-quality healthcare data application in the market.

You will:
- Be responsible for product features related to data transformations, enrichment, and analytics.
- Work closely with internal stakeholders, gathering requirements and delivering solutions while effectively communicating progress and tracking tasks to meet project timelines.
- Act as a subject matter expert for Real World Evidence (RWE) data (claims, publications, payments), and represent the data commercially with customers, in collaboration with the product team, and in presentations to the ELC.
- Work within the end-to-end delivery of data to produce and shape the direction of RWE data at H1.
- Help steer the technical strategy and architecture, ensuring the smooth development, deployment, and scalability of applications across the entire technology stack.
- Collaborate closely with our Insights/AI team to build knowledge into our data and AI/ML platforms
- Work cross-functionally across the engineering, data, and product organizations to support your team in delivering the best healthcare data application in the market

You possess robust hands-on technical expertise encompassing both conventional and non-conventional ETL methodologies, alongside proficiency in T-SQL and Spark SQL. Your skill set includes mastery of multiple programming languages such as Python (PySpark), Java, or Scala, as well as adeptness in streaming and other advanced data processing techniques. As a self-starter, you excel in managing projects across all stages, from requirement gathering and design to coding, testing, implementation, and ongoing support. Your proactive approach and diverse skill set make you an invaluable asset in driving innovation and delivering impactful solutions within our dynamic data engineering team.

- 3+ years of experience working with strong big data engineering teams and deploying products on AWS
- Strong coding skills in Python (PySpark), Java, Scala, or another language you are proficient in, and in stacks supporting large-scale data processing
- Experience with Docker, Kubernetes or Terraform.
- Experience with databases like PostgreSQL
- Experience with software management tools such as Git, Jira, and CircleCI
- Strong grasp of computer science fundamentals: data structures, algorithmic trade-offs, etc.
- Experience with data processing technologies like Spark Streaming, Kafka Streams, KSQL, Spark SQL, or MapReduce
- Understanding of various file formats used in distributed systems, such as Apache Avro and Apache Parquet, and of common methods in data transformation
- Experience in performing root cause analysis on internal and external data and processes to answer specific business questions and find opportunities for improvement
- Willingness to manage projects through all stages (requirements, design, coding, testing, implementation, and support)
- Ability to write clean, modular data processing code that is easy to maintain.

Not meeting all the requirements but still feel like you’d be a great fit? Tell us how you can contribute to our team in a cover letter! 

- Full suite of health insurance options, in addition to generous paid time off
- Pre-planned company-wide wellness holidays
- Retirement options
- Health & charitable donation stipends
- Impactful Business Resource Groups
- Flexible work hours & the opportunity to work from anywhere
- The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe