Data Engineer

United States
Product Development
Contractor
We build breakthrough software products that power digital businesses. We are an innovative product development partner whose solutions drive rapid revenue, market share, and customer growth for industry leaders in Software and SaaS, Media and Publishing, Information Services, and Retail.
Our key differentiator is our Product Mindset. Our development teams focus on building for outcomes, and all of our team members around the globe are trained on the Product Mindset’s core values: Minimize Time to Value, Solve For Need, and Excel at Change. Our teams apply this mindset to build digital products that are customer-facing and revenue-generating. Our business-minded approach to agile development ensures that we align with client goals from the earliest conceptual stages through market launch and beyond.

Hybrid Role:

    • This is a hybrid role. The selected candidate will be required to work onsite in the Bethlehem, PA office a minimum of three days per week. Local candidates only.

Job Responsibilities:

    • Architect, build, and maintain scalable, reliable data pipelines with robust built-in data quality checks, for consumption by the analytics and BI layers.
    • Design, develop, and implement low-latency, high-availability, performant data applications, and recommend and implement innovative engineering solutions.
    • Design, develop, test, and debug code in Python, SQL, PySpark, and Bash according to Client standards.
    • Design and implement a data quality framework and apply it to critical data pipelines to make the data layer robust and trustworthy for downstream consumers.
    • Design and develop an orchestration layer for data pipelines written in SQL, Python, and PySpark.
    • Apply and provide guidance on software engineering techniques such as design patterns, code refactoring, framework design, code reusability, code versioning, performance optimization, and continuous integration and continuous deployment (CI/CD) to make the data analytics team robust and efficient.
    • Perform all job functions consistent with Client policies and procedures, including those that govern the handling of PHI and PII.
    • Work closely with various IT and business teams to understand system opportunities and constraints and make the fullest use of the Client Enterprise Data Infrastructure.
    • Develop relationships with business team members by being proactive, demonstrating a growing understanding of business processes, and recommending innovative solutions.
    • Communicate project output in terms of customer value, business objectives, and product opportunity.

Required Qualifications:

    • 5+ years of experience and a bachelor's or master's degree in Computer Science, Engineering, Applied Mathematics, or a related field.
    • Extensive hands-on development experience in Python, SQL and Bash.
    • Extensive experience in performance optimization of data pipelines.
    • Extensive hands-on experience working with cloud data warehouse and data lake platforms like Databricks, Redshift or Snowflake.
    • Familiarity with building and deploying scalable data pipelines and data solutions using Python, SQL, and PySpark.
    • Extensive experience in all stages of software development and expertise in applying software engineering best practices.
    • Extensive experience in developing an end-to-end orchestration layer for data pipelines using frameworks such as Apache Airflow, Prefect, or Databricks Workflows.
    • Familiarity with:
        • RESTful web services (REST APIs), in order to integrate with other services.
        • API gateways such as Apigee, to secure web service endpoints.
        • Data pipelines, concurrency, and parallelism.
    • Experience creating and configuring continuous integration/continuous deployment (CI/CD) pipelines to build and deploy applications across environments, and using DevOps best practices to migrate code to the production environment.
    • Ability to investigate and repair application defects in any component (front-end, business logic, middleware, or database) to improve code quality and consistency, reduce delays, and identify bottlenecks or gaps in the implementation.
    • Ability to write unit tests in Python using a unit testing library such as pytest.