Principal Software Engineer - (R-13895)

Hyderabad - India / Technology / Employee: Full Time / On-site
Why We Work at Dun & Bradstreet
Dun & Bradstreet unlocks the power of data through analytics, creating a better tomorrow. Each day, we are finding new ways to strengthen our award-winning culture and accelerate creativity, innovation and growth. Our 6,000+ global team members are passionate about what we do. We are dedicated to helping clients turn uncertainty into confidence, risk into opportunity and potential into prosperity. Bold and diverse thinkers are always welcome. Come join us!

As a Principal Data Engineer, you will build and maintain enterprise-level data pipelines using the tools available within our Big Data Eco-System. This will require you to work closely with data analysts and scientists, and with database and systems administrators, to create data solutions. You will collaborate with business and technical teams to translate business requirements and functional specifications into innovative solutions. This will require research, awareness, interactivity, and the ability to ask the right questions. You will also serve as a technical expert for project teams throughout the implementation and maintenance of business and enterprise software solutions. In addition, you will provide consultation to help ensure that new and existing software solutions are developed with insight into industry best practices, strategies, and architectures, and you will pursue professional growth.

Essential Key Responsibilities

Design, build, and deploy new data pipelines within our Big Data Eco-Systems using StreamSets, Talend, Informatica BDM, etc. Document new and existing pipelines and datasets.
Design ETL/ELT data pipelines using StreamSets, Informatica, or any other ETL processing engine. Familiarity with data pipelines, data lakes, and modern data warehousing practices (virtual data warehouse, push-down analytics, etc.).
Expert-level programming skills in Python
Expert-level programming skills in Spark
Cloud-based infrastructure: AWS and its many services (e.g. EC2, RDS, Redshift, EMR, Athena), as well as Snowflake and PrestoDB
Experience with at least one ETL tool (Informatica or StreamSets) in creating complex parallel loads, cluster batch execution, and dependencies using Jobs/Topologies/Workflows, etc.
Experience in SQL and in converting SQL stored procedures into Informatica/StreamSets. Strong exposure to working with web service origins/targets/processors/executors, XML/JSON sources, and RESTful APIs.
Strong exposure to working with relational databases (DB2, Oracle, and SQL Server), including complex SQL constructs and DDL generation.
Exposure to Apache Airflow for scheduling jobs
Strong knowledge of Big Data architecture (HDFS), including cluster installation, configuration, monitoring, security, resource management, maintenance, and performance tuning
Create detailed designs and POCs to enable new workloads and technical capabilities on the Platform.  
Work with the platform and infrastructure engineers to implement these capabilities in production.
Manage workloads and enable workload optimization including managing resource allocation and scheduling across multiple tenants to fulfill SLAs.
Participate in planning and Data Science activities, and perform activities to increase platform skills

Education, Experience and Competencies:

Minimum of 6 years of experience in ETL/ELT technologies, preferably StreamSets, Informatica, Talend, etc.
Minimum of 6 years of hands-on experience with Big Data technologies, e.g. Hadoop, Spark, Hive.
Minimum of 3 years of experience with Spark
Hands-on experience with Databricks is a HUGE plus.
Minimum of 6 years of experience in Cloud environments, preferably AWS
Minimum of 4 years working in Big Data service delivery (or equivalent) roles, focusing on the following disciplines:
AWS S3 (creating buckets, tokens, etc.)
Big Data (Hadoop ecosystems/distributions e.g. Cloudera 5.14 or 5.15, Databricks)
Any experience with NoSQL and Graph databases
Informatica or StreamSets Data integration (ETL/ELT)
Exposure to role-based and attribute-based access controls
Exposure to BI tools like Tableau, PowerBI, Looker, etc.
Hands-on experience with managing solutions deployed in the Cloud, preferably on AWS
Experience working in a global company; working in a DevOps model is a plus