Data Engineering Manager
Remote: US, Canada
Engineering – Core Platform
What you'll do
Data quality and integrity are the two focus areas for your work in our existing, organically grown data infrastructure. You will help build the data engineering team, working with product teams to identify which data pipelines matter most, and then building the process, tooling, and technology to ensure that downstream customers can trust the data they consume. Depending on the project, this might mean collaborating with the Data Science and Content Engineering teams to identify business-critical Hive tables, or working with Core Platform to suggest better approaches for scaling streaming datasets. Almost everything you work on will increase "customer satisfaction" for internal customers of Scribd data.
What you'll need
- Strong written and verbal communication skills (we're remote!)
- Strong mentoring skills and experience training and educating teammates or colleagues
- Experience building and delivering high-quality data systems using tools from the Hadoop or Spark ecosystem
- Working knowledge of Sqoop, Hive, Impala, and HDFS
- Experience structuring large-scale datasets in S3
- Fluency in at least one dialect of SQL (MySQL and Hive preferred)
- Ability to develop software, whether scripts for shuffling data around, batch tasks, or stream-processing jobs
- Experience with streaming platforms, typically built around Kafka
- Working knowledge of how to build, train, and deploy ML models
- Strong understanding of AWS data platform services and their strengths and weaknesses
- Opinions on what data integrity means and how to scale it across the organization
- Experience with stream-processing frameworks such as Spark, Storm, or Beam