Senior Data Engineer

Remote - US/Canada
Engineering – Core Platform
Full-Time
At Scribd (pronounced “scribbed”), we believe reading is more important than ever. Join our cast of unique characters as we build the world’s largest and most fascinating digital library: giving subscribers access to a growing collection of ebooks, audiobooks, magazines, documents, and more.
In addition to works from major publishers and top authors, we also create our own original content exclusively for Scribd users.
Our community includes over 1M subscribers in more than 190 countries. Join us in turning screen time into quality time!

What you'll do

Data quality and integrity are the two areas of focus for your work in our existing, organically grown data infrastructure. You will be responsible for building tools and technology to ensure that downstream customers can trust the data they're consuming. Depending on the project, this might involve collaborating with the Data Science and Content Engineering teams to repartition or optimize business-critical Hive tables, or working with Core Platform to implement better processing jobs that scale our consumption of streaming data sets. Almost everything you work on will aim to increase "customer satisfaction" for internal customers of Scribd data.

Required Skills

    • Strong written and verbal communication skills (we're remote!)
    • You have 5+ years of experience in data engineering
    • You have engineered scalable software using big data technologies (e.g. Hadoop, Spark, Hive, Flink, Samza, Storm, Elasticsearch, Druid, Cassandra)
    • You have experience building data pipelines (real-time or batch) on large, complex datasets
    • Fluency with at least one dialect of SQL (MySQL and Hive preferred)
    • Expertise in Scala, Java, or Python

Desired Skills

    • You have worked on and understand streaming platforms, typically based around Kafka
    • Strong understanding of AWS data platform services and their strengths/weaknesses
    • Strong experience using Jira, Slack, JetBrains IDEs, Git, GitLab, GitHub, Docker, Jenkins, Terraform
    • Experience using Databricks