Senior Data Engineer

San Francisco
Engineering
Full-time
At Medium, words matter. We are building the best place for reading and writing on the internet—a place where today’s smartest writers, thinkers, experts, and storytellers can share big, interesting ideas; a place where ideas are judged on the value they provide to readers, not the fleeting attention they can attract for advertisers.

We are looking for a Senior Data Engineer who will design, build, ship, and maintain our business-critical data platform. In this role you will lead the development of both transactional and data warehouse designs while mentoring our team of cross-functional engineers and data scientists. You'll also design, implement, and tune tables, queries, stored procedures, and indexes.

At Medium, we are proud of our product, our team, and our culture. Medium's website and mobile apps are accessed by millions of users each day. Our mission is to move thinking forward by providing a place where individuals, along with publishers, can share stories and their perspectives. Behind this beautifully crafted platform is an engineering team that works seamlessly together. From frontend to API, from data collection to product science, Medium engineers work cross-functionally with open communication and feedback.

What You Will Do

    • Work on high-impact projects that improve data availability and quality and provide reliable access to data for the rest of the business.
    • Design, architect and support new and existing data and ETL pipelines and recommend improvements and modifications.
    • Create optimal data pipeline architecture and systems.
    • Assemble large, complex data sets that meet functional and non-functional business requirements.
    • Own ingestion of data into our data warehouse and provide frameworks and services, including Spark, for operating on that data.
    • Analyze, debug, and fix issues in data pipelines.
    • Communicate data modeling and architecture strategies and processes to cross-functional groups and senior management.
    • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, etc.
    • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Spark and AWS technologies.
    • Build widely used data pipelines and tools that make critical business data available to other teams.

About You

    • You have at least 5 years of experience implementing complex ETL pipelines, preferably with Hadoop or Spark.
    • You have extensive experience writing complex SQL and ETL processes.
    • You have exceptional coding and design skills, particularly in Java/Scala and Python.
    • You've worked with large data volumes, including processing, transforming, and transporting large-scale data.
    • You have hands-on experience with AWS and services such as EC2, SQS, SNS, RDS, and ElastiCache.
    • You have a BS in Computer Science / Software Engineering or equivalent experience.
    • You have knowledge of Apache Hadoop, Apache Spark (including PySpark), Spark Streaming, Kafka, Scala, Python, and similar technology stacks.
    • You have a strong understanding of algorithms and data structures and know how to apply them.

Nice To Have

    • Spark data pipeline and/or streaming experience
    • Redshift knowledge and operational experience
    • Machine Learning expertise

At Medium, we foster an inclusive, supportive, fun yet challenging team environment. We value a team made up of people from diverse backgrounds, and we respect the healthy expression of differing opinions. We embrace experimentation and the examination of all kinds of ideas through reasoning and testing. Come join us as we continue to change the world of digital media. Medium is an equal opportunity employer.

Interested? We'd love to hear from you.