Data Collection Engineer

New York, NY
Engineering / Full-Time / Hybrid

Your Role: Data Collection Engineer

As a Data Collection Engineer, you'll play a critical role in acquiring and structuring high-value external data that powers our core products. Your work will fuel our knowledge graph of millions of entities and directly support our mission to deliver transparency and insight into complex global networks.

You'll work closely with engineering, research, and product teams to identify new data sources, develop reliable pipelines to gather, ingest, and structure that data, and continuously improve our ability to scale and adapt. You'll own how information flows into our platform, from design and architecture to reliability and performance, and help shape the systems that underpin our next generation of features and products.

What you'll do

    • Design and implement systems to collect, extract, and normalize external data from a variety of sources.
    • Collaborate with researchers and analysts to identify new sources of valuable company data and define integration strategies.
    • Build robust, scalable pipelines that ingest structured and semi-structured data into our database.
    • Ensure high levels of accuracy, coverage, and freshness across incoming data streams.
    • Contribute to the evolution of our data platform and internal tooling.
    • Improve system reliability, observability, and performance over time.

Who you are

    • 3+ years of experience as a backend or full-stack software engineer, ideally working with data ingestion or ETL systems.
    • Deep knowledge of how to crawl the web at scale.
    • Strong programming skills, especially in Python.
    • Experience working with structured and unstructured data from diverse external systems.
    • Comfortable debugging complex issues involving networking, content rendering, or inconsistent source data.
    • Proficient with SQL and relational databases.
    • A clear communicator who collaborates effectively with both technical and non-technical teammates.
    • Passionate about turning raw data into meaningful insight, and eager to work on technically nuanced challenges.

Ideally you'll have

    • Familiarity with headless browser automation or techniques for collecting data from dynamic content sources.
    • Expertise in the architecture, technologies, and tools that run the modern internet, such as DNS, networking, CDNs, WAFs, proxies, and reverse proxies.
    • Experience with event-driven architecture.
    • Eagerness to incorporate new technologies and validate their usefulness using structured experiments and thorough testing.
    • Experience building health monitoring and observability tooling for use by automated systems, engineers, and non-technical stakeholders.