Data Engineer

Who We Are:
Factored (an AI Fund Portfolio company) was conceived in Palo Alto, California by Andrew Ng and a team of highly experienced AI researchers, educators, and engineers to help address the significant shortage of qualified AI & Machine-Learning engineers globally. We know that exceptional technical aptitude, intelligence, communication skills, and passion are equally distributed around the world, and we are very committed to testing, vetting, and nurturing the most talented engineers for our program and on behalf of our clients.

We are currently looking for an exceptionally talented Data Engineer to join our team. You will be called on for a wide range of responsibilities, including data aggregation, scraping, validation, transformation, quality assurance, and DevOps administration of both structured and unstructured datasets. Ideally, you will be experienced in optimizing data architecture, building data pipelines, and wrangling data to suit the needs of our algorithms and application functionality. Since you’ll be joining an early-stage startup at the ground level, you’ll need to be a self-starter with a high degree of initiative and accountability. You must be able to wear multiple hats and take on additional responsibility on our growing team.

What You Will Be Doing:

    • Create and maintain optimal data pipeline architecture across multiple data sources, including licensed and scraped data.
    • Assemble large, complex data sets that meet the functional needs of AI/ML engineers and front-end engineers.
    • Design and develop optimal data processing techniques: automating manual processes, data delivery, data validation and data augmentation.
    • Develop any necessary ETL processes to optimize analysis and performance.
    • Manage analytics tools that provide actionable insights into usage, customer acquisition, operational efficiency and other key business performance metrics.
    • Design and develop a RESTful API to enable programmatic integration with other SaaS systems.
    • Architect and implement new features from scratch, partnering with our AI/ML engineers to identify data sources, gaps and dependencies.
    • Identify bugs and performance issues across the stack, using performance monitoring and testing tools to ensure data integrity and a quality user experience.
    • Build a highly scalable infrastructure using SQL and AWS big data technologies.
    • Keep our data secure and compliant with international data handling rules.

What You Must Bring:

    • 2+ years of professional experience shipping high-quality, production-ready code.
    • Strong computer science foundations, including data structures and algorithms, operating systems, computer networks, databases, and object-oriented programming.
    • Experience in Python.
    • Experience in setting up data pipelines using relational SQL and NoSQL databases, including Postgres, Cassandra and MongoDB.
    • Experience with AWS cloud services, including S3, EC2, EMR, RDS, Redshift.
    • Proven success manipulating, processing and extracting value from large heterogeneous datasets.
    • Strong analytic skills related to working with unstructured datasets.
    • Experience with extracting and ingesting data from websites using web crawling tools.
    • Experience with big data tools, including Hadoop, Spark, Kafka, etc.
    • Experience developing scalable RESTful APIs.
    • Expertise with version control systems, such as Git.
    • Excellent English communication skills and the ability to have in-depth technical discussions with both the engineering team and business people.
    • Self-starter who is comfortable working in an early-stage environment.
    • Strong project management and organizational skills.

Nice To Have:

    • BSc in Computer Science, Mathematics or similar field; Master’s or PhD degree is a plus.
    • Experience with design of ETLs using Apache Airflow.
    • Experience with extracting and ingesting data from Google Analytics or Twitter.
    • Understanding of AI/ML models.
    • Proficiency in HTML, CSS and JavaScript.
    • Experience with consumer applications and data handling.
    • Familiarity with data privacy regulations and best practices.


    • Accountability: an obligation or willingness to accept responsibility or to account for one's actions while doing so with the highest regard for integrity.  
    • Leadership: able to influence others to follow you and lead the team to a brighter future. 
    • Grit: able to stick with projects and work hard through good and bad times. Has a high pain tolerance and performs well under stress or pressure.
    • Scrappy: takes initiative and proactively gets things done with limited resources by being creative, begging, borrowing, and doing whatever is needed in an ambiguous environment or situation.
    • Ownership orientation: demonstrates extreme ownership over all aspects of the company and is strongly results-driven.
At Factored, we are committed to providing an environment of mutual respect where equal employment opportunities are available to all applicants without regard to race, color, religion, sex, pregnancy (including childbirth, lactation and related medical conditions), national origin, age, physical and mental disability, marital status, sexual orientation, gender identity, gender expression, genetic information (including characteristics and testing), military and veteran status, and any other characteristic protected by applicable law. AI Fund believes that diversity and inclusion among our employees is critical to our success as a company, and we seek to recruit, develop and retain the most talented people from a diverse candidate pool. Selection for employment is decided on the basis of qualifications, merit, and business need.