Data Engineer (Machine Learning Focused)

Los Altos, CA
Automated Driving – Cloud Data
At Toyota Research Institute (TRI), we’re working to build a future where everyone has the freedom to move, engage, and explore with a focus on reducing vehicle collisions, injuries, and fatalities. Join us in our mission to improve the quality of human life through advances in artificial intelligence, automated driving, robotics, and materials science. We’re dedicated to building a world of “mobility for all” where everyone, regardless of age or ability, can live in harmony with technology to enjoy a better life. Through innovations in AI, we’ll…

- Develop vehicles incapable of causing a crash, regardless of the actions of the driver.
- Develop technology for vehicles and robots to help people enjoy new levels of independence, access, and mobility.
- Bring advanced mobility technology to market faster.
- Discover new materials that will make batteries and hydrogen fuel cells smaller, lighter, less expensive and more powerful.

Our work is guided by a dedication to safety – in how we research, develop, and validate the performance of vehicle technology to benefit society. As a subsidiary of Toyota, TRI is fueled by a diverse and inclusive community of people who bring invaluable leadership, experience, and ideas from industry-leading companies. Over half of our technical team holds a PhD. We’re continually searching for the world’s best talent: people who are ready to define the new world of mobility with us!

We strive to build a company that helps our people thrive, achieve work-life balance, and bring their best selves to work. At TRI, you will have the opportunity to enjoy the best of both worlds ‒ a fun start-up environment with brilliant people who enjoy solving tough problems and the financial backing to successfully achieve our goals. Come work with TRI if you’re interested in transforming mobility through designing safer cars, enabling the elderly to age in place, or designing alternative fuel sources. Start your impossible with us.

The role of Senior Data Engineer for Machine Learning (ML) Infrastructure sits at an exciting intersection of large-scale deep learning and world-scale data processing.

ML: As a member of the ML team, you work alongside top research scientists in the field. You are responsible for enabling cutting-edge deep learning to be applied to petabyte-scale (and beyond) volumes of sensor data (including video, LIDAR, and radar) coming from our cars, robots, and other data collection platforms. You interact closely with our cloud data team to design and deploy large-scale distributed infrastructure for rapid experimentation, training, and inference. You are passionate about applying cutting-edge machine learning to real-world problems in autonomous driving and robotics, and about building the frameworks and tools required to do so.

Data: As much code as runs inside an autonomous car or robot, even more runs in the cloud. Services and pipelines that process data are what our machine learning, mapping, robotics, and simulation teams build to implement their own initiatives. The cloud data team is responsible for designing and implementing the set of services, libraries, tools, and dashboards that make this possible. We think about scale (“consume petabytes of driving data”), governance (“explain through data that our car did the right thing”), and cross-platform execution (“deploy an image-processing service in AWS or inside a robot”). We are looking for engineers who can make this possible.


Responsibilities:

    • Maintain and continuously improve large-scale iterative labeling, experimentation, training, and deployment pipelines for modern deep learning on cameras, LIDARs, radars, and other sensors.
    • Collaborate with other software engineers and research scientists to develop high-performance frameworks and tools for deploying and managing services and data pipelines from cloud storage to GPUs.
    • Communicate, scope, and design new features to meet the needs of clients inside and outside of TRI.
    • Develop and integrate labeling tools, and work with teams to provide ground truth in support of machine learning and simulation.
    • Live and breathe the software practices that produce maintainable code, including automated testing, continuous integration, code style conformity, and code review.


Qualifications:

    • Bachelor's degree in Computer Science or equivalent.
    • 4+ years of experience.
    • Strong communication skills. Team player. Good listener.
    • Strong Python skills (including SciPy stack).
    • Experience with C++ is a plus.
    • Experience integrating with cloud APIs, especially AWS.
    • Experience with high-performance computing, GPUs, and performance optimization.
    • Strong ability to write unit-testable code.
    • Experience with data stores and related technologies for ingesting, indexing, and analyzing large amounts of time-series and video data: S3, Parquet, Alluxio, big data filesystems, the Cloudera stack, etc.
    • Experience with relational or NoSQL systems and integrations across different data stores.
    • Experience integrating with CI tools programmatically, especially Jenkins.
    • Experience with Docker, registries, and container deployment services (e.g., AWS ECS, Kubernetes).
    • Experience with related tools and processes: Git, continuous integration, and code reviews.
    • Experience with data transformation tools such as OpenCV and Pandas.

Bonus Qualifications:

    • Experience working with machine learning, especially computer vision and deep learning, is a very big plus.
    • Experience building and growing image and video labeling pipelines.
    • Experience with software development on top of deep learning frameworks such as PyTorch (preferred), MXNet, and TensorFlow.
    • Experience with big data pipelines and with pipeline and orchestration frameworks such as Spark, Airflow, and Kafka is a plus.

TRI provides Equal Employment Opportunity without regard to the applicant's race, color, creed, gender, gender identity or expression, sexual orientation, national origin, age, physical or mental disability, medical condition, religion, marital status, genetic information, veteran status, or any other status protected under federal, state, or local laws.