Research Scientist, Latent State Inference for World Models

Los Altos, CA

Human Interactive Driving / Full-time / Hybrid

At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we’ve built a world-class team in Automated Driving, Energy & Materials, Human-Centered AI, Human Interactive Driving, Large Behavior Models, and Robotics.

Within the Human Interactive Driving division, the Extreme Performance Intelligent Control department is working to develop scalable, human-like driving intelligence by learning from expert human drivers. This project focuses on creating a configurable, data-driven world model that serves as a foundation for intelligent, multi-agent reasoning in dynamic driving environments. By tightly integrating advances in perception, world modeling, and model-based reinforcement learning, we aim to overcome the limitations of more compartmentalized, rule-based approaches. The end goal is to enable robust, adaptable, and interpretable driving policies that generalize across tasks, sensor modalities, and public road scenarios—delivering ground-breaking improvements for ADAS, autonomous systems, and simulation-driven software development.

We are seeking a forward-thinking Research Scientist to focus on inferring latent state representations from sensor data, powering world models, and supporting rigorous policy evaluation for autonomous vehicles. This role spans raw perception and structured representations, enabling both high-fidelity predictive modeling and reliable policy assessment in simulated or learned environments.

You will work closely with researchers developing world models and those focused on policy evaluation, ensuring that the latent states inferred from real-world sensors are semantically rich, temporally coherent, and suitable for both long-horizon prediction and counterfactual analysis.

Responsibilities

Design and train learning-based systems that transform raw multimodal sensor data (e.g., images, lidar, radar) into compact, dynamic latent states suitable for use in learned world models.
Investigate unsupervised, self-supervised, and contrastive methods to learn latent spaces that encode dynamics, semantics, and uncertainty.
Incorporate temporal information and motion consistency into latent state estimation using recurrent, filtering, or transformer-based architectures.
Combine data from heterogeneous modalities into unified latent state representations that generalize across conditions and scenarios.
Ensure the learned representations are resilient to occlusion, sensor degradation, and distributional shift.
Collaborate on joint research agendas with world modeling and policy evaluation researchers to explore uncertainty modeling, interpretability, and representation bottlenecks.
Publish novel research, contribute to open-source tools, and engage with the academic community at major ML and robotics conferences.

Qualifications

PhD in Computer Science, Machine Learning, Robotics, or a related field.
Strong foundation in representation learning or state estimation for sequential decision-making.
Robust experience in deep generative models (e.g., VAEs, diffusion models, autoregressive models).
Solid grounding in perception models built on large-scale real-world sensor datasets from autonomous driving, robotics, or similar domains.
Experience with latent world models, generative AI for perception, or contrastive learning.
Familiarity with structure-from-motion, Gaussian splatting, or neural radiance fields (NeRFs).
Experience with multi-modal sensor fusion, state estimation, and SLAM techniques.
Familiarity with uncertainty-aware perception, active perception, and predictive modeling.
Accomplished publication record at top-tier conferences such as NeurIPS, CVPR, ICCV, ICLR, ICRA, CoRL, or RSS.
Strong programming skills in Python and deep learning frameworks such as PyTorch or JAX.
Excellent problem-solving skills and the ability to work in a fast-paced team research environment.

Bonus Qualifications

Background building or using world models in model-based RL, planning, or simulation.
Familiarity with latent-space rollouts, policy evaluation metrics, or offline RL tools.
Experience working in high-dimensional, real-time environments with latency constraints.

When submitting your CV for this position, please include a brief cover letter and a link to your Google Scholar profile with a full list of publications.

The pay range for this position at commencement of employment is expected to be between $176,000 and $264,000/year for California-based roles; however, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. Note that TRI offers a generous benefits package (including 401(k) eligibility and various paid time off benefits, such as vacation, sick time, and parental leave) and an annual cash bonus structure. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.

Please refer to the Candidate Privacy Notice for the categories of personal information that we collect from individuals who inquire about and/or apply to work for Toyota Research Institute, Inc. or its subsidiaries, including Toyota A.I. Ventures GP, L.P., and the purposes for which we use such personal information.

TRI is fueled by a diverse and inclusive community of people with unique backgrounds, education and life experiences. We are dedicated to fostering an innovative and collaborative environment by living the values that are an essential part of our culture. We believe diversity makes us stronger and are proud to provide Equal Employment Opportunity for all, without regard to an applicant’s race, color, creed, gender, gender identity or expression, sexual orientation, national origin, age, physical or mental disability, medical condition, religion, marital status, genetic information, veteran status, or any other status protected under federal, state or local laws.

It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability. Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records for employment.
