Publications
An updated list of all publications can be found on my Google Scholar profile.
Preprints

SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
Q. Chen, J. Yu, M. Schwager, P. Abbeel, F. Shentu, P. Wu
arXiv | website | code
TL;DR: SARM is a stage-aware, video-based reward modeling framework that enables scalable and robust imitation learning for long-horizon tasks by deriving progress signals from natural language annotations, dramatically improving policy performance over standard behavior cloning.

GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics
Q. Chen, N. Gao, S. Huang, J. Low, T. Chen, J. Sun, M. Schwager
arXiv | website | code
TL;DR: GRaD-Nav++ is a lightweight, fully onboard Vision-Language-Action framework that enables drones to follow natural language commands in real time using DiffRL training in a 3DGS simulator, achieving strong generalization across tasks and environments both in simulation and on real hardware.

Dojo: A Differentiable Physics Engine for Robotics
T. Howell, S. Cleac'h, J. Brüdigam, Q. Chen, J. Sun, Z. Kolter, M. Schwager, Z. Manchester
arXiv | website | code
TL;DR: Dojo is a differentiable physics engine that solves contact dynamics with a custom interior-point method, offering stable simulation and smooth gradients for robotics tasks such as trajectory optimization, policy learning, and system identification.
Journal Articles

[RA-L 2023] Simultaneous Spatial and Temporal Assignment for Fast UAV Trajectory Optimization using Bilevel Optimization
Q. Chen, S. Cheng, N. Hovakimyan
IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3860–3867, 2023
link | arXiv
TL;DR: Proposes a bilevel optimization framework for jointly assigning UAV waypoints in space and time to achieve efficient navigation through constrained environments.
Conferences

[CoRL 2025] ARCH: Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly
J. Sun, A. Curtis, Y. You, Y. Xu, M. Koehle, Q. Chen, S. Huang, L. Guibas, S. Chitta, M. Schwager, H. Li
arXiv | website | code
TL;DR: ARCH is a hierarchical modular framework that combines imitation learning and reinforcement learning primitives with a high-level policy to enable data-efficient, high-precision, and generalizable long-horizon robotic assembly.

[CoRL 2025] ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation
S. Huang, Q. Chen, X. Zhang, J. Sun, M. Schwager
arXiv | website | code
TL;DR: A state-of-the-art 3D world model trained directly from point clouds, which enables accurate dynamics prediction across multi-object, multi-material scenarios and empowers model-based visuomotor control in robotic manipulation tasks.

[IROS 2025] GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics
Q. Chen, J. Sun, N. Gao, J. Low, T. Chen, M. Schwager
arXiv | website | code
TL;DR: We propose a vision-based drone navigation framework that leverages differentiable dynamics and Gaussian radiance fields for sample-efficient learning and robust generalization.

[IROS 2025] Autotuning Bipedal Locomotion MPC with GRFM-Net for Efficient Sim-to-Real Transfer
Q. Chen, J. Li, S. Cheng, N. Hovakimyan, Q. Nguyen
arXiv | website
TL;DR: This work introduces GRFM-Net for modeling bipedal robot actuation and proposes an MPC autotuning pipeline that enables robust sim-to-real locomotion transfer.
