GRaD-Nav: Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics

Stanford University

arXiv · Code (coming soon)


Abstract

Autonomous visual navigation is an essential element of robot autonomy. Reinforcement learning (RL) offers a promising policy training paradigm. However, existing RL methods suffer from high sample complexity, poor sim-to-real transfer, and limited runtime adaptability to navigation scenarios not seen during training. These problems are particularly challenging for drones, with complex nonlinear and unstable dynamics and strong dynamic coupling between control and perception. In this paper, we propose a novel framework that integrates 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning (DDRL) to train vision-based drone navigation policies. By leveraging high-fidelity 3D scene representations and differentiable simulation, our method improves sample efficiency and sim-to-real transfer. Additionally, we incorporate a Context-aided Estimator Network (CENet) to adapt to environmental variations at runtime. Moreover, by curriculum training in a mixture of different surrounding environments, we achieve in-task generalization, i.e., the ability to solve new instances of a task not seen during training. Drone hardware experiments demonstrate our method's high training efficiency compared to state-of-the-art RL methods, zero-shot sim-to-real transfer for real robot deployment without fine-tuning, and ability to adapt to new instances within the same task class (e.g., to fly through a gate at different locations with different distractors in the environment).
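
As a rough illustration of the differentiable-dynamics idea, the sketch below backpropagates an accumulated cost through a toy point-mass drone model so the policy receives first-order gradients directly from the rollout. This is a minimal sketch, not the paper's simulator or reward; the dynamics, network sizes, goal, and constants are all illustrative assumptions.

# Minimal sketch of policy training through a differentiable simulator.
# The toy double-integrator "drone" below stands in for the paper's full
# differentiable dynamics model; all names and constants are illustrative.
import torch
import torch.nn as nn

DT = 0.02                              # integration step [s] (assumed)
HORIZON = 50                           # rollout length (assumed)
GOAL = torch.tensor([2.0, 1.0, 1.5])   # hypothetical waypoint

policy = nn.Sequential(nn.Linear(6, 64), nn.Tanh(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def step(pos, vel, accel_cmd):
    # Toy differentiable dynamics: double integrator with linear drag.
    vel = vel + (accel_cmd - 0.1 * vel) * DT
    pos = pos + vel * DT
    return pos, vel

for it in range(200):
    pos, vel = torch.zeros(3), torch.zeros(3)
    loss = torch.tensor(0.0)
    for t in range(HORIZON):
        obs = torch.cat([pos, vel])
        accel_cmd = policy(obs)
        pos, vel = step(pos, vel, accel_cmd)
        # Negative reward: squared distance to the goal plus a small control penalty.
        loss = loss + (pos - GOAL).pow(2).sum() + 1e-3 * accel_cmd.pow(2).sum()
    optimizer.zero_grad()
    loss.backward()                    # gradients flow through the entire rollout
    optimizer.step()

Because the dynamics, cost, and policy are all differentiable, the gradient of the rollout cost with respect to the policy parameters is computed exactly by backpropagation rather than estimated from samples, which is the mechanism behind the sample-efficiency gains described above.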


Overview of GRaD-Nav


Our GRaD-Nav architecture integrates a visual + dynamics context encoder (CENet) into an Actor-Critic framework, trained end-to-end using a differentiable drone dynamics model and a 3D Gaussian Splatting scene representation that provides photo-realistic visuals at training time. The policy transfers zero-shot to drone hardware and adapts to new navigation task instances at runtime.
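
The sketch below shows one plausible way such a context-conditioned actor-critic could be wired up; the layer sizes, input dimensions, and interfaces are assumptions for illustration, not the paper's exact design.

# Hedged architecture sketch (dimensions and interfaces are assumptions):
# a CENet-style context encoder compresses rendered image features and a short
# history of dynamics states into a latent context that conditions both the
# actor and the critic.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, img_feat_dim=64, dyn_hist_dim=40, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_feat_dim + dyn_hist_dim, 128), nn.ELU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, img_feat, dyn_hist):
        return self.net(torch.cat([img_feat, dyn_hist], dim=-1))

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=18, latent_dim=16, act_dim=4):
        super().__init__()
        self.encoder = ContextEncoder(latent_dim=latent_dim)
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 128), nn.ELU(),
            nn.Linear(128, act_dim), nn.Tanh(),   # normalized thrust/body-rate commands (assumed)
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 128), nn.ELU(),
            nn.Linear(128, 1),
        )

    def forward(self, obs, img_feat, dyn_hist):
        z = self.encoder(img_feat, dyn_hist)      # latent environment context
        x = torch.cat([obs, z], dim=-1)
        return self.actor(x), self.critic(x)

# Example forward pass with dummy batch data.
model = ActorCritic()
action, value = model(torch.randn(8, 18), torch.randn(8, 64), torch.randn(8, 40))

Because the latent context is re-estimated from recent observations at every step, a policy conditioned this way can adapt to environment variations at runtime without fine-tuning, which is the role CENet plays in the pipeline described above.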


Results of GRaD-Nav

Sample Efficiency


Benchmark comparing sample efficiency and wall-clock time of different algorithms for training a vision-based end-to-end drone navigation policy. Compared with current RL algorithms, our method achieves over 300% sample efficiency and delivers a better policy in only 20% of the training time.

Long-episode navigation results


Example successful trajectories achieved by the proposed method in hybrid simulation environments. Left: “middle gate”; right: “right gate”.

Task-level generalization ability


A single generalizable policy flies through gates at different positions and with different distractor objects.

Hardware experiments


Robot hardware experiments: the drone flies through the middle gate.


Citations

If you find our work useful in your research, please consider citing:


@misc{chen2025gradnavefficientlylearningvisual,
      title={GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics}, 
      author={Qianzhong Chen and Jiankai Sun and Naixiang Gao and JunEn Low and Timothy Chen and Mac Schwager},
      year={2025},
      eprint={2503.03984},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2503.03984}, 
    }