Deep Learning for Autonomous Drone Navigation (RGB-D only) – How would you approach this?
Hi everyone,
I’m working on a university project and could really use some advice from people with more experience in autonomous navigation / RL / simulation.
Task:
I need to design a deep learning model that directly controls a drone (x, y, z, pitch, yaw — roll probably doesn’t make much sense here 😅). The drone should autonomously patrol and map indoor and outdoor environments.
Example use case:
A warehouse where the drone automatically flies through all aisles repeatedly, covering the full area with a minimal / near-optimal path, while avoiding obstacles.
Important constraints:
- The drone does not exist in real life
- Training and testing must be done in simulation
- Using existing datasets (e.g. ScanNet) is allowed
- Only RGB-D data from the drone can be used for navigation (no external maps, no GPS, etc.)
My current idea / approach
I’m thinking about a staged approach:
- Procedural environments: generate simple rooms / mazes in Python (basic geometries) to get fast initial results and stable training (rough layout-generator sketch after this list).
- Fine-tuning on realistic data: fine-tune the model on something like ScanNet so it can handle complex indoor scenes (hanging lamps, cables, clutter, etc.); a frame-loader sketch is below.
- Policy learning: likely RL or imitation learning, where the model outputs control commands directly from RGB-D input; a minimal policy-network sketch is below.
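To make the procedural part concrete, this is roughly the kind of layout generator I have in mind. All names, sizes, and obstacle counts are placeholders, and the 2D grid would still have to be turned into meshes or simulator primitives before anything can fly in it:

```python
import numpy as np

def generate_room(width=40, depth=40, n_obstacles=15, rng=None):
    """Generate a 2D occupancy grid for a rectangular room.

    0 = free space, 1 = occupied (walls / box obstacles). The grid would
    still need to be extruded into 3D geometry before use in a simulator.
    """
    rng = np.random.default_rng() if rng is None else rng
    grid = np.zeros((depth, width), dtype=np.uint8)

    # Outer walls
    grid[0, :] = grid[-1, :] = 1
    grid[:, 0] = grid[:, -1] = 1

    # Random box obstacles (stand-ins for crates, shelves, pillars, ...)
    for _ in range(n_obstacles):
        h, w = rng.integers(2, 6, size=2)
        y = rng.integers(1, depth - h - 1)
        x = rng.integers(1, width - w - 1)
        grid[y:y + h, x:x + w] = 1

    return grid

# A seeded batch of layouts, so a training run can be regenerated exactly
layouts = [generate_room(rng=np.random.default_rng(seed)) for seed in range(8)]
```

Seeding each layout should make experiments reproducible and lets me scale the set of environments up or down without storing them.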
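For the ScanNet stage, I'm assuming the raw .sens files are first exported to per-frame color and depth images; the directory layout and filenames below are my own assumption, not an official ScanNet API:

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class RGBDFrames(Dataset):
    """Aligned RGB + depth frames loaded from plain image files.

    Assumes frames were already exported to scene_dir/color/<frame>.jpg and
    scene_dir/depth/<frame>.png (16-bit depth in millimetres); this layout
    is my own assumption about the preprocessing, not ScanNet's raw format.
    """

    def __init__(self, scene_dir, max_depth_m=10.0):
        self.color_paths = sorted(Path(scene_dir, "color").glob("*.jpg"))
        self.depth_dir = Path(scene_dir, "depth")
        self.max_depth_m = max_depth_m

    def __len__(self):
        return len(self.color_paths)

    def __getitem__(self, idx):
        color_path = self.color_paths[idx]
        color_img = Image.open(color_path).convert("RGB")
        depth_img = Image.open(self.depth_dir / (color_path.stem + ".png"))

        # Color and depth can differ in resolution; resize color to match depth.
        if color_img.size != depth_img.size:
            color_img = color_img.resize(depth_img.size, Image.BILINEAR)

        rgb = np.asarray(color_img, dtype=np.float32) / 255.0
        depth_m = np.asarray(depth_img, dtype=np.float32) / 1000.0
        depth = np.clip(depth_m, 0.0, self.max_depth_m) / self.max_depth_m

        # 4-channel tensor: R, G, B, normalized depth (C x H x W)
        rgbd = np.dstack([rgb, depth])
        return torch.from_numpy(rgbd).permute(2, 0, 1).contiguous()
```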
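And for the policy itself, a minimal PyTorch sketch of the RGB-D → commands mapping. The architecture is just a placeholder that an RL or imitation-learning setup would wrap, not a finished design:

```python
import torch
import torch.nn as nn

class RGBDPolicy(nn.Module):
    """Tiny CNN policy: 4-channel RGB-D frame -> 5 continuous commands.

    The 5 outputs correspond to the controls mentioned above
    (x, y, z, pitch, yaw), normalized to [-1, 1] by the final Tanh.
    """

    def __init__(self, n_actions=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, n_actions), nn.Tanh(),
        )

    def forward(self, rgbd):
        return self.head(self.encoder(rgbd))

# One forward pass on a dummy 128x128 RGB-D frame
policy = RGBDPolicy()
actions = policy(torch.zeros(1, 4, 128, 128))  # shape: (1, 5)
```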
One thing I’m unsure about:
In simulation you can’t model everything (e.g. a bird flying into the drone). How is this usually handled? Just ignore rare edge cases and focus on static / semi-static obstacles?
Simulation tools – what should I use?
This is where I’m most confused right now:
- AirSim – seems discontinued
- Colosseum (AirSim successor) – heard there are stability / maintenance issues
  - Pros: great graphics, RGB-D + LiDAR support
- Gazebo + PX4
  - Unsure about RGB-D data quality and availability
  - Graphics seem quite poor → not sure if that hurts learning
- Pegasus Simulator
  - Looks promising, but I don't know if it fully supports what I need (RGB-D streams, flexible environments, DL training loop, etc.)
What I care most about:
- Real-time RGB-D camera access
- Decent visual realism
- Ability to easily generate multiple environments
- Reasonable integration with Python / PyTorch
Main questions
- How would you structure the learning problem? (Exploration vs. patrolling, reward design, intermediate representations, etc.)
- What would you train the model on exactly? Do I need to create several TB of Unreal scenes for training? How to validate my model(s) properly?
- Which simulator would you recommend in 2025/2026 for this kind of project?
- Do I need ROS/ROS2?
Any insights or “don’t do this” advice would be massively appreciated 🙏
Thanks in advance!