How to Autonomously Navigate with PX4

Written By James Strawson

GPS-denied Navigation is Key to Navigating Indoors

Do you remember when GPS was a new and exciting technology that everyone wanted to get their hands on? It was bulky and expensive but changed the way we navigated. Now, GPS has become the standard navigation feature for cars, cell phones and wearable devices.

Although the positioning and localization features of GPS are sufficient and helpful in everyday use, GPS isn’t the most accurate or reliable navigation system for drones. GPS relies on clear, unobstructed signals from satellites to calculate a position. As the list of drone use cases grows, so does the range of environments drones operate in. Some drones, such as those used for search and rescue, operate in GPS-denied areas, where the satellite signals are too weak or obstructed for a receiver to compute a position. GPS signals can also be maliciously jammed or manipulated. For these reasons, drones may also use Visual Inertial Odometry (VIO) to navigate.

In instances where a drone can’t use GPS to navigate, VIO can help the drone understand where it is relative to its environment. VIO uses visual and inertial data to determine odometry, or how far the drone has travelled. What makes VIO different from the odometer in your car is that it measures motion in 3D instead of a single distance travelled. This allows the drone to know its relative position in 3D, and therefore to navigate precisely in 3D. VIO is the building block for many drone functions, including obstacle avoidance, position control, and disturbance rejection.

Keep reading to learn how to connect VIO to PX4 and about the different computer vision features your drone can achieve with VIO.

Send the right MAVLink Messages to PX4 for VIO

At ModalAI®, we created a powerful yet lightweight software architecture designed to run on our PX4 VOXL companion computer and Flight Core flight controller. To get VIO to work, a companion computer must take VIO data from an onboard algorithm and pass it to the PX4 Autopilot as MAVLink packets in real time. The data must be rotated and translated into the correct frame, packed into the appropriate MAVLink packet, and timestamped correctly. There are two primary MAVLink packets that can be used to accomplish this task.


MAVLink Option #1 Vision Position Estimate (#102)

This is a simple packet that carries only the most fundamental position and orientation data to PX4. However, a good VIO algorithm will have more information about the system state, such as the body velocity and angular rates, which PX4’s state estimator (EKF2) uses for even more precise position control and accurate flight.

To send this additional data, you must use MAVLink packet #331 “Odometry”, which is what we use in voxl-vision-px4 out of the box as part of the VOXL® software suite.

MAVLink Option #2 Odometry (#331)
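To make the difference between the two packets concrete, here is a minimal sketch of the data an Odometry packet carries. The field names and frame constants follow the MAVLink common message set as we understand it; the OdometryFields class and make_odometry helper are hypothetical names used only for illustration, and nothing here actually serializes or sends a packet.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

MAVLINK_MSG_ID_ODOMETRY = 331
MAV_FRAME_BODY_FRD = 12   # body frame: X forward, Y right, Z down
MAV_FRAME_LOCAL_FRD = 20  # local frame: FRD axes, origin where the drone started


@dataclass(frozen=True)
class OdometryFields:
    """Illustrative subset of the Odometry fields (covariances omitted)."""
    time_usec: int        # timestamp of the measurement, microseconds
    frame_id: int         # frame the pose is expressed in
    child_frame_id: int   # frame the velocities are expressed in
    x: float              # position, metres
    y: float
    z: float
    q: Tuple[float, float, float, float]  # attitude quaternion, (w, x, y, z)
    vx: float             # linear velocity, m/s
    vy: float
    vz: float
    rollspeed: float      # angular rates, rad/s
    pitchspeed: float
    yawspeed: float


def make_odometry(timestamp_ns: int,
                  position: Sequence[float],
                  attitude_q: Sequence[float],
                  velocity_body: Sequence[float],
                  rates_body: Sequence[float]) -> OdometryFields:
    """Package already-transformed VIO output into Odometry-style fields."""
    norm = sum(c * c for c in attitude_q) ** 0.5
    if abs(norm - 1.0) > 1e-3:
        raise ValueError("attitude quaternion must be unit length")
    q = tuple(c / norm for c in attitude_q)
    return OdometryFields(
        time_usec=timestamp_ns // 1000,
        frame_id=MAV_FRAME_LOCAL_FRD,
        child_frame_id=MAV_FRAME_BODY_FRD,
        x=position[0], y=position[1], z=position[2],
        q=q,
        vx=velocity_body[0], vy=velocity_body[1], vz=velocity_body[2],
        rollspeed=rates_body[0], pitchspeed=rates_body[1], yawspeed=rates_body[2],
    )
```

The position and attitude alone correspond roughly to what Option #1 can carry; the velocity and angular rate fields are the extra state that Option #2 adds.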

Frames of Reference: Odometry

Whichever packet is used to send odometry data to PX4, it’s critical to ensure the data is represented in the correct reference frames. Generally, PX4 expects VIO data as a translation and rotation that describe the position and orientation of the drone’s body (body frame) with respect to a local frame, which is centered on the ground where the drone is when it starts up. Each of these frames follows the convention of the X, Y, and Z axes aligning with the forward, right, and down directions relative to the drone.

This becomes slightly more complicated when adding velocity and angular rate data, since the frame of reference for these derivatives is the body frame, not the local frame. This means that the X, Y, and Z position of the drone is in the local frame relative to where it started out, whereas VX, VY, and VZ are the velocity of the drone relative to where the drone is pointing at the current time.
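As a small illustration of the local-versus-body distinction, the sketch below rotates a velocity expressed in the local frame into the body frame using the drone’s attitude quaternion. It assumes the quaternion is ordered (w, x, y, z) and describes the body frame with respect to the local frame; the function names are our own for this example.

```python
import math


def quat_to_matrix(q):
    """Rotation matrix for a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def local_to_body(v_local, q_body_wrt_local):
    """Express a local-frame vector in the body frame.

    The matrix built from q_body_wrt_local maps body-frame vectors into the
    local frame, so its transpose (which is its inverse) maps local to body.
    """
    r = quat_to_matrix(q_body_wrt_local)
    return tuple(
        sum(r[row][col] * v_local[row] for row in range(3))
        for col in range(3)
    )


# Drone yawed 90 degrees so that its nose points along local +Y.
yaw = math.radians(90.0)
q = (math.cos(yaw / 2), 0.0, 0.0, math.sin(yaw / 2))
v_body = local_to_body((0.0, 1.0, 0.0), q)  # moving along local +Y
```

In this example the drone is travelling along the local Y axis, but because its nose also points that way, the body-frame velocity comes out as pure forward motion along body X.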

Neither of these frames of reference exactly correspond to the information that comes straight out of a VIO algorithm since the algorithm only has knowledge of the IMU and camera, not the drone’s body or the ground. To help reconcile this, it’s helpful to draw out the following diagram representing all the frames of reference used for visual odometry within voxl-vision-px4.

Each green box represents a 3D coordinate reference frame. For example, the body reference frame adheres to the aerospace convention, with a coordinate system centered at the center of mass and aligned such that X points forward, Y points to the right, and Z points down relative to the drone. Each blue box represents a transformation and can include translations or rotations. The vast majority of VIO algorithms give you the pose of the IMU with respect to the VIO frame (imu_wrt_vio), meaning the position of the IMU relative to where the IMU was when it booted up. Because the IMU can be in any orientation at that moment, the VIO frame will not necessarily line up with gravity or with the aerospace forward, right, and down convention. It is therefore necessary to perform a rotation, vio_wrt_vioga, to correct for the angle of the gravity vector inside VIO’s frame.
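To give a feel for what a gravity-alignment rotation does, here is a minimal sketch that builds the rotation taking a measured gravity direction onto the Z-down axis. This is not the voxl-vision-px4 implementation; the function name gravity_alignment and the use of Rodrigues’ formula are our own assumptions for illustration.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def mat_vec(m, v):
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def gravity_alignment(gravity_in_vio):
    """Rotation matrix R such that R * gravity_direction == (0, 0, 1).

    gravity_in_vio is the gravity vector as seen in the VIO frame,
    for example from averaged accelerometer data at startup.
    """
    norm = math.sqrt(sum(c * c for c in gravity_in_vio))
    if norm == 0.0:
        raise ValueError("gravity vector must be non-zero")
    g = tuple(c / norm for c in gravity_in_vio)
    down = (0.0, 0.0, 1.0)
    cos_angle = g[2]  # dot product of g with the Z-down axis
    if cos_angle < -1.0 + 1e-9:
        # Gravity points exactly "up": flip 180 degrees about the X axis.
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    v = cross(g, down)
    k = [[0.0, -v[2], v[1]],
         [v[2], 0.0, -v[0]],
         [-v[1], v[0], 0.0]]
    k2 = mat_mul(k, k)
    scale = 1.0 / (1.0 + cos_angle)
    # Rodrigues' formula in the form R = I + K + K^2 / (1 + cos(angle)).
    return [[(1.0 if r == c else 0.0) + k[r][c] + scale * k2[r][c]
             for c in range(3)] for r in range(3)]
```

Applying the resulting matrix to poses reported in the VIO frame levels them out, so that “down” in the corrected frame agrees with gravity regardless of how the IMU was tilted at boot.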

The reference frame and transform names in the diagram exactly match the variable naming convention in voxl-vision-px4. To see how we perform these transforms, feel free to review our open source code here.

Create a visual landmark using AprilTags for PX4

VIO, or the ability to localize oneself in an environment, is the fundamental feature that makes further capabilities possible. Let’s look at the AprilTag relocalization and collision prevention capabilities of voxl-vision-px4 which leverage the established VIO feature.

How to do AprilTag Relocalization

AprilTag relocalization can compensate for some of the shortcomings of VIO. It allows a drone to be aware of its position in the world in a repeatable way between reboots, as opposed to simply knowing how far it has travelled since it booted up. It accomplishes this by using its camera to recognize AprilTags placed at known, pre-configured, fixed locations. We therefore refer to this new frame of reference as “fixed frame”, since it is fixed to the world instead of changing on each startup.


In this video, you can see that the drone starts up crooked relative to the AprilTag fixed frame and flies aligned with its local frame until it sees the AprilTag and can correct back to fixed frame flight. This correction is achieved by calculating the translation and rotation between the local frame and the fixed frame.
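The correction can be sketched in a simplified planar form, where every pose is just (x, y, yaw). Given where a tag is known to be in fixed frame and where the drone currently sees it in local frame, the local-to-fixed transform falls out of one composition. The helper names below are hypothetical, and the real system works with full 3D transforms rather than this 2D simplification.

```python
import math


def wrap(angle):
    """Wrap an angle to the range (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def compose(a, b):
    """Pose b (expressed in frame a) re-expressed in a's parent frame."""
    ax, ay, ayaw = a
    bx, by, byaw = b
    c, s = math.cos(ayaw), math.sin(ayaw)
    return (ax + c * bx - s * by, ay + s * bx + c * by, wrap(ayaw + byaw))


def inverse(a):
    """Inverse of a planar pose."""
    ax, ay, ayaw = a
    c, s = math.cos(ayaw), math.sin(ayaw)
    return (-(c * ax + s * ay), s * ax - c * ay, wrap(-ayaw))


def local_wrt_fixed(tag_wrt_fixed, tag_wrt_local):
    """Where the local frame sits in fixed frame, from one tag sighting."""
    return compose(tag_wrt_fixed, inverse(tag_wrt_local))


def to_fixed(local_in_fixed, pose_in_local):
    """Convert any local-frame pose (e.g. the drone's) into fixed frame."""
    return compose(local_in_fixed, pose_in_local)
```

For example, if the configured tag pose is (5, 1, 0) in fixed frame and the drone observes that tag at (0, -3, -90 degrees) in its local frame, local_wrt_fixed returns approximately (2, 1, 90 degrees): the drone booted at x = 2, y = 1 in fixed frame, rotated a quarter turn.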

Where the fixed frame is located and how it is oriented is entirely up to the user, who defines it by placing one or more AprilTags in known locations. VOXL must then be informed of the location of these AprilTags in its configuration file. A good example would be flying accurately up and down warehouse aisles, with an AprilTag at the end of each aisle to correct for any odometry drift incurred during a pass from one end of the aisle to the other.

To learn more about how to achieve AprilTag relocalization on VOXL and how to configure AprilTag locations in fixed frame, follow our publicly available documentation here.

How to do PX4 Collision Prevention

The final computer vision technique you can play with is the PX4 collision prevention feature. This feature of PX4 will prevent you from colliding with detected obstacles when flying in position mode with a radio controller. 

PX4 must still be provided with obstacle data to enable this functionality. On the VOXL platform, we support this out of the box by using forward-facing stereo cameras on-board your drone to calculate a depth map and identify obstacles.

The data that PX4 requires is an obstacle distance MAVLink packet (msg_id 330). The data structure resembles the output of a spinning lidar sensor and lets you send up to 72 different distances spanning around the drone in the horizontal plane.

Our stereo cameras provide a wide 65-degree horizontal field of view and generate data in 3D. To pass this to PX4, which expects two-dimensional data in only the horizontal plane, we have to do more coordinate transformations, like we did for VIO and AprilTag relocalization.

voxl-vision-px4 maintains a short buffer of odometry history so that when new stereo data comes in, it can go back in time to see where the drone was and how it was oriented when the stereo frames were captured. Next, it rotates and translates the 3D obstacle points from that point in time into a leveled-out frame relative to how the drone is oriented at the present time. Obstacles above and below the drone are discarded, and the remaining points are binned into the required MAVLink packet before being sent to PX4.
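The final discard-and-bin step can be sketched as follows. This is an illustrative example rather than the voxl-vision-px4 code: the function name, the height and range thresholds, and the choice to centre bin i on i × 5 degrees clockwise from forward are all assumptions. The special values (65535 for unknown, max distance plus one for no obstacle) follow our reading of the MAVLink obstacle distance message.

```python
import math

NUM_BINS = 72
BIN_WIDTH_DEG = 360.0 / NUM_BINS  # 5 degrees per bin
UINT16_MAX = 65535                # "unknown / not used"


def bin_obstacles(points, max_height_m=0.5, min_dist_m=0.3,
                  max_dist_m=20.0, fov_deg=65.0):
    """Bin 3D points into 72 horizontal distances in centimetres.

    points: iterable of (x, y, z) in metres, in a leveled frame with
    X forward, Y right, Z down, centred on the drone.
    """
    max_cm = int(max_dist_m * 100)
    distances = [UINT16_MAX] * NUM_BINS

    # Bins the camera can actually see start as "no obstacle present".
    for i in range(NUM_BINS):
        signed = ((i * BIN_WIDTH_DEG + 180.0) % 360.0) - 180.0
        if abs(signed) <= fov_deg / 2.0:
            distances[i] = max_cm + 1

    for x, y, z in points:
        if abs(z) > max_height_m:
            continue  # above or below the drone: discard
        dist = math.hypot(x, y)
        if dist < min_dist_m or dist > max_dist_m:
            continue
        angle = math.degrees(math.atan2(y, x)) % 360.0  # clockwise from forward
        idx = int(round(angle / BIN_WIDTH_DEG)) % NUM_BINS
        dist_cm = int(dist * 100)
        if dist_cm < distances[idx]:
            distances[idx] = dist_cm  # keep the closest obstacle per bin
    return distances
```

Bins outside the camera’s field of view stay marked as unknown, so PX4 is not told that directions the cameras never observed are clear.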

For more information on how to configure and test this feature, follow the publicly available documentation here.

As we have seen, a large portion of robotics programming is maintaining transforms and rotations between different reference frames. The following diagram represents all of the reference frames and transforms used by voxl-vision-px4 to facilitate the three computer vision capabilities described here.


Resources to Get You Started:

VIO is a great tool drones can use when navigating in GPS-denied locations. At ModalAI, all of our products run on VOXL Vision PX4 and are pre-configured with VIO, making it easy for you to start your next project. 

Full Video Walkthrough

