Run Five Simultaneous Neural Networks on VOXL 2 with TensorFlow Lite

Written by Matt Turi

Like the sentinels who keep watch at Buckingham Palace, the VOXL 2 Sentinel development drone is built for perception: it carries six onboard image sensors for wide-coverage surveillance. Powered by VOXL 2, the Sentinel can run five neural networks concurrently with TensorFlow Lite, Google's open-source framework for running pre-trained machine learning models, including computer vision models, on embedded and mobile devices.

VOXL 2 Unlocks a Dedicated Neural Processing Unit for Computer Vision

With a low-power Neural Processing Unit (NPU) embedded in the Qualcomm QRB5165 and ModalAI's voxl-tflite-server running onboard, VOXL 2 can run five different neural networks simultaneously at 30 frames per second, out of the box. Instead of running neural networks solely on the central processing unit (CPU), VOXL 2 uses TensorFlow Lite's built-in NNAPI support to run networks in parallel on the dedicated NPU and the graphics processing unit (GPU), sustaining 30 Hz of neural network inference. That frees VOXL 2's powerful CPU for the rest of an autonomous robotics stack.
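
To make the 30 Hz figure concrete: at 30 Hz, each frame leaves roughly 33.3 ms for inference. The minimal Python sketch below is our own illustration, not part of voxl-tflite-server. It takes per-backend latencies copied from the VOXL 2 benchmark table later in this post, picks the fastest backend for each model, and checks it against that budget alongside a CPU-only run. It treats each model in isolation and does not model contention when several networks share the NPU or GPU.

```python
# Illustrative only: average latencies (ms) copied from this post's VOXL 2
# benchmark table. This is not an API of voxl-tflite-server.
LATENCIES_MS = {
    "FastDepth": {"cpu": 37.34, "gpu": 18.00, "nnapi": 37.32},
    "DeepLab V3": {"cpu": 63.03, "gpu": 26.81, "nnapi": 61.77},
    "MobileNetV1-SSD": {"cpu": 19.56, "gpu": 21.35, "nnapi": 7.72},
}

TARGET_HZ = 30.0
FRAME_BUDGET_MS = 1000.0 / TARGET_HZ  # ~33.3 ms available per frame


def fastest_backend(latencies):
    """Return (backend, latency_ms) for the lowest-latency backend."""
    backend = min(latencies, key=latencies.get)
    return backend, latencies[backend]


def plan(models, budget_ms=FRAME_BUDGET_MS):
    """Map each model to (best_backend, latency_ms, fits_budget, cpu_fits)."""
    result = {}
    for name, latencies in models.items():
        backend, ms = fastest_backend(latencies)
        result[name] = (backend, ms, ms <= budget_ms,
                        latencies["cpu"] <= budget_ms)
    return result


if __name__ == "__main__":
    for name, (backend, ms, fits, cpu_fits) in plan(LATENCIES_MS).items():
        print(f"{name}: best={backend} ({ms:.2f} ms), "
              f"fits 30 Hz: {fits}, fits on CPU alone: {cpu_fits}")
```

With these numbers, FastDepth and DeepLab V3 fit the 33.3 ms budget only when offloaded to the GPU, which is the CPU-freeing effect described above.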

VOXL 2 Runs Five Concurrent Models Out of the Box with TensorFlow Lite

The VOXL SDK included with VOXL 2 is optimized for advanced computer vision. To accelerate time to market, voxl-tflite-server ships with five pre-trained neural networks that developers can run with TensorFlow Lite out of the box. Use cases for image-based deep learning keep growing, and VOXL 2 gives developers access to the visual data those use cases depend on. Attach a high-resolution 4K30 IMX214 or IMX412 image sensor to VOXL 2 to unlock these five computer vision models:

Object Detection: Identify known objects in your robot's field of view (FOV). Object detection combines localization and classification to describe both what each object is and where it is. The models we provide for this task are optimized for onboard inference and use either the SSD (Single Shot Detector) or YOLO (You Only Look Once) architecture to achieve low latency. Onboard a drone, this enables intelligent surveillance of a scene and can provide key information for the task at hand: object detection can find and track objects from the air, aid autonomous flight or exploration, or support more specific use cases like warehouse and asset inspection.
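
Detectors in the SSD and YOLO families typically produce many overlapping candidate boxes, which are then pruned with non-maximum suppression (NMS). This post does not describe voxl-tflite-server's post-processing, so the sketch below is a generic, standard-library illustration of greedy NMS. The box format (x_min, y_min, x_max, y_max) and the 0.5 score and 0.45 IoU thresholds are assumptions chosen for the example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        score_thresh: float = 0.5, iou_thresh: float = 0.45) -> List[int]:
    """Greedy NMS: keep the best box, drop boxes overlapping a kept one."""
    # Discard weak candidates, then visit the rest from highest score down.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 150)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # -> [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed as a duplicate detection of the same object.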

Image Classification: Identify the dominant object in your robot's FOV. Image classification assigns a label to the image as a whole based on its most important features, and can provide information similar to an object detector at a much faster speed. When the object's location within the image doesn't matter, classification models are the more efficient choice. VOXL 2 comes equipped with pre-trained image classification models covering more than 1000 categories.
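
A classifier's output is a vector of scores, one per category, and the prediction is the highest-scoring entry. The sketch below assumes the model emits raw float logits (many TensorFlow Lite classifiers instead output probabilities directly, or quantized uint8 scores), converts them with a softmax, and returns the top-k labels. The four-entry label list is hypothetical; a real model would use its own label file with 1000+ entries.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], labels: Sequence[str],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    labels = ["background", "person", "car", "tree"]  # hypothetical labels
    logits = [0.1, 2.5, 1.0, -0.5]                    # hypothetical output
    for label, p in top_k(logits, labels, k=2):
        print(f"{label}: {p:.3f}")
```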

Depth Estimation: Build depth maps with VOXL 2 from monocular images. VOXL 2 can infer the distance between its high-resolution 4K30 image sensor and objects in its field of view. Monocular depth estimation works by predicting a depth value for each pixel from a single RGB input image. Depth estimation is a crucial capability for autonomous drones and ground robots because it lets them perceive their environment and navigate safely.
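
Once a depth map is available, a navigation stack typically asks simple questions of it, such as how close the nearest surface is in the region ahead. The sketch below assumes the network's output has been reshaped into a 2D grid of per-pixel depths in metres; that is an assumption, since some monocular models predict relative or inverse depth instead. The helper names are hypothetical.

```python
from typing import List, Tuple

# Assumption: depth output reshaped to [rows][cols] of metric depth (metres).
DepthMap = List[List[float]]


def nearest_in_roi(depth: DepthMap, top: int, left: int,
                   height: int, width: int) -> Tuple[float, Tuple[int, int]]:
    """Smallest depth value and its (row, col) inside a rectangular region."""
    best = (float("inf"), (-1, -1))
    for r in range(top, top + height):
        for c in range(left, left + width):
            if depth[r][c] < best[0]:
                best = (depth[r][c], (r, c))
    return best


def to_grayscale(depth: DepthMap) -> List[List[int]]:
    """Scale depth to 0-255 for display, with nearer pixels brighter."""
    flat = [d for row in depth for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat map
    return [[round(255 * (hi - d) / span) for d in row] for row in depth]


if __name__ == "__main__":
    depth = [[4.0, 3.5, 5.0],
             [2.0, 1.2, 3.0],
             [6.0, 2.5, 4.5]]
    print(nearest_in_roi(depth, 0, 0, 3, 3))  # (1.2, (1, 1))
    print(to_grayscale(depth))
```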

Pose Estimation: Identify the orientation and position of human targets. VOXL 2 can use human pose estimation to locate key points on a person's face, torso, arms, and legs; MoveNet SinglePose Lightning, the pose model in the benchmarks below, predicts 17 of them. Pose estimation lets developers track one or more people in real time and monitor or study their movements. This technique is useful in applications such as motion tracking for animation, AR/VR, sport or dance technique analysis, and security and surveillance.
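
For MoveNet SinglePose Lightning, the TensorFlow Lite output is a [1, 1, 17, 3] tensor: 17 key points, each given as (y, x, score) with coordinates normalized to [0, 1]. The sketch below converts those rows into pixel coordinates and drops low-confidence points. The 0.3 threshold and the function name are illustrative choices, not values taken from voxl-tflite-server.

```python
from typing import Dict, Sequence, Tuple

# MoveNet's 17 key points, in output order.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def parse_keypoints(rows: Sequence[Sequence[float]], img_w: int, img_h: int,
                    min_score: float = 0.3) -> Dict[str, Tuple[int, int, float]]:
    """Map 17 (y, x, score) rows with normalized coordinates to pixel
    (x, y, score) tuples, skipping points below min_score.

    `rows` is output[0][0] of the [1, 1, 17, 3] output tensor.
    """
    points: Dict[str, Tuple[int, int, float]] = {}
    for name, (y, x, score) in zip(KEYPOINT_NAMES, rows):
        if score >= min_score:
            points[name] = (round(x * img_w), round(y * img_h), score)
    return points


if __name__ == "__main__":
    rows = [[0.0, 0.0, 0.0] for _ in KEYPOINT_NAMES]  # placeholder output
    rows[0] = [0.25, 0.5, 0.9]   # confident nose detection
    rows[1] = [0.2, 0.45, 0.1]   # low-confidence left eye, dropped
    print(parse_keypoints(rows, img_w=640, img_h=480))
```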

Image Segmentation: Understand which pixels belong to which objects in your robot's FOV. Image segmentation divides each image your robot captures into segments, creating a pixel-level mask for each object. Instead of the rectangular boxes produced by object detection, which include regions that carry no pertinent information, segmentation recovers the accurate shape of each object. Drones can use image segmentation to navigate accurately through a cluster of trees without bumping into branches.
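
A semantic segmentation model such as DeepLab V3 produces a score for every class at every pixel, and the mask is built by taking the highest-scoring class per pixel. The sketch below assumes the batch dimension has already been dropped, leaving scores indexed [row][column][class]. It also counts pixels per class, a simple way to gauge how much of the frame an object occupies. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumption: per-pixel class scores indexed [row][col][class], batch dropped.
Scores = Sequence[Sequence[Sequence[float]]]


def argmax_mask(scores: Scores) -> List[List[int]]:
    """Label each pixel with the index of its highest-scoring class."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in scores]


def class_areas(mask: List[List[int]]) -> Dict[int, int]:
    """Count how many pixels were assigned to each class id."""
    areas: Dict[int, int] = {}
    for row in mask:
        for cls in row:
            areas[cls] = areas.get(cls, 0) + 1
    return areas


if __name__ == "__main__":
    scores = [
        [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
        [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    ]
    mask = argmax_mask(scores)
    print(mask)               # [[0, 1], [1, 2]]
    print(class_areas(mask))  # {0: 1, 1: 2, 2: 1}
```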

TensorFlow Lite on VOXL 1 vs. VOXL 2

VOXL 2 unlocks lightweight, powerful computing. See how the neural networks perform on VOXL 2 compared to the previous-generation VOXL 1. A dedicated NPU lets VOXL 2 process images extremely quickly, enabling low-latency computer vision applications.


VOXL 1
Model | Task | Avg. CPU Inference (ms) | Avg. GPU Inference (ms) | Max Frames Per Second (fps) | Input Dimensions
MobileNet V2-SSDlite | Object Detection | 127.78 | 21.82 | 37.29 | [1,300,300,3]
MobileNet V1-SSD | Object Detection | 75.48 | 64.40 | 14.62 | [1,300,300,3]
MobileNet V1-SSD | Classifier | 56.70 | 56.85 | 16.47 | [1,224,224,3]



VOXL 2
Model | Task | Avg. CPU Inference (ms) | Avg. GPU Inference (ms) | Avg. NNAPI Inference (ms) | Max Frames Per Second (fps) | Input Dimensions
 | Object Detection | 33.89 | 24.68 | 34.42 | 34.87 | [1,300,300,3]
EfficientNet Lite4 | Classifier | 115.30 | 24.74 | 16.42 | 48.97 | [1,300,300,3]
FastDepth | Monocular Depth | 37.34 | 18.00 | 37.32 | 45.45 | [1,320,320,3]
DeepLab V3 | Segmentation | 63.03 | 26.81 | 61.77 | 32.46 | [1,321,321,3]
MoveNet SinglePose Lightning | Pose Estimation | 24.58 | 28.49 | 24.61 | 34.99 | [1,192,192,3]
YOLOv5 | Object Detection | 88.49 | 23.37 | 83.87 | 36.54 | [1,320,320,3]
MobileNetV1-SSD | Object Detection | 19.56 | 21.35 | 7.72 | 85.32 | [1,300,300,3]
MobileNetV1 | Classifier | 19.66 | 6.28 | 3.98 | 125.31 | [1,224,224,3]
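
The two tables can be compared directly for MobileNet V1-SSD object detection, assuming both rows benchmark the same model. The sketch below takes the fastest backend on each board and computes the speedup plus an inference-only frame-rate ceiling of 1000 / latency. The tables' Max Frames Per Second figures sit below this ceiling for every row, so treat it as an upper bound rather than a prediction.

```python
# Average inference latencies (ms) for MobileNet V1-SSD object detection,
# copied from the two tables above; assumes both rows test the same model.
VOXL1_MS = {"cpu": 75.48, "gpu": 64.40}
VOXL2_MS = {"cpu": 19.56, "gpu": 21.35, "nnapi": 7.72}


def best_latency(latencies):
    """Lowest average latency across the available backends."""
    return min(latencies.values())


def fps_ceiling(latency_ms):
    """Inference-only upper bound on frame rate (ignores all other costs)."""
    return 1000.0 / latency_ms


speedup = best_latency(VOXL1_MS) / best_latency(VOXL2_MS)

if __name__ == "__main__":
    for board, lat in (("VOXL 1", VOXL1_MS), ("VOXL 2", VOXL2_MS)):
        best = best_latency(lat)
        print(f"{board}: {best:.2f} ms, ceiling {fps_ceiling(best):.1f} fps")
    print(f"Speedup: {speedup:.1f}x")
```

Run as written, this reports a speedup of about 8.3x: 64.40 ms on the VOXL 1 GPU versus 7.72 ms with NNAPI on VOXL 2.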

Vision-Based Drones for Mission-Critical Operations

An autonomous drone running multiple simultaneous neural networks reduces the pilot's cognitive load during mission-critical flight operations. The more image data a drone can process, the more capable and safe its autonomous navigation becomes. VOXL 2 supports five simultaneous neural networks out of the box with TensorFlow Lite. To learn more about TensorFlow Lite on VOXL 2, visit: https://docs.modalai.com/voxl-tflite-server/
