Edge AI is a technology that runs AI algorithms and models directly on devices like microcontrollers, microprocessors, and sensors in industrial and automotive applications.
This technology enables real-time data processing at the source of collection, offering faster response times and greater bandwidth efficiency. By reducing latency, data transmission, and power consumption while enhancing security and privacy, Edge AI holds clear advantages over cloud computing. It opens new possibilities for device and service creators, enabling new applications at a fraction of the cost of cloud processing, and it adds advanced graphics, audio-, and vision-based applications to the established set of human interface modes (touch, graphics, voice).
Edge AI Hardware
Edge AI uses specialized hardware to balance compute performance, power efficiency, and cost, ranging from low-power microcontrollers to high-performance microprocessors and dedicated AI accelerators. Traditional embedded systems rely on a single main MCU for all tasks, but AI workloads strain its resources and reduce overall system efficiency. Edge AI architectures therefore offload compute-intensive tasks to dedicated co-processors or accelerators, easing the performance trade-offs and reducing the burden on the MCU. Many advanced MCUs can already run lightweight AI models, since optimized software frameworks and embedded DSP capabilities are sufficient for applications such as AI-based face recognition. Combining high-performance MCUs with built-in NPU hardware acceleration and advanced AI fine-tuning workflows can accelerate the deployment of AI models on low-power MCUs.
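As a concrete illustration of that deployment flow, the sketch below uses the public TensorFlow Lite converter to apply post-training int8 quantization to a small Keras model, the usual first step before a vendor toolchain maps the network onto an MCU or its NPU. The model and calibration data here are placeholders, not tied to any particular device:

```python
# Minimal sketch: post-training int8 quantization of a tiny Keras model
# for MCU-class deployment. Model and calibration data are illustrative
# placeholders, not a vendor reference design.
import numpy as np
import tensorflow as tf

# Placeholder model, sized like a small keyword-spotting network.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 10, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

def representative_data():
    # Calibration samples drive the int8 scale/zero-point estimation.
    for _ in range(100):
        yield [np.random.rand(1, 49, 10, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

The resulting int8 model is typically about a quarter the size of its float32 original and maps directly onto the integer MAC hardware of an NPU.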

Figure 1: Edge AI hardware and software frameworks
AI MPU platform application processors, based on 64-bit platforms with Arm Cortex®-A and Cortex®-M cores, offer higher performance, greater speed, and extended connectivity options (Ethernet switch, TSN support, PCIe, USB 3, LVDS, DSI, and parallel display interfaces). MPU-based AI applications use AI algorithms and models to enhance functionality in vision, robotics, and edge computing, often drawing on integrated AI accelerators or AI-optimized processing capabilities.
AI Accelerator
AI accelerators are specialized hardware blocks for neural network computation, optimized for matrix multiplications and convolutions, and come in various forms, such as on-chip neural engines and external AI modules. Accelerators like GPUs, TPUs, and FPGAs are essential for improving AI processing speed and efficiency. GPUs are designed for parallel processing and excel at complex tasks like deep learning and neural network training. TPUs optimize tensor operations, making them ideal for deep learning inference and training in cloud and edge AI applications. FPGAs offer customizable hardware acceleration, enabling real-time AI model execution with lower latency and power consumption, which suits edge devices and industrial systems.
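To see why accelerators are built around dense MAC arrays, a quick back-of-envelope count (a generic sketch, not tied to any vendor) shows how many multiply-accumulate operations a single modest convolution layer demands:

```python
# Rough sketch: MAC count for one 2-D convolution layer, illustrating
# why matrix/convolution hardware dominates AI accelerator silicon.
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    # Each output element needs k*k*c_in MACs; there are h_out*w_out*c_out outputs.
    return h_out * w_out * c_out * (k * k * c_in)

# Hypothetical layer: 3x3 kernel, 112x112 output, 32 -> 64 channels.
macs = conv2d_macs(112, 112, 32, 64, 3)
print(f"{macs / 1e6:.0f} M MACs for a single layer")  # ~231 M MACs
```

At a typical camera frame rate, even this one layer amounts to billions of MACs per second, which is why offloading it to dedicated hardware matters.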
The cutting-edge ST Neural-ART Accelerator is a proprietary neural processing unit (NPU) that delivers exceptional efficiency in handling AI tasks. It is compatible with TensorFlow Lite, Keras, and ONNX, and the list of supported frameworks will continue to expand. Support for the ONNX format in particular allows data scientists to use the STM32N6 for a wide range of AI applications.
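To illustrate the ONNX route, the sketch below exports a throwaway PyTorch model to an .onnx file of the kind an NPU toolchain can consume. The network, tensor names, and file name are placeholders, not an ST reference design:

```python
# Minimal sketch: exporting a placeholder PyTorch model to ONNX so a
# vendor NPU toolchain can pick it up downstream.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 2),
)
model.eval()

dummy_input = torch.randn(1, 3, 96, 96)  # NCHW, batch of 1 for edge inference
torch.onnx.export(
    model,
    dummy_input,
    "edge_model.onnx",
    input_names=["image"],
    output_names=["logits"],
    opset_version=13,
)
```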

Figure 2: Neural-ART Accelerator architecture
The Neural-ART Accelerator (Figure 2) is a flexible, dedicated dataflow stream-processing engine that operates on 8- to 16-bit arithmetic. It departs from the Von Neumann architecture and offers hardware acceleration for a variety of neural network architectures. It features embedded security and integrates seamlessly into the MCU backbone via two 64-bit AXI interfaces. Configurable from 72 to 2304 MACs, it achieves up to 4.6 TOPS at an efficiency of 1 to 5 TOPS/W.
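The headline throughput figure follows from simple arithmetic: each MAC counts as two operations (a multiply plus an add), so at a 1 GHz clock (assumed here for illustration, matching the frequency quoted for the STM32N6 below) the full 2304-MAC configuration lands at roughly 4.6 TOPS:

```python
# Back-of-envelope check of the peak-throughput figure: one MAC counts
# as two operations (multiply + add). The 1 GHz clock is an assumption
# for illustration; actual configurations and clocks vary.
macs = 2304            # largest Neural-ART configuration
clock_hz = 1e9         # assumed 1 GHz clock
ops_per_mac = 2        # multiply + accumulate
tops = macs * clock_hz * ops_per_mac / 1e12
print(f"peak throughput: {tops:.2f} TOPS")  # -> 4.61 TOPS
```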
The STM32N6 is an STM32 MCU based on the Arm® Cortex®-M55 that embeds the ST Neural-ART Accelerator™, an in-house developed neural processing unit (NPU) engineered for power-efficient edge AI applications. Clocked at 1 GHz and delivering up to 600 GOPS, the NPU enables real-time neural network inference for computer vision and audio applications.
The STM32MP2 MPU product line offers advanced edge AI capabilities through its NPU accelerator and the flexibility to run AI applications on the CPU, GPU, or NPU. It also supports high-end edge computing use cases, such as machine vision, through its multimedia capabilities, all backed by the wider ST edge AI ecosystem.
The MAX78002 is a new breed of AI microcontroller with a built-in hardware-based convolutional neural network (CNN) accelerator, designed to execute neural networks at ultra-low power at the edge of the IoT. It features an Arm® Cortex®-M4 with FPU for efficient system control. Because the CNN engine stores its weights in SRAM, AI network updates can be made on the fly. The CNN architecture is highly flexible: networks can be trained in conventional toolsets such as PyTorch® and TensorFlow®, then converted for execution on the MAX78002 using tools provided by Analog Devices.
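The sketch below illustrates the first half of that workflow: a generic PyTorch training step on a small CNN. The conversion step relies on Analog Devices' separate tooling and is not shown, and a deployable network must also respect the CNN engine's layer and weight-size constraints, which this placeholder ignores:

```python
# Illustrative PyTorch training step for a small CNN, standing in for the
# "train in a conventional toolset" half of the MAX78002 workflow.
# Conversion for the CNN engine uses ADI-provided tools (not shown).
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One dummy training step on random data, standing in for a real dataset.
images = torch.randn(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```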
Multicore and Processor-based Edge AI
The eIQ® Neutron Neural Processing Unit (NPU) is a highly scalable accelerator core architecture providing machine learning (ML) acceleration. The eIQ Neutron NPUs support a wide variety of neural network types, including CNN, RNN, TCN, and Transformer networks. ML application development with the eIQ Neutron NPU is fully supported by the eIQ machine learning software development environment.
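Development environments like eIQ build on familiar runtimes such as TensorFlow Lite. The minimal inference sketch below uses only the standard TensorFlow Lite Python API with a placeholder model file; it is generic code, not eIQ- or NPU-specific:

```python
# Generic TensorFlow Lite inference sketch; the model file name is a
# placeholder and nothing here is eIQ- or NPU-specific API.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed one sample with the dtype and shape the converted model expects.
sample = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], sample)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```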
The i.MX 9 series, part of the EdgeVerse™ edge computing platform, integrates hardware neural processing units for machine learning acceleration at the edge. It optimizes inferences per second and performance per watt, enabling next-generation use cases in embedded systems. The i.MX 8M family combines high-performance computing with audio, voice, and video capabilities, enabling 4K HD video streaming, 1080p video encode and decode, professional audio quality, speech recognition, AI, machine vision, and edge computing. The NXP eIQ™ Auto deep learning toolkit helps developers introduce deep learning algorithms into their applications while meeting automotive standards. The i.MX RT700 crossover MCU, designed for the AI-enabled edge, consists of a high-performance main compute subsystem, a secondary sense-compute subsystem, and specialized coprocessors. Its architecture combines general-purpose cores with high-performance DSPs and a powerful NPU, enabling differentiated embedded products.
Featured AI Platforms to get you started


Raspberry Pi 5 with the AI HAT+: a built-in Hailo AI accelerator for cost-effective, power-efficient, high-performance AI applications
Conclusion
Staying ahead in the era of AI means harnessing the right hardware to unlock AI's full potential. Leading semiconductor producers are delivering cutting-edge AI chips, accelerators, and edge computing solutions, making the tools for building intelligent systems more powerful and accessible than ever. Ultra-efficient microcontrollers and high-performance processors are driving the next wave of AI adoption, so adopting AI technology and choosing the right hardware now is crucial for long-term success.
Further reading
- AI Solutions HUB
- eTechJournal Publication: Security Strengths & Technologies
- Whitepaper: Wireless Connectivity Solutions for IoT and AIoT
- Whitepaper: Industry 4.0 - Edge computing in the industrial environment
- Raspberry Pi Solutions Kits to jumpstart your AI development
- Raspberry Pi in Industrial Automation: Simplifying Affordable Edge Computing
- Next-Gen Living, How AI Is Transforming Everyday Life
- The modern challenges of facial recognition
- Deep Learning and Neural Network