ESP Audio Front-End Algorithms

Espressif's Audio Front-End Algorithms

High-performance audio algorithms to enable voice user interfaces with Espressif SoCs

Any voice-enabled product needs to perform well in a noisy environment, and audio front-end (AFE) algorithms play an important role in building a responsive voice-user interface (VUI). Espressif’s AI Lab has created a set of AFE algorithms for this purpose: echo cancellation, source separation, noise suppression and wake word detection. Customers can use these algorithms with Espressif’s ESP32 and ESP32-S3 SoCs to build high-performance yet low-cost products with a voice-user interface.

Acoustic Echo Cancellation (AEC)

The Acoustic Echo Cancellation algorithm removes echoes from the audio captured by the microphone. This is beneficial when the device is playing back audio through its own speaker, because the microphone would otherwise pick up that playback together with the user's voice.
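To make the idea concrete, here is a minimal sketch of echo cancellation using a textbook normalized least-mean-squares (NLMS) adaptive filter. It is not Espressif's implementation; the function name, the filter length, the step size and the made-up echo path are all assumptions chosen for illustration. The filter learns how the loudspeaker signal reaches the microphone and subtracts that estimate from the microphone input.

```python
import math
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.2, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `ref` from `mic` (textbook NLMS)."""
    weights = [0.0] * taps   # current estimate of the speaker-to-mic echo path
    history = [0.0] * taps   # most recent loudspeaker samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, ref):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate          # mic signal with echo removed
        energy = sum(x * x for x in history) + eps  # normalisation term
        step = mu * error / energy
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned

def mean_power(signal):
    return sum(s * s for s in signal) / len(signal)

# Synthetic test data (all values are illustrative assumptions).
random.seed(0)
n = 4000
ref = [random.uniform(-1.0, 1.0) for _ in range(n)]   # loudspeaker playback
echo_path = [0.0, 0.6, 0.3, -0.2]                     # hypothetical room response
echo = [sum(h * ref[i - k] for k, h in enumerate(echo_path) if i >= k)
        for i in range(n)]
near_end = [0.05 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]  # "user voice"
mic = [e + s for e, s in zip(echo, near_end)]

cleaned = nlms_echo_cancel(mic, ref)
residual_echo = [c - s for c, s in zip(cleaned, near_end)]
erle_db = 10 * math.log10(mean_power(echo[-1000:]) / mean_power(residual_echo[-1000:]))
print(f"Echo reduced by about {erle_db:.1f} dB")
```

Production echo cancellers typically add refinements that this sketch leaves out, such as much longer filters and pausing adaptation while the user is speaking.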

Blind Source Separation (BSS)

The Blind Source Separation algorithm uses multiple microphones to detect the direction of the incoming audio, while enhancing the input from a certain direction. This algorithm improves the quality of the desired audio source in a noisy environment.
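As a rough illustration of how two microphones can reveal the direction of a sound, the sketch below estimates the arrival delay between two channels by cross-correlation and then aligns and averages them (delay-and-sum). This is a deliberately simpler technique than blind source separation and is not Espressif's algorithm; the function names, the 16 kHz sample rate and the 80 mm spacing are assumptions.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def estimate_delay(a, b, max_lag):
    """Return the lag in samples by which channel b trails channel a."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        start = max(0, -lag)
        stop = min(len(a), len(b) - lag)
        score = sum(a[i] * b[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def arrival_angle_deg(lag, sample_rate, spacing_m):
    """Convert a delay in samples to an angle from broadside (far-field assumption)."""
    sine = lag / sample_rate * SPEED_OF_SOUND / spacing_m
    sine = max(-1.0, min(1.0, sine))   # guard against rounding outside [-1, 1]
    return math.degrees(math.asin(sine))

def delay_and_sum(a, b, lag):
    """Align channel b to channel a and average the two channels."""
    start = max(0, -lag)
    stop = min(len(a), len(b) - lag)
    return [0.5 * (a[i] + b[i + lag]) for i in range(start, stop)]

# Synthetic example: the same source reaches microphone B three samples later.
random.seed(1)
source = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic_a = source
mic_b = [0.0, 0.0, 0.0] + source[:-3]

lag = estimate_delay(mic_a, mic_b, max_lag=5)
angle = arrival_angle_deg(lag, sample_rate=16000, spacing_m=0.08)
enhanced = delay_and_sum(mic_a, mic_b, lag)
print(f"Estimated delay: {lag} samples, angle: {angle:.1f} degrees")
```

Averaging the aligned channels reinforces sound from the estimated direction, while sound arriving from other directions adds up less coherently.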

Noise Suppression (NS)

The Noise Suppression algorithm operates on single-channel audio signals. It suppresses unwanted non-human noise (for example, the sound of a vacuum cleaner or an air conditioner), thus improving the audio signal that needs to be processed.

WakeNet

Espressif's wake word engine, WakeNet, is a wake word detection algorithm designed for high performance and a low memory footprint. It enables devices to 'hear' wake words, such as “Alexa”, “Hi, Lexin” and “Hi, ESP”.

Advantages

Outstanding Acoustic Performance

Espressif's AFE algorithms deliver exceptional far-field performance. These algorithms use our proprietary wake word engine, designed to meet stringent test requirements for multi-language support.

Low-Resource Consumption

Espressif's AFE algorithms are optimized to take advantage of the AI accelerator available in the ESP32-S3 SoC. They consume around 22% of CPU, 48 KB of SRAM and 1.1 MB of PSRAM, which leaves sufficient headroom for customer applications on the ESP32-S3 SoC.
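The headroom can be estimated with simple arithmetic. The sketch below uses the consumption figures quoted above; the totals passed in (512 KB of on-chip SRAM and an 8 MB PSRAM part) are assumptions that depend on the chosen module, and part of the SRAM is also used by the system itself. The function name is hypothetical.

```python
# Consumption figures quoted for the AFE algorithms on the ESP32-S3.
AFE_CPU_PERCENT = 22
AFE_SRAM_KB = 48
AFE_PSRAM_MB = 1.1

def headroom(total_sram_kb, total_psram_mb):
    """Return the resources left for the application after the AFE algorithms."""
    return {
        "cpu_percent": 100 - AFE_CPU_PERCENT,
        "sram_kb": total_sram_kb - AFE_SRAM_KB,
        "psram_mb": round(total_psram_mb - AFE_PSRAM_MB, 2),
    }

# Assumed totals: 512 KB on-chip SRAM and a hypothetical 8 MB PSRAM part.
budget = headroom(total_sram_kb=512, total_psram_mb=8)
print(budget)
```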

Flexibility

Espressif's AFE algorithms offer an easy and intuitive API for customer applications, so that their behavior can be adjusted dynamically as required. The distance between the two microphones can be anywhere from 20 mm to 80 mm, which allows considerable flexibility in the hardware design of developers’ end-products.
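To give a feel for what the microphone spacing means acoustically, the following sketch computes two standard array-acoustics quantities for both ends of the 20 mm to 80 mm range: the largest possible arrival delay between the microphones, and the frequency above which a two-microphone array becomes spatially ambiguous. This is general background rather than Espressif's stated design rationale, and the 16 kHz sample rate and function names are assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def max_delay_samples(spacing_m, sample_rate):
    """Largest inter-microphone delay, for sound arriving along the array axis."""
    return spacing_m / SPEED_OF_SOUND * sample_rate

def spatial_aliasing_limit_hz(spacing_m):
    """Frequency whose half wavelength equals the microphone spacing."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)

SAMPLE_RATE = 16000  # Hz, assumed for illustration

for spacing_mm in (20, 80):
    spacing_m = spacing_mm / 1000.0
    delay = max_delay_samples(spacing_m, SAMPLE_RATE)
    limit = spatial_aliasing_limit_hz(spacing_m)
    print(f"{spacing_mm} mm: max delay {delay:.2f} samples, "
          f"aliasing limit {limit:.0f} Hz")
```

A wider spacing produces larger delays that are easier to measure, while a narrower spacing stays unambiguous up to higher frequencies, so the choice is a trade-off rather than a single best value.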
