Hardware systems for low-latency audio processing: Event-based and multichannel synchronous sampling approaches

Kiselev, Ilya

doi:10.3929/ethz-b-000502086

Download

Full text (PDF, 9.423Mb)

Open access

Author

Kiselev, Ilya

Date

2021

Type

Doctoral Thesis

ETH Bibliography

yes

Rights / license

In Copyright - Non-Commercial Use Permitted

Abstract

Neuromorphic technology is slowly maturing, with a variety of usable event-driven spiking sensors and hardware implementations of spiking neural networks. Sensory processing algorithms are still under investigation, and their usefulness in natural environments is still relatively unexplored compared to algorithms using conventional sensors and digital hardware. We developed hardware test beds that allow us to explore event-based sensory processing algorithms and regular-sampling-based algorithms in real-world conditions. The goal of my thesis is threefold: 1) to develop a hardware test bed for implementing spiking networks together with spiking sensors, in order to study the possibility of using multiple sensors of different modalities to improve classification performance in real-world conditions; 2) to implement a local automatic gain control mechanism to increase the input dynamic range of a spiking cochlea operating in natural environments where the sound dynamic range can be greater than 60 dB; 3) to implement a multi-microphone hardware platform that can be used for real-time beamforming as part of a wireless acoustic sensor network. The first part of the thesis describes the development of a real-time hardware system that fuses information from neuromorphic spiking sensors of different modalities. The core of the system is a general-purpose accelerator for spiking Deep Neural Networks (DNNs) implemented on a Field-Programmable Gate Array (FPGA). We demonstrate the performance of the system on an audio-visual sensor fusion task that uses Dynamic Vision Sensor (DVS) and Dynamic Audio Sensor (DAS) spiking sensors to classify digits from the Modified National Institute of Standards and Technology (MNIST) dataset augmented with specific audio tones for each digit. We demonstrate that reliable classification is possible with just a fraction of the spikes produced by the sensors.
On the other hand, processing the full stream of spikes increases the computational demand of the system in proportion to the spike rate. In addition, the spike rate of the audio sensor depends on the input signal amplitude, which makes it difficult to train classifiers to be invariant to input signals with a wide dynamic range. However, it is known that biological audio and visual processing systems can accommodate input signals that differ by orders of magnitude while maintaining a moderate neuron spike rate. The second part of the thesis addresses the problem of increasing spike rates in response to high-amplitude signals in the spiking silicon cochlea by developing a local spike-based gain control algorithm that constantly monitors the spike rate at the output of each channel and adapts the corresponding channel gain so that its spike rate does not exceed a predefined threshold. We implemented this algorithm in hardware for the Dynamic Audio Sensor Low Power (DASLP) silicon cochlea and studied its performance on synthetic tests and a real audio classification problem. The third part of the thesis work was carried out within a multi-partner European project, COCOHA (COgnitive COntrol of a Hearing Aid, www.cocoha.org), which aimed to develop a system for attention decoding from electroencephalogram (EEG) signals for directing the speech of an attended talker to the user of a hearing aid device. The goal of this work is to construct a synchronized distributed multi-microphone platform which can be used for general auditory scene analysis. The developed platform is composed of multi-microphone modules which can perform synchronized audio sampling at different parts of the room and transmit the audio streams with low latency to a central processing unit, where the samples from different microphones can be aligned with sub-microsecond precision.
Synchronized sampling across the ad-hoc distributed microphone array enables a variety of algorithms to be used for further processing, for tasks such as beamforming, source separation or speech enhancement. The platform was used for testing a set of beamforming algorithms in the wild. All three parts serve a common goal of enabling application of novel auditory sensing technology in practically relevant settings, by coping with challenges of real-world deployment.
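
The local spike-based gain control described in the abstract (monitor each channel's output spike rate and lower that channel's gain when the rate exceeds a threshold) can be illustrated with a minimal software sketch. This is not the DASLP hardware implementation: the class `ChannelAGC`, the function `simulate`, the multiplicative gain step, the hysteresis band, and the toy channel model in which spike rate is proportional to gain times amplitude are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChannelAGC:
    """Hypothetical per-channel controller; every default value is illustrative."""
    max_rate_hz: float = 1000.0      # assumed spike-rate ceiling for one channel
    gain: float = 1.0                # current channel gain
    min_gain: float = 1.0 / 64       # assumed lower bound on the gain
    max_gain: float = 1.0            # assumed upper bound on the gain
    step: float = 0.5                # multiplicative step applied per update
    release_fraction: float = 0.25   # gain recovers only below this fraction of the ceiling

    def update(self, spike_count: int, window_s: float) -> float:
        """Adapt the gain from the spike count observed in one monitoring window."""
        rate = spike_count / window_s
        if rate > self.max_rate_hz:
            # Too many spikes: attenuate, but never below min_gain.
            self.gain = max(self.min_gain, self.gain * self.step)
        elif rate < self.release_fraction * self.max_rate_hz:
            # Quiet channel: restore sensitivity, but never above max_gain.
            self.gain = min(self.max_gain, self.gain / self.step)
        return self.gain


def simulate(amplitudes, window_s=0.01, spikes_per_unit=2000.0):
    """Toy channel model (assumption): spike rate = spikes_per_unit * gain * amplitude."""
    agc = ChannelAGC()
    rates = []
    for amplitude in amplitudes:
        expected_rate = spikes_per_unit * agc.gain * amplitude
        count = int(expected_rate * window_s)  # whole spikes seen in this window
        rates.append(count / window_s)         # rate measured before the gain update
        agc.update(count, window_s)
    return rates, agc.gain


if __name__ == "__main__":
    loud_then_quiet = [10.0] * 10 + [0.01] * 10
    rates, final_gain = simulate(loud_then_quiet)
    print(rates)
    print(final_gain)
```

In this sketch the gap between the attenuation threshold and the release threshold acts as hysteresis, so the gain does not oscillate when the measured rate sits near the ceiling. Whether the thesis uses a comparable scheme is not stated in the abstract.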

Permanent link

https://doi.org/10.3929/ethz-b-000502086

Publication status

published

Contributors

Examiner: Liu, Shih-Chii
Examiner: Hahnloser, Richard H.R.
Examiner: Conradt, Jörg

Publisher

ETH Zurich

Subject

sensor fusion; Spiking deep neural networks; Event-Driven Sensors; automatic gain control; wireless acoustic sensor networks; wireless synchronization; audio source separation; beamforming

Organisational unit

03774 - Hahnloser, Richard H.R. / Hahnloser, Richard H.R.
08836 - Delbrück, Tobias (Tit.-Prof.)

Related publications and datasets

Has part: https://doi.org/10.1109/ISCAS.2016.7539099

Has part: https://doi.org/10.1109/LCN.Workshops.2017.62

Has part: https://doi.org/10.1109/ISCAS51556.2021.9401742
