Hardware systems for low-latency audio processing: Event-based and multichannel synchronous sampling approaches
dc.contributor.author
Kiselev, Ilya
dc.contributor.supervisor
Liu, Shih-Chii
dc.contributor.supervisor
Hahnloser, Richard H.R.
dc.contributor.supervisor
Conradt, Jörg
dc.date.accessioned
2021-08-25T06:14:10Z
dc.date.available
2021-08-24T20:41:35Z
dc.date.available
2021-08-25T06:14:10Z
dc.date.issued
2021
dc.identifier.uri
http://hdl.handle.net/20.500.11850/502086
dc.identifier.doi
10.3929/ethz-b-000502086
dc.description.abstract
Neuromorphic technology is slowly maturing, with a variety of usable event-driven spiking sensors and hardware implementations of spiking neural networks. Sensory processing algorithms for these devices are still under investigation, and their usefulness in natural environments is still relatively unexplored compared to algorithms using conventional sensors and digital hardware.
We developed hardware test beds that allow us to explore both event-based sensory processing algorithms and algorithms based on regular sampling in real-world conditions. The goal of my thesis is threefold: 1) to develop a hardware test bed for implementing spiking networks together with spiking sensors, in order to study the possibility of using multiple sensors of different modalities to improve classification performance in real-world conditions; 2) to implement a local automatic gain control mechanism that increases the input dynamic range of a spiking cochlea operating in natural environments, where the sound dynamic range can exceed 60 dB; 3) to implement a multi-microphone hardware platform that can be used for real-time beamforming as part of a wireless acoustic sensor network.
The first part of the thesis describes the development of a real-time hardware system that fuses information from neuromorphic spiking sensors of different modalities. The core of the system is a general-purpose accelerator for spiking Deep Neural Networks (DNNs) implemented on a Field-Programmable Gate Array (FPGA). We demonstrate the performance of the system on an audio-visual sensor fusion task, using a Dynamic Vision Sensor (DVS) and a Dynamic Audio Sensor (DAS) to classify digits from the Modified National Institute of Standards and Technology (MNIST) dataset augmented with a specific audio tone for each digit.
We demonstrate that reliable classification is possible with just a fraction of the spikes produced by the sensors. On the other hand, processing the full stream of spikes increases the computational demand of the system in proportion to the spike rate. In addition, the spike rate of the audio sensor depends on the input signal amplitude, which makes it difficult to train classifiers to be invariant to input signals with a wide dynamic range. Biological auditory and visual processing systems, however, are known to accommodate input signals that differ by orders of magnitude while maintaining moderate neuronal spike rates.
The second part of the thesis addresses the problem of increasing spike rates in response to high-amplitude signals in the spiking silicon cochlea by developing a local spike-based gain control algorithm that continuously monitors the spike rate at the output of each channel and adapts the corresponding channel gain so that its spike rate does not exceed a predefined threshold. We implemented this algorithm in hardware for the Dynamic Audio Sensor Low Power (DASLP) silicon cochlea and studied its performance on synthetic tests and on a real audio classification problem.
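A minimal sketch of such a per-channel, spike-rate-driven gain control loop is given below (Python; the time constant, rate threshold, exponential gain step, and function name are illustrative assumptions, not the rule implemented on the DASLP cochlea):

# Hypothetical per-channel automatic gain control driven by the channel's
# output spike rate, in the spirit of the algorithm described above.
# Time constant, rate threshold, and gain step are illustrative assumptions.

def agc_step(rate_estimate, spike_count, gain, dt,
             tau=0.05, rate_threshold=200.0, gain_step=0.9, gain_max=1.0):
    """Update one channel's spike-rate estimate and gain over a time step dt.

    rate_estimate : low-pass-filtered spike rate of the channel (spikes/s)
    spike_count   : number of spikes observed in this time step
    gain          : current channel gain (0 < gain <= gain_max)
    """
    # Leaky integration of the instantaneous spike rate.
    rate_estimate += (dt / tau) * (spike_count / dt - rate_estimate)

    # Attenuate the channel while it fires above the threshold,
    # otherwise let the gain slowly recover toward its maximum.
    if rate_estimate > rate_threshold:
        gain *= gain_step
    else:
        gain = min(gain / gain_step, gain_max)

    return rate_estimate, gain

Looping this update at a fixed rate for each channel keeps loud channels from dominating the output spike stream while leaving quiet channels at full gain.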
The third part of the thesis was carried out within a multi-partner European project, COCOHA (COgnitive COntrol of a Hearing Aid, www.cocoha.org), which aimed to develop a system that decodes attention from electroencephalogram (EEG) signals in order to direct the speech of the attended talker to the user of a hearing aid. The goal of this work was to construct a synchronized, distributed multi-microphone platform that can be used for general auditory scene analysis. The developed platform is composed of multi-microphone modules that perform synchronized audio sampling in different parts of a room and transmit the audio streams with low latency to a central processing unit, where the samples from different microphones can be aligned with sub-microsecond precision. Synchronized sampling across the ad-hoc distributed microphone array enables a variety of further processing algorithms, e.g. for tasks such as beamforming, source separation, or speech enhancement. The platform was used to test a set of beamforming algorithms in the wild.
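As an illustration of the kind of processing that synchronized sampling enables, the sketch below shows a basic frequency-domain delay-and-sum beamformer over time-aligned microphone signals (NumPy; the microphone geometry, sample rate, and steering direction are placeholder inputs, and this is not the beamforming code evaluated on the COCOHA platform):

# Basic delay-and-sum beamformer over synchronously sampled microphone signals.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_positions, look_direction, fs):
    """Steer the array toward look_direction by delaying and averaging channels.

    signals        : (n_mics, n_samples) array of time-aligned audio samples
    mic_positions  : (n_mics, 3) microphone coordinates in metres
    look_direction : unit vector pointing from the array toward the source
    fs             : sample rate in Hz
    """
    n_mics, n_samples = signals.shape

    # Per-microphone steering delays: channels closer to the source are
    # delayed more, so that all channels line up on the desired wavefront.
    delays = mic_positions @ look_direction / SPEED_OF_SOUND
    delays -= delays.min()  # keep all delays non-negative

    # Apply fractional delays in the frequency domain, then average channels.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    steered = spectra * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(steered.mean(axis=0), n=n_samples)

With sub-microsecond alignment across modules, steering delays computed from the array geometry remain meaningful even for microphones attached to different nodes of the ad-hoc array.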
All three parts serve the common goal of enabling the application of novel auditory sensing technology in practically relevant settings by addressing the challenges of real-world deployment.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
sensor fusion
en_US
dc.subject
spiking deep neural networks
en_US
dc.subject
event-driven sensors
en_US
dc.subject
automatic gain control
en_US
dc.subject
wireless acoustic sensor networks
en_US
dc.subject
wireless synchronization
en_US
dc.subject
audio source separation
en_US
dc.subject
beamforming
en_US
dc.title
Hardware systems for low-latency audio processing: Event-based and multichannel synchronous sampling approaches
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2021-08-25
ethz.size
130 p.
en_US
ethz.code.ddc
DDC - DDC::6 - Technology, medicine and applied sciences::621.3 - Electric engineering
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.identifier.diss
27602
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02533 - Institut für Neuroinformatik / Institute of Neuroinformatics::03774 - Hahnloser, Richard H.R. / Hahnloser, Richard H.R.
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02533 - Institut für Neuroinformatik / Institute of Neuroinformatics::08836 - Delbrück, Tobias (Tit.-Prof.)
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02533 - Institut für Neuroinformatik / Institute of Neuroinformatics::03774 - Hahnloser, Richard H.R. / Hahnloser, Richard H.R.
en_US
ethz.relation.hasPart
10.1109/ISCAS.2016.7539099
ethz.relation.hasPart
10.1109/LCN.Workshops.2017.62
ethz.relation.hasPart
10.1109/ISCAS51556.2021.9401742
ethz.date.deposited
2021-08-24T20:41:41Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2021-08-25T06:14:23Z
ethz.rosetta.lastUpdated
2022-03-29T11:18:34Z
ethz.rosetta.versionExported
true