Research Topics/Groups

Dictionary Learning for Sparse Audio Inpainting

Georg Tauböck, Shristi Rajbamshi, and Peter Balazs

This is the accompanying webpage for the article entitled "Dictionary Learning for Sparse Audio Inpainting" submitted to the IEEE Journal of Selected Topics in Signal Processing.

Abstract: The objective of audio inpainting is to fill a gap in a signal: ideally, to reconstruct the original signal or, at least, to infer a meaningful surrogate signal. We propose a novel approach applying sparse modeling in the time-frequency (TF) domain. In particular, we devise a dictionary learning technique that learns the dictionary from the reliable parts around the gap, with the goal of obtaining a signal representation with increased TF sparsity. It is based on a basis optimization technique that deforms a given Gabor frame such that the sparsity of the analysis coefficients of the resulting frame is maximized. Furthermore, we modify the SParse Audio INpainter (SPAIN) for both the analysis and the synthesis model such that it is able to exploit the increased TF sparsity and – in turn – benefits from dictionary learning. Our experiments demonstrate that the developed methods achieve significant gains in terms of signal-to-distortion ratio (SDR) and objective difference grade (ODG) compared with several state-of-the-art audio inpainting techniques.
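
To make the notion of TF sparsity concrete, here is a minimal Matlab sketch (it requires LTFAT and is not the authors' learning algorithm): it computes the Gabor analysis coefficients of a test signal and an l1/l2-type sparsity measure of the kind such a basis optimization would aim to decrease. The lattice parameters, the window, and the measure itself are illustrative assumptions.

[f, fs] = gspi;                       % test signal shipped with LTFAT
a = 128; M = 512;                     % assumed Gabor lattice: time shift and channels
c = dgt(f, 'gauss', a, M);            % Gabor analysis coefficients (TF domain)
sparsity = norm(c(:),1)/norm(c(:),2)  % l1/l2 measure: smaller means sparser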

Preprint: coming soon

Software: MATLAB Simulation Scripts

Phonetics at the Österreichische Linguistiktagung

Since the 37th Österreichische Linguistiktagung (ÖLT) in 2009, the Institut für Schallforschung has regularly contributed phonetic topics to the Austrian linguistics conferences. Since 2013, a dedicated workshop on phonetics has been held at every ÖLT. To offer everyone interested a place to look up past events and find announcements of future ones, information on the ÖLT phonetics workshops is collected and made available here.

News

Next ÖLT: the 46th Österreichische Linguistiktagung in Vienna has been postponed to 2021 due to COVID-19!

The organizers of the planned phonetics workshop thank everyone interested for the numerous submissions and hope to see you again in 2021!

Conference website: 46th ÖLT 2020

Previous Conferences

Here you will find information on the phonetics workshops at the ÖLT conferences of past years.

45th Österreichische Linguistiktagung 2019, Salzburg

"Phonetik in und über Österreich 2019"

Organization: Nicola Klingler, Hannah Leykum, Jan Luttenberger, Michael Pucher, Carolin Schmid (Institut für Schallforschung, Österreichische Akademie der Wissenschaften) and Johanna Fanta-Jende (Institut für Germanistik, Universität Wien)

Call for Abstracts, 45th ÖLT 2019

Program, 45th ÖLT 2019, Friday, 6 December 2019

Program, 45th ÖLT 2019, Saturday, 7 December 2019

44th Österreichische Linguistiktagung 2018, Innsbruck

"Phonetik und Sprachtechnologie"

Organization: Nicola Klingler, Hannah Leykum, and Michael Pucher (Institut für Schallforschung, Österreichische Akademie der Wissenschaften)

Call for Abstracts, 44th ÖLT 2018

43rd Österreichische Linguistiktagung 2017, Klagenfurt

"Phonetik in und über Österreich"

Chair: Sylvia Moosmüller† (Institut für Schallforschung, Österreichische Akademie der Wissenschaften)
Coordination: Michaela Rausch-Supola (Institut für Schallforschung, Österreichische Akademie der Wissenschaften)

Call for Abstracts, 43rd ÖLT 2017

42nd Österreichische Linguistiktagung 2016, Graz

"Phonetik & Phonologie"
Organization: Dina El Zarka, Petra Hödl, Ralf Vollmann (Universität Graz)

Program, 42nd ÖLT 2016

41st Österreichische Linguistiktagung 2014, Vienna

"Phonetik in und über Österreich"
Chairs: Sylvia Moosmüller† and Carolin Schmid (Institut für Schallforschung, Österreichische Akademie der Wissenschaften)

Call for Abstracts, 41st ÖLT 2014
Workshop Program, 41st ÖLT 2014

40th Österreichische Linguistiktagung 2013, Salzburg

"Arbeitsgemeinschaft Soziophonetik"
Chair: Manfred Sellner (Universität Salzburg)

Program, 40th ÖLT 2013

Machine Learning

In recent years, machine learning has increasingly become an integral part of our everyday lives. Whether we use a smartphone, shop (online), consume various media, drive a car, or do many other things, machine learning (ML) and, more generally, artificial intelligence (AI) support, influence, and analyze us in the most diverse situations. Deep learning in particular, i.e., learning methods based on artificial neural networks, is applied in many areas.

ML and AI have also provided important impulses in the sciences, and this influence can be expected to spread to an even broader range of scientific disciplines in the future. As a result, both the interest in a deeper, scientifically grounded understanding of ML methods and the need for scientists from a wide variety of disciplines to develop a thorough understanding of the application and design of such methods are growing.

The Institut für Schallforschung, which conducts fundamental research in acoustics that is open to applications, is taking up this challenge and has therefore founded the Machine Learning research group. The group examines the various aspects of machine learning and artificial intelligence, with particular attention to potential applications in acoustics. The collaboration of scientists from different disciplines in the fields of ML and AI will enable the Institut für Schallforschung not only to make pioneering advances in all areas of sound research, but also to make essential contributions to theoretical questions in the highly topical research field of artificial intelligence.


Staff

AABBA is an open group of scientists collaborating on the development and applications of models of human spatial hearing.

AABBA's goal is to promote exploration and development of binaural and spatial models and their applications.

AABBA members are academic scientists willing to participate in our activities. We meet annually for open discussion and progress presentations, and we especially encourage members to bring students and young scientists associated with their projects to our meetings. Our activities culminate in joint publications and special sessions at international conferences. As a tangible outcome, we contribute validated (source) codes for published models of binaural and spatial hearing to our collection of auditory models, known as the auditory modeling toolbox (AMT).

Structure

  • Executive board: Piotr Majdak, Armin Kohlrausch, Ville Pulkki

  • Regular members:
    • Aachen: Janina Fels, ITA, RWTH Aachen
    • Bochum: Dorothea Kolossa, Ruhr-Universität Bochum
    • Cardiff: John Culling, School of Psychology, Cardiff University
    • Copenhagen: Torsten Dau & Tobias May, DTU, Lyngby
    • Dresden: Ercan Altinsoy, TU Dresden
    • Ghent: Sarah Verhulst & Alejandro Osses, Ghent University
    • Guangzhou: Bosun Xie, South China University of Technology, Guangzhou
    • Helsinki: Ville Pulkki & Nelli Salminen, Aalto University
    • Ilmenau: Alexander Raake, TU Ilmenau
    • Kosice: Norbert Kopčo, Safarik University, Košice
    • London: Lorenzo Picinali, Imperial College, London
    • Lyon: Mathieu Lavandier, Université de Lyon
    • Munich I: Werner Hemmert, TUM München
    • Munich II: Bernhard Seeber, TUM München 
    • Oldenburg I: Bernd Meyer, Carl von Ossietzky Universität Oldenburg
    • Oldenburg II: Mathias Dietz, Carl von Ossietzky Universität Oldenburg
    • Oldenburg-Eindhoven: Steven van de Par & Armin Kohlrausch, Universität Oldenburg
    • Paris: Brian Katz, Sorbonne Université
    • Patras: John Mourjopoulos, University of Patras
    • Rostock: Sascha Spors, Universität Rostock
    • Sheffield: Guy Brown, The University of Sheffield
    • Tabriz: Masoud Geravanchizadeh, University of Tabriz
    • Toulouse: Patrick Danès, Université de Toulouse
    • Troy: Jonas Braasch, Rensselaer Polytechnic Institute, Troy
    • Vienna: Bernhard Laback & Robert Baumgartner, Austrian Academy of Sciences, Wien
    • The AMT (Umbrella Project): Piotr Majdak
  • Honorary member and founder: Jens Blauert

AABBA Group 2020
AABBA group as of the 12th meeting 2020 in Vienna.

Meetings

Annual meetings are held at the beginning of each year:

  • 12th meeting: 16-17 January 2020, Vienna. Schedule. Group photo
  • 11th meeting: 19-20 February 2019, Vienna. Schedule. Group photo
  • 10th meeting: 30-31 January 2018, Vienna. Schedule. Group photo
  • 9th meeting: 27-28 February 2017, Vienna. Schedule.
  • 8th meeting: 21-22 January 2016, Vienna. Schedule.
  • 7th meeting: 22-23 February 2015, Berlin.
  • 6th meeting: 17-18 February 2014, Berlin.
  • 5th meeting: 24-25 January 2013, Berlin.
  • 4th meeting: 19-20 January 2012, Berlin.
  • 3rd meeting: 13-14 January 2011, Berlin.
  • 2nd meeting: 29-30 September 2009, Bochum.
  • 1st meeting: 23-26 March 2009, Rotterdam.

Activities

  • Upcoming: Structured Session "Binaural models: development and applications" at the Forum Acusticum 2020, Lyon.
  • Special Session "Binaural models: development and applications" at the ICA 2019, Aachen.
  • Special Session "Models and reproducible research" at the Acoustics'17 (EAA/ASA) 2017, Boston.
  • Structured Session "Applied Binaural Signal Processing" at the Forum Acusticum 2014, Krakòw.
  • Structured Session "The Technology of Binaural Listening & Understanding" at the ICA 2016, Buenos Aires.

Contact person: Piotr Majdak

The Musicality and Bioacoustics group merges music and biology to study the origins of music through cross-species studies. Like language, music is found in all cultures around the world. Even isolated cultures have music, and all musical systems share important parallels such as the use of discrete notes and a steady beat.

Here we study other animals to try to understand which aspects of music are uniquely human and why humans may have developed these abilities. Specifically, some active research directions of the group are:

  • Cross-species tests using operant conditioning to train and test human and non-human animal sound categorization
  • Cross-species tests of the preferences for different sounds
  • Bioacoustic analysis of animal vocalizations using linguistic and computational methods

The budgerigar laboratory facilities are currently located in the Department of Cognitive Biology at the University of Vienna, where we also run collaborative studies involving some of the other species housed there.

Staff

This web page provides resources for the figures and for the implementation of the inversion of frame multipliers in the research manuscript:

"A survey on the unconditional convergence and the invertibility of multipliers with implementation"

Diana T. Stoeva and Peter Balazs

Abstract:

The paper presents a survey of frame multipliers and related concepts. In particular, it includes a short motivation of why multipliers are of interest, a review as well as an extension of recent results on the unconditional convergence of multipliers, sufficient and/or necessary conditions for the invertibility of multipliers, and representations of the inverse via Neumann-like series and via multipliers with particular parameters. Multipliers for frames with specific structure, namely Gabor and wavelet multipliers, are also considered. Some of the results for the representation of the inverse multiplier are implemented in Matlab, and the implementations are described.

Here we provide:

- the scripts that were used to generate Fig. 1 and Fig. 2 in the paper;

- implementations of Propositions 8, 9, and 11, written in Matlab using the Matlab/Octave toolbox Linear Time-Frequency Analysis (LTFAT) [2] (version ... and above).

To run the codes provided below, one first needs to install the LTFAT toolbox, which is freely available at Sourceforge.

I. Fig. 1 in the paper and the script used to generate it (an illustrative example to visualize a multiplier).

Fig. 1: An illustrative example to visualize a multiplier. (TOP LEFT) The time-frequency representation of the music signal $f$. (TOP RIGHT) The symbol $m$, found by a (manual) estimation of the time-frequency region of the singer's voice. (BOTTOM LEFT) The multiplication in the TF domain. (BOTTOM RIGHT) The time-frequency representation of $M_{m,\widetilde \Psi,\Psi}f$.

Fig. 1 was produced via the script testGabMulExp_new.m using the original sound-file originalsignal.wav and the manually determined symbol Symbol6_BW.png.

The script also provides the modified signal (obtained by applying the symbol/mask to the original signal), and you can listen to it here.

II. Implementation of inversion of multipliers according to Section 3.2.3 of the paper.

II.1. Implementation of Proposition 8

(a) The inversion of the multipliers $M_{m,\Phi,\Psi}$ (M1) and $M_{m,\Psi,\Phi}$ (M2) for positive m according to Proposition 8 is implemented in the program Prop8MultiplierInversionOp.m, which relies on the function Prop8InvMultOp.m.

function [TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOp(c,r,TPhi,TG,m,e)

Running the program "Prop8MultiplierInversionOp.m", the user will be required to enter the following parameters (which are the input-parameters for the function Prop8InvMultOp.m):
c - the number of the frame vectors;
r - the number of the coordinates of the frame vectors;
TPhi - the synthesis matrix (rxc) of the frame $\Phi$;
TG - the synthesis matrix (rxc) of a frame G (with the meaning of $\Psi-\Phi$);
m - the symbol of the multiplier (c numbers in a row);
e - the desired error bound.

Note:
- the program keeps asking for m until a positive m is entered;
- after TPhi, TG, and a positive m have been entered, the program checks whether they satisfy the assumptions of Prop. 8 and, if not, adjusts TG by multiplication with an appropriate constant so as to be within the settings of Prop. 8.

The implementation uses an iterative algorithm according to Prop. 8, which runs until the desired error bound e is reached.

The output of the program "Prop8MultiplierInversionOp.m":
TPsi - the synthesis operator of $\Psi$,
M1 - the multiplier $M_{m,\Phi,\Psi}$,
M2 - the multiplier $M_{m,\Psi,\Phi}$,
M1inv - the iteratively inverted M1,
M2inv - the iteratively inverted M2,
M1invMatlab - the inversion of M1 using the Matlab command "inv" (for comparison),
M2invMatlab - the inversion of M2 using the Matlab command "inv" (for comparison),
n - the number of iteration steps.

Note:
After presenting the output parameters, the program allows the user
- either to enter a new $TG$ and a new error bound e and repeat the inversion procedure,
- or to terminate the program by pressing zero.

A demo-file (applying "Prop8InvMultOp.m" with concrete parameters) is available in the script Prop8InvMultOpRun.m.

(b) The computation of $M_{m,\Phi,\Psi}^{-1}f$ and $M_{m,\Psi,\Phi}^{-1}f$ for a given f (and for positive m) is implemented in the program Prop8MultiplierInversionf.m, which relies on the function Prop8InvMultf.m.

function [TPsi,M1,M2,M1invf,M2invf,n] = Prop8InvMultf(c,r,TPhi,TG,m,f,e)

The implementation proceeds similarly to (a), requiring one additional input, namely f, and using an appropriate modification of the iteration steps.

A demo file (applying Prop8InvMultf.m with concrete parameters) is available in the script Prop8InvMultfRun.m.
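
In the same spirit, and reusing M1, r, lam, and e from the sketch above, the vector version can be illustrated by iterating on f directly, so that no inverse matrix is ever formed (again a generic stand-in, not the authors' Prop8InvMultf.m):

f = randn(r,1);                       % given signal
x = zeros(r,1);                       % approximation of M1^{-1}*f
while norm(f - M1*x) > e              % residual-based stopping rule
    x = x + lam*M1'*(f - M1*x);
end
M1invf = x;                           % approximates M1 \ f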

(c) The computation of $M_{m,\Phi,\Psi}^{-1}$ and $M_{m,\Psi,\Phi}^{-1}$ for positive $m$ and Gabor frames $\Phi$ and $\Psi$ is implemented in the program Prop8MultiplierInversionOpGabor.m, which relies on the function Prop8InvMultOpGabor.m.

function [TPhi,TPsi,M1,M2,M1inv,M2inv,n] = Prop8InvMultOpGabor(L,a,M,gPhi,gG,m,e)

The inversion is implemented as in (a), but with $\Phi$ and $\Psi$ being Gabor frames.

The input parameters of "Prop8MultiplierInversionOpGabor.m'':
L - the length of the transform,
a - the time-shift (should be divisor of L),
M - the number of channels (should be divisor of L and bigger or equal to a),
gPhi - the window function of the Gabor frame Phi,
gG - the window function of the Gabor frame G(with the meaning of Psi-Phi),
TPhi - the synthesis matrix ($rxc$) of the frame $\Phi$,
TG - the synthesis matrix ($rxc$) of a frame $G$ (with the meaning of $\Psi-\Phi$),
m - the symbol of the multiplier (ML/a positive numbers),
e - the desired error bound.

The output parameters of "Prop8MultiplierInversionOpGabor.m'':
TPhi - the synthesis operator of the frame Phi,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).

A demo-file (applying "Prop8InvMultOpGabor.m" with concrete parameters) is available in the code Prop8InvMultOpGaborRun.m.

For the convergence rate of this algorithm, see Fig. 2 below and the script used to generate it.


II.2. Implementation of Proposition 9

The inversion of $M_{m,\Phi,\Phi}$, $M_{m,\Phi,\Psi}$, and $M_{m,\Psi,\Phi}$ according to Proposition 9 is implemented in the program Prop9MultiplierInversionOp.m, which relies on the function Prop9InvMultOp.m.

function [m,TPsi,M0,M1,M2,M0inv,n0,M1inv,M2inv,n] = Prop9InvMultOp(c,r,TPhi,TG,m,e)

Running the program "Prop9MultiplierInversionOp.m'', the user will be required to enter the same parameters as the ones for "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).

Note:
- the program checks whether the entered TPhi and m satisfy the assumptions of Prop. 9 and, if not, adjusts m to be within the settings of Prop. 9;
- the program then checks whether the entered TPhi, TG, and the adjusted m satisfy the assumptions of Prop. 9 and, if not, adjusts TG by multiplication with an appropriate constant so as to be within the settings of Prop. 9.

The input parameters of "Prop9MultiplierInversionOp.m'' are like the ones in "Prop8MultiplierInversionOp.m" (see above, the implementation of Proposition 8(a)).

The output parameters of "Prop9MultiplierInversionOp.m'':
m - the symbol of the multiplier,
M0 - the multiplier $M_{m,\Phi,\Phi}$,
M0inv - the iteratively inverted M0,
n0 - the number of the iteration steps for the inversion of M0,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).

A demo file (applying Prop9InvMultOp.m with concrete parameters) is available in the code Prop9InvMultOpRun.m.

II.3. Implementation of Proposition 11

The inversion of $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ according to Proposition 11 is implemented in the program Prop11MultiplierInversionOp.m, which relies on the function Prop11InvMultOp.m.

function [m,TPsi,M1,M2,M1inv,M2inv,n] = Prop11InvMultOp(c,r,TPhi,TPsi,m,e)

Running the program "Prop11MultiplierInversionOp.m'', the user will be required to enter the following parameters (which are the input-parameters for the function "Prop11InvMultOp.m"):
c, r, TPhi, m, e - like the ones in "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).
TPsi - the synthesis matrix (rxc) of an aproximate dual $\Psi$ of the frame $\Phi$.

Note:
- using the entered TPhi and TPsi, the program checks whether $\Psi$ is an approximate dual of $\Phi$ and, if not, replaces $\Psi$ with the canonical dual of $\Phi$;
- after that, the program checks whether $\Phi$, $\Psi$, and m satisfy the assumptions of Prop. 11 and, if not, adjusts m.

The output parameters of the program "Prop11MultiplierInversionOp.m'': The output parameters of ``Prop9MultiplierInversionOp.m'':
m - the symbol of the multiplier,
and the rest are like the output parameters of "Prop8MultiplierInversionOp.m'' (see above, the implementation of Proposition 8(a)).

A demo file (applying Prop11InvMultOp.m with concrete parameters) is available in the code Prop11InvMultOpRun.m.
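
To make the first note above concrete, here is a minimal Matlab sketch (not the authors' code) of one plausible approximate-dual check, assuming the common criterion $\|Id - T_\Phi T_\Psi^*\| < 1$ and the canonical dual as the fallback:

c = 8; r = 4;                         % small random example
TPhi = randn(r,c);                    % synthesis matrix of Phi
TPsi = TPhi + 0.3*randn(r,c);         % candidate approximate dual Psi
S = TPhi*TPhi';                       % frame operator of Phi (r x r)
if norm(eye(r) - TPhi*TPsi') >= 1     % Psi is not an approximate dual of Phi
    TPsi = S\TPhi;                    % replace by the canonical dual S^{-1}*TPhi
end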

 

III. Fig. 2 in the paper and the script used to generate it (the convergence rate of the algorithm in II.1.(c) above).

Fig. 2: The convergence rate of Alg. 3, using a base-10 logarithmic scale on the vertical axis and a linear scale on the horizontal axis. The absolute error in each iteration is plotted in red, and the convergence value predicted in Proposition 8 is plotted in blue.

Fig. 2 was produced using the script Prop8InvMultOpGaborPlotFigure.m which involves the function Prop8InvMultOpGaborForFigure.m.
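
A hedged stand-in for such a script (reusing M1, r, and lam from the sketch in II.1.(a)) records the error of each iteration and plots it on a semilogarithmic axis:

X = lam*M1';                          % initial approximation, as above
err = zeros(1,50);
for k = 1:50
    X = X + lam*M1'*(eye(r) - M1*X);
    err(k) = norm(eye(r) - M1*X);     % absolute error in each iteration
end
semilogy(1:50,err,'r')                % base-10 log scale on the vertical axis
xlabel('iteration step'); ylabel('absolute error')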

 

References:

  • [1] D. T. Stoeva and P. Balazs, "On the unconditional convergence and invertibility of multipliers", arXiv.
  • [2] Z. Průša, P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Balazs, "The Large Time-Frequency Analysis Toolbox 2.0". In: Aramaki M., Derrien O., Kronland-Martinet R., Ystad S. (eds) Sound, Music, and Motion. CMMR 2013. Lecture Notes in Computer Science, vol 8905. Springer, Cham, (2014).

This is the companion webpage of the manuscript:

Audlet Filter Banks: A Versatile Analysis/Synthesis Framework using Auditory Frequency Scales

Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdeněk Průša, Piotr Majdak, and Olivier Derrien.

Abstract: Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis-synthesis system is the reconstruction error; it has to be kept to a minimum to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of human auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis-synthesis system for audio applications. The proposed system, named Audlet, is based on an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing.
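
For a quick hands-on impression of the framework, here is a minimal Matlab sketch assuming a recent LTFAT, whose audfilters routine implements this type of auditory filter bank; the calling pattern follows the LTFAT documentation and may vary between versions, and the settings are illustrative rather than those of the paper's experiments:

[f, fs] = gspi;                            % test signal shipped with LTFAT
Ls = numel(f);
[g, a] = audfilters(fs, Ls, 'fractional'); % filters spaced on an auditory (ERB) scale
c = filterbank(f, {'realdual', g}, a);     % analysis with the dual filter bank
fr = 2*real(ifilterbank(c, g, a));         % synthesis with g itself
norm(f - fr(1:Ls))/norm(f)                 % relative reconstruction error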

Sound examples for the source separation experiment: click on a system's acronym to hear the corresponding reconstruction.
Reference signals: original mixture -- target

Rt  | β = 1                             | β = 1/6                           | 1024-point STFT
1.1 | trev_gfb, Audlet_gfb, Audlet_hann | trev_gfb, Audlet_gfb, Audlet_hann | STFT_hann
1.5 | trev_gfb, Audlet_gfb, Audlet_hann | trev_gfb, Audlet_gfb, Audlet_hann | STFT_hann
4.0 | trev_gfb, Audlet_gfb, Audlet_hann | trev_gfb, Audlet_gfb, Audlet_hann | STFT_hann


Spatial hearing is important for continuously monitoring the environment for interesting or dangerous sounds and for directing attention to them. The spatial separation of the two ears and the complex geometry of the human body provide acoustic information about the location of a sound source. Depending on the direction of sound incidence, the pinna in particular modifies the sound spectrum before the sound reaches the eardrum. Since the shape of the pinna is highly individual (even more so than a fingerprint), its spectral filtering is also highly individual. To artificially create realistic auditory percepts, this individuality must be represented as precisely as necessary, but it has not yet been established what is really necessary. SpExCue therefore searched for electrophysiological measures and predictive models that can capture how spatially realistic ("externalized") a virtual source is perceived to be.

Since artificial sources are preferentially perceived inside the head, the investigation of these sound spectra was also well suited to studying a bias in auditory perception: sound events approaching the listener are perceived more intensely than those receding from the listener. Earlier studies demonstrated this bias exclusively via loudness changes (increasing/decreasing loudness was used to simulate approaching/receding sound events). It was therefore unclear whether the bias really reflects perceptual differences with respect to the direction of motion or merely the different loudness levels. Our study demonstrated that spatial changes in timbre can evoke this bias (behaviorally and electrophysiologically) even at constant loudness, so that a general perceptual bias can be assumed.

Furthermore, SpExCue investigated how the combination of different spatial auditory cues influences attentional control in a speech recognition task with simultaneous talkers, such as at a cocktail party. We found that natural combinations of spatial auditory cues evoke more brain activity in preparation for the test signal, thereby optimizing the neural processing of the speech to be followed.

SpExCue also compared different computational modeling approaches that aim to predict the spatial perception of sound changes. Although many previous experimental results could be predicted by at least one of the modeling approaches, none of them could explain all of these results. To support the future development of more general computational models of spatial hearing, we finally developed a conceptual cognitive model for it.

Funding

Erwin Schrödinger Fellowship from the Austrian Science Fund (FWF, J3803-N30), awarded to Robert Baumgartner. Duration: May 2016 - November 2017.

Follow-up funding provided by Facebook Reality Labs since March 2018. Principal Investigator: Robert Baumgartner.

Publications

  • Baumgartner, R., Reed, D.K., Tóth, B., Best, V., Majdak, P., Colburn H.S., Shinn-Cunningham B. (2017): Asymmetries in behavioral and neural responses to spectral cues demonstrate the generality of auditory looming bias, in: Proceedings of the National Academy of Sciences of the USA 114, 9743-9748. (article)
  • Baumgartner, R., Majdak, P., Colburn H.S., Shinn-Cunningham B. (2017): Modeling Sound Externalization Based on Listener-specific Spectral Cues, presented at: Acoustics ‘17 Boston: The 3rd Joint Meeting of the Acoustical Society of America and the European Acoustics Association. Boston, MA, USA. (conference)
  • Deng, Y., Choi, I., Shinn-Cunningham, B., Baumgartner, R. (2019): Impoverished auditory cues limit engagement of brain networks controlling spatial selective attention, in: Neuroimage 202, 116151. (article)
  • Baumgartner, R., Majdak, P. (2019): Predicting Externalization of Anechoic Sounds, in: Proceedings of ICA 2019. (proceedings)
  • Majdak, P., Baumgartner, R., Jenny, C. (2019): Formation of three-dimensional auditory space, in: arXiv:1901.03990 [q-bio]. (preprint)

This page provides resources for the research article:

"Frame Theory for Signal Processing in Psychoacoustics"

by Peter Balazs, Nicki Holighaus, Thibaud Necciari, and Diana Stoeva

to appear in the book "Excursions in Harmonic Analysis" published by Springer.

Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one hand, the basic concepts of frame theory are presented, and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for scientists in audio signal processing. On the other hand, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.

The present ZIP archive contains Matlab/Octave scripts that allow one to reproduce the results presented in Figures 7, 10, and 11 of the article.

IMPORTANT NOTE: The Matlab/Octave toolbox Large Time-Frequency Analysis (LTFAT, version 1.2.0 and above) must be installed to run the codes. This toolbox is freely available at Sourceforge.

If you encounter any issue with the files, please do not hesitate to contact the authors.