S&T cooperation project 'Amadee' Austria-France 2013-14, "Frame Theory for Sound Processing and Acoustic Holophony", FR 16/2013

Project Partner: The Institut de recherche et coordination acoustique/musique (IRCAM)

The FWF project "Time-Frequency Implementation of HRTFs" has started.

Principal Investigator: Damian Marelli

Co-Applicants: Peter Balazs, Piotr Majdak


In signal processing, synthesis is as important as analysis, especially when data are modified. For the Short-Time Fourier Transform (STFT), synthesis is often done with a simple overlap-add (OLA), that is, by summing the outputs of the filters. In some cases, for example in the phase vocoder, the output is additionally re-weighted with the analysis window. It is often presumed that this gives satisfactory results with standard windows.

Gabor frame theory, on the other hand, provides a well-known construction of synthesis windows (dual windows) that guarantees perfect reconstruction. However, this method is rarely used in signal processing algorithms.

In this project, we will systematically investigate whether, and for which parameters, OLA synthesis with the original window gives good reconstruction. We will compare it to reconstruction with the dual window, which we introduce and motivate as perfect reconstruction overlap-add (PROLA). We will show that this method is always preferable to the others and that it can be calculated very efficiently.

This is currently being implemented in STx. There the phase vocoder will have the option to guarantee perfect reconstruction, either with dual or tight windows.
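The difference between OLA with the original window and OLA with a dual window can be illustrated with a minimal NumPy sketch. It is not the STx implementation. It assumes the simplest ("painless") setting, in which the FFT length equals the window length, so the dual window is the analysis window divided by the sum of its squared, shifted copies. The function names, the periodic Hann window, the hop size, and the circular signal extension are all choices made for this illustration.

```python
import numpy as np

def painless_dual_window(g, hop):
    """Dual window when the FFT length equals the window length."""
    n = len(g)
    if n % hop:
        raise ValueError("window length must be a multiple of the hop size")
    # Sum of the squared, hop-shifted copies of the window (periodic in hop).
    overlap = (g.reshape(-1, hop) ** 2).sum(axis=0)
    return g / np.tile(overlap, n // hop)

def stft(x, g, hop):
    """Circular STFT: one FFT of length len(g) per hop position."""
    starts = np.arange(0, len(x), hop)
    idx = (starts[:, None] + np.arange(len(g))[None, :]) % len(x)
    return np.fft.fft(x[idx] * g, axis=1), idx

def overlap_add(coeffs, window, idx, length):
    """Inverse FFT per frame, re-weight with `window`, then overlap-add."""
    frames = np.fft.ifft(coeffs, axis=1).real * window
    y = np.zeros(length)
    np.add.at(y, idx, frames)  # accumulates overlapping frames
    return y

n_win, hop = 64, 32
g = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_win) / n_win)  # periodic Hann
gd = painless_dual_window(g, hop)

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
coeffs, idx = stft(x, g, hop)

# Re-weighting with the analysis window leaves an amplitude modulation,
# re-weighting with the dual window reconstructs the signal.
err_same = np.max(np.abs(overlap_add(coeffs, g, idx, len(x)) - x))
err_dual = np.max(np.abs(overlap_add(coeffs, gd, idx, len(x)) - x))
print(f"OLA with analysis window: max error {err_same:.3e}")
print(f"OLA with dual window:     max error {err_dual:.3e}")
```

With this window and hop, the squared windows do not sum to a constant, so the first variant cannot reconstruct the signal exactly. For other parameter choices the two variants may coincide up to a scaling factor.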

Department of Mathematics, University of Wisconsin-Eau Claire

The Short-Time Fourier Transform (STFT), in its sampled version (the Gabor transform), is a well-known and valuable tool for displaying the energy distribution of a signal over the time-frequency plane. The equivalence between Gabor analysis and certain filter banks is also well established. The main task is to find, in a numerically efficient way, a Gabor analysis-synthesis system with perfect (or, depending on the application, satisfactorily accurate) reconstruction. This is done by using the dual Gabor frame, which requires inverting the Gabor frame operator.

This project incorporates an application of the general idea of preconditioning in the context of Gabor frames. While most (iterative) algorithms aim at a relatively costly exact numeric calculation of the inverse Gabor frame matrix, we will use a "cheap method" to find an approximation. The inexpensive method will be based on (double) preconditioning using diagonal and circulant preconditioners. As a result, good approximations of the true dual Gabor atom can be obtained at low computational costs.
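The idea of approximating the dual atom with cheap preconditioners can be sketched for a small periodic example. The following NumPy code is an illustration under stated assumptions, not the algorithm of the cited papers: the lattice parameters, the Gaussian window, the symmetric scaling with the diagonal part, and all function names are chosen for this example. It builds the frame operator matrix explicitly (something an efficient method would avoid) and compares a diagonal, a circulant, and a combined approximation of its inverse with the exact dual atom.

```python
import numpy as np

def frame_operator(g, a, M):
    """Gabor frame operator matrix for a real window g (time step a, M channels).

    Summing the modulations leaves only entries whose row and column index
    differ by a multiple of M (Walnut representation).
    """
    L = len(g)
    shifts = np.array([np.roll(g, n * a) for n in range(L // a)])
    corr = shifts.T @ shifts  # corr[t, s] = sum_n g[t - n a] g[s - n a]
    t = np.arange(L)
    return M * corr * ((t[:, None] - t[None, :]) % M == 0)

def circulant_first_column(A):
    """First column of the circulant matrix closest to A (diagonal means)."""
    i = np.arange(A.shape[0])
    return np.array([A[(i + k) % len(i), i].mean() for k in range(len(i))])

def apply_circulant_inverse(c, x):
    """Solve C y = x for the circulant matrix C with first column c."""
    return np.fft.ifft(np.fft.fft(x) / np.fft.fft(c)).real

L, a, M = 144, 12, 16  # hypothetical example lattice with redundancy M / a
t = np.arange(L)
g = np.exp(-np.pi * np.minimum(t, L - t) ** 2 / (a * M))  # periodic Gaussian
g /= np.linalg.norm(g)

S = frame_operator(g, a, M)
dual_true = np.linalg.solve(S, g)  # exact canonical dual atom (costly route)

d = np.diag(S)
dual_diag = g / d  # diagonal preconditioner only
dual_circ = apply_circulant_inverse(circulant_first_column(S), g)

# Combined: scale with the diagonal first, then invert the circulant part.
s = 1.0 / np.sqrt(d)
A = s[:, None] * S * s[None, :]
dual_double = s * apply_circulant_inverse(circulant_first_column(A), s * g)

def rel_err(v):
    return np.linalg.norm(v - dual_true) / np.linalg.norm(dual_true)

errors = {"diagonal": rel_err(dual_diag),
          "circulant": rel_err(dual_circ),
          "double": rel_err(dual_double)}
for name, err in errors.items():
    print(f"{name:9s} relative error of approximate dual: {err:.3e}")
```

Which of the three approximations is most accurate depends on the window and the lattice, so the printed errors should be read as an experiment for one parameter set rather than a general ranking.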

For a number of applications, such as time stretching without changing the frequency content in audio processing or more complex modifications like psychoacoustical masking, the time domain signal needs to be reconstructed using the time-frequency domain coefficients.

H. G. Feichtinger et al., NuHAG, Faculty of Mathematics, University of Vienna

- P. Balazs, H.G. Feichtinger, M. Hampejs, G. Kracher; "Double Preconditioning for Gabor Frames"; IEEE Transactions on Signal Processing, Vol. 54, No.12, December 2006 (2006), preprint
- P. Balazs, H.G. Feichtinger, M. Hampejs, G. Kracher; "Double Preconditioning for the Gabor Frame Operator"; Proceedings ICASSP '06, May 14-19, Toulouse, DVD (2006)

This project is part of a project cluster that investigates time-frequency masking in the auditory system, in cooperation with the Laboratory for Mechanics and Acoustics / CNRS Marseille. While other subprojects study the spread of masking across the time-frequency plane using Gaussian-shaped tones, this subproject investigates how multiple Gaussian maskers distributed across the time-frequency plane create masking that adds up at a given time-frequency point. This question is important in determining the total masking effect resulting from the multiple time-frequency components (that can be modeled as Gaussian Atoms) of a real-life signal.

Both the maskers and the target are Gaussian-shaped tones with a frequency of 4 kHz. A two-stage approach is applied to measure the additivity of auditory masking. In the first stage, the levels of the maskers are adjusted to cause the same amount of masking in the target. In the second stage, various combinations of those maskers are tested to study their additivity.

In the first study, the maskers are spread either in time OR in frequency. In the second study, the maskers are spread in time AND in frequency.

New insight into the coding of sound in the auditory system could help to design more efficient audio codecs. These codecs could take the additivity of time-frequency masking into account.

WTZ (project AMADEUS)

- Laback, B., Balazs, P., Toupin, G., Necciari, T., Savel, S., Meunier, S., Ystad, S., Kronland-Martinet, R. (2008). Additivity of auditory masking using Gaussian-shaped tones, presented at Acoustics'08 conference, Paris.

A Gaussian atom is well suited as an ideal atom for the time-frequency representation of human auditory perception. This is not only because of the Gaussian atom's special mathematical properties, but also because of results from existing psychoacoustic studies. Developing a time-frequency masking model requires testing the time-frequency masking effects of this atom. So far, short band-limited signals have not been investigated in masking experiments, and relatively few psychoacoustic experiments have examined combined time-frequency masking effects.

In cooperation with the Laboratory for Mechanics and Acoustics / CNRS Marseille, an experimental protocol was developed for testing the time-frequency masking properties of a single Gaussian atom. Experiments were first conducted in 2006 and gave the first results concerning the hearing threshold and the existence of such a signal. The experiments on the masking threshold began as a PhD project in Marseille before the end of 2006.

Efficient implementation of a masking filter offers many applications:

- Sound / Data Compression
- Sound Design
- Back-and-Foreground Separation
- Optimization of Speech and Music Perception

After completing the testing phase, the algorithms are to be implemented in S_TOOLS-STx.

- Amadée: Time Frequency Representations and Auditory Perception
- Cotutelle de thèse
- Experiments studying additivity of masking for multiple maskers

WTZ (project AMADEUS)

- Laback, B., Balazs, P., Toupin, G., Necciari, T., Savel, S., Meunier, S., Ystad, S., Kronland-Martinet, R. (2008). Additivity of auditory masking using Gaussian-shaped tones, presented at Acoustics'08 conference.

The identification of the parameters of the vocal tract system can be used for speaker identification.

A preferred speech coding technique is the so-called Model-Based Speech Coding (MBSC), which involves modeling the vocal tract as a linear time-variant system (synthesis filter). The system's input is either white noise or a train of impulses. For coding purposes, the synthesis filter is assumed to be time-invariant during a short time interval (time slot) of typically 10-20 msec. Then, the signal is represented by the coefficients of the synthesis filter corresponding to each time slot.

A successful MBSC method is the so-called Linear Prediction Coding (LPC). Roughly speaking, the LPC technique models the synthesis filter as an all-pole linear system. This all-pole linear system has coefficients obtained by adapting a predictor of the output signal, based on its own previous samples. The use of an all-pole model provides a good representation for the majority of speech sounds. However, the representation of nasal sounds, fricative sounds, and stop consonants requires the use of a zero-pole model. Also, the LPC technique is not adequate when the voice signal is corrupted by noise.
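As an illustration of the all-pole baseline described above, the following NumPy sketch estimates predictor coefficients with the autocorrelation (Yule-Walker) method. The text does not say which LPC variant is meant, so this choice is an assumption, as are the function name and the synthetic test signal. The sketch does not cover the zero-pole case or noisy signals.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """All-pole predictor coefficients from the autocorrelation method.

    Returns a such that x[n] is predicted by sum_k a[k] * x[n - 1 - k].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Synthetic "time slot": white noise driving a known stable all-pole filter.
rng = np.random.default_rng(1)
true_coeffs = np.array([1.3, -0.6])
excitation = rng.standard_normal(20000)
x = np.zeros_like(excitation)
for n in range(len(x)):
    x[n] = excitation[n]
    if n >= 1:
        x[n] += true_coeffs[0] * x[n - 1]
    if n >= 2:
        x[n] += true_coeffs[1] * x[n - 2]

estimated = lpc_autocorrelation(x, order=2)
print("true coefficients:     ", true_coeffs)
print("estimated coefficients:", estimated)
```

For real speech, the same function would be applied to each short time slot, typically after windowing, and with a higher model order.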

We propose a numerically efficient method for estimating a zero-pole model. It provides the synthesis filter coefficients that are optimal with respect to a logarithmic criterion.

In order to evaluate the perceptual relevance of the proposed method, we used the model estimated from a speech signal to re-synthesize it.

- D. Marelli, P. Balazs, "On Pole-Zero Model Estimation Methods Minimizing a Logarithmic Criterion for Speech Analysis", IEEE Transactions on Audio, Speech and Language Processing, Vol. 18 (2), pp. 237 - 248 (2010)

Gabor multipliers are an efficient time-variant filtering tool used implicitly in many engineering applications in signal processing. For these operators, the result of a Gabor transform (the sampled version of the Short Time Fourier Transform) is multiplied by a fixed function (the time-frequency mask or symbol). The result is then synthesized.
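The three steps of a Gabor multiplier (analysis, multiplication by the symbol, synthesis) can be sketched in NumPy as follows. This is a minimal illustration under assumptions that are not taken from the text: a sine window with 50% overlap (so that the same window can be used for synthesis), circular signal extension, and a simple low-pass symbol. The function name and all parameter values are hypothetical.

```python
import numpy as np

def gabor_multiplier(x, symbol, n_win):
    """Apply a Gabor multiplier: analyse, multiply by `symbol`, synthesise.

    `symbol` has shape (len(x) // (n_win // 2), n_win): one row per time
    position, one column per frequency channel.
    """
    hop = n_win // 2
    # Sine window: squared copies shifted by n_win / 2 sum to one.
    g = np.sin(np.pi * np.arange(n_win) / n_win)
    starts = np.arange(0, len(x), hop)
    idx = (starts[:, None] + np.arange(n_win)[None, :]) % len(x)
    coeffs = np.fft.fft(x[idx] * g, axis=1)        # Gabor analysis
    coeffs = coeffs * symbol                       # time-frequency mask
    frames = np.fft.ifft(coeffs, axis=1).real * g  # synthesis, same window
    y = np.zeros(len(x))
    np.add.at(y, idx, frames)                      # overlap-add
    return y

n_win, length = 64, 512
t = np.arange(length)
low = np.cos(2 * np.pi * 4 * t / n_win)    # component to keep
high = np.cos(2 * np.pi * 24 * t / n_win)  # component to remove
x = low + high

n_frames = length // (n_win // 2)
bins = np.fft.fftfreq(n_win) * n_win       # signed channel indices
lowpass = (np.abs(bins) <= 12).astype(float)
symbol = np.tile(lowpass, (n_frames, 1))   # could also vary from row to row

y_identity = gabor_multiplier(x, np.ones((n_frames, n_win)), n_win)
y_lowpass = gabor_multiplier(x, symbol, n_win)
rel_err = np.linalg.norm(y_lowpass - low) / np.linalg.norm(low)
print(f"identity symbol, max error: {np.max(np.abs(y_identity - x)):.3e}")
print(f"low-pass symbol, relative deviation from low tone: {rel_err:.3e}")
```

The symbol used here is constant over time, which reduces the multiplier to an ordinary filter. A time-variant filter is obtained by letting the rows of `symbol` differ.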

Transforms other than the Gabor transform, particularly the wavelet transform, are more suitable for certain applications. The concept of multipliers can easily be extended in this case. This results in the introduction of operators called wavelet multipliers, which will be investigated in detail in this project. The project aims to precisely define wavelet multipliers' mathematical properties and optimize their use in applications.

The problem will be approached using modern wavelet theory, harmonic analysis tools, and numerical tools. Analytical statements will be formulated and proved in parallel with systematic numerical experiments, in order to study the properties of wavelet multipliers.

The following topics will be investigated in the project:

- Eigenvalues and eigenvectors of wavelet multipliers
- Invertibility and injectivity of wavelet multipliers
- Reproducing kernel invariance
- Discretization and implementation of wavelet multipliers
- Best approximation of operators by wavelet multipliers and identification of wavelet multipliers

The applications of wavelet multipliers in signal processing are numerous and include any application requiring time-variant filtering. Some applications of wavelet multipliers will be investigated further in the parallel projects:

- Mathematical Modeling of Auditory Time-Frequency Masking Functions
- Improvement of Head-Related Transfer Function Measurements
- Advanced Method of Sound Absorption Measurements

- Anaïk Olivero: "Expérimentation des multiplicateurs temps-échelle" (On the time-scale multipliers) Master thesis under the supervision of R. Kronland-Martinet and B. Torrésani, June 2008

Measuring sound absorption is essential to performing acoustic measurements and experiments under controlled acoustic conditions, especially considering the acoustic influence of room boundaries.

So-called "in-situ" methods allow measurement of the reflection and absorption coefficients under real conditions in a single measurement procedure. The proposed method captures the direct signal and the reflections in one measurement. These reflections include not only the reflection of interest, coming from the tested surface, but also others from the surroundings. To isolate the reflection from the tested surface, the influence of the direct signal and of the other reflections must be cancelled.

One known separation method uses a time-windowing technique to separate the direct signal from the reflections. When the impulse responses of the direct signal and the reflections overlap in time, this method is no longer satisfactory, and frequency-dependent windowing is necessary to separate the different parts of the signal. In the wavelet domain, however, the reflection of interest can be observed separately.

The objective of this project is to study how the use of wavelet multipliers could improve the efficiency of the in-situ methods in this context.

A demonstrator system will be built to acquire the necessary measurements for the evaluation of absorption coefficients. This demonstrator will be used to evaluate the usefulness of the new methods in a semi-anechoic room.

A systematic numeric study will be carried out on the acquired signals, in order to manually determine the symbol of a wavelet multiplier for the extraction of the reflected signal. The best parameters for optimal separation will then be investigated. This, in combination with the use of physical models, will help design a semi-automatic method for the calculation of the optimal multiplier symbol.

The improved measurement method will be available for in-situ measurement of reflection and absorption coefficients.

It is known in psychoacoustics that not all information contained in a "real world" acoustic signal is processed by the human auditory system. More precisely, it turns out that some time-frequency components mask (overshadow) other components that are close in time or frequency.

In the software S_TOOLS-STx developed by the Institute, an algorithm based on simultaneous masking has been implemented. This algorithm removes perceptually irrelevant time-frequency components. In this implementation, the model is described as a Gabor multiplier with an adaptive symbol.

In this project, the masking model will be extended to a true time-frequency model, incorporating frequency and temporal masking.

Experiments have been conducted (in cooperation with the Laboratory for Mechanics and Acoustics / CNRS Marseille) to test the time-frequency masking properties of a single Gaussian atom, and to study the additivity of these masking properties for several Gaussian atoms.

The results of these experiments will be used, in combination with theoretical results obtained in the parallel projects studying the mathematical properties of frame multipliers, to approximate or identify the masking model by wavelet and Gabor multipliers.

The obtained model will then be validated by appropriate psychoacoustical experiments.

Efficient implementation of a masking filter offers many applications:

- Sound / Data Compression
- Sound Design
- Back-and-Foreground Separation
- Optimization of Speech and Music Perception

After completing the testing phase, the algorithms are to be implemented in S_TOOLS-STx.

- P. Balazs, B. Laback, G. Eckel, W. Deutsch, "Introducing Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking", IEEE Transactions on Audio, Speech and Language Processing (2009), in press
- B. Laback, P. Balazs, G. Toupin, T. Necciari, S. Savel, S. Meunier, S. Ystad and R. Kronland-Martinet, "Additivity of auditory masking using Gaussian-shaped tones", Acoustics'08, Paris, 29.06.-04.07.2008 (03.07.2008)
- B. Laback, P. Balazs, T. Necciari, S. Savel, S. Ystad, S. Meunier and R. Kronland-Martinet, "Additivity of auditory masking for Gaussian-shaped tone pulses", preprint