About me
I'm a research engineer and creative technologist based in Thuringia, Germany. I studied media engineering (B.Eng.) at the Hochschule der Medien in Stuttgart and shifted my focus towards machine learning during my master's studies (M.Sc.) at TU Ilmenau.
Publications
-
An Introduction to Unsupervised Domain Adaptation in Sound and Music Processing
Abstract
Common machine learning models require large amounts of training data with samples representing the intended application scenario. However, these models often do not generalize well to novel data distributions caused by variations of the expected conditions. Such a lack of robustness can lead to a significant decrease in model performance. This issue is known as domain shift; for audio data, it can be caused by deviations in microphone characteristics or acoustic environments between data from the source domain (training data) and the target domain (test data). Unsupervised domain adaptation (UDA) aims to restore model performance by transferring knowledge from labeled samples of the source domain to unlabeled samples of a related target domain. We first provide an overview of the basics and general approaches of UDA. Then, we study UDA for two audio analysis tasks: sound event detection (SED) and automatic music transcription (AMT) of piano music. Our results show that domain shift caused by microphone mismatch has a greater impact on model performance for SED than for AMT. As a possible cause, we suspect that while SED analyzes the full spectral envelope, AMT examines only the harmonic peaks, whose positions are less affected by domain shift.
-
Multi-pitch Estimation meets Microphone Mismatch: Applicability of Domain Adaptation
Abstract
The performance of machine learning (ML) models is known to be affected by discrepancies between training (source) and real-world (target) data distributions. This problem is referred to as domain shift and is commonly approached using domain adaptation (DA) methods. As one relevant scenario, automatic piano transcription algorithms in music learning applications potentially suffer from domain shift since pianos are recorded in different acoustic conditions using various devices. Yet, most currently available datasets for piano transcription only cover ideal recording situations with high-quality microphones. Consequently, a transcription model trained on these datasets will face a mismatch between source and target data in real-world scenarios. To address this issue, we employ a recently proposed dataset which includes annotated piano recordings covering typical real-life recording settings for a piano learning application on mobile devices. We first quantify the influence of the domain shift on the performance of a deep learning-based piano multi-pitch estimation (MPE) algorithm. Then, we employ and evaluate four unsupervised DA methods to reduce domain shift. Our results show that the studied MPE model is surprisingly robust to domain shift in microphone mismatch scenarios and the DA methods do not notably improve the transcription performance.
-
A Benchmark Dataset to Study Microphone Mismatch Conditions for Piano Multipitch Estimation on Mobile Devices
Abstract
In this paper, we present the IDMT-PIANO-MM dataset, which allows piano transcription algorithms to be evaluated under microphone mismatch conditions. In particular, we discuss specific constraints that these algorithms face when used in music learning applications on mobile devices. Then, we describe the dataset w.r.t. recording locations and devices as well as the recorded music pieces. We intend this dataset to be a public benchmark for evaluating the robustness of AI-based MPE models under realistic microphone mismatch conditions, which are to be expected given the large number of potential users of music learning applications.
-
A Novel Dataset for Time-Dependent Harmonic Similarity between Chord Sequences
Abstract
State-of-the-art algorithms in many music information retrieval (MIR) tasks such as chord recognition, multipitch estimation, or instrument recognition rely on deep learning algorithms, which require large amounts of data to be trained and evaluated. In this paper, we present the IDMT-SMT-CHORD-SEQUENCES dataset, which is a novel synthetic dataset of 15,000 chord progressions played on 45 different musical instruments. The dataset is organized in a triplet fashion and each triplet includes one "anchor" chord sequence as well as one corresponding similar and dissimilar chord progression. The audio files are synthesized from MIDI data using FluidSynth with a selected sound font. Furthermore, we conducted a benchmark experiment on time-dependent harmonic similarity based on learnt embedding representations. The results show that a convolutional neural network (CNN), which considers the temporal context of a chord progression, outperforms a simpler approach based on temporal averaging of input features.
-
Comparison of material models in modern physically based rendering pipelines
Abstract
The appearance of materials results from a complex interaction of light, material properties and the geometric shape of an object. In computer graphics, various models were developed to describe these relationships. Modern rendering pipelines commonly adopt the philosophy of physically based rendering (PBR). This study examines whether the reproduction of materials differs across modern PBR tools, and compares the intuitiveness of material design as well as the quality and range of reproducible materials. A sequential rendering framework was developed to evaluate the visual influences of four selected parameters on material appearance. The rendered images are qualitatively compared based on material charts, scanline plots and difference images. The examined rendering tools mostly yield similar results, with the main differences caused by disparate rendering methods. Still, subtle variations between the tools are noticeable, indicating the individual strengths and flaws of each renderer in terms of intuitiveness and physical accuracy.
Interactive Installations
-
Applied Magic
2019 - 2020
An interactive installation celebrating the 40th anniversary of the Audiovisual Media course at Stuttgart Media University.
-
Schlemmer x Beats
2019 - 2020
An interactive Techno Art Club at the Stuttgarter Staatsgalerie based on Oskar Schlemmer's "Triadic Ballet".
Games
-
Of Ships & Scoundrels
2018 - 2021
A pirate strategy game by KORION Interactive. I contributed as a game developer.
-
Fynn's Journey: Heading North
2019 - 2020
A side-scrolling adventure game that I assisted on as a developer and technical artist.
-
Fermata
2019
Our result from the Music Game Jam at ITFS 2019 in cooperation with the Stuttgart Chamber Orchestra.
-
(H) - The Narrative Exploration Game
2018
An atmospheric psycho-horror game about getting home.