Integration of Automatic Frequency Range Recognition and Multichannel Direction Estimation for Few-Sample Birdsong Event Detection

Black bulbul, a common bird on campus. Left: 15-second spectrogram of a black bulbul vocalization, recorded along a hiking trail at NTHU. The red box marks the automatically detected frequency range of the bird's vocalization. Combined with the few-shot event detection methods developed by our lab, the system can continuously monitor bird calls in the wild. Right: A black bulbul seen on the west side of the NTHU campus (photo taken on June 18, 2022 by Prof. Yi-Wen Liu).

Sustainable Development Goals

Abstract/Objectives

Birds are vital to ecosystems, and their sounds serve as important indicators of environmental change and biodiversity. Traditional monitoring of bird vocalizations relies on manual methods, which are labor-intensive and perform poorly in complex sound environments. This study presents an approach that uses multichannel audio signal processing to automate bird sound detection and analysis. Bird calls are recorded in natural settings and annotated to create a labeled dataset for model training. The research also introduces a technique that combines direction estimation with spatial clustering to identify and group multiple simultaneous bird sounds, which helps in understanding species distribution. Additionally, the study employs a few-shot learning framework to cope with limited labeled data, and proposes a frequency selection strategy that improves bird classification by focusing on relevant frequency bands while excluding non-discriminative ones.
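As a rough illustration of how direction estimation and spatial clustering can fit together, the sketch below estimates the time delay between two microphones with GCC-PHAT, converts it to an arrival angle, and groups per-frame angles into clusters. The summary does not name the algorithms used in this study, so GCC-PHAT, the gap-based clustering rule, the two-microphone far-field geometry, and all function names and parameter values here are assumptions chosen for illustration only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C


def gcc_phat_delay(x, y, sample_rate, max_delay):
    """Estimate the delay of y relative to x in seconds (positive if y lags x)."""
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = max(1, int(max_delay * sample_rate))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate([cc[-max_shift:], cc[: max_shift + 1]])
    shift = int(np.argmax(cc)) - max_shift
    return shift / sample_rate


def delay_to_angle(delay, mic_spacing):
    """Convert a delay to an arrival angle in degrees (far-field, two-mic assumption)."""
    ratio = np.clip(SPEED_OF_SOUND * delay / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))


def cluster_angles(angles_deg, max_gap_deg=10.0):
    """Group sorted angles, starting a new cluster wherever the gap exceeds max_gap_deg."""
    angles = np.sort(np.asarray(angles_deg, dtype=float))
    if angles.size == 0:
        return []
    splits = np.where(np.diff(angles) > max_gap_deg)[0] + 1
    return [float(group.mean()) for group in np.split(angles, splits)]


# Synthetic check: the second channel is the first one delayed by 10 samples.
rng = np.random.default_rng(0)
sr = 48000            # hypothetical sample rate in Hz
spacing = 0.2         # hypothetical microphone spacing in metres
x = rng.standard_normal(4096)
y = np.concatenate([np.zeros(10), x[:-10]])

delay = gcc_phat_delay(x, y, sr, max_delay=spacing / SPEED_OF_SOUND)
angle = delay_to_angle(delay, spacing)
print(f"estimated delay: {delay * sr:.0f} samples, angle: {angle:.1f} degrees")

# Hypothetical per-frame angle estimates from two birds in different directions.
centers = cluster_angles([-41.0, -39.0, -40.0, 20.0, 22.0, 21.0])
print("cluster centers (degrees):", centers)
```

In this simplified picture, the number of clusters stands in for the number of calling individuals, and each cluster center for a bearing. A real array with more than two microphones would resolve direction in two or three dimensions rather than a single angle.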

Results/Contributions

Birds play a crucial role in ecosystems, and their vocal behavior is often regarded as an indicator of environmental change, biodiversity, and species activity, making them important subjects for ecological monitoring and conservation research. Traditional bird sound monitoring has relied primarily on manual listening and single-channel recording devices. In complex soundscapes, such as multiple birds calling simultaneously or ambient noise interference, these methods suffer from difficult identification, high labor costs, and a lack of spatial information.

This study introduces multichannel audio signal processing, using the directionality and localization capabilities of spatial audio to automate the detection and analysis of bird sounds. Bird vocalizations were recorded in natural environments to create a dataset, and the bird call segments were manually annotated with time labels to serve as the foundational data for model training and evaluation.

For recordings that contain multiple sound sources, the study further proposes a method that integrates direction estimation and spatial clustering to separate and group the directional information of multiple concurrently calling birds, and thereby to estimate the likely number of individuals present in the field and their spatial distribution.

In the design of the bird sound recognition model, this research adopts a few-shot learning framework to address the scarcity of labeled data that is common in traditional monitoring systems. It also presents a frequency range selection strategy: by analyzing how bird vocalizations are distributed across the spectrum, the feature extraction stage focuses on the frequency bands that carry discriminative information and excludes non-discriminative ones. This strategy improves the performance of few-shot learning in bird classification tasks.
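To make the frequency range selection idea concrete, the sketch below picks the narrowest contiguous band that holds most of the above-noise spectral energy in a recording. This is one simple way such a band could be found, not the procedure used in this study; the function name, the median-based noise floor, the 90% energy threshold, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np


def estimate_frequency_range(signal, sample_rate, frame_len=1024, hop=512,
                             energy_fraction=0.9):
    """Return (f_low, f_high) in Hz for the narrowest contiguous band that
    contains `energy_fraction` of the spectral energy above the noise floor."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape (n_frames, n_bins)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    # Crude stationary-noise estimate: the per-bin median over time.
    excess = np.clip(power - np.median(power, axis=0), 0.0, None)
    profile = excess.sum(axis=0)  # above-noise energy per frequency bin

    cumulative = np.concatenate([[0.0], np.cumsum(profile)])
    total = cumulative[-1]
    if total <= 0.0:
        return float(freqs[0]), float(freqs[-1])  # nothing stands out; keep all

    target = energy_fraction * total
    best_lo, best_hi = 0, len(profile) - 1
    for lo in range(len(profile)):
        # Smallest end index whose cumulative energy reaches the target.
        end = int(np.searchsorted(cumulative, cumulative[lo] + target, side="left"))
        if end > len(profile):
            break  # not enough energy left above this starting bin
        if (end - 1) - lo < best_hi - best_lo:
            best_lo, best_hi = lo, end - 1
    return float(freqs[best_lo]), float(freqs[best_hi])


# Synthetic example: a 2-4 kHz sweep embedded in low-level white noise.
rng = np.random.default_rng(0)
sr = 22050  # hypothetical sample rate in Hz
t = np.arange(sr) / sr  # one second
sweep = np.sin(2 * np.pi * (2000.0 * t + 0.5 * 2000.0 * t ** 2))
recording = 0.01 * rng.standard_normal(3 * sr)
recording[sr: 2 * sr] += sweep

low, high = estimate_frequency_range(recording, sr)
print(f"selected band: {low:.0f} Hz to {high:.0f} Hz")
```

Features for a few-shot classifier could then be computed only from spectrogram rows that fall inside the selected band, which is the sense in which non-discriminative frequency regions are excluded.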

Keywords

birds, ecosystem, sound monitoring, environmental changes, biodiversity, automated detection, spatial audio, multi-channel, labeled data, few-shot learning, frequency range