Integration of Automatic Frequency Range Recognition and Multichannel Direction Estimation for Few-Sample Birdsong Event Detection
Abstract/Objectives
Results/Contributions
Birds play a crucial role in ecosystems, and their vocal behavior is widely regarded as an indicator of environmental change, biodiversity, and species activity, which makes them important subjects for ecological monitoring and conservation research. Traditional bird sound monitoring has relied primarily on manual listening and single-channel (monophonic) recording devices. In complex soundscapes, such as several birds calling simultaneously or calls masked by ambient noise, these methods suffer from difficult identification, high labor costs, and a lack of spatial information. This study introduces multichannel audio signal processing and uses the directional and localization cues provided by spatial audio for the automated detection and analysis of bird sounds. The work also includes recording bird vocalizations in natural environments to build a dataset, with bird call segments manually annotated with time labels to serve as the basis for model training and evaluation. For recordings that contain multiple sound sources, the study further proposes a method that combines direction estimation with spatial clustering to separate and group the directions of concurrently calling birds, thereby estimating the likely number of individuals present in the field and their spatial distribution. In the design of the bird sound recognition model, this research adopts a few-shot learning framework to address the scarcity of labeled data that is common in traditional monitoring systems. It also presents a frequency range selection strategy that analyzes how bird vocalizations are distributed across the spectrum, so that feature extraction focuses on frequency bands with discriminative value and excludes non-discriminative bands.
This strategy effectively enhances the performance of few-shot learning in bird classification tasks.
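The abstract does not specify how the frequency range is chosen, so the following is only a minimal sketch of one plausible approach, not the method used in this study. It assumes that an annotated call segment and a background-only segment are available, compares their average power spectra, and keeps the band that holds most of the energy the call adds over the background. The function names, the `energy_fraction` parameter, and the frame settings are all hypothetical.

```python
import numpy as np


def band_energy_profile(signal, sample_rate, frame_len=1024, hop=512):
    """Average power per frequency bin from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, power.mean(axis=0)


def select_frequency_range(call, background, sample_rate, energy_fraction=0.9):
    """Return (f_low, f_high) in Hz for the band carrying the central
    `energy_fraction` of the power that the call adds over the background.

    Assumption: this excess-energy quantile rule is illustrative only.
    """
    freqs, call_profile = band_energy_profile(call, sample_rate)
    _, background_profile = band_energy_profile(background, sample_rate)

    # Keep only the power that exceeds the background in each bin.
    excess = np.clip(call_profile - background_profile, 0.0, None)
    total = excess.sum()
    if total <= 0:
        raise ValueError("call segment has no energy above the background")

    # Cut equal tails off the cumulative excess-energy distribution.
    cdf = np.cumsum(excess) / total
    tail = (1.0 - energy_fraction) / 2.0
    low_bin = np.searchsorted(cdf, tail)
    high_bin = min(np.searchsorted(cdf, 1.0 - tail), len(freqs) - 1)
    return freqs[low_bin], freqs[high_bin]
```

In a feature extraction step, the returned limits could be used to build a bin mask such as `(freqs >= f_low) & (freqs <= f_high)` and apply it to the spectrogram before it is passed to the few-shot classifier, so that bands dominated by wind or other low-frequency noise are dropped.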