Data from multi-modal sensors, such as Red-Green-Blue (RGB) cameras, thermal cameras, microphones, and mmWave radars, have gradually been adopted in various classification problems for better accuracy. Some sensors, like RGB cameras and microphones, however, capture privacy-invasive data, which users are reluctant to contribute to centralized learning. Although the Federated Learning (FL) paradigm frees clients from sharing their sensor data, doing so results in reduced classification accuracy and increased training time. In this article, we introduce a novel Heterogeneous Privacy Federated Learning (HPFL) paradigm that better capitalizes on the less privacy-invasive sensor data, such as thermal images and mmWave point clouds, by uploading them to the server to close the performance gap between FL and centralized learning. HPFL not only allows clients to keep the more privacy-invasive sensor data private, such as RGB images and human voices, but also gives each client total freedom to define their level of privacy concern for each individual sensor modality. For example, more sensitive users may prefer to keep their thermal images private, while others do not mind sharing these images. We carry out extensive experiments to evaluate the HPFL paradigm on two representative classification problems: semantic segmentation and emotion recognition. Several key findings demonstrate the merits of HPFL: (i) compared to FedAvg, it improves foreground accuracy by 18.20% in semantic segmentation and boosts the F1-score by 4.20% in emotion recognition, (ii) with heterogeneous privacy concern levels, it achieves an even larger F1-score improvement of 6.17–16.05% in emotion recognition, and (iii) it also outperforms the state-of-the-art FL approaches by 12.04–17.70% in foreground accuracy and 2.54–4.10% in F1-score. © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Federated Learning Using Multi-Modal Sensors with Heterogeneous Privacy Sensitivity Levels
Abstract/Objectives
This article discusses the challenges of using multi-modal sensors, such as RGB cameras and microphones, in centralized learning due to privacy concerns. To address these issues, the authors propose a new approach called Heterogeneous Privacy Federated Learning (HPFL). This method allows less privacy-invasive data, like thermal images and mmWave point clouds, to be shared with a server while keeping more sensitive data, such as RGB images and audio, private. Users can also define their privacy preferences for each sensor type. Extensive experiments on semantic segmentation and emotion recognition show that HPFL significantly improves classification accuracy, outperforming traditional Federated Learning methods. Specifically, it achieves an 18.20% improvement in foreground accuracy for semantic segmentation and a 4.20% boost in F1-score for emotion recognition, with even greater improvements for users with varying privacy concerns. The results indicate HPFL's effectiveness in balancing data privacy and model performance.
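To make the per-modality privacy idea concrete, the following is a minimal sketch (not the paper's actual implementation; all names such as `PRIVACY_LEVELS` and `partition_modalities` are illustrative) of how a client might declare a privacy level for each sensor modality and upload only the modalities it considers non-sensitive:

```python
# Illustrative sketch of HPFL-style per-modality privacy partitioning.
# A client assigns each sensor modality a privacy level (higher = more
# sensitive) and only uploads data whose level falls at or below a
# client-chosen sharing threshold; the rest stays local for FL training.

# Example client configuration: RGB and audio are privacy-invasive,
# thermal and mmWave are considered shareable by this particular client.
PRIVACY_LEVELS = {
    "rgb": 3,
    "audio": 3,
    "thermal": 1,
    "mmwave": 0,
}

def partition_modalities(levels, share_threshold=1):
    """Split modalities into those uploaded to the server and those kept local."""
    shared = [m for m, lvl in levels.items() if lvl <= share_threshold]
    private = [m for m, lvl in levels.items() if lvl > share_threshold]
    return shared, private

shared, private = partition_modalities(PRIVACY_LEVELS)
print(shared)   # modalities uploaded to the server
print(private)  # modalities kept on-device
```

Because the threshold and levels are set per client, one user could mark thermal images as private (level above the threshold) while another shares them, which is the heterogeneity the paradigm refers to.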
Results/Contributions
Keywords
multimodal sensors; privacy; federated learning; heterogeneous privacy federated learning; experimental results; semantic segmentation; emotion recognition; foreground accuracy; F1-score
Contact Information
Cheng-Hsin Hsu (徐正炘)
chsu@cs.nthu.edu.tw