Privacy preserving crowd counting using deep learning on range-doppler maps

(2025)

Files

Hilgers_17742000_2025.pdf
  • Open access
  • Adobe PDF
  • 16.09 MB

Details

Abstract
Estimating crowd density in public spaces is crucial for applications such as event safety, infrastructure planning, and smart building management. However, most current solutions rely on camera-based systems, which raise serious privacy concerns. In this thesis, we explore a privacy-preserving alternative based on Frequency-Modulated Continuous Wave (FMCW) radar. Specifically, we use Range-Doppler Maps (RDMs) as input to a deep learning model that estimates the number of people in a scene without any visual data. We constructed a custom dataset of over 50,000 RDMs, recorded in a real-world indoor environment and annotated using synchronized camera images processed with a people detector based on YOLO, a state-of-the-art object detection algorithm. Crucially, this visual information is used solely for pseudo-labeling and never enters the model during inference. Our proposed method uses a ResNet-18 convolutional neural network trained to predict the number of people in a crowd from a 625-millisecond temporal sequence of signals. To evaluate the generalization capacity of our model, we also collected a second, independent dataset of 30,000 RDMs in a different environment with distinct background dynamics. This dataset served as an out-of-distribution (OOD) generalization test, that is, an evaluation of how well a machine learning model performs on data that differs from the distribution it was trained on, and further validated the robustness of our approach. Experiments show that our approach achieves a mean absolute error of approximately 7% on a first test set recorded in the same location and under similar conditions as the training data, and about 13% on a second, fully independent test set captured in a different environment. The use of temporal stacks enhances robustness to signal noise and transient motion.
Despite challenges such as static individuals being poorly captured by radar and occasional labeling inaccuracies, the model demonstrates strong generalization and practical potential. By combining radar sensing with deep learning, this work contributes a promising, non-intrusive solution for crowd counting that offers a compelling compromise between accuracy and privacy preservation. Finally, we outline research directions that go beyond simple counting and explore richer scene understanding from radar signals.
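
The abstract mentions two mechanical steps that a short example can make concrete: grouping consecutive RDMs into temporal stacks, and scoring predictions with mean absolute error. The following is a minimal sketch under stated assumptions, not the thesis implementation. The frame period of 125 ms (giving five frames per 625 ms stack), the choice to label each stack with the pseudo-label of its last frame, and the names `make_stacks` and `mean_absolute_error` are all hypothetical. The abstract does not state the radar frame rate, the labeling rule for a stack, or how the error is converted to a percentage, so the sketch reports a plain MAE in people.

```python
from typing import List, Sequence, Tuple

# Assumed values for illustration only: the abstract gives the 625 ms window,
# but the frame period below is hypothetical.
WINDOW_MS = 625
FRAME_PERIOD_MS = 125
FRAMES_PER_STACK = WINDOW_MS // FRAME_PERIOD_MS  # 5 frames under this assumption

# One Range-Doppler Map: rows are range bins, columns are Doppler bins.
Rdm = List[List[float]]


def make_stacks(
    frames: Sequence[Rdm],
    counts: Sequence[int],
    size: int = FRAMES_PER_STACK,
) -> Tuple[List[List[Rdm]], List[int]]:
    """Slide a window of `size` consecutive RDMs over a recording.

    Each stack is paired with the pseudo-label of its last frame
    (an assumption; other labeling rules are possible).
    """
    if len(frames) != len(counts):
        raise ValueError("frames and counts must have the same length")
    stacks: List[List[Rdm]] = []
    labels: List[int] = []
    for end in range(size, len(frames) + 1):
        stacks.append(list(frames[end - size:end]))
        labels.append(counts[end - 1])
    return stacks, labels


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and actual counts."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Seven dummy 2x2 maps with pseudo-labels 0..6.
    dummy_frames = [[[float(i), 0.0], [0.0, float(i)]] for i in range(7)]
    dummy_counts = list(range(7))
    stacks, labels = make_stacks(dummy_frames, dummy_counts)
    print(len(stacks), labels)
    print(mean_absolute_error([3, 5, 8], labels))
```

In a real pipeline each stack would be converted to a tensor with one channel per frame before being passed to the network; that step is omitted here because it depends on the deep learning framework used.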