Choice 2: Head localization
In this project, you will build a convolutional neural network that estimates the number of individuals in a crowded scene by directly localizing their head positions. Your implementation will follow the approach described in “Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework” by Song et al. (2021). This approach formulates crowd counting as a direct localization problem, predicting head positions through a combination of feature extraction and point regression. The model is trained using a loss function that matches predicted points to ground-truth annotations using the Hungarian algorithm, allowing it to effectively learn to identify and count individuals in dense crowds.
| Difficulty | Suggested Tutorials |
|---|---|
| Hard | |
Requirements: A GPU with at least 8 GB of memory is recommended. Training on a CPU is possible but will be significantly slower.
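
The matching step mentioned in the project description (pairing predicted points with ground-truth head annotations using the Hungarian algorithm) can be illustrated with a minimal sketch. It is not the reference implementation. The function name `match_points`, the weight `tau`, and the exact form of the cost (distance weighted by `tau` minus the confidence score) are assumptions made for illustration, so check them against the paper before relying on them. The sketch uses NumPy and SciPy's `linear_sum_assignment` rather than a hand-written solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_points(pred_points, pred_scores, gt_points, tau=0.05):
    """One-to-one matching between predicted points and ground-truth heads.

    Assumed cost (illustrative): cost[i, j] = tau * ||pred_i - gt_j||_2 - score_i,
    so confident predictions close to an annotation are matched first.
    """
    pred_points = np.asarray(pred_points, dtype=float)  # shape (M, 2)
    pred_scores = np.asarray(pred_scores, dtype=float)  # shape (M,)
    gt_points = np.asarray(gt_points, dtype=float)      # shape (N, 2)

    # Pairwise Euclidean distances, shape (M, N).
    diffs = pred_points[:, None, :] - gt_points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Lower cost means a better match.
    cost = tau * dists - pred_scores[:, None]

    # Solve the (possibly rectangular) assignment problem.
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx


# Toy example: four proposals, two annotated heads.
preds = [[10, 10], [50, 50], [12, 11], [90, 90]]
scores = [0.9, 0.8, 0.3, 0.1]
heads = [[11, 10], [49, 51]]

pred_idx, gt_idx = match_points(preds, scores, heads)

# Proposals left unmatched would be treated as background in the loss.
unmatched = sorted(set(range(len(preds))) - set(pred_idx.tolist()))
```

In a training loop, the matched pairs would typically feed a regression loss on the point coordinates and a classification loss on the scores, with the unmatched proposals used as negatives. The matching itself is computed without gradients.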

Grading
The project will be graded based on the following criteria. Points for each activity are awarded based on quality and completeness (partial credit possible).
| Activity | Points (max) |
|---|---|
| Preprocessing pipeline | 4 |
| P2PNet architecture | 4 |
| Loss function with Hungarian assignment | 2 |
| Training (with frozen backbone) | 2 |
| Fine-tuning the entire network | 2 |
| Performance evaluation | 3 |
| Presentation (clarity & demo) | 3 |
| Total | 0-20 |