Choice 1: Density Map
In this project, you will build a convolutional neural network that estimates the number of individuals in a crowded scene by analyzing the spatial density of people in an image. Your implementation will follow the approach described in the paper “CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes” by Li et al. (2018). CSRNet extends a pretrained VGG16 backbone (the frontend) with dilated convolutional layers (the backend), which lets the model capture larger contextual regions without further reducing spatial resolution. The model is trained to generate a density map whose values sum (integrate) to the estimated number of individuals in the input image.
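The claim that dilated convolutions widen the context while keeping the spatial resolution can be checked with the standard convolution size formula. The sketch below is only an illustration: the helper names `effective_kernel` and `conv_output_size`, the 64-pixel input, and the kernel size 3 with dilation 2 are assumptions chosen for the example, not values taken from this project description.

```python
def effective_kernel(kernel, dilation):
    """Span (in pixels) covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_output_size(size, kernel, dilation=1, padding=0, stride=1):
    """Output length of a convolution along one axis (standard formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Illustrative values (assumptions): a 64-pixel axis and 3x3 kernels.
size = 64
plain = conv_output_size(size, kernel=3, dilation=1, padding=1)
dilated = conv_output_size(size, kernel=3, dilation=2, padding=2)

# Both layers keep the 64-pixel resolution, but the dilated kernel
# spans 5 pixels instead of 3 with the same number of weights.
print(effective_kernel(3, 1), plain)
print(effective_kernel(3, 2), dilated)
```

With padding set equal to the dilation rate, a 3x3 dilated layer leaves the feature-map size unchanged, which is why such layers can be stacked without pooling.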
| Difficulty | Suggested Tutorials |
|---|---|
| Easy | |
Requirements: A GPU with at least 8 GB of memory is recommended. Training on a CPU is possible but will be significantly slower.
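To make the density-map idea from the project description concrete, here is a minimal pure-Python sketch of how a ground-truth map could be built from head annotations: each annotated position contributes one Gaussian blob of unit mass, so the map sums to the head count. The function names, the fixed `sigma` and `radius`, and the example coordinates are all assumptions for illustration; a real pipeline would more likely use NumPy or SciPy and possibly a different kernel scheme.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    raw = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            for dx in range(-radius, radius + 1)]
           for dy in range(-radius, radius + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def density_map(height, width, heads, sigma=1.5, radius=4):
    """Add one unit-mass Gaussian per annotated head position (row, col).

    Mass that would fall outside the image is dropped, so heads close to
    the border contribute slightly less than 1 in this simple version.
    """
    kernel = gaussian_kernel(radius, sigma)
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        for i, row in enumerate(kernel):
            for j, v in enumerate(row):
                rr, cc = r + i - radius, c + j - radius
                if 0 <= rr < height and 0 <= cc < width:
                    dmap[rr][cc] += v
    return dmap


# Hypothetical annotations: three heads, all well inside a 40x40 image.
heads = [(10, 10), (12, 30), (25, 20)]
dmap = density_map(40, 40, heads)
count = sum(map(sum, dmap))
print(round(count, 6))
```

Because every kernel is normalized before it is added, the sum of the map recovers the number of annotated heads (up to floating-point error), which is the quantity the network's predicted map is compared against.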

Grading
The project will be graded based on the following criteria. Points for each activity are awarded based on quality and completeness (partial credit possible).
| Activity | Points (max) |
|---|---|
| Preprocessing pipeline | 4 |
| CSRNet architecture | 4 |
| Loss function | 2 |
| Training (with frozen frontend) | 2 |
| Fine-tuning the entire network | 2 |
| Performance evaluation | 3 |
| Presentation (clarity & demo) | 3 |
| Total | 0-20 |