Choice 1: Density Map
In this project, you will build a convolutional neural network that estimates the number of people in a crowded scene by predicting the spatial density of people across the image. Your implementation will follow the approach described in the paper “CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes” by Li et al. (2018). CSRNet uses the convolutional layers of a pretrained VGG16 as its frontend and appends dilated convolutional layers as its backend, which enlarges the receptive field without further reducing spatial resolution. The model is trained to output a density map whose values sum to the estimated number of people in the input image.
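To make the link between a density map and a count concrete, here is a minimal sketch that turns point annotations of heads into a density map with NumPy. The function names `gaussian_kernel` and `density_map`, the fixed `sigma`, and the renormalisation of kernels clipped at the image border are assumptions made for illustration; the kernel choice used in the paper may differ.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a (size, size) Gaussian kernel normalised to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(shape, points, sigma: float = 4.0) -> np.ndarray:
    """Build a density map of the given (height, width) from (x, y) head positions.

    Each annotated head contributes a Gaussian blob with total mass 1,
    so the sum over the map equals the number of annotated heads.
    """
    h, w = shape
    dmap = np.zeros((h, w), dtype=np.float64)
    radius = int(3 * sigma)  # assumption: truncate the kernel at 3 sigma
    kernel = gaussian_kernel(2 * radius + 1, sigma)

    for x, y in points:
        cx, cy = int(np.floor(x)), int(np.floor(y))
        if not (0 <= cx < w and 0 <= cy < h):
            continue  # ignore annotations that fall outside the image

        # Window of the image covered by the kernel, clipped to the borders.
        y0, y1 = max(cy - radius, 0), min(cy + radius + 1, h)
        x0, x1 = max(cx - radius, 0), min(cx + radius + 1, w)

        # Matching part of the kernel for the clipped window.
        ky0, kx0 = y0 - (cy - radius), x0 - (cx - radius)
        patch = kernel[ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]

        # Renormalise so a head near the border still counts as exactly 1.
        dmap[y0:y1, x0:x1] += patch / patch.sum()

    return dmap


if __name__ == "__main__":
    heads = [(10.2, 12.7), (0.0, 0.0), (63.0, 47.0)]
    dmap = density_map((48, 64), heads)
    print(dmap.shape, round(float(dmap.sum()), 6))
```

Summing the map recovers the number of annotated heads, which is the quantity the network is trained to reproduce through its predicted density map.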
| Difficulty | Suggested Tutorials |
|---|---|
| Easy | |
Requirements: A GPU with at least 8GB of memory is recommended. Training on a CPU is possible but will be significantly slower.

Grading
The project will be graded based on the following criteria. Points for each activity are awarded based on quality and completeness (partial credit possible).
| Activity | Points (max) |
|---|---|
| Build a preprocessing pipeline for the ShanghaiTech dataset | 4 |
| Implement the CSRNet architecture on top of VGG16 | 4 |
| Train the backend while keeping the frontend frozen | 2 |
| Fine-tune the entire network end-to-end | 3 |
| Evaluate model performance using MAE | 4 |
| Presentation (clarity & demo) | 3 |
| Total | 0-20 |
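For the evaluation criterion, the mean absolute error (MAE) is computed over per-image counts: the predicted count of an image is the sum of its predicted density map, and the MAE averages the absolute difference between predicted and ground-truth counts across the test set. The sketch below assumes density maps are available as NumPy arrays; the helper names `predicted_count` and `mean_absolute_error` are hypothetical and not part of any library.

```python
import numpy as np


def predicted_count(density: np.ndarray) -> float:
    """Estimated number of people: the sum over all density map values."""
    return float(np.asarray(density, dtype=np.float64).sum())


def mean_absolute_error(pred_counts, true_counts) -> float:
    """MAE between per-image predicted and ground-truth counts."""
    pred = np.asarray(pred_counts, dtype=np.float64)
    true = np.asarray(true_counts, dtype=np.float64)
    if pred.shape != true.shape:
        raise ValueError("pred_counts and true_counts must have the same shape")
    if pred.size == 0:
        raise ValueError("at least one image is required")
    return float(np.abs(pred - true).mean())


if __name__ == "__main__":
    # Toy predicted density maps standing in for model outputs.
    maps = [np.full((4, 4), 0.5), np.full((2, 5), 2.0), np.zeros((3, 3))]
    preds = [predicted_count(m) for m in maps]  # 8.0, 20.0, 0.0
    truth = [10, 20, 1]
    print(mean_absolute_error(preds, truth))
```

Because the error is taken on counts, the predicted density map can have a lower resolution than the input image, provided its values are scaled so that the total mass is preserved.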