Object Localization
Object localization is a core problem in computer vision that extends beyond classification. In addition to determining whether an object is present in an image, localization also predicts its spatial location using a bounding box. In this project, you will start with a simplified setting: detecting whether an image contains an object of a single predefined category, and if so, predicting its bounding box coordinates. Once the model is trained and its performance evaluated, you will progressively move toward more complex settings.
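One way to picture the single-object setting is through the training loss: the model outputs a presence probability and four box coordinates, and the box error only counts when an object is actually present. The following is a minimal sketch under assumptions that are not part of the project description: the function name `localization_loss`, the `(x_min, y_min, x_max, y_max)` box format normalized to [0, 1], the choice of binary cross-entropy plus L1, and the `box_weight` parameter are all illustrative.

```python
import math


def localization_loss(pred_presence, pred_box, true_presence, true_box, box_weight=1.0):
    """Loss for one image: binary cross-entropy on presence plus a masked L1 box term.

    pred_presence: predicted probability in (0, 1) that the object is present.
    true_presence: 1 if the image contains the object, otherwise 0.
    pred_box, true_box: (x_min, y_min, x_max, y_max), assumed normalized to [0, 1].
    box_weight: hypothetical weighting between the two terms.
    """
    eps = 1e-7
    # Clamp the probability so that log() never receives 0.
    p = min(max(pred_presence, eps), 1.0 - eps)
    bce = -(true_presence * math.log(p) + (1 - true_presence) * math.log(1.0 - p))

    # Mean absolute error over the four coordinates.
    l1 = sum(abs(a - b) for a, b in zip(pred_box, true_box)) / 4.0

    # The box term is masked out when no object is present in the image.
    return bce + box_weight * true_presence * l1
```

For an image without the object, the predicted box has no effect on the loss, so the model is not penalized for arbitrary coordinates in that case. In a deep learning framework the same idea would be written with batched tensors; the structure of the loss stays the same.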
In the next stage of the project, you will deal with multiple objects per image, requiring the model to identify and localize each instance separately. Finally, you will generalize further to handle multiple object categories, making the task closer to object detection. At each stage, you will assess how the increased complexity affects model design, training, and performance, gaining hands-on experience with the challenges of scaling object localization to full object detection.
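Once an image can contain several objects, evaluation has to decide which prediction corresponds to which ground-truth box. A common approach, sketched below, scores box overlap with intersection over union (IoU) and greedily matches predictions to ground truths. This is an illustration, not a required method: the function names `iou` and `match_predictions`, the `(x_min, y_min, x_max, y_max)` box format, the 0.5 threshold, and the assumption that predictions arrive sorted by decreasing confidence are all choices made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_predictions(pred_boxes, true_boxes, iou_threshold=0.5):
    """Greedily match predictions to ground-truth boxes.

    Predictions are processed in the given order (assumed: highest confidence
    first). Each ground-truth box can be matched at most once.
    Returns a list of (prediction_index, ground_truth_index) pairs.
    """
    unmatched = set(range(len(true_boxes)))
    matches = []
    for i, pred in enumerate(pred_boxes):
        best_j, best_iou = None, iou_threshold
        for j in sorted(unmatched):
            score = iou(pred, true_boxes[j])
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches.append((i, best_j))
            unmatched.remove(best_j)
    return matches
```

Matched pairs count as true positives, unmatched predictions as false positives, and unmatched ground-truth boxes as false negatives, which is enough to compute precision and recall per image. For the multi-category stage, the same matching could be run separately for each category.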
Difficulty | Suggested Tutorials |
---|---|
Medium | |
Grading
The project will be graded based on the following criteria. Points for each activity are awarded based on quality and completeness (partial credit possible).
Activity | Points (max) |
---|---|
Build and evaluate a model for single-class, single-object localization | 10 |
Extend the model to handle multiple objects per image | 5 |
Extend the model to handle multiple categories | 2 |
Presentation (clarity & demo) | 3 |
Total | 0-20 |