Extension to multiple objects#

In the first part of the project, you trained a model that detects whether an object of a chosen class is present in an image and, if so, predicts a single bounding box around it. While this is a useful starting point, real-world images often contain several objects, sometimes overlapping or appearing at different scales.

The goal of this second part is to extend the model to handle multiple bounding boxes per image. This requires changes to both the dataset format and the model architecture:

- The dataset must now return a variable number of bounding boxes per image.
- The model must be able to predict multiple boxes rather than a single one.
- The loss function and evaluation must account for matching predicted boxes with ground-truth boxes.
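
To make the matching requirement concrete, here is a minimal sketch of one possible strategy: greedy matching by intersection over union (IoU). It is an illustration only, not necessarily the method used later in the project. The names `iou` and `greedy_match`, the 0.5 threshold, and the `[xmin, ymin, xmax, ymax]` box layout are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [xmin, ymin, xmax, ymax]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(pred_boxes, gt_boxes, threshold=0.5):
    """Pair each predicted box with at most one ground-truth box.

    Candidate pairs are visited from highest to lowest IoU; a pair is kept
    only if neither box is already matched and its IoU reaches the
    (hypothetical) threshold. Returns a list of (pred_index, gt_index).
    """
    candidates = sorted(
        ((iou(p, g), i, j)
         for i, p in enumerate(pred_boxes)
         for j, g in enumerate(gt_boxes)),
        reverse=True,
    )
    used_pred, used_gt, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # all remaining pairs overlap even less
        if i in used_pred or j in used_gt:
            continue
        used_pred.add(i)
        used_gt.add(j)
        matches.append((i, j))
    return matches


pred = [[0, 0, 10, 10], [20, 20, 30, 30]]
gt = [[21, 21, 31, 31], [0, 0, 10, 10]]
matches = greedy_match(pred, gt)
```

Unmatched predictions can then be treated as false positives and unmatched ground-truth boxes as misses. Greedy matching is simple but not optimal; an assignment solver such as the Hungarian algorithm is a common alternative.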

To simplify the task at this stage, you will still assume that all objects belong to the same class.

Overview#

TODO…

Preparing the dataset#

Previously, the dataset was filtered to ensure that at most one object of the target class appeared in each image. You will now keep all images. Some may contain several objects of the chosen class, while others may contain none. The Pascal VOC Detection dataset already provides the required annotations: each image is associated with a list of objects, and each object has a class label and a bounding box defined by its corner coordinates (xmin, ymin, xmax, ymax).

To prepare the dataset for this task, you should do the following.

- Select a target class (e.g., “dog” or “car”).
- Keep all images, regardless of whether the class is present.
- For each image, collect all bounding boxes corresponding to the chosen class. If the class is absent, return an empty list of bounding boxes.
- Return the following for each sample:
  - the image,
  - a binary label (1 if at least one object of the class is present, 0 otherwise),
  - a list of bounding boxes, which may be empty.
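
The steps above can be sketched as a small helper that turns one annotation into a label and a list of boxes. This is a minimal sketch under stated assumptions: the annotation is taken to be a nested dictionary mirroring the VOC XML file (an `"annotation"` entry holding an `"object"` list, each object with `"name"` and `"bndbox"` fields whose values are strings), and the function name `extract_class_boxes` is hypothetical. Adapt the key names to whatever structure your loader actually returns.

```python
def extract_class_boxes(annotation, target_class):
    """Return (label, boxes) for one image.

    `annotation` is assumed to be a VOC-style nested dict:
    {"annotation": {"object": [{"name": ..., "bndbox": {...}}, ...]}}
    """
    objects = annotation.get("annotation", {}).get("object", [])
    # Some parsers store a single object as a dict instead of a list.
    if isinstance(objects, dict):
        objects = [objects]

    boxes = []
    for obj in objects:
        if obj["name"] != target_class:
            continue
        bb = obj["bndbox"]
        # Coordinates parsed from XML are strings, so convert them.
        boxes.append([float(bb[k]) for k in ("xmin", "ymin", "xmax", "ymax")])

    # Binary presence label: 1 if at least one box was kept, 0 otherwise.
    label = 1 if boxes else 0
    return label, boxes


# Made-up annotation used only to exercise the helper.
sample = {
    "annotation": {
        "object": [
            {"name": "dog", "bndbox": {"xmin": "48", "ymin": "240", "xmax": "195", "ymax": "371"}},
            {"name": "person", "bndbox": {"xmin": "1", "ymin": "1", "xmax": "50", "ymax": "60"}},
            {"name": "dog", "bndbox": {"xmin": "8", "ymin": "12", "xmax": "352", "ymax": "498"}},
        ]
    }
}
dog_label, dog_boxes = extract_class_boxes(sample, "dog")
car_label, car_boxes = extract_class_boxes(sample, "car")
```

In a dataset class, a helper like this would typically be called inside `__getitem__`, after the image and its annotation have been loaded.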

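Because the number of objects varies across images, each sample's bounding boxes form a variable-length list, or equivalently a tensor of shape (N, 4) where N may be 0. Boxes from different images therefore cannot simply be stacked into one fixed-size batch tensor. You will need to handle this variability in the data loader and later in the model.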
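
One common way to handle this in the data loader is a custom collate function that keeps the boxes as a per-image list instead of stacking them. The sketch below is plain Python and assumes each sample is an `(image, label, boxes)` tuple as described above; the name `collate_variable_boxes` is hypothetical, and the images are stand-in strings so the example runs without any dependencies.

```python
def collate_variable_boxes(batch):
    """Group a batch of (image, label, boxes) samples.

    Images and labels are gathered into lists, and the box lists are
    kept separate per image because their lengths differ.
    """
    images, labels, boxes = zip(*batch)
    return list(images), list(labels), list(boxes)


# Two fake samples: one image with two boxes, one with none.
batch = [
    ("image_0", 1, [[48.0, 240.0, 195.0, 371.0], [8.0, 12.0, 352.0, 498.0]]),
    ("image_1", 0, []),
]
images, labels, boxes = collate_variable_boxes(batch)
```

If you use PyTorch, a function of this kind can be passed to `torch.utils.data.DataLoader` through its `collate_fn` argument. Images of identical size could additionally be stacked into a single tensor inside the function, while the boxes stay in a list.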

Note

A useful way to check your implementation is to visualize several images with their bounding boxes to confirm that multiple instances are correctly extracted and aligned.
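
Alongside visual inspection, a quick programmatic check can catch malformed boxes before training. The following is a minimal sketch that assumes boxes are `[xmin, ymin, xmax, ymax]` in pixel coordinates and that the image width and height are known; the function name `check_boxes` is hypothetical.

```python
def check_boxes(boxes, width, height):
    """Return a list of problems found; an empty list means all boxes look valid."""
    problems = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        # Corners must describe a box with positive width and height.
        if not (xmin < xmax and ymin < ymax):
            problems.append(f"box {i}: corners are not ordered")
        # The box must lie inside the image.
        if xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append(f"box {i}: lies outside the image")
    return problems


ok = check_boxes([[48.0, 240.0, 195.0, 371.0]], width=500, height=375)
bad = check_boxes([[195.0, 240.0, 48.0, 371.0], [10.0, 10.0, 600.0, 100.0]], width=500, height=375)
```

A check like this does not replace plotting: it confirms that coordinates are plausible, not that they are aligned with the objects in the image.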

Bounding box preprocessing#