Image retrieval#
After training, the neural networks are used to perform sketch-based image retrieval. The goal is to find the most relevant photos in the dataset given a sketch as a query. This is done by comparing learned feature embeddings. The retrieval process works as follows.
1. Embed the gallery photos. Compute and store the embeddings of all photos in the dataset using the photo network. This set of embeddings serves as the retrieval gallery.
2. Embed the query sketch. Pass each sketch you want to use as a query through the sketch network to obtain its embedding.
3. Compute distances. Compare the query sketch embedding to every photo embedding in the gallery using Euclidean distance or cosine similarity, depending on the loss used during training.
4. Rank the photos. Sort the gallery photos by increasing distance (or decreasing similarity) to the query sketch embedding.
5. Return the top-k results. The k highest-ranked photos are returned as the retrieval results. These can be evaluated against ground-truth labels for accuracy or visualized for inspection.
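The ranking steps above can be sketched as follows. This is a minimal NumPy illustration, assuming the sketch and photo embeddings have already been computed by the two networks and stored as arrays; the function names are hypothetical, not part of any provided code.

```python
import numpy as np

def retrieve_top_k(query_emb, gallery_embs, k=5, metric="euclidean"):
    """Return the indices of the k gallery embeddings closest to the query.

    query_emb:    1D array, embedding of the query sketch.
    gallery_embs: 2D array (n_photos, dim), embeddings of all gallery photos.
    metric:       "euclidean" or "cosine", matching the training loss.
    """
    if metric == "euclidean":
        # Smaller distance means more similar: sort by increasing distance.
        dists = np.linalg.norm(gallery_embs - query_emb, axis=1)
        order = np.argsort(dists)
    else:
        # Larger cosine similarity means more similar: sort by decreasing similarity.
        sims = gallery_embs @ query_emb / (
            np.linalg.norm(gallery_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12
        )
        order = np.argsort(-sims)
    return order[:k]
```

In practice the gallery embeddings are computed once and cached, so each query only costs one forward pass through the sketch network plus this ranking step.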
For performance evaluation, this retrieval process must be restricted to sketches and photos from the test set, which should have been held out from the training set during the dataset preparation phase. This ensures that the evaluation reflects the model's ability to generalize to unseen sketches and photos.
Part 1 - Qualitative evaluation#
It is important to visually inspect the retrieval results to understand how well the model performs. This can be done by plotting the top-k retrieved photos for a set of query sketches. You should obtain something like the following figure.
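A figure like the one described can be produced with matplotlib. The sketch below is one possible layout, assuming the query sketch and gallery photos are available as image arrays and that the top-k indices come from a prior ranking step; all names are placeholders.

```python
import matplotlib.pyplot as plt

def show_retrieval(query_sketch, gallery_images, top_idx, labels=None):
    """Plot a query sketch next to its top-k retrieved photos in one row."""
    k = len(top_idx)
    fig, axes = plt.subplots(1, k + 1, figsize=(2 * (k + 1), 2.5))
    # First column: the query sketch.
    axes[0].imshow(query_sketch, cmap="gray")
    axes[0].set_title("query")
    # Remaining columns: the retrieved photos, in rank order.
    for col, idx in enumerate(top_idx, start=1):
        axes[col].imshow(gallery_images[idx])
        if labels is not None:
            axes[col].set_title(str(labels[idx]))
    for ax in axes:
        ax.axis("off")
    fig.tight_layout()
    return fig
```

Calling this for a handful of test sketches (followed by `plt.show()`) gives a quick visual check of whether the retrieved photos actually match the query category.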
Part 2 - Quantitative evaluation#
You can evaluate the performance of the image retrieval system using quantitative metrics computed on the test set. The most common metrics for evaluating image retrieval systems are the following.
Precision@K: The proportion of images correctly retrieved in the top K results. For example, if you retrieve K = 5 images and 3 of them are relevant to the query, the Precision@5 is 3/5 = 0.6. This metric focuses on the accuracy of the top K retrieved items, but does not consider whether the retrieval system finds all relevant results. A high Precision@K means that most of the top K retrieved results are relevant.
Recall@K: The proportion of images correctly retrieved in the top K results out of all relevant images in the retrieval set. For example, if there are 10 relevant images in the dataset, and our system retrieves 3 relevant images in the top K = 5 results, the Recall@5 is 3/10 = 0.3. This metric focuses on the ability of the retrieval system to find all relevant results, but does not penalize irrelevant items in the top K. A high Recall@K means that the retrieval system successfully finds a large fraction of all relevant items.
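Both metrics can be computed directly from a ranked list of gallery labels. The following sketch assumes relevance means sharing the query's class label; the function name and signature are illustrative, not a prescribed API.

```python
import numpy as np

def precision_recall_at_k(ranked_labels, query_label, n_relevant, k):
    """Precision@K and Recall@K for a single query.

    ranked_labels: class labels of gallery items, sorted by increasing
                   distance to the query embedding.
    query_label:   class label of the query sketch.
    n_relevant:    total number of gallery items with the query's label.
    """
    top_k = np.asarray(ranked_labels[:k])
    hits = int(np.sum(top_k == query_label))
    # Precision: fraction of the top K that is relevant.
    # Recall: fraction of all relevant items found in the top K.
    return hits / k, hits / n_relevant
```

Averaging these values over all test-set queries gives the overall Precision@K and Recall@K of the retrieval system. With the numbers from the examples above (3 relevant results in the top 5, out of 10 relevant items in the gallery), the function returns 0.6 and 0.3.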