Training

The goal of this project is to learn a similarity metric that can be used to compare images based on their dominant colors. This involves training a neural network that maps images into a shared embedding space, where images with similar dominant colors lie close together. Specifically, the network receives the features extracted from the images as input, processes them through a series of fully-connected layers, and outputs embedding vectors of a fixed length.
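Once such a network is trained, retrieval reduces to a nearest-neighbor search in the embedding space. Here is a minimal sketch of that idea, assuming a trained model and pre-extracted feature tensors (model, query_features, and gallery_features are hypothetical placeholders):

```python
import torch

# Hypothetical inputs: a trained embedding model, one query feature vector,
# and a gallery of N feature vectors to search over.
# query_features: (1, feature_dim); gallery_features: (N, feature_dim)
with torch.no_grad():
    query_emb = model(query_features)      # shape: (1, embedding_dim)
    gallery_emb = model(gallery_features)  # shape: (N, embedding_dim)

# Pairwise Euclidean distances between the query and every gallery image;
# a smaller distance means more similar dominant colors.
distances = torch.cdist(query_emb, gallery_emb)  # shape: (1, N)
ranking = distances.argsort(dim=1)               # most similar gallery images first
```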

Loss function

The loss function used in this project is the triplet loss, which takes the following inputs.

  • Anchor: The embedding of a reference image.

  • Positive: The embedding of an image with similar dominant colors to the anchor.

  • Negative: The embedding of an image with dissimilar dominant colors from the anchor.

The triplet loss explicitly encourages the neural network to position the anchor embedding closer to the positive embedding than to the negative embedding by at least a fixed margin. As a result, the model learns an embedding space in which images sharing similar dominant colors are grouped together, which makes retrieval by color effective.
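Concretely, writing f for the embedding network and α for the margin (this is the standard formulation of the triplet loss; the margin value itself is a hyperparameter), the loss for one triplet (a, p, n) is:

```latex
\mathcal{L}(a, p, n) = \max\bigl(\lVert f(a) - f(p) \rVert_2 - \lVert f(a) - f(n) \rVert_2 + \alpha,\ 0\bigr)
```

The loss is zero only when the negative is already farther from the anchor than the positive by at least α; otherwise the gradients push the embeddings toward that configuration.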

Note

Refer to the Triplet Loss tutorial for a detailed explanation of the triplet loss and its implementation.

Implementation details

  • The dimensionality of the embedding space is chosen based on the complexity of the task. Typical dimensions range from 100 to 500.

  • A fully connected network is used to generate embeddings from image features. A common architecture consists of two hidden layers with 512 and 256 units, respectively, followed by a final layer that outputs the embedding vector. The hidden layers can use ReLU activation functions; the final layer should have no activation function, so that the embedding components can take any real value.

  • Optionally, you may include batch normalization layers to stabilize training, and dropout layers to prevent overfitting.

  • The output of the network is normalized to produce unit-length vectors, ensuring that all embeddings are comparable in the shared space. Make sure to apply the normalization along the last dimension of the output tensor, as shown in the sketch after this list.
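Putting these points together, here is a minimal PyTorch sketch of such an embedding network (the input dimension in_dim, the embedding size, and the dropout rate are placeholder choices, and the batch normalization and dropout layers are the optional additions mentioned above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Maps pre-extracted image features to L2-normalized embeddings."""

    def __init__(self, in_dim: int, embedding_dim: int = 256, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.BatchNorm1d(512),      # optional: stabilizes training
            nn.ReLU(),
            nn.Dropout(dropout),      # optional: reduces overfitting
            nn.Linear(512, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, embedding_dim),  # no activation on the final layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize along the last dimension so every embedding has unit length.
        return F.normalize(self.net(x), p=2, dim=-1)
```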

Training loop

The training process is similar to the one explained in the Triplet Loss tutorial.

  • You have a classification dataset whose features and labels are loaded in batches.

  • The features are passed through the neural network to generate embeddings.

  • The triplet loss receives the embeddings and the corresponding labels. Triplets of anchor, positive, and negative samples are generated based on the labels: samples sharing the anchor's label serve as positives, and samples with a different label serve as negatives.

Your job here is to adapt the code from the triplet loss tutorial and choose appropriate hyperparameters for the training process, such as the batch size, learning rate, and number of epochs. You also need to choose, and ideally compare, the mining strategy used to generate triplets (batch-hard or batch-all). A sketch of one possible training loop follows.
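For orientation, here is one possible shape for that loop. It uses the pytorch-metric-learning library for the loss and the miner; the library choice, the hyperparameter values, and the placeholder data are assumptions to adapt rather than fixed requirements, and EmbeddingNet refers to the sketch above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_metric_learning import losses, miners

# Placeholder data -- replace with your real feature/label tensors.
all_features = torch.randn(1000, 128)       # 1000 samples, 128-dim features
all_labels = torch.randint(0, 10, (1000,))  # 10 color-based classes
train_loader = DataLoader(TensorDataset(all_features, all_labels),
                          batch_size=64, shuffle=True)

# Hypothetical hyperparameters -- tune these for your dataset.
margin, lr, num_epochs = 0.2, 1e-3, 20

device = "cuda" if torch.cuda.is_available() else "cpu"
model = EmbeddingNet(in_dim=128, embedding_dim=256).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

loss_fn = losses.TripletMarginLoss(margin=margin)
miner = miners.BatchHardMiner()  # batch-hard: hardest positive/negative per anchor

for epoch in range(num_epochs):
    for features, labels in train_loader:
        features, labels = features.to(device), labels.to(device)
        embeddings = model(features)          # unit-length embeddings
        triplets = miner(embeddings, labels)  # mine triplets within the batch
        loss = loss_fn(embeddings, labels, triplets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

To compare against batch-all mining, drop the miner and call loss_fn(embeddings, labels) directly; by default, pytorch-metric-learning's TripletMarginLoss then forms all valid triplets in the batch, so the rest of the loop stays unchanged.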