Dataset

The dataset was curated manually with TinEye Multicolr Search, a tool for searching Creative Commons images on Flickr by up to five chosen colors. It contains 16,632 images, each retrieved with a query that combines 1–3 colors from a fixed palette of 11 colors. Each image is labeled with the colors used in its query. The palette is shown below.

Red Orange Yellow Green
Cyan Blue Violet Pink
White Gray Black
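Counting the possible queries is a quick sanity check on the label space. With 11 palette colors and 1–3 distinct colors per query, there are 11 + 55 + 165 = 231 unordered combinations. This matches the 231 classes reported when the dataset is loaded, which suggests (the page does not say so explicitly) that every color combination forms its own class. The sketch below simply enumerates those combinations:

```python
from itertools import combinations

# The 11 colors of the palette above.
PALETTE = ["Red", "Orange", "Yellow", "Green", "Cyan", "Blue",
           "Violet", "Pink", "White", "Gray", "Black"]

# A query uses 1, 2 or 3 distinct palette colors; order does not matter.
queries = [combo for k in (1, 2, 3) for combo in combinations(PALETTE, k)]

print(len(queries))  # 11 + 55 + 165 = 231
```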

Download the images

Because of its size, the image archive is split into three parts. Download the following files and extract them into a folder of your choice.

You can use any unzip tool that supports multi-part archives, such as 7-Zip.
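If you prefer to script the extraction, a minimal sketch is to call the 7-Zip command-line tool from Python. It assumes the `7z` executable is on your PATH; the part name "images.zip.001" used below is hypothetical, so substitute the name of the first downloaded part. 7-Zip normally picks up the remaining parts when they sit in the same folder.

```python
import subprocess


def build_7z_command(first_part, out_dir):
    """Build a 7-Zip command that extracts a (multi-part) archive into out_dir."""
    # 'x' extracts with full paths; -o<dir> sets the output folder (no space
    # after -o); -y answers "yes" to any prompts.
    return ["7z", "x", first_part, f"-o{out_dir}", "-y"]


def extract(first_part, out_dir=".data"):
    """Run 7-Zip; assumes the `7z` executable is installed and on PATH."""
    subprocess.run(build_7z_command(first_part, out_dir), check=True)
```

Usage, with the hypothetical part name: extract("images.zip.001").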

Load the dataset

Assuming you have extracted the archive into a folder called .data, so that the images end up in .data/images/, you can load the dataset with the ImageFolder class from the TorchVision library. The images are organized into subfolders: each subfolder holds the JPEG images retrieved for one color combination and is named after the colors in it. ImageFolder uses these subfolder names as the class labels.
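To see how subfolders become labels, the sketch below mimics ImageFolder's default class discovery in plain Python: every subfolder is a class, names are sorted alphabetically, and each name is mapped to a consecutive integer index. The folder names "Red", "Blue" and "Blue_Red" are hypothetical; the real dataset's naming scheme may differ.

```python
import os
import tempfile


def find_classes(root):
    """Mimic ImageFolder's default class discovery: one class per subfolder, sorted by name."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


# Hypothetical folder names, created in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name in ["Red", "Blue", "Blue_Red"]:
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)       # ['Blue', 'Blue_Red', 'Red']
print(class_to_idx)  # {'Blue': 0, 'Blue_Red': 1, 'Red': 2}
```

Because indices follow alphabetical order, a class index says nothing about which colors it contains; use the class name to recover them.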

```python
from torchvision.datasets import ImageFolder

# Each subfolder of .data/images/ becomes one class.
dataset = ImageFolder('.data/images/')

print("Number of images:", len(dataset))
print("Number of classes:", len(dataset.classes))
```

Output:

```
Number of images: 16632
Number of classes: 231
```
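ImageFolder assigns a single class index per image, while each image is really labeled with a set of 1–3 colors. For multi-label work it can help to turn a class name into a 0/1 vector over the 11 palette colors. The sketch below assumes folder names join colors with an underscore and use the capitalized spellings from the palette (for example "Blue_Red"); both are assumptions, so adjust the sep argument and the spellings to the actual folder names.

```python
PALETTE = ["Red", "Orange", "Yellow", "Green", "Cyan", "Blue",
           "Violet", "Pink", "White", "Gray", "Black"]


def class_name_to_multi_hot(class_name, palette=PALETTE, sep="_"):
    """Turn a folder name such as 'Blue_Red' into a 0/1 vector over the palette.

    The separator and capitalization are assumptions about the folder names.
    """
    colors = set(class_name.split(sep))
    unknown = colors - set(palette)
    if unknown:
        raise ValueError(f"unknown colors in {class_name!r}: {sorted(unknown)}")
    # One entry per palette color, in palette order.
    return [1 if color in colors else 0 for color in palette]


print(class_name_to_multi_hot("Blue_Red"))
# [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```

With the dataset loaded, the per-image vectors could then be built as [class_name_to_multi_hot(dataset.classes[t]) for t in dataset.targets].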

Let’s visualize some of the images.

[Figure: four sample images from the dataset]
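The plotting code for the sample images is not included on this page. A minimal sketch with matplotlib might look like the following; the helper names pick_sample_indices and show_samples are hypothetical. It relies on ImageFolder returning (PIL image, class index) pairs and on dataset.classes for the label names. matplotlib is imported inside show_samples so the index helper works without it.

```python
import random


def pick_sample_indices(n_total, k, seed=0):
    """Return k distinct indices drawn uniformly from range(n_total)."""
    rng = random.Random(seed)  # fixed seed so the same images come back each run
    return rng.sample(range(n_total), k)


def show_samples(dataset, k=4, seed=0):
    """Plot k random (image, label) pairs from an ImageFolder-style dataset."""
    import matplotlib.pyplot as plt

    indices = pick_sample_indices(len(dataset), k, seed)
    # squeeze=False keeps axes 2-D even when k == 1.
    fig, axes = plt.subplots(1, k, figsize=(3 * k, 3), squeeze=False)
    for ax, idx in zip(axes[0], indices):
        image, target = dataset[idx]  # ImageFolder yields (PIL image, class index)
        ax.imshow(image)
        ax.set_title(dataset.classes[target])
        ax.axis("off")
    fig.tight_layout()
    return fig
```

Calling show_samples(dataset) followed by plt.show() would display four random images with their class names as titles.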