Dataset#

This project is based on the Large-scale CelebFaces Attributes dataset, which contains around 200k celebrity face images with 40 annotated attributes for each image. The dataset is divided into a training set of 162770 images, a validation set of 19867 images, and a test set of 19962 images. The images are in JPEG format and have a resolution of 178x218 pixels.

CelebA

Loading the data#

The CelebA Dataset is available in PyTorch through the TorchVision library. You can download the images by setting download=True when creating the dataset. This requires the gdown package to be installed.

from torchvision.datasets import CelebA

dataset = CelebA(".data", split="train", download=True)
dataset
Dataset CelebA
    Number of datapoints: 162770
    Root location: .data
    Target type: ['attr']
    Split: train

Each sample in the dataset is a tuple containing the image and the attributes. The attributes are encoded as a 40-dimensional multi-hot vector, where each element is either 0 or 1 indicating the absence or presence of the corresponding attribute.

image, attributes = dataset[0]

print(image, '\n')
print(attributes)
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=178x218 at 0x144DE1E80> 

tensor([0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0,
        1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1])

The dataset contains a list of attribute names, which can be used to interpret the attribute vectors. Let’s plot a few samples from the dataset to see what the images look like.

Hide code cell source
import matplotlib.pyplot as plt
import textwrap

n = 4
fig, axes = plt.subplots(n, 2, figsize=(5, 6), tight_layout=True)

for i in range(n):
    img, label = dataset[i]
    attr = [dataset.attr_names[j] for j in range(len(label)) if label[j] == 1]
    attr_str = ', '.join(attr)
    wrapped_text = "\n".join(textwrap.wrap(attr_str, width=35))
    axes[i, 1].text(0, 0.5, wrapped_text, fontsize=11, va="center")
    axes[i, 1].axis("off")
    axes[i, 0].imshow(img)
    axes[i, 0].axis("off")
../../_images/516048a45b6725eeb5c262b08a30ed7270e7b07d588f8b548a11ae8542a74463.png

Attribute names#

The full list of attributes in the CelebA dataset is displayed below. The attributes are grouped into categories, which have no importance in the dataset.

Category

Attributes

Hair

Bald, Bangs, Black Hair, Blond Hair, Brown Hair, Gray Hair, Receding Hairline, Straight Hair, Wavy Hair

Eyes

Arched Eyebrows, Bags Under Eyes, Bushy Eyebrows, Narrow Eyes

Beard

5 O’Clock Shadow, Goatee, Mustache, No Beard, Sideburns

Mouth

Big Lips, Mouth Slightly Open, Smiling, Wearing Lipstick

Face

Big Nose, Chubby, Double Chin, High Cheekbones, Oval Face, Pointy Nose

Skin

Pale Skin, Rosy Cheeks, Heavy Makeup

Accessories

Eyeglasses, Wearing Earrings, Wearing Hat, Wearing Necklace, Wearing Necktie

Misc

Attractive, Blurry, Male, Young