Hands-On Series: Detecting Facial Landmarks with PyTorch
Face Alignment with PyTorch in Practice
Ever wondered how apps like Instagram apply stunning filters to your face? The software detects key points on your face and projects a mask onto them.
This tutorial walks through building software with similar capabilities using PyTorch.
1. Downloading the DLib Dataset
In this tutorial we will use the official DLib dataset, which contains 6666 images of varying dimensions. In addition, labels_ibug_300W_train.xml (provided with the dataset) contains the coordinates of the 68 landmark points for each face. The script below downloads the dataset and extracts it in the Colab notebook.
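For convenience, the snippets in this tutorial share a common set of imports. The block below collects everything the later cells use, assuming a standard Colab environment with opencv-python and imutils installed:

import os
import random
import time
from math import floor, cos, sin, radians
from xml.etree import ElementTree as ET

import cv2
import imutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset
from torchvision import models, transforms
import torchvision.transforms.functional as TF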
%%capture
if not os.path.exists('/content/ibug_300W_large_face_landmark_dataset'):
    !wget http://dlib.net/files/data/ibug_300W_large_face_landmark_dataset.tar.gz
    !tar -xvzf 'ibug_300W_large_face_landmark_dataset.tar.gz'
    !rm -r 'ibug_300W_large_face_landmark_dataset.tar.gz'
2. Visualizing the Dataset
file = open('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.pts')
# skip the 3-line header and the closing brace of the .pts file
points = file.readlines()[3:-1]

landmarks = []
for point in points:
    x, y = point.split(' ')
    # strip the trailing newline from the y coordinate
    landmarks.append([floor(float(x)), floor(float(y[:-1]))])

landmarks = np.array(landmarks)

plt.figure(figsize=(10, 10))
plt.imshow(mpimg.imread('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.jpg'))
plt.scatter(landmarks[:, 0], landmarks[:, 1], s=5, c='g')
plt.show()
This is a sample image from the dataset. Notice that the face occupies only a small fraction of the whole image. If we fed the entire image to the neural network, it would also have to process the background (irrelevant information), which makes it harder for the model to learn. Therefore, we crop the image and feed only the face to the network.
3. Data Preprocessing
To keep the neural network from overfitting the training set, we apply random transformations to the data. The following operations are applied to the training and validation datasets:

- Since the face occupies only a small part of the image, crop the image and use only the face for training.
- Resize the cropped face to a (224, 224) image.
- Randomly change the brightness and saturation of the face.
- After the three transformations above, randomly rotate the face.
- Convert the image and the landmarks to PyTorch tensors and normalize them to [-1, 1].
class Transforms():
    def __init__(self):
        pass

    def rotate(self, image, landmarks, angle):
        angle = random.uniform(-angle, +angle)
        transformation_matrix = torch.tensor([
            [+cos(radians(angle)), -sin(radians(angle))],
            [+sin(radians(angle)), +cos(radians(angle))]
        ])

        image = imutils.rotate(np.array(image), angle)

        # landmarks are normalized to [0, 1]; shift them to [-0.5, 0.5]
        # so the rotation is performed around the image center
        landmarks = landmarks - 0.5
        new_landmarks = np.matmul(landmarks, transformation_matrix)
        new_landmarks = new_landmarks + 0.5
        return Image.fromarray(image), new_landmarks

    def resize(self, image, landmarks, img_size):
        image = TF.resize(image, img_size)
        return image, landmarks

    def color_jitter(self, image, landmarks):
        color_jitter = transforms.ColorJitter(brightness=0.3,
                                              contrast=0.3,
                                              saturation=0.3,
                                              hue=0.1)
        image = color_jitter(image)
        return image, landmarks

    def crop_face(self, image, landmarks, crops):
        left = int(crops['left'])
        top = int(crops['top'])
        width = int(crops['width'])
        height = int(crops['height'])

        image = TF.crop(image, top, left, height, width)

        # normalize the landmarks to [0, 1] relative to the cropped face
        img_shape = np.array(image).shape
        landmarks = torch.tensor(landmarks) - torch.tensor([[left, top]])
        landmarks = landmarks / torch.tensor([img_shape[1], img_shape[0]])
        return image, landmarks

    def __call__(self, image, landmarks, crops):
        image = Image.fromarray(image)
        image, landmarks = self.crop_face(image, landmarks, crops)
        image, landmarks = self.resize(image, landmarks, (224, 224))
        image, landmarks = self.color_jitter(image, landmarks)
        image, landmarks = self.rotate(image, landmarks, angle=10)

        image = TF.to_tensor(image)
        image = TF.normalize(image, [0.5], [0.5])
        return image, landmarks
4. The Dataset Class
Now that the transforms are ready, let's write the dataset class. labels_ibug_300W_train.xml contains the image paths, the landmark coordinates, and the coordinates of the face bounding box (used to crop the face). We store these values in lists so they can be accessed easily during training. In this tutorial, the neural network is trained on grayscale images.
class FaceLandmarksDataset(Dataset):

    def __init__(self, transform=None):
        tree = ET.parse('ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml')
        root = tree.getroot()

        self.image_filenames = []
        self.landmarks = []
        self.crops = []
        self.transform = transform
        self.root_dir = 'ibug_300W_large_face_landmark_dataset'

        # root[2] is the <images> element of the XML file
        for filename in root[2]:
            self.image_filenames.append(os.path.join(self.root_dir, filename.attrib['file']))
            self.crops.append(filename[0].attrib)

            landmark = []
            for num in range(68):
                x_coordinate = int(filename[0][num].attrib['x'])
                y_coordinate = int(filename[0][num].attrib['y'])
                landmark.append([x_coordinate, y_coordinate])
            self.landmarks.append(landmark)

        self.landmarks = np.array(self.landmarks).astype('float32')

        assert len(self.image_filenames) == len(self.landmarks)

    def __len__(self):
        return len(self.image_filenames)

    def __getitem__(self, index):
        # read the image in grayscale
        image = cv2.imread(self.image_filenames[index], 0)
        landmarks = self.landmarks[index]

        if self.transform:
            image, landmarks = self.transform(image, landmarks, self.crops[index])

        # zero-center the landmarks, which are normalized to [0, 1]
        landmarks = landmarks - 0.5
        return image, landmarks

dataset = FaceLandmarksDataset(Transforms())
5. Visualizing the Transformed Training Data
image, landmarks = dataset[0]
landmarks = (landmarks + 0.5) * 224
plt.figure(figsize=(10, 10))
plt.imshow(image.numpy().squeeze(), cmap='gray');
plt.scatter(landmarks[:,0], landmarks[:,1], s=8);
Note: landmarks = landmarks - 0.5
zero-centers the landmarks, because zero-centered data is easier for a neural network to learn. After preprocessing, the dataset output looks like the following (landmarks plotted on the image).
Split the dataset into a training set and a validation set:
# split the dataset into training and validation sets
len_valid_set = int(0.1 * len(dataset))
len_train_set = len(dataset) - len_valid_set

print("The length of Train set is {}".format(len_train_set))
print("The length of Valid set is {}".format(len_valid_set))

train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])

# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=8, shuffle=True, num_workers=4)
The length of Train set is 6000
The length of Valid set is 666
Check the shape of the input data:
images, landmarks = next(iter(train_loader))
print(images.shape)
print(landmarks.shape)
torch.Size([64, 1, 224, 224])
torch.Size([64, 68, 2])
6. Network Architecture
We will use ResNet18 as the backbone and modify its first and last layers for this task. In the first layer, we set the number of input channels to 1 so that the network accepts grayscale images. Likewise, the number of outputs of the final layer should equal twice the number of landmarks, i.e. 68 × 2 = 136, since each landmark has an x and a y coordinate.
class Network(nn.Module):
    def __init__(self, num_classes=136):
        super().__init__()
        self.model_name = 'resnet18'
        self.model = models.resnet18()
        # accept single-channel (grayscale) input
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # output 136 values: (x, y) for each of the 68 landmarks
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        x = self.model(x)
        return x
7. Training the Network
We use the mean squared error between the predicted and ground-truth landmarks as the loss function. Keep in mind that the learning rate should stay low to avoid exploding gradients. The network weights are saved whenever the validation loss reaches a new minimum. Train for at least 20 epochs to get the best performance.
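The loop below reports progress with a small console helper, print_overwrite, which is defined in the accompanying Colab notebook. A minimal sketch, assuming it merely rewrites the current console line with the running loss, could look like this:

import sys

def print_overwrite(step, total_step, loss, operation):
    # return to the start of the line and overwrite it with the running loss
    sys.stdout.write('\r')
    if operation == 'train':
        sys.stdout.write("Train Steps: {}/{}  Loss: {:.4f}".format(step, total_step, loss))
    else:
        sys.stdout.write("Valid Steps: {}/{}  Loss: {:.4f}".format(step, total_step, loss))
    sys.stdout.flush()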
torch.autograd.set_detect_anomaly(True)
network = Network()
network.cuda()

criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=0.0001)

loss_min = np.inf
num_epochs = 10

start_time = time.time()
for epoch in range(1, num_epochs + 1):

    loss_train = 0
    loss_valid = 0
    running_loss = 0

    network.train()
    # iterate over the DataLoader directly instead of calling
    # next(iter(train_loader)) each step, which would rebuild the
    # iterator (and its worker processes) for every batch
    for step, (images, landmarks) in enumerate(train_loader, 1):

        images = images.cuda()
        landmarks = landmarks.view(landmarks.size(0), -1).cuda()

        predictions = network(images)

        # clear all the gradients before calculating them
        optimizer.zero_grad()

        # find the loss for the current step
        loss_train_step = criterion(predictions, landmarks)

        # calculate the gradients
        loss_train_step.backward()

        # update the parameters
        optimizer.step()

        loss_train += loss_train_step.item()
        running_loss = loss_train / step

        print_overwrite(step, len(train_loader), running_loss, 'train')

    network.eval()
    with torch.no_grad():
        for step, (images, landmarks) in enumerate(valid_loader, 1):

            images = images.cuda()
            landmarks = landmarks.view(landmarks.size(0), -1).cuda()

            predictions = network(images)

            # find the loss for the current step
            loss_valid_step = criterion(predictions, landmarks)

            loss_valid += loss_valid_step.item()
            running_loss = loss_valid / step

            print_overwrite(step, len(valid_loader), running_loss, 'valid')

    loss_train /= len(train_loader)
    loss_valid /= len(valid_loader)

    print('\n--------------------------------------------------')
    print('Epoch: {} Train Loss: {:.4f} Valid Loss: {:.4f}'.format(epoch, loss_train, loss_valid))
    print('--------------------------------------------------')

    if loss_valid < loss_min:
        loss_min = loss_valid
        torch.save(network.state_dict(), '/content/face_landmarks.pth')
        print("\nMinimum Validation Loss of {:.4f} at epoch {}/{}".format(loss_min, epoch, num_epochs))
        print('Model Saved\n')

print('Training Complete')
print("Total Elapsed Time : {} s".format(time.time() - start_time))
Valid Steps: 84/84 Loss: 0.0058
--------------------------------------------------
Epoch: 1 Train Loss: 0.0220 Valid Loss: 0.0058
--------------------------------------------------
Minimum Validation Loss of 0.0058 at epoch 1/10
Model Saved
Valid Steps: 84/84 Loss: 0.0051
--------------------------------------------------
Epoch: 2 Train Loss: 0.0053 Valid Loss: 0.0051
--------------------------------------------------
Minimum Validation Loss of 0.0051 at epoch 2/10
Model Saved
Valid Steps: 84/84 Loss: 0.0043
--------------------------------------------------
Epoch: 3 Train Loss: 0.0046 Valid Loss: 0.0043
--------------------------------------------------
Minimum Validation Loss of 0.0043 at epoch 3/10
Model Saved
Valid Steps: 84/84 Loss: 0.0028
--------------------------------------------------
Epoch: 4 Train Loss: 0.0034 Valid Loss: 0.0028
--------------------------------------------------
Minimum Validation Loss of 0.0028 at epoch 4/10
Model Saved
Valid Steps: 84/84 Loss: 0.0022
--------------------------------------------------
Epoch: 5 Train Loss: 0.0024 Valid Loss: 0.0022
--------------------------------------------------
Minimum Validation Loss of 0.0022 at epoch 5/10
Model Saved
Valid Steps: 84/84 Loss: 0.0019
--------------------------------------------------
Epoch: 6 Train Loss: 0.0019 Valid Loss: 0.0019
--------------------------------------------------
Minimum Validation Loss of 0.0019 at epoch 6/10
Model Saved
Valid Steps: 84/84 Loss: 0.0015
--------------------------------------------------
Epoch: 7 Train Loss: 0.0017 Valid Loss: 0.0015
--------------------------------------------------
Minimum Validation Loss of 0.0015 at epoch 7/10
Model Saved
Valid Steps: 84/84 Loss: 0.0015
--------------------------------------------------
Epoch: 8 Train Loss: 0.0015 Valid Loss: 0.0015
--------------------------------------------------
Valid Steps: 84/84 Loss: 0.0017
--------------------------------------------------
Epoch: 9 Train Loss: 0.0013 Valid Loss: 0.0017
--------------------------------------------------
Valid Steps: 84/84 Loss: 0.0013
--------------------------------------------------
Epoch: 10 Train Loss: 0.0013 Valid Loss: 0.0013
--------------------------------------------------
Minimum Validation Loss of 0.0013 at epoch 10/10
Model Saved
Training Complete
Total Elapsed Time : 5314.008787870407 s
8. Predicting on Unseen Images
Use the snippet below to predict landmarks in images the model has never seen.
import time
import cv2
import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import imutils

import torch
import torch.nn as nn
from torchvision import models
import torchvision.transforms.functional as TF
#######################################################################
image_path = 'pic.jpg'
weights_path = 'face_landmarks.pth'
frontal_face_cascade_path = 'haarcascade_frontalface_default.xml'
#######################################################################
class Network(nn.Module):
    def __init__(self, num_classes=136):
        super().__init__()
        self.model_name = 'resnet18'
        self.model = models.resnet18(pretrained=False)
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        x = self.model(x)
        return x
#######################################################################
face_cascade = cv2.CascadeClassifier(frontal_face_cascade_path)

best_network = Network()
best_network.load_state_dict(torch.load(weights_path, map_location=torch.device('cpu')))
best_network.eval()

image = cv2.imread(image_path)
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
display_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
height, width, _ = image.shape

# detect faces with the Haar cascade classifier
faces = face_cascade.detectMultiScale(grayscale_image, 1.1, 4)

all_landmarks = []
for (x, y, w, h) in faces:
    # crop the detected face, resize it, and normalize it
    image = grayscale_image[y:y + h, x:x + w]
    image = TF.resize(Image.fromarray(image), size=(224, 224))
    image = TF.to_tensor(image)
    image = TF.normalize(image, [0.5], [0.5])

    with torch.no_grad():
        landmarks = best_network(image.unsqueeze(0))

    # undo the zero-centering and map the landmarks back to the original image
    landmarks = (landmarks.view(68, 2).detach().numpy() + 0.5) * np.array([[w, h]]) + np.array([[x, y]])
    all_landmarks.append(landmarks)

plt.figure()
plt.imshow(display_image)
for landmarks in all_landmarks:
    plt.scatter(landmarks[:, 0], landmarks[:, 1], c='c', s=5)
plt.show()
The OpenCV Haar cascade classifier is used to detect faces in the image. Object detection with Haar cascades is a machine-learning-based approach in which a cascade function is trained on a set of input data. OpenCV ships with many classifiers already trained for faces, eyes, pedestrians, and more. Here we use the face classifier, for which you need to download the pretrained XML file and save it to your working directory.
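If you installed OpenCV through the opencv-python package, the pretrained cascade files ship with it, so as an alternative to downloading the XML manually you can load it from the bundled cv2.data.haarcascades directory:

import cv2

# load the frontal-face cascade bundled with opencv-python
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')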
The face regions detected in the input image are then cropped and resized to (224, 224) before being passed to the network.
The landmarks predicted for the cropped faces are then overlaid on the original image, producing the result shown below.
Likewise, here is the result of landmark detection on multiple faces.
Notice that the OpenCV Haar cascade classifier detected multiple faces, including a false positive: a fist that was also predicted as a face.
That's all it takes to train your own neural network to detect facial landmarks in any image. Try predicting facial landmarks on your webcam feed, for example with the sketch below. The complete code is available here[1].
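As a starting point for the webcam experiment, here is a rough sketch (not from the original article) that reuses face_cascade and best_network from the snippet above and draws the predicted landmarks on each frame with OpenCV; press q to quit:

cap = cv2.VideoCapture(0)  # open the default webcam

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 4)

    for (x, y, w, h) in faces:
        # same preprocessing as for still images
        face = TF.resize(Image.fromarray(gray[y:y + h, x:x + w]), size=(224, 224))
        face = TF.normalize(TF.to_tensor(face), [0.5], [0.5])

        with torch.no_grad():
            landmarks = best_network(face.unsqueeze(0))
        landmarks = (landmarks.view(68, 2).numpy() + 0.5) * np.array([[w, h]]) + np.array([[x, y]])

        # draw the predicted landmarks on the frame
        for (px, py) in landmarks:
            cv2.circle(frame, (int(px), int(py)), 2, (255, 255, 0), -1)

    cv2.imshow('Face Landmarks', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()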
References
[1] Code: https://colab.research.google.com/drive/1-28T5nIAevrDo6MwN0Qi_Cgdy9TEiSP_?usp=sharing
[2] Original English article: https://towardsdatascience.com/face-landmarks-detection-with-pytorch-4b4852f5e9c4