Person Tracking and Reidentification for Multicamera Indoor Video Surveillance Systems

S. Ye a,*, R. P. Bohush b,**, H. Chen a,***, I. Yu. Zakharava b,****, and S. V. Ablameyko c,d,*****

a Zhejiang Shuren University, Hangzhou, 310015 China

b Polotsk State University, Novopolotsk, 211440 Belarus

c Belarusian State University, Minsk, 220030 Belarus

d United Institute for Informatics Problems, National Academy of Sciences of Belarus, Minsk, 220012 Belarus

Correspondence to: * e-mail: zjsruysp@163.com
Correspondence to: ** e-mail: bogushr@mail.ru
Correspondence to: *** e-mail: eric.hf.chen@hotmail.com
Correspondence to: **** e-mail: i.zakharova@psu.by
Correspondence to: ***** e-mail: ablameyko@bsu.by

Received 23 June, 2020

Abstract: Indoor surveillance with multiple cameras that tracks the movement of people and reidentifies them in video sequences is of growing practical importance. The task is difficult because of uneven illumination, background inhomogeneity, occlusion, unpredictable trajectories of people, and the similarity of their visual features. This article presents an approach to tracking people in video sequences and reidentifying them in multicamera video surveillance systems used indoors. In the first step, people are detected with the YOLO v4 convolutional neural network (CNN), and each detection is described by a bounding rectangle. Next, the face region is located and its features are computed; the developed method uses these features both when tracking a person within a video sequence and during intercamera reidentification. This improves tracking accuracy for complex movement trajectories and multiple crossings of people with similar characteristics. Faces are searched for within the detected person regions using the multitask MTCNN, and the MobileFaceNetwork model is used to form the face feature vector. Person features are formed using a modified CNN based on ResNet34 and a histogram of the hue channel of the HSV color model. The correspondence between people in different frames is established from the spatial coordinates of faces and people, together with their CNN features, using the Hungarian algorithm. To ensure the accuracy of intercamera tracking, reidentification is performed on the basis of facial features. Five test video sequences with different numbers of people, captured indoors with a fixed video camera, were used to test and compare different approaches. The experimental results confirmed the effectiveness of the proposed approach.

Keywords: tracking people, face recognition, indoor video surveillance, convolutional neural networks

DOI: 10.1134/S1054661820040136
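
The frame-to-frame association step summarized in the abstract (matching tracked people to new detections from spatial coordinates and CNN features with the Hungarian algorithm) can be illustrated by the minimal Python sketch below. It is not the authors' implementation. The cost function (a weighted sum of 1 - IoU and cosine distance), the weight `alpha`, the gate `max_cost`, and the dictionary keys `box` and `feat` are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cosine_distance(f, g):
    """1 - cosine similarity of two feature vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(f, g))
    nf = math.sqrt(sum(x * x for x in f))
    ng = math.sqrt(sum(y * y for y in g))
    return 1.0 - dot / (nf * ng) if nf > 0 and ng > 0 else 1.0

def hungarian(cost):
    """Minimum-cost assignment for a matrix with rows <= columns.

    Returns a list of (row, column) pairs, one per row.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row assigned to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:                         # augment along the found path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]

def associate(tracks, detections, alpha=0.5, max_cost=0.7):
    """Match tracks to detections; alpha and max_cost are illustrative values."""
    if not tracks or not detections:
        return []
    cost = [[alpha * (1.0 - iou(t["box"], d["box"]))
             + (1.0 - alpha) * cosine_distance(t["feat"], d["feat"])
             for d in detections] for t in tracks]
    if len(tracks) > len(detections):
        # hungarian() needs rows <= columns, so solve the transposed problem.
        transposed = [list(col) for col in zip(*cost)]
        pairs = [(t, d) for d, t in hungarian(transposed)]
    else:
        pairs = hungarian(cost)
    # Discard matches whose cost exceeds the gate (treated as unmatched).
    return sorted((t, d) for t, d in pairs if cost[t][d] <= max_cost)
```

In this sketch, tracks or detections left out of the returned list would be handled separately, for example by starting new tracks or by falling back to face-based reidentification as the abstract describes.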