Mobile face detection algorithm inference traits
Keywords: face search, runtime, mobile devices, neural networks, edge computing
An ever-growing number of applications use mobile face detection. However, most modern research papers focus on increasing detection quality while paying little attention to detection time. As a result, many state-of-the-art algorithms cannot be used on mobile devices because their detection time is too long. The goal of this paper is to adapt five face detection algorithms for inference on mobile devices and to analyze their performance characteristics. These algorithms include the established methods Haar cascades, LBP and HOG, as well as the novel neural-network-based algorithms MTCNN and BlazeFace.

The main research material. We conduct the experiments on three scenes typical for mobile face recognition systems: images with no faces, with one face and with two faces. For testing, we have implemented an Android application. Two widespread processors, Snapdragon 800 and Snapdragon 845, were selected for time measurements. The tests show that on the faster Snapdragon 845 all of the algorithms run at real-time speed on 128x128 images, while only two of them (LBP and HOG) do so on 256x256 images. On the slower Snapdragon 800, only BlazeFace, LBP and HOG can run in real time, and only at resolutions not higher than 128x128. We suggest not using Haar or LBP cascades in practice, as their accuracy is rather low.

Conclusions. Based on the conducted research, we suggest that the best algorithms for practical use cases are: 1) BlazeFace, which gives stable and accurate predictions; however, it accepts only two input image resolutions and, untypically, its inference time is higher for images without faces than for images with faces; 2) MTCNN, which, thanks to its cascaded architecture, conserves resources when input frames contain no faces; it is also the most adaptive algorithm and can run at resolutions as low as 32x32, provided that the faces are quite large; 3) a HOG-based algorithm, if inference time is of the highest importance.
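The timing procedure behind these measurements can be illustrated with a small harness. The sketch below is written in Python for brevity and is not the authors' Android code; the detector callable, the frame factories and the 30 FPS real-time budget are assumptions made for illustration. It reports the median latency for every combination of input size and scene (no faces, one face, two faces).

```python
import statistics
import time
from typing import Callable, Dict, Sequence, Tuple

# Assumed real-time budget: 30 frames per second, i.e. about 33.3 ms per
# frame. This threshold is an illustrative assumption, not the paper's.
REALTIME_FPS = 30.0


def measure_latency_ms(detect: Callable[[object], object],
                       frames: Sequence[object],
                       warmup: int = 5,
                       repeats: int = 3) -> float:
    """Median wall-clock time of one `detect(frame)` call, in milliseconds."""
    # Warm-up calls let lazy initialisation and caches settle first.
    for frame in frames[:warmup]:
        detect(frame)
    samples = []
    for _ in range(repeats):
        for frame in frames:
            start = time.perf_counter()
            detect(frame)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def is_realtime(latency_ms: float, fps: float = REALTIME_FPS) -> bool:
    """True if one detection fits into a single frame interval."""
    return latency_ms <= 1000.0 / fps


def benchmark(detect: Callable[[object], object],
              scenes: Dict[str, Callable[[int], Sequence[object]]],
              sizes: Sequence[int]) -> Dict[Tuple[int, str], Tuple[float, bool]]:
    """Latency and real-time flag for every (input size, scene) pair.

    `scenes` maps a scene name such as "no faces", "1 face" or "2 faces"
    to a hypothetical function returning test frames of a given side length.
    """
    results = {}
    for size in sizes:
        for name, make_frames in scenes.items():
            latency = measure_latency_ms(detect, make_frames(size))
            results[(size, name)] = (latency, is_realtime(latency))
    return results
```

For example, `benchmark(detect, scenes, sizes=[128, 256])`, with a hypothetical `detect` wrapping one of the five detectors, yields one entry per resolution and scene. The median is used rather than the mean so that occasional slow frames (for instance, caused by garbage collection or background tasks) do not dominate the result.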
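MTCNN's ability to work at low resolutions follows from its first stage: as described by Zhang et al., the proposal network (P-Net) is applied to an image pyramid with a 12x12 input window. The sketch below computes the pyramid scales the way common open-source MTCNN ports do; the minimum face size of 20 px and the scale factor of 0.709 are typical defaults in those ports and are assumed here rather than taken from this paper. A smaller input yields fewer pyramid levels, i.e. less work for P-Net.

```python
from typing import List, Optional, Tuple

PNET_WINDOW = 12  # P-Net scans every pyramid level with a 12x12 window


def pyramid_scales(width: int, height: int,
                   min_face: float = 20.0, factor: float = 0.709) -> List[float]:
    """Scales at which the first MTCNN stage would be run.

    The largest scale maps a face of `min_face` pixels onto the 12x12 window;
    each next level shrinks the image by `factor` until its shorter side
    drops below the window size.
    """
    m = PNET_WINDOW / min_face
    side = min(width, height) * m
    scales = []
    k = 0
    while side >= PNET_WINDOW:
        scales.append(m * factor ** k)
        side *= factor
        k += 1
    return scales


def detectable_face_range(width: int, height: int, min_face: float = 20.0,
                          factor: float = 0.709) -> Optional[Tuple[float, float]]:
    """Approximate smallest and largest face sizes (px) covered by the window."""
    scales = pyramid_scales(width, height, min_face, factor)
    if not scales:
        return None  # the image is too small for this minimum face size
    return PNET_WINDOW / scales[0], PNET_WINDOW / scales[-1]
```

With these defaults, square inputs of 32, 128 and 256 px give 2, 6 and 8 pyramid levels respectively, and at 32x32 the detectable face sizes in this sketch lie roughly between 20 and 28 px, i.e. the faces must cover most of the frame.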
In this paper, we have also shown that a cascaded algorithm architecture dynamically changes its execution time depending on the image content and its complexity, which resembles the way humans think. We hope that the novel practical results obtained will increase the use of the methods described above in mobile applications and will stimulate the development of modified versions of these algorithms.
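Why a cascade spends less time on simple images can be shown with a toy model: each stage filters the candidate windows that survived the previous one, and processing stops once no candidates remain. The stage names, relative costs and acceptance rules below are hypothetical and only illustrate the principle; they are not MTCNN's actual networks or measured costs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical candidate window: (x, y, width, height).
Box = Tuple[int, int, int, int]


@dataclass
class Stage:
    name: str
    cost_per_candidate: float    # illustrative relative cost, not measured
    keep: Callable[[Box], bool]  # True if the stage still accepts the window


def run_cascade(stages: Sequence[Stage],
                candidates: List[Box]) -> Tuple[List[Box], float]:
    """Filter candidates stage by stage; return survivors and total cost."""
    total_cost = 0.0
    for stage in stages:
        if not candidates:
            # Early exit: nothing is left, so the remaining (usually more
            # expensive) stages are never executed.
            break
        total_cost += stage.cost_per_candidate * len(candidates)
        candidates = [box for box in candidates if stage.keep(box)]
    return candidates, total_cost


if __name__ == "__main__":
    # 100 hypothetical 12x12 windows on a 10x10 grid.
    windows = [(x, y, 12, 12) for x in range(0, 120, 12) for y in range(0, 120, 12)]
    for faces in (set(), {(0, 0, 12, 12)}, {(0, 0, 12, 12), (12, 12, 12, 12)}):
        stages = [
            Stage("coarse", 1.0, lambda box: box in faces),
            Stage("refine", 10.0, lambda box: True),
            Stage("output", 50.0, lambda box: True),
        ]
        _, cost = run_cascade(stages, windows)
        print(len(faces), cost)  # prints: 0 100.0, then 1 160.0, then 2 220.0
```

On an empty image the cheap first stage rejects every window, so the costly stages never run and the total cost stays at 100 units; with one or two faces it grows to 160 and 220 units. This is the same content-dependent behaviour that lets MTCNN conserve resources on frames without faces.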
This work is licensed under a Creative Commons Attribution 4.0 International License.