Development of an intelligent photo and video analytics system for action recognition of a person or a group of people

Competition name:

Zhas galym 2022-2024, MSHE RK

Project coordinator:

Islamgozhayev T.U., PhD, assistant professor, Department of Computing and Data Science, Astana IT University

Researcher ID Web of Science – D-6524-2015

ORCID – 0000-0001-7891-242X

Scopus ID – 56826222900

Total amount financed:

18 745 180 tenge

Project aim

The goal of the project is to research and develop an action recognition system based on images from CCTV cameras to solve the problems of object detection and action classification.

Objectives

  1. Research and analysis of existing systems, approaches, methods and algorithms for object recognition and action classification;
  2. Development of object detection functionality, including neural network models and pre- and post-processing of images from CCTV cameras;
  3. Research and development of methods for classifying the actions of objects in video streams;
  4. Development of fast image processing functionality using TensorRT technologies for GPUs, as well as OpenVINO for CPUs;
  5. Development of a notification module for suspicious or anomalous actions;
  6. Evaluating the performance of the proposed models, methods and algorithms under different conditions.

Expected results

The project is expected to produce the following results:

  1. a concept and architecture of a system for monitoring the area in real time will be proposed;
  2. neural network models will be developed to detect and classify the actions of a person or a group of people using computer vision and machine learning methods;
  3. modular software will be created, including an abnormal activity alert system;
  4. based on the research results, articles will be published in domestic and international journals and conferences;
  5. the developed platform will be evaluated and tested, and documentation on its use will be prepared.

Upon completion, the following publications are expected:

– at least 2 (two) articles in journals from the first three quartiles by impact factor in the Web of Science database or with a CiteScore percentile in the Scopus database of at least 50.

Current results

In general, the action recognition task is divided into the subtasks shown in Figure 1.

Figure 1. Action recognition pipeline

To date, a module that detects objects (people) in streaming video from CCTV cameras has been developed, which completes the first three subtasks of the problem. The module is based on the YOLO architecture, fine-tuned on an additionally collected and processed dataset of 1000 images. This improved person detection in the target environments (premises, hangars, workplaces, etc.).

The choice of models followed a search and analysis of existing work on image processing, object detection and action classification. As a result, YOLOv7 was chosen to find people in the frame because of its high detection rate. To classify actions at a given point in time, we chose an approach that extracts a skeleton from the image of a person or people, analyzes the position of the limbs and classifies the action from these data. This approach makes the task considerably more complex, but it allows large datasets to be built from various sources and domains. Examples of person detection are shown in Figures 2 and 3.

Figure 2. An example of person detection (from public access video)

Figure 3. An example of person detection (from public access video)
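
The skeleton-based classification step can be illustrated with a minimal Python sketch. It is not the project's implementation: the keypoint names, the `joint_angle`, `limb_features` and `classify_pose` helpers, the class labels and the 120-degree threshold are all hypothetical. It also assumes that a separate pose estimator has already produced 2D keypoints for each detected person. The fixed rules only stand in for a trained classifier, to show how limb positions can be turned into features.

```python
import math
from typing import Dict, Tuple

# Hypothetical keypoint layout: joint name -> (x, y) in image pixels.
# Image coordinates are assumed, so y grows downward.
Point = Tuple[float, float]
Keypoints = Dict[str, Point]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Return the angle at joint b, in degrees, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate limb: two keypoints coincide")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, cos))
    return math.degrees(math.acos(cos))


def limb_features(kp: Keypoints) -> Dict[str, float]:
    """Describe the limb positions of one person as knee and elbow angles."""
    features = {}
    for side in ("left", "right"):
        features[side + "_knee"] = joint_angle(
            kp[side + "_hip"], kp[side + "_knee"], kp[side + "_ankle"])
        features[side + "_elbow"] = joint_angle(
            kp[side + "_shoulder"], kp[side + "_elbow"], kp[side + "_wrist"])
    return features


def classify_pose(kp: Keypoints, knee_bent_deg: float = 120.0) -> str:
    """Toy rule-based classifier (illustrative stand-in for a trained model)."""
    features = limb_features(kp)
    # Both wrists above the shoulders (smaller y means higher in the image).
    hands_up = all(kp[s + "_wrist"][1] < kp[s + "_shoulder"][1]
                   for s in ("left", "right"))
    if hands_up:
        return "hands_raised"
    if (features["left_knee"] < knee_bent_deg
            and features["right_knee"] < knee_bent_deg):
        return "sitting_or_crouching"
    return "standing"
```

In a real system the angle features would more likely be fed, frame by frame, into a learned classifier rather than fixed thresholds. Because the features depend only on keypoints and not on pixel appearance, skeletons extracted from different sources and domains can be pooled into a single dataset.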

Publications list

2022

Publication (off plan)

Kozhirbayev, Z., Islamgozhayev, T., Yessenbayev, Z., & Sharipbay, A. (2022). Preliminary tasks of unsupervised speech recognition based on unaligned audio and text data. In 2022 International Conference on Engineering & MIS (ICEMIS) (pp. 1-3)

Published

2023

Publication in a Scopus journal (off plan)

Zhanibek Kozhirbayev *, Talgat Islamgozhayev. Cascade Speech Translation for the Kazakh language. MDPI Applied Sciences, Acoustics and Vibrations.

Accepted

Due date – August, 2023