Competition name:
Zhas galym 2022-2024, MSHE RK
Project coordinator:
Islamgozhayev T.U., PhD, assistant professor, Department of Computing and Data Science, Astana IT University
Researcher ID Web of Science – D-6524-2015
ORCID – 0000-0001-7891-242X
Scopus ID – 56826222900
Total amount financed:
18 745 180 tenge
The goal of the project is to research and develop an action recognition system based on images from CCTV cameras to solve the problems of object detection and action classification.
The project is expected to produce the following results:
Upon completion, the following publications are expected:
– at least 2 (two) articles in journals ranked in the first three quartiles by impact factor in the Web of Science database, or with a CiteScore percentile of at least 50 in the Scopus database.
In general, the action recognition task is divided into the subtasks shown in Figure 1.
Figure 1. Action recognition pipeline
At present, a module for detecting objects (people) in streaming video from CCTV cameras has been developed, which completes the first three subtasks of the pipeline. For this purpose, the YOLO architecture was used and then fine-tuned on an additionally collected and processed dataset of 1000 images. As a result, we improved the person detection model for the target environments (premises, hangars, workplaces, etc.).
In particular, we surveyed and analyzed alternative works describing models and methods for image processing, object detection, and action classification. Based on this analysis, the YOLOv7 algorithm, which has a high object detection rate, was chosen as the model for finding a person in the frame.
To classify actions at a given point in time, we chose an approach that extracts a skeleton from the image of a person or several people, analyzes the position of the limbs, and then classifies actions based on these data. This approach makes the task considerably more complex, but it allows large datasets to be assembled from various sources and domains. Examples of person detection are shown in Figures 2 and 3.
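The limb-position analysis described above can be illustrated with a small sketch. It is not the project's implementation: it assumes 2D keypoints are already available from some skeleton extractor, and the keypoint names, the `knee_bent_below` threshold, and the two pose labels are hypothetical choices made only for this example. It shows how a joint angle can be computed from three keypoints and used as a feature for a toy rule-based classifier.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate limb: two keypoints coincide")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))


def limb_features(keypoints):
    """Build a small feature dict from a keypoint dict (names are illustrative)."""
    return {
        "left_elbow": joint_angle(
            keypoints["left_shoulder"], keypoints["left_elbow"], keypoints["left_wrist"]
        ),
        "left_knee": joint_angle(
            keypoints["left_hip"], keypoints["left_knee"], keypoints["left_ankle"]
        ),
    }


def classify_pose(features, knee_bent_below=120.0):
    """Toy rule (assumption): a strongly bent knee means crouching or sitting."""
    if features["left_knee"] < knee_bent_below:
        return "crouching_or_sitting"
    return "standing"
```

In a real system the hand-written rule would typically be replaced by a classifier trained on such features, or on the raw keypoint sequences over time. The sketch only shows why skeleton data transfers across sources: the features depend on joint geometry, not on the appearance of the scene.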
Figure 2. An example of person detection (from public access video)
Figure 3. An example of person detection (from public access video)
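As a rough illustration of how raw detector output can be reduced to the person boxes shown in the figures, the sketch below filters detections by class and confidence and then removes overlapping duplicates with greedy non-maximum suppression. It is a sketch under stated assumptions, not the project's code: the `Detection` type, the `"person"` label string, and both thresholds are hypothetical, and detectors of this kind usually apply a similar step internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical structure for this example)."""
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_people(detections, min_conf=0.5, iou_thresh=0.5):
    """Keep confident 'person' boxes, dropping overlapping duplicates.

    Thresholds are illustrative assumptions, not values from the project.
    """
    people = sorted(
        (d for d in detections if d.label == "person" and d.confidence >= min_conf),
        key=lambda d: d.confidence,
        reverse=True,
    )
    kept = []
    for det in people:
        # Greedy NMS: accept a box only if it does not overlap a better one.
        if all(iou(det.box, k.box) < iou_thresh for k in kept):
            kept.append(det)
    return kept
```

The boxes returned by `keep_people` would then be cropped from the frame and passed on to the skeleton extraction stage.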
2022 | Publication (off plan) | Kozhirbayev, Z., Islamgozhayev, T., Yessenbayev, Z., & Sharipbay, A. (2022). Preliminary tasks of unsupervised speech recognition based on unaligned audio and text data. In 2022 International Conference on Engineering & MIS (ICEMIS) (pp. 1-3). | Published
2023 | Publication in a Scopus journal (off plan) | Kozhirbayev, Z., & Islamgozhayev, T. Cascade speech translation for the Kazakh language. Applied Sciences (MDPI), section Acoustics and Vibrations. | Accepted; due date: August 2023