Bibliographic description:

Gorbina M. A., Levina O. S. Cloud computing technology: future development // Molodoy uchenyy (Young Scientist). 2016. No. 10. P. 163-166.



Cloud computing technology is currently under active development. The main idea is to provide the user with software and computing resources as Internet services, while all the applications and data needed for work are placed on a remote server on the Internet [1].

This technology finds many applications, for example, in the field of education. Examples include electronic diaries and journals, electronic books, training simulators, diagnostic, testing and training systems, laboratory systems, digital libraries and computer programs. In addition, such systems can be used to monitor examinations or political elections.

Cloud computing technology, together with computer vision technology, is used to manage software and smart tools in large-scale video surveillance systems at sites such as schools, stadiums, psychiatric hospitals and prisons in order to ensure security and order. This technology is called situational video analytics. It makes it possible not only to detect an object and track its movement, but also to classify the behavior of the object based on user-defined rules.

Such video surveillance systems are widely deployed in various spheres of life.

Several methods for recognizing violence have been tested on different sets of videos and with different specifications. One method identifies violent scenes in movies using a classifier trained on violent scene changes, explosions, blood and audio frame information [2]. Another method is based on audiovisual information [3]: it combines statistics of audio features with average motion and motion orientation variance features from the video in a k-Nearest Neighbor classifier to decide whether a given sequence is violent. The problem has also been addressed in the context of video sharing sites, where textual tags are used along with audio and video [4, 5].
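The k-Nearest Neighbor decision mentioned above can be illustrated with a minimal sketch. The three-component feature vectors (an audio statistic, average motion, motion orientation variance), their values and the helper name `knn_predict` are hypothetical and chosen only for illustration; they are not taken from [3].

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to `query`."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical features: (audio statistic, average motion, motion orientation variance)
train = [
    ((0.9, 0.8, 0.7), "violent"),
    ((0.8, 0.9, 0.6), "violent"),
    ((0.7, 0.7, 0.9), "violent"),
    ((0.2, 0.1, 0.2), "non-violent"),
    ((0.1, 0.2, 0.1), "non-violent"),
    ((0.3, 0.2, 0.3), "non-violent"),
]

prediction = knn_predict(train, (0.75, 0.8, 0.7))
print(prediction)
```

In practice the features would have to be normalized to comparable ranges before distances are computed, since the Euclidean distance is otherwise dominated by the largest-valued feature.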

Detection of violent scenes from video features is also addressed in [6], which presents a violence detector built on the concept of visual codebooks and linear support vector machines. The main difference from existing work on violence detection concerns the data representation, as no earlier work had considered local spatio-temporal features with bags of visual words. The importance of local spatio-temporal features for characterizing multimedia content is evaluated through cross-validation.

“Visual words” are distinctive feature vectors, i.e. features considered informative enough to account for the underlying patterns of a set of visual data. They compose what is called the visual codebook. Which local patterns the visual words represent depends on the intrinsic characteristics of the interest point detector and feature descriptor used to compute the feature vectors. Low-level features of each video are computed using a feature descriptor, e.g. SIFT (Scale-Invariant Feature Transform) or STIP (Space-Time Interest Points).
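How a video shot is turned into a bag of visual words can be sketched as follows. This is a toy illustration under stated assumptions: the codebook is taken as given (how it is built is not described here), the descriptors are two-dimensional instead of real SIFT or STIP vectors, and the function names are hypothetical.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the codeword closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(descriptor, codebook[i]))

def bag_of_words(descriptors, codebook):
    """L1-normalised histogram of visual-word occurrences for one video shot."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy 3-word codebook and four local descriptors extracted from one shot
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shot_descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

histogram = bag_of_words(shot_descriptors, codebook)
print(histogram)
```

The resulting fixed-length histogram is the kind of vector that a linear support vector machine, as used in [6], can take as input regardless of how many interest points a shot contains.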

Table 1

Performance of shot classification using oriented-gradient features with 100-word codebook

| SIFT (%) | Violent | Non-Violent |
|---|---|---|
| Violent | 80.09 | 19.91 |
| Non-Violent | 14.65 | 85.35 |

Table 2

Performance of shot classification using spatio-temporal features with 100-word codebook

| STIP (%) | Violent | Non-Violent |
|---|---|---|
| Violent | 99.54 | 0.46 |
| Non-Violent | 0 | 100.0 |

The results obtained confirm that motion patterns are crucial for distinguishing violence from regular activities, in comparison with visual descriptors that rely on the space domain. For the violence recognition experiment, videos of two categories were selected: violence and non-violence samples. In total there are 400 videos, 200 in each category. Tables 1 and 2 show the classification performance of the method with SIFT and STIP, respectively. Spatio-temporal features (STIP) are decisive in defining what is in fact relevant for separating the categories, provided that the difference between the classes depends strongly on motion patterns. The results support working in the space-time domain to capture the characteristic behavior of the structures of interest, in contrast to a visual descriptor that relies solely on the space domain [7].
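As a quick check on Tables 1 and 2, the overall accuracy can be computed from the confusion matrices. The sketch assumes that each row corresponds to the true class and is given in percent (each row sums to 100); because both categories contain 200 videos, the mean of the diagonal entries equals the overall accuracy.

```python
def mean_diagonal(confusion):
    """Mean of the diagonal entries of a row-normalised confusion matrix (in %)."""
    return sum(confusion[i][i] for i in range(len(confusion))) / len(confusion)

# Values copied from Table 1 (SIFT) and Table 2 (STIP)
sift = [[80.09, 19.91], [14.65, 85.35]]
stip = [[99.54, 0.46], [0.0, 100.0]]

sift_accuracy = mean_diagonal(sift)
stip_accuracy = mean_diagonal(stip)
print(round(sift_accuracy, 2), round(stip_accuracy, 2))
```

Under these assumptions the spatio-temporal descriptor reaches about 99.77 % against about 82.72 % for the purely spatial one, a gap of roughly 17 percentage points.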

Fig. 1. Two consecutive frames in a fight clip from a movie

The key role in recognizing violence belongs to sharp, rapid movements, which can be obtained from the trajectories of tracked points using Acceleration Measure Vectors (AMV) [8]. However, extreme acceleration produces image blur, which makes tracking less accurate or impossible. Motion blur can be seen on the left side of the frame in Figure 1. Motion blur shifts image content towards low frequencies, and this behavior allows an effective acceleration estimator for video to be built. When there is a sudden movement between two frames, the power spectrum of the second frame takes the form of an ellipse [9]. The ellipse is oriented perpendicular to the direction of movement, and the frequencies outside the ellipse are attenuated (Figure 2).

Fig. 2. Left: Sample image. Center: simulated camera motion at 45°. Right: Fourier transform of the center image

Importantly, the eccentricity of the ellipse depends on the acceleration. Essentially, the method aims to detect the sudden appearance of such an ellipse [10].

Let $I_{i-1}$ and $I_i$ be two consecutive frames. Motion blur is equivalent to applying a low-pass oriented filter $C$:

$$F(I_i) = F(I_{i-1}) \cdot C \qquad (1)$$

where $F(\cdot)$ denotes the Fourier Transform. Then:

$$C = \frac{F(I_i)}{F(I_{i-1})} \qquad (2)$$

The low-pass oriented filter in $C$ is the above-mentioned ellipse. For each pair of consecutive frames, the power spectrum is computed using the 2D Fast Fourier Transform (to avoid edge effects, a Hanning window is applied before computing the FFT). These spectra are denoted $P_{i-1}$ and $P_i$, and a simple computation gives the image:

$$C = \frac{P_i}{P_{i-1}} \qquad (3)$$
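The ratio of the power spectra of two consecutive frames can be sketched in a few lines. This is a toy illustration, not the implementation of [10]. A naive discrete Fourier transform stands in for the FFT so that the example needs only the standard library. The 8x8 impulse image, the circular 3-pixel box blur used as a stand-in for motion blur, the small constant `eps` and all function names are assumptions. The Hanning window is available through `use_window` but is switched off in the demonstration so that the result can be checked analytically.

```python
import cmath
import math

def hann(n):
    """Hann (Hanning) window of length n, zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def dft(seq):
    """Naive 1-D discrete Fourier transform (stand-in for an FFT)."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
            for u in range(n)]

def power_spectrum(frame, use_window=True):
    """2-D power spectrum |F(frame)|^2, indexed as [v][u] (vertical, horizontal)."""
    h, w = len(frame), len(frame[0])
    wy = hann(h) if use_window else [1.0] * h
    wx = hann(w) if use_window else [1.0] * w
    windowed = [[frame[y][x] * wy[y] * wx[x] for x in range(w)] for y in range(h)]
    rows = [dft(row) for row in windowed]                           # transform along x
    cols = [dft([rows[y][u] for y in range(h)]) for u in range(w)]  # then along y
    return [[abs(cols[u][v]) ** 2 for u in range(w)] for v in range(h)]

def spectrum_ratio(prev_frame, cur_frame, use_window=True, eps=1e-12):
    """Element-wise ratio of the current to the previous power spectrum."""
    p_prev = power_spectrum(prev_frame, use_window)
    p_cur = power_spectrum(cur_frame, use_window)
    return [[(c + eps) / (p + eps) for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(p_cur, p_prev)]

def blur_x(frame):
    """Circular 3-pixel horizontal box blur, a crude stand-in for motion blur."""
    w = len(frame[0])
    return [[(row[(x - 1) % w] + row[x] + row[(x + 1) % w]) / 3.0 for x in range(w)]
            for row in frame]

N = 8
prev_frame = [[0.0] * N for _ in range(N)]
prev_frame[2][3] = 1.0                  # single bright pixel: flat power spectrum
cur_frame = blur_x(prev_frame)          # same scene, blurred along x

C = spectrum_ratio(prev_frame, cur_frame, use_window=False)
print(round(C[0][4], 4), round(C[4][0], 4))
```

In this toy case the highest horizontal frequency is attenuated to about 1/9 of its power while vertical frequencies are untouched, so the surviving energy is elongated perpendicular to the direction of motion, as described for the ellipse above.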

Since the proposed method does not involve tracking or optical-flow techniques, it is more suitable for measuring extreme accelerations. The problem is that camera movement can also cause blur in the image. Therefore, a deconvolution preprocessing step is used to remove the blur caused by camera motion; all other blur remains in the frame. When backgrounds are relatively uniform and the displacement is small, global motion estimation may still fail. The window function mentioned above restricts processing to the inner part of the image. When motion is global, changes in the outer part of the image will be roughly on par with those in the inner part. To compensate for this, an additional feature called "Outer Energy" is computed and used.

Analysis of large crowds is an important task in video surveillance systems, but it is challenging because a crowd has no well-defined shape. First, optical flow is estimated, and it is then used as input to adjacency-matrix based clustering (AMC) [11]. Once groups of people have been obtained, their behavior is characterized by the direction, the position and the size of the crowd.

The method is able to predict the behavior of a person in the crowd, based on a force field model, and then to detect abnormal situations. In the first stage, optical flow is computed with the Lucas-Kanade (LK) algorithm [12], which gives effective results for intensive flows in a video sequence. This step helps in modeling the force field, because zero-magnitude flow points, i.e. point pairs that are at the same spatial position, are removed. In the second stage, the adjacency matrix is composed and used for adjacency-matrix based clustering (AMC). A model of the human crowd can then be built.
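The two stages can be illustrated with a minimal sketch. It assumes that flow vectors `(x, y, dx, dy)` are already available from an optical-flow tracker. The adjacency criterion (two vectors are adjacent when they are spatially close and point in similar directions), the thresholds and the function names are hypothetical and may differ from the definition used in [11]; clusters are taken as the connected components of the adjacency matrix.

```python
import math

def cluster_flows(flows, max_dist=2.0, max_angle=math.radians(30)):
    """Group flow vectors (x, y, dx, dy) into connected components of an adjacency matrix."""
    moving = [f for f in flows if f[2] != 0 or f[3] != 0]   # drop zero-magnitude flow

    def adjacent(a, b):
        close = math.hypot(a[0] - b[0], a[1] - b[1]) <= max_dist
        diff = abs(math.atan2(a[3], a[2]) - math.atan2(b[3], b[2]))
        diff = min(diff, 2 * math.pi - diff)                 # wrap the angle difference
        return close and diff <= max_angle

    n = len(moving)
    adjacency = [[i != j and adjacent(moving[i], moving[j]) for j in range(n)]
                 for i in range(n)]

    labels = [-1] * n
    clusters = []
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = len(clusters)
        stack, members = [start], []
        while stack:                                         # depth-first traversal
            i = stack.pop()
            members.append(moving[i])
            for j in range(n):
                if adjacency[i][j] and labels[j] == -1:
                    labels[j] = labels[start]
                    stack.append(j)
        clusters.append(members)
    return clusters

def describe(cluster):
    """Size, mean position and mean direction (radians) of one cluster."""
    n = len(cluster)
    mean_dx = sum(f[2] for f in cluster) / n
    mean_dy = sum(f[3] for f in cluster) / n
    return {
        "size": n,
        "position": (sum(f[0] for f in cluster) / n, sum(f[1] for f in cluster) / n),
        "direction": math.atan2(mean_dy, mean_dx),
    }

flows = [
    (0, 0, 1, 0), (1, 0, 1, 0.1), (2, 0, 1, 0),   # group moving along +x
    (10, 10, 0, -1), (11, 10, 0, -1),             # group moving along -y
    (5, 5, 0, 0),                                 # static point, removed
]
clusters = cluster_flows(flows)
summary = [describe(c) for c in clusters]
print(summary)
```

The per-cluster size, position and direction correspond to the crowd characteristics named above; a force field model would be built on top of such summaries.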

Figure 3 shows an example of modeling the crowd using a force field. It can be seen that the crowd in the lower left part of the image exerts the strongest force among the three groups of people, so the crowd will move in this direction with high probability.

Fig. 3. (a) Clusters obtained from the AMC algorithm; (b) optical flows of each cluster and (c) dominant forces computed by Eq. (6)

For performance comparison, the UMN dataset [13] is used to conduct event detection, and the results are compared with a baseline approach using pure optical flow and with the social force model (SFM) approach of [14], based on the detected time instant at which the unusual event begins. Experimental results obtained on an extensive dataset have shown that this system is effective in detecting unusual events in the uncontrolled environments of surveillance videos.

Violence recognition technology is applied to detect dangerous situations both when monitoring a small number of people and when observing mass events, where outbreaks of violence are difficult to see without special video systems. In general, methods of violence recognition based on identifying features of unusual behavior use various descriptors, characteristics and special clusters. Thus, the problem of detecting violence in real time remains relevant, and the methods for solving it are constantly evolving.

References:

  1. Gribova V. V., Kleshchev A. S., Krylov D. A. et al. Cloud platform for the development and management of intelligent systems. In Proceedings of the conference Open Semantic Technologies for Intelligent Systems, 2011.
  2. Lin, J. and Wang, W. (2009). Weakly-supervised violence detection in movies with audio and video based co-training. In Proceedings of the 10th Pacific Rim Conference on Multimedia, pages 930–935, Berlin, Heidelberg. Springer-Verlag.
  3. Giannakopoulos, T., Kosmopoulos, D., Aristidou, A., and Theodoridis, S. (2006). Violence content classification using audio features. In Advances in Artificial Intelligence, volume 3955 of Lecture Notes in Computer Science, pages 502–507.
  4. Zou, X., Wu, O., Wang, Q., Hu, W., and Yang, J. (2012). Multi-modal based violent movies detection in video sharing sites. In IScIDE, pages 347–355.
  5. O.Deniz, I.Serrano, G.Bueno, T-K. Kim Fast violence detection in video (2014) The 9th International Conference on Computer Vision Theory and Applications (VISAPP)
  6. Violence Detection in Video Using Spatio-Temporal Features
  7. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, 2004.
  8. Datta, A., Shah, M., and Lobo, N. D. V. (2002). Person-on-person violence detection in video data. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, volume 1, pages 433–438.
  9. Barlow, H. B. and Olshausen, B. A. (2004). Convergent evidence for the visual analysis of optic flow through anisotropic attenuation of high spatial frequencies. Journal of Vision, 4(6):415–426.
  10. O.Deniz, I.Serrano, G.Bueno, T-K. Kim Fast violence detection in video (2014) The 9th International Conference on Computer Vision Theory and Applications (VISAPP)
  11. Duan-Yu Chen, Po-Chung Huang, “Motion-based unusual event detection in human crowds”, J. Vis. Commun. Image R. 22, pp.178–186, 2011
  12. Lucas B. D., Kanade T., An iterative image registration technique with an application to stereo vision, in: Proceedings of the 1981 DARPA Imaging Understanding Workshop, 1981, pp. 121–130
  13. University of Minnesota. Crowd Activity Dataset.
  14. Ramin Mehran, Alexis Oyama, Mubarak Shah, Abnormal crowd behavior detection using social force model, in: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Miami, 2009.
  15. Voronin V. V., Pismenskova M. M., Marchuk V. I., Morozova T. V. Methods for constructing descriptors applied to the problem of human action recognition based on spatio-temporal processing of a video sequence / Innovations, ecology and resource-saving technologies (InERT-2014) [Electronic resource]: proceedings of the XI international scientific and technical forum / DSTU; ed. by A. D. Lukyanov. Rostov-on-Don: DSTU, 2014. P. 1326–1332.
