A parallel spatiotemporal saliency and discriminative online learning method for visual target tracking in aerial videos

Citation metadata

From: PLoS ONE (Vol. 13, Issue 2)
Publisher: Public Library of Science
Document Type: Report
Length: 6,024 words
Lexile Measure: 1380L

Article Preview:

Author(s): Amirhossein Aghamohammadi 1,*, Mei Choo Ang 1, Elankovan A. Sundararajan 2, Ng Kok Weng 3, Marzieh Mogharrebi 1, Seyed Yashar Banihashem 4


Visual tracking is an active research topic in computer vision. It has been used in many applications, such as activity recognition, surveillance, robotics, and human-computer interaction [1]. It has also been applied to aerial video processing tasks, such as tracking and object recognition, and is essential for intelligent remote sensing technologies such as unmanned aerial vehicles (UAVs). In contrast to fixed cameras, aerial video platforms are more portable and can conduct reconnaissance and surveillance [2]. However, visual tracking algorithms and systems often fail on aerial videos. The sources of this failure include appearance variations in the target image caused by relative camera and target motion, inadequate spatial resolution or noise, scale changes, and pose variations [3-5]. Explicit modelling of target appearance is one approach to handling the variation of the target's appearance during tracking. Appearance modelling subsystems are usually composed of modules that provide a visual representation and a means of updating the model [6]. The visual representation strongly influences the performance of appearance modelling, because the target's appearance changes across images. A suitable representation can use visual properties, such as color, texture, intensity gradients, and saliency, to represent the targets and other objects in the scene. The represented targets can then be incrementally updated, based on the updated model, to generate a sample model of the target [7]. An efficient visual representation is therefore crucial for describing the target in the scene and generating a sample model [4,8].
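As a minimal illustration of the incremental update step described above, the sketch below blends a running appearance template with each new target observation. The function name and the learning rate are assumptions for illustration only, not the update scheme used in the paper.

```python
import numpy as np

def update_template(template: np.ndarray, observation: np.ndarray,
                    learning_rate: float = 0.05) -> np.ndarray:
    """Incrementally update an appearance template toward a new observation.

    Both arrays are image patches of the same shape (e.g. grayscale target
    crops). A small learning rate keeps the model stable under noise; a
    larger one adapts faster to genuine appearance changes.
    """
    return (1.0 - learning_rate) * template + learning_rate * observation
```

For example, with a learning rate of 0.1, ten consecutive identical observations move the template roughly 65% of the way toward the new appearance, so the model adapts gradually rather than being overwritten by a single noisy frame.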

Recently, biologically inspired features have produced promising results in computer vision systems. With these developments, visual saliency detection has attracted the attention of researchers as a means of extracting Attentional Regions (ARs) in images [9]. Visual saliency detection is inspired by biological human mechanisms, specifically eye movements and visual fixation, which indicate that human perception is more sensitive to salient regions [10,11]; these conspicuous regions of the image are referred to as salient regions. Based on visual saliency detection and AR extraction, various studies have been carried out to detect moving objects in videos. Saliency-based methods for moving object detection can be categorized into temporal, spatial, and integrated (spatiotemporal) methods. Temporal saliency is generally used to extract motion cues in videos. However, temporal saliency detection alone cannot reliably detect moving regions because it lacks spatial information, leading to missing detail in the representation of the target's appearance [2]. Spatial saliency detection, in contrast, is mostly used to process static images [2]. Temporal and spatial saliencies can therefore be integrated into spatiotemporal saliency, which is capable of effectively detecting moving regions.
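The integration just described can be sketched as a weighted fusion of a motion cue and an appearance cue. In the sketch below, frame differencing stands in for temporal saliency and global intensity contrast stands in for spatial saliency; both cues and the fusion weight `alpha` are simplifying assumptions for illustration, not the saliency models used in the paper.

```python
import numpy as np

def temporal_saliency(prev_frame: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Motion cue: normalized absolute frame difference."""
    diff = np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64))
    return diff / (diff.max() + 1e-12)

def spatial_saliency(frame: np.ndarray) -> np.ndarray:
    """Appearance cue: per-pixel contrast to the global mean intensity."""
    f = frame.astype(np.float64)
    contrast = np.abs(f - f.mean())
    return contrast / (contrast.max() + 1e-12)

def spatiotemporal_saliency(prev_frame: np.ndarray, frame: np.ndarray,
                            alpha: float = 0.5) -> np.ndarray:
    """Fuse the two cues with a weighted average; alpha weights motion."""
    return (alpha * temporal_saliency(prev_frame, frame)
            + (1.0 - alpha) * spatial_saliency(frame))
```

A moving bright target scores high on both cues, so the fused map peaks at its new location, while static background scores low on the motion cue and is suppressed.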

Spatial saliency detection is the main task in spatiotemporal saliency, as it deals with the target's visual representation. Numerous spatial saliency detection methods have been proposed in the literature, based on multi-scale image features [11], graph-based visual saliency (GBVS) [12], the quaternion discrete cosine transform (QDCT) [13], the Fourier Transform (FT)...
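Among FT-based approaches, the spectral residual method is a representative example: it subtracts a locally averaged log-amplitude spectrum from the image's log-amplitude spectrum, keeps the original phase, and inverts the transform so that spectrally "unexpected" regions light up. The sketch below is a minimal NumPy version under assumed parameters (a 3x3 averaging kernel and a small spectral regularizer); it is not the spatial saliency model of the paper.

```python
import numpy as np

def box_filter(a: np.ndarray, k: int = 3) -> np.ndarray:
    """Average each pixel over a k-by-k neighborhood (edge-padded)."""
    p = k // 2
    padded = np.pad(a, p, mode="edge")
    out = np.zeros_like(a)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

def spectral_residual_saliency(image: np.ndarray) -> np.ndarray:
    """FT-based spatial saliency via the spectral residual."""
    f = np.fft.fft2(image.astype(np.float64))
    log_amp = np.log(np.abs(f) + 1e-9)        # regularized log amplitude
    phase = np.angle(f)
    residual = log_amp - box_filter(log_amp)  # deviation from smooth spectrum
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / (sal.max() + 1e-12)
```

On an image consisting of low-level clutter plus one bright patch, the map assigns higher saliency to the patch region than to the background, which is the behavior the AR-extraction step relies on.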

Source Citation


Gale Document Number: GALE|A527419875