SIGMAP 2016 Abstracts


Area 1 - Multimedia and Communications

Full Papers
Paper Nr: 16
Title:

Motion Error Compensation for Quad-rotor Miniature Unmanned Aerial Vehicle SAR Imaging

Authors:

Song Zhou, Lei Yang, Gang Xu and Guoan Bi

Abstract: Quad-rotor miniature unmanned aerial vehicle (QMUAV) synthetic aperture radar (SAR) is an ultra-small airborne SAR system. Because of the low flying altitude and small-size constraints, the motion errors of QMUAV-SAR are very complicated, which makes QMUAV-SAR imaging processing difficult. To deal with this problem, an effective motion compensation approach for QMUAV-SAR is proposed. By establishing the relationship between the motion errors and the Doppler parameters of the SAR echoes, the motion errors of the QMUAV platform are extracted from the estimated Doppler rates. After the majority of the motion errors has been properly compensated, phase gradient autofocus (PGA) is employed to estimate and compensate the residual phase errors to further improve the focusing quality of the SAR image. Experimental results are provided and the image quality is evaluated to demonstrate that the proposed method achieves well-focused images with high spatial resolution.

Posters
Paper Nr: 32
Title:

Super Resolution for Smartphones

Authors:

Seiichi Gohshi, Sakae Inoue, Isao Masuda, Takashi Ichinose and Yoshika Tatsumi

Abstract: Smartphones were developed as an advanced communication tool but are now used in a wide range of applications. The display is one of the most important features of a smartphone. Compared with television (TV) and cinema screens, a smartphone display is small; nevertheless, TV and film content is commonly enjoyed on smartphone screens, making them among the most used displays for many kinds of content. In the past it was thought that resolution differences would be difficult to recognize on small displays, but this is no longer the case. Smartphone resolution has been steadily improving, and support for high-definition television (HDTV) resolution (1,920×1,080 pixels) is common. Signal processing is another way to improve resolution. Super resolution (SR) has become an interesting research field and is applied to images and videos. SR is a technology for improving display resolution and is therefore mainly studied for TV screens and computer displays. SR algorithms are complex and impose a heavy load on a smartphone's central or graphics processing unit (CPU/GPU), so applying SR to real-time video on smartphones is very difficult. Consequently, there have been no reports of SR for smartphones. This paper proposes a method for implementing real-time SR on smartphones. With the developed software, the method works on real-time video on a smartphone GPU.

Area 2 - Multimedia Signal Processing

Full Papers
Paper Nr: 7
Title:

Multiple Classifier Learning of New Facial Extraction Approach for Facial Expressions Recognition using Depth Sensor

Authors:

Nattawat Chanthaphan, Keiichi Uchimura, Takami Satonaka and Tsuyoshi Makioka

Abstract: In this paper, we present the next-step experiment of our novel feature extraction approach for facial expression recognition. In our previous work, we proposed extracting facial features from the 3D facial wire-frame generated by a depth camera (Kinect v2). We introduced facial movement streams, derived from the distance measurements between each pair of nodes on the facial wire-frame as they flow through each frame of the movement. That experiment used two classifiers, K-Nearest Neighbors (K-NN) and Support Vector Machine (SVM), with a fixed value of the parameter k and a fixed kernel, and a 15-person data set collected by our software was used to evaluate the system. It showed promising accuracy and performance for our approach. Consequently, we wanted to find the parameter settings that yield the best performance. In this experiment, we tune the parameter k of K-NN as well as the kernel of SVM, measuring both accuracy and execution time. K-NN outperforms all other classifiers with 90.33% accuracy, whereas SVM consumes much more time and reaches only 67% accuracy.
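As a toy illustration (not code from the paper), the pairwise-distance features and the K-NN vote described in this abstract can be sketched as follows; the node layout, class labels, and the value of k are hypothetical stand-ins for the Kinect wire-frame data:

```python
import numpy as np
from itertools import combinations

def pairwise_distance_features(nodes):
    """Flatten all pairwise Euclidean distances between 3D nodes
    into one feature vector (one frame of the movement stream)."""
    return np.array([np.linalg.norm(nodes[i] - nodes[j])
                     for i, j in combinations(range(len(nodes)), 2)])

def knn_predict(train_X, train_y, x, k=3):
    """Minimal K-NN: majority vote among the k nearest training vectors."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical 4-node "wire-frames": one expression is a scaled version
# of the other, so the pairwise distances separate the two classes.
rng = np.random.default_rng(0)
base = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
neutral = [base + rng.normal(0, 0.01, base.shape) for _ in range(5)]
smile = [1.5 * base + rng.normal(0, 0.01, base.shape) for _ in range(5)]
X = np.array([pairwise_distance_features(f) for f in neutral + smile])
y = ['neutral'] * 5 + ['smile'] * 5
query = pairwise_distance_features(1.5 * base + rng.normal(0, 0.01, base.shape))
label = knn_predict(X, y, query, k=3)
```

In the paper the nodes are the Kinect wire-frame landmarks and the streams span many frames; this sketch only shows one frame per sample.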

Paper Nr: 19
Title:

Study on the Use of Deep Neural Networks for Speech Activity Detection in Broadcast Recordings

Authors:

Lukas Mateju, Petr Cerva and Jindrich Zdansky

Abstract: This paper deals with the task of Speech Activity Detection (SAD). Our goal is to develop a SAD module suitable for a system for broadcast data transcription. Various Deep Neural Networks (DNNs) are evaluated for this purpose. Training of the DNNs is performed using speech and non-speech data, as well as artificial data created by mixing these two data types at a desired Signal-to-Noise Ratio (SNR). The output from each DNN is smoothed using a decoder based on Weighted Finite State Transducers (WFSTs). The presented experimental results show that the use of the resulting SAD module leads to (a) a slight improvement in transcription accuracy and (b) a significant reduction in the computation time needed for transcription.
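The smoothing step can be illustrated with a deliberately simplified stand-in: the paper uses a WFST-based decoder, but a plain median filter over hypothetical per-frame speech posteriors shows the same idea of suppressing isolated spurious frames:

```python
import numpy as np

def smooth_speech_posteriors(posteriors, win=5):
    """Median-filter per-frame speech posteriors and threshold at 0.5.
    A crude stand-in for the WFST decoder: isolated spurious frames
    are removed, so the speech/non-speech decision becomes smoother."""
    half = win // 2
    padded = np.pad(posteriors, half, mode='edge')
    smoothed = np.array([np.median(padded[i:i + win])
                         for i in range(len(posteriors))])
    return smoothed > 0.5

# A single dropped frame inside speech is recovered; a single spurious
# speech frame inside non-speech is removed.
speech = smooth_speech_posteriors(np.array([0.9, 0.9, 0.9, 0.1, 0.9, 0.9, 0.9]))
silence = smooth_speech_posteriors(np.array([0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1]))
```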

Paper Nr: 23
Title:

Study on the Use and Adaptation of Bottleneck Features for Robust Speech Recognition of Nonlinearly Distorted Speech

Authors:

Jiri Malek, Petr Cerva, Ladislav Seps and Jan Nouza

Abstract: This paper focuses on the robust recognition of nonlinearly distorted speech. We have reported (Seps et al., 2014) that hybrid acoustic models based on a combination of Hidden Markov Models and Deep Neural Networks (HMM-DNNs) are better suited to this task than conventional HMMs utilizing Gaussian Mixture Models (HMM-GMMs). To further improve recognition accuracy, this paper investigates the possibility of combining the modeling power of deep neural networks with the adaptation to given acoustic conditions. For this purpose, the deep neural networks are utilized to produce bottleneck coefficients / features (BNC). The BNCs are subsequently used for training of HMM-GMM based acoustic models and then adapted using Constrained Maximum Likelihood Linear Regression (CMLLR). Our results obtained for three types of nonlinear distortions and three types of input features show that the adapted BNC-based system (a) outperforms HMM-DNN acoustic models in the case of strong compression and (b) yields comparable performance for speech affected by nonlinear amplification in the analog domain.

Short Papers
Paper Nr: 6
Title:

A Spatio-temporal Approach for Video Caption Extraction

Authors:

Liang-Hua Chen, Meng-Chen Hsieh and Chih-Wen Su

Abstract: Captions in videos play an important role in video indexing and retrieval. In this paper, we propose a novel algorithm to extract multilingual captions from video. Our approach is based on the analysis of spatio-temporal slices of the video. If a horizontal (or vertical) scan line contains some pixels of a caption region, the corresponding spatio-temporal slice will exhibit bar-code-like patterns. By integrating the structural information of these bar-code-like patterns in the horizontal and vertical slices, the spatial and temporal positions of video captions can be located accurately. Experimental results show that the proposed algorithm is effective and outperforms some existing techniques.
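The slice construction itself is simple to sketch (the frames below are synthetic grayscale arrays, not real video): fixing one scan line and stacking it over all frames yields a 2D slice in which a static caption shows up as constant vertical stripes, the bar-code-like pattern the abstract refers to.

```python
import numpy as np

def horizontal_slice(frames, row):
    """Stack scan line `row` of every frame: result is (num_frames, width).
    Static caption pixels form columns with zero temporal variance."""
    return np.stack([f[row, :] for f in frames])

# Synthetic video: random background, plus a static bright "caption"
# crossing scan line 5 at columns 10..19 in every frame.
rng = np.random.default_rng(1)
frames = []
for _ in range(8):
    f = rng.random((20, 30))
    f[5, 10:20] = 1.0
    frames.append(f)

sl = horizontal_slice(frames, 5)
temporal_var = sl.var(axis=0)   # near zero inside the caption columns
```

A real detector would then look for such low-variance, high-contrast column runs in both horizontal and vertical slices, as the paper describes.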

Paper Nr: 22
Title:

Contour Learning and Diffusive Processes for Colour Perception

Authors:

Francisco J. Diaz-Pernas, Mario Martínez-Zarzuela, Miriam Anton-Rodriguez and David González-Ortega

Abstract: This work proposes a bio-inspired neural architecture called L-PREEN (Learning and Perceptual boundaRy rEcurrent dEtection neural architecture). L-PREEN has three different feedback interactions that fuse the bottom-up and top-down contour information of visual areas V1-V2-V4-Infero Temporal. This recurrent model uses colour, texture, and diffusive features to drive surface perception as well as contour learning and recognition processes. We compare the L-PREEN model against other boundary detection methods on the Berkeley Segmentation Dataset and Benchmark (Martin et al., 2001). Quantitative measures show that L-PREEN achieves better performance.

Paper Nr: 24
Title:

Simplification of Moving 3D Scene Data on GPU

Authors:

Rajesh Chenchu, Nick Michiels, Sammy Rogmans and Philippe Bekaert

Abstract: Real-time, large-scale visualization of continuous image- and geometry-based data, with uninterrupted content delivery, quality and rendering, is difficult or even impossible on home and mobile devices because of their low processing capabilities. However, a gracefully simplified version of the same data can enable viewing the content without significant quality degradation. To achieve this, we extended a well-known concept called the 'billboard cloud' to animated scene data and implemented the technique on the GPU to generate simplified versions of large-scale data sets.

Paper Nr: 30
Title:

Rapid Classification of Textile Fabrics Arranged in Piles

Authors:

Dirk Siegmund, Olga Kaehm and David Handtke

Abstract: Research on the quality assurance of textiles has attracted much interest, particularly in relation to defect detection and the classification of woven fibers. Known systems require the fabric to be flat and spread out on a 2D surface in order to be classified. Unlike those systems, our system is able to classify textiles presented in piles and in assembly-line-like environments. Technical approaches were selected for speed and accuracy using 2D camera image data. A patch-based solution was chosen, using an entropy-based pre-selection of small image patches. Interest points as well as texture descriptors combined with principal component analysis were part of this evaluation. The results showed that classifying image patches reduced computational cost but lowered accuracy by 3.67%.
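The entropy-based pre-selection mentioned above can be sketched as follows; the patch size, histogram binning, and threshold are illustrative assumptions, not values from the paper. Flat background patches have low grey-level entropy and are discarded before classification:

```python
import numpy as np

def patch_entropy(patch, bins=32):
    """Shannon entropy (bits) of a patch's grey-level histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_patches(image, size=16, threshold=2.0):
    """Cut the image into non-overlapping size x size patches and keep
    only the (row, col) origins of patches whose entropy exceeds the
    threshold, i.e. patches likely to contain textured fabric."""
    kept = []
    h, w = image.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            if patch_entropy(image[y:y + size, x:x + size]) > threshold:
                kept.append((y, x))
    return kept

# Synthetic image: flat left half (entropy 0), textured right half.
rng = np.random.default_rng(2)
img = np.full((32, 32), 0.5)
img[:, 16:] = rng.random((32, 16))
kept = select_patches(img, size=16, threshold=2.0)
```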

Area 3 - Multimedia Systems and Applications

Full Papers
Paper Nr: 13
Title:

Robust Index Code with Digital Images on the Internet

Authors:

Minsu Kim, Kunwoo Lee, Katsuhiko Gondow and Jun-ichi Imura

Abstract: A new color code with high robustness, called Robust Index Code (RIC for short), is proposed. RIC can be embedded in digital images to link them with a database. Several technologies embed data into images, such as QR Code and digital watermarking. QR Code cannot be used on digital images because it is not robust in that setting, and while digital watermarking can be used on digital images, the embedded data cannot be extracted with 100% reliability from damaged images. In our evaluation using the implemented RIC encoder and decoder, the encoded indexes were extracted with 100% reliability from images compressed to 30%. We also implemented a doubt color correction algorithm for damaged images. In conclusion, RIC is highly robust on digital images. It can therefore store any type of digital product by embedding indexes into digital images that provide access to a database, which realizes a superdistribution system with digital images. RIC thus has the potential to enable new Internet image services, since every image encoded with RIC can provide access to the original products anywhere.

Paper Nr: 20
Title:

Synthetic Workload Generation of Broadcast Related HEVC Stream Decoding for Resource Constrained Systems

Authors:

Hashan Roshantha Mendis and Leandro Soares Indrusiak

Abstract: Performance evaluation of platform resource management protocols requires realistic workload models as input to obtain reliable, accurate results. This is particularly important for workloads with large variations, such as video streams generated by advanced encoders using complex coding tools. In the modern High Efficiency Video Coding (HEVC) standard, a frame is logically subdivided into rectangular coding units. This work presents synthetic HEVC decoding workload generation algorithms at the frame and coding-unit levels, where a group of pictures is treated as a directed acyclic graph based task set. Video streams are encoded using a minimum number of reference frames, compatible with low-memory decoders. Characteristic data from several HEVC video streams is extracted to analyse inter-frame dependency patterns, reference data volume, frame and coding-unit decoding times, and other coding-unit properties. Histograms are used to analyse their statistical characteristics and to fit them to known theoretical probability density functions. The statistical properties of the analysed video streams are integrated into two novel algorithms that can be used to synthetically generate HEVC decoding workloads with realistic dependency patterns and frame-level properties.

Paper Nr: 27
Title:

Recursive Total Variation Filtering Based 3D Fusion

Authors:

M. A. A. Rajput, E. Funk, A. Börner and O. Hellwich

Abstract: 3D reconstruction from mobile image sensors is crucial for many offline-inspection and online robotic application. While several techniques are known today to deliver high accuracy 3D models from images via offline-processing, 3D reconstruction in real-time remains a major goal still to achieve. This work focuses on incremental 3D modeling from error prone depth image data, since standard 3D fusion techniques are tailored on accurate depth data from active sensors such as the Kinect. Imprecise depth data is usually provided by stereo camera sensors or simultaneous localization and mapping (SLAM) techniques. This work proposes an incremental extension of the total variation (TV) filtering technique, which is shown to reduce the errors of the reconstructed 3D model by up to 77% compared to state of the art techniques.