SIGMAP 2017 Abstracts


Area 1 - Multimedia and Communications

Short Papers
Paper Nr: 15
Title:

Performance of Blind Deconvolution and Super Resolution Image Reconstruction

Authors:

Seiichi Gohshi and Michikazu Akasu

Abstract: Super Resolution (SR) is a technique for improving the resolution of digital images. Super Resolution Image Reconstruction (SRR) is one of the most common SR techniques. However, in addition to SRR, there are several other techniques for improving image resolution. A technique called Blind Deconvolution (BD) has been used to process out-of-focus images in the field of astronomy. When BD was first described, in the 1970s, it was not considered a viable candidate for SR. However, the process of improving resolution is very similar to that of focusing images: SRR and BD both use iterations to create a high-quality image from low-resolution images. Compared with SRR, BD comes with some disadvantages. For example, its algorithms sometimes cause divergence or limit cycles, which means that the high-resolution image cannot be obtained. In this study, we describe a method of fixing the issues that prevent BD from achieving a high-resolution image, using simulation to increase its stability. The output of the improved BD algorithm is compared with the current SR technique, SRR. We show that the BD technique is in fact superior to SRR.
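The abstract does not spell out the stabilized algorithm; as background, the iterative sharpening idea that both SRR and BD build on can be sketched with a classical non-blind Van Cittert iteration in one dimension (in true BD the blur kernel would also be estimated jointly, and the relaxation factor `beta` below is a hypothetical choice):

```python
# Illustrative sketch only: non-blind Van Cittert deconvolution in 1D.
# The paper's stabilized BD method is not given in the abstract.

def convolve_same(signal, kernel):
    """'Same'-size convolution of a list with a symmetric kernel."""
    n, m = len(signal), len(kernel)
    half = m // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(m):
            k = i + j - half
            if 0 <= k < n:
                acc += signal[k] * kernel[j]
        out.append(acc)
    return out

def van_cittert(blurred, kernel, iterations=20, beta=0.5):
    """Iterative sharpening: estimate <- estimate + beta*(blurred - K*estimate).
    Diverges for poorly chosen beta -- the instability the paper addresses."""
    estimate = list(blurred)
    for _ in range(iterations):
        reblurred = convolve_same(estimate, kernel)
        estimate = [e + beta * (b - r)
                    for e, b, r in zip(estimate, blurred, reblurred)]
    return estimate
```

Each iteration re-blurs the current estimate and feeds the residual back, so the estimate converges toward the deconvolved signal when the update is stable.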

Paper Nr: 19
Title:

An Improved Secure Image Transmission Technique via Mosaic Images by Nearly Reversible Color Transformations

Authors:

Freddy Acosta Buenaño, Gonzalo Olmedo Cifuentes, Inmaculada Mora-Jiménez and José Luis Rojo Álvarez

Abstract: Current multimedia communication systems seek to transmit images supporting their contents through the data stream, which requires specific effort to optimize their bandwidth. In this scenario, the security of the data and information being transmitted is highly relevant. A concept recently introduced in the field of secure image transmission is the secret-fragment-visible mosaic image: a mosaic image is created and used as camouflage for a secret image, and the mosaic image resembles a freely chosen target image. We propose here a technique based on a statistical simplification approach to reduce the number of bits required to recover the secret image from the mosaic image. We benchmarked the performance of the method and the quality of the recovered image. Experimental results showed that the number of bits with the proposed method was dramatically reduced (close to 3 to 1), with quality metrics for the created mosaic image of about 32 in Root Mean Squared Error and 0.7 in Mean Structural Similarity. Our proposal paves the way towards improved procedures suitable for multimedia systems such as digital terrestrial television and similar applications.
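The bit-reduction scheme itself is not described in the abstract; the "nearly reversible color transformation" underlying this family of mosaic methods is the standard mean/standard-deviation shift of a tile's pixel values toward a target block, which can be sketched per channel (function names here are illustrative, not the paper's):

```python
# Sketch of the per-channel color transformation typically used for
# secret-fragment-visible mosaic images. "Nearly reversible" because the
# original can be recovered from a few stored statistics per tile.

def mean_std(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, var ** 0.5

def color_transform(tile, target):
    """Map tile pixel values so their mean/std match the target block's."""
    mu_t, sd_t = mean_std(tile)
    mu_T, sd_T = mean_std(target)
    scale = sd_T / sd_t if sd_t else 1.0
    return [(v - mu_t) * scale + mu_T for v in tile]

def inverse_transform(transformed, mu_t, sd_t, mu_T, sd_T):
    """Recover the original tile values from the stored parameters."""
    scale = sd_T / sd_t if sd_t else 1.0
    return [(v - mu_T) / scale + mu_t for v in transformed]
```

Only the four statistics per tile (and channel) need to be embedded, which is exactly the overhead the paper's statistical simplification aims to shrink further.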

Area 2 - New Research and Developments in Multimedia

Posters
Paper Nr: 16
Title:

Should We Place the License Plate Tracking System in the Cloud?

Authors:

Razib Iqbal, Matthew Kenney and Jamil Saquer

Abstract: We developed a software system to extract and track vehicle license plate numbers from real-time surveillance cameras and crowd-sourced video streams. The system can also calculate the probable routes of a vehicle over a range of dates based on the geographical coordinates. In this paper, we present both our linear and parallel processing implementation schemes and analyze the performance based on evaluation results. Our results show that while cloud-based parallel processing can address scalability needs, the performance gain outweighs the cost only when the volume of real-time streaming data becomes sufficiently large.

Area 3 - Multimedia Signal Processing

Full Papers
Paper Nr: 8
Title:

Plane Equation Features in Depth Sensor Tracking

Authors:

Mika Taskinen, Tero Säntti and Teijo Lehtonen

Abstract: The emergence of depth sensors has made it possible to track not only monocular cues but also the actual depth values of the environment. This is especially useful in augmented reality solutions, where the position and orientation (pose) of the observer need to be accurately determined. Depth sensors have usually been used in augmented reality as mesh builders and, in some cases, as feature extractors for tracking. These methods are usually extensive and designed to operate by themselves or in cooperation with other methods. We propose a systematic lightweight algorithm to supplement other mechanisms, and we test it against a random algorithm and ground truth.
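The abstract does not define the features precisely, but the basic building block the title names — a plane equation derived from depth points — is standard: three non-collinear 3D points determine a plane ax + by + cz + d = 0 via a cross product. A minimal sketch (function name is illustrative):

```python
# Sketch: derive the plane equation ax + by + cz + d = 0 from three
# non-collinear 3D points, e.g. sampled from a depth sensor's point cloud.

def plane_from_points(p1, p2, p3):
    """Return [a, b, c, d] for the plane through p1, p2, p3."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # Normal vector n = u x v (cross product).
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    # d chosen so the plane passes through p1.
    d = -sum(n[i] * p1[i] for i in range(3))
    return n + [d]
```

In practice a robust fit over many depth samples (e.g. least squares or RANSAC) would replace the three-point construction; the paper's exact feature extraction is not given in the abstract.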

Paper Nr: 11
Title:

WYA2: Optimal Individual Features Extraction for VideoSurveillance Re-identification

Authors:

Isaac Martín De Diego, Ignacio San Román, Cristina Conde and Enrique Cabello

Abstract: A novel method for re-identification based on optimal feature extraction in video surveillance environments is presented in this paper. A high number of features are extracted for each detected person in a dataset obtained from a camera in a scenario. An evaluation of the relative discriminative power of each bag of features for each person is performed. We propose a forward method in a Support Vector framework to obtain the optimal individual bags of features. These bags of features are used in a new scenario in order to detect suspicious persons using the images from a non-overlapping camera. The results obtained demonstrate the promising potential of the presented approach.
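The abstract's "forward method" suggests greedy forward selection over bags of features. A generic sketch of that loop, with the scoring function left abstract (in the paper's setting it would be something like cross-validated SVM accuracy; the names here are illustrative, not the authors'):

```python
# Sketch of greedy forward selection over bags of features.
# `score` is any callable mapping a list of selected bags to a number.

def forward_selection(bags, score, max_bags=None):
    """Repeatedly add the bag that most improves `score(selected)`;
    stop when no remaining bag improves the score."""
    selected, remaining = [], list(bags)
    best = score(selected)
    while remaining and (max_bags is None or len(selected) < max_bags):
        gains = [(score(selected + [b]), b) for b in remaining]
        top, bag = max(gains, key=lambda g: g[0])
        if top <= best:
            break
        selected.append(bag)
        remaining.remove(bag)
        best = top
    return selected
```

Because the stopping rule compares against the current best score, the loop naturally yields an individual (per-person) subset of bags, which matches the abstract's goal of optimal individual bags of features.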

Paper Nr: 12
Title:

A Method for Traffic Sign Recognition with CNN using GPU

Authors:

Alexander Shustanov and Pavel Yakimov

Abstract: In recent years, deep learning methods for solving classification problems have become extremely popular. Due to their high recognition rate and fast execution, convolutional neural networks have enhanced most computer vision tasks, both existing and new ones. In this article, we propose an implementation of a traffic sign recognition algorithm using a convolutional neural network. Training of the neural network is implemented using the TensorFlow library and CUDA, a massively parallel architecture for multithreaded programming. The entire procedure for traffic sign detection and recognition is executed in real time on a mobile GPU. The experimental results confirmed the high efficiency of the developed computer vision system.
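The paper's network is trained with TensorFlow on CUDA; as a dependency-free illustration of what a convolutional layer computes, here is the core "valid" 2D convolution followed by a ReLU activation in plain Python (the paper's actual architecture is not given in the abstract):

```python
# Illustrative sketch of a CNN layer's core operations, without any
# deep learning framework: valid-mode 2D cross-correlation plus ReLU.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) and sum products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out

def relu(feature_map):
    """Elementwise rectifier, the usual CNN nonlinearity."""
    return [[max(0.0, v) for v in row] for row in feature_map]
```

A framework like TensorFlow performs exactly this computation, but batched over many images and kernels and dispatched to the GPU, which is what makes the real-time execution in the paper feasible.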

Posters
Paper Nr: 3
Title:

Frame Selection for Text-independent Speaker Recognition

Authors:

Abedenebi Rouigueb, Malek Nadil and Abderrahmane Tikourt

Abstract: In this paper, we propose a set of criteria for selecting the most relevant frames in order to improve the text-independent speaker automatic recognition (TISAR) task. The selection is carried out on short-term cepstral feature vectors such as PLP and MFCC and is performed at the front-end processing level. The proposed criteria mainly attempt to select vectors lying far from the universal background model (UBM). Experiments are conducted on the MOBIO database and show that the selection reduces complexity (in time and space) and improves the speaker identification rate, which is appropriate for real-time TISAR systems.
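A much-simplified sketch of the selection idea: keep the feature frames that lie farthest from a background model. Here the UBM is reduced to a single mean vector and plain Euclidean distance; the paper uses a full GMM-UBM and several criteria, and the names and `keep_ratio` parameter below are illustrative:

```python
# Simplified sketch: rank cepstral frames by distance from a background
# mean and keep only the most speaker-discriminative fraction.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_frames(frames, ubm_mean, keep_ratio=0.5):
    """Keep the `keep_ratio` fraction of frames farthest from `ubm_mean`."""
    ranked = sorted(frames, key=lambda f: euclidean(f, ubm_mean), reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]
```

Discarding near-background frames shrinks the data the recognizer must score, which is where the complexity gains the abstract reports come from.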

Area 4 - Multimedia Systems and Applications

Full Papers
Paper Nr: 10
Title:

A Computer-based Framework to Process Audiometric Signals using the Tomatis Listening Test System

Authors:

Félix Buendía-García, Manuel Agustí-Melchor, Cristina Pérez-Guillot, Hernán Cerna and Alvaro Capitán

Abstract: Some kinds of audio information are usually represented by images that need to be processed. This is the case for audiometric signals, which are obtained from devices that hardly produce any quantifiable data. The current paper describes a computer-based framework able to process audiometer images in order to extract information that can be useful for analysing a subject's hearing levels. This information is complemented with additional data sources that allow a more comprehensive view of hearing issues, whether disorder symptoms or treatment results. These data sources are provided by the TLTS (Tomatis Listening Test System) device. The proposed framework is based on the use of OpenCV libraries, which provide image processing functionality, together with scripts to manage audiometry spreadsheets. An experiment was conducted to test auditory stimulations in the context of a collaboration project with the Isora Solutions company, where the proposed system was applied. The results obtained show the framework's accuracy and adequacy in retrieving and processing information from several audiometric data sources.

Short Papers
Paper Nr: 5
Title:

Automatic Lesser Kestrel’s Gender Identification using Video Processing

Authors:

Javier M. Mora-Merchan, Enrique Personal, Diego Francisco Larios, Francisco Javier Molina, Juan Carlos Tejero and Carlos Leon

Abstract: Traditionally, animal surveillance is a common task for biologists. However, this task often requires the inspection of huge amounts of video. In this sense, this paper proposes an automatic video processing algorithm to identify the gender of a kestrel species. It is based on optical flow and texture analysis. This algorithm makes it possible to identify the important information, thereby minimizing the analysis time for biologists. Finally, to validate this algorithm, it has been tested against a set of videos, achieving good classification results.
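The paper's pipeline uses optical flow and texture analysis, neither of which is detailed in the abstract. As a simpler stand-in for the same goal — flagging only the video segments worth a biologist's attention — plain frame differencing already illustrates the idea (the threshold and function names below are illustrative, not the paper's):

```python
# Simplified stand-in for motion-based frame filtering: flag frames whose
# mean absolute change from the previous frame exceeds a threshold.

def frame_energy_diff(prev, curr):
    """Mean absolute difference between two grayscale frames (lists of rows)."""
    total, count = 0.0, 0
    for r1, r2 in zip(prev, curr):
        for a, b in zip(r1, r2):
            total += abs(a - b)
            count += 1
    return total / count

def motion_frames(frames, threshold=5.0):
    """Indices of frames showing significant change, i.e. candidate events."""
    return [i for i in range(1, len(frames))
            if frame_energy_diff(frames[i - 1], frames[i]) > threshold]
```

Optical flow refines this by estimating per-pixel motion vectors rather than raw intensity change, and the texture features then support the actual gender classification.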

Paper Nr: 14
Title:

Cochlea-based Features for Music Emotion Classification

Authors:

Luka Kraljević, Mladen Russo, Mia Mlikota and Matko Šarić

Abstract: Listening to music often evokes strong emotions. With the rapid growth of easily accessible digital music libraries, there is an increasing need for reliable music emotion recognition systems. Common musical features like tempo, mode, pitch, and clarity, which can be easily calculated from the audio signal, are associated with particular emotions and are often used in emotion detection systems. Based on the idea that humans do not detect emotions from the pure audio signal but from a signal that has previously been processed by the cochlea, in this work we propose new cochlea-based features for music emotion recognition. The features are calculated from the output of a gammatone filterbank model, and emotion classification is then performed using Support Vector Machine (SVM) and TreeBagger classifiers. The proposed features are evaluated on the publicly available 1000 Songs database and compared to other commonly used features. Results show that our approach is effective and outperforms other commonly used features. With the combined feature set, we achieved accuracies of 83.88% and 75.12% for arousal and valence, respectively.
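The abstract does not give the filterbank parameters; the standard gammatone filter it builds on has the impulse response g(t) = t^(n-1) · e^(-2πbt) · cos(2πf·t), with the bandwidth b conventionally tied to the ERB scale (Glasberg and Moore). A sketch with conventional, not paper-specific, parameter values:

```python
import math

def gammatone_ir(fc, fs=16000, n=4, duration=0.025):
    """Impulse response of an n-th order gammatone filter centred at fc Hz:
        g(t) = t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)
    with bandwidth b tied to the ERB of fc (conventional constants)."""
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # equivalent rectangular bandwidth
    b = 1.019 * erb                            # standard bandwidth scaling
    samples = int(duration * fs)
    return [(t / fs) ** (n - 1)
            * math.exp(-2 * math.pi * b * (t / fs))
            * math.cos(2 * math.pi * fc * (t / fs))
            for t in range(samples)]
```

A filterbank stacks such filters at ERB-spaced centre frequencies; the cochlea-based features in the paper are then computed from the per-channel filter outputs.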

Posters
Paper Nr: 17
Title:

Towards Framework for Choosing 360-degree Video SDK

Authors:

Antti Luoto

Abstract: 360-degree videos are gaining popularity among consumers. Software developers, however, are early adopters of the technology, so it is important to map their needs for 360-degree video development. They use software development kits (SDKs) that help create software in the 360-degree video domain. We want to find out which factors developers need to take into account when choosing these SDKs. In this position paper, we describe preliminary criteria for choosing a 360-degree video SDK, based on the literature and our own experience, which we plan to evaluate with a survey.