SIGMAP 2013 Abstracts


Area 1 - Multimedia and Communications

Full Papers
Paper Nr: 27
Title:

Optimization of Free Viewpoint Interpolation by Applying Adaptive Depth Plane Distributions in Plane Sweeping - A Histogram-based Approach to a Non-uniform Plane Distribution

Authors:

Patrik Goorts, Steven Maesen, Maarten Dumont, Sammy Rogmans and Philippe Bekaert

Abstract: In this paper, we present a system to increase the performance of plane sweeping for free viewpoint interpolation. Typical plane sweeping approaches incorporate a uniform depth plane distribution to investigate different depth hypotheses to generate a depth map, used in novel camera viewpoint generation. When the scene contains only a sparse set of objects, some depth hypotheses do not contain objects and can cause noise and wasted computational power. Therefore, we propose a method to adapt the plane distribution to increase the quality of the depth map around objects and to reduce computational waste by reducing the number of planes in empty spaces in the scene. First, we generate the cumulative histogram of the previous frame in a temporal sequence of images. Next, we determine a new normalized depth for every depth plane by analyzing the cumulative histogram. Steep sections of the cumulative histogram result in a dense local distribution of planes; flat sections result in a sparse distribution. The results, obtained on controlled and on real images, demonstrate the effectiveness of the method over a uniform distribution and show that it allows a lower number of depth planes, and thus faster processing, for the same quality.
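
The plane placement described above amounts to inverse-CDF sampling of the previous frame's depth histogram: uniform samples on the cumulative axis land densely where the histogram is steep. A minimal sketch, assuming the paper's exact normalization and bin handling may differ:

```python
# Sketch of histogram-driven depth plane placement (illustrative only; the
# paper's precise normalization of the cumulative histogram is an assumption).

def adaptive_plane_depths(depth_histogram, num_planes):
    """Place depth planes densely where the cumulative histogram is steep.

    depth_histogram: pixel counts per depth bin (from the previous frame).
    Returns normalized depths in [0, 1], one per plane.
    """
    total = sum(depth_histogram)
    nbins = len(depth_histogram)
    # Cumulative histogram, normalized to [0, 1].
    cdf, acc = [], 0
    for count in depth_histogram:
        acc += count
        cdf.append(acc / total)
    depths = []
    for k in range(num_planes):
        target = (k + 0.5) / num_planes  # uniform samples on the CDF axis
        # First bin whose cumulative value reaches the target:
        bin_idx = next(i for i, c in enumerate(cdf) if c >= target)
        depths.append(bin_idx / (nbins - 1))
    return depths

# A scene whose pixels concentrate around two adjacent depth bins gets all
# planes clustered there, instead of wasting planes on empty depth ranges.
hist = [0, 0, 8, 50, 40, 2, 0, 0]
planes = adaptive_plane_depths(hist, 4)
```

A uniform distribution over the same budget of four planes would place half of them in depth ranges that contain no objects at all.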

Posters
Paper Nr: 46
Title:

SAO Filtering inside CTU Loop for High Efficiency Video Coding

Authors:

Adireddy Ramakrishna, N. S. Prashanth and G. B. Praveen

Abstract: In the HEVC standardization process, a new video coding tool called sample adaptive offset (SAO) was added to the in-loop filter module. SAO is placed after de-blocking in the video coding loop. The HM implementation (HM10.0, 2013) and the standard (Bross et al., 2013) describe in-loop filtering, i.e., both de-blocking and SAO, on a picture basis. Although the standard specifies picture-basis de-blocking, it adds a note indicating the possibility of executing de-blocking at CTU/CU level. There is, however, no such mention of executing SAO at CTU/CU level: the standard describes applying the SAO filter to the entire picture after reconstruction and de-blocking. Yet for many applications, low latency, memory-bandwidth efficiency and cache performance make it necessary to implement the SAO filter at CTU level. Likewise, if a hardware accelerator (HWA)/ASIC is to be developed for HEVC, all modules are expected to execute at CTU/CU level for better pipeline performance. This paper presents and discusses the possibility of bringing SAO to CTU level after de-blocking.

Paper Nr: 47
Title:

Design Approaches for Mode Decision in HEVC Encoder - Exploiting Parallelism at CTB Level

Authors:

Ramakrishna Adireddy and Suyash Ugare

Abstract: As the CPU technology trend moves strongly towards multi-core architectures, HEVC has tried to embrace parallel processing to the extent possible. Hence, HEVC offers parallel processing capabilities such as tiles, slices and WPP at frame level (Sullivan et al., 2012). Although slices, tiles and WPP can be used to achieve parallelism, they may end up degrading either visual quality or compression efficiency. To address this problem, this paper summarizes and exploits the parallel processing capabilities of HEVC at Coding Tree Block (CTB) level with insignificant compromise in video quality and compression.

Area 2 - Multimedia Signal Processing

Full Papers
Paper Nr: 30
Title:

Improving the Performance of Speaker Verification Systems under Noisy Conditions using Low Level Features and Score Level Fusion

Authors:

Nassim Asbai, Messaoud Bengherabi, Farid Harizi and Abderrahmane Amrouche

Abstract: This paper provides an overview of low-level features for speaker recognition, with an emphasis on the recently proposed MFCC variant based on asymmetric tapers (MFCC-asymmetric from now on), which has shown high noise robustness in the context of speaker verification. Using the TIMIT corpus, the performance of the MFCC-asymmetric features is compared with the standard Mel-Frequency Cepstral Coefficients (MFCC) and the Linear Frequency Cepstral Coefficients (LFCC) under clean and noisy environments. To simulate real-world conditions, the verification phase was tested with two noises (babble and factory) from the NOISEX-92 database at different Signal-to-Noise Ratios (SNR). The experimental results showed that MFCC-asymmetric tapers (k=4) outperform the other features in noisy conditions. Finally, we investigated the impact of consolidating evidence from different features by score-level fusion. Preliminary results show a promising improvement in verification rate with score fusion.
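
Score-level fusion of the kind investigated in the last step can be sketched as a weighted sum over normalized per-system scores. The min-max normalization and the weights below are illustrative assumptions, not the paper's settings:

```python
# Minimal score-level fusion sketch (normalization scheme and weights are
# placeholders; the paper's fusion rule is not specified in the abstract).

def minmax_normalize(scores):
    """Map a system's scores to [0, 1] so systems are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(score_lists, weights):
    """Weighted-sum fusion of per-system scores over the same trials."""
    normed = [minmax_normalize(s) for s in score_lists]
    return [sum(w * col[i] for w, col in zip(weights, normed))
            for i in range(len(score_lists[0]))]

# Two feature streams (e.g. MFCC and LFCC) scoring three verification trials:
mfcc_scores = [2.0, 8.0, 5.0]
lfcc_scores = [0.1, 0.9, 0.4]
fused = fuse([mfcc_scores, lfcc_scores], weights=[0.6, 0.4])
```

The fused score can then be thresholded exactly like a single-system score, which is what makes score-level fusion attractive: the individual verifiers need no modification.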

Paper Nr: 33
Title:

Multi-modal Bike Sensing for Automatic Geo-annotation - Geo-annotation of Road/Terrain Type by Participatory Bike-sensing

Authors:

Steven Verstockt, Viktor Slavkovikj, Pieterjan De Potter, Jürgen Slowack and Rik Van de Walle

Abstract: This paper presents a novel road/terrain classification system based on the analysis of volunteered geographic information gathered by bikers. By ubiquitous collection of multi-sensor bike data, consisting of visual images, accelerometer information and GPS coordinates of the bikers' smartphone, the proposed system is able to distinguish between 6 different road/terrain types. In order to perform this classification task, the system employs a random decision forest (RDF), fed with a set of discriminative image and accelerometer features. For every instance of road (5 seconds), we extract these features and map the RDF result onto the GPS data of the users' smartphone. Finally, based on all the collected instances, we can annotate geographic maps with the road/terrain types and create a visualization of the route. The accuracy of the novel multi-modal bike sensing system for the 6-class road/terrain classification task is 92%. This result outperforms both the visual and accelerometer only classification, showing that the combination of both sensors is a win-win. For the 2-class on-road/off-road classification an accuracy of 97% is achieved, almost six percent above the state-of-the-art in this domain. Since these are the individual scores (measured on a single user/bike segment), the collaborative accuracy is expected to even further improve these results.
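
As an illustration of the accelerometer side of such a feature set (the paper's actual descriptors are not listed in the abstract, so the features below are hypothetical), per-window statistics like these could feed the random decision forest:

```python
# Hypothetical per-window accelerometer features for road/terrain
# classification; the descriptors are illustrative, not the paper's.
import math

def accel_features(window):
    """Summarize one 5-second accelerometer window (vertical-axis samples)."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    # Roughness: mean absolute difference between consecutive samples;
    # off-road segments typically show larger values than smooth asphalt.
    rough = sum(abs(b - a) for a, b in zip(window, window[1:])) / (n - 1)
    return [mean, math.sqrt(var), rough]

smooth = [9.8, 9.81, 9.79, 9.8, 9.82]   # asphalt-like signal (m/s^2)
bumpy = [9.2, 10.6, 8.9, 10.9, 9.0]     # off-road-like signal
f_smooth, f_bumpy = accel_features(smooth), accel_features(bumpy)
```

Each feature vector would be concatenated with the image descriptors of the same 5-second instance before classification, which is the multi-modal combination the abstract credits for the accuracy gain.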

Short Papers
Paper Nr: 25
Title:

Limitations of Super Resolution Image Reconstruction and How to Overcome them for a Single Image

Authors:

Seiichi Gohshi and Isao Echizen

Abstract: Super resolution image reconstruction (SRR) is a typical super resolution (SR) technology that has been researched with varying results. The SRR algorithm was initially proposed for still images. It uses many low-resolution images to reconstruct a high-resolution image. Unfortunately, in practice, we rarely have a sufficient number of low-resolution images for SRR to work. Usually, there is only one (or a few) blurry images. On the other hand, there is a need to improve blurry images in applications ranging from security and photo restoration to zooming functions and countless other examples related to the printing industry. Recently, SRR was extended to video sequences that have many similar frames that can be used as low-resolution images to reconstruct high-resolution frames. In normal SRR, one reconstructs a high-resolution image from low-resolution images sampled from one high-resolution image, but in the video application, the low-resolution video frames are not taken from higher resolution ones. This paper proposes a novel resolution improvement method that works without such a high-resolution image. Its algorithm is simple and can be applied to a single image and real-time video systems.

Paper Nr: 31
Title:

A Fast and Efficient Inter Mode Decision Algorithm for the H.264/AVC Video Coding Standard

Authors:

Skoudarli Abdellah, Nibouche Mokhtar and Serir Amina

Abstract: The H.264/AVC video coding standard is used in a wide range of applications, from video conferencing to high-definition TV. Compared to previous standards, H.264/AVC has significantly better performance in terms of PSNR and visual quality at the same bit rate. It uses a complex mode decision technique based on rate-distortion optimization (RDO), which introduces a high computational complexity; this complexity is one key challenge for highly efficient compression. In order to reduce the H.264/AVC complexity, a new efficient and fast mode decision algorithm, based on the spatial homogeneity and temporal stationarity characteristics of the current macroblock, is proposed in this paper. The experimental results show that the proposed algorithm is able to reduce up to 66.90% of the computational complexity compared to the high-complexity algorithm in the JM16.1 reference software, with tolerable performance degradation.
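
The spirit of such a fast mode decision scheme is to prune the RDO candidate list when a macroblock is spatially homogeneous and temporally stationary. The variance and MAD tests and their thresholds below are placeholders, not the paper's criteria:

```python
# Hedged sketch of early termination in inter mode decision; the actual
# homogeneity/stationarity tests and thresholds of the paper differ.

def is_homogeneous(block, threshold=4.0):
    """Low pixel variance -> spatially homogeneous macroblock."""
    n = len(block)
    mean = sum(block) / n
    return sum((p - mean) ** 2 for p in block) / n < threshold

def is_stationary(block, ref_block, threshold=2.0):
    """Small mean absolute difference to the co-located block -> stationary."""
    mad = sum(abs(a - b) for a, b in zip(block, ref_block)) / len(block)
    return mad < threshold

def candidate_modes(block, ref_block):
    """Run full RDO only when the cheap tests fail."""
    if is_homogeneous(block) and is_stationary(block, ref_block):
        return ["SKIP", "16x16"]  # small candidate subset, cheap RDO
    return ["SKIP", "16x16", "16x8", "8x16", "8x8"]  # exhaustive search

flat = [100] * 16                      # static, flat background block
moving = [100, 140, 90, 160] * 4      # textured, changing block
modes_flat = candidate_modes(flat, flat)
modes_moving = candidate_modes(moving, flat)
```

The complexity saving comes from how often real video consists of flat, static regions: each pruned macroblock skips the RDO cost evaluation of most partition modes.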

Paper Nr: 38
Title:

Self-formable Optical Interconnection for Selective Transmission of Frequent Signals

Authors:

Mitsunori Saito, Kohei Sakiyama and Tatsuya Nishimura

Abstract: A smart polymer was fabricated to construct a photonic interconnection that transmitted frequent signals and blocked occasional signals. A photochromic dye in the polymer was colored by violet light and bleached by green light. Green signal pulses self-formed their optical path by bleaching dye molecules as they propagated in the opaque (colored) polymer. Molecular diffusion through the polymer matrix allowed the bleached molecules to be replaced by colored molecules from the surroundings, thereby erasing the optical path. These self-formation and self-healing functions induced selective transmission of frequent pulse signals. Experiments were conducted with green pulses (532 nm wavelength) of 10 ms duration and 5 mW peak power. When the pulse frequency was 40 pulse/s, the signal began to emerge from the opaque polymer in 20 s. When the average pulse frequency decreased to 2 pulse/s, however, the output signal disappeared gradually.

Posters
Paper Nr: 8
Title:

Simplified Computation of l2-Sensitivity for 1-D and a Class of 2-D State-Space Digital Filters Considering 0 and ±1 Elements

Authors:

Yoichi Hinamoto and Akimitsu Doi

Abstract: A simplified method of computing an improved l_2-sensitivity measure is developed for state-space digital filters by reducing the number of Lyapunov equations, and it is extended to a class of two-dimensional (2-D) state-space digital filters. First, a conventional improved l_2-sensitivity measure for state-space digital filters is reviewed and simplified into two novel forms so that the number of Lyapunov equations is reduced. Next, the resulting method is extended to a class of 2-D state-space digital filters. Finally, two numerical examples are presented to evaluate more precise (improved) l_2-sensitivity measures for 1-D and a class of 2-D state-space digital filters using the proposed methods.

Paper Nr: 16
Title:

Length of Phonemes in a Context of their Positions in Polish Sentences

Authors:

Magdalena Igras, Bartosz Ziółko and Mariusz Ziółko

Abstract: The paper presents statistical phonetic data of Polish collected from a corpus. The lengths of phonemes vary from 5 ms to 670 ms. Average durations of Polish phonemes are presented, as well as an important anomaly of longer phonemes at the end of sentences, which is the main topic of the paper. This observation can be used in speech recognition for the automatic insertion of dots and for sentence modelling. Data from 45 speakers, 5130 sentences in total, were described and compared with values taken from the phonetic literature.

Paper Nr: 23
Title:

A TV Commercial Retrieval System based on Audio Features

Authors:

Jose E. Borras, Jorge Igual, Carlos Fernandez-Llatas and Vicente Traver

Abstract: In spite of new digital platforms, television (TV) continues to be the most influential advertising medium. Advertisers need to verify that their commercials are broadcast on TV in the number and at the times they pay for. Nowadays, this job is done manually, by visual inspection of recordings of the broadcast signal every day, consuming a lot of human resources. We present a system that automates the process of identifying TV commercials. It is based on the detection of target commercials using their audio features. To reduce detection time and storage requirements, it uses audio features in a compact transformed domain. The algorithm is based on the similarities, in the cepstral domain, between the commercial to be detected and the audio recording of the TV signal. The results show that the system is able to obtain a satisfactory detection rate in a short time (detection rate above 90% with no false alarms), allowing the analysis of long recordings in a fast way.
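
The sliding comparison of a commercial fingerprint against a long recording can be sketched in a few lines. Plain Euclidean distance over precomputed per-frame features stands in for the paper's cepstral-domain similarity:

```python
# Fingerprint matching sketch: the feature extraction (cepstral transform)
# is assumed to have already produced one value per frame; the distance
# measure below is a toy stand-in for the paper's similarity computation.

def sliding_distance(recording, fingerprint):
    """Distance of the fingerprint to every aligned window of the recording."""
    m = len(fingerprint)
    return [sum((recording[i + j] - fingerprint[j]) ** 2 for j in range(m))
            for i in range(len(recording) - m + 1)]

def detect(recording, fingerprint, threshold=1.0):
    """Return the best-matching frame offset, or None if no match is good."""
    dists = sliding_distance(recording, fingerprint)
    best = min(range(len(dists)), key=dists.__getitem__)
    return best if dists[best] < threshold else None

ad = [3.0, 1.0, 4.0]                      # per-frame feature of the target spot
stream = [0.0, 0.2, 3.1, 1.0, 4.0, 0.5]   # broadcast recording features
pos = detect(stream, ad)
```

Working in a compact transformed domain keeps both the fingerprint and the recording small, which is what allows long recordings to be scanned quickly.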

Paper Nr: 45
Title:

An Unsupervised Ensemble-based Markov Random Field Approach to Microscope Cell Image Segmentation

Authors:

Bálint Antal, Bence Remenyik and András Hajdu

Abstract: In this paper, we propose an approach to the unsupervised segmentation of images using Markov Random Fields. The proposed approach is based on the idea of Bit Plane Slicing. We use the planes as initial labellings for an ensemble of segmentations. With pixelwise voting, a robust segmentation approach can be achieved, which we demonstrate on microscope cell images. We tested our approach on a publicly available database, where it proved to be competitive with other methods and with manual segmentation.
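
The ensemble construction described above can be sketched as one binary labelling per bit plane, combined by pixelwise voting; the Markov Random Field refinement of each labelling is omitted here:

```python
# Minimal sketch of bit plane slicing plus pixelwise majority voting
# (illustrative only; the paper refines each labelling with an MRF
# before voting, which this toy example skips).

def bit_planes(pixel_values, bits=8):
    """One binary labelling per bit plane, most significant first."""
    return [[(v >> b) & 1 for v in pixel_values]
            for b in range(bits - 1, -1, -1)]

def majority_vote(labellings):
    """Pixelwise vote across the ensemble of binary segmentations."""
    half = len(labellings) / 2
    return [int(sum(lab[i] for lab in labellings) > half)
            for i in range(len(labellings[0]))]

pixels = [250, 251, 12, 3]   # bright cell pixels vs dark background
planes = bit_planes(pixels)
segmentation = majority_vote(planes)
```

The voting step is what gives the ensemble its robustness: a noisy decision in any single plane is outvoted by the others.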

Area 3 - Multimedia Systems and Applications

Short Papers
Paper Nr: 18
Title:

Multi-Point Measurement System and Data Processing for Earthquakes Monitoring

Authors:

Valery Korepanov and Fedir Dudkin

Abstract: Lithospheric ultra low frequency (ULF) magnetic activity has recently been considered a very promising candidate for short-term earthquake (EQ) forecasting. However, the ULF lithospheric magnetic field is very weak and often masked by much stronger ionospheric and magnetospheric signals. The study of pre-EQ magnetic activity before the occurrence of a strong EQ is a very hard problem, which consists of the identification and localization of weak signal sources in EQ-hazardous areas of the Earth’s crust. A new approach is developed to find sources of pre-EQ ULF electromagnetic activity of lithospheric origin. For the separation and localization of such sources, a new polarization ellipse technique has been used to process data acquired from 3-component magnetometers. The polarization ellipse is formed by the magnetic field components at the measurement station. Calculations based on polarization ellipse parameters from two distant points allow the discrimination of seismo-EM signals from natural background ULF signals. The results of experimental verification of this method in the Kanto region (Japan), known as one of the most seismically active, are given; they partially confirm its efficiency and give hope that, with further improvement, it can advance the reliable detection of EQ precursors in other regions of the globe, particularly in Iceland, known for its active seismicity.

Paper Nr: 29
Title:

A Linear Vibrotactile Actuator for Mobile Devices

Authors:

Sang-Youn Kim, Bonggoo Kim and Tae-Heon Yang

Abstract: Although current vibrotactile actuators are widely used for haptic interaction with mobile devices, they still have problems to be solved before they can be accepted in many mobile devices. The most critical problem is that conventional vibrotactile actuators create vibrotactile signals with a limited frequency bandwidth. A vibrotactile actuator with a large frequency bandwidth allows a user to manipulate mobile devices delicately and immersively. This paper presents a new vibrotactile actuator which creates vibrotactile signals with a large frequency bandwidth. In our actuator, the vibrotactile signal is generated by the interaction between solenoids and a permanent magnet. Experiments are conducted to investigate whether the proposed actuator generates enough output force to stimulate human skin across a large frequency bandwidth. The results of the experiments demonstrate that the proposed actuator is suitable for haptic interaction with mobile devices.

Paper Nr: 39
Title:

Automatic Attendance Rating of Movie Content using Bag of Audio Words Representation

Authors:

Avi Bleiweiss

Abstract: The sensory experience of watching a movie links input from both the sight and hearing modalities. Yet traditionally, the motion picture rating system relies largely on the visual content of the film to make its informed decisions for parents. The current rating process is fairly elaborate: it requires a group of parents to attend a full screening, manually prepare and submit their opinions, and vote on the appropriate audience age for viewing. Instead, our work explores the feasibility of classifying the age attendance of a movie automatically, by solely analyzing the movie's auditory data. Our high-performance software records the audio content of the shorter movie trailer and builds a labeled training set of original and artificially distorted clips. We use a bag of audio words to effectively represent the film soundtrack, and demonstrate robust and closely correlated classification accuracy, exploiting boolean discrimination and ranked retrieval methods.
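
The bag-of-audio-words representation can be sketched as codeword assignment plus a normalized histogram. The hand-picked 2-D codebook below stands in for one learned from training clips (e.g. with k-means), and the 2-D "frames" stand in for real audio features:

```python
# Bag-of-audio-words sketch: frame-level features are quantized against a
# codebook and a clip is represented by its codeword histogram. Codebook
# and features here are toy values, not learned from audio.

def nearest(codebook, frame):
    """Index of the codeword closest to this frame (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(codebook[k], frame)))

def bag_of_words(codebook, frames):
    """Normalized codeword histogram representing one audio clip."""
    hist = [0] * len(codebook)
    for f in frames:
        hist[nearest(codebook, f)] += 1
    total = sum(hist)
    return [h / total for h in hist]  # normalize for clip-length invariance

codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, 0.1], [0.9, 1.1], [1.0, 0.9], [0.05, 0.0]]
rep = bag_of_words(codebook, frames)
```

The fixed-length histogram is what makes variable-length trailers comparable, so any standard classifier or ranked retrieval method can operate on it.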

Paper Nr: 40
Title:

Towards Live Subtitling of TV Ice-hockey Commentary

Authors:

Aleš Pražák, Josef V. Psutka, Josef Psutka and Zdeněk Loose

Abstract: This paper deals with live subtitling of TV ice-hockey commentaries using automatic speech recognition technology. Two methods are presented - a direct transcription of a TV program and a re-speaking approach. Practical issues emerging from the real subtitling system are introduced and their solutions are proposed. Acoustic and language modelling is described as well as modifications of existing live subtitling application. Both methods were evaluated during simulated subtitling and their results are discussed.

Paper Nr: 43
Title:

SignalBIT - A Web-based Platform for Real-time Biosignal Visualization and Recording

Authors:

Ana Priscila Alves, Hugo Plácido da Silva, André Lourenço and Ana Fred

Abstract: Biosignals have had an increasingly important role in the research and development of new applications for healthcare, sports, quality of life, and many other fields. Still, researchers are often faced with problems related to the ease-of-use and practicality of software tools for rapid prototyping of applications that involve biosignal acquisition and processing. Typically, there are either highly flexible scientific computing tools or custom-developed, application-specific tools; the former are often characterized by long learning curves and limited user interface design capabilities, while the latter are often characterized by poor cross-platform compatibility and overheads in terms of development time when new features are needed. In this paper we present a versatile, flexible, and extensible software framework for rapid prototyping of end-user applications, specifically targeted at biosignal acquisition and post-processing. We build on the advantages of combining web technologies with the Python programming language, to improve the usability, interaction, cross-platform compatibility, extensibility, and flexibility of biosignal-based applications.

Paper Nr: 44
Title:

A Data Cube Model for Surveillance Video Indexing and Retrieval

Authors:

Hansung Lee, Sohee Park and Jang-Hee Yoo

Abstract: We propose a novel data cube model, viz. SurvCube, for the multi-dimensional indexing and retrieval of surveillance videos. The proposed method provides multi-dimensional analysis of objects of interest in surveillance videos according to chronological view, events and locations by means of a data cube structure. By employing OLAP operations on the surveillance videos, it is able to provide desirable functionalities such as 1) retrieval of objects and events at different levels of abstraction, i.e., coarse- to fine-grained retrieval; 2) tracing of the trajectories of objects of interest across cameras; 3) summarization of surveillance video with respect to objects of interest (and/or events) at abstract levels of time and location.
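
An OLAP-style roll-up of the kind such a cube supports can be illustrated on a toy event table; the dict-based schema below is an assumption for illustration, not SurvCube's storage model:

```python
# Toy roll-up over surveillance events along the time dimension
# (dimensions: time, camera, event type; measure: event count).
from collections import defaultdict

events = [
    ("2013-07-01 09", "cam1", "entry"),
    ("2013-07-01 09", "cam2", "entry"),
    ("2013-07-01 10", "cam1", "loitering"),
    ("2013-07-02 09", "cam1", "entry"),
]

def roll_up(records, level):
    """Aggregate event counts at 'hour' (fine) or 'day' (coarse) granularity."""
    counts = defaultdict(int)
    for time, cam, event in records:
        key = time.split(" ")[0] if level == "day" else time
        counts[(key, event)] += 1
    return dict(counts)

per_day = roll_up(events, "day")   # coarse-grained view of the same data
```

Drilling back down to `"hour"` recovers the fine-grained view, which is the coarse-to-fine retrieval behaviour the abstract describes.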

Posters
Paper Nr: 12
Title:

A Parity-based Error Control Method for Distributed Compressive Video Sensing

Authors:

Shou-ning Chen, Bao-yu Zheng and Liang Zhou

Abstract: A novel framework called distributed compressive video sensing (DCVS), combining distributed video coding (DVC) and compressive sensing (CS), directly captures the raw video data as measurements in a low-complexity and low-cost process. It meets the requirements of distributed systems very well, because its resource consumption shifts from the encoder to the decoder. Nevertheless, the issue of transmitting measurements over a bit-error channel has not been considered in previous work on DCVS. This paper improves the existing DCVS codec scheme by adding quantization and inverse quantization processes, and proposes a parity-based error control (PEC) method. The method is simple and has high coding efficiency. The proposed method is shown to greatly increase video recovery quality under a binary symmetric channel.
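
The simplest instance of parity protection for quantized measurements is a single even-parity bit per word, which detects any single bit flip introduced by a binary symmetric channel; the paper's actual PEC layout is not specified in the abstract, so this is only the basic idea:

```python
# Even-parity sketch for quantized measurement words (illustrative only;
# the paper's parity-based error control scheme may group bits differently).

def add_parity(bits):
    """Append an even-parity bit so every transmitted word has even weight."""
    return bits + [sum(bits) % 2]

def check_parity(word):
    """A received word with odd weight must contain an odd number of flips."""
    return sum(word) % 2 == 0

word = add_parity([1, 0, 1, 1])   # quantized measurement word
ok_clean = check_parity(word)     # passes over a clean channel

corrupted = word[:]
corrupted[0] ^= 1                 # single bit error in the channel
ok_bad = check_parity(corrupted)  # parity check fails
```

At the decoder, words that fail the check can be discarded or de-weighted before CS reconstruction, which is how an error-detecting code can raise recovery quality without retransmission.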

Paper Nr: 13
Title:

REENACT: Augmented Reality and Collective Role Playing to Enhance the Pedagogy of Historical Events - An EXPERIMEDIA Experiment

Authors:

Martin Lopez-Nores, Yolanda Blanco-Fernandez, José J. Pazos-arias, Alberto Gil-solla, Jorge Garcia-Duque and Manuel Ramos-Cabrer

Abstract: Much of human history has been shaped by the outcomes of countless battles and wars. Unfortunately, the classical pedagogy of these events merely tells who the belligerent forces were, how long the fights lasted and who ended up winning. We present a proposal to engage groups of people in immersive collective experiences that will make them learn about a certain battle or war both from the inside, as reenactors, and from the outside, as historians. The participants will be equipped with smartphones or tablets that interact with the technological facility developed within the EXPERIMEDIA FP7 project, which provides support for the implementation, deployment and execution of distributed live games, social networking features and augmented reality.

Paper Nr: 32
Title:

Fully Automatic Saliency-based Subjects Extraction in Digital Images

Authors:

Luca Greco, Marco La Cascia and Francesco Lo Cascio

Abstract: In this paper we present a novel saliency-based technique for the automatic extraction of relevant subjects in digital images. We use enhanced saliency maps to determine the most relevant parts of the images and an image cropping technique on the map itself to extract one or more relevant subjects. The contribution of the paper is two-fold as we propose a technique to enhance the standard GBVS saliency map and a technique to extract the most salient parts of the image. The GBVS saliency map is enhanced by applying three filters particularly designed to optimize the performance for the task of relevant subjects extraction. The extraction of relevant subjects is demonstrated on a manually annotated dataset and results are encouraging. A variation of the same technique has also been used to extract the most significant region of an image. This region can then be used to obtain a thumbnail keeping most of the relevant information of the original image and discarding nonsignificant background. Experimental results are reported also in this case.
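
The cropping step (extracting a region from a saliency map) can be sketched as thresholding plus a bounding box. The enhanced GBVS map and the three filters are replaced here by a plain threshold, purely as an illustration:

```python
# Toy crop from a saliency map: threshold the map and take the bounding
# box of the salient pixels (stand-in for the paper's cropping technique,
# which operates on an enhanced GBVS map).

def salient_bbox(saliency_map, threshold=0.5):
    """Return (row0, col0, row1, col1) enclosing all salient pixels."""
    rows = [r for r, row in enumerate(saliency_map)
            if any(v >= threshold for v in row)]
    cols = [c for row in saliency_map
            for c, v in enumerate(row) if v >= threshold]
    if not rows:
        return None  # nothing salient enough to crop
    return (min(rows), min(cols), max(rows), max(cols))

smap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.9, 0.0],
    [0.0, 0.7, 0.6, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]
box = salient_bbox(smap)
```

The same box, applied to the original image rather than the map, yields the thumbnail that keeps the relevant subject and discards the non-significant background.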

Paper Nr: 34
Title:

Markerless Augmented Reality based on Local Binary Pattern

Authors:

Youssef Hbali, Mohammed Sadgal and Abdelaziz EL Fazziki

Abstract: Augmented reality is becoming the future of e-commerce, throw their mobile devices, customers have access to all kind of information, going from weather, news papers, shops and so on. Today’s mobiles devices are so powerful to the point that they can be used as a platform of virtual try-on systems. Over this paper we present a virtual eye glasses try-on system based on augmented reality and LBP for face and eyes detection. The well-known machine learning Ada Boost algorithm is used for real time eyes tracking, the resulting face and eyes positions are continuously utilized to overlay the glasses model over the face. The system helps evaluating glasses before trying them in the store and makes possible the design of its own style.