SIGMAP 2018 Abstracts


Area 1 - Multimedia and Communications

Short Papers
Paper Nr: 7
Title:

A Digital Countryside Notebook for Smart Agriculture and Oranges Classification

Authors:

T. Rotondo, G. M. Farinella, A. Chillemi, F. Ferlito and S. Battiato

Abstract: We present a digital countryside notebook designed to help landowners monitor the operations performed on a cultivation. The system helps collect and trace information over time through the developed mobile app, which guarantees the traceability of the product. Our system has many advantages, such as easy collection of information and reduced time for analyzing the acquired information. To improve and automate data collection, the system uses a classifier to label images of oranges during plant monitoring.

Paper Nr: 21
Title:

Robust Statistics for Feature-based Active Appearance Models

Authors:

Marcin Kopaczka, Philipp Gräbel and Dorit Merhof

Abstract: Active Appearance Models (AAM) are a well-established method for facial landmark detection and face tracking. Due to their widespread use, several additions to the original AAM algorithms have been proposed in recent years. Two previously proposed improvements that address different shortcomings are using robust statistics for occlusion handling and adding feature descriptors for improved landmark fitting performance. In this paper, we show that a combination of both methods is possible and provide a feasible and effective way to improve robustness and precision of the AAM fitting process. We describe how robust cost functions can be incorporated into the feature-based fitting procedure and evaluate our approach. We apply our method to the challenging 300-videos-in-the-wild dataset and show that our approach allows robust face tracking even under severe occlusions.

Area 2 - New Research and Developments in Multimedia

Full Papers
Paper Nr: 8
Title:

A Comparison of Techniques based on Image Classification and Object Detection to Count Cars and Non-empty Stalls in Parking Spaces

Authors:

D. Di Mauro, A. Furnari, G. Patanè, S. Battiato and G. M. Farinella

Abstract: The world-wide growth of population in urban areas demands the development of sustainable technologies to manage city services, such as transportation, in an efficient way. Motivated by the cost-effectiveness of image-based solutions, in this paper we investigate the use of techniques based on image classification and object detection to count cars and non-empty stalls in parking areas. The analysis is performed on a dataset of images collected in a real parking area. Results show that techniques based on image classification are very effective when parking stalls are delimited by marking lines and the geometry of the scene is known in advance.

Area 3 - Multimedia Signal Processing

Full Papers
Paper Nr: 6
Title:

Real-Time Non-linear Noise Reduction Algorithm for Video

Authors:

Chinatsu Mori and Seiichi Gohshi

Abstract: Noise is an essential issue for images and videos. Recently, a range of high-sensitivity imaging devices have become available. Cameras are often used under poor lighting conditions for security purposes or night-time news gathering. Videos shot under poor lighting conditions are afflicted by significant noise, which degrades the image quality. The process of removing noise from videos is called noise reduction (NR). Although many NR methods have been proposed, they are complex and have been presented only as computer simulations. In practical applications, NR processing of videos occurs in real time. Practical real-time methods are limited, and the complex NR methods cannot cope with real-time processing. Video has three dimensions: horizontal, vertical and temporal. Since the temporal relation is stronger than the horizontal and vertical relations, conventional real-time NR methods use the temporal relation to reduce noise. This approach exploits what is known as the inter-frame relation, and the noise reducer comprises a temporal recursive filter. Temporal recursive filters are widely used in digital TV sets to reduce the noise affecting images. Although the temporal recursive filter is a simple algorithm, moving objects leave trails when it reduces high-level noise. In this paper, we propose a novel NR algorithm that does not suffer from this trail issue and performs better than NR using temporal recursive filters.
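
The conventional temporal recursive filter mentioned in this abstract can be illustrated with a minimal sketch. It assumes a first-order per-pixel recursion, y[t] = (1 - k) x[t] + k y[t-1], with a hypothetical feedback coefficient `k`; the function name and toy frames are illustrative only, and this is not the novel algorithm proposed in the paper.

```python
def temporal_recursive_filter(frames, k=0.75):
    """First-order temporal recursive (IIR) filter, applied per pixel.

    frames: list of frames, each a flat list of pixel values.
    k: assumed feedback coefficient in [0, 1); larger k smooths more.
    """
    if not 0.0 <= k < 1.0:
        raise ValueError("k must be in [0, 1)")
    output = []
    previous = None
    for frame in frames:
        if previous is None:
            # The first frame has no history, so it passes through unchanged.
            current = [float(p) for p in frame]
        else:
            # Blend the new frame with the previous filtered frame.
            current = [(1.0 - k) * p + k * q for p, q in zip(frame, previous)]
        output.append(current)
        previous = current
    return output


# Toy example: a bright one-pixel "object" moves from index 0 to index 1,
# then leaves the scene.
frames = [[100, 0, 0], [0, 100, 0], [0, 0, 0]]
filtered = temporal_recursive_filter(frames, k=0.5)
```

In this toy run the pixel at index 0 still holds half of its old brightness in the second filtered frame, even though the object has already moved away. That residue is the trail the abstract refers to.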

Paper Nr: 14
Title:

First Experiments on Speaker Identification Combining a New Shift-invariant Phase-related Feature (NRD), MFCCs and F0 Information

Authors:

Aníbal Ferreira

Abstract: In this paper we report on a number of speaker identification experiments that assume the existence of a phonetic-oriented segmentation scheme, which motivates the extraction of psychoacoustically-motivated phase- and pitch-related features. MFCC features are also considered for benchmarking. Emphasis is given to an innovative shift-invariant phase-related feature that is closely linked to the glottal source. A very simple statistical model is proposed and adapted in order to highlight the relative discrimination capabilities of different feature types. Results are presented for individual features, and we also discuss the possibilities of fusing features at the speaker modeling stage or fusing distances at the speaker identification stage.

Short Papers
Paper Nr: 9
Title:

Quality Assessment of JPEG-distorted Face Images: Influence of Affective Content

Authors:

Silvia Corchs, Gianluigi Ciocca and Francesca Gasparini

Abstract: In this work we investigate whether the affective content of images influences the perception of image quality. Two databases are generated and psychophysical experiments are conducted, where participants rate the stimuli on a five-point Likert scale. We have fixed the semantic content, choosing only close-ups of face images, two emotion categories (happy and sad images) and JPEG distortion. The influence of the background is also considered. From the analysis of the subjective data we observe that the influence of affective content is more evident for images of very high or very low quality. The subjective scores are further used as ground-truth labels to train a five quality-class classifier. Two different feature spaces are used (visual features and quality metrics) to train an SVM classifier.

Paper Nr: 10
Title:

Stress Detection Through Speech Analysis

Authors:

Kevin Tomba, Joel Dumoulin, Elena Mugellini, Omar Abou Khaled and Salah Hawila

Abstract: The work presented in this paper uses speech analysis to detect candidates' stress during HR (human resources) screening interviews. Machine learning is used to detect stress in speech, using the mean energy, the mean intensity and Mel-Frequency Cepstral Coefficients (MFCCs) as classification features. The datasets used to train and test the classification models are the Berlin Emotional Database (EmoDB), the Keio University Japanese Emotional Speech Database (KeioESD) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The best results were obtained with Neural Networks, with accuracy scores for stress detection of 97.98% (EmoDB), 95.83% (KeioESD) and 89.16% (RAVDESS).

Posters
Paper Nr: 13
Title:

Multimodal Classification of Sexist Advertisements

Authors:

Francesca Gasparini, Ilaria Erba, Elisabetta Fersini and Silvia Corchs

Abstract: Advertisements, especially in online social media, are often based on visual and/or textual persuasive messages, frequently showing women as subjects. Some of these advertisements create a biased portrayal of women, ultimately resulting in sexist and in some cases misogynistic content. In this paper we give a first insight into the field of automatic detection of sexist multimedia content, by proposing both a unimodal and a multimodal approach. In the unimodal approach we propose binary classifiers based on different visual features to automatically detect sexist visual content. In the multimodal approach both visual and textual features are considered. We created a manually labeled database of sexist and non-sexist advertisements, composed of two main datasets: the first contains 423 advertisements with images that have been labeled sexist or non-sexist with respect to their visual content, and the second comprises 192 advertisements labeled as sexist or non-sexist according to visual and/or textual cues. We adopted the first dataset to train a visual classifier. Finally, we show that a multimodal approach that combines the trained visual classifier with a textual one achieves good classification performance on the second dataset, reaching 87% recall and 75% accuracy, which are significantly higher than the performance obtained by each of the corresponding unimodal approaches.

Area 4 - Multimedia Systems and Applications

Full Papers
Paper Nr: 12
Title:

Novelty and Diversity in Image Retrieval

Authors:

Simone Santini

Abstract: This paper studies the formalization and the use of the concepts of novelty and diversity to diversify the result set of a multimedia query, avoiding the presence of uninformative results. First, we review and adapt several diversity measures proposed in the information retrieval literature. The problem of maximizing diversity being NP-complete, we propose a general greedy algorithm (dependent on a scoring function) for finding an approximate solution, and instantiate it using three scenarios: a probabilistic one, a fuzzy one, and a geometric one. Finally, we perform tests on two data sets, one in which retrieval is based on annotations and the other in which retrieval is purely visual.
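
The general idea of a greedy algorithm driven by a scoring function can be sketched as follows. This is only an illustration under stated assumptions: the score used here is a simple marginal-relevance-style trade-off between relevance and redundancy, not one of the three scenarios (probabilistic, fuzzy, geometric) instantiated in the paper, and all names (`greedy_diversify`, `trade_off`, the toy items) are hypothetical.

```python
def greedy_diversify(candidates, relevance, similarity, k, trade_off=0.5):
    """Greedily pick up to k items, balancing relevance against redundancy.

    relevance(item) -> float, similarity(a, b) -> float in [0, 1].
    trade_off is an assumed weight: 1.0 ranks by relevance alone.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            # Redundancy is the similarity to the closest already-chosen item.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance(item) - (1.0 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: name -> (relevance, position on a 1-D feature axis).
items = {"a": (0.9, 0.0), "b": (0.85, 0.05), "c": (0.6, 1.0)}


def relevance(name):
    return items[name][0]


def similarity(x, y):
    # Items close together on the feature axis count as near-duplicates.
    return max(0.0, 1.0 - abs(items[x][1] - items[y][1]))


result = greedy_diversify(list(items), relevance, similarity, k=2)
```

With these toy values the second pick is "c" rather than the more relevant "b", because "b" is nearly identical to the already selected "a". Each step makes a locally best choice, which is why the result is an approximation rather than an exact maximum.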

Paper Nr: 25
Title:

Network Support for AR/VR and Immersive Video Application: A Survey

Authors:

Dongbiao He, Cedric Westphal and J. J. Garcia-Luna-Aceves

Abstract: Augmented Reality and Virtual Reality are rapidly gaining attention and are increasingly being deployed over the network. These technologies have large industrial potential to become the next big platform, with a wide range of applications. The experience will only be satisfying when the network infrastructure is able to support these applications. Current networks, however, are still having a hard time streaming high-quality videos. The advent of 5G networks will improve network performance, but it is unclear whether it will be sufficient to support new applications delivering augmented reality and virtual reality services. There are few surveys on the topic of augmented reality systems, and their focus mostly stays on the actual displays and potential applications. We survey the literature on AR/VR networking, and we focus here on the potential underlying network issues.

Short Papers
Paper Nr: 4
Title:

Eye Tracking as a Method of Controlling Applications on Mobile Devices

Authors:

Angelika Kwiatkowska and Dariusz Sawicki

Abstract: The possibility of using eye tracking in multimodal interaction is discussed. Nowadays, communication by eye movements can be both natural and intuitive. The main goal of the present work was to develop a method for controlling a smartphone application by using eye movements. The designed software is based on the Open Source Computer Vision Library (OpenCV) and targets the Android system. We conducted two sets of tests: usability tests of the new solution, and tests on how the methods of template matching affect the operation of the device. The results, obtained by testing a small group of people, showed that the application meets all stated expectations.

Paper Nr: 17
Title:

Localization of Visitors for Cultural Sites Management

Authors:

F. Ragusa, L. Guarnera, A. Furnari, S. Battiato, G. Signorello and G. M. Farinella

Abstract: We consider the problem of localizing visitors in a museum from egocentric (first person) images. Localization information can be useful both to assist the user during his visit (e.g., by suggesting where to go and what to see next) and to provide behavioral information to the manager of the museum (e.g., how much time has been spent by visitors at a given location?). To address the problem, we have considered a dataset of egocentric videos acquired using two cameras: a head-mounted HoloLens and a chest-mounted GoPro. We performed experiments exploiting a state-of-the-art method for room-based temporal segmentation of egocentric videos. The experiments showed that compelling information can be extracted to serve both the visitors and the site manager. A web interface has been developed to provide a tool for managing the cultural site and analyzing the videos acquired by visitors. A digital summary is also generated as an additional service for the visitors, providing “sharable” memories of their experience.

Posters
Paper Nr: 20
Title:

EEG Data of Face Recognition in Case of Biological Compatible Changes: A Pilot Study on Healthy People

Authors:

Aurora Saibene, Silvia Corchs, Roberta Daini, Alessio Facchin and Francesca Gasparini

Abstract: Recognizing people from their faces has a strong impact on social interaction. In this paper we present a pilot study on healthy people in which brain activity during a face recognition task was recorded using electroencephalography (EEG). Target images (previously seen in a training phase) were presented in the recognition phase in two different conditions, either identical to those of the initial phase or modified with biologically plausible changes (such as feature enlargement or changed expression), and were randomly presented together with new faces. The raw EEG data were cleaned of both biological and non-physiological artifacts. Statistically significant differences in brain activations were registered between the two experimental conditions, especially in the frontal area, during the recognition process. The results of the analysis on this database of healthy people can be useful as a baseline for further studies on people affected by congenital prosopagnosia or autism.