SIGMAP 2008 Abstracts
CONFERENCE
Area 1 - Multimedia Communications
Area 2 - Multimedia Signal Processing
Area 3 - Multimedia Systems and Applications
Title:
SUFFIX ARRAYS - A Competitive Choice for Fast Lempel-Ziv Compressions
Author(s):
Artur J. Ferreira, Arlindo L. Oliveira and Mário A. T. Figueiredo
Abstract:
Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used in a variety of applications. The LZ encoder and decoder exhibit a high asymmetry regarding time and memory requirements, with the former being much more demanding. Several techniques have been used to speed up the encoding process; among them is the use of suffix trees. In this paper, we explore the use of a simple data structure, named suffix array, to hold the dictionary of the LZ encoder, and propose an algorithm to search the dictionary. A comparison with the suffix tree based LZ encoder is carried out, showing that the compression ratios are roughly the same. The amount of memory required by the suffix array is fixed and much lower than the variable memory requirements of the suffix tree encoder, which depend on the text to encode. We conclude that suffix arrays are a very interesting option regarding the tradeoff between time, memory, and compression ratio when compared to suffix trees, which makes them preferable in some compression scenarios.
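
The idea of holding an LZ dictionary in a suffix array can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not detail; the function names, the naive construction, and the binary-search lookup below are all assumptions made for illustration.

```python
def build_suffix_array(dictionary):
    """Return the start positions of all suffixes, sorted lexicographically.

    Naive construction (sorting full suffixes); acceptable only for a
    small dictionary window and chosen here for clarity, not speed.
    """
    return sorted(range(len(dictionary)), key=lambda i: dictionary[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(dictionary, suffix_array, lookahead):
    """Return (position, length) of the longest prefix of `lookahead`
    that occurs in `dictionary`; length is 0 when nothing matches."""
    # Binary search for the place where `lookahead` would be inserted
    # among the sorted suffixes.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if dictionary[suffix_array[mid]:] < lookahead:
            lo = mid + 1
        else:
            hi = mid
    # The longest common prefix is always found at one of the two
    # suffixes adjacent to the insertion point.
    best = (0, 0)
    for idx in (lo - 1, lo):
        if 0 <= idx < len(suffix_array):
            start = suffix_array[idx]
            length = common_prefix_len(dictionary[start:], lookahead)
            if length > best[1]:
                best = (start, length)
    return best


if __name__ == "__main__":
    window = b"abracadabra"
    sa = build_suffix_array(window)
    print(longest_match(window, sa, b"abrax"))
```

The array holds exactly one integer per dictionary position, which is consistent with the fixed memory footprint mentioned in the abstract.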

Title:
CONTEXT-AWARE HOARDING OF MULTIMEDIA CONTENT IN A LARGE-SCALE TOUR GUIDE SCENARIO - A Case Study on Scaling Issues of a Multimedia Tour Guide
Author(s):
J. Köpke, R. Tusch, H. Hellwagner and L. Böszörmenyi
Abstract:

This paper discusses scaling issues of a mobile multimedia tour guide. Making tourist information available in a substantially large geographical area (e.g. a federal state in Austria) raises new questions compared to providing similar information in a limited area (such as a museum). First, we have to assume a heterogeneous network infrastructure containing high and low bandwidth links and even total network loss. Video streaming is therefore not possible everywhere. Secondly, the total amount of data grows linearly with the number of Points of Interest (POIs) which are augmented by the tour guide. Therefore, preloading all data onto a device with limited storage is not possible. A possible solution to these problems is hoarding, i.e. preloading an "appropriate" subset of data. The crucial question is how to find the proper subset depending on the actual context. The paper discusses the questions of (1) what kind of context information should be considered and (2) what kind of usage patterns can be assumed. Based on these considerations, hoarding strategies are developed for the tour guide. The strategies are finally evaluated with real-world data from a federal-state-wide tourist-card system.


Title:
AN ADAPTIVE SPATIAL ERROR CONCEALMENT FOR H.264/AVC VIDEO STREAM
Author(s):
Jun Wang, Lei Wang, Takeshi Ikenaga and Satoshi Goto
Abstract:
Transmission of compressed video over error-prone channels may result in packet losses or errors, which can significantly degrade the image quality. Therefore an error concealment scheme is applied at the video receiver side to mask the damaged video. Considering that there are 3 types of MBs (Macro Blocks) in a natural video frame, i.e., Textural MB, Edged MB, and Smooth MB, this paper proposes an adaptive spatial error concealment that can choose among 3 different methods for these 3 different MB types. For the selection criterion, 2 factors are taken into consideration. Firstly, the standard deviation of our proposed edge statistical model is exploited. Secondly, some new features of the latest video compression standard H.264/AVC, i.e., the intra prediction mode, are also considered for criterion formulation. Compared with previous work, which is based only on deterministic measurement, the proposed method achieves more accurate MB type decisions, which leads to better image recovery. Subjective and objective image quality evaluations in experiments confirmed this.

Title:
QoS IMPROVEMENTS RESULT FROM TCP/RLC AND MAC IN A MOBILE CHANNEL
Author(s):
Jahangir Dadkhah Chimeh, Mohammad Hakkak, Hamidreza Bakhshi and Paeiz Azmi
Abstract:
New mobile telecommunication services are based on data networks, especially the Internet. These services include HTTP, Telnet, FTP, Simple Mail Transfer Protocol (SMTP), etc. Moreover, a mobile network is a multi-user network. Transmission Control Protocol/Internet Protocol (TCP/IP), which is sensitive to link congestion in wireline data links, is also used in wireless networks. In order to improve system performance, the TCP layer uses flow control and congestion control. In addition, Radio Link Control (RLC) has been introduced to compensate for the deficiencies of the TCP layer in wireless environments. MAC/RLC play an important role in enhancing the quality of service of UMTS. In this paper we review the protocol stack of UTRAN (UMTS Terrestrial Radio Access Network), which is based on the Third Generation Partnership Project (3GPP). We then evaluate its layer-two error control mechanisms, verify TCP over the Automatic Repeat reQuest (ARQ) error control mechanism, and finally present the resulting quality of service improvement in fading channels.

Title:
FAR-END CROSSTALK IN ITERATIVELY DETECTED MIMO-OFDM TWISTED PAIR TRANSMISSION SYSTEMS
Author(s):
Andreas Ahrens and Christoph Lange
Abstract:

Crosstalk between neighbouring wire pairs in multi-pair copper cables is an important disturbance, which essentially limits the transmission quality and the throughput of such cables. For high-rate transmission, often the strong near-end crosstalk (NEXT) disturbance is avoided or suppressed and only the far-end crosstalk (FEXT) remains as crosstalk influence. In this contribution the effect of far-end crosstalk (FEXT) in iteratively detected MIMO-OFDM transmission schemes is studied. EXIT (extrinsic information transfer) charts are used for analyzing and optimizing the convergence behaviour of the iterative demapping and decoding.


Title:
NOVEL DIGITAL DIFFERENTIATOR AND CORRESPONDING FRACTIONAL ORDER DIFFERENTIATOR MODELS
Author(s):
Maneesha Gupta, Pragya Varshney, G. S. Visweswaran and B. Kumar
Abstract:
This paper proposes a novel first order digital differentiator. The differentiator is obtained by linear mixing of the Al-Alaoui operator [1] and a wide band differentiator [2]. MATLAB simulation results of the proposed differentiator for various sampling frequencies are presented. The magnitude results are in close conformity to the theoretical results for approximately 78% of the full range. The phase of the new differentiator is almost linear, with a maximum phase error of 8.24°. We also propose new operator based fractional order differentiator models. These models are obtained by performing the Taylor series expansion and continued fraction expansion of the proposed operator. Comparisons of the suggested models with the existing models of half differentiators show perceptible improvement in performance of the fractional order circuit. MATLAB simulation results show that the magnitude response of the proposed half differentiator matches the theoretical results of the continuous-time domain half differentiator for almost the whole frequency range, and the phase approximates a constant group delay, which is desirable for many applications.

Title:
ACCURACY ANALYSES OF PASSIVE TRACKING OF SEVERAL CLICKING SPERM WHALES - A Case of Complex Sources Binding
Author(s):
Frédéric Caudal and Hervé Glotin
Abstract:
This paper provides a real-time passive underwater acoustic method to track multiple emitting whales using four or more omni-directional, widely-spaced, bottom-mounted hydrophones, and evaluates the performance of the system via the Cramér-Rao Lower Bound (CRLB) and Monte Carlo simulations. After a non-parametric Teager-Kaiser-Mallat signal filtering, rough Time Delays Of Arrival are calculated, selected and filtered, and used to estimate the positions of whales for a constant or linear sound speed profile. The complete algorithm is tested on real data from the NUWC and the AUTEC. The CRLB and Monte Carlo simulations are computed and compared with the tracking results. Our model is validated by similar results from the US Navy and University of Hawaii labs in the case of one whale, and by similar whale counts from the Columbia University ROSA lab in the case of multiple whales. At this time, our tracking method is the only one giving typical speed and depth estimations for multiple (5) emitting whales located at 1 to 5 km from the hydrophones.

Title:
STATIC FEATURES IN ISOLATED VOWEL RECOGNITION AT HIGH PITCH
Author(s):
Aníbal Ferreira
Abstract:

Vowel recognition is frequently based on Linear Prediction (LP) analysis and formant estimation techniques. However, the performance of these techniques decreases in the case of female or child speech because at high pitch frequencies (F0) the magnitude spectrum is sparsely sampled, making formant estimation unreliable.
In this paper we describe the implementation of a perceptually motivated concept of vowel recognition that is based on Perceptual Spectral Clusters (PSC) of harmonic partials. PSC based features were evaluated in automatic recognition tests using the Mahalanobis distance and a database of five natural Portuguese vowel sounds uttered by 44 speakers, 27 of whom are child speakers. LP based features and Mel-Frequency Cepstral Coefficients (MFCC) were also included in the tests as a reference. Results show that while the recognition performance of PSC features falls between that of LP based features and that of MFCC coefficients, the normalization of PSC features by F0 increases the performance and approaches that of MFCC coefficients. PSC features are not only amenable to a psychophysical interpretation (as LP based features are) but also have the potential to compete with global shape features.


Title:
CONFIGURABLE VLSI ARCHITECTURE OF A GENERAL PURPOSE LIFTING-BASED WAVELET PROCESSOR
Author(s):
Andre Guntoro, Hans-Peter Keil and Manfred Glesner
Abstract:
The richness of wavelet transformation has been known in many fields. There exist different classes of wavelet filters that can be used depending on the application. In this paper, we propose a general purpose lifting-based wavelet processor that can perform various forward and inverse DWTs. Our architecture is based on NxM PEs which can perform either prediction or update on a continuous data stream in every clock cycle. We also consider the normalization step which takes place at the end of the forward DWT or at the beginning of the inverse DWT. To cope with different wavelet filters, we feature a multi-context configuration to select among various DWTs. For the 16-bit implementation, the estimated area of the proposed wavelet processor with a 2x8 PEs configuration in a 0.18-µm technology is 1.8 mm² and the estimated frequency is 355 MHz.

Title:
FACE DETECTION USING DISCRETE GABOR JETS AND COLOR INFORMATION
Author(s):
Ulrich Hoffmann, Jacek Naruniec, Ashkan Yazdani and Touradj Ebrahimi
Abstract:

Face detection allows human faces to be recognized and detected and provides information about their location in a given image. Many applications such as biometrics, face recognition, and video surveillance employ face detection as one of their main modules. Therefore, improvement in the performance of existing face detection systems and new achievements in this field of research are of significant importance. In this paper a hierarchical classification approach for face detection is presented. In the first step, discrete Gabor jets (DGJ) are used for extracting features related to the brightness information of images and a preliminary classification is made. Afterwards, a skin detection algorithm, based on modeling of colored image patches, is employed as a post-processing of the results of DGJ-based classification. It is shown that the use of color efficiently reduces the number of false positives while maintaining a high true positive rate. Finally, a comparison is made with the OpenCV implementation of the Viola and Jones face detector and it is concluded that higher correct classification rates can be attained using the proposed face detector.


Title:
PARTIAL TRACKING IN SINUSOIDAL MODELING - An Adaptive Prediction-based RLS Lattice Solution
Author(s):
Leonardo O. Nunes, Paulo A. A. Esquef, Luiz W. P. Biscainho and Ricardo Merched
Abstract:

Partial tracking plays an important role in sinusoidal modeling analysis, being the stage in which the model parameters are obtained. This is accomplished by coherently grouping the spectral peaks found in each frame into time-evolving tracks of varying frequency and amplitude. The main difficulties faced by partial tracking algorithms are the analysis of polyphonic signals and the pursuit of tracks exhibiting strong modulations in frequency and amplitude. In these circumstances, linear prediction over the trajectory of a given track has been shown to improve partial tracking performance. This paper proposes an adaptive RLS lattice filter for the purpose of prediction in partial tracking. A new heuristic which certifies the filter convergence is also presented. Computer simulation results are shown to compare the proposed implementation with that of other predictors. The performance of the proposed solution is similar to that of competing methods, albeit with reduced computational complexity as well as improved numerical stability.


Title:
A ROBUST SPEECH COMMAND RECOGNIZER FOR EMBEDDED APPLICATIONS
Author(s):
Alexandre Maciel, Arlindo Veiga, Cláudio Neves, José Lopes, Carla Lopes, Fernando Perdigão and Luís Sá
Abstract:
This paper describes a command-based robust speech recognition system for the Portuguese language. Due to an efficient noise reduction algorithm the system can be operated in adverse noise environments such as in cars or factories. The recognizer was trained and tested with a speech database with 250 commands spoken by 345 speakers in clean and noisy conditions. The system incorporates a user friendly application programming interface and was optimized for embedded platforms with limited computational resources. Performance tests for the recognizer are presented.

Title:
MOTION CAPTURE FOR 3D DATABASES - Overview of Methods for Motion Capture in 3D Databases
Author(s):
Dalibor Lupínek and Martin Drahanský
Abstract:
Motion Capture is a modern method that is commonly used in animation and augmented reality. Although it is still quite new, there exists a large variety of functional systems working on all sorts of principles. The aim of this paper is to provide an overview of the basic capabilities of Motion Capture systems that are widely used or represent a promising future. In addition, this paper presents an overview of a new system, which is now in development.

Title:
NOISE REDUCTION BASED ON CROSS TF ε-FILTER
Author(s):
Tomomi Abe, Mitsuharu Matsumoto and Shuji Hashimoto
Abstract:

A time-frequency ε-filter (TF ε-filter) is an advanced ε-filter applied to complex spectra along the time axis. It can reduce most kinds of noise while preserving a signal that varies frequently, such as a speech signal. The filter design is simple and it can effectively reduce noise. It is applicable not only to small amplitude stationary noise but also to large amplitude nonstationary noise. However, for noise that varies very frequently along the time axis, the TF ε-filter cannot reduce the noise without distorting the signal. For noise in which neighboring frequency bins have similar powers, such as impulse noise, the noise can be reduced by applying the ε-filter to the complex spectra not along the time axis but along the frequency axis. This paper introduces an advanced method for noise reduction, labeled the cross TF ε-filter, that applies the ε-filter to complex spectra not only along the time axis but also along the frequency axis. We conducted experiments using sounds with stationary, nonstationary and natural noise.
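
To make the filtering direction concrete, the following is a minimal sketch of a hard-threshold ε-filter applied along either axis of a small complex spectrogram. It assumes uniform window weights and a spectrogram stored as `spec[frequency][time]`; these choices, all function names, and the way the two directions would be combined are assumptions, since the abstract does not specify them.

```python
def epsilon_filter(seq, eps, half_width=1):
    """Hard-threshold epsilon-filter over a 1-D (possibly complex) sequence.

    Each output is the input plus the mean of the neighbour differences
    whose magnitude does not exceed eps; larger differences contribute
    zero, so abrupt changes are left untouched.
    """
    n = len(seq)
    out = []
    for k in range(n):
        acc = 0
        count = 0
        for i in range(-half_width, half_width + 1):
            j = k + i
            if 0 <= j < n:
                diff = seq[j] - seq[k]
                if abs(diff) <= eps:
                    acc += diff
                count += 1
        out.append(seq[k] + acc / count)
    return out


def filter_along_time(spec, eps, half_width=1):
    """Apply the filter to every frequency bin (row) along the time axis."""
    return [epsilon_filter(row, eps, half_width) for row in spec]


def filter_along_frequency(spec, eps, half_width=1):
    """Apply the filter to every time frame (column) along the frequency axis."""
    cols = [epsilon_filter(list(col), eps, half_width) for col in zip(*spec)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    # The sample at index 3 differs from its neighbours by more than eps,
    # so it is preserved while the small fluctuations are smoothed.
    print(epsilon_filter([1.0, 1.2, 0.8, 10.0, 1.1], eps=0.5))
```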


Title:
CLASSIFICATION OF MOTOR IMAGINARY TASKS USING ADAPTIVE RECURSIVE BANDPASS FILTER - Effective Classification for Motor Imaginary BCI
Author(s):
Vickneswaran Jeyabalan, Andrews Samraj and Loo Chu Kiong
Abstract:
The noteworthy point in the advancement of Brain Computer Interface (BCI) research is not only to develop new technology but also to adopt the easiest procedures, since the expected beneficiaries are people with disabilities. Locked-in patients possess strong mental abilities in thinking and understanding, but they are extremely limited in expressing their views. Imagination is possible for almost all locked-in patients, and hence any BCI that does not rely on finger movement or other muscle activity is definitely an added advantage in this arena. This paper deals with our research findings obtained by a combination of frequency shifting and segmentation techniques on such motor imaginary signals extracted from the left and right cortex of the human brain. The signals are captured from the C3, C4, and Cz channels of the scalp electrodes and are band-pass filtered to expose the mu rhythm. The results of classification using a simple threshold demonstrate the effectiveness of our proposed technique. The best results were found in the latency range of 3 to 9 seconds of the imagination, which agrees with existing neuroscience knowledge from previous studies.

Title:
TIME DOMAIN ATTACK AND RELEASE MODELING - Applied to Spectral Domain Sound Synthesis
Author(s):
Cornelia Kreutzer, Jacqueline Walker and Michael O'Neill
Abstract:

We introduce a time-domain model for the synthesis of attack and release parts of musical sounds. This approach is an extension of a spectral synthesis model we developed: the Reduced Parameter Synthesis Model (RPSM). The attack and release model is independent from a preceding spectral analysis as it is based on the time domain sustain part of the sound. We apply a shaping function to this sustain part to obtain the sound attack and the release. The model has been tested with linear and polynomial shaping functions and produces good results for three different instruments. The time-domain approach also overcomes the problem of synthesis artefacts that often occur when using spectral analysis/synthesis methods for sounds with transient events. Moreover, the model can be combined with any synthesis model of the sustain part and offers the possibility to determine the duration of the attack and release parts of the sound.


Title:
SIGNAL-DEPENDENT ANALYSIS OF SIGNALS SAMPLED BY SEND ON DELTA SAMPLING SCHEME
Author(s):
M. Greitans and R. Shavelis
Abstract:

Interest in the application of signal-driven sampling schemes has increased. They offer various advantages over traditional sampling. The paper presents the principles of the send-on-delta concept. The obtained samples are spaced non-uniformly, thus an advanced method should be employed for processing. The paper discusses a method for spectral analysis of a signal sampled using the send-on-delta principle. In such a way, it is possible to decrease the sampling density. The non-uniform location of send-on-delta events suppresses the distortion due to frequency aliasing. The method is based on a minimum variance filter. To improve the resolution and accuracy, iterative updating of the autocorrelation matrix is used. The performance of this approach is demonstrated by computer simulations. The proposed approach can be of interest for distributed wireless data acquisition in remote sensing, because it allows signals to be sampled with decreased density.
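
The send-on-delta principle itself can be sketched in a few lines: a sample is transmitted only when the signal has moved by at least a threshold from the last transmitted value. The function name, the `(time, value)` tuple layout, and the choice to always send the first sample are assumptions for illustration; the spectral analysis method of the paper is not reproduced here.

```python
def send_on_delta(samples, delta):
    """Return the (time, value) events produced by send-on-delta sampling.

    `samples` is a list of (time, value) pairs from a densely observed
    signal. A pair is emitted only when its value differs from the last
    emitted value by at least `delta`, so events are spaced non-uniformly.
    """
    if not samples:
        return []
    events = [samples[0]]          # assumption: the first sample is always sent
    last_sent = samples[0][1]
    for t, x in samples[1:]:
        if abs(x - last_sent) >= delta:
            events.append((t, x))
            last_sent = x
    return events


if __name__ == "__main__":
    values = [0.0, 0.1, 0.25, 0.6, 0.65, 0.2]
    dense = list(enumerate(values))
    print(send_on_delta(dense, delta=0.3))
```

In this toy input only three of the six observations are sent, which illustrates the reduced sampling density mentioned in the abstract.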


Title:
CONSTANT BITRATE CONTROL FOR A DISTRIBUTED VIDEO CODING SYSTEM
Author(s):
Mariusz Jakubowski, João Ascenso and Grzegorz Pastuszak
Abstract:
In some distributed video coding (DVC) systems, the total bitrate depends mainly on the key frames (Intra coded) quality and on the side information accuracy. In this paper, a rate control (RC) mechanism is proposed to achieve and maintain a certain target bitrate for the overall Intra and WZ bitstream, mainly by adjusting online the Intra frames quality through the quantization parameter (QP). In order to obtain a similar decoded quality of Intra and WZ frames, the relevant parameters: QP for the key frames and the quantization index (QIndex) for WZ frames are controlled jointly. The major novelty of this work is a statistical model that expresses the relationship between QIndex and WZ frames bitrate. The proposed rate control solution is integrated into the VISNET2 WZ codec and the experimental results demonstrate the efficiency of the proposed algorithm to reach and maintain the target bitrate.

Title:
BIRTH-DEATH FREQUENCIES VARIANCE OF SINUSOIDAL MODEL - A New Feature for Audio Classification
Author(s):
Shahrokh Ghaemmaghami and Jalil Shirazi
Abstract:
In this paper, a new feature set for audio classification is presented and evaluated based on sinusoidal modeling of audio signals. Variance of the birth-death frequencies in sinusoidal model of signal, as a measure of harmony, is used and compared to typical features as the input into an audio classifier. The performance of this sinusoidal model feature is evaluated through classification of audio to speech and music using both the GMM and the SVM classifiers. Classification results show that the proposed feature is quite successful in speech/music classification. Experimental comparisons with popular features for audio classification, such as HZCRR and LSTER, are presented and discussed. By using a set of three features, we achieved 96.83% accuracy in the audio classification.

Title:
AN IMPROVED STEGANOGRAPHIC METHOD
Author(s):
Hyoung Joong Kim and Amiruzzaman Md
Abstract:
An improved steganographic method is proposed in this paper. Two distinct methods are combined in an optimized way to achieve a potentially high data hiding capability. First, the proposed method shifts the last nonzero AC coefficients in each JPEG block and, second, it changes the magnitude of the first nonzero AC coefficients.

Title:
BIMODAL QUANTIZATION OF WIDEBAND SPEECH SPECTRAL INFORMATION
Author(s):
Driss Guerchi
Abstract:

In this work we introduce an efficient method to reduce the coding rate of the spectral information in an algebraic code-excited linear prediction (ACELP) wideband codec. The Bimodal Vector Quantization (BMVQ) exploits the interframe correlation in spectral information to reduce the coding rate while maintaining high coded speech quality. In the BMVQ training phase, two codebooks are separately designed for voiced and unvoiced speech. For each speech frame, the optimal codebook for the search procedure is selected according to the interframe correlation of the spectral information. The BMVQ was successfully implemented in an ACELP wideband coder. The objective and subjective performance were found to be comparable to that of the combination of the split vector quantization and multistage vector quantization at 2.3 kbit/s.


Title:
IMPROVEMENT OF THE SIMPLIFIED FTF-TYPE ALGORITHM
Author(s):
Madjid Arezki, Ahmed Benallal, Abderezak Guessoum and Daoud Berkani
Abstract:
In this paper, we propose a new algorithm M-SMFTF which reduces the complexity of the simplified FTF-type (SMFTF) algorithm by using a new recursive method to compute the likelihood variable. The computational complexity was reduced from 7L to 6L, where L is the finite impulse response filter length. Furthermore, this computational complexity can be significantly reduced to (2L+4P) when used with a reduced P-size forward predictor. Finally, some simulation results are presented and our algorithm shows an improvement in convergence over the normalized least mean square (NLMS).

Title:
A COMPOUND IMAGE ENCODER BASED ON THE MULTISCALE RECURRENT PATTERN ALGORITHM
Author(s):
Nelson C. Francisco, Ricardo N. R. Sardo, Nuno M. M. Rodrigues, Eduardo A. B. da Silva, Murilo B. de Carvalho, Sérgio M. M. Faria, Vitor M. M. da Silva and Manuel J. C. S. Reis
Abstract:

In this paper we present the current state of the project SCODE (Scanned COmpound Document Encoder). The objective of this project is the development of a new image application, based on the Multidimensional Multiscale Parser algorithm (MMP), for compression of scanned documents composed of pictures, graphs and text.
MMP is a generic compression algorithm that has been successfully applied in image coding. The use of a multiscale adaptive pattern matching coding paradigm allows it to achieve good results, consistently, for both smooth and text images. On the contrary, the traditional transform-based methods have a well known performance deficit for non-smooth image coding.
Current state-of-the-art compound image coding schemes rely on the use of segmentation techniques to split foreground and background planes of an input image.
The performance of such methods, generally, degrades with the loss of efficiency of the segmentation process, namely for complex documents or low quality scans.
These losses result from the use of transform-based compression for the background layer, as in DjVu or JPEG2000/Part6. The flexibility of the MMP algorithm makes its efficiency independent of the segmentation process. Our experimental results show that MMP already outperforms some state-of-the-art algorithms, thus proving its usefulness as a compound image encoding algorithm.
In this paper we present the current results and the developed coding schemes, as well as an overview on the future work for this project.


Title:
HMM INVERSION WITH FULL AND DIAGONAL COVARIANCE MATRICES FOR AUDIO-TO-VISUAL CONVERSION
Author(s):
Lucas D. Terissi and Juan C. Gómez
Abstract:
A speech driven MPEG-4 compliant facial animation system is proposed in this paper. The main feature of the system is the audio-to-visual conversion based on the inversion of an Audio-Visual Hidden Markov Model. The Hidden Markov Model Inversion algorithm is derived for the general case of considering full covariance matrices for the audio-visual observations. A performance comparison with the more common case of considering diagonal covariance matrices is carried out. Experimental results show that full covariance matrices are preferable because the same performance as with diagonal matrices can be achieved using a less complex model.

Title:
LANGUAGE MODEL BASED ON POS TAGGER
Author(s):
Bartosz Ziółko, Suresh Manandhar, Richard C. Wilson and Mariusz Ziółko
Abstract:
Language models are necessary for any large vocabulary speech recogniser. There are two main types of information which can be used to support modelling a language: syntactic and semantic. One of the ways to apply syntactic modelling is to use POS taggers. Morphological information can be statistically analysed to provide probability of a sequence of words using their POS tags. The results for Polish language modelling are presented.

Title:
APPROXIMATION OF 5-LIMIT JUST INTONATION - Computer MIDI Modeling in Negative Systems of Equal Divisions of the Octave
Author(s):
Mykhaylo Khramov
Abstract:

This paper for the International Conference on Signal Processing and Multimedia Applications (SigMAp 2008) was originally written in Russian. The English translation of the original text was performed by the author. The paper is addressed to Topic Area 2 of SigMAp 2008. Its subject is music processing via the MIDI protocol during computer modeling of fixed tuning with non-traditional equal temperaments. It considers temperaments based on closed series of fifths that are compressed relative to conventional tuning. It is noted that such systems can approximate just intonation more closely and can produce a sensation of being out of tune, unattainable in the conventional temperament system, when listening to music performed from scores with mistaken use of accidentals.
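
How closely an equal division of the octave approximates 5-limit just intervals can be checked with a short calculation. The sketch below is an illustration only: the function names are hypothetical, and the reading of "negative system" as one whose fifth is narrower than the 700-cent fifth of 12 equal divisions is an assumption based on the abstract's mention of compressed fifths.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def edo_approximation(divisions, ratio):
    """Return (steps, error_in_cents) for the nearest approximation of
    `ratio` in an equal division of the octave into `divisions` steps."""
    step = 1200 / divisions
    steps = round(cents(ratio) / step)
    return steps, steps * step - cents(ratio)


def is_negative_system(divisions):
    """Assumed definition: the best fifth is narrower than 700 cents."""
    steps, _ = edo_approximation(divisions, 3 / 2)
    return steps * 1200 / divisions < 700


if __name__ == "__main__":
    for n in (12, 19, 31):
        fifth = edo_approximation(n, 3 / 2)
        third = edo_approximation(n, 5 / 4)
        print(n, is_negative_system(n),
              "fifth error %.2f" % fifth[1],
              "major third error %.2f" % third[1])
```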


Title:
EMOTION ASSESSMENT TOOL FOR HUMAN-MACHINE INTERFACES - Using EEG Data and Multimedia Stimuli Towards Emotion Classification
Author(s):
Jorge Teixeira, Vasco Vinhas, Luís Paulo Reis and Eugénio Oliveira
Abstract:
The identification and assessment of human emotional states is one of the primary objectives of scientific research in disparate areas such as artificial intelligence, medicine or psychology. The main objective of this project is the automatic assessment of a subject's basic emotional states by using electroencephalography as a source for biometric data acquisition. This evaluation is based on predefined mechanisms of emotional induction, as well as specific methods and tools capable of data analysis and processing. From the experimental results attained in several experimental sessions and through the support tools developed, the most pertinent conclusion extracted from this work refers to the capability of effectively performing automatic classification of the subject's predominant emotional state. The emotional conditions were induced through the presentation of specific visual multimedia contents. The success rate of this tool, compared against the self assessment interviews carried out immediately after the experimental session, was approximately 75%. It was also experimentally concluded that female subjects are emotionally more demonstrative than male ones.

Title:
NEW TIME-FREQUENCY VOWEL QUANTIZATION ENHANCED BY SUBBAND RANKING
Author(s):
Fraihat Salam and Glotin Hervé
Abstract:

Some specificities of the speech signal may not be well addressed by conventional speech processing. In this paper we focus on a parsimonious representation of speech dynamics. We propose a novel coding strategy based on speech time-frequency quantization (TFQ) using simple Allen temporal interval algebra applied to subband voicing levels. Our compressed speech representation contains only 15 integers for a speech window up to 32 ms long. We evaluate the discrimination power of these features for vowel recognition (1 hour, 6 vowels) on a reference radio broadcast news corpus used during the ESTER evaluation campaign. A preliminary model of speaker-independent vowel identification, on the subset of the most frequent French vowels, using the 15-integer TFQ features (CF of 6.4) gives an error reduction of 15.1% relative to the random classifier, whereas the 48-float voicing level gives 31%. We add frequency ranking features to the Allen features to improve our recognition scores. Further work to improve our parsimonious coding is then discussed.


Title:
DECORRELATION TECHNIQUES IN IMAGE RESTORATION
Author(s):
Catalina Cocianu, Luminita State, Panayiotis Vlamos and Doru Constantin
Abstract:
Image restoration methods are used to improve the appearance of an image by application of a restoration process that uses a mathematical model for image degradation. The restoration can be viewed as a process that attempts to reconstruct or recover an image that has been degraded by using some a priori knowledge about the degradation phenomenon. The advantages of using principal components stem from the fact that bands are uncorrelated and no information contained in one band can be predicted by the knowledge of the other bands, therefore the information contained by each band is maximum for the whole set of bits. The multiresolution support provides a suitable framework for noise filtering and image restoration by noise suppression. We present the algorithms GMNR, a generalization of the MNR algorithm based on the multiresolution support set for noise removal in case of arbitrary mean, and NFPCA. A comparative analysis of the performance of the algorithms GMNR and NFPCA is experimentally performed against the standard AMVR and MMSE. In the final section, we propose the Compression Shrinkage Principal Components Algorithm (CSPCA) and its model-free version as shrinkage-PCA based methods for noise removal and image restoration.

Title:
PREDICTING BLOCKING EFFECTS IN THE SPATIAL DOMAIN USING A LEARNING APPROACH
Author(s):
Aladine Chetouani, Ghiles Mostafaoui and Azeddine Beghdadi
Abstract:
A new method for predicting blocking effects in the spatial domain is proposed. This method aims at estimating the appearance of blocking artefacts in the original image prior to compression for a given bit rate and a given compression technique. The basic idea is to use a training process in order to compute a visibility measure. A weighting function of the blocking effects is then derived from this training process performed on a database. The proposed method is objectively and subjectively evaluated on various real images. The obtained results confirm the efficiency of the proposed method in predicting blocking effects.

Title:
GLOTTAL SOURCE ESTIMATION ROBUSTNESS - A Comparison of Sensitivity of Voice Source Estimation Techniques
Author(s):
Thomas Drugman, Thomas Dubuisson, Alexis Moinet, Nicolas D’Alessandro and Thierry Dutoit
Abstract:
This paper addresses the problem of estimating the voice source directly from speech waveforms. A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase. This technique is compared to two other well-known state-of-the-art methods, namely the Zeros of the Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF) algorithms. Decomposition quality is assessed on synthetic signals through two objective measures: the spectral distortion and a glottal formant determination rate. Technique robustness is tested by analyzing the influence of noise and Glottal Closure Instant (GCI) location errors. In addition, the impacts of the fundamental frequency and the first formant on performance are evaluated. Our proposed approach shows significant improvement in robustness, which could be of great interest when decomposing real speech.

Title:
MODELING OF REAL TIME VIDEO COMPRESSION SYSTEM - Three-dimensional Discrete Cosine Transform
Author(s):
Tomas Fryza
Abstract:
One of the methods used for video signal compression is the Three-Dimensional Discrete Cosine Transform (3-D DCT). The aim of this block-based method is to combine intraframe and interframe coding into a single transform coding, so that neither motion compensation nor motion prediction has to be implemented. The paper deals with practical ways of computing the 3-D DCT. It is shown that the transform coding can be used for encoding video sequences in real time.
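As an illustration of the transform named in this abstract, the following is a minimal sketch of a separable 3-D DCT over a small block of frames. It is not the author's implementation: the function names `dct_1d` and `dct_3d`, the orthonormal DCT-II scaling, and the `block[t][y][x]` layout are assumptions chosen for clarity, and the direct O(n^2) evaluation is far from what a real-time system would use.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_3d(block):
    """Separable 3-D DCT of block[t][y][x]: 1-D DCT along x, then y, then t."""
    T, H, W = len(block), len(block[0]), len(block[0][0])

    # Pass 1: transform every row (x axis) of every frame.
    a = [[dct_1d(block[t][y]) for y in range(H)] for t in range(T)]

    # Pass 2: transform every column (y axis) of every frame.
    b = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for x in range(W):
            col = dct_1d([a[t][y][x] for y in range(H)])
            for y in range(H):
                b[t][y][x] = col[y]

    # Pass 3: transform along time (t axis), which covers the interframe part.
    c = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for y in range(H):
        for x in range(W):
            tube = dct_1d([b[t][y][x] for t in range(T)])
            for t in range(T):
                c[t][y][x] = tube[t]
    return c
```

With this scaling, a constant 4 x 4 x 4 block concentrates all of its energy in the single coefficient `c[0][0][0]`, which is the behaviour that makes a joint spatial and temporal transform attractive for static or slowly changing video content.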

Title:
H.264/SVC ROI ENCODING WITH SPATIAL SCALABILITY
Author(s):
Lino Ferreira, Luís Cruz and Pedro Assunção
Abstract:
This paper proposes two H.264/AVC compliant methods for encoding Regions-of-Interest (ROI) with spatial scalability and evaluates their respective rate-distortion-complexity performance. The base layer is kept unchanged and provides lower resolution images with roughly constant quality, without identification of the ROI. In the proposed methods there is no need to encode contour information because the ROI is implicitly defined in the upper layer of the spatial resolution in a transparent way by using different encoding parameters for the ROI and its complementary region. It is shown that spatial scalability in ROI can be efficiently used to enhance specific regions of an image sequence in both spatial resolution and quality with low coding complexity. The proposed encoding scheme is suitable for remote surveillance, medical applications and entertainment, where a higher resolution and higher quality ROI is a useful functionality for object/face recognition, selective encryption, detail analysis, etc.

Title:
BIOMETRIC ACREDITATION ENTITIES - An Approach for Web Acreditation Services
Author(s):
B. Ruiz, L. Puente, D. Carrero and M. J. Poza
Abstract:
Identity verification is nowadays a crucial task for security applications. In the near future, organizations dedicated to storing individual biometric information will emerge in order to determine individual identity. Biometric authentication is currently information intensive. The volume and diversity of new data sources challenge current database technologies. Biometric identity heterogeneity arises when different data sources interoperate. New promising application fields such as the Semantic Web and Semantic Web Services can leverage the potential of biometric identity, even though heterogeneity continues to rise. Semantic Web Services provide a platform to integrate the lattice of biometric identity data widely distributed both across the Internet and within individual organizations. In this paper, we present a framework for solving biometric identity heterogeneity based on Semantic Web Services. We use a multimodal fusion recognition scenario as a test-bed for evaluation.

Title:
ACCURATE AUTOMATIC SPOT ADDRESSING FOR MICROARRAY IMAGES
Author(s):
Mónica G. Larese and Juan C. Gómez
Abstract:
In this paper, a novel procedure based on texture spatial characterization techniques is proposed for automatically addressing spots in microarray images. The algorithm relies on the regular and pseudo-periodic patterns of spots, which can be considered as texture primitives. A fully automatic procedure is proposed to segment the autocorrelation functions of subgrid images and accurately determine the locations of the peaks. These candidate peaks, i.e., vectors, are next used to compute the displacement vectors that fully characterize the spatial arrangement of spots, describing the spot spacing and angle of rotation of the pattern. A refinement procedure is then applied to improve the accuracy of the norms and angles of the displacement vectors. An ideal template is generated using the computed spanning vectors, which is superimposed over the real grid. This template is deformed and adjusted via Markov Random Fields (MRF) modelling. Experiments based on artificial and real images are promising, showing improvements regarding robustness against image rotations, and accuracy, over results provided by state-of-the-art methods.

Title:
ROTATION INVARIANT FEATURE EXTRACTION FOR WATERMARKING
Author(s):
M. Scagliola and P. Guccione
Abstract:

Many watermarks for still images are robust against common signal processing techniques, mainly JPEG compression, noise addition and low-pass filtering, while they are sensitive to geometrical manipulations, which induce desynchronization errors.
In this paper, robustness against global geometric transformations is achieved using the proposed feature extraction method, based on the Radon transform, whose aim is to identify a unique (and robust) feature from the image spectrum. The embedding, which exploits the extracted feature, is based on a multiplicative rule technique and is applied to a suitable subset of the image Fourier transform. The properties of the extracted feature allow the detector and the embedded watermark to be resynchronized even if the image undergoes geometric manipulations (in particular rotations) as well as other processing, so that correct watermark retrieval is guaranteed.
Experimental results, conducted on many standard images, confirm the effectiveness of the feature extraction method and the robustness of the watermark against both processing and geometric transformations.


Title:
A PROTOTYPE FOR PRACTICAL EYE-GAZE CORRECTED VIDEO CHAT ON GRAPHICS HARDWARE
Author(s):
Maarten Dumont, Steven Maesen, Sammy Rogmans and Philippe Bekaert
Abstract:
We present a fully functioning prototype to convincingly restore eye contact between two video chat participants, with a minimal number of constraints. The proposed six-fold camera setup is easily integrated into the monitor frame, and is used to interpolate an image as if its virtual camera captured the image through a transparent screen. The peer user has a large freedom of movement, resulting in system specifications that enable genuine practical usage. Our software framework harnesses the powerful computational resources inside graphics hardware to achieve real-time performance of up to 30 frames per second for 800 x 600 resolution images. Furthermore, an optimal set of fine-tuned parameters is presented that optimizes the end-to-end performance of the application while still achieving high subjective visual quality.

Title:
ESTIMATING H.264/AVC VIDEO PSNR WITHOUT REFERENCE - Using the Artificial Neural Network Approach
Author(s):
Martin Slanina and Václav Rícný
Abstract:

This paper presents a method capable of estimating peak signal-to-noise ratios (PSNR) of digital video sequences compressed using the H.264/AVC algorithm. The idea is to replace a full reference metric - the PSNR (whose evaluation requires the original as well as the processed video data) - with a no reference metric operating on the encoded bit stream only. As we are working just with the encoded bit stream, we can save a significant amount of the computation needed to decode the video pixel values. In this paper, we describe the network inputs and network configurations suitable for estimating PSNR in intra and inter predicted pictures. Finally, we make a simple evaluation of the proposed algorithm, using the correlation coefficient of the real and estimated PSNRs as the measure of optimality.
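For context, the full reference metric that the abstract aims to estimate can be sketched as follows. This shows only the standard PSNR definition for 8-bit samples, not the neural network estimator described in the paper; the function name `psnr` and the flat-list input format are assumptions made for illustration.

```python
import math


def psnr(reference, processed, peak=255.0):
    """Full-reference PSNR in dB between two equally long sample sequences.

    `peak` is the maximum possible sample value (255 for 8-bit video).
    """
    if len(reference) != len(processed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between original and processed samples.
    mse = sum((r - p) ** 2 for r, p in zip(reference, processed)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)
```

Because this computation needs both the original and the decoded pixels, it cannot be run at a receiver that only has the bit stream, which is the gap a no reference estimator is meant to fill.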


Title:
METHOD OF INTER-WORKING BETWEEN IMS AND NON-IMS (GOOGLE TALK) NETWORKS FOR MULTIMEDIA SERVICES
Author(s):
Zhongwen Zhu and Richard Brunner
Abstract:
With the evolution of third-generation networks, more and more multimedia services are being developed and deployed. Any new service to be deployed in an IMS network is required to inter-work with existing Internet communities or legacy terminal users in order to appeal to end users, who are the main factor in the success of the service. The challenge for inter-working between IMS and non-IMS networks is “how to handle the recipient’s address”. This is because each network has its own routable address schema. For instance, the address for a Google user is xmpp:xyz@google.com, which is un-routable in an IMS network. In this paper, a new Inter-working (IW) solution between IMS and non-IMS networks is proposed for multimedia services such as Instant Messaging, Chat, and File Transfer. It is an end-to-end solution built on the IMS infrastructure. The Public Service Identity defined in 3GPP is used to allow terminal clients to allocate this IW service. When sending the SIP request out for multimedia services, the terminal includes the recipient’s address in the payload instead of the “Request-URI” header. In the network, the proposed solution provides the mapping rules among different networks. The detailed technical description and the corresponding use cases are presented. The benefits of the proposed solution are discussed.

Title:
REVERBERATION ASSESSMENT IN AUDIOBAND SPEECH SIGNALS FOR TELEPRESENCE SYSTEMS
Author(s):
A. A. de Lima, F. P. Freeland, P. A. A. Esquef, L. W. P. Biscainho, B. C. Bispo, R. A. de Jesus, S. L. Netto, R. Schafer, A. Said, B. Lee and A. Kalker
Abstract:

Modern telepresence systems constitute a new challenge for quality assessment of multimedia signals. This paper focuses on the evaluation of the reverberation impairment for audioband speech signals. A review of the reverberation effect is presented, with emphasis on the mathematical modeling of its components, including early reflections and late reverberation. A subjective test for evaluating the human perception of the reverberation phenomenon is completely described, from its conception to the final results. Analyses are provided comparing the average subjective grades to current quality-evaluation standards for speech and audio signals. It is observed that the PESQ and PEAQ objective algorithms constitute interesting starting points for developing an objective method for measuring the reverberation effect on speech signals.


Title:
STAFF LINE DETECTION AND REMOVAL WITH STABLE PATHS
Author(s):
Artur Capela, Ana Rebelo, Jaime S. Cardoso and Carlos Guedes
Abstract:
Many music works produced in the past are still currently available only as original manuscripts or as photocopies. Preserving them entails their digitization and consequent accessibility in a machine-readable format, which encourages browsing, retrieval, search and analysis while providing generalized access to the digital material. Carrying out this task manually is very time consuming and error prone. While optical music recognition (OMR) systems usually perform well on printed scores, the processing of handwritten music by computers remains below expectations. One of the fundamental stages in carrying out this task is the detection and subsequent removal of staff lines. In this paper we integrate a general-purpose, knowledge-free method for the automatic detection of staff lines based on stable paths into a recently developed staff line removal toolkit. Lines affected by curvature, discontinuities, and inclination are robustly detected. We have also developed a staff removal algorithm by adapting an existing line removal approach to use the stable path algorithm at the detection stage. Experimental results show that the proposed technique outperforms well-established algorithms. The developed algorithm will now be integrated in a web-based system providing seamless access to browsing, retrieval, search and analysis of inserted scores.

Title:
A RANDOM CONSTRAINED MOVIE VERSUS A RANDOM UNCONSTRAINED MOVIE APPLIED TO THE FUNCTIONAL VERIFICATION OF AN MPEG-4 DECODER DESIGN
Author(s):
George S. Silveira, Karina R. G. da Silva and Elmar U. K. Melcher
Abstract:
The advent of new VLSI technology and SoC design methodologies has brought about explosive growth in the complexity of modern electronic circuits. One big problem in hardware design verification is finding good stimuli for functional verification. An MPEG-4 decoder design requires movies for its functional verification. A real movie applied alone is not enough to test all functionalities, so a random movie is used as stimulus to perform functional verification and reach coverage. This paper presents a comparison between a random constrained movie generator called RandMovie and the use of an unconstrained random movie. It shows the benefits of using a random constrained movie in order to reach the specified functional coverage. With such a movie generator one is capable of generating good random constrained movies, increasing coverage and simulating all specified functionalities. A case study for an MPEG-4 decoder design has been used to demonstrate the effectiveness of this approach.

Title:
DEVELOPMENT OF AN EYE GAZE INTERFACE SYSTEM AND IMPROVEMENT OF CURSOR CONTROL FUNCTION
Author(s):
Tetsuya Yonezawa, Kohichi Ogata, Masashi Nishimura and Kohei Matsumoto
Abstract:
This paper introduces an eye gaze interface system for controlling a mouse cursor on the computer display. The system consists of a small video camera to capture an eye image and a computer to detect the eye gaze from the image and to calculate the position of the cursor to be displayed depending on the detected eye gaze. To develop an easy-to-use system for practical use, involuntary and voluntary eye blinks must be taken into account. Improving the stability of eye gaze-controlled cursor movement is also important. In this paper, smooth cursor control using a moving average filter and detection of involuntary and voluntary eye blinks are described. The experiments show the usefulness of the proposed methods for quick and stable mouse cursor control. In the cursor pointing accuracy experiment, distances between the target and the cursor point are about 30 pixels in the horizontal direction and 20 pixels in the vertical direction.
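The smoothing step mentioned in this abstract can be illustrated with a minimal moving average filter over recent gaze positions. The class name `CursorSmoother` and the default window of 5 samples are assumptions for this sketch; the abstract does not state the window length or any other filter details used by the authors.

```python
from collections import deque


class CursorSmoother:
    """Moving average over the last `window` gaze-derived cursor positions."""

    def __init__(self, window=5):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen silently drops the oldest point when full.
        self._points = deque(maxlen=window)

    def update(self, x, y):
        """Add a raw cursor position and return the smoothed (x, y)."""
        self._points.append((x, y))
        n = len(self._points)
        mean_x = sum(p[0] for p in self._points) / n
        mean_y = sum(p[1] for p in self._points) / n
        return mean_x, mean_y
```

A longer window gives a steadier cursor at the cost of more lag behind the actual gaze point, so the window length is a trade-off between stability and responsiveness.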

Title:
A NEW VIDEO QUALITY PREDICTOR BASED ON DECODER PARAMETER EXTRACTION
Author(s):
Andreas Rossholm and Benny Lövström
Abstract:
In the mobile communication area there is a demand for reference-free perceptual quality measurements in video applications. In addition, low-complexity measurements are required. This paper proposes a method for predicting a number of well-known quality metrics, where the inputs to the predictors are parameters readily available at the decoder side of the communications channel. After an investigation of the dependencies between these parameters and between each parameter and the quality metrics, a set of parameters is chosen for the predictor. This predictor shows good results, especially for the PSNR and the PEVQ metrics.

Title:
A SMART SURVEILLANCE SYSTEM FOR HUMAN FALL-DOWN DETECTION USING DUAL HETEROGENEOUS CAMERAS
Author(s):
Shaou-Gang Miaou, Cheng-Yu Chien, Fu-Chiau Shih and Chia-Yuan Huang
Abstract:
We propose a new surveillance system that uses both omni-directional (OD) and Pan/Tilt/Zoom (PTZ) cameras with heterogeneous characteristics and a relatively simple image processing algorithm to achieve the goal of real-time surveillance. The system is demonstrated for detecting human fall-down events. An OD camera has a 360° viewing angle. It is used here to replace multiple traditional cameras with limited viewing angles in order to reduce the system cost. A PTZ camera is also used in the system to track the target of interest and verify the occurrence of the event. Various unique features obtained from OD images are used for fall-down detection and a multi-classifier approach is used for better recognition performance. Experimental results show that the system is quite robust to sudden changes of walking paths and different directions of falling. During the tracking process, a moving target is captured and its representative coordinates are obtained based on the processing of continuous OD images. The coordinates of the target in the OD camera space are converted to the corresponding three-dimensional (3D) coordinates in real-world space. This derived information serves as guidance for the automatic control of the PTZ camera to track the moving target as closely as it can. By combining the advantages of two heterogeneous types of cameras, our experimental results show that the proposed system can track the moving target well without the need for a complicated method, showing the feasibility and potential of the system.

Title:
ANONYMOUS BUYER-SELLER WATERMARKING PROTOCOL WITH ADDITIVE HOMOMORPHISM
Author(s):
Mina Deng, Li Weng and Bart Preneel
Abstract:
Buyer-seller watermarking protocols integrate multimedia watermarking and fingerprinting with cryptography, for copyright protection, piracy tracing, and privacy protection. We propose an efficient buyer-seller watermarking protocol based on dynamic group signatures and additive homomorphism, to provide all the required security properties, namely traceability, anonymity, unlinkability, dispute resolution, non-framing, and non-repudiation. Another distinct feature is the improvement of the protocol's utility: the double watermark insertion mechanism is avoided, the final quality of the distributed content is improved, and the communication expansion ratio and computation complexity are reduced, compared with conventional schemes.

Title:
ADAPTIVE REAL-TIME WATERMARKING USING BLOCK CLASSIFICATION FOR H.264 COMPRESSED DOMAIN
Author(s):
Yin Zhang, Zengxiang Lu and Haiming Lu
Abstract:
To address the problem that a watermark can cause visible image distortions in some plain areas, an adaptive watermarking algorithm for H.264 is proposed. To embed the watermark quickly, we operate directly in the DCT domain. For high imperceptibility, we classify the blocks based on the Human Visual System (HVS). However, most current classification methods are not suitable for the H.264 compressed domain because the residual DCT coefficients cannot reflect the texture activity accurately due to the use of intra-prediction. A new block classification method is applied, in which we impose restrictions during the encoding process so that the 4×4 blocks can be classified into plain or non-plain blocks according to the intra-prediction mode and quantized integer DCT coefficients in the compressed domain. It is effective in block classification and achieves good adaptive performance for watermarking. Drift compensation is also accomplished in our watermarking algorithm. The experimental results demonstrate that our watermarking method can achieve both large capacity and good image imperceptibility. Additionally, the method is simple and appropriate for real-time applications.

Title:
BUILDING MODULAR SURVEILLANCE SYSTEMS BASED ON MULTIPLE SOURCES OF INFORMATION - Architecture and Requirements
Author(s):
Daniel Durães, Luís F. Teixeira and Luís Corte-Real
Abstract:
Intelligent surveillance is becoming increasingly important for the enhanced protection of facilities such as airports and power stations from various types of threats. We propose a surveillance system architecture based on multiple sources of information to apply on large scale surveillance networks. The main contribution of this paper is the definition of the requirements for a flexible and scalable architecture that supports intelligent surveillance using, alongside video, different sources of information, such as audio or other sensors.

Title:
AUTOMATIC SYSTEM FOR THE RECOGNITION OF AMOUNTS IN HANDWRITTEN CHEQUES
Author(s):
Filipe Coelho, Luis Batista, Luis F. Teixeira and Jaime S. Cardoso
Abstract:
Until the rise of electronic means for direct debit, bank cheques were used as the best form of payment, balancing security and ease of use. Their acceptance and generalized use are the result of international agreements that define rules for filling them in and using them. The fast processing of payments and transactions through safer electronic methods has reduced their usage over the last years. But despite this progressive reduction, bank cheques still are and will continue to be used; therefore, there is a need to optimize processing mechanisms. Although several automatic cheque processing systems exist, they are proprietary and not adapted to the Portuguese language, which is crucial for cheque analysis and recognition. A prototype of an automatic system for the recognition of amounts in Portuguese bank cheques has been implemented and is being used as a test platform for improved intelligent character recognition algorithms.

Title:
2D HAND GESTURE RECOGNITION METHODS FOR INTERACTIVE BOARD GAME APPLICATIONS
Author(s):
Athanasios Kalpakas, Konstantinos Stampoulis, Nikolaos Zikos and Stefanos Zaharos
Abstract:
The purpose of the current project is to demonstrate a complete interactive application capable of recognizing 2D hand gestures in order to interact with computer-based board games without the use of special input devices, such as a pointer, mouse or keyboard. A web camera is placed at the top of the platform; it captures the player’s hand gestures in real time and then recognizes the position of his/her fingertip on the board. The user is able to choose a piece, select a destination spot and move a piece simply by placing and moving his/her index finger on the board. To test the effectiveness of the system, an interactive, compact platform was developed, comprising a light-wood construction, a printed chess board and a conventional webcam. The suggested interactive system is fully compatible with the latest software technologies and uses a custom GUI, a real-time 2D hand gesture recognizer and earcons.

Title:
ON THE NEED FOR INCENTIVES TO SUPPORT PERSONALIZATION SYSTEMS - Turning Users into Active Providers of Contents and Metadata
Author(s):
Martín López-Nores, José J. Pazos-Arias, Jorge García-Duque, Yolanda Blanco-Fernández, Alberto Gil-Solla and Manuel Ramos-Cabrer
Abstract:
Research in personalization systems has made enormous progress in the last few years. However, the phenomenon of information overload is taking the state of the art to a dead end, due to the lack of metadata to describe the growing number of available contents. In this position paper, we take a look at the problem and suggest a research roadmap to find a way out, working on the idea of providing incentives to the end users to become active providers of contents and metadata.

Title:
A NEW ARCHITECTURE FOR A MULTIPLATFORM AUGMENTED REALITY SYSTEM
Author(s):
Andriamasinoro Rahajaniaina and Jean-Pierre Jessel
Abstract:
In this paper we describe a new architecture for a multiplatform augmented reality (AR) system, which works in a dynamic workspace environment with 3D virtual models. The users can interact with the virtual model using a mouse, keyboard and stylus as interaction tools. The work plan is formed by ARToolkitPlus’ fiducial multimarker. For adding virtual objects, we propose a virtual menu that is inspired by the metaphor of forward and next buttons. The work plan is augmented by the virtual workspace. The user can choose his/her virtual workspace dynamically using this virtual menu and can choose a virtual object suited to the current workspace.

Title:
ON NLMS ESTIMATION FOR VOIP PLAYOUT DELAY ALGORITHMS - Improving Delay Spike Detection
Author(s):
Karen S. Miranda-Campos and Víctor M. Ramos R.
Abstract:

Voice over IP (VoIP) applications are now very popular and widely used on the Internet. Such applications use receiver playout buffers to smooth delay variations so as to reconstruct the periodic form of the transmitted packets. Packets arriving after their scheduled playout time are considered late and are not played out. Playout delay control algorithms often operate by updating the playout delay between periods of silence. A recent class of playout control algorithms has received particular attention; this class of algorithms uses autoregressive measures on the network delay so as to estimate future packet delay values and adjust the playout delay accordingly. In this work, we compare two previously proposed algorithms that use such an autoregressive approach; both playout algorithms use a normalized least-mean-square (NLMS) adaptive predictor. The difference between the two algorithms is that the second one is an extension of the first that adds delay spike detection. We demonstrate, by using Internet audio packet traces, that, contrary to what was claimed, the algorithm that uses spike detection does not outperform the first one. Finally, we propose an algorithm based on the original NLMS algorithm with delay spike detection that outperforms the previous two NLMS playout algorithms.
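The NLMS adaptive predictor that both compared algorithms rely on can be sketched in a few lines. This is a generic textbook NLMS one-step predictor under stated assumptions, not the specific algorithms evaluated in the paper: the function name `nlms_predict`, the filter order of 4, the step size `mu`, and the regularizer `eps` are all illustrative choices, and spike detection and the mapping from predicted delay to playout delay are left out.

```python
def nlms_predict(delays, order=4, mu=0.5, eps=1e-6):
    """One-step-ahead NLMS prediction of network delay.

    predictions[i] is the estimate of delays[i] computed from the
    `order` samples that preceded it (zeros before the trace starts).
    """
    weights = [0.0] * order
    history = [0.0] * order  # most recent delay first
    predictions = []
    for d in delays:
        # Predict the incoming delay from past delays.
        y = sum(w * h for w, h in zip(weights, history))
        predictions.append(y)
        # Normalized LMS update: the step is scaled by the input energy.
        error = d - y
        energy = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / energy
                   for w, h in zip(weights, history)]
        # Shift the newly observed delay into the history.
        history = [d] + history[:-1]
    return predictions
```

On a trace with constant delay the prediction error shrinks geometrically towards zero; on real traces a sudden delay spike produces a large error for several packets, which is the situation that spike detection is intended to handle.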


Title:
SECURE AND ROBUST COPYRIGHT PROTECTION FOR H.264/AVC BASED ON SELECTED BLOCKS DCT
Author(s):
K. Ait Saadi, A. Bouridane and H. Meraoubi
Abstract:
This paper proposes a new block-based DCT selection method and a robust video watermarking algorithm to hide copyright information in the compressed domain of the emerging video coding standard H.264/AVC. To achieve robustness, the watermark is first quantized and inserted securely in selected high-entropy DCT blocks using a Linear Congruential Generator (LCG) technique. This approach leads to good robustness while maintaining good visual quality of the watermarked sequences. The experimental results demonstrate the effectiveness of the algorithm against some attacks such as re-compression, transcoding and scaling.
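The abstract does not say how the LCG is applied, so the following is only a sketch of one plausible use: a keyed linear congruential generator that picks embedding positions among candidate blocks. The function names `lcg` and `pick_blocks` are hypothetical, the multiplier and increment are the widely published Numerical Recipes constants rather than values from the paper, and an LCG on its own is not cryptographically secure.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: state <- (a * state + c) mod m."""
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state


def pick_blocks(candidates, count, key):
    """Select `count` distinct candidates in an order determined by `key`.

    Hypothetical helper: `candidates` stands for the indices of blocks
    already judged suitable (for example, high-entropy DCT blocks).
    """
    pool = list(candidates)
    if count > len(pool):
        raise ValueError("not enough candidate blocks")
    gen = lcg(key)
    chosen = []
    for _ in range(count):
        # Draw a position in the shrinking pool and remove it,
        # so no block is chosen twice.
        j = next(gen) % len(pool)
        chosen.append(pool.pop(j))
    return chosen
```

Since the sequence is fully determined by the seed, a detector that knows the same key can regenerate the same block positions without any side information being transmitted.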

Title:
CAPTURING THE HUMAN ACTION SEMANTICS USING A QUERY-BY-EXAMPLE
Author(s):
Anna Montesanto, Paola Baldassarri, A. F. Dragoni, G. Vallesi and P. Puliti
Abstract:
The paper describes a method for extracting human action semantics in videos using queries-by-example. Here we consider the indexing and matching problems of content-based human motion data retrieval. The query formulation is based on trajectories that may be easily built or extracted, even by a novice user, by following relevant points on a video. The resulting trajectories carry a high level of action semantics. The semantic schema is built by splitting a trajectory into time-ordered sub-sequences that contain the features of the extracted points. This kind of semantic representation reduces the dimensionality of the search space and, being human-oriented, allows selective recognition of actions that are very similar to one another. A neural network system analyzes the video semantic similarity, using a two-layer architecture of multilayer perceptrons, which is able to learn the semantic schema of the actions and to recognize them.

Title:
ARTIFICIAL NEURAL NETWORKS BASED SYMBOLIC GESTURE INTERFACE
Author(s):
C. Iacopino, Anna Montesanto, Paola Baldassarri, A. F. Dragoni and P. Puliti
Abstract:
The purpose of the developed system is the realization of a gesture recognizer applied to a user interface. We aimed for software that is fast and easy to use, without sacrificing reliability, and that relies only on equipment available to the common user: a PC and a webcam. The gesture detection is based on well-known computer vision techniques, such as the tracking algorithm by Lucas and Kanade. The paths, suitably selected, are recognized by a two-layer architecture of multilayer perceptrons. The resulting system is efficient and robust, and pays attention to adequate learning of the gesture vocabulary by both the user and the system.

Title:
A COMPARATIVE USABILITY EVALUATION OF TWO AUGMENTED REALITY LEARNING SCENARIOS
Author(s):
Alexandru Balog and Costin Pribeanu
Abstract:
The proliferation of Augmented Reality (AR) systems is challenging developers to design novel interaction techniques which are mainly driven by the possibilities to manipulate specific real objects. These interaction components have to be tested with users as early as possible in the development cycle in order to avoid usability problems. While formative usability evaluation is performed with the purpose of improving the design, summative evaluation is a useful aid to compare alternative solutions or similar systems. This paper reports on a comparative analysis of the usability evaluation results for two AR-based learning scenarios. The purpose of the evaluation was twofold: (a) getting early feedback from users on the first version of the software, and (b) comparing the usability of two learning scenarios developed on the same AR platform. For this purpose, a usability questionnaire has been developed that is based on a technology acceptance model. The comparison has been performed on both quantitative and qualitative measures collected during a summer school.

Title:
MULTI-ATTRIBUTE DECISION MAKING FOR AFFECTIVE BI-BIMODAL INTERACTION IN MOBILE DEVICES
Author(s):
Efthymios Alepis, Maria Virvou and Katerina Kabassi
Abstract:

This paper presents how multi-attribute decision making is used for affective interaction in mobile devices. The system bases its inferences about users’ emotions on user input evidence from the keyboard and the microphone of the mobile device. The actual combination of evidence from these two modes of interaction has been performed based on an innovative inference mechanism for emotions and a multi-attribute decision making theory. The mechanism that integrates the inferences from the two modes has been based on the results of two empirical studies, with the participation of human experts and possible users of the system.


Title:
A CONFIGURABLE LINUX FILE SYSTEM FOR MULTIMEDIA DATA
Author(s):
Nicola Corriero, Vittoria Cozza, Eustrat Zhupa and Vito De Tullio
Abstract:
In MusicMeshFS the tree structure of the virtual Linux filesystem, extended and made configurable by the MusicMeshFS language, is adapted for storing and efficiently retrieving multimedia data. When MusicMeshFS is installed inside an embedded system equipped with a Wi-Fi card, multimedia data sharing over an ad hoc mesh network can be achieved for free.

Title:
PERFORMANCE CONSIDERATIONS ON ADMISSION CONTROL FOR MULTIMEDIA SERVICES
Author(s):
Brikena Statovci-Halimi and Harmen R. van As
Abstract:
New service management paradigms, protocols and control mechanisms are required for supporting today’s Internet service heterogeneity and integration while providing consistent quality of service (QoS). Within service management, admission control has been recognized as a convenient mechanism to provide high-quality communication by ensuring resource availability. It represents a means of fulfilling the contracted service level agreement (SLA) between the user and the network provider. This paper provides some considerations on the main issues and characteristics of admission control mechanisms. It further gives a more detailed analysis of measurement-based admission control, provides a performance comparison of different approaches, and introduces a new admission control approach based on measurements. The performance of this algorithm is illustrated through simulation results.

Title:
HUMAN SKIN COLOR DETECTION AND APPLICATION TO ADULT IMAGE DETECTION
Author(s):
Ryszard S. Choras
Abstract:

In this paper, we address the detection of adult images. The detection methods mainly focus on the detection and identification of skin regions, since skin detection is of paramount importance in the detection of adult images. Our algorithm is designed to detect human skin color in the $YC_bC_r$ color space. The proposed system finds skin regions and then generates the skin likelihood image. Since the skin likelihood image contains shape information as well as skin color information, we use it to classify adult images.
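
As an illustration only, a skin likelihood image in the $YC_bC_r$ color space could be sketched as follows. The abstract does not give the model used by the author, so this sketch assumes a simple Gaussian over the chrominance components; the function names and the mean and spread values (SKIN_MEAN, SKIN_STD) are hypothetical placeholders, not parameters from the paper. The RGB conversion uses the full-range ITU-R BT.601 coefficients.

```python
import math

# Hypothetical chrominance model: placeholder values, not taken from the paper.
SKIN_MEAN = (110.0, 150.0)  # assumed mean (Cb, Cr) of skin pixels
SKIN_STD = (10.0, 10.0)     # assumed standard deviation of (Cb, Cr)


def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (ITU-R BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def skin_likelihood(r, g, b):
    """Return a value in (0, 1]: higher means chrominance closer to the model."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    d_cb = (cb - SKIN_MEAN[0]) / SKIN_STD[0]
    d_cr = (cr - SKIN_MEAN[1]) / SKIN_STD[1]
    # Luminance Y is ignored, so the score depends on chrominance only.
    return math.exp(-0.5 * (d_cb * d_cb + d_cr * d_cr))


def skin_likelihood_image(image):
    """Map an image (rows of (r, g, b) tuples) to rows of likelihood values."""
    return [[skin_likelihood(r, g, b) for (r, g, b) in row] for row in image]
```

Thresholding the resulting likelihood image would give candidate skin regions, whose shape and area could then feed a classifier; that last step is left out here because the abstract does not describe it.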


Title:
SOFTWARE LIFE-CYCLE FOR AN ADAPTIVE GEOGRAPHICAL INFORMATION SYSTEM
Author(s):
Katerina Kabassi, Maria Virvou, Eleni Charou and Aristotelis Martinis
Abstract:
In this paper we present the software life-cycle for the development of an adaptive Geographical Information System (GIS). More specifically, we focus on the experimental studies for designing and implementing a decision making theory in the GIS for adapting the data presented to each user. The data set used by the GIS is heterogeneous and contains raster data, vector data and multimedia. To evaluate the available data, the system uses a multi-criteria decision making model and selects the items that seem most appropriate for a particular user. In this way, the GIS is able to adapt its interaction to each user and make the interaction more user friendly.
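
The abstract does not state which multi-criteria model is used. As a rough sketch of how such a selection step might work, the example below assumes Simple Additive Weighting (a weighted sum of normalized criterion scores); the criterion names, weights, and data items are hypothetical and serve only to illustrate the idea.

```python
def rank_items(items, weights):
    """Rank data items by a weighted sum of their criterion scores.

    items:   dict mapping item name -> dict of criterion -> score in [0, 1]
    weights: dict mapping criterion -> weight (assumed to sum to 1)
    Returns a list of (name, total) pairs, best first.
    """
    totals = {
        name: sum(weights[c] * scores[c] for c in weights)
        for name, scores in items.items()
    }
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical user model: this user values relevance over familiarity.
weights = {"relevance": 0.6, "familiarity": 0.4}

# Hypothetical GIS data items with scores already normalized to [0, 1].
items = {
    "raster_map": {"relevance": 0.9, "familiarity": 0.2},
    "video": {"relevance": 0.4, "familiarity": 0.9},
    "vector_layer": {"relevance": 0.5, "familiarity": 0.5},
}

ranking = rank_items(items, weights)
```

Under these assumed weights the raster map ranks first; a user model with different weights would reorder the items, which is the sense in which the presentation adapts to each user.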

Title:
SUBJECTIVE VERIFICATION OF PERCEPTUAL METRICS FOR IMAGE WATERMARKING FIDELITY
Author(s):
Franco Del Colle and Juan Carlos Gómez
Abstract:

In this paper, the performance of several state-of-the-art watermark perceptual transparency metrics is evaluated through subjective assessment. Simulation results show that a metric based on S-CIELAB perceptual distortion maps is better correlated with the subjective tests than other objective metrics available in the literature. The paper focuses on Image Adaptive Watermarking methods in the Discrete Wavelet Transform Domain, since they yield better results regarding robustness and transparency than other watermarking schemes.