SIGMAP 2006 Abstracts
CONFERENCE
Area 1 - Multimedia Communications
Area 2 - Multimedia Signal Processing
Area 3 - Multimedia Systems and Applications
Title:
DESIGN AND IMPLEMENTATION OF VIDEO ON DEMAND SERVICES OVER A PEER-TO-PEER MULTIOVERLAY NETWORK
Author(s):
Jia-Ming Chen, Jenq-Shiou Leu, Hsin-Wen Wei, Li-Ping Tung, Yen-Ting Chou and Wei-Kuan Shih
Abstract:
Video-on-Demand (VoD) services using peer-to-peer (P2P) technologies benefit from balancing load among clients and maximizing their bandwidth utilization, reducing the burden on central video servers, which constitute a single point of failure. Conventional P2P techniques for realizing VoD streaming services only consider data exchange between active peers in the same VoD session. They never consider inactive peers that have left the session but may still hold partial media content in their local storage. In this article, we propose a novel architecture for constructing a fully decentralized P2P overlay network for VoD streaming services based on a multioverlay concept. The architecture, referred to as MegaDrop, not only takes the types of peers into consideration but also provides mechanisms for discovering nodes that may hold the desired media objects. Such a P2P-based scheme can distribute media among peers, allow peers to search for a specific media object over the entire network efficiently, and stream the media object from a group of peers. We employ a layered architecture consisting of four major tiers: Peer Discovery Layer, Content Lookup Layer, Media Streaming Layer, and Playback Control Layer. The evaluation results show that our architecture is particularly efficient for huge media delivery and multiuser streaming sessions.

Title:
FAST CONVERSION OF H.264/AVC INTEGER TRANSFORM COEFFICIENTS INTO DCT COEFFICIENTS
Author(s):
R. Marques, V. Silva, S. Faria, A. Navarro and P. Assuncão
Abstract:
In this paper we propose a fast method to convert H.264/AVC Integer Transform (IT) coefficients to Discrete Cosine Transform (DCT) coefficients for applications in video transcoding. We derive the transform kernel matrix for converting, in the transform domain, four IT coefficient blocks into one DCT block of coefficients. By exploiting the symmetry of this matrix, we show that the proposed conversion method requires fewer computations than its equivalent in the pixel domain. An integer approximation for the kernel matrix is also proposed. The experimental results show that a negligible error is introduced, while the computational complexity can be significantly reduced.
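
As an illustration of the structure of such a transform-domain conversion, a minimal sketch is given below. It is not the paper's optimized method: it assumes the common form of the H.264/AVC 4x4 inverse core transform with the norm scaling absorbed into dequantization, and builds a single 8x8 kernel K so that the DCT block is K Y K^T, where Y assembles the four IT blocks.

    import numpy as np
    from scipy.linalg import block_diag

    N = 8
    # Orthonormal 8x8 DCT-II matrix
    D = np.array([[np.sqrt((1 if k == 0 else 2) / N)
                   * np.cos((2 * n + 1) * k * np.pi / (2 * N))
                   for n in range(N)] for k in range(N)])

    # H.264/AVC 4x4 inverse core transform (scaling assumed absorbed),
    # applied to a coefficient block Y as x = Ci.T @ Y @ Ci
    Ci = np.array([[1.0,  1.0,  1.0,  1.0],
                   [1.0,  0.5, -0.5, -1.0],
                   [1.0, -1.0, -1.0,  1.0],
                   [0.5, -1.0,  1.0, -0.5]])

    K = D @ block_diag(Ci.T, Ci.T)   # single 8x8 conversion kernel

    def it_to_dct(Y1, Y2, Y3, Y4):
        """Four 4x4 IT blocks (top-left..bottom-right) -> one 8x8 DCT block."""
        Y = np.block([[Y1, Y2], [Y3, Y4]])
        return K @ Y @ K.T           # equals the DCT of the reconstructed pixels

Because K inherits symmetries from both D and Ci, many of its entries repeat; exploiting such symmetry is what allows the operation count to drop below that of the pixel-domain route.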

Title:
A PREDICTIVE MULTI-CHANNEL MBAC TECHNIQUE FOR ON-LINE VIDEO STREAMING
Author(s):
Pietro Camarda, Cataldo Guaragnella and Domenico Striccoli
Abstract:
A measurement-based admission control (MBAC) predictive technique is introduced for on-line streaming systems, exploiting the GOP-length demultiplexing of the aggregate bit stream in conjunction with a linear predictive algorithm. Owing to the long latency of the statistical aggregate, the predictive technique is able to predict the bit rate about two seconds ahead. The prediction is used in an admission control system to estimate the bit rate and the margin with respect to the channel capacity in the proposed streaming system. These measures have been used to estimate the overflow probability in a general aggregate situation. Tests conducted on real video sequences confirm the feasibility of the proposed technique.
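
As an illustration of the prediction step, a least-squares linear (AR) predictor over a history of per-GOP bit counts might look as follows; the model order, the window, and the simple admission test are assumptions for the sketch, not the paper's exact algorithm.

    import numpy as np

    def ar_predict(gop_bits, p=4):
        """One-step-ahead linear prediction of the next GOP bit count."""
        x = np.asarray(gop_bits, dtype=float)
        X = np.array([x[t - p:t][::-1] for t in range(p, len(x))])
        a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)   # AR coefficients
        return float(a @ x[-p:][::-1])

    def admit(gop_bits, new_flow_rate, capacity, gop_dur=0.5):
        """Toy MBAC test: admit the new flow only if the predicted aggregate
        rate plus the new flow still fits within the channel capacity."""
        predicted_rate = ar_predict(gop_bits) / gop_dur   # bits per second
        return predicted_rate + new_flow_rate <= capacity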

Title:
HDL LIBRARY OF PROCESSING UNITS FOR GENERIC AND DVB-S2 LDPC DECODING
Author(s):
Marco Gomes, Gabriel Falcão, João Gonçalves, Vitor Silva, Miguel Falcão and Pedro Faia
Abstract:
This paper proposes an efficient HDL library of processing units for generic and DVB-S2 LDPC decoders, following a modular and automatic design approach. General-purpose, low-complexity and high-throughput bit node and check node functional models are developed, and both fully serial and parallel architecture versions are considered. A dedicated functional unit for an array-processor LDPC decoder architecture targeting the DVB-S2 standard is also presented. Additionally, we describe an automatic HDL code generator tool for arbitrary decoder architectures and LDPC codes, based on the proposed processing units and Matlab scripts.
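
To make the check-node functional model concrete, the arithmetic such a unit typically implements can be sketched in software. The min-sum approximation below is a common low-complexity choice and is only an assumption here; the paper's HDL units may realize a different update rule.

    import numpy as np

    def check_node_minsum(llrs):
        """Min-sum check-node update: for each edge, the output magnitude is
        the minimum of the OTHER edges' magnitudes, and the output sign is
        the product of the OTHER edges' signs."""
        llrs = np.asarray(llrs, dtype=float)
        signs = np.where(llrs < 0, -1.0, 1.0)
        mags = np.abs(llrs)
        order = np.argsort(mags)
        m1, m2 = mags[order[0]], mags[order[1]]   # two smallest magnitudes
        out_mag = np.full(len(llrs), m1)
        out_mag[order[0]] = m2                    # exclude the edge itself
        return np.prod(signs) * signs * out_mag   # per-edge sign of the others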

Title:
USING PLACEHOLDER SLICES AND MPEG-21 BSDL FOR ROI EXTRACTION IN H.264/AVC FMO-ENCODED BITSTREAMS
Author(s):
Peter Lambert, Wesley De Neve, Davy De Schrijver, Yves Dhondt and Rik Van de Walle
Abstract:
The concept of Regions of Interest (ROIs) within a video sequence is useful for many application scenarios. This paper concentrates on the exploitation of ROI coding within the H.264/AVC specification by making use of Flexible Macroblock Ordering. It shows how ROIs can be coded in an H.264/AVC-compliant bitstream and how the MPEG-21 BSDL framework can be used for the extraction of the ROIs. The first type of ROI extraction described is simply dropping the slices that are not part of one of the ROIs. The second type is the replacement of these slices with so-called placeholder slices, the latter being implemented as P slices containing only macroblocks marked as 'skipped'. The exploitation of ROI scalability, as achieved by the presented methodology, illustrates the possibilities that the single-layered H.264/AVC specification offers for content adaptation. The results show that the bit rate needed to transmit the adapted bitstreams can be reduced significantly. Especially in the case of a static camera and a fixed background, this bit rate reduction has very little impact on the visual quality. Another advantage of the adaptation process is that the execution speed of the receiving decoder increases appreciably.

Title:
WEIGHT UPDATING METHODS FOR A DYNAMIC WEIGHTED FAIR QUEUING (WFQ) SCHEDULER
Author(s):
Gianmarco Panza, Valentin Besoiu, Catherine Lamy-Bergot, Filippo Sidoti and Roberto Bertoldi
Abstract:
This work analyzes different weight updating methods for a dynamic Weighted Fair Queuing (WFQ) scheduler providing Quality of Service (QoS) guarantees for the applications of the IST PHOENIX project and for new value-added services in general. Two weight updating methods are investigated in terms of the delays granted to the service classes concerned and the buffer utilization of the related queues at a given IP interface. In particular, a novel weight updating method based on Knightly's theory is proposed. Simulation results demonstrate that a dynamic WFQ scheduler based on either of the weight updating methods can support a proportional relative QoS model in a Diff-Serv architecture in an IP-based Next Generation Network. The designed system is simple and effective, with low computational overhead, employing an innovative technique to evaluate the trend of the entering traffic aggregates so that the scheduler's weights are updated only when needed.
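
A minimal sketch of the dynamic part is given below. The proportional heuristic and the trend-based trigger are illustrative assumptions, not the Knightly-based method evaluated in the paper.

    def update_weights(rates, delay_targets, eps=1e-9):
        """Give each class a WFQ weight proportional to its measured arrival
        rate scaled by the inverse of its relative delay target, so classes
        with tighter targets obtain a larger share of the link."""
        raw = [r / (d + eps) for r, d in zip(rates, delay_targets)]
        total = sum(raw)
        return [w / total for w in raw]

    def weights_stale(prev_rates, rates, rel_thresh=0.2):
        """Trigger an update only when some aggregate's measured rate has
        drifted by more than rel_thresh since the last update."""
        return any(abs(r - p) > rel_thresh * max(p, 1e-9)
                   for p, r in zip(prev_rates, rates))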

Title:
STREAMING LOW-DELAY VIDEO OVER AIMD TRANSPORT PROTOCOLS
Author(s):
Ahmed Abd El Al, Tarek Saadawi and Muyng Lee
Abstract:
In this paper, we present adaptation strategies for low-delay video streams over Additive-Increase Multiplicative-Decrease (AIMD) transport protocols, where we switch among several versions of the coded video to match the available network bandwidth accurately and meet client delay constraints. By monitoring the application buffer at the server, we estimate the current and future server buffer drain delay, and derive the transmission rate that minimizes client buffer starvation. We also show that the adaptation accuracy can be significantly improved by simply scaling the transport protocol's send-buffer size. The proposed mechanisms were implemented over the Stream Control Transmission Protocol (SCTP) and evaluated through simulation and real Internet traces. Performance results show that the adaptation mechanism is responsive to bandwidth fluctuations while ensuring that the client buffer does not underflow, and that the quality adaptation is smooth, so the impact on the perceptual quality at the client is minimal.

Title:
AN EFFICIENT PACKETIZATION SCHEME FOR VOIP
Author(s):
Antonio Estepa, Rafael Estepa and Juan Vozmediano
Abstract:
A number of VoIP audio codecs generate Silence Insertion Descriptor (SID) frames during the talk-gaps of conversations to update the comfort noise generation parameters at the receiver. According to the RFC 3551 packetization scheme, discontinuously generated SID frames cannot be carried in the same IP packet, which increases the conversation's bandwidth requirement. We define a novel packetization scheme in which a set of non-consecutive SID frames is allowed to share the packet overhead while the timing between them is preserved. We provide analytical expressions and experimental validation for the bandwidth saving obtained with this new scheme, which reaches up to 14% for the G.729B codec.
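
The bandwidth effect of sharing one packet overhead among several SID frames is easy to quantify. The sketch below uses illustrative numbers (40 bytes of IP/UDP/RTP overhead, a 2-byte G.729B SID payload, no header compression) and measures the saving on the SID-related component only; the paper derives the exact analytical expressions, and its 14% figure refers to the whole conversation.

    OVERHEAD_BYTES = 40   # IPv4 + UDP + RTP headers (assumed, no compression)
    SID_BYTES = 2         # G.729B SID frame payload

    def sid_bitrate(sid_per_sec, frames_per_packet):
        """Average SID-related bit rate when frames_per_packet SID frames
        share one packet's overhead (relative timing carried in the payload)."""
        packets_per_sec = sid_per_sec / frames_per_packet
        return 8 * (packets_per_sec * OVERHEAD_BYTES + sid_per_sec * SID_BYTES)

    # Relative saving from aggregating 3 SID frames instead of sending 1 each:
    saving = 1 - sid_bitrate(5, 3) / sid_bitrate(5, 1)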

Title:
VIDEOCONFERENCE OVER IPV6 - IPv6 Networks Advanced Developments
Author(s):
Carlos Friaças, José Baptista, Mónica Domingues and Paulo Ferreira
Abstract:
This document focuses on IPv6 support in the H.323 videoconference protocol and in the ConferenceXP architecture. The latter was designed by Microsoft for developing collaborative tasks and videoconference applications. Videoconference solutions such as GnomeMeeting and the ConferenceXP Client (and adjacent services) that implement the analyzed protocols are also presented. In this way, guidelines for the deployment of a videoconference service with IPv6 support are provided.

Title:
A COMPARATIVE STUDY OF STATE-OF-THE-ART ADAPTATION TECHNIQUES FOR SCALABLE MULTIMEDIA STREAMS
Author(s):
Andreas Schorr, Franz J. Hauck, Bernhard Feiten and Ingo Wolf
Abstract:
Multimedia adaptation is a key technology for enabling communication between heterogeneous multimedia devices and applications, possibly located in heterogeneous wired or wireless networks. Converting already compressed multimedia streams into a format suitable for a certain receiver terminal and network can be achieved by transcoding or by filtering of media streams. Transcoding allows more flexible adaptation operations but is in general a very CPU-intensive process. Therefore, scalable media formats have been developed, which allow more efficient adaptation of media streams through media filtering. Several filter techniques for specific media formats have been proposed and implemented during the last decade. Recently, the MPEG-21 Digital Item Adaptation standard has defined a set of new tools for multimedia adaptation. In this paper, we provide a comparative study of several adaptation techniques for scalable multimedia streams. We compare generic MPEG-21-based adaptation techniques with filter mechanisms for specific media formats with respect to the required processing resources and scalability. We also compare filter techniques with stream adaptation through transcoding. Moreover, we compare how adaptation of multiple media streams performs on systems with single-core and with multi-core processors.

Title:
CONGESTION CONTROL ACROSS A VIDEO-DOMINATED INTERNET TIGHT LINK
Author(s):
Emmanuel Jammeh, Martin Fleury and Mohammed Ghanbari
Abstract:
Changing traffic patterns on the Internet imply that existing congestion controllers designed with TCP traffic in mind may encounter all-UDP video traffic controlled by the same or another congestion controller. Three different congestion controllers (RAP, TFRC, and fuzzy-logic-based), already successful in avoiding instability in current TCP-dominated internets, were tested across a tight link in which video traffic predominated. Congestion control is achieved either by modulating the sending rate in response to feedback of packet loss rates and/or round-trip delays (RAP/TFRC) or by a congestion level based on packet dispersion across a network path (fuzzy controller). The controllers were found to differ in the smoothness of the resulting video clip streams, with the fuzzy controller, followed by TFRC, producing the least bursty received video. Tests also demonstrated that, when controlled flows of different types compete across a tight link, it is possible for the sending rate of TFRC to exceed the available bandwidth, resulting in excess packet loss and implying reduced received video quality. The results show that fuzzy-logic control is more flexible when video dominates.

Title:
BALANCED RESOURCE SHARE MECHANISM IN OPTICAL NETWORKS
Author(s):
Hyeon Park, Byung-Ho Yae, Dong-Hun Lee and Sang-Ha Kim
Abstract:
Existing protection mechanisms for rapid failure recovery allocate a backup path that is merely SRLG-disjoint from the working path. However, these mechanisms suffer from low resource utilization because resources are not shared among the backup paths. To remedy this, Kini [1], Somdip [2] and others have proposed mechanisms for sharing the resources of backup paths. Although these mechanisms improve the efficiency of bandwidth usage, they do not consider the unbalanced resource sharing among backup paths: backup paths can be concentrated on specific links, leaving idle resources elsewhere unused, so the overall resource utilization remains poor. We therefore propose a mechanism that enhances resource utilization by resolving this imbalance. Our purpose is to improve link utilization by distributing the maximum link load. We formulate the problem of minimizing the number of backup resources used (specifically, wavelengths) while taking the maximum link load into account. For this mechanism we introduce two SRLG notions, Low-SRLG (Low Shared Risk Link Group) and High-SRLG (High Sub-domain Resource Link Group): LSRLG decides the resource sharing among the backup paths, and HSRLG distributes the link load, which reflects the unbalanced resource usage of a link. Simulation results compare our mechanism with the existing ones in terms of spare resource capacity.

Title:
CONVOLUTION KERNEL COMPENSATION APPLIED TO 1D AND 2D BLIND SOURCE SEPARATION
Author(s):
Damjan Zazula, Aleš Holobar and Matjaž Divjak
Abstract:
Many practical situations can be modelled with multiple-input multiple-output (MIMO) models. If the input sources are mutually orthogonal, several blind source separation methods can be used to reconstruct the sources and the model transfer channels. In this paper, we derive a new approach of this kind, based on the compensation of the model convolution kernel. It detects the triggering instants of individual sources and tolerates their non-orthogonalities and a high amount of additive noise, which qualifies the method for several signal and image analysis applications where other approaches fail. We explain how to implement the convolution kernel compensation (CKC) method in both the 1D and 2D cases. This unified approach enabled us to demonstrate its performance in two different experiments. The 1D application addressed the decomposition of surface electromyograms (SEMG). Nine healthy males participated in tests with 5% and 10% maximum voluntary isometric contractions (MVC) of the biceps brachii muscle. We identified 3.4 ± 1.3 (mean ± standard deviation) and 6.2 ± 2.2 motor units (MUs) at 5% and 10% MVC, respectively. We then applied the 2D version of CKC to range imaging. On the Middlebury Stereo Vision reference image set, our method found correct matches for 91.3 ± 12.1% of all pixels, while the obtained RMS disparity difference was 3.4 ± 2.5 pixels. These results are comparable to other ranging approaches, but our solution exhibits better robustness and reliability.

Title:
ON-THE-FLY TIME SCALING FOR COMPRESSED AUDIO STREAMS
Author(s):
Suzana Maranhão, Rogério Rodrigues and Luiz Soares
Abstract:
Timescale modification is a technique used to modify the presentation duration of a media object. This paper proposes an audio timescale algorithm that focuses on: supporting applications that need to maintain the original data format, for storage or immediate presentation on any legacy audio player; performing linear timescale modification in real time, allowing the adjustment factor to vary along the audio presentation; and performing time mark-up maintenance, that is, computing new time values for originally marked audio time instants. The proposed algorithm is appropriate for applications that do not need a great adjustment-factor variation. The integration with content rendering tools is presented in the paper, together with an example of using these tools in a hypermedia presentation formatter.
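
The mark-up maintenance step amounts to integrating the (possibly varying) adjustment factor up to each original mark. A minimal sketch, assuming a piecewise-constant factor:

    def remap_mark(mark, segments):
        """segments: contiguous (start, end, stretch) triples in original
        time, where stretch is output seconds per input second (e.g. 2.0
        doubles the duration). Returns the mark's new presentation time."""
        t_new = 0.0
        for start, end, stretch in segments:
            if mark <= start:
                break
            t_new += (min(mark, end) - start) * stretch
        return t_new

    # Example: first 10 s played normally, next 10 s slowed to half speed:
    # remap_mark(15.0, [(0, 10, 1.0), (10, 20, 2.0)])  ->  20.0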

Title:
SIGNAL DENOISING BASED ON PARAMETRIC HAAR-LIKE TRANSFORMS
Author(s):
Susanna Minasyan, Karen Egiazarian, Jaakko Astola and David Guevorkian
Abstract:
Orthogonal transforms have attracted considerable interest in signal denoising applications. Recently, it has been shown that Parametric Haar-like Transforms (PHT) can be applied successfully in image denoising and compression. A PHT can be computed with a fast algorithm whose structure is similar to that of the classical fast Haar transform, and its matrix contains a predefined basis vector as its first row. In this paper, the capability of parametric transforms, including the family of Haar-like transforms, in 1D signal denoising applications is explored. Simulation results show that the proposed denoising method significantly improves on the performance of wavelet-thresholding-based methods. A similar algorithm may also be applied to image denoising.
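
For reference, the classical fast-Haar-plus-hard-thresholding baseline that the parametric transforms improve upon fits in a few lines. This sketch is the baseline, not the PHT itself, and it assumes the signal length is divisible by 2^levels:

    import numpy as np

    def haar_denoise(x, thresh, levels=4):
        """Orthonormal Haar transform, hard-threshold the details, invert."""
        s = np.asarray(x, dtype=float)
        details = []
        for _ in range(levels):                       # analysis
            a = (s[0::2] + s[1::2]) / np.sqrt(2)
            d = (s[0::2] - s[1::2]) / np.sqrt(2)
            details.append(np.where(np.abs(d) > thresh, d, 0.0))
            s = a
        for d in reversed(details):                   # synthesis
            y = np.empty(2 * len(s))
            y[0::2] = (s + d) / np.sqrt(2)
            y[1::2] = (s - d) / np.sqrt(2)
            s = y
        return s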

Title:
IMPROVING MULTISCALE RECURRENT PATTERN IMAGE CODING WITH DEBLOCKING FILTERING
Author(s):
Nuno M. M. Rodrigues, Eduardo A. B. da Silva, Murilo B. de Carvalho, Sérgio M. M. de Faria and Vitor M. M. da Silva
Abstract:
The Multidimensional Multiscale Parser (MMP) algorithm is an image encoder that approximates image blocks using recurrent patterns, drawn from an adaptive dictionary, at different scales. This encoder performs well for a large range of image data. However, images encoded with MMP suffer from blocking artifacts. This paper presents the design of a deblocking filter that improves the performance of MMP. We present the results of our research, which aims to increase the performance of MMP, particularly for smooth images, without causing quality losses for other image types, where its performance is already up to 5 dB better than that of top transform-based encoders. For smooth images, the proposed filter introduces relevant perceptual quality gains by efficiently eliminating the blocking effects without introducing the usual blurring artifacts. Besides this, we show that, unlike traditional deblocking algorithms, the proposed method also improves the objective quality of the decoded image, achieving PSNR gains of up to about 0.3 dB. With such gains, MMP reaches an almost equivalent performance to that of state-of-the-art image encoders for smooth images (equal to that of JPEG2000 at higher compression ratios), while maintaining its gains for non-smooth images. In fact, for all image types, the proposed method provides significant perceptual improvements without sacrificing PSNR performance.

Title:
DIRECTION BIASED SEARCH ALGORITHMS FOR FAST BLOCK MOTION ESTIMATION
Author(s):
Niranjan Mulay
Abstract:
Motion estimation (ME) is computationally the most challenging part of the video encoding process. It has a direct impact on the speed, bit rate and qualitative performance of the encoder. Block-based motion estimation algorithms have become widely accepted in real-time video applications based on the H.26x and MPEG standards. Even though a Full Search (FS) algorithm delivers the qualitatively optimal solution, resulting in the least residual data to be coded, its huge computational complexity makes it unsuitable for real-time video applications. Consequently, many sub-optimal but faster ME algorithms have been developed to date. In particular, the Three Step Search (TSS) and Four Step Search (FSS) algorithms have become popular because of their ease of implementation. The TSS algorithm is a uniformly spaced block matching algorithm, which performs better in the case of large motion. On the other hand, the New Three Step Search (NTSS) and FSS are center-biased algorithms that outperform TSS in the case of smooth, correlated motion. Later, another center-biased search technique, the Diamond Search (DS) algorithm, was introduced and shown to deliver faster convergence than FSS in smooth motion scenarios. The fact that the motion vector distribution of most real-world video sequences is prominently biased towards zero motion has made the center-biased search algorithms popular and successful. However, the performance of the center-biased algorithms degrades in sequences with consistently large or uncorrelated motion, as they become susceptible to getting trapped in local minima near the center. In this paper, two novel ME algorithms, dual square search (DSS) and dual diamond search (DDS), are proposed in order to strike a balance between the center-biased and uniformly spaced search techniques. The proposed algorithms suggest that the decision to shift the search center should be delayed until the candidates on a coarse as well as a fine grid are evaluated. Moreover, these algorithms are modeled to exploit the motion vector distribution found in most real-world video sequences by giving more precedence to candidates near the center, followed by candidates in the horizontal and vertical directions, and then those in the diagonal directions. The performance of the proposed algorithms is compared with the TSS and FSS algorithms in terms of computational speed, motion compensation error and the compression achieved for various kinds of video sequences. The test results show that both algorithms are substantially faster than TSS and FSS. The proposed ME algorithms promise a balanced trade-off among speed, bit rate and quality for different kinds of motion sequences.
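
For context, the classic diamond search that the proposed DSS/DDS algorithms refine is sketched below; the exact DSS/DDS patterns are not reproduced here, and this only shows the usual large/small-pattern machinery on a SAD cost.

    import numpy as np

    LDP = [(2, 0), (-2, 0), (0, 2), (0, -2), (1, 1), (1, -1), (-1, 1), (-1, -1)]
    SDP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

    def sad(cur, ref, bx, by, dx, dy, B=16):
        """Sum of absolute differences for a B x B block at offset (dx, dy)."""
        x, y = bx + dx, by + dy
        if x < 0 or y < 0 or y + B > ref.shape[0] or x + B > ref.shape[1]:
            return np.inf
        return int(np.abs(cur.astype(int) - ref[y:y + B, x:x + B]).sum())

    def diamond_search(cur, ref, bx, by, max_steps=32):
        cx = cy = 0
        best = sad(cur, ref, bx, by, 0, 0)
        for _ in range(max_steps):                    # large-diamond walk
            c, dx, dy = min((sad(cur, ref, bx, by, cx + dx, cy + dy), dx, dy)
                            for dx, dy in LDP)
            if c >= best:
                break
            best, cx, cy = c, cx + dx, cy + dy
        c, dx, dy = min((sad(cur, ref, bx, by, cx + dx, cy + dy), dx, dy)
                        for dx, dy in SDP)            # small-diamond refinement
        return (cx + dx, cy + dy) if c < best else (cx, cy)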

Title:
A Simple and Computationally Efficient Algorithm for Real-Time Blind Source Separation of Speech Mixtures
Author(s):
Tarig Ballal, Nedelko Grbic and Abbas Mohammed
Abstract:
In this paper we exploit the amplitude diversity provided by two sensors to achieve blind separation of two speech sources. We propose a simple and highly computationally efficient method for separating sources that are W-disjoint orthogonal (W-DO), that is, sources whose time-frequency representations are disjoint sets. The Degenerate Unmixing and Estimation Technique (DUET), a powerful and efficient method that exploits the W-disjoint orthogonality property, requires extensive computations for maximum-likelihood parameter learning. Our proposed method avoids all the computations required for parameter estimation by assuming that the sources are "cross high-low diverse" (CH-LD). In a system with two sensors, two sources are said to be CH-LD if they are not both close to the same sensor; a source is close to a sensor if its energy at that sensor is higher than its energy at the other sensor. This assumption can be satisfied by exploiting the sensor settings/directions. With this assumption and the W-disjoint orthogonality property, two binary time-frequency masks that extract the original sources from one of the two mixtures can be constructed directly from the amplitude ratios of the time-frequency points of the two mixtures. The method works very well when tested on both artificial and real mixtures. Its performance is comparable to, if not better than, that of DUET, and it requires only 2% of the computations of the DUET method. Moreover, it is free of the convergence problems that lead to poor signal-to-interference ratios in the first parts of the signals. As with all binary masking approaches, the method suffers from artifacts that appear in the output signals.
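
The core of the method is directly implementable: under the CH-LD assumption, a per-point comparison of the two mixtures' magnitudes yields the two complementary masks. A minimal sketch, where the window length and the use of mixture 1 as the extraction source are assumptions:

    import numpy as np
    from scipy.signal import stft, istft

    def separate_chld(x1, x2, fs, nperseg=512):
        """Two mixtures -> two source estimates via binary TF masking."""
        _, _, X1 = stft(x1, fs, nperseg=nperseg)
        _, _, X2 = stft(x2, fs, nperseg=nperseg)
        mask = np.abs(X1) > np.abs(X2)     # amplitude-ratio test per TF point
        _, s1 = istft(X1 * mask, fs, nperseg=nperseg)    # source near sensor 1
        _, s2 = istft(X1 * ~mask, fs, nperseg=nperseg)   # source near sensor 2
        return s1, s2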

Title:
SPEECH/MUSIC DISCRIMINATION BASED ON WAVELETS FOR BROADCAST PROGRAMS
Author(s):
Emmanuel Didiot, Irina Illina, Odile Mella, Dominique Fohr and Jean-Paul Haton
Abstract:
The problem of speech/music discrimination is a challenging research problem which significantly impacts Automatic Speech Recognition (ASR) performance. This paper proposes new features for the speech/music discrimination task. We propose to use a decomposition of the audio signal based on wavelets, which allows a good analysis of non-stationary signals like speech or music. We compute different energy types in each frequency band obtained from the wavelet decomposition. Two class/non-class classifiers are used: one for speech/non-speech and one for music/non-music. On the broadcast test corpus, the proposed wavelet approach gives better results than the MFCC-based one. For instance, we obtain a significant relative error-rate improvement of 39% for the speech/music discrimination task.
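
A sketch of the feature-extraction stage using PyWavelets is shown below; the wavelet family, the depth and the mean-square energy are assumptions (the paper compares several energy types per band):

    import numpy as np
    import pywt

    def wavelet_band_energies(frame, wavelet="db4", level=5):
        """One feature vector per audio frame: the energy in each band of a
        level-deep wavelet decomposition of the frame."""
        coeffs = pywt.wavedec(frame, wavelet, level=level)
        return np.array([np.mean(c ** 2) for c in coeffs])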

Title:
Real-Time Image Wavelet Coding for Low Bit Rate Transmission
Author(s):
Gaoyong Luo
Abstract:
Embedded coding for progressive image transmission has recently gained popularity in the image compression community. However, current progressive wavelet-based image coders tend to be complex and computationally intensive, requiring large memory space. The encoding process usually sends information on the lowest-frequency wavelet coefficients first. At very low bit rates, compressed images are therefore dominated by low-frequency information, and the high-frequency components belonging to edges are lost, blurring the signal features. This paper presents a new image coder for real-time transmission, employing edge preservation based on local variance analysis to improve the visual appearance and recognizability of compressed images. The analysis and compression are performed by dividing an image into blocks. A lifting wavelet filter bank is constructed for image decomposition and reconstruction, with the advantages of computational efficiency and minimized boundary effects. A modified SPIHT algorithm, which uses more bits to encode the wavelet coefficients and transmits fewer bits in the sorting pass, is used to reduce the correlation of the coefficients at scalable bit rates. Local variance estimation and edge strength measurement effectively determine the best bit allocation for each block to preserve the local features. Experimental results demonstrate that the method performs well both visually and in terms of quantitative performance measures, and offers an error-resilience feature that is evaluated using a simulated transmission channel with random errors. The proposed fast image coder provides a potential solution with low memory requirements for real-time applications.
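
As an illustration of the lifting construction, one analysis level of the LeGall 5/3 wavelet takes only two lifting steps. This sketch uses periodic boundary handling for brevity (symmetric extension is the usual choice) and assumes an even-length signal:

    import numpy as np

    def lift_53(x):
        """One level of 5/3 lifting: predict (detail), then update (smooth)."""
        x = np.asarray(x, dtype=float)
        even, odd = x[0::2].copy(), x[1::2].copy()
        odd -= 0.5 * (even + np.roll(even, -1))   # predict: detail = odd - avg
        even += 0.25 * (odd + np.roll(odd, 1))    # update: preserves the mean
        return even, odd                          # approximation, detail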

Title:
OPTIMAL POWER ALLOCATION IN A MIMO-OFDM TWISTED PAIR TRANSMISSION SYSTEM WITH FAR-END CROSSTALK
Author(s):
Andreas Ahrens and Christoph Lange
Abstract:
Crosstalk between neighbouring wire pairs is one of the major impairments in digital transmission via multi-pair copper cables, and it essentially limits the transmission quality and throughput of such cables. For high-rate transmission, the strong near-end crosstalk (NEXT) disturbance is often avoided or suppressed, and only far-end crosstalk (FEXT) remains as a crosstalk influence. If FEXT is present, signal parts are transmitted from the transmitter to the receiver via the FEXT paths in addition to the direct transmission paths. Transmission schemes that take advantage of the signal parts transmitted via the FEXT paths are therefore of great practical interest. Here an SVD (singular-value decomposition) equalized MIMO-OFDM system is investigated, which is able to take advantage of the FEXT signal paths. Based on the Lagrange multiplier method, an optimal power allocation scheme is considered in order to reduce the overall bit-error rate at a fixed data rate and fixed QAM constellation sizes. This yields an interesting combination of SVD equalization and power allocation, in which the transmit power is adapted not only to the subchannels but also to the symbol amplitudes of the SVD-equalized data block. The results show that the exploitation of FEXT is important for wireline transmission systems, in particular those with high couplings between neighbouring wire pairs, where enormous gains are possible.
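
The SVD equalization idea can be demonstrated on a toy 2x2 wireline channel, with direct paths on the diagonal and FEXT coupling off-diagonal (all values illustrative). Pre-processing with V and post-processing with U^H turn the coupled pairs into independent weighted subchannels, to which a power allocation is then applied:

    import numpy as np

    H = np.array([[1.00, 0.15],      # hypothetical 2-pair cable: diagonal =
                  [0.12, 1.00]])     # direct paths, off-diagonal = FEXT

    U, s, Vh = np.linalg.svd(H)

    def transmit(symbols, power):
        """Weight subchannel amplitudes (power allocation), precode with V."""
        return Vh.conj().T @ (np.sqrt(power) * symbols)

    def equalize(y):
        """Apply U^H at the receiver; y = H @ transmit(...) + noise becomes
        diag(s) * sqrt(power) * symbols plus rotated noise."""
        return U.conj().T @ y

A Lagrange-multiplier or water-filling style optimisation would supply the power vector; it is not derived here.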

Title:
PROCESSING OF NON-STATIONARY SIGNAL USING LEVEL-CROSSING SAMPLING
Author(s):
Modris Greitans
Abstract:
Multimedia signals are typically non-stationary, and their statistical characteristics vary with time. Preferably, the sampling density should comply with the instantaneous bandwidth of the signal. The level-crossing sampling principle, which provides such a capability for analog-to-digital conversion, is discussed. As the captured samples are spaced non-uniformly, appropriate digital signal processing is required. The classical time-frequency representations are inspected. Several enhancements of the short-time Fourier transform approach are proposed, based on the idea of minimizing the reconstruction error not only at the sampling instants but also between them, with the same accuracy. Additional benefits are gained if the instantaneous spectral range of the analysis is matched to the local sampling density: artifacts are removed and the complexity of the calculations is decreased. The performance of the algorithms is demonstrated by simulations. The resolution of the analysis can be improved by a signal-dependent transformation. The presented research is attractive for clock-less designs, which are now receiving increasing interest; their promising advantages can play a significant role in future electronics development.
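
The sampling principle itself is simple to emulate on a densely simulated waveform: record a (time, level) pair whenever the signal crosses one of a set of reference levels. A sketch with linear interpolation of the crossing instants:

    import numpy as np

    def level_crossings(t, x, levels):
        """Non-uniform (time, level) samples from a uniformly simulated x(t)."""
        samples = []
        for lev in levels:
            d = x - lev
            idx = np.where(d[:-1] * d[1:] < 0)[0]     # sign change = crossing
            frac = d[idx] / (d[idx] - d[idx + 1])     # linear interpolation
            samples.extend(zip(t[idx] + frac * (t[idx + 1] - t[idx]),
                               [lev] * len(idx)))
        return sorted(samples)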

Title:
SIDE INFORMATION INTERPOLATION WITH SUB-PEL MOTION COMPENSATION FOR WYNER-ZIV DECODER
Author(s):
Sven Klomp, Yuri Vatis and Jörn Ostermann
Abstract:
A new video coding paradigm, called Distributed Video Coding (DVC), is receiving rising attention in research. With this coding paradigm, the complex task of exploiting the source statistics can be moved from the encoder (e.g. H.264) to the decoder. Such a DVC decoder needs side information to exploit the statistics. In common DVC codecs, the side information is obtained by interpolating the current frame from already decoded frames. This paper proposes an interpolation technique for the side information that uses motion compensation with sub-pel accuracy, and compares different interpolation filters for calculating the sub-pel values. Using a six-tap Wiener filter, we observe a gain of up to 1.8 dB for the DVC-coded frames.
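
For reference, the six-tap half-sample interpolator familiar from H.264 applies along a row of integer-pel values in one line; the bit-exact integer rounding and clipping of the standard are omitted in this sketch:

    import numpy as np

    TAPS = np.array([1, -5, 20, 20, -5, 1]) / 32.0   # 6-tap half-pel filter

    def half_pel(row):
        """Half-sample values between the integer positions of a 1-D row;
        each output uses the six nearest integer samples."""
        return np.convolve(row, TAPS, mode="valid")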

Title:
INTERVENANT CLASSIFICATION IN AN AUDIOVISUAL DOCUMENT
Author(s):
Philippeau Jeremy, Pinquier Julien and Joly Philippe
Abstract:
This document deals with the definition of a new descriptor for audiovisual document indexing: the intervenant. We focus on its audiovisual localization, that is to say its place in an audiovisual sequence and its classification into 3 categories: IN, OUT or OFF. Based on a comparison of different analysis tools for both the audio and video modes, we define a set of descriptors, which can be computed automatically, that are potentially relevant for classifying the intervenant localization. The decision is taken on the basis of transition modeling between classes.

Title:
AN E-LIBRARIAN SERVICE THAT YIELDS PERTINENT RESOURCES FROM A MULTIMEDIA KNOWLEDGE BASE
Author(s):
Serge Linckels and Christoph Meinel
Abstract:
In this paper we present an e-librarian service which is able to retrieve multimedia resources from a knowledge base in a more efficient way than by browsing through an index or by using a simple keyword search. We explore an approach that allows the user to formulate a complete question in natural language. Our background theory is composed of three steps. Firstly, there is the linguistic pre-processing of the user question. Secondly, there is the semantic interpretation of the user question into a logical and unambiguous form, i.e. an ALC terminology. The focus function resolves ambiguities in the question; it returns the best interpretation for a given word in the context of the complete user question. Thirdly, there is the generation of a semantic query and the retrieval of pertinent documents. We developed two prototypes: one about computer history (CHESt) and one about fractions in mathematics (MatES). We report on experiments with these prototypes that confirm the feasibility, the quality and the benefits of such an e-librarian service. For 229 different user questions, the system returned the right answer for 97% of the questions, and for nearly half of the questions only one answer, the best one.

Title:
EXTRACTING PERSONAL USER CONTEXT WITH A THREE-AXIS SENSOR MOUNTED ON A FREELY CARRIED CELL PHONE
Author(s):
Toshiki Iso and Kenichi Yamazaki
Abstract:
To realize ubiquitous services such as presence services and health-care services, we propose an algorithm to extract "personal user context", such as the user's behavior, by processing information gathered by a three-axis accelerometer mounted on a cell phone. Our algorithm has two main functions: one extracts feature vectors by analyzing the sensor data in detail with wavelet packet decomposition; the other flexibly clusters personal user contexts by combining a self-organizing algorithm with Bayesian theory. A prototype that implements the algorithm was constructed. Experiments on the prototype show that the algorithm can identify personal user contexts such as walking, running, going up/down stairs, and walking fast with an accuracy of about 88%.

Title:
SEARCHING MOVIES BASED ON USER DEFINED SEMANTIC EVENTS
Author(s):
Bart Lehane, Noel E. O'Connor and Hyowon Lee
Abstract:
The number and size of digital video databases are continuously growing. Unfortunately, most, if not all, of the video content in these databases is stored without any sort of indexing or analysis and without any associated metadata. If any of the videos do have metadata, then it is usually the result of some manual annotation process rather than any automatic indexing. Locating clips and browsing content is difficult, time-consuming and generally inefficient. The task of managing a set of movies is particularly difficult given their innovative creation process and the individual style of directors. This paper proposes a method of searching video data in order to retrieve semantic events, thereby facilitating the management of video databases. An interface was created which allows users to perform searches using the proposed method. In order to assess the searching method, this interface was used to conduct a set of experiments in which users were timed completing a set of tasks using both the searching method and an alternative, keyframe-based retrieval method. These experiments evaluate the searching method and demonstrate its versatility.

Title:
A VIDEO CLASSIFICATION METHOD FOR USER-CENTERED STREAMING SERVICES
Author(s):
Yuka Kato and Katsuya Hakozaki
Abstract:
The present paper analyzes the relationship between video content and subjective video quality for user-centered streaming services. In this analysis, we conduct subjective assessments using various types of video programs, and propose a method of classifying video programs into a number of groups that are thought by a large majority of users to have the same video quality. A high level of user satisfaction can be achieved by applying a different control method to each group obtained with the proposed method. In addition, we demonstrate the necessity of rate control according to video content by comparing a classification based on vision parameters with the classification based on the assessment results.

Title:
A NETWORK TRAFFIC SCHEDULER FOR A VOD SERVER ON THE INTERNET
Author(s):
Javier Balladini, Leandro Souza and Remo Suppi
Abstract:
Most Video on Demand (VoD) systems were designed to work in dedicated networks. There are some approaches that provide a VoD service in non-dedicated, best-effort networks, but they adapt the media quality according to the available network bandwidth. Our research activities focus on VoD systems with a high quality of service on non-dedicated networks. We have designed and developed a network manager, to be integrated into the VoD server, that provides total network control, network state information, and adaptation of the transmission rate in a TCP-friendly way. The present work describes this network manager, named the Network Traffic Scheduler (NTS), which incorporates a congestion control algorithm named "Enhanced Rate Adaptation Protocol" (ERAP). ERAP is an optimization of the well-known "Rate Adaptation Protocol" (RAP). While maintaining the basic behavior of RAP, ERAP increases the efficiency of the NTS by reducing resource usage (of the server and the network). These components have been extensively evaluated through simulations and real tests in which the resource consumption and the performance were measured. This paper presents the advantages of using ERAP instead of RAP in a VoD server, and its viability for integration within the NTS in a VoD server on non-dedicated networks.

Title:
PROVIDING PHYSICAL SECURITY VIA VIDEO EVENT AWARENESS
Author(s):
Dimitrios Georgakopoulos and Donald Baker
Abstract:
The Video Event Awareness System (VEAS) analyzes surveillance video from thousands of video cameras and automatically detects complex events in near real-time—at pace with their input video streams. For events of interest to security personnel, VEAS generates and routes alerts and related video evidence to subscribing security personnel, facilitating decision making and timely response. In this paper we introduce VEAS's novel publish/subscribe run-time system architecture and describe VEAS's event detection approach. Event processing in VEAS is driven by user-authored awareness specifications that define patterns of inter-connected spatio-temporal event stream operators that consume and produce facility-specific events described in VEAS's surveillance ontology. We describe how VEAS integrates and orchestrates continuous and tasked video analysis algorithms (e.g., for entity tracking and identification), how it fuses events from multiple sources and algorithms in an installation-specific entity model, how it can proactively seek additional information by tasking video analysis algorithms and security personnel to provide it, and how it deals with late-arriving information due to out-of-band video analysis tasks and overhead. We use examples from the physical security domain, and discuss related and future work.

Title:
ROBUST CONTENT-BASED VIDEO WATERMARKING EXPLOITING MOTION ENTROPY MASKING EFFECT
Author(s):
Amir Houmansadr, Hamed Pirsiavash and Shahrokh Ghaemmaghami
Abstract:
A major class of image and video watermarking algorithms, i.e. content-based watermarking, is based on the concept of the Human Visual System (HVS) in order to adapt more efficiently to the local characteristics of the host signal. In this paper, a content-based video watermarking scheme is developed, and the concept of the entropy masking effect is employed to significantly improve the use of the HVS model. The entropy masking effect states that the human eye's sensitivity decreases in high-entropy regions, i.e. regions with spatial or temporal complexity. The spatial entropy masking effect has been exploited in a number of previous works to enhance the robustness of image-adaptive watermarks. In the current research, we use temporal entropy masking as well to achieve a higher performance in video watermarking. Experimental results show that more robust watermarked video sequences are produced when the temporal entropy masking effect is considered, while the watermarks remain subjectively imperceptible. The robustness enhancement is a function of the temporal and spatial complexity of the host video sequences.
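
The spatial half of such a masking model reduces to a local entropy map that scales the watermark amplitude; the temporal half applies the same measure to frame differences. A block-entropy sketch for 8-bit frames, where the scaling rule at the end is an illustrative choice rather than the paper's:

    import numpy as np

    def block_entropy(frame, block=8):
        """Grey-level entropy of each block x block tile of an 8-bit frame."""
        h, w = frame.shape
        ent = np.zeros((h // block, w // block))
        for i in range(ent.shape[0]):
            for j in range(ent.shape[1]):
                tile = frame[i*block:(i+1)*block, j*block:(j+1)*block]
                p = np.bincount(tile.ravel(), minlength=256) / tile.size
                p = p[p > 0]
                ent[i, j] = -(p * np.log2(p)).sum()
        return ent

    # Illustrative: stronger mark where the block is busier (max entropy = 8)
    # alpha = base_strength * (1 + block_entropy(frame) / 8.0)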

Title:
ENHANCED INTERACTION FOR STREAMING MEDIA
Author(s):
Wolfgang Hürst, Tobias Lauer and Rainer Müller
Abstract:
Streaming is a popular and efficient way of web-based on-demand multimedia delivery. However, flexible methods of interaction and navigation, as required, for example, in learning applications, are very restricted with streamed content. Using the example of recorded lectures, we point out the importance of such advanced interaction, which is not possible with purely streamed media. A new delivery method based on a combination of streaming and download is proposed, which can be realized with Rich Internet Applications. It combines the advantages of streaming delivery with navigational and interactive features that are usually known only from locally available media.

Title:
REAL-TIME SIMULATION OF SOUND SOURCE OCCLUSION
Author(s):
Chris Share and Graham McAllister
Abstract:
Sound source occlusion occurs when the direct path from a sound source to a listener is blocked by an intervening object. Currently, a variety of methods exist for modeling sound source occlusion. These include finite element and boundary element methods, as well as methods based on time-domain models of edge diffraction. At present, the high computational requirements of these methods preclude their use in real-time environments. In the case of real-time geometric room acoustics methods (e.g. the image method, ray tracing), the model of sound propagation employed makes it difficult to incorporate wave-related effects such as occlusion; as a result, these methods generally do not incorporate sound source occlusion. The lack of a suitable sound source occlusion method means that developers of real-time virtual environments (such as computer games) have generally either ignored this phenomenon or used rudimentary and perceptually implausible approximations. A potential solution to this problem is the use of shadow algorithms from computer graphics. These algorithms can provide a way to efficiently simulate sound source occlusion in real time and in a physically plausible manner. Two simulation prototypes are presented, one for fixed-position sound sources and another for moving sound sources.

Title:
MULTI-MODAL WEB-BROWSING - An Empirical Approach to Improve the Browsing Process of Internet Retrieved Results
Author(s):
Dimitrios Rigas and Antonio Ciuffreda
Abstract:
This paper describes a survey and an experiment which were carried out to measure some usability aspects of a multi-modal interface for browsing documents retrieved from the Internet. An experimental platform, called AVBRO, was developed to serve as the basis for the experiments. This study investigates the use of audio-visual stimuli as part of a multi-modal interface to communicate the results retrieved from the Internet. The experiments were based on a set of Internet queries performed by an experimental and a control group of users. The experimental group performed Internet-based search operations using the AVBRO platform, and the control group used the Google search engine. Overall, the users in the experimental group performed better than those in the control group. This was particularly evident when users had to perform complex search queries with a large number of keywords (e.g. 4 to 5). The results of the experiments demonstrated that the experimental group, aided by the AVBRO platform, received additional feedback about the retrieved documents and was helped to access the desired information by visiting fewer web pages, which in effect improved the usability of browsing documents. A number of conclusions regarding issues of presentation and combination of different modalities were identified.

Title:
SEGMENTING OF RECORDED LECTURE VIDEOS - The Algorithm VoiceSeg
Author(s):
Stephan Repp and Christoph Meinel
Abstract:
In the past decade, we have witnessed a dramatic increase in the availability of online academic lecture videos. There are technical problems in the use of recorded lectures for learning: the problem of easy access to the content of multimedia lecture videos, and the problem of finding the semantically appropriate information very quickly. The first step towards a semantic lecture browser is segmenting the large video corpus into smaller cohesive areas. The task of breaking documents into topically coherent subparts is called topic segmentation. In this paper, we present a new segmentation algorithm for recorded lecture videos based on their imperfect transcripts. The recorded lectures are transcribed by out-of-the-box speech recognition software with an accuracy of approximately 70%-80%. The words and a time stamp for each word are stored in a database; these data are the input to our new algorithm. We show that clustering similar words, generating vectors with the values from the clusters, and calculating the cosine measure of adjacent vectors leads to a better segmentation result than a standard segmentation algorithm.
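
The boundary decision at the end of such a pipeline is just a cosine comparison of adjacent cluster-count vectors. A minimal sketch, where the threshold is an assumption:

    import numpy as np

    def topic_boundaries(window_vectors, thresh=0.3):
        """window_vectors: one cluster-count vector per adjacent transcript
        window. A low cosine between neighbours marks a topic boundary."""
        v = [np.asarray(w, dtype=float) for w in window_vectors]
        sims = [float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
                for a, b in zip(v, v[1:])]
        return [i + 1 for i, s in enumerate(sims) if s < thresh]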

Title:
DEVELOPMENT OF VOICE-BASED MULTIMODAL USER INTERFACES
Author(s):
Claudia Pinto P. Sena and Celso A. S. Santos
Abstract:
Over the last decades, interface evolution has made visual interfaces the standard and the keyboard and mouse the input devices most used for human-computer interaction. The integration of voice as an input mode into visual-only interfaces could alleviate many of the known limitations of existing human-computer interaction. One of the major issues that remains is how to integrate voice input into a graphical interface application. In this paper, we introduce a development method for multimodal interfaces combining voice and visual input/output. In order to evaluate the proposed approach, a multimodal interface for a video application was implemented and analysed.

Title:
A TEMPORAL SYNCHRONIZATION MECHANISM FOR REAL-TIME DISTRIBUTED CONTINUOUS MEDIA
Author(s):
Luis A. Morales Rosales and Saul E. Pomares Hernandez
Abstract:
The preservation of temporal relations for real-time distributed continuous media is a key issue for emerging multimedia applications, such as Tele-Immersion and Tele-Engineering. Although several works try to model and execute distributed continuous media scenarios, they are far from resolving the problem. The present paper proposes a viable solution based on the identification of logical dependencies. Our solution considers two main components. First, it establishes a temporal synchronization model that expresses all possible temporal scenarios for continuous media according to their causal dependency constraints. The second component is an innovative synchronization mechanism that accomplishes the reproduction of continuous media according to its temporal specification. We note that the present work does not require previous knowledge of when, nor for how long, the continuous media of a temporal scenario is executed.

Title:
SUBGROUP FEEDBACK FOR SOURCE-SPECIFIC MULTICAST
Author(s):
Dan Komosny
Abstract:
The recent deployment of IP-based TV and radio distribution requires one-to-many multicast instead of the traditional many-to-many data distribution. These large multimedia sessions usually rely on the Real-time Transport Protocol (RTP) and its control protocol (RTCP). Although one-to-many multicast offers the required communication, it does not support a multicast feedback channel for carrying the RTCP control messages. Therefore, unicast feedback channels from session members to the source are used to carry these messages. In this paper, we introduce subgroup feedback scenarios for source-specific multicast, which is built on the one-to-many philosophy. Our extensions are based on the subgroup feedback framework standardized in the IETF. We outline a possible implementation of the subgroup feedback using the receiver summary information (RSI) packet. A theoretical analysis of the RSI packet rate is also presented in the paper.

Title:
RANDOMISED DYNAMIC TRAITOR TRACING
Author(s):
Jarrod Trevathan and Wayne Read
Abstract:
Dynamic traitor tracing schemes are used to trace the source of piracy in broadcast environments such as cable TV. Dynamic schemes divide content into a series of watermarked segments that are then broadcast. The broadcast provider can adapt the watermarks according to the pirate's response and eventually trace him/her. As dynamic algorithms are deterministic, for a given set of inputs the tracing algorithm will execute exactly the same way each time. An adversary can use this knowledge to force the tracing algorithm into its worst-case bound. In this paper we review dynamic traitor tracing schemes and describe why determinism is a problem. We amend several existing dynamic tracing algorithms by incorporating randomised decisions. This eliminates any advantage an adversary has in terms of the aforementioned attack, as he/she no longer knows exactly how the tracing algorithm will execute. We provide an efficiency analysis of the amended algorithms and give some recommendations for reducing overhead.

Title:
A NEW PHOTOGRAPHING APPARATUS FOR HUMAN FACE SKIN RENDERING - Separation of Reflection Components Based on Ellipsometry
Author(s):
Haedong Kim and Inho Lee
Abstract:
Many movies have recently been made with computer-graphics effects, and computer-generated digital actors are required more and more frequently. In a movie, the role of a digital actor is similar to that of a human actor: human-like shape and human-like action, even in a 3D animation. Skin rendering of a human face is difficult because the colour and the expression of the face change with the illumination conditions. Generally, movie content makers require high-quality texture map source data to render realistic face skin. But data capture of a human face is difficult because the captured data can easily change during the capturing time. We therefore built a new data capture apparatus that uses polarization of light to provide mapping source data for the human face. The apparatus captures images of a human face without glossy reflection in a very short time. The images are used to make texture maps, bump maps and special characteristic maps, which are used to render a realistic human face and can be applied to movies, advertisements and game moving pictures.

Title:
A COMPONENT-BASED SOFTWARE ARCHITECTURE FOR REALTIME AUDIO PROCESSING SYSTEMS
Author(s):
Jarmo Hiipakka
Abstract:
This paper describes a new software architecture for audio signal processing. The architecture was specifically designed with low-latency, low-delay realtime applications in mind. Additionally, the frequently used paradigm of dividing the functionality into components that all share the same interface was adopted. The paper presents a systematic approach to structuring the processing inside the components by dividing the functionality into two groups of functions: realtime and control functions. The implementation options are also outlined, with short descriptions of two existing implementations of the architecture. An algorithm example highlighting the benefits of the architecture concludes the paper.
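
The realtime/control split can be illustrated with a toy gain component: the control function may be called from any thread and only stores a target, while the realtime function works on fixed buffers and avoids allocation and blocking. The interface names here are invented for the sketch, not the paper's API:

    import numpy as np

    class GainComponent:
        def __init__(self):
            self._gain = 1.0      # value currently applied
            self._target = 1.0    # value requested by control code

        def set_gain(self, g):                 # control function
            self._target = float(g)

        def process(self, inbuf, outbuf):      # realtime function
            # Ramp over the block to avoid zipper noise; writes in place,
            # takes no locks and makes no blocking calls.
            ramp = np.linspace(self._gain, self._target, len(inbuf))
            np.multiply(inbuf, ramp, out=outbuf)
            self._gain = self._target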

Title:
SPEAKER’S GENDER IDENTIFICATION FOR HUMAN-ROBOT INTERACTION
Author(s):
Kyung-Sook Bae, Keun-Chang Kwak and Soo-Young Chi
Abstract:
This paper is concerned with text-independent speaker's gender identification (GI) for Human-Robot Interaction (HRI). For this purpose, we perform speaker gender recognition based on Gaussian Mixture Models (GMM) and use a robot platform called WEVER, a Ubiquitous Robotic Companion (URC) intelligent service robot developed at the Intelligent Robot Research Division of the Electronics and Telecommunications Research Institute (ETRI). Furthermore, we communicate with intelligent service robots through Korean-based spontaneous speech recognition and text-independent speaker gender identification to provide suitable services, such as selecting a preferable TV channel or music for the identified gender. The experimental results obtained on the ETRI speaker database reveal that the presented approach yields good identification performance (94.9%) within 3 meters.
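
A minimal GMM-based gender identifier along these lines can be assembled with scikit-learn; the feature type (e.g. MFCCs), the component count and the diagonal covariances are assumptions, and the random arrays below merely stand in for real training features:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    D = 13                                   # feature dimension (e.g. MFCCs)
    male_feats = np.random.randn(5000, D)    # stand-ins for real frame data
    female_feats = np.random.randn(5000, D)

    gm_male = GaussianMixture(16, covariance_type="diag").fit(male_feats)
    gm_female = GaussianMixture(16, covariance_type="diag").fit(female_feats)

    def identify_gender(utterance_feats):
        """Mean per-frame log-likelihood under each model; pick the larger."""
        return ("male" if gm_male.score(utterance_feats)
                        > gm_female.score(utterance_feats) else "female")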

Title:
APPLICATION OF DYNAMICALLY RECONFIGURABLE PROCESSORS IN DIGITAL SIGNAL PROCESSING
Author(s):
Hrvoje Mlinaric, Mario Kovac and Josip Knezovic
Abstract:
The paper describes a new approach to processor construction, which combines a general-purpose processor and a program-reconfigurable device, as well as its implementation in digital signal processing applications. Significant flexibility and adaptability of such a processor are obtained through the possibility of varying the components of the processor architecture. A simple change of architecture enables easy adaptation to the various applications in which such processors are used. Furthermore, to achieve even greater functionality, dynamic adjustment of the processor is supported, by enabling the function of individual processor components to change without the need to turn the processor off. The functionality change itself is conducted in a single clock cycle, which allows for great flexibility of the processor, increases the functionality and enables simple implementation in various applications. Such processor architectures are broadly used in embedded computer systems for various multimedia, encryption and digital signal applications.

Title:
CONTENT-BASED VISUAL RETRIEVAL ON MULTIPLE FEATURES IN THE IMAGE DATABASES OBTAINED FROM DICOM FILES
Author(s):
Liana Stanescu and Dan Dumitru Burdescu
Abstract:
The paper presents the results of experiments performed on color medical images extracted from DICOM files produced by medical tools, in the content-based visual query process. The color feature was considered first, and the study covered several quantization methods (HSV color space with 166 colors, RGB color space with 64 colors and CIE-LUV color space with 512 colors) and several methods of computing the dissimilarity between the query and the target images (Euclidean distance, histogram intersection and the quadratic distance between histograms). The content-based visual query on the color texture feature was tested using two important texture detection methods: co-occurrence matrices and Gabor filters. Also, the accuracy of the color set back-projection in detecting color regions representing sick tissue in medical images was studied. The resulting statistics encourage the use of this algorithm for tracking patient evolution under a certain treatment, with good performance in both quality and speed.
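
The color-based query reduces to quantizing each image into a fixed histogram and comparing histograms. A sketch of an HSV quantizer close to the 166-color scheme (the achromatic grey bins are omitted for brevity, leaving 162 bins) together with histogram intersection:

    import numpy as np

    def hsv_hist(h, s, v):
        """Flattened pixel arrays: h in [0, 360), s and v in [0, 1].
        18 hue x 3 saturation x 3 value = 162 chromatic bins."""
        hq = np.minimum((h / 20).astype(int), 17)
        sq = np.minimum((s * 3).astype(int), 2)
        vq = np.minimum((v * 3).astype(int), 2)
        hist = np.bincount((hq * 3 + sq) * 3 + vq, minlength=162)
        return hist / hist.sum()

    def intersection(h1, h2):
        """Histogram intersection similarity in [0, 1] for two normalized
        histograms; 1 means identical color content."""
        return float(np.minimum(h1, h2).sum())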

Title:
PERSONAL SOUND BROWSER - A Collection of Tools to Search, Analyze and Collect Audio Files in a LAN and in the Internet
Author(s):
Sergio Cavaliere and Carmine Colucci
Abstract:
In this paper we present a toolbox for searching for audio files on the Internet, in a Local Area Network or on a single computer. The search serves both to analyze the collected files and to populate a multimedia archive for further use or analysis. Tools to interface with a multimedia database and to analyze files are also provided. The toolbox is intended to be open, in the sense that any user may customize it at will, adding proprietary tools and methods; it is freely distributed and also open to contributions. This goal has been achieved by building a Matlab toolbox which, as is well known, results in an open environment that anybody may customize. Research in the field of music and sound browsing, analysis and classification is a large and open field in which many different solutions have been proposed in the literature, yet the field does not seem to have well-established tools and methods so far. Deciding which sound parameters are suited to a given kind of search or classification is still an open problem. We therefore provide an open environment where anybody may customize tools and methods, an environment which, as an advantage over other tools in the literature, starts from the very first stage of the process: searching and browsing directly on the Internet. As a further benefit, the language used allows straightforward prototyping of new tools.

Title:
TAG INTERFACE FOR PERVASIVE COMPUTING - Paper Tag Interface using Image Code
Author(s):
Dong-Chul Kim, Jong-Hoon Seo, Cheolho Cheong and Tack-Don Han
Abstract:
Recently, computing environments have been moving into the ubiquitous computing age with the rapid growth of the Internet and the appearance of various mobile devices. Computers can offer a more convenient life by linking physical objects with digital information. With this advance of the computing environment, several research efforts are in progress on tag interfaces for linking physical objects and digital information, based either on image codes such as barcodes or on wireless technologies such as RFID. RFID incurs high expenses, as tags must be purchased in quantity and dedicated hardware is needed to read or write them. An image code, in contrast, can be printed on paper and decoded by a camera connected to a computer, so it is convenient and has a low maintenance cost. In this paper, we propose a paper tag interface using an image code to access information more easily and quickly in a ubiquitous computing environment, and we develop encoding and decoding algorithms for the image code.

Title:
iTV Model: An HCI-Based Model for the Planning, Development and Evaluation of iTV Applications
Author(s):
Alcina Prata, Nuno Guimarães and Piet Kommers
Abstract:
This document describes a new model for the planning, development and evaluation of iTV viewer/user interfaces. We explain the motivations for the development of the above-mentioned model, what is new about it, and what we expect to achieve. Also mentioned in this work are the models, methodologies, theories, guidelines, heuristics, design patterns and processes that were combined in order to achieve our model. Some conclusions are presented, and future lines of research are pointed out.