Technical Program



In addition to the regular talks, we are glad to announce an invited talk, "Scalable video transport over wireless networks", given by Prof. Dapeng Oliver Wu, University of Florida, USA.

8:15 - 8:50 Registration (located outside of workshop room in Galleria B)

8:50 - 9:00 Opening

9:00 - 10:40 Session 1: Image and speech applications

10:40 - 11:00 Coffee break

11:00 - 12:20 Session 2: Video, multimedia and mobile TV applications

12:20 - 13:40 Lunch (List of restaurants in downtown Tampa) and more information

13:40 - 14:40 Keynote Speech: Scalable video transport over wireless networks by Prof. Dapeng Oliver Wu, University of Florida, USA

14:40 - 15:20 Session 3: Multimedia services

15:20 - 15:40 Coffee break

15:40 - 17:00 Session 4: Face, Visualization and GIS applications

17:00 Closing



Robust 1-D barcode recognition on camera phones and mobile product information display
S.Wachenfeld, S.Terlunen and X.Jiang
Abstract: In this paper we present a robust algorithm for the recognition of 1-D barcodes using camera phones. The recognition algorithm is highly robust to typical image distortions and was tested on a database of barcode images that covers such distortions, including inhomogeneous illumination, reflections, and blurriness due to camera movement. We present results from experiments with over 1,000 images from this database using a Matlab implementation of our algorithm, as well as experiments on the go, where a Symbian C++ implementation running on a camera phone is used to recognize barcodes in daily life situations. The proposed algorithm shows close to 100% accuracy in real-life situations and yields very good resolution-dependent performance on our database, ranging from 90.5% (640x480) up to 99.2% (2592x1944). The database is freely available to other researchers. We also briefly present MobilePID, an application for mobile product information display on web-enabled camera phones. MobilePID uses product information services on the internet or locally stored on-device data.




A fragile watermark for JPEG pictures on cell phones equipped with cameras
A.Katayama, R.Kitahara, H.Kawamura and H.Koike
Abstract: With the aim of creating practical applications based on fragile watermarking, we have proposed an algorithm suitable for JPEG pictures. We implemented this algorithm on a real camera-equipped cell phone and evaluated its detection capabilities and processing speed.




Sensing expressive lips with a mobile phone
S.Ur Rehman and L.Liu
Abstract: Considering the potential benefits of vibrations in mobile phones, we propose an intuitive method to render emotions for visually impaired persons. A mobile phone is "synchronized" with emotion information based on human lip dynamics. By holding the mobile phone, the subject will be able to get on-line emotion information of others. Experimental results based on a usability evaluation of the system are encouraging. The user studies also show perfect pattern recognition accuracy for the designed vibration patterns.




Error correction via one key operation for mobile phone speech recognition
Z.Zhang, Y.Nakashima and N.Naka
Abstract: We propose a highly practical error correction method for mobile phones in the framework of Distributed Speech Recognition (DSR). The only user operation needed is one key input to specify the start of error words. The ends of error words are detected automatically; the part of speech feature corresponding to the error words is then extracted and re-recognition is performed in the mobile device. We investigate three methods for detecting the end of error words: pause position, an interval-based method and a confidence score-based method. The proposed method has the following advantages: (1) the user needs to make only a single key input to specify the error, so practicality is high; (2) the corrected result is displayed to the user immediately after the key input, so the response is very fast. Our experiment shows that the proposed error correction method reduces speech recognition errors by 63.0%.




Camera phone based tools for the visually impaired
X.Liu, D.Doermann and H.Li
Abstract: In this paper we describe a set of applications that utilize a camera phone to help the visually impaired with daily tasks. Our "MobileEye" software suite turns a camera-enabled mobile device into a multi-purpose vision tool that helps individuals with visual disabilities. MobileEye consists of four subsystems for different types of visual disabilities. A color channel mapper helps distinguish colors, a software magnifier helps low-vision individuals see detail, a pattern recognizer helps the severely impaired recognize objects and a document retriever provides access to printed materials. We apply cutting-edge computer vision and image processing technologies on the mobile devices and tackle the challenges of limited computational resources and low image quality. We also consider usability for the visually impaired, so our system requires minimal keyboard operation. We provide a full software solution which runs on Symbian and Windows Mobile handsets. This paper provides a high-level overview of the system.




MuZeeker: a domain specific Wikipedia-based search engine
S.Halling, M.Sigurosson, J.Eg Larsen, S.Knudsen and L.Hansen
Abstract: We describe MuZeeker, a search engine with domain knowledge based on Wikipedia. MuZeeker enables the user to refine a search in multiple steps by means of category selection. In the present version we focus on multimedia search related to music and we present two prototype search applications (web-based and mobile). A category-based filtering approach enables the user to refine a search through relevance feedback by category selection instead of typing additional text, which was found to be an advantage in the mobile MuZeeker application. We report on a usability evaluation using the think-aloud protocol, in which N=10 participants performed tasks using MuZeeker and a customized Google search engine, respectively. The experiment gave initial indications that participants were capable of solving tasks slightly better using MuZeeker, while the "inexperienced" MuZeeker users performed slightly slower than experienced Google users. 75% of the participants reported a subjective preference for MuZeeker.




Evaluating the adaptation of multimedia services using a constraints-based approach
J.M.Oliveira and E.M.Carrapatoso
Abstract: This paper presents a proposal to solve the problem of the adaptation of multimedia services in mobile contexts. The paper combines context-awareness techniques with user interface modeling to dynamically adapt telecommunications services to user resources, in terms of terminal and network conditions. The solution is mainly characterized by the approach used for resolving the existing dependencies among user interface variables, which is based on the constraints theory. The experiments and tests carried out with these techniques demonstrate a general improvement of the adaptation of multimedia services in mobile environments, in comparison to systems that do not dynamically integrate the user context information in the adaptation process.




Hybrid Layered video encoding for mobile internet-based multimedia
S.Bhandarkar, S.Chattopadhyay and S.S.Garlapati
Abstract: The increasing deployment of broadband networks and the simultaneous proliferation of low-cost video capturing and multimedia-enabled mobile devices have triggered a new wave of mobile multimedia applications on the Internet. However, mobile networked environments are typically resource constrained in terms of the available bandwidth and battery capacity on mobile devices. Computer vision applications that typically entail analysis, transmission, storage and rendering of video data are resource-intensive. Since the available bandwidth in the mobile Internet is constantly changing and the battery life of a mobile video capturing and rendering device decreases with time, it is desirable to have a video representation scheme that adapts dynamically to the available resources. A Hybrid Layered Video (HLV) encoding scheme is proposed, which comprises content-aware, multi-layer encoding of texture and a generative sketch-based representation of the object outlines. Different combinations of the texture- and sketch-based representations result in distinct video states, each with a characteristic bandwidth and power consumption profile. The proposed HLV encoding scheme is shown to be effective for mobile Internet-based multimedia applications such as background subtraction, face detection, face tracking and face recognition on resource-constrained mobile devices.




Face detection in power constrained distributed environments
G.Tsagkatakis and A.Savakis
Abstract: Image classification algorithms, such as face detection, are primarily concerned with improving accuracy, but they typically assume availability of unlimited resources in terms of power and bandwidth. Recent developments in mobile computing, especially in the fields of smart cameras and wireless video sensor networks, point to the challenge of achieving high classifier performance in resource constrained environments. This paper explores the effects of rate and power on the accuracy of a face detection classifier. In particular, JPEG, JPEG2000, and H.264/MPEG-4 AVC compression schemes are utilized to assess the detection accuracy of a support vector machine classifier in the context of face detection. Results suggest that in power constrained environments a minimum rate threshold can be identified where power consumption is minimized without sacrificing classifier performance. Furthermore, our analysis provides guidelines on how performance is reduced in situations where it is necessary to operate below the minimum rate threshold.




Video personalization and caching for resource constrained environments
S.Chattopadhyay, S.Bhandarkar, Y.Wei and L.Ramaswam
Abstract: Video playback on mobile devices such as PDAs, laptop PCs, pocket PCs and cell phones is becoming increasingly popular. Since mobile devices are typically constrained by their battery capacity, bandwidth, screen resolution and decoding and rendering capability, various video personalization strategies are used to provide these resource-constrained devices with personalized video content that is most relevant to the client's request. Since mobile clients are typically not within network proximity of the personalizing server, it is often desirable to "smartly" cache portions of the video files in order to reduce the latency observed at the client end, and also to offload the data load on the server to local caches. In this paper, we propose a novel video personalization server and cache architecture, which can disseminate personalized video to multiple power-constrained clients efficiently. The video personalization server uses an automatic video segmentation and video indexing scheme based on semantic video content, and generates personalized videos based on client preference. A novel cache design, together with a novel cache replacement algorithm specifically suited for caching files for the proposed video personalizing servers, is proposed. The proposed cache performs considerably better than several state-of-the-art cache replacement policies. Thus, the proposed video-cache architecture is well suited for personalized video dissemination, with low wait time and latency, to power-constrained mobile devices such as mobile phones, PDAs and other mobile devices with multimedia capabilities.




Registration and augmentation of partial images of city maps using camera phones
S.Wachenfeld, K.Broelemann, X.Jiang and A.Krueger
Abstract: In this paper, we present a novel approach for the registration and augmentation of images of city maps. Our algorithm enables camera phones to capture images of paper-based city maps and to augment the images with additional information. This can be used to display information of personal interest, like locations of cash machines or WLAN hot spots. Until now, special markers, regular dot grids on the map, or image-based feature descriptors have been used, which require special maps. In this paper we present an algorithm which is based on geometric hashing and which uses the map's topology. This makes our algorithm independent of markers and allows the augmentation of city maps of various kinds. The image registration, which is required for the augmentation step, is translation, scale and rotation invariant, map type independent and robust against noise and missing data.




Mobile surveillance: video analytics from and to mobile devices
R.Cucchiara and G.Gualdi
Abstract: Mobile (video) surveillance addresses hardware and software systems for acquisition, analysis, storage and display of video streams and extracted information where acquisition sources and/or end users' devices are mobile. This new architecture model calls for new solutions in streaming and processing. For the former aspect, low-bandwidth channels must be exploited with an ad-hoc streaming process with low latency, low frame skipping and high image quality; the latter aspect relies on new detection-by-tracking algorithms which cannot exploit the assumption of a fixed background. This paper covers these aspects, describing some solutions proposed at ImageLab in Modena under the FREE SURF Project.




NatureFace: A cartoon face producer for mobile content service
Y.Liu, Y.Su and Z.Wu
Abstract: The ability to reproduce a specific face cartoon is an important multimedia content service. We focus on a low-bit facial expression producer for facial characters and its key techniques, such as efficient face modeling and low-bit face cartoon rendering. A novel three-layer face model is proposed for generating a personalized face sketch and cartoon. NatureFace, a low-cost digital content producer, is implemented; it can create a face cartoon from a photograph without human interaction.




MPEG-4 3D graphics for mobile phones
I.Arsov, M.Preda and F.Preteux
Abstract: In this paper we present a first implementation of an MPEG-4-based solution for visualizing 3D graphics data on mobile phones. We analyze the player performance with respect to decoding time and rendering frame rate. Beyond visualizing 3D data from MPEG-4 files, we extend the player to visualize complex content such as that found in games. The main idea consists of using a remote server for the game logic and updating the local 3D graphics scene by using standard commands. Thus, we demonstrate that by using a standard player (that in the near future can be natively built into the mobile device) it is possible to address in an optimized manner both visualization of 3D data and mobile games.




A recommender handoff framework with DVB-H support on a mobile device
M.Ma, C.Zhu, C.Tang, G.-R.Chang, J.Zhu and Q.An
Abstract: Due to the growth of the mobile phone market and the emergence of mobile TV standards, there is a need to provide consumers with an electronic programming guide (EPG) on a mobile device. To reduce the information load on a mobile device, an EPG recommender framework is designed. The framework provides a flexible architecture for different environments, where the recommender core can work stand-alone on a mobile device or switch to a hybrid mode utilizing the computing resources on a home network. In our prototype, a simulated hybrid framework with automatic recommender handoff is built for a DVB-H environment.
