Alberto Del Bimbo

Alberto Del Bimbo is Full Professor of Computer Engineering at the Università di Firenze, Italy. Since 1998 he has been the Director of the Master in Multimedia of the Università di Firenze. His scientific interests are pattern recognition, image databases, multimedia and human-computer interaction. Prof. Del Bimbo is the author of over 250 publications in leading international journals and conference proceedings.

The harmony potential: fusing local and global information for semantic image segmentation

Semantic image segmentation is the process of assigning semantically relevant labels to all pixels in an image. Hierarchical Conditional Random Fields (HCRFs) are a popular and successful approach to this problem. One reason for their popularity is their ability to incorporate contextual information at different scales. However, existing HCRF models do not allow multiple labels to be assigned to individual nodes. At higher scales in the image, this results in an oversimplified model, since multiple classes can reasonably be expected to appear within a single region. This simplification especially limits the impact that observations at larger scales can have on the CRF model. Neglecting the information at larger scales is undesirable, since class-label estimates based on these scales are more reliable than those at smaller, noisier scales.

In this talk I will discuss a new potential function, the harmony potential, for defining HCRF models of semantic image segmentation. The harmony potential can encode any possible combination of class labels at the global level, enabling it to make better-informed, fine discriminations at the lower levels. This representational capacity is also its primary weakness, as the optimization over all possible label combinations quickly becomes intractable for more than a few classes. To address this, we show how the harmony potential model admits an effective sampling strategy that makes the underlying optimization problem tractable. Experiments show that our approach obtains state-of-the-art results on two challenging datasets: Pascal VOC 2009 and MSRC-21. The approach described in this talk additionally won six gold medals in the Pascal VOC 2009 Segmentation Challenge.

Mobile Robot Path Tracking with uncalibrated cameras

This transfer project addresses the motion control problem of a wheeled mobile robot (WMR) as observed from uncalibrated ceiling cameras. We develop a method that localizes the robot in real time and drives it along a path in a large environment with a pure pursuit controller, achieving a cross-track error of less than 5 pixels. Experiments are reported for Ambrogio, a two-wheel differentially-driven mobile robot provided by Zucchetti Centro Sistemi.
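
As a rough illustration of the pure pursuit idea, the sketch below computes the curvature of the circular arc that joins the robot to a lookahead point on the path and converts it to wheel speeds for a differential drive. It is a minimal sketch, not the project's controller: the function names, the right-handed image-plane frame, and the `wheel_base` parameter are all assumptions made for this example.

```python
import math

def pure_pursuit_curvature(pose, target):
    """Curvature of the arc from the robot pose to a lookahead point.

    pose   = (x, y, theta): position and heading (radians), e.g. in pixels.
    target = (tx, ty): lookahead point on the path, in the same frame.
    Assumes a right-handed frame, so positive curvature turns left.
    """
    x, y, theta = pose
    dx, dy = target[0] - x, target[1] - y
    # Lateral offset of the target expressed in the robot frame.
    lateral = -math.sin(theta) * dx + math.cos(theta) * dy
    dist_sq = dx * dx + dy * dy
    if dist_sq == 0.0:
        return 0.0  # already on the target: no steering needed
    return 2.0 * lateral / dist_sq

def wheel_speeds(v, curvature, wheel_base):
    """(left, right) wheel speeds for forward speed v and a given curvature."""
    omega = v * curvature  # angular velocity
    return v - omega * wheel_base / 2.0, v + omega * wheel_base / 2.0
```

For example, a robot at the origin heading along the x axis with a lookahead point at `(1, 1)` gets curvature `1.0`, the circle of radius 1 through both points. With uncalibrated cameras, positions are in pixels, so the curvature is in inverse pixels and `wheel_base` would have to be expressed in the same unit.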

Wheeled Mobile Robot path follower in uncalibrated multiple camera environment

The video below shows the improvements in the motion control of a wheeled mobile robot (WMR) with a controller that uses an osculating circle:

Gianpaolo D’Amico

Gianpaolo D’Amico completed his PhD in 2004 with a thesis on distributed multimedia databases. He is currently a research fellow at the Visual Information and Media Lab at MICC and a member of the board of operations of the Master in Multimedia Content Design. His research work is in the fields of user experience, education, sound design and digital media. He is also the cofounder of the blog sounDesign.

Maxime Devanne

Maxime Devanne, visiting researcher at MICC

Maxime Devanne received his Engineering degree from Telecom Lille 1 in October 2012 with a thesis on “3D Human Body Acquisition and Modeling from Microsoft Kinect cameras”. He is currently a PhD student in a collaboration between the MIIRE research group of the University of Lille 1, France, and the Media Integration and Communication Center. His research interests are mainly focused on 3D videos, elastic shapes, human body motions, and their applications in computer vision, such as activity recognition.

We organize ACM Multimedia 2010

The Media Integration and Communication Center is organizing the ACM Multimedia 2010 International Conference.

ACM Multimedia 2010 International Conference

ACM Multimedia 2010 is the premier worldwide multimedia conference and a key event for presenting scientific achievements and innovative industrial products. The conference offers scientists and practitioners in the multimedia field plenary scientific and technical sessions, tutorials, panels, and discussion meetings on relevant and challenging questions for the coming years of multimedia. The Interactive Art program of ACM Multimedia 2010 will provide an opportunity for interaction between artists and computer scientists and for investigating the application of multimedia technologies to art and cultural heritage.

Svebor Karaman

Svebor Karaman received a master’s degree in computer engineering from the University of Bordeaux and an engineering diploma from the ENSEIRB in 2008. He obtained a Ph.D. in Computer Science from the University of Bordeaux in 2011, with a thesis entitled “Indexing of Activities in Wearable Videos : Application to Epidemiological Studies of Aged Dementia”. He joined the MICC (Media Integration and Communication Center) at the beginning of 2012 as a postdoctoral researcher.

His research interests focus on computer vision, semantic concept recognition in images and videos, multimedia information retrieval, and problems related to the use of wearable videos. During his PhD thesis, he worked on human activity recognition with Hidden Markov Models (HMMs) in videos recorded from a wearable device. He also proposed an object recognition approach in the Bag-of-Visual-Words framework that integrates spatial information within semi-local features: the Graph-Words.

At the MICC, he is highly involved in the Mnemosyne project.

Automatic trademark detection and recognition in sports videos

Measuring how often trademarks and logos appear in a video is important in the fields of marketing and sponsorship. Sponsors can use these statistics to estimate the number of TV viewers who noticed their brand and then evaluate the effects of the sponsorship. The goal of this project is to create a semi-automatic system for the detection, tracking and recognition of pre-defined brands and trademarks in broadcast television. The number of appearances of a logo, together with its position, size and duration, will be recorded to derive indexes and statistics that can be used for marketing analysis.

To obtain a technique that is sufficiently robust to partial occlusions and deformations, we use local neighborhood descriptors of salient points (SIFT features) as a compact representation of the important aspects and local texture in trademarks. By combining the results of local point-based matching we are able to detect and recognize entire trademarks. The determination of whether a video frame contains a reference trademark is made by thresholding the normalized-match score (the ratio of SIFT points of the trademark that have been matched to the frame). Finally, we compute a robust estimate of the point cloud in order to localize the trademark and to approximate its area.
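
The thresholding step described above can be sketched as follows. This is only an illustration under stated assumptions: descriptors are plain sequences of numbers, matching uses a nearest-neighbor ratio test (a common choice for SIFT, not confirmed by the text), and the `ratio` and `threshold` values and function names are hypothetical.

```python
import math

def normalized_match_score(trademark_desc, frame_desc, ratio=0.8):
    """Fraction of trademark descriptors that find a distinctive match in the frame."""
    if not trademark_desc or len(frame_desc) < 2:
        return 0.0
    matched = 0
    for d in trademark_desc:
        dists = sorted(math.dist(d, f) for f in frame_desc)
        # Ratio test (assumed): the best match must clearly beat the second best.
        if dists[0] < ratio * dists[1]:
            matched += 1
    return matched / len(trademark_desc)

def contains_trademark(trademark_desc, frame_desc, threshold=0.2):
    """Declare a detection when the normalized match score reaches the threshold."""
    return normalized_match_score(trademark_desc, frame_desc) >= threshold
```

Real SIFT descriptors have 128 dimensions and would normally be matched with an indexed nearest-neighbor search rather than the brute-force loop used here.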

Video event classification using bag-of-words and string kernels

The recognition of events in videos is a relevant and challenging task in automatic semantic video analysis. At present, one of the most successful frameworks for object recognition tasks is the bag-of-words (BoW) approach. However, it does not model the temporal information of the video stream. We are working on a novel method to introduce temporal information into the BoW approach by modeling a video clip as a sequence of histograms of visual features, computed from each frame using the traditional BoW model.

The sequences are treated as strings in which each histogram is considered a character. Because the length of a sequence depends on the length of the video clip, event classification of these variable-size sequences is performed using SVM classifiers with a string kernel (e.g., one based on the Needleman-Wunsch edit distance). Experimental results on two domains, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
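
One way to picture a string kernel over histogram sequences is sketched below. It is an illustration under assumptions, not the method used in the project: the substitution cost is half the L1 distance between normalized histograms, the gap cost is fixed, and the distance is turned into a similarity with an exponential. All names and parameter values are hypothetical.

```python
import math

def histogram_distance(h1, h2):
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))

def needleman_wunsch_distance(seq_a, seq_b, gap=1.0):
    """Global alignment cost between two sequences of histograms."""
    n, m = len(seq_a), len(seq_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap  # delete every frame of seq_a
    for j in range(1, m + 1):
        cost[0][j] = j * gap  # insert every frame of seq_b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + histogram_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + gap, cost[i][j - 1] + gap)
    return cost[n][m]

def string_kernel(seq_a, seq_b, gamma=1.0):
    """Similarity derived from the alignment cost (assumed exponential form)."""
    return math.exp(-gamma * needleman_wunsch_distance(seq_a, seq_b))
```

A Gram matrix built from `string_kernel` could be passed to an SVM that accepts precomputed kernels. Kernels derived from edit distances are not guaranteed to be positive semidefinite, so this should be checked on real data.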

Image forensics using SIFT features

Digital images play a basic role in many application scenarios, and it is often important to assess whether their content is genuine or has been manipulated to mislead the viewer. Image forensics tools provide answers to such questions. We are working on a novel method that focuses in particular on detecting whether a forged image has been created by cloning an area of the image onto another zone, either to duplicate something or to conceal something undesirable.

The proposed approach is based on SIFT features. It determines whether a copy-move attack has occurred and which image points are involved, and it also recovers the geometric transformation used to perform the cloning by estimating its parameters. When a copy-move attack takes place, an affine transformation is usually applied to the selected image patch so that it fits its target position in context. Our experimental results confirm that the technique localizes the attack precisely and that the transformation parameter estimation is highly reliable.
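
To make the parameter estimation concrete, the sketch below recovers an affine transformation from three point correspondences between the original region and its clone. The text does not say how the parameters are computed, so this is only an assumed minimal building block; with many SIFT matches one would typically fit all of them with a robust estimator. The function name and return layout are hypothetical.

```python
def affine_from_three_points(src, dst):
    """Solve x' = a*x + b*y + tx, y' = c*x + d*y + ty from three point pairs.

    src, dst: lists of three (x, y) tuples. Returns (a, b, c, d, tx, ty).
    """
    (x1, y1), (x2, y2), (x3, y3) = src
    (p1, q1), (p2, q2), (p3, q3) = dst
    # Edge vectors of the source and destination triangles.
    u1x, u1y, u2x, u2y = x2 - x1, y2 - y1, x3 - x1, y3 - y1
    v1x, v1y, v2x, v2y = p2 - p1, q2 - q1, p3 - p1, q3 - q1
    det = u1x * u2y - u2x * u1y
    if det == 0:
        raise ValueError("source points are collinear")
    # Linear part A maps source edges to destination edges: A = V * U^-1.
    a = (v1x * u2y - v2x * u1y) / det
    b = (v2x * u1x - v1x * u2x) / det
    c = (v1y * u2y - v2y * u1y) / det
    d = (v2y * u1x - v1y * u2x) / det
    # Translation follows from the first correspondence.
    tx = p1 - (a * x1 + b * y1)
    ty = q1 - (c * x1 + d * y1)
    return a, b, c, d, tx, ty
```

For instance, if the clone is the original patch scaled by 2 and shifted by `(5, -3)`, the function returns `a = d = 2`, `b = c = 0`, `tx = 5`, `ty = -3`.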