Scale Invariant 3D Multi-Person Tracking with a PTZ camera

This research aims to realize a video surveillance system for real-time 3D tracking of multiple people moving over an extended area, as seen from a rotating and zooming camera. The proposed method exploits multi-view image matching techniques to obtain a dynamic calibration of the camera and to track many ground targets simultaneously, slewing the video sensor from target to target and zooming in and out as necessary.

Scale Invariant 3D Multi-Person Tracking with a PTZ camera

The image-to-world relation obtained through dynamic calibration is further exploited to infer target scale from the focal length value, and to achieve robust tracking with scale-invariant template matching and joint data-association techniques. We achieve an almost constant standard deviation error of less than 0.3 meters in recovering the 3D trajectories of multiple moving targets over an area of 70×15 meters.
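As a rough illustration of the scale inference step, the sketch below (Python with OpenCV) uses the pinhole model to predict the apparent height in pixels of a target of assumed physical height, given the current focal length and the ground distance recovered from the image-to-world mapping. The helper names and the nominal 1.8 m target height are illustrative assumptions, not part of the published method:

```python
import cv2
import numpy as np

def expected_height_px(focal_px: float, distance_m: float,
                       person_height_m: float = 1.8) -> float:
    """Pinhole-model scale inference: the apparent height in pixels of a
    target of known physical height at a given ground distance."""
    return focal_px * person_height_m / distance_m

def rescale_template(template: np.ndarray, focal_px: float,
                     distance_m: float) -> np.ndarray:
    """Resize an appearance template to the scale predicted for the
    current zoom level, so that matching stays scale invariant."""
    factor = expected_height_px(focal_px, distance_m) / template.shape[0]
    return cv2.resize(template, None, fx=factor, fy=factor)
```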


This general framework will support the future development of a sensor resource manager component that schedules camera pan, tilt and zoom; supports kinematic tracking, multiple-target track association, scene context modeling, confirmatory identification and collateral damage avoidance; and, in general, enhances multiple target tracking in PTZ camera networks.

The MICC at the EUscreen Open Workshop on metadata schemes and content selection policies

The first EUscreen Open Workshop on metadata schemes and content selection policies was held in Mykonos, Greece, on June 23 and 24, and was attended by more than 60 participants. The two days featured presentations on the development of European projects, including the European Film Gateway (EFG), WeKnowIT, NoTube and PrestoPRIME, and on the activities of major research groups such as the European Broadcasting Union (EBU), the World Wide Web Consortium (W3C) and the European Data Model Working Group (EDM Group).

EUscreen Open Workshop on metadata schemes and content selection policies

Marco Bertini, assistant professor at the Media Integration and Communication Center of the University of Florence, reported on the activities of two European projects that involve our center, VidiVideo and IM3I.

The VidiVideo project aims to develop and integrate components for machine learning, audio event detection and video processing into an audiovisual search engine that exploits information from several sources: automatic and manual annotations of keywords and metadata, audio and visual data, speech, and explicit knowledge.

IM3I aims to develop new methods for searching and creatively visualizing large amounts of multimedia information. IM3I provides a service-oriented architecture that allows different views of multimedia data and makes it easier to interact with and share multimedia content.

Marco presented a method, used in both projects, for adding metadata to multimedia material through the automatic annotation of concept occurrences.

Within the VidiVideo project, classifiers were trained on 1000 concepts and were able to extract metadata from audio and video material. This type of metadata extraction allows a user to search for and visualize exactly those video frame sequences where a particular concept was detected. The technique offers high performance, but has some scalability problems.

The IM3I project is trying to reduce the execution time by using a very limited set of concepts and creating less fine-grained metadata. In his presentation, Marco also outlined the current debate among archivists on metadata extraction, notably time versus quality.

The MICC at Internet Better Life

Gianpaolo D’Amico and Nicola Torpei, researchers at the Media Integration and Communication Center, take part in Internet Better Life, an event dedicated to the web, social media and the world of digital communications: two days of reflection and debate on new forms of participation and on the effects of the cultural revolution we are living through.

Internet Better Life

The event is organized by Fondazione Sistema Toscana, Regione Toscana and intoscana.it in collaboration with Google, Augmendy, Sisifo, Novamont, APT and Florence Promhotels.

The theme of the second edition is “The Internet Better Life”: a debate on how the Internet and Web 2.0 can help improve the lives of individuals and convey different, richer knowledge, changing relationships between people and effectively transforming social action through an extended and participatory approach.

Gianpaolo D'Amico and Nicola Torpei

On the morning of June 29, Gianpaolo D’Amico speaks at the workshop “Internet Better Business”, held in cooperation with ToscanaIN: an interactive session coordinated by Laura De Benedetto, Suzi Jenkins, Tommaso Olivieri and Alessandro Sordi. The workshop focuses on whether and how the Internet has improved the lives of employers and employees, and on future trends in web advertising.

In the afternoon, Nicola Torpei has his five minutes at Ignite Better Life (“five minutes to tell us how the Internet has improved their lives or how we could improve ours”), presenting “The City of Knowledge: ShareDesk. An advanced solution for multimedia enjoyment through natural interaction”.

Articulated human motion tracking with latent spaces and evolutionary search

This talk presents our research on multiview, articulated human motion tracking: from its origins in pose estimation for immersive communications, through its evolution to full-body, model-free tracking using evolutionary search, to our current system. In the latter, we capture synchronized sequences of single-person activities (e.g., walking, kicking, punching) in our 10-camera, green-background studio.

Articulated human motion tracking with latent spaces and evolutionary search

Individual frames are segmented and the silhouettes represented with shape contexts. The silhouette representations, computed for the whole sequence, are converted into a low-dimensional latent space by charting, a dimensionality reduction technique not previously used for human motion tracking.

A supervised training phase learns a manifold in latent space for each action (the action model). Generative tracking then takes place in the latent space: pose hypotheses are evaluated without expensive backprojection to 3-D space, avoiding the costly generation of synthetic silhouettes; instead, a mapping between latent and silhouette space is learnt off-line for each action modelled. Results indicate state-of-the-art performance for the actions tested, at very modest computational cost compared with similar systems.
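The following sketch conveys the idea of evaluating pose hypotheses directly in latent space. Charting and the learnt latent-to-silhouette regression are stood in for by a placeholder linear map, so this is a minimal illustration of the evaluation scheme, not the actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for quantities learnt off-line per action: a linear map
# replaces the learnt latent -> silhouette-descriptor regression, and
# charting is not reimplemented here (both are assumptions).
LATENT_DIM, DESC_DIM = 3, 60
latent_to_desc = rng.normal(size=(DESC_DIM, LATENT_DIM))

def predict_descriptor(z):
    """Map a latent pose hypothesis to a silhouette descriptor without
    rendering a synthetic silhouette."""
    return latent_to_desc @ z

def score_hypotheses(Z, observed_desc, sigma=1.0):
    """Weight latent hypotheses by their agreement with the observed
    shape-context descriptor."""
    diff = Z @ latent_to_desc.T - observed_desc
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * sigma ** 2))

# One generative tracking step: perturb the previous estimate in latent
# space, score the hypotheses, and take their weighted mean.
z_prev = np.zeros(LATENT_DIM)
Z = z_prev + 0.1 * rng.normal(size=(200, LATENT_DIM))
observed = predict_descriptor(np.array([0.2, -0.1, 0.05]))
w = score_hypotheses(Z, observed)
z_est = (w[:, None] * Z).sum(axis=0) / w.sum()
```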

Current investigations include on-line action recognition and applications to clinical rehabilitation. Key contributors to the research described were Spela Ivekovic, Vijay John, and Craig Robertson.

Optimal face detection and tracking

The project’s goal is to develop a reliable face detector and tracker for indoor video surveillance. The problem we have been asked to deal with is providing good-quality face images of people entering restricted areas. These images are then used for face recognition, and the face recognition system provides feedback stating whether the person has been recognized or not. The nature of the problem makes it very important to keep tracking a person for as long as he is visible on the image plane, even if he has already been recognized: this prevents the system from raising repeated, multiple alarms for the same person.

Optimal face detection and tracking

In other words, what we aim to obtain is:

  • a reliable detector that can be used to start the tracker: the detector must be sensitive, in order to start the tracker as soon as possible when an intruder enters the supervised environment;
  • an efficient and robust tracker, able to follow the intruder without losing him until he leaves the supervised environment: as stated before, it is important that no repeated, multiple alarms be generated from the same track, both to reduce computational cost and to reduce false positives;
  • a fast and reliable face detector to extract face images of the tracked person: the face detector must be reliable in order to provide ‘good’ face images of the target; what ‘good’ means depends on the face recognition system, but usually the image has to be at the highest achievable resolution and well focused, and the face has to be as frontal as possible;
  • a method to assess whether the tracker has lost the target or is still tracking it correctly (a ‘stop criterion’): it is important to detect situations in which the tracker has lost the target, because they may require some special action.

At this time, we use a face detector based on the Viola-Jones algorithm to initialize a particle filter-based tracker that relies on a histogram-based appearance model. The particle filter accuracy is greatly improved thanks to the strong measurements provided by the face detector.
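A minimal sketch of this combination, assuming OpenCV’s bundled Haar cascade as the Viola-Jones detector and a hue histogram as the appearance model (the particle filter is reduced here to its appearance likelihood):

```python
import cv2
import numpy as np

# Viola-Jones face detector from OpenCV's bundled Haar cascades.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

def hue_histogram(frame, box):
    """Histogram-based appearance model: normalized hue histogram of the
    region inside box = (x, y, w, h)."""
    x, y, w, h = box
    hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [32], [0, 180])
    return cv2.normalize(hist, hist).flatten()

def appearance_likelihood(frame, particle_box, ref_hist):
    """Particle weight from histogram similarity (Bhattacharyya distance)."""
    d = cv2.compareHist(hue_histogram(frame, particle_box), ref_hist,
                        cv2.HISTCMP_BHATTACHARYYA)
    return float(np.exp(-d ** 2 / 0.1))
```

A detection from `detect_faces` would seed the particle set and its reference histogram; subsequent detections act as the strong measurements that keep the filter accurate.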

To provide a reasonably small number of face images to the face recognition system, a method to evaluate the quality of the captured images is needed. We take image resolution and symmetry into account, in order to store only those images that give increasing quality for each detected person.
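The exact scoring function is not detailed here, but a heuristic along the following lines, combining a resolution term with a left/right symmetry term, conveys the idea (the reference size and the product form are illustrative assumptions):

```python
import cv2
import numpy as np

def face_quality(face_img, ref_size=100):
    """Heuristic quality score in [0, 1] combining resolution and
    left/right symmetry (an assumed form, not the system's exact measure)."""
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Resolution term: saturates once the face reaches ref_size pixels.
    resolution = min(1.0, min(gray.shape) / ref_size)
    # Symmetry term: a frontal face matches its mirrored right half.
    half = gray.shape[1] // 2
    left, right = gray[:, :half], gray[:, -half:][:, ::-1]
    symmetry = 1.0 - float(np.abs(left - right).mean()) / 255.0
    return resolution * symmetry

def keep_if_better(best, face_img):
    """Store a new face image only if it improves on the best so far."""
    q = face_quality(face_img)
    return (q, face_img) if best is None or q > best[0] else best
```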

Below are a few sample videos, each with the face sequence grabbed from it. The faces are ordered by the system according to their quality (increasing from left to right).

On top of face tracking it is quite easy to build a face obfuscation application, although its requirements may conflict slightly with those of face logging. The following video shows an example:

Particle filter-based visual tracking

The project’s goal is to develop a computationally efficient, robust, real-time particle filter-based visual tracker. In particular, we aim to increase the robustness of the tracker when it is used in conjunction with a weak (but computationally efficient) appearance model, such as color histograms. To achieve this, we have proposed an adaptive parameter estimation method that estimates the statistical parameters of the particle filter on-line, so that the uncertainty in the filter can be increased or reduced depending on a measure of its performance (tracking quality).
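A minimal sketch of the adaptation loop, under illustrative assumptions: tracking quality is summarized by a single score (e.g., the mean particle likelihood), and the motion-model noise is widened or tightened accordingly. The thresholds and rates below are placeholders, not the values estimated by the published method:

```python
import numpy as np

def adapt_process_noise(sigma, quality, q_lo=0.2, q_hi=0.6,
                        sigma_min=1.0, sigma_max=20.0, rate=1.2):
    """On-line adaptation of the motion-model noise: widen the search
    when tracking quality drops, tighten it when tracking is good."""
    if quality < q_lo:        # likely occlusion or erratic motion
        sigma *= rate
    elif quality > q_hi:      # confident tracking: reduce uncertainty
        sigma /= rate
    return float(np.clip(sigma, sigma_min, sigma_max))

def propagate(particles, sigma, rng=np.random.default_rng()):
    """Diffuse the particle states (x, y) with the current noise level."""
    return particles + rng.normal(scale=sigma, size=particles.shape)
```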

Particle filter based visual tracking

The method has proved effective in dramatically increasing the robustness of a particle filter-based tracker in situations that are usually critical for visual tracking, such as in the presence of occlusions and highly erratic motion.

The data set we used is now available for download, together with ground truth data, so that others can test their trackers on it and compare performance.

It consists of 10 video sequences showing a remote-controlled toy car (a Ferrari F40) filmed from two different points of view: ground floor or ceiling. The sequences are provided in MJPEG format, together with text files (one per sequence) containing the ground truth data (position and size of the target’s bounding box) for each frame. Below you can see an example of the ground truth provided with our data set (sequence #10):
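Assuming one whitespace-separated line per frame, with the box given as x y w h (the actual column layout of the files may differ), the ground truth can be loaded as simply as:

```python
import numpy as np

def load_ground_truth(path):
    """Per-frame bounding boxes (x, y, w, h) from a ground-truth file."""
    return np.loadtxt(path).reshape(-1, 4)
```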

We have tested the performance of the resulting tracker on the sequences of our data set, comparing the segmentation provided by the tracker with the ground truth data. Quantitative measures of this performance are reported in the literature. Below we show a few videos that demonstrate the tracker’s capabilities.
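A standard way to score a tracker box against the corresponding ground truth box is the intersection-over-union overlap, sketched below; this is illustrative, and not necessarily the exact measure used in our evaluation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)
```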

This is an example of tracking on sequence #9 of the data set:

An example of tracking humans outdoors with a PTZ camera. In this video (not in the data set) the camera was steered by the tracker: this is thus active tracking, and it shows that the method can be applied to PTZ cameras, since it does not use any background modeling technique:

IM3I: immersive multimedia interfaces

The IM3I project addresses the needs of a new generation of the media and communication industry, which must confront not only changing technologies but also radical changes in media consumption behaviour. IM3I will enable new ways of accessing and presenting media content to users, and new ways for users to interact with services, offering a natural and transparent way to deal with the complexities of interaction while hiding them from the user.

Daphnis: IM3I multimedia content based retrieval interface

With the explosion in the volume of digital content being generated, there is a pressing need for highly customisable interfaces, tailored to both user profiles and specific types of search. IM3I aims to provide the creative media sector with new ways of searching, summarising and visualising large multimedia archives. IM3I will provide a service-oriented architecture that allows multiple viewpoints on the multimedia data available in a repository, and better ways to interact with and share rich media. This paves the way for a multimedia information management platform that is more flexible, adaptable and customisable than current repository software, which in turn opens new opportunities for content owners to exploit their digital assets.

Andromeda demo at ACM Multimedia 2010 International Conference, Florence, Italy, October 25-29, 2010

Most of all, being designed according to a SOA paradigm, IM3I will also define an enabling technology capable of integrating into existing networks, supporting organisations and users in developing their content-related services.

Project website: http://www.im3i.eu/

VidiVideo: improving accessibility of videos

The VidiVideo project takes on the challenge of creating substantially enhanced semantic access to video, implemented in a search engine. The outcome of the project is an audio-visual search engine composed of two parts: an automatic annotation part that runs off-line, in which detectors for more than 1000 semantic concepts, collected in a thesaurus, process and automatically annotate the video; and an interactive part that provides a video search engine for both technical and non-technical users.

Andromeda - Vidivideo graph based video browsing

Video plays a key role in the news, cultural heritage documentaries and surveillance, and it is a natural form of communication for the Internet and mobile devices. The massive increase in digital audio-visual information poses high demands on advanced storage and search engines for consumers and professional archives.

Video search engines are the product of progress in many technologies: visual and audio analysis, machine learning techniques, as well as visualization and interaction. At present, state-of-the-art systems are able to automatically annotate only a limited set of semantic concepts, and retrieval is possible only through a keyword-based approach built on a lexicon.

The automatic annotation part of the system performs audio and video segmentation, speech recognition, speaker clustering and semantic concept detection.

The VidiVideo system has achieved the highest performance in the most important object and concept recognition international contests (PASCAL VOC and TRECVID).

The interactive part provides two applications: a desktop-based and a web-based search engine. The system permits different query modalities (free text, natural language, graphical composition of concepts using boolean and temporal relations, and query by visual example) and different visualizations, resulting in an advanced tool for the retrieval and exploration of video archives by both technical and non-technical users in different application fields. In addition, the use of ontologies (instead of simple keywords) makes it possible to exploit semantic relations between concepts through reasoning, extending user queries.

The off-line annotation part has been implemented in C++ on the Linux platform, and takes advantage of the low-cost processing power provided by GPUs on consumer graphics cards.

The web-based system follows the Rich Internet Application paradigm, using a client-side Flash virtual machine. RIAs avoid the usual slow, synchronous loop of user interactions, which makes it possible to implement a visual querying mechanism whose look and feel approaches that of a desktop environment, with the fast response users expect. The search results are returned in RSS 2.0 XML format, while videos are streamed using the RTMP protocol.
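Since results come back as standard RSS 2.0, any XML-capable client can consume them. The sketch below uses Python’s standard library; the feed URL is a placeholder, and any project-specific extension elements in the real feed are ignored:

```python
import urllib.request
import xml.etree.ElementTree as ET

def fetch_results(feed_url):
    """Parse an RSS 2.0 result feed into (title, link) pairs."""
    with urllib.request.urlopen(feed_url) as resp:
        root = ET.fromstring(resp.read())
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]
```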

Accurate Evaluation of HER-2 Amplification in FISH Images

Fluorescence in situ hybridization (FISH) is a cytogenetic technique used to detect and localize the presence or absence of specific DNA sequences on chromosomes. FISH uses fluorescent probes, each tagged with a different fluorophore, that bind to specific parts of a chromosome. Through multi-band fluorescence microscopy, the positions where the fluorescent probes bound to the chromosomes can be displayed, so as to derive information of clinical relevance from the presence and position of the probes.

Accurate Evaluation of HER-2 Amplification in FISH Images

A sample application of this technique is measuring the amplification of the HER-2 gene within the chromosomes, a valuable indicator of invasive breast carcinomas. It requires applying to a tumor tissue sample fluorescent probes that attach themselves to the HER-2 genes, in a process called hybridization. These probes carry a marker that emits light when they bind to the HER-2 genes, making them visible as green spots under a fluorescent microscope. Similarly, a different probe, carrying a marker that makes it visible as an orange spot under the microscope, is used to target the centromere of chromosome 17 (CEP-17). Measuring the ratio of HER-2 to CEP-17 dots within each nucleus, and then averaging this ratio over a representative number of cells, allows the estimation of HER-2 amplification.

In this research we present a system that supports accurate estimation of the ratio of HER-2 to CEP-17 dots in FISH images of breast tissue samples. Compared to previous work, the system incorporates a model that associates with each segmented nucleus a reliability score, estimating the confidence of the HER-2 to CEP-17 ratio measured within that nucleus. This makes it possible to compute the ratio using only nuclei with high reliability scores, yielding a measure of HER-2 amplification that conforms better to the pathologist’s evaluation than the ratio averaged over all available nuclei.
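The core computation can be summarized as follows: given per-nucleus dot counts and reliability scores, the amplification estimate is the mean HER-2/CEP-17 ratio over the nuclei whose score exceeds a threshold. The data layout and the threshold value below are illustrative assumptions:

```python
import numpy as np

def her2_amplification(her2_counts, cep17_counts, reliability,
                       min_score=0.8):
    """Mean HER-2/CEP-17 ratio over reliably segmented nuclei only."""
    her2 = np.asarray(her2_counts, dtype=float)
    cep17 = np.asarray(cep17_counts, dtype=float)
    score = np.asarray(reliability, dtype=float)
    keep = (score >= min_score) & (cep17 > 0)
    if not keep.any():
        raise ValueError("no nucleus passed the reliability threshold")
    return float((her2[keep] / cep17[keep]).mean())

# The low-reliability nucleus is excluded from the estimate; a ratio
# of 2.0 or more is the conventional clinical threshold for amplification.
print(her2_amplification([8, 9, 2], [2, 2, 2], [0.95, 0.9, 0.4]))
```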

Master in Multimedia Content Design

The Master in Multimedia Content Design of the University of Florence was instituted as a postgraduate specialization course in the 1998/1999 academic year, in collaboration with RAI Radiotelevisione Italiana and Mediateca Regionale Toscana. As of 2001/2002 it has become a first-level MA programme of the University of Florence, characterized by a specific experimental approach that combines technical and scientific skills with those of the Humanities.

Master in Multimedia Content Design

The programme, characterized by classroom lessons, case studies, experimental laboratory activities and project development, features two distinct specialization tracks:

  • Interactive Environments, focused on the design of applications and interactive multimedia environments;
  • Video Post-production, focused on the design of 3D animations and special effects, and on video post-production.

The aim is to train students in the use of these tools, to develop their awareness of and critical approach towards the new media, and to encourage creativity.

Please visit the official website of the Master in Multimedia Content Design.