Interactive Video Search and Browsing Systems: MediaPick
Daniele will present two interactive systems for video search and browsing: a rich internet application designed to achieve the responsiveness and interactivity typical of a desktop application, and a system that exploits multi-touch devices to implement a multi-user collaborative application. Both systems use the same ontology-based video search engine, which is capable of expanding user queries through ontology reasoning and lets users either search for specific video segments that contain a semantic concept or browse the content of video collections when a specific query is too difficult to express.
Andromeda - VidiVideo graph-based video browsing
Video plays a key role in news, cultural heritage documentaries and surveillance, and it is a natural form of communication for the Internet and mobile devices. The massive increase in digital audio-visual information places high demands on advanced storage and search engines for both consumer and professional archives.
Video search engines are the product of progress in many technologies: visual and audio analysis, machine learning techniques, as well as visualization and interaction. At present, state-of-the-art systems can automatically annotate only a limited set of semantic concepts, and retrieval is possible only through a keyword-based approach tied to a fixed lexicon.
The VidiVideo project takes on the challenge of creating a substantially enhanced semantic access to video, implemented in a search engine.
The outcome of the project is an audio-visual search engine composed of two parts: an automatic annotation part, which runs off-line and uses detectors for more than 1000 semantic concepts, collected in a thesaurus, to process and automatically annotate the video; and an interactive part that provides a video search engine for both technical and non-technical users.
The automatic annotation part of the system performs audio and video segmentation, speech recognition, speaker clustering and semantic concept detection.
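These stages form a sequential off-line pipeline. The sketch below shows one plausible way to chain them; the stage functions are illustrative stubs with dummy return values, not the actual VidiVideo code.

# Minimal sketch of the off-line annotation pipeline described above.
# All stage functions are illustrative stubs, not the VidiVideo API.

def segment_video(path):            # audio/video segmentation
    return [(0.0, 12.3), (12.3, 47.8)]     # dummy shot boundaries (seconds)

def recognize_speech(path):         # speech recognition
    return {(0.0, 12.3): "goal scored by ..."}

def cluster_speakers(transcript):   # speaker clustering
    return {(0.0, 12.3): "speaker_1"}

def detect_concepts(shot):          # semantic concept detection (>1000 detectors)
    return {"soccer": 0.91, "crowd": 0.64}

def annotate(path):
    transcript = recognize_speech(path)
    speakers = cluster_speakers(transcript)
    annotations = []
    for shot in segment_video(path):
        annotations.append({
            "shot": shot,
            "concepts": detect_concepts(shot),
            "speech": transcript.get(shot, ""),
            "speaker": speakers.get(shot),
        })
    return annotations

print(annotate("match.mpg"))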
The VidiVideo system has achieved the highest performance in the most important international object and concept recognition contests (PASCAL VOC and TRECVID).
The interactive part provides two applications: a desktop-based and a web-based search engine. The system permits different query modalities (free text, natural language, graphical composition of concepts using boolean and temporal relations, and query by visual example) and visualizations, resulting in an advanced tool for the retrieval and exploration of video archives by both technical and non-technical users in different application fields. In addition, the use of ontologies (instead of simple keywords) makes it possible to exploit semantic relations between concepts through reasoning, expanding user queries.
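To illustrate the idea of ontology-driven expansion, the sketch below expands a query concept to everything it subsumes in a toy hierarchy; the concept names and structure are invented for the example and are not taken from the VidiVideo ontology.

# Toy concept hierarchy (invented for illustration); in VidiVideo the
# subsumption relations come from an ontology traversed by a reasoner.
SUBCONCEPTS = {
    "vehicle": ["car", "truck", "boat"],
    "car": ["taxi", "police_car"],
}

def expand_query(concept):
    """Return the concept plus every concept subsumed by it."""
    expanded = [concept]
    for child in SUBCONCEPTS.get(concept, []):
        expanded.extend(expand_query(child))
    return expanded

print(expand_query("vehicle"))
# ['vehicle', 'car', 'taxi', 'police_car', 'truck', 'boat']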
The off-line annotation part has been implemented in C++ on the Linux platform, and takes advantage of the low-cost processing power provided by GPUs on consumer graphics cards.
The web-based system is based on the Rich Internet Application paradigm, using a client side Flash virtual machine. RIAs can avoid the usual slow and synchronous loop for user interactions. This allows to implement a visual querying mechanism that exhibits a look and feel approaching that of a desktop environment, with the fast response that is expected by users. The search results are in RSS 2.0 XML format, while videos are streamed using the RTMP protocol.
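Because results arrive as an RSS 2.0 feed, any standard XML tooling can consume them. A minimal sketch with Python's standard library follows; the item fields shown are the core RSS 2.0 elements, not the engine's exact schema.

# Parse a search-result feed; only core RSS 2.0 elements are assumed.
import xml.etree.ElementTree as ET

def parse_results(rss_text):
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        yield {
            "title": item.findtext("title"),
            "link": item.findtext("link"),   # e.g. an rtmp:// stream URL
            "description": item.findtext("description"),
        }

feed = """<rss version="2.0"><channel>
  <item><title>shot 42</title>
    <link>rtmp://server/vod/clip42</link>
    <description>concept: airplane (0.87)</description></item>
</channel></rss>"""

for result in parse_results(feed):
    print(result)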
Automatic trademark detection and recognition in sports videos
The availability of measures of the on-screen appearance of trademarks and logos in a video is important in the fields of marketing and sponsoring. Sponsors can use these statistics to estimate the number of TV viewers who noticed them and thus evaluate the effects of the sponsorship. The goal of this project is to create a semi-automatic system for the detection, tracking and recognition of pre-defined brands and trademarks in broadcast television. The number of appearances of a logo, together with its position, size and duration, will be recorded to derive indexes and statistics that can be used for marketing analysis.
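The sketch below shows the kind of appearance record and aggregate index such a system might keep; the fields are a plausible reading of the statistics listed above, not the project's actual data model.

# Plausible appearance record for a detected logo; the fields follow
# the statistics listed above, not the project's actual data model.
from dataclasses import dataclass

@dataclass
class LogoAppearance:
    brand: str
    start: float           # seconds from video start
    end: float
    area_fraction: float   # fraction of the frame covered by the logo

def airtime_by_brand(appearances):
    """Total on-screen seconds per brand, a basic sponsorship index."""
    totals = {}
    for a in appearances:
        totals[a.brand] = totals.get(a.brand, 0.0) + (a.end - a.start)
    return totals

hits = [LogoAppearance("acme", 10.0, 14.5, 0.02),
        LogoAppearance("acme", 80.0, 83.0, 0.05)]
print(airtime_by_brand(hits))   # {'acme': 7.5}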
To obtain a technique that is sufficiently robust to partial occlusions and deformations, we use local neighborhood descriptors of salient points (SIFT features) as a compact representation of the important aspects and local texture of trademarks. By combining the results of local point-based matching we are able to detect and recognize entire trademarks. Whether a video frame contains a reference trademark is determined by thresholding the normalized match score, i.e. the fraction of the trademark's SIFT points that have been matched to the frame. Finally, we compute a robust estimate of the matched point cloud in order to localize the trademark and approximate its area.
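A condensed sketch of this matching scheme using OpenCV follows; the threshold values are placeholders, and the median-based localization is one possible robust estimate, not necessarily the one used in the original system.

# SIFT-based trademark matching, condensed; thresholds are placeholders.
import cv2
import numpy as np

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher()

def match_trademark(logo_gray, frame_gray, score_thresh=0.15, ratio=0.75):
    kp_l, des_l = sift.detectAndCompute(logo_gray, None)
    kp_f, des_f = sift.detectAndCompute(frame_gray, None)
    if des_l is None or des_f is None:
        return None
    # Lowe's ratio test on 2-nearest-neighbour matches
    good = []
    for pair in matcher.knnMatch(des_l, des_f, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    # Normalized match score: fraction of the logo's points matched
    score = len(good) / len(kp_l)
    if score < score_thresh:
        return None
    # Localize via a robust (median) estimate of the matched point cloud
    pts = np.float32([kp_f[m.trainIdx].pt for m in good])
    center = np.median(pts, axis=0)
    spread = np.median(np.abs(pts - center), axis=0)  # rough half-extent
    return score, center, spread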