
Content-Based Media Retrieval


Academic year: 2023


(1)

Content-Based Media Retrieval

Kiss Attila

Department of Information Systems (Információs Rendszerek Tanszék), kiss@inf.elte.hu

(2)

Introduction

More and more multimedia content (images, audio, video) is being uploaded to the web.

Categorizing multimedia content and describing it in text

– would require an enormous amount of human work,

– and the descriptions would not necessarily be accurate enough.

Content-based search engines

– represent multimedia data in a multidimensional space, based on the values of their features. They then solve classification and pattern-matching tasks.

– Typically, we search for images similar to a given image. It is also possible to search by an audio sample or by humming.

Applications:

– for example, comparing X-ray images: similar images help in forming a diagnosis.

(3)

An Image Retrieval Example (Viper)

The query input.

(4)

An Image Retrieval Example (Viper)

(5)

User feedback.

(6)

Refined results. Better?

(7)

Another query for paintings.

(8)

Painting Search Result

The shortlist returned from the search.

(9)

Content-based search

How to specify the search criteria:

– With a textual description.

– By providing one or more example images, sounds, or clips.

– By drawing a simple sketch, for example an orange circle on a dark background when searching for sunset pictures.

– By combining the above.

Form of the result

– A list, for example with thumbnail images.

– The clip segments we were looking for, for example those showing a car chase scene.

– A structured web page or other document generated from the results.

– The user can refine and rate the results, thereby further improving the search algorithm.

The fundamental task in retrieval is comparing media data.

(10)

How do we evaluate the quality of a retrieval algorithm?

• Precision and Recall

– Precision = (# relevant items returned) / (# all items returned)

– Recall = (# relevant items returned) / (# relevant items in the dataset)

• The procedure for drawing a Recall-Precision Curve:

– Compute the relevance score for each item in the database.

– Sort the list by score.

– Assume the sorted list is r r r n n r r r n n … (r = relevant, n = not relevant) and that there are 6 relevant items in total in the database.
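The definitions and the procedure above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the function name `recall_precision_points` is hypothetical, while the ranked list and the total of 6 relevant items are taken from the example above.

```python
def recall_precision_points(ranked, total_relevant):
    """Return (recall, precision) after each position of a ranked result list.

    ranked: sequence of 'r' (relevant) / 'n' (not relevant) labels, best first.
    total_relevant: number of relevant items in the whole database.
    """
    points = []
    hits = 0
    for k, label in enumerate(ranked, start=1):
        if label == "r":
            hits += 1
        precision = hits / k              # relevant returned / all returned
        recall = hits / total_relevant    # relevant returned / all relevant
        points.append((recall, precision))
    return points


ranked = list("rrrnnrrrnn")   # the sorted list from the example
points = recall_precision_points(ranked, total_relevant=6)
for k, (rec, prec) in enumerate(points, start=1):
    print(f"top {k:2d}: recall = {rec:.2f}, precision = {prec:.2f}")
```

For this list, precision stays at 1 up to recall 3/6, drops to 3/5 after the two non-relevant items, and is 6/8 at the point where recall reaches 1.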

(11)

The Recall-Precision Curve

[Figure: recall-precision curve. Horizontal axis: Recall, with ticks at 1/6, 2/6, 3/6, 4/6, 5/6, 1. Vertical axis: Precision, up to 1.]

The short list is: r r r n n r r r n n …

Q: Why don't we just use a single value instead of a curve?

(12)

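The “Best” Recall-Precision Curve

[Figure: the best possible recall-precision curve. Horizontal axis: Recall, with ticks at 1/(# of relevant items) and 1. Vertical axis: Precision, with ticks at (# of relevant items)/(# of total items) and 1.]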

(13)

Image Retrieval Methods

• To find images in a database, we have to compare images quantitatively based on “features”.

• We can compare the images as a whole using features like:

– Color, textures and their spatial layouts.

• We can also segment images into regions and use similar features in object detection.

• Some recent systems use salient local features, such as SIFT (Scale-Invariant Feature Transform)-like features, together with learning and pattern recognition methods.

(14)

Color histogram method

There will also be many false matches.
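A minimal sketch of the idea, assuming NumPy, RGB images stored as `uint8` arrays, 4 quantization levels per channel, and histogram intersection as the similarity measure (the slide does not say which measure is used). The function names are hypothetical. The synthetic images also show where the false matches come from: two images with the same colors in a different arrangement have identical histograms.

```python
import numpy as np


def color_histogram(image, bins_per_channel=4):
    """Normalized joint RGB histogram of an H x W x 3 uint8 image."""
    pixels = image.reshape(-1, 3)
    # Quantize each channel to bins_per_channel levels (0 .. bins - 1).
    q = (pixels.astype(np.int64) * bins_per_channel) // 256
    # Combine the three quantized channels into a single bin index.
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())


# Synthetic 8x8 test images (assumed data, for illustration only).
left_red = np.zeros((8, 8, 3), dtype=np.uint8)
left_red[:, :4] = (255, 0, 0)      # red left half, black right half
right_red = np.zeros((8, 8, 3), dtype=np.uint8)
right_red[:, 4:] = (255, 0, 0)     # black left half, red right half
all_blue = np.zeros((8, 8, 3), dtype=np.uint8)
all_blue[:, :] = (0, 0, 255)

h_left, h_right, h_blue = map(color_histogram, (left_red, right_red, all_blue))
print(histogram_intersection(h_left, h_right))  # same colors, different layout
print(histogram_intersection(h_left, h_blue))   # no colors in common
```

The first comparison scores as a perfect match even though the two images are mirror images of each other, because a global histogram discards all spatial information.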

(15)

Improving the color histogram

• Separate the foreground from the background.

[Figure: an image split into Foreground and Background.]
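One way to use such a separation, sketched under the assumption that a binary foreground mask is already available (how to obtain it is not covered here): compute one color histogram per region and compare foreground with foreground and background with background. The function names and the toy data are hypothetical.

```python
import numpy as np


def masked_histogram(image, mask, bins_per_channel=4):
    """Normalized joint RGB histogram over the pixels where mask is True."""
    pixels = image[mask]                       # shape: (n_selected, 3)
    q = (pixels.astype(np.int64) * bins_per_channel) // 256
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    return hist / max(hist.sum(), 1.0)         # avoid dividing by zero


def fg_bg_similarity(img_a, mask_a, img_b, mask_b):
    """Histogram intersection, separately for foreground and background."""
    fg = np.minimum(masked_histogram(img_a, mask_a),
                    masked_histogram(img_b, mask_b)).sum()
    bg = np.minimum(masked_histogram(img_a, ~mask_a),
                    masked_histogram(img_b, ~mask_b)).sum()
    return float(fg), float(bg)


# Toy data: a red object on black, and a black object on red.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                          # assumed foreground mask
red_on_black = np.zeros((8, 8, 3), dtype=np.uint8)
red_on_black[mask] = (255, 0, 0)
black_on_red = np.zeros((8, 8, 3), dtype=np.uint8)
black_on_red[~mask] = (255, 0, 0)

print(fg_bg_similarity(red_on_black, mask, red_on_black, mask))
print(fg_bg_similarity(red_on_black, mask, black_on_red, mask))
```

A single global histogram would still find a partial overlap between the two toy images, since both contain red and black; the per-region comparison reports no similarity in either region.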

(16)

Extending the color histogram

• Define spatial regions and the relationships between them.

[Figure: an image with two regions labeled Color Blob 1 and Color Blob 2.]

(17)

Finding similar shapes

• Finding similar shapes is a very useful tool for managing a large number of images.

• Chamfer matching is a standard method for comparing the similarity of shapes.

• The generalized Hough transform can also be used to find shapes in images.
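To make the chamfer idea concrete, here is a brute-force sketch on small 2D point sets, assuming each shape is given as a list of edge points. Practical implementations usually precompute a distance transform of the edge image instead of searching for the nearest point every time; the function names below are hypothetical.

```python
import math


def directed_chamfer(template, target):
    """Mean distance from each template point to its nearest target point."""
    total = sum(min(math.dist(p, q) for q in target) for p in template)
    return total / len(template)


def symmetric_chamfer(a, b):
    """Average of the two directed chamfer distances."""
    return 0.5 * (directed_chamfer(a, b) + directed_chamfer(b, a))


# Corner points of a unit square, a slightly shifted copy, and a distant copy.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(x + 0.1, y) for x, y in square]
far = [(x + 5.0, y) for x, y in square]

print(symmetric_chamfer(square, square))    # identical shapes
print(symmetric_chamfer(square, shifted))   # small offset, small distance
print(symmetric_chamfer(square, far))       # large offset, large distance
```

A smaller value means a better match, so ranking database shapes by this distance to the query shape gives a simple shape-retrieval baseline.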

(18)

Shape Context

• Shape context is another widely used feature in shape retrieval.

C_ij is the distance (matching cost) between the shape contexts h_i and h_j.
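The slide does not spell out how the distance is computed. A common choice in the shape-context literature is the chi-square statistic between normalized log-polar histograms, and the sketch below assumes it. The bin counts (5 radial, 12 angular), the radial range 1/8 to 2, and the function names are assumptions made for illustration.

```python
import numpy as np


def shape_contexts(points, n_r=5, n_theta=12):
    """Log-polar histograms (one row per point) of the other points' positions."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]          # diff[i, j] = p_j - p_i
    dist = np.hypot(diff[..., 0], diff[..., 1])
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    off_diag = ~np.eye(n, dtype=bool)
    dist = dist / dist[off_diag].mean()               # scale normalization
    # Log-spaced radial bin edges; distances outside the range are clamped
    # to the innermost or outermost bin.
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.clip(np.searchsorted(r_edges, dist) - 1, 0, n_r - 1)
    t_bin = np.minimum((angle / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i != j:
                hists[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    return hists / hists.sum(axis=1, keepdims=True)


def chi2_cost(h_i, h_j):
    """C_ij = 1/2 * sum_k (h_i(k) - h_j(k))^2 / (h_i(k) + h_j(k))."""
    denom = h_i + h_j
    ok = denom > 0                                    # skip bins empty in both
    return 0.5 * float(np.sum((h_i[ok] - h_j[ok]) ** 2 / denom[ok]))


shape = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (0, 3)]  # an "L"
moved = [(x + 10, y - 3) for x, y in shape]                       # translated copy
sc_a, sc_b = shape_contexts(shape), shape_contexts(moved)
# Cost matrix: C[i, j] compares point i of `shape` with point j of `moved`.
C = np.array([[chi2_cost(a, b) for b in sc_b] for a in sc_a])
print(np.round(C, 2))
```

Because the histograms are built from relative positions, corresponding points of the translated copy have zero cost (the diagonal of `C`), while points at different places on the shape have a positive cost. A full matcher would then look for a low-cost assignment between the two point sets.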

(19)

Improve Matching Efficiency

• Fast pruning in matching

– Representative shape contexts

– Shapemes

Greg Mori, Serge Belongie, and Jitendra Malik, Shape Contexts Enable Efficient Retrieval of Similar Shapes, CVPR, 2001

(20)

Example Results

(21)

Current Trends and Challenges

• We now look at a more “recent” work:

L. Fei-Fei, R. Fergus, and P. Perona. A Bayesian approach to unsupervised One-Shot learning of Object categories. ICCV 2003.

The goal is to detect whether an object appears in an image.

(22)

SIFT features are used.

The good features are in fact learned from a small set of training images.

(23)

Motorbike results.

(24)

Competitions about Object Recognition

• http://www.pascal-network.org/challenges/VOC/voc2007/

(25)

Retrieve Other Multimedia Data

• Audio retrieval

– Find an audio clip in a large database.

• Video retrieval

– Find a specific video clip.

– Find a video shot that contains a specific person or action.

– Browsing video …

(26)

Data Structures in Media Retrieval

• In multimedia data retrieval we often need to find the “nearest neighbor” of a query exemplar in the database.

• We can abstract each media object as a feature vector. Our goal is to organize the database so that we can locate the most similar vector as quickly as possible.

• Q: Think of some data structures that help to speed up the search.
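As a baseline for that question, the simplest approach uses no data structure at all: a linear scan over every feature vector. The sketch below assumes Euclidean distance and made-up 3-dimensional feature vectors; the function name is hypothetical.

```python
import math


def nearest_neighbor(database, query):
    """Linear scan: return (index, distance) of the vector closest to query."""
    best_index, best_dist = -1, math.inf
    for i, vec in enumerate(database):
        d = math.dist(vec, query)          # Euclidean distance
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


# Hypothetical feature vectors for four media objects.
database = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.1, 0.1, 0.8), (0.5, 0.5, 0.0)]
query = (0.25, 0.65, 0.1)
print(nearest_neighbor(database, query))
```

The scan touches every vector, so its cost grows linearly with the size of the database. Index structures aim to skip most of these comparisons.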

(27)

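K-d Tree

• A 2D k-d tree

[Figure: a 2D k-d tree over the points a, b, c, d, e, f, shown as a partition of the plane and as the corresponding tree.]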
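A compact sketch of a k-d tree with nearest-neighbor search, assuming median splits that cycle through the coordinate axes and Euclidean distance. The coordinates are made up, since the figure gives no numeric values, and the dictionary-based node layout and function names are choices made for this illustration.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively split on the median, cycling through the coordinate axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point) of the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    axis = node["axis"]
    delta = query[axis] - node["point"][axis]
    if delta < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # The far side can only hold a closer point if the splitting line is
    # nearer to the query than the best distance found so far.
    if abs(delta) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical 2D coordinates for six points.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(tree["point"])            # root: the median point along x
print(nearest(tree, (9, 2)))    # closest stored point to the query (9, 2)
```

The pruning test is what saves work: whenever the splitting line is farther from the query than the current best match, the entire subtree on the other side is skipped.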

(28)

Summary

• Content-based multimedia retrieval is still not mature. Many problems still need to be solved.

• There is no single method that solves all the problems.

• We need better object detection and classification schemes.

• Other related problems like multimedia data mining are also attracting more and more interest.
