Information systems dealing with multimedia data (e.g. image, text, video, and audio data) provided by heterogeneous and distributed databases require effective methods for retrieving the data items relevant to a specified information need. Approaches from the field of information retrieval provide a useful starting point for research on this topic, although further investigation is indispensable in order to take the distributed nature of networked information systems into account.
Moreover, distributed information systems should enable the selection of a desired quality of service by exploiting the improved bandwidth of the underlying communication facilities. Current information systems do not meet this requirement satisfactorily. Hence, it will be necessary to integrate into such an information system storage systems or DBMSs that can manage multimedia data adequately.
The goal of the Dsmily project is to develop a scalable information retrieval system (search engine) for physically distributed multimedia data. To achieve this goal, two priorities have been set:
Development of a model for distributed information retrieval:
The probabilistic model for distributed IR stems from the Probability Ranking Principle: individual document rankings are first computed for the different subcollections, and these local rankings are then merged step by step into a final ranking list in which the documents are ordered by their probability of relevance. The documents (or document passages, in the case of multimedia documents) of different subcollections are assumed to be indexed using different indexing vocabularies. Moreover, each local ranking may be computed by its own probabilistic retrieval method. In this way, the integration of documents of arbitrary type is supported, and the underlying data volume is arbitrarily scalable. The model is extended by a criterion for effectively limiting the ranking process to a subset of the subcollections while taking cost factors into account. Currently, the model is being evaluated using disk 2 of the TREC document collection.
Specification and implementation of a prototype for a distributed information retrieval system:
The system consists of several components that communicate through CORBA. It is intended to include some of the concepts and modules from the non-scalable SPIDER information retrieval system, which is under development at the Swiss Federal Institute of Technology (ETH) Zurich. Furthermore, a relational database system and the media object server KANGAROO, currently being developed at TU Dresden, will be integrated into the prototype.
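The merging step of the retrieval model described under the first priority can be illustrated with a minimal Python sketch. It is not the Dsmily implementation: it assumes that every subcollection returns a list of (probability of relevance, document id) pairs sorted by descending probability, and that these probabilities are comparable across subcollections. The function names `merge_rankings` and `select_subcollections`, as well as the greedy gain-per-cost rule used to limit the set of subcollections, are hypothetical stand-ins for the model's actual merging procedure and cost criterion.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Tuple

# One local ranking: (probability_of_relevance, document_id) pairs,
# assumed to be sorted by descending probability.
Ranking = List[Tuple[float, str]]


def merge_rankings(local_rankings: Iterable[Ranking]) -> Iterator[Tuple[float, str]]:
    """Lazily merge pre-sorted local rankings into one final ranking.

    Assumption: probabilities from different subcollections are comparable.
    """
    return heapq.merge(*local_rankings, key=lambda entry: entry[0], reverse=True)


def select_subcollections(
    expected_gain: Dict[str, float],
    cost: Dict[str, float],
    budget: float,
) -> List[str]:
    """Hypothetical cost criterion: greedily pick subcollections by
    expected gain per unit cost until the budget is exhausted."""
    chosen: List[str] = []
    spent = 0.0
    by_ratio = sorted(
        expected_gain,
        key=lambda name: expected_gain[name] / cost[name],
        reverse=True,
    )
    for name in by_ratio:
        if spent + cost[name] <= budget:
            chosen.append(name)
            spent += cost[name]
    return chosen


if __name__ == "__main__":
    text_ranking = [(0.9, "text-1"), (0.4, "text-2")]
    image_ranking = [(0.7, "image-1"), (0.1, "image-2")]
    for probability, doc_id in merge_rankings([text_ranking, image_ranking]):
        print(f"{probability:.2f}  {doc_id}")
```

Because `heapq.merge` consumes its inputs lazily, the final ranking can be read off incrementally without fetching every local ranking in full, which is in the spirit of the stepwise merging the model describes.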
All research within the Dsmily project is done in close cooperation with the Graduiertenkolleg.
| Created on 15.05.1997 by Christoph Baumgarten. |
| Last modified on 02.09.2003 by Maciej Suchomski. |