Current developments in pervasive/ubiquitous computing demand
a new infrastructure for individual information supply. Due
to the loosely coupled characteristics of mobile devices in combination with
a highly dynamic set of information producers, the usual way of interacting
on a tightly coupled request/response basis seems no longer adequate. New
technologies based on the publish/subscribe communication paradigm promise to
provide a robust and flexible mechanism for an omnipresent information
supply. Using this technology, consumers subscribe to certain information
to be delivered under certain constraints (e.g., weather information for
San Jose every morning at 8 AM). A subscription system receives information
from multiple producers (publishers) and propagates the individually
filtered information to the corresponding subscribers.
PubScribe - What is it all about?
Propagating information from multiple producers to a huge set of potential
consumers is usually performed by notification systems. The subscriber
specifies filtering conditions. The efficient evaluation is then left to the
notification system. Prominent systems in this area are Elvin, Gryphon,
and Siena.
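
The division of labor described above (subscribers state filtering conditions, the notification system evaluates them) can be illustrated with a minimal sketch. It is not the API of Elvin, Gryphon, Siena, or PubScribe; the names `NotificationService`, `Subscription`, and the message fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, object]


@dataclass
class Subscription:
    """Hypothetical subscription: a subscriber name plus a filter predicate."""
    subscriber: str
    predicate: Callable[[Message], bool]
    inbox: List[Message] = field(default_factory=list)


class NotificationService:
    """Toy content-based notification service (illustrative only)."""

    def __init__(self) -> None:
        self._subscriptions: List[Subscription] = []

    def subscribe(self, subscriber: str,
                  predicate: Callable[[Message], bool]) -> Subscription:
        sub = Subscription(subscriber, predicate)
        self._subscriptions.append(sub)
        return sub

    def publish(self, message: Message) -> int:
        """Deliver the message to every matching subscription; return the count."""
        delivered = 0
        for sub in self._subscriptions:
            if sub.predicate(message):
                sub.inbox.append(message)
                delivered += 1
        return delivered


service = NotificationService()
weather = service.subscribe(
    "alice",
    lambda m: m.get("topic") == "weather" and m.get("city") == "San Jose",
)
stocks = service.subscribe(
    "bob",
    lambda m: m.get("topic") == "stock" and m.get("price", 0) > 100,
)

service.publish({"topic": "weather", "city": "San Jose", "temp_f": 68})
service.publish({"topic": "weather", "city": "Dresden", "temp_f": 50})
service.publish({"topic": "stock", "symbol": "XYZ", "price": 120})
```

This sketch checks every subscription against every message, which is the naive strategy; real notification systems index the filter conditions so that evaluation scales to many subscribers.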
However, the demand for personalized information delivery goes beyond
'simple' content-based filtering and proactive notification: it requires
combining data from multiple data sources as well as transformation
capabilities in the form of higher-level operators such as projection,
grouping, and windowing. Providing such a framework is the overall strategy
of PubScribe.
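
To make the three operators named above concrete, the following sketch applies projection, grouping, and a sliding window to a small list of messages. It is an assumption-laden illustration, not PubScribe's operator set: the function names `project`, `group_by`, and `sliding_average` and the sample readings are invented for this example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List

Message = Dict[str, object]


def project(message: Message, fields: List[str]) -> Message:
    """Projection: keep only the named attributes of a message."""
    return {name: message[name] for name in fields}


def group_by(messages: Iterable[Message], key: str) -> Dict[object, List[Message]]:
    """Grouping: partition messages by the value of one attribute."""
    groups: Dict[object, List[Message]] = defaultdict(list)
    for message in messages:
        groups[message[key]].append(message)
    return dict(groups)


def sliding_average(values: Iterable[float], size: int) -> List[float]:
    """Windowing: average over the last `size` values, in arrival order."""
    window: Deque[float] = deque(maxlen=size)
    averages: List[float] = []
    for value in values:
        window.append(value)
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical readings published by two sources, A and B.
readings = [
    {"city": "San Jose", "temp": 60, "source": "A"},
    {"city": "Dresden", "temp": 40, "source": "B"},
    {"city": "San Jose", "temp": 70, "source": "B"},
    {"city": "San Jose", "temp": 80, "source": "A"},
]

slim = [project(r, ["city", "temp"]) for r in readings]
by_city = group_by(slim, "city")
san_jose_avg = sliding_average([r["temp"] for r in by_city["San Jose"]], size=2)
```

Note that grouping treats the messages as a set, while the window depends on their arrival order, so a subscription that combines both needs a data model covering both views.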
PubScribe - Sub-Projects
Since the PubScribe framework addresses multiple issues at the same
time, we split the project into the following parts:
Web-Extraction: PubScribe uses XML-structured documents for internal
processing, i.e., combination, filtering, and transformation. Since many data
sources are regular web sites, the XWeb approach provides a rule-based
interface for scraping web sites and transforming the result into valid XML
documents.
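
The idea of rule-based extraction into XML can be sketched as follows. The rule language of XWeb is not described here, so this example substitutes plain regular expressions as rules; `RULES`, `extract_to_xml`, and the sample page are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Pattern

# Assumed rule format: XML element name -> regex whose first group is the value.
RULES: Dict[str, Pattern[str]] = {
    "city": re.compile(r'<h1 class="city">(.*?)</h1>'),
    "temperature": re.compile(r'<span id="temp">(.*?)</span>'),
}


def extract_to_xml(html: str, rules: Dict[str, Pattern[str]],
                   root_tag: str = "record") -> str:
    """Apply each rule to the page and emit the matches as one XML document."""
    root = ET.Element(root_tag)
    for tag, pattern in rules.items():
        match = pattern.search(html)
        if match:  # rules without a match are skipped
            ET.SubElement(root, tag).text = match.group(1).strip()
    return ET.tostring(root, encoding="unicode")


page = ('<html><body><h1 class="city">San Jose</h1>'
        '<span id="temp"> 68 </span></body></html>')
xml_doc = extract_to_xml(page, RULES)
```

Regular expressions are fragile against markup changes; they are used here only to keep the sketch self-contained.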
Data Integration: From a user point of view, the combination of multiple
data sources should end up in a fully consistent global state which then
serves as a common basis for individual data propagation according to the
subscription rules. However, in the highly dynamic and flexible world of
information propagation, we have to relax this notion of strict consistency
and provide at least a controlled, semi-consistent state.
Generating semi-consistent states is the primary focus of the SCINTRA
project.
Data Propagation: Once a semi-consistent global state of incoming messages
is generated, user subscriptions have to be evaluated in the most efficient
manner. PubScribe is based on an incremental approach, propagating only
delta information from a temporarily stored global state of all newly
published messages. In a second optimization step, the delta propagation
streams of individual subscriptions are merged so that redundant
delta propagation is largely avoided.
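
A minimal sketch of the two steps described above, under strong simplifying assumptions: subscriptions are plain equality filters, and "merging" means that subscriptions with an identical filter share one evaluation per delta. The class `DeltaPropagator` and its counter `evaluations` are hypothetical and do not reflect PubScribe's actual optimizer.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Dict[str, object]
FilterKey = Tuple[str, object]  # (attribute, required value)


class DeltaPropagator:
    """Toy incremental propagation with shared evaluation of equal filters."""

    def __init__(self) -> None:
        # Temporarily stored state of all newly published messages.
        self._staged: List[Message] = []
        # Subscribers merged by identical filter.
        self._subscribers: Dict[FilterKey, List[str]] = defaultdict(list)
        self.evaluations = 0  # number of filter evaluations over a delta

    def subscribe(self, subscriber: str, attribute: str, value: object) -> None:
        self._subscribers[(attribute, value)].append(subscriber)

    def publish(self, message: Message) -> None:
        self._staged.append(message)

    def propagate(self) -> Dict[str, List[Message]]:
        """Push only the delta since the last call, then clear the staging area."""
        delta, self._staged = self._staged, []
        out: Dict[str, List[Message]] = defaultdict(list)
        for (attribute, value), subscribers in self._subscribers.items():
            self.evaluations += 1  # one evaluation per distinct filter
            matches = [m for m in delta if m.get(attribute) == value]
            for subscriber in subscribers:
                if matches:
                    out[subscriber].extend(matches)
        return dict(out)


p = DeltaPropagator()
p.subscribe("alice", "city", "San Jose")
p.subscribe("bob", "city", "San Jose")   # same filter as alice, evaluated once
p.subscribe("carol", "city", "Dresden")

p.publish({"city": "San Jose", "temp": 68})
first = p.propagate()
p.publish({"city": "Dresden", "temp": 50})
second = p.propagate()
```

With three subscriptions but only two distinct filters, each call to `propagate` performs two evaluations instead of three, and the second call never revisits the message delivered by the first.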
Hybrid Data Propagation Model: This project subsumes all modeling
perspectives. PubScribe is based on a hybrid data model
combining the set-oriented idea from relational databases with a
sequence-oriented approach from message-oriented middleware systems.