This document briefly describes active research projects at the unit for programming and modelling languages. Also available is a list of publications and a list of open topics for project/diploma theses (for students of the University of Munich).

Xcerpt [http://www.xcerpt.org] is a declarative, rule-based query and transformation language for the Web and the Semantic Web, inspired by logic programming. Instead of the path-based navigational approach taken by languages like XSLT [http://www.w3.org/Style/XSL/] and XQuery [http://www.w3.org/XML/Query], Xcerpt uses pattern-based, positional queries, where a pattern is an "example" of the database containing variables for binding content.

Positional Selection

Widespread query languages for XML, like XQuery and XSLT, are based on a navigational selection, in which the user needs to specify path navigations through the data tree. In our opinion, this approach is rather well suited for simple queries but tends to be very complicated for more complex queries. In contrast, Xcerpt is based on a positional selection in which a user specifies an incomplete pattern for the data tree in which the parts that are to be selected are represented by variables. It is our conviction that such a pattern based selection is more suitable to querying XML data.

Rule-Based Querying

Xcerpt programs in general consist of several rules. Each rule specifies a transformation of a data tree (or graph) into a possibly different data tree (graph). Like in logic programming, a rule may query the result of another rule ("rule chaining" or "inference").

Access to all layers of the Semantic Web Architecture/Tower

Xcerpt gives rise to posing queries to all layers of the Semantic Web Architecture/Tower. In particular one can, using Xcerpt, query standard Web data formatted in HTML or in any language defined as XML application, RDF or OWL data. Within a same Xcerpt query program one can access and combine HTML, XML, RDF, and OWL data. The answer to an Xcerpt query program can be expressed in HTML, XML, RDF or OWL.

more ...

XChange [http://www.xcerpt.org/XChange] is a high-level language for programming reactive behaviour on the Web. XChange builds upon the query language Xcerpt and provides advanced, Web-specific capabilities, such as propagation of changes on the Web (change) and event-based communications between Web sites (exchange).

Rule-Based Language

An XChange program consists of so-called ECA (Event-Condition-Action) rules and Xcerpt rules. By means of ECA rules one can specify how to react to particular events that occur on the Web.

Pattern-Based Update and Event Specification

With XChange one can specify updates that are to be executed on Web resources' data, events that are to be sent to one or more (XChange-aware) Web sites, and, in order to detect situations of interest, queries to incoming events. An XChange specification is a pattern for the data to be updated, to be sent, or to be queried.

Complex Event Queries

Through XChangeEQ (see below), XChange has the capability to detect and react to complex events, i.e. (possibly time-related) combinations of events that occur on the Web.

more ...

Complex Event Processing

The emergence of event-driven architectures, automation of business processes, drastic cost-reductions in sensor technology, and a growing need to monitor IT systems (as well as other systems) due to legal, contractual, or operational considerations lead to an increasing generation of events . This development is accompanied by a growing demand for managing and processing events in an automated and systematic way. Complex Event Processing (CEP) encompasses the (automatable) tasks involved in making sense of all events in a system by deriving higher-level knowledge from lower-level events while the events occur, i.e., in a timely, online fashion and permanently.

Event Queries

At the core of CEP are queries which monitor streams of "simple" events for so-called complex events, that is, events or situations that manifest themselves in certain combinations of several events occurring (or not occurring) over time and that cannot be detected from looking only at single events. Querying events is fundamentally different from traditional querying and reasoning with database or Web data, since event queries are standing queries that are evaluated over time against incoming streams of event data. In order to express complex events that are of interest to a particular application or user in a convenient, concise, cost-effective and maintainable manner, special purpose Event Query Languages (EQLs) are needed.

High-Level Language

This project investigates practical and theoretical issues related to querying complex events, covering the spectrum from language design over declarative semantics to operational semantics for incremental query evaluation. Its central topic is the development of the high-level event query language XChangeEQ.

XChangeEQ's language design recognizes the four querying dimensions of data extractions, event composition, temporal relationships, and, for non-monotonic queries involving negation or aggregation, event accumulation. XChangeEQ deals with complex structured data in event messages, thus addressing the need to query events communicated in XML formats over the Web. It supports deductive rules as an abstraction and reasoning mechanism for events. To achieve a full coverage of the four querying dimensions, it builds upon a separation of concerns of the four querying dimensions, which makes it easy-to-use and highly expressive. XChangeEQ builds upon Xcerpt for querying events communicated as XML messages and can be embedded as event query language into the reactive language XChange.

Typing and schema languages based on tree grammars are widely accepted in the XML and Web community. In particular DTD, XML Schema and Relax NG are based on regular tree grammars. The purpose of such formalisms inspired by tree grammars range from documentation, data validation to type checking of query and transformation languages such as XQuery and XSLT.

Although graph structured XML data - built using reference mechanisms such as ID/IDREF - is widespread, common schema languages have no notion of typed references. In other words, formalisms based on tree grammars do not fully convey the graph structures of XML and semistructured data. Graph grammars aim at filling this gap. Regular (rooted) graph grammars are proposed as an extension of regular tree grammars, providing means to explicitly model typed references and dereferenciation in XML and semistructured data.

Regular graph grammars are usable as schema formalism for XML and semistructured data containing references (e.g. the translation of ACE sentences in XML as mentioned earlier in the context of VOXX) and as datatype formalism for typed variants of Xcerpt and XChange.

As XML data becomes a de facto standard on the Web, querying XML data is becoming an issue of great importance, a real research challenge. Several solutions have been proposed in the last years to this challenge, described in various working drafts of W3C or research papers. The navigational query language XPath is a core of this common effort.
Stream-based processing comes as an attempt to overcome pitfalls of the traditional DOM-based processing:

  • XML documents can be too big to be stored in main memory, as in DOM. Such documents are often encountered in natural language processing, biological and astronomical data;
  • unbounded XML streams that are automatically generated (e.g. news) impose new querying approaches to be considered, where the memory usage does really matter;

Progressive processing, i.e. stream-based processing generating partial results as soon as they are available, even before the full input is completely read, gives rise to a more efficient evaluation in certain contexts:

  • Selective dissemination of information(SDI), where documents are filtered before being distributed to the subscribers;
  • Data integration, where the input of slow sources can be processed before the full data is retrieved over the internet;
  • Pipelined processing, where the input is sent through a chain of processors each of which taking the output of the preceding processor as input. A good example of pipelined processing is Cocoon.

The project aims at providing techniques for efficient XML query evaluation on huge XML repositories and on (possibly unbounded) XML streams. The first outcome of this project is a query evaluator called SPEX (Streamed and Progressive Evaluation of queries against XML streams).

more ...

CaTTS is a static typed calendar and time language that allows for declarative modelling of various calendars (e.g. Gregorian calendar, business calendars, holiday calendars, etc.) as types and time tabling problems over calendar and time data defined by some CaTTS calendar using its object constraint language. CaTTS is designed to extend any programming language or (Semantic) Web query language (such as Xcerpt - cf. supra) in terms of a library integrated into its host language.

CaTTS is powerful and expressive enough to define most calendars in use today. The object constraint language of CaTTS enables declarative modelling of time tabling problems which will be solved by means of temporal constraint reasoning.

Reasoning on locations normally operates on a numerical level (e.g. coordinates) or on a symbolic level (e.g. graphs). Extensive research has been conducted in either case, hence there is a broad choice of proven sets of calculi and algorithms to solve the respective tasks. Whenever possible, reasoning should be conducted on higher level data, which is usually less heavy on the device's resources.

The fundamental insight is that many queries pertaining to location information are closely related to the problem of route planning and way-finding, which reduces the problem domain considerably. There are two reasons for this. (1) Whenever a certain location is sought after, chances are, that the inquirer intends to visit the location. Cases like these result in classic route planning tasks. (2) When people refer to the ``distance'' between two locations in the sense of locomotion, nearly never are they talking about distances per se (metres, kilometres) but the time needed to overcome these distances (``a ten minute walk'' or ``a half hour by train''). In fact, in many scenarios the exact distance between two points has a rather subordinate meaning from a traveller's viewpoint - especially in urban environments.

MPLL (Maple) aims for producing a set of reasoning techniques and for providing a framework for Sematic Web applications. This framework will include suitable data types and ontologies for processing and presenting location data.


Last modified: Tue Feb 1 17:32:07 CET 2005 Valid XHTML 1.0! Valid CSS! Any Browser!