Lehr- und Forschungseinheit für Programmier- und Modellierungssprachen

Diploma Thesis: Well-Ordered XPath Queries

Evaluation of XPath Queries against XML streams

The evaluation of XPath queries against XML streams can require in general memory linear in the size of the input XML stream [SPEX]. This happens in cases where potential answers to queries have to be buffered until the decision regarding their appartenance to the result can be met. However, such a time-limited buffering is not always possible, especially when the XML stream is unbounded or infinite. In such cases, the evaluation of queries could require unbounded (or even infinite) memory. Several approaches are proposed to ameliorate this situation, mainly focusing on enforcing hard memory constraints for the query evaluation, e.g., [SAQL], at the price of retrieving approximate instead of exact answers.

It would be interesting to identify classes of XPath queries, for which there can be always inferred a bound on the required memory for query evaluation. More precisely, such classes of queries should be syntactically characterizable and memory bounds should be inferrable as functions only of the input query, or also of the input stream. In this sense, some classes can be described independent of the structure of the stream, or dependent on the stream schema.

Well-ordered XPath Queries: As a prime target towards describing such classes, one can envisage as a first step the characterization of XPath queries, the evaluation of which does not require at all buffering of potential answers. Such a query property can be detected, e.g., by checking whether all constraints (or predicates) of a potential answer are fulfilled along the stream before encountering that potential answer, i.e., before fully evaluating the head path that leads to the answers. However, a statical analysis of the queries, i.e., without the inspection of the stream and before the query evaluation, could be sufficient in some cases to detect such a query property. As early investigation of the supervisors shows, such a property, called ad-hoc well-orderedness, could be detected statically for some classes of queries. The intuition behind ther property name well-orderedness resides in the informal observation that for a well-ordered query its predicates are always evaluated before before its head path, thus there exists an order of the evaluation of subqueries that ensures the buffer-freeness of the evaluation of the whole query.

Task Description

The thesis should attack and provide solutions to the following problems

  1. identify classes of well-ordered queries independent of data instances and schemas; i.e., identify classes of queries, the evaluation of which on any data instance does not consider potential answers, and thus no buffering;
  2. identify classes of well-ordered queries dependent of schemas expressed as regular tree grammars; i.e.,identify classes of queries, the evaluation of which on any data instance of a given schema does not consider potential answers, and thus no buffering;
  3. given an XPath query, infer the schemas with respect to which the given query is well-ordered; also, given a schema, infer all queries that are well-ordered with respect to that schema;
  4. find, for a given query and schema, the minimal changes to the query/schema that ensures the well-orderedness of the query with respect to the schema;
  5. provide for all the aforementioned problems sound and complete algorithms and their implementation in Java. The algorithms should be integrated in the SPEX XPath query evaluator against XML streams;
  6. Tests should be conducted on the improved SPEX query evaluator in order to analyse experimentally the benefit of the compile-time identification and awareness of the well-orderedness property of XPath queries to be evaluated.

Contact Persons

Francois Bry, Dan Olteanu [firstname.lastname@ifi.lmu.de] and Markus Spannagel [firstname.lastname@stud.ifi.lmu.de].