Project description
Processing and querying streams, XML streams in particular, has recently become a widely recognized area of interest both in research and in industry. In the context of the “XPath Evaluation on XML Streams” Project [BFMO01], a novel evaluation model for querying XML streams based on simple deterministic pushdown transducers is proposed: SPEX (streamed and progressive evaluation of XPath queries against XML streams). These transducers, corresponding to the operators used in the underlying query, can be arranged in a very flexible manner into an evaluation network, specifying the flow of the data from the stream through the operators.
In contrast to traditional query evaluation for databases, where multiple queries against the same data can be evaluated sequentially, for a streamed environment only the simultaneous execution of multiple queries is feasible, as the sequential evaluation requires multiple passes over the stream.
This project aims at extending the basic SPEX engine with support for multiple queries. [F03] presents an overview of techniques for optimizing multiple queries posed against a stream of XML data. Building upon the SPEX query engine [K02, OKB03], the problem how to find a cost-optimal query plan that allows the simultaneous evaluation of multiple queries against the same stream is presented and shown to be not only hard to solve but also hard to approximate, if arbitrary parts, and not only common prefixes as in previous approaches, can be shared among query plans. Several heuristics are investigated and compared, in particular with respect to their complexity. Furthermore, it is shown how to extend the SPEX query engine into MQSPEX (multi-query aware SPEX) to support such query plans for multiple queries. This extension proves to be both natural and efficient. An extensive experimental evaluation shows that sharing arbitrary operators under a realistic cost function results in query plans that have consistently lower cost for reasonable sets of queries than query plans where only common prefixes are considered. In most cases, the relative improvement is higher than 50%. Although the time for generating such query plans is higher than for query plans where only common prefixes are shared, the increase in time is within an acceptable margin.
Further work in this project is focused on improving the ideas presented [F03] and providing an easily accessible prototype of MQSPEX. Possible directions for improvement are described below under future work. In the following, after an extended motivation describing exciting areas of application for the proposed techniques, the important contributions of this project are identified. Striking results of the extensive experimental evaluation given in [F03] are highlighted as well as future directions for reasearch in this project.
Motivation
Processing of data streams or sequences of blocks of data (usually called elements of the stream) has triggered rising interest, both in research [CCCC02, Y03] and in industry [C00, MB02]. Stream processing differs from conventional methods of data processing in main memory or on data bases in the requirement to process data immediately on its arrival in the stream instead of creating an appropriate data structure that is on which the actual processing is performed afterwards. In particular, if the data to be processed changes fast, only small fragments of the data have to be processed repeatedly, the data arrives at a high rate, or the amount of data is too large to be efficiently stored and processed, stream processing provides clear advantages over traditional methods based on in-memory data structures or databases. Indeed, in most applications for stream processing at least one of these characteristics can be observed, including some of the most exciting areas of application for streams:
- With increasing complexity of electronic and information systems, monitoring and analysis of these systems, performed on streams of data provided either by the system itself or by specific sensors information, more and more essential. Recent work in this area includes the analysis of network traffic [SH98, C00, DG01] and monitoring of sensor networks [BGS01, MF02].
- Publish-subscribe systems provide capabilities to selectively disseminate information or publications for a large number of users or subscribers based on their profile or subscription [F96]. These systems even find applications ranging from news distribution networks [RD98] over event notification systems alerting users of certain events [CRW01] to the dissemination of relevant information for different needs in a military engagement [DMS97].
- Another emerging application is the real-time integration of fast “news feeds” from diverse sources, similar to Google News [http://news.google.com] or NewsIsFree [http://www.newsisfree.com/]. In the context of web services, the automatic on-the-fly syndication of streams generated by heterogeneous services is viewed as an emmerging challenge.
- With the advent of MPEG-7 [M02] it is expected that an increasing number of multimedia streams accompanied by detailed meta-data information have to be efficiently filtered, transformed, and routed from sources to consumers.
- Pipeline processing allows several independent processors to be used in a pipeline, where the output stream of one processor is the output stream of the next one, therefore allowing each processor to perform its task as soon as the previous one delivers new data. Pipeline processing is of particular importance where large amounts of data have to be processed by diverse components, e.g. in astronomic data analysis [MPR99, HPP01].
All these areas of applications share an inherent heterogeneity of information sources in respect to the data
delivered, the parameters of the provided service, or simply the administrative responsibility. For example,
monitors on devices from various vendors in network analysis might generate monitoring data with differing
resolution, precision and in varying intervals, the various news agencies used as sources for syndication as
well as dissemination will likely focus on different stories or aspects of the same story. These heterogeneous
information sources require some agreed upon means for information encoding as provided by XML [http://www.w3.org/XML/] dialects for the various areas such as
Summing up, a clear need for the efficient processing of XML streams can be identified in all the above application areas. In particular, all the described applications require some capability to query the incoming stream: Sensor networks have to correlate and monitor incoming data based on specifications of interesting events (e.g., “send an alert, if the value of sensor A is above a certain limit for more than 5 minutes”). Publish-subscribe systems route data based on the subscriptions of their users, essentially filtering the stream of data with large numbers of queries, such as “send all publications containing a certain keyword both in the title and the first paragraph”. Syndication systems correlate information from different sources and often, again based on user profiles, select only potentially interesting data. Finally multimedia streams have to be filtered and routed based on queries over the meta-data, it is even possible to select only parts of a multimedia stream, e.g. “return only those scenes in the movie where a certain speaker uses a certain word”.
As noted above, one of the advantages of stream processing is the progressive delivery of result in a single pass over the stream. One-pass processing is required in particular on large or unbounded streams or if the results have to be delivered continuously even before all data has arrived, as it is common in monitoring and event notification systems. This manner of processing imposes three important challenges to a query engine:
- It is not feasible to look back in a stream (except possibly within a certain small window). Hence, queries containing such backward navigation either have to be rewritten, as proposed in [OMFB02], or disallowed.
- As a stream is always processed sequentially, traditional optimization techniques, such as database indices, that are tailored to minimize the amount of access to data and the result of intermediary results, are not applicable to streams. Here, the optimization objective is to minimize the number of operations per element in the stream rather than the number of elements visited. This shift in the optimization objective is reflected in the notion of a stream index employed in the XML Toolkit [AGOR02] that does not provide efficient access paths to data relevant for a query but rather allows to skip elements not relevant for a query reducing the number of operations for each such element to one.
- If several queries, such as monitoring conditions or subscriber profiles, have to be evaluated against the same stream, it is not feasible to evaluate the queries sequentially as in traditional databases but they have to be processed simultaneously.
Contributions
This project focuses on the third observation: The requirement to execute multiple queries concurrently against the same stream combined with the second observation, that the number of operations per element is crucial for fast processing of streams, there is an evident need for novel methods to optimize multiple queries for a single processing against a stream. Indeed, all proposals [AF00, CGR02, DAFZ02] of query engines for publish-subscribe systems are tailored to process large amounts of queries, but due to their application area restricted to the filtering of small self-contained documents from a stream of such documents . Among the general query engines recently proposed [GMOS03, GS03, PC03, BCGR03, OKB03] only the [GMOS03, GS03] consider the optimization of multiple queries. All these approaches have in common, that the optimization is based on common prefixes in the queries (for an in-depth comparison see [F03]). The approach developed as part of this project improves on these by considering not only prefixes but rather any kind of shared subparts techniques for multi-query optimization on XML streams in several points:
- A framework for multi-query optimization together with a formal representation of (logical) query plans on XML streams is presented.
- Based on this representation, the general problem to derive the (cost-) minimum common super-plan for executing a set of queries is defined precisely and its complexity and approximability properties are investigated.
- Several heuristic algorithms to construct the minimum common super-plan for a set of query plans are given and compared in respect to their complexity.
- A cost model for navigational query languages against XML streams is proposed, based on experience with the SPEX processor.
- The SPEX evaluation model is extended to allow the execution of multiple queries according to a query plan generated by the proposed algorithms.
- An extensive experimental evaluation of the proposed algorithms based on the cost model introduced proves the feasibility of the proposed methods if the queries to be processed are known before processing (in contrast to being updated during processing).
Some of these contributions are highlighted in the following to give an impression of the idea.
Query plans for XML streams
The core objective of traditional query optimization, e.g. for relational databases, is to find for a given query a set of operators together with an order in which these operators are to be applied for evaluating that query. Such a set of operators together with the specification of the data flow is refered to as a query plan. Often, one distinguishes between logical and physical query plans: For relational databases, logical query plans are usually based on the (operators and rules of the) relational algebra, whereas physical query plans entail also operators that are concerned about the actual access to the data, e.g., operators for index access, table scans, various join variants, etc. The by-standing image shows the usual stages of the optimization process in a query optimizer: The query (specified in some supported query language) is parsed and the parse tree is used to generate the initial logical query plan that in turn is optimized according to the rules of the logical algebra. This optimized version of the logical query plan is the basis for selecting the prefered physical plan which in turn is used for the actual query evaluation.
[F03] proposes a logical query algebra for querying XML streams with relation operators similar to the XPath axes: Two elements in the stream are related, e.g., by the child operator, if one is the child of the other. Furthermore, elements can be queried by their properties, such as the label of an element or the text contained in an element.
The following two images show query plans for different but similar queries: In the first figure, all elements from the stream (indicated by the access operator in) are first processed by the child relation operator, that retains only elements that are child of some other element. From these elements only those that have also an “a” label are retained by the next operator, etc. Where the first figure selects all “c” descendants of following-siblings of an “a”, if that “a” is a child of some element and has itself a “b” child with content “v”, the second figure selects all “c” descendants of children of such “a”s.
One of the most important techniques for optimizing queries against streams, in particular XML streams, is the sharing of operators. In particular, if several queries shall be evaluated simultaneously, operator sharing can result in considerable increase in processing time. For the two queries from above, the following two pictures show possible common query plans, where the first one only shares operators in a prefix of the query plan, whereas the second one considers all operators for sharing.
MCSP Problem
The problem how to compute the minimal common query plans for several queries (together with a evaluation strategy for each query represented as a query plan) is formally defined as an NP optimization problem MCSP (minimum common super-plan) in [F03]. Clearly, the difficulty of computing a common query plan depends highly on the kind of graphs that are valid query plans under a certain evaluation model. The cost estimation for a query plan depends on the evaluation model as well. It can be shown, that the general case (where no assumptions about the evaluation model can be made) is NP-hard and NPO PB-complete, i.e., can not be approximated within nε for any ε > 0. It is an open issue, whether for certain (interesting) evaluation models better complexity and approximability results can be obtained.
A large set of heuristics of the approximating the MCSP problem are proposed and compared in [F03]. Two classes of such heuristic algorithms are identified based on differing assumptions about the underlying evaluation model: The first set assumes that cost estimation and validity check for query plans can be extended to partial solutions of the MCSP, the second set assumes that there is some algorithm for transforming one common query plan for multiple queries into another common query plan. Theoretical analysis of the algorithms reveals, that the first assumptions allows for slightly more efficient algorithms than the second one, but might be harder to fullfil.
This theoretical result is uphold in the experimental evaluation (see below).
Multi-query SPEX
For the experimental evaluation, the SPEX evaluation engine has been extended to support the evaluation of multiple queries according to a given common query plan (as computed by one of the heuristics mentioned above).
The SPEX evaluation engine is based on a network of simple transducers each implementing a certain operator from the query plans shown above. These networks resemble very closely the shape of a query plan with a determination network added at the sinks that shadows the structure of the query plan. This determination network is used for “determining”, whether all predicates are fullfilled: A predicate is indicated by the predicate operator [ ] and has the semantics, that each of the sinks after that operator has to yield result at least once for the same element selected by the predicate operator. This test is performed by the determination network as shown in the following figure: Only if both the “c” and the “d” label operator yield result for the same “b” the determination operator ?2 accepts that result. For more details, please refer to [F03].
The extension of the SPEX evaluation engine to multi-query plans proves to be very natural and easy: Consider the multi-query plan shown in the following figure. In this query plan the two queries 1 and 2 share a common path in the middle of the query plan.
In such a query plan, a new operator, referred to as shared path begin operator and depicted as a box labeled S, is introduced at the point where the shared path starts. When a selected element flows through this operator from one of the incoming edges it is annotated with a condition. It is furthermore recorded for which of the incoming edges (or queries), a selected element is encountered, since it is very possible (in this case actually guaranteed) that the element is selected only on one of the incoming edges. Here, an element is selected either by query 1, if it is an “a” element, by query 2, if it is an “b” element, or by none of them.
In the determination network at the end of the query plan, the queries are separated again if necessary and for each query a determination operator is used that determines a condition only, if a selected element has been encountered for that query by the corresponding shared path begin operator when the condition was created.
Interestingly, this extension to the SPEX network proves to be very efficient: The overhead introduced by sharing a path is not larger than the cost of a predicate operator and occurs only at the beginning of the path. The more costly the shared path is (e.g., if it is long or entails expensive operators), the lower is the overhead of sharing per operator.
Experimental results
A prototype implementation of the proposed query plans and algorithms for solving the MCSP in Java [http://java.sun.com] based on a graph library specifically optimized for efficient iteration over the incident edges of a vertex, efficient implementations of the most important graph operations for a query plan are provided. The following tests have all been performed using Sun Hotspot JRE 1.4.1 under Linux 2.2 running on a AMD Athlon with 1,3 GHz and 500 MB of main memory.
Several workloads of 10000 tree-shaped queries each simulating common application scenarios have been used for these tests: Two of the workloads are using queries generated by an automated query generator that are all confirming to a given DTD (in the respect that the queries do not violate any of the constraints specified in the DTD). In the following, only two workloads consisting in queries generated from a rather small DTD allowing only slight variations (COURSES-workload), from a complex, highly recursive DTD (NITF-workload), and a third one with entirely random queries (RANDOM-workload) are considered. The following figures give an idea of the kind of queries contained in one of the workloads.
The following figures show the resulting query plans for the first 2, 5, 10, and 50 queries from the COURSES-workload with the two most interesting heuristics: The first four figures show the result of the tree prefix merger, which consideres only prefixes of tree-shaped queries for sharing, the remaining four depict the outcome of the initial greedy merger considering also non-prefix nodes for sharing using a simple greedy heuristic. Both merger fall into the first class of heuristics as discussed above.
In the following, the tree prefix merger and the initial greedy merger are compared over all three workloads with two additional mergers for reference, the random merger that selects a random but valid merging and the plain merger that constructs a common query plan without sharing any operators. For all workloads and mergers, the cost of the resulting query plan (estimated by some cost function, cf. [F03]) and the time for computing the common query plan are depicted.
On the COURSES-workload, the initial greedy pair merger produces by far the best solution in a reasonable time. Note in particular, that the random merger (as on all workloads) performs far worse not only in cost but also in time. This is due to the fact, that the common query plan generated by the random merger increases much faster in size when more queries are considered than in the case of the initial greedy merger who tries to find possible sharings. Therefore, determining a valid solution becomes increasingly harder for the random merger.
For the NITF-workload, a similar picture is obtained, but due to the higher complexity of the DTD and therefore the increased diversity in queries, the time for both initial greedy and random merger increases considerably. Nevertheless, the initial greedy merger still offers the best trade-off between cost of generated solution and time for generation.
The RANDOM-workload is not so much tailored to a certain application area, but rather showing a theoretical worst-case, since in all practical applications a higher degree of similarity among the queries can reasonably be expected. As expected, neither of the mergers can improve the solution quality much in comparison to the plain merger. The random merger actually constructs a solution that is worse than when no operators are shared (this is possible, as any merging introduces some overhead as discussed above that is reflected by the cost function).
Summing up, the initial greedy merger clearly outperforms the tree prefix merger that is mimicing previous approaches for multi-query optimization on all realistic workloads w.r.t. the quality of the solution produced. Although, the time for generating a solution is slightly higher for the initial greedy merger, the increase is acceptable as the common query plan can be precomputed during query compilation.
Future Work
For the future, several open issues are worth investigating in the context of this project:
-
In [F03] several strategies for finding an approximate solution to the
MCSP are proposed, but only one of them is further investigated. For example, adapting techniques for frequent
sub-structure discovery [IWM00] in biological data to the problem at
hand might prove very beneficial. Furthermore, clustering of the queries, ei- ther in a preprocessing step or
during the construc- tion of the super-plan, can drastically reduce the com- plexity of the problem. Related
work on graph cluster- ing, reviewed in [B00], could provide hints for
such an extension.
Genetic algorithms have shown considerable potential as heuristics for solving several graph theoretical problems [CWH96, M96]. Although finding a crossover operation might not be trivial, it could prove very beneficial to investigate such a heuristic.
Finally, we have not considered combining the proposed heuristics in a reasonable way, e.g., to use certain incremental pair mergers to provide starting points for the local search pair merger. - Aside of improving the heuristics, there is one other important open question: Can we find classes of cost functions that are still interesting but for which the stable minimum common super-plan problem becomes easier to approximate or even easier to solve precisely? Although we do not believe, that there is a class of interesting cost functions where the stable minimum common super-plan problem becomes easier to solve, i.e., not NP-hard, there might be classes, where it is easier to approximate, i.e., one can find a heuristic such that a solution constructed by that heuristic has a performance ratio bounded by nε for some ε > 0.
- Currently, the cost estimation is performed without knowledge of the actual data to be queried against (as this data is not yet known). If the data is however either generated or processed before (e.g., in a pipeline) by some component, that component could be extended to provide statistics on the data for estimating the cost of a query plan based on the expected selectivity of certain queries or sub-queries.
- This concides with another possible extension: So far, it is assumed that all queries are known in advance and that the common query plan is constructed once at the begin of the processing. Adding queries during query evaluation or adapting the query plan based on the data and results considered so far in the processing is not supported but might be considered as a future extension.
References
-
[K02] Kiesling, T.:
Towards a Streamed XPath Evaluation.
Diploma thesis, University of Munich, 2002.
http://www.pms.informatik.uni-muenchen.de/publikationen/index.html#DA:Tobias.Kiesling -
[OKB03] Olteanu, D., Kiesling, T., and Bry, F.:
An Evaluation of Regular Path Expressions with Qualifiers against XML Streams.
In Proc. of Int. Conf. on Data Engineering (ICDE), 2003.
http://www.pms.informatik.uni-muenchen.de/publikationen/index.html#PMS-FB-2002-12 -
[BFMO01] Bry, F., Meuss, H., Furche, T.,
Olteanu, D., Spannagel, S., and Schwald, D.: XPath Evaluation Research
project, University of Munich, Institute of Computer Science, Teaching and Research Unit for Programming
and Modelling Languages, 2001-still running.
http://www.pms.informatik.uni-muenchen.de/forschung/xpath-eval.html - [CCCC02] Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S.: Monitoring streams: A new class of data management applications. In Proc. of the Int. Conf. on Very Large Databases (VLDB), 2002.
- [Y03] Yu, P. S. (Ed.): IEEE Transactions on Knowledge and Data Engineering: special section on online analysis and querying of continuous data streams. Vol. 15. IEEE Computer Society, 2003.
-
[C00] Cisco Systems: Cisco IOS netflow technology data
sheet. 2000.
http://www.cisco.com/warp/public/cc/pd/iosw/prodlit/iosnf_ds.pdf - [MB02] Megginson, D. and Brownell, D.: SAX: The simple API for XML. 2002. http://www.saxproject.org/
- [SH98] Sullivan, M. and Heybey, A.: Tribeca: A system for managing large databases of network traffic. In Proc. of the USENIX Annual Technical Conference, 1998.
- [DG01] Duffield, N. G. and Grossglauser, M.: Trajectory sam- pling for direct traffic observation. IEEE/ACM Transactions on Networking (TON) 9(3), 2001.
- [BGS01] Bonnet, P., Gehrke, J., and Seshadri, P.: Towards sensor database systems. In Proc. of the Int. Conf. on Mobile Data Management (ICMDM), 2001.
- [MF02] Madden, S. and Franklin, M. J. 2002: Fjording the stream: An architecture for queries over streaming sensor data. In Proc. of the Int. Conf. on Data Engineering (ICDE), 2001
- [F96] Franklin, M. J. (Ed.): Special Issue on Data Dissemina- tion. Data Engineering Bulletin, 19(3). IEEE Computer Society, 1996.
- [RD98] Ramakrishnan, S. and Dayal, V.: The pointcast net- work. In Proc. of the ACM SIGMOD International Conference on Management of Data. ACM Press, 1998.
- [CRW01] Carzaniga, A., Rosenblum, D. S., and Wolf, A. L.: Design and evaluation of a wide- area event notification service. ACM Transactions on Com- puter Systems (TOCS) 19(3), 2001.
- [DMS97] Douglass, R., Mork, J., and Suresh, B.: Battlefield awareness and data dissemination (BADD) for the warfighter. In Proc. of the SPIE, 1997.
-
[M02] Martínez, J. M. (Ed.): MPEG-7
overview. Tech. Rep. N4980, ISO/IEC JTC1/SC29/WG11, 2002.
http://mpeg.telecomitalialab.com/standards/mpeg-7/mpeg-7.htm - [MPR99] Mehringer, D. M., Plante, R. L., and Roberts, D. A. (Eds.): Astronomical Data Analysis Software and Systems VIII: Data Pipelines. ASP (Astronomical Soci- ety of the Pacific) Conference Series, vol. 172, 1999.
- [HPP01] Harnden, F. R., Primini, F. A., and Payne, H. E. (Eds.): Astronomical Data Analysis Software and Sys- tems X: Science Data Pipelines. ASP (Astronomical So- ciety of the Pacific) Conference Series, vol. 238, 2001.
-
[B02] Botts, M. (Ed.): Sensor model language (SensorML) for
in-situ and remote sensors spec- ification. Discussion paper 02-026r4, Open GIS Consortium, 2002.
http://www.opengis.org/techno/discussions/02-026r4.pdf -
[BR03] Botts, M. and Reichardt, M.: Sensor web
enablement. White paper, Open GIS Consortium, 2003.
http://www.opengis.org/pressrm/summaries/SensorWebWhPpr030512.doc - [OMFB02] Olteanu, D., Meuss, H., Furche, T., and Bry, F.: XPath: Looking forward. In Proc. of the EDBT Workshop on XML Data Management (XMLDM). Lec- ture Notes on Computer Science (LNCS), vol. 2490, 2002.
- [AGOR02] Avila-Campillo, I., Gupta, A., Onizuka, M., Raven, D., and Suciu, D.: An XML toolkit for scalable XML stream processing. In Proc. of the Workshop on Programming Language Tech- nologies for XML (PLAN-X), 2002. Proc. available at http://www.research.avayalabs.com/user/wadler/planx/planx-eproceed/proceed.html.
-
[DAFZ02] Diao, Y., Altinel, M., Franklin, M. J., Zhang, H., and
Fischer, P.: Path sharing and predicate evalua- tion for high-performance XML filtering. Submitted for
publication, 2002.
www.cs.berkeley.edu/~diaoyl/publications/yfilter-public.ps - [CFGR02] Chan, C.-Y., Felber, P., Garofalakis, M., and Rastogi, R: Efficient filtering of XML documents with XPath expressions. The VLDB Journal (Special Issue on XML Data Management), 2002.
- [AF00] Altinel, M. and Franklin, M. J.: Efficient filtering of XML documents for selective dissemination of in- formation. In Proc. of the International Conference on Very Large Databases (VLDB), 2000.
- [GS03] Gupta, A. K. and Suciu, D.: Stream processing of XPath queries with predicates. In Proc. of the ACM SIGMOD International Conference on Management of Data, 2003.
- [PC03] Peng, F. and Chawathe, S. S.: XPath queries on streaming data. In Proc. of the ACM SIGMOD International Conference on Management of Data, 2003.
- [GMOS03] Green, T. J., Miklau, G., Onizuka, M., and Suciu, D.: Processing XML streams with deterministic automata. In Proc. of the International Conference on Database Technology (ICDT), 2003.
- [BCGR03] Barton, C., Charles, P., Goyal, D., Raghavachari, M., Fontoura, M., and Josifovski, V.: Streaming XPath processing with forward and backward axes. In Proc. of the International Conference on Data Engi- neering (ICDE), 2003.
- [IWM00] Inokuchi, A., Washio, T., and Motoda, H.: An apriori-based algorithm for mining frequent sub-structures from graph data. In Proc. of the European Conference on Principles and Practice of Knowledge Discovery and Data Mining (PKDD2000), 2000.
- [CWH96] Cross, A. D. J., Wilson, R. C., and Hancock, E. R.: Genetic search for structural matching. In Computer Vision -- ECCV '96, R. C. B. Buxton, Ed. LNCS 1064, 1996.
- [B00] Bunke, H: Recent developments in graph matching. In Proc. of the International Conference on Pattern Recognition (ICPR), vol. 2, 2000.
- [M96] Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, 2nd ed. Springer Verlag, 1996.
