Automata and Logic (CSC2428) Typechecking for XML Transformations

ial size=-1 color=blue>
« back to results for ""
Below is a cache of http://www.cs.toronto.edu/~libkin/csc2538/alvinchin.pdf. It's a snapshot of the page taken as our search engine crawled the Web.
The web site itself may have changed. You can check the current page or check for previous versions at the Internet Archive. Yahoo! is not affiliated with the authors of this page or responsible for its content.
Automata and Logic (CSC2428) Typechecking for XML Transformations Automata and Logic (CSC2428)
Typechecking for XML Transformations
Alvin Chin
achin@cs.toronto.edu
November 25, 2005 1
Introduction
XML pervades the web, and XML is the language for data exchange among disparate data sources
like a relational database, an XML database, and a spatial database. A data provider or party creates
a common XML schema and agrees to produce only XML data conforming to that schema. Since
dierent parties have their own schemas, the challenge is how to communicate among dierent parties
and exchange data. Transformations need to be done to convert from one XML document to another
according to the associated schema. People perform queries on the web to get responses to questions.
As illustrated by Figure 1 , the query response is an XML document which is a result from the many
XML transformations and aggregation of data pulled together to satisfy that query.
Figure 1: Performing a query on the web and how XML is used to return the query response [2]
2
The Typechecking Problem
The problem that arises from querying and performing transformations for XML data exchange on
the web is called the typechecking problem and can be illustrated in Figure 2. The main reference from
which this paper is based on is from Milo, Suciu and Vianu called Typechecking for XML Transformers
from the PODS 2000 conference [3].
The question is this: Does every XML document that comes from an XML transformation T , and
an XML schema S
in
(input schema) satisfy the output schema S
out
? For example, a mobile device
wants to view information from Google Base but Google Base has its own XML schema (S
out
) and
the mobile devices browser is in WML (Wireless Markup Language) (Sin).
2.1
Types of typechecking
In order to solve the typechecking problem, we need to look at the dierent types of typechecking that
exist. There are two types of typechecking: dynamic typechecking and static typechecking. Dynamic
1 Figure 2: The typechecking problem [2]
typechecking involves validating each XML document at run time. The drawbacks of dynamic type-
checking are that it is resource-intensive and the XML document has to be parsed in real-time, which
makes it slow. Static typechecking, on the other hand, validates each XML document at compile time.
The focus of this paper is on static typechecking.
2.2
Addressing the typechecking problem
2.2.1
Issues
There are basically two issues in addressing the typechecking problem. First, there is no generally
accepted and universally agreed upon XML transformation language. The authors [3] address this
problem by dening an abstract model called a k-pebble tree transducer which allows all transforma-
tions in XML query languages to be expressed without joins on data values. Second, regular tree
languages can be used to formalize schemas without having to consider the type of schema (like DTD
or XML Schema). In fact, it is known that a DTD forms a regular tree language.
2.2.2
Previous work
Several researchers have addressed the typechecking problem, notably Alon et al. who conclude that
typechecking becomes undecidable when there are data or attribute values [1]. From the authors, they
conclude that typechecking is decidable for a large fragment of tree transformations using structural
properties [3].
2.3
Formal denition of the typechecking problem
We can provide a formal denition of the typechecking problem. Given an input tree language
in
, an
output tree language
out
, and XML transformation T , verify that:
t
in
=
T (t)
out
3
Background to Solving the Typechecking Problem
The solution for solving the typechecking problem involves modelling the XML schemas as regular
trees and XML transformations [5] as k-pebble tree transducers. Before showing the solution, since we
know regular trees, we need to know what a k-pebble tree transducer is.
2 3.1
k-Pebble Tree Transducer
A k-pebble tree transducer uses k pebbles to mark certain nodes in the tree. The pebbles are ordered
and numbered from 1,2,,k. Pebbles are placed in order, and removed in reverse order like pushing and
popping from a stack. Only the highest-numbered pebble on the tree can be moved.
3.1.1
How a
k-pebble tree transducer works
Figure 3 illustrates the k -pebble tree transducer in action on a tree with k =5.
3
2
1
Current pebble, i = 3
Figure 3: Example of k-pebble tree transducer with k=5
First, pebble 1 is placed on the root. The next pebble is placed onto the root which pushes the
other pebbles down the tree. As can be seen from the above gure, 3 pebbles have been placed on
the tree. We can only move the highest-numbered pebble (in this case, it is pebble 3) by examining
the current state, the symbol under the current pebble, and the presence or absence of the other
i-1 pebbles where the current pebble being moved is i. Transitions for pebble i are either move and
output. Valid moves are down-left, down-right, up-left, up-right, stay, place-new-pebble and pick-
current-pebble. The output node is returned with a symbol from the output alphabet.
The output from the k -pebble tree transducer is as follows. First, we start with a single computa-
tion branch and no output nodes. Then, the computations result in some top fragment of the output
tree. The remaining output subtrees are then computed. The outputs a
can be binary (a
2
)
in which there are two computation branches that compute the left and right child, or they can be
nulary (a

0
) in which only the leaf is output and the branch of computation halts. a is the output
symbol which is from the output alphabet ,
2
indicates the output alphabet for the left and right
child, and
0
indicates the output alphabet for the leaf.
3.1.2
Formal denition of a
k-pebble tree transducer
A k -pebble tree transducer is
T = (, , Q, q
0
, P ) where :
1. , are the ranked input and output alphabets
2. Q is a nite set of states and is partitioned into: Q = Q
1
...
Q
k
3. q
0
Q
1
is the initial state
3 4. P is a nite set of transitions:
(a, b, q
(i)
1
)
(q
(i)
2
, stay)
(a, b, q
(i)
1
)
(q
(i)
2
, down
left)
(a, b, q
(i)
1
)
(q
(i)
2
, down
right)
(a, b, q
(i)
1
)
(q
(i)
2
, up
left)
(a, b, q
(i)
1
)
(q
(i)
2
, up
right)
(a, b, q
(i)
1
)
(q
(i+1)
2
, place
new pebble)
(a, b, q
(i)
1
)
(q
(i1)
2
, pick
current pebble)
(a, b, q
(i)
1
)
(a
0
, output0)
(a, b, q
(i)
1
)
(a
2
(q
(i)
1
, q
(i)
2
), output2)
b
{0, 1}
i1
, f or i = 1, b =
What this means is that P has a set of transitions in which the move operations for current pebble
i are stay, down-left, down-right, up-left, and up-right given an input symbol a, the presence/absence
of the previous i-1 pebbles denoted by b
{0, 1}
i1
(obviously if i = 1, then there is no b), then the
state transitions from q
1
to q
2
. If the move is to place a new pebble, then pebble i+1 is placed and
the state moves from q
1
to q
2
. If the current pebble is picked, then pebble i is in state q
1
, and the
previous pebble i-1 enters the q
2
state. For output transitions, if the output is nulary, then the leaf is
output a
0
, otherwise if it is binary output, then left child enters q
1
, the right child enters q
2
and the
output is the left and right child along with the output symbol a
2
.
3.1.3
Example of a
k-pebble transducer (1-pebble)
This example copies the input tree to the output tree and is obtained from Example 3.2 of [3].
T = (, , q, q
1
, q
2
, q
0
, q, P ) where :
1. , are the ranked input and output alphabets
2. P is the following:
(a
2
, q)
(a
2
(q
1
, q
2
), output2) for all a
2

2
(a
2
, q
1
)
(q, down left)
(a
2
, q
2
)
(q, down right)
(a
0
, q
1
)
(a
0
, output0) for all a
0

0
As can be seen, every input symbol is then output. output0 is a leaf, while output2 is the left and
right child. Note that there is no b because k = 1, and b =
so it is omitted.
There are other examples of k -pebble transducers like in Example 3.3 and 3.4 of [3].
3.1.4
Complexity of
k-pebble transducer
Let T be a xed k -pebble transducer. Then for each input tree t,
1. the set T (t) is a regular tree language
2. one can construct in PTIME (in the size of t) a regular tree automaton
A
t
that accepts the
language T (t)
4 3.1.5
Restating the typechecking problem - formal denition
Given an input tree type
in
, an output tree type
out
, and a k -pebble tree transducer T , verify:
t
in
=
T (t)
out
4
Solution to the Typechecking Problem
Now that we have the background information on k -pebble tree transducers, we can solve the type-
checking problem. In solving the typechecking problem, the authors examine a closely related problem
which is type inference. The idea is that if type inference can be solved, then that immediately leads
to solving the typechecking problem