Simple disjunctive multiplicity expressions (DMEs)

Get Complete Project Material File(s) Now! »

Intractability of unordered regular expres-sions

In this section, we study the reasons of the intractability of UREs w.r.t. the following two fundamental decision problems: membership and containment. In Section 1.3.1 we show that membership is NP-complete even under sig-nificant restrictions on the UREs while in Section 1.3.2 we show that the containment is P2-hard (and in 3-EXPTIME). We notice that the proofs of both results rely on UREs allowing repetitions of the same symbol. Conse-quently, we disallow such repetitions and we show that this restriction does not avoid intractability of the containment (Section 1.3.3). We observe that the proof of this result employs UREs with arbitrary use of disjunction and intervals, and therefore, in Section 1.4 we impose further restrictions and define the disjunctive interval multiplicity expressions (DIMEs), a subclass for which we show that the two problems of interest become tractable.

Membership

In this section, we study the problem of deciding the membership of an unordered word to the language of a URE. First of all, note that this problem can be easily reduced to testing the membership of a vector to the Parikh image of a regular language, known to be NP-complete [KT10], and vice versa. We show that deciding the membership of an unordered word to the language a URE remains NP-complete even under significant restrictions on the class of UREs, a result which does not follow from [KT10]. Theorem 1.3.1 Given an unordered word w and an expression E of the grammar E :: a | E? | pE“||”Eq, deciding whether w P LpEq is NP-complete. Proof To show that this problem is in NP, we point out that a nondeter-ministic Turing machine guesses a permutation of w and checks whether it is accepted by the NFA corresponding to E with the unordered concatenation replaced by standard concatenation. We recall that w has unary representa-tion. Next, we prove the NP-hardness by reduction from SAT1-in-3 i.e., given a 3CNF formula, determine whether there exists a valuation such that each clause has exactly one true literal (and exactly two false literals). The SAT1-in-3 problem is known to be NP-complete [Sch78]. The reduction works as follows. We take a 3CNF formula ’ c1 ^ : : : ^ ck over the variables tx1, : : : , xnu. We take the alphabet td1, : : : , dk, v1, : : : , vnu. Each di corre-sponds to a clause ci (for 1 ⁄ i ⁄ k) and each vj corresponds to a variable xj (for 1 ⁄ j ⁄ n). We construct the unordered word w’ d1 : : : dkv1 : : : vn and the expression E’ X1 || : : : || Xn, where for 1 ⁄ j ⁄ n: Xj pvj || dt1 || : : : || dtl q? || pvj || df1 || : : : || dfm q?, and dt1 , : : : , dtl (with 1 ⁄ t1, : : : , tl ⁄ k) correspond to the clauses that use the literal xj, and df1 , : : : dfm (with 1 ⁄ f1, : : : , fm ⁄ k) correspond to the clauses that use the literal xj. For example, for the formula ’0 px1 _ x2 _ x3q ^ p x1 _ x3 _ x4q, we construct w’0 d1d2v1v2v3v4 and E’0 pv1 || d1q? || pv1 || d2q? || v2? || pv2 || d1q? || pv3 || d1 || d2q? || v3? || v4? || pv4 || d2q?. We claim that ’ P SAT1-in-3 iff w’ P LpE’q. For the only if case, let V : tx1, : : : , xnu Ñ ttrue, falseu be the SAT1-in-3 valuation of ’. We use V to construct the derivation of w’ in LpE’q: for 1 ⁄ j ⁄ n, we take pvj || dt1 || : : : || dtl q from Xj if V pxjq true, and pvj || df1 || : : : || dfm q from Xj otherwise. Since V is a SAT1-in-3 valuation of ’, each di (with 1 ⁄ i ⁄ k) occurs exactly once, hence w’ P LpE’q. For the if case, we assume that w’ P LpE’ q. Since w’pvjq 1, we infer that w’ uses exactly one of the expressions of the form pvj || : : :q?. Moreover, since w’pdiq 1, we infer that the valuation encoded in the derivation of w’ in LpE’ q validates exactly one literal of each clause in ’, and therefore, ’ P SAT1-in-3. Clearly, the described
reduction works in polynomial time.

Complexity of disjunctive interval multi-plicity schemas

In this section, we present the complexity results for DIMSs. First, we show the tractability of schema satisfiability and containment. Then, we provide an algorithm for deciding membership in streaming i.e., that processes an XML document in a single pass and using memory depending on the height of the tree and not on its size. Finally, we point out that the complexity of query satisfiability, implication, and containment in the presence of the schema follow from existing results. First, we show the tractability of schema satisfiability and schema con-tainment. Proposition 1.6.1 SATDIMS and CNTDIMS are in PTIME. Proof A simple algorithm based on dynamic programming can decide the satisfiability of a DIMS. More precisely, given a schema S prootS, RS q, one has to determine for every symbol a of the alphabet whether there exists a (finite) tree t that satisfies S1 pa, RSq. Then, the schema S is satisfiable if there exist such a tree for the root label rootS. Moreover, testing the containment of two DIMSs reduces to testing, for each symbol in the alphabet, the containment of the associated DIMEs, which is in PTIME (Theorem 1.4.9).
Next, we provide an algorithm for deciding membership in streaming i.e., that processes an XML document in a single pass and uses memory depending on the height of the tree and not on its size. Our notion of streaming has been employed in [SV02] as a relaxation of the constant-memory XML validation against DTDs, which can be performed only for some DTDs [SV02, SS07]. In general, validation against DIMSs cannot be performed with constant memory due to the same observations as in [SV02, SS07] w.r.t. the use of recursion in the schema. Hence, we have chosen our notion of streaming to be able to have an algorithm that works for the entire class of DIMSs. We assume that the input tree is given in XML format, with arbitrary ordering of sibling nodes. Moreover, the proposed algorithm has earliest rejection i.e., if the given tree does not satisfy the given schema, the algorithm outputs the result as early as possible. For a tree t, heightptq is the height of t defined in the usual way. We employ the standard RAM model and assume that subsequent natural numbers are used as labels in .
Proposition 1.6.2 MEMBDIMS is in PTIME. There exists an earliest re-jection streaming algorithm that checks membership of a tree t in a DIMS S in time Op|t| | |2q and using space Opheightptq | |2q. Proof We propose Algorithm 1 for deciding the membership of a tree t to the language of a DIMS S. The input tree t is given in XML format, with some arbitrary ordering of sibling nodes. We assume a well-formed stream rt topen, closeu representing a tree t and a procedure readprtq that returns the next pair p , bq in the stream, where P topen, closeu and b P . The algorithm works for every arbitrary ordering of sibling nodes. To validate a tree t against a DIMS S prootS, RSq, one has to run Algorithm 1 after reading the opening tag of the root.
For a given node, the algorithm constructs the compact representation of the characterizing tuple of its label (line 1), which requires space Op| |2q (cf. Lemma 1.4.8). The algorithm also stores for a given node the number of occurrences of each label in among its children. This is done using the array count, which requires space Op q. Initially, all values in the array count are set at 0 (lines 2-3) and they are updated after reading the open tag of the children (lines 4-6). During the execution, the algorithm maintains a stack whose height is the depth of the currently visited node. Naturally, the bound on space required is Opheightptq | |2q.
The algorithm has earliest rejection since it rejects a tree as early as pos-sible. More precisely, this can be done after reading the opening tag for nodes that violate the maximum value for the allowed cardinality for their label (lines 7-8) or violate some conflicting pair of siblings (lines 9-10). If it is not the case, the algorithm recursively validates the corresponding subtree (lines 11-12). After reading all children of the current node, the algorithm checks whether the components of the characterizing tuple are satisfied: the extended cardinality map (lines 14-15), the collections of required symbols (lines 16-17), and the counting dependencies (lines 18-19). Notice that since we have checked the conflicting pairs of siblings after reading each opening tag, we do not need to check them again after reading all children. However, we still need to check the extended cardinality map at this moment to see whether the number of occurrences of each label is in the allowed interval. When we have read the opening tag, we were able to reject only if the maxi-mum value for the allowed number of occurrences has been already violated. As for the collections of required symbols and the counting dependencies, we are able to establish whether they are satisfied or not after reading all children. If none of the constraints imposed by the characterizing tuple is violated, the algorithm returns true (line 20). As we have already shown with Lemma 1.4.1 and Lemma 1.4.5, the compact representation of the character-izing tuple captures precisely the language of a given DIME. Consequently, the algorithm returns true after reading the root node iff the given tree satisfies the given schema.

READ  Application to the determination of the minimum ignition energy of the micrometric aluminum and wheat starch

Complexity of disjunction-free interval mul-tiplicity schemas

Although query satisfiability and query implication in the presence of schema are intractable for DIMSs, we prove that they become tractable for IMSs (Section 1.7.4). We also show a considerably lower complexity for query containment in the presence of schema: coNP-completeness for IMSs instead of EXPTIME-completeness for DIMSs (Section 1.7.4). Additionally, we point out that our results for IMSs allow also to characterize the complexity of query implication and query containment in the presence of disjunction-free DTDs (i.e., restricted DTDs using regular expressions without disjunction operator), which, to the best of our knowledge, have not been previously studied (Section 1.7.5). To prove our results, we develop a set of tools that we present next: dependency graphs (Section 1.7.1), generalized definition of embedding (Section 1.7.2), family of characteristic graphs (Section 1.7.3).

Table of contents :

I Schema definition and translation 
1 Schema formalisms for unordered XML 
1.1 Context
1.2 Basic notions
1.3 Intractability of unordered regular expressions
1.3.1 Membership
1.3.2 Containment
1.3.3 Disallowing repetitions
1.4 Disjunctive interval multiplicity expressions (DIMEs)
1.4.1 Characterizing tuples
1.4.2 Grammar of DIMEs
1.4.3 Tractability of DIMEs
1.5 Interval multiplicity schemas
1.6 Complexity of disjunctive interval multiplicity schemas
1.7 Complexity of disjunction-free interval multiplicity schemas
1.7.1 Dependency graphs
1.7.2 Generalizing the embedding
1.7.3 Family of characteristic graphs
1.7.4 Complexity results
1.7.5 Extentions to disjunction-free DTDs
1.8 Expressiveness of interval multiplicity schemas
1.9 Related work
2 Relational-to-graph data exchange 
2.1 Context
2.2 Problem setting
2.3 Background
2.3.1 Relational data exchange
2.3.2 Graph data exchange
2.4 Complexity results
2.4.1 Complexity of target egds
2.4.2 Complexity of target tgds
2.5 Towards universal solution
II Learning schemas and queries 
3 Learning schemas for unordered XML 
3.1 Context
3.2 Learning framework
3.3 Intractability of the general case
3.4 Learning from positive examples only
3.4.1 Simple disjunctive multiplicity expressions (DMEs)
3.4.2 Learning DMEs from positive examples
3.4.3 Learning DMSs from positive examples
3.5 Learning from positive and negative examples
3.6 Related work
4 Learning relational join queries 
4.1 Context
4.2 Join queries
4.3 Learning from a set of examples
4.3.1 Learning framework
4.3.2 Consistency checking
4.3.3 Learnability results
4.4 Learning from user interactions
4.4.1 Interactive scenario
4.4.2 Instance-equivalent join predicates
4.4.3 Uninformative and informative tuples
4.5 Strategies
4.5.1 General interactive inference algorithm
4.5.2 Settings where the signatures coincide
4.5.3 Lookahead strategies for general settings
4.6 Experiments
4.6.1 Setup of experiments on TPC-H
4.6.2 Setup of experiments on synthetic data
4.6.3 Discussion
4.7 Related work
5 Learning path queries on graph databases 
5.1 Context
5.2 Graph databases and queries
5.3 Learning from a set of examples
5.3.1 Learning framework
5.3.2 Learning algorithm
5.3.3 Learnability results
5.3.4 Extensions to binary and n-ary semantics
5.4 Learning from user interactions
5.4.1 Interactive scenario
5.4.2 Informative nodes and practical strategies
5.5 Experiments
5.5.1 Datasets
5.5.2 Static experiments
5.5.3 Interactive experiments
5.6 Related work
Conclusions

GET THE COMPLETE PROJECT

Related Posts