An extended Datalog Syntax for Existential Rules and
Transcription
An extended Datalog Syntax for Existential Rules and
An extended Datalog Syntax for Existential Rules and Datalog+/GraphIK Team, LIRMM, Montpellier July 2012 Abstract This document defines a textual format for the existential rule framework, also called Datalog+/-. It is meant to be an exchange format at once human-friendly, concise and easy to parse. This format can be seen as an extension of the commonly used format for plain Datalog. It is called “dlgp” for “Datalog Plus”. A file may contain four kinds of knowledge elements: facts, existential rules, negative constraints and conjunctive queries. For a quick introduction to the framework, please see the document “Brief Overview of the Existential Rule Framework”. 1 Syntax of the dlgp format As usually in Datalog, variables begin with an uppercase letter and constants with a lowercase letter. We distinguish between regular constants (called constants hereafter) and literals, which are values belonging to some datatype. Literals are given as double-quoted strings or numeric values (integers and floats). Predicates are either simple strings (i.e., composed of letters, digits and ), the equality symbol = (with infix notation) or double-quoted strings (in this case, the newline, backslash and doublequote characters that occur in the string are replaced by \n, \\ et \" when necessary). In the following the syntax is specified by a grammar in BNF style: nonterminals are enclosed between the pair <>, terminals are in bold, the | symbol indicates a choice, parts enclosed between the pair ([]) are optional, choice1 ..choicen indicates a choice within an interval. We first define elements used to build tokens, then tokens, then the global grammar. Finally, comments and annotations are added. A file name has the extension .dlgp or .dlp. Character encoding is assumed to be UTF-8. 1 1.1 Elements used to define tokens <uppercase-letter> ::= <lowercase-letter> ::= <letter> ::= <digit> ::= <simple-char> ::= <char> ::= <sign> ::= <expo> ::= <simple-string> ::= <normal-string> ::= <number> ::= <exponent> ::= 1.2 A..Z a..z <uppercase-letter> | <lowercase-letter> 0..9 <letter> | <digit> | any char except doublequote, newline, backslash | \" | \n | \\ +|e|E <simple-char> | <simple-char><simple-string> <char> | <char><normal-string> <digit> | <digit><number> <expo> [<sign>] <number> Tokens Characters ’.’, ’:’, ’ !’, ’ ?’, ’(’, ’)’, ’,’, ’=’, ’[’, ’]’, ’%’, ’@’ are token separators. Space, tab and newline characters are non-interpreted token separators. However ’.’ is not a separator inside a float and no character is a separator inside a quotedstring, except double-quotes (unless it preceded by a \). <u-ident> ::= <uppercase-letter>[<simple-string>] <l-ident> ::= <lowercase-letter>[<simple-string>] <double-quoted-string> ::= "<normal-string>" <integer> ::= 0 | [-] 1..9 [<number>] <float> ::= [-] <number> . [<number>] [<exponent>] | [-] . <number> [<exponent>] | [-] <number> <exponent> equality mark: = end of statement mark: . absurd mark: ! rule mark: :question mark: ? start of statement name: [ end of statement name: ] start of terms: ( end of terms: ) separator: , 1.3 Global Grammar A dlgp document is a sequence of statements, with each statement introducing a fact, a rule, a negative constraint or a (conjunctive) query. 2 <document> ::= <statement-list> ::= <statement> ::= <fact> ::= <rule> ::= <head> ::= <body> ::= <constraint> ::= <query> ::= <conjunction> ::= <atom> ::= <equality> ::= <std-atom> ::= <term-list> ::= <term> ::= <var-list> ::= <predicate> ::= <variable> ::= <constant> ::= <literal> ::= <identifier> ::= 1.4 <statement-list> <statement> | <statement> <statement-list> <fact> | <rule> | <constraint> | <query> [ [<identifier>] ] <conjunction>. [ [<identifier>] ] <head> :- <body>. <conjunction> <conjunction> [ [<identifier>] ] ! :- <conjunction>. [ [<identifier>] ] ? [(<var-list>)] :- <conjunction>. <atom> | <atom>, <conjunction> <std-atom> | <equality> <term> = <term> <predicate>(<term-list>) <term> | <term>, <term-list> <variable> | <constant> | <literal> <variable> | <variable>, <var-list> <l-ident> | <double-quoted-string> <u-ident> <l-ident> <double-quoted-string> | <integer> | <float> <u-ident> | <l-ident> Comments and Annotations The symbol % (when it occurs outside of a double-quoted string) introduces a comment: all characters after % are ignored until the end of the line. The symbol @ (when it occurs outside of a double-quoted string) introduces an annotation, i.e., a directive for the file parser. The role of an annotation is to define a section in the file as well as the nature of statements in this section. Annotations allow to inform tools sooner about the nature of the statements they analyze. For instance, without annotations, it is not possible to distinguish between a fact and a rule before the sequence of atoms composing the fact or the rule head has been completely analyzed. If a parser has to analyze a big fact, then it is better that it knows it before it starts analyzing it, so that it can process it more efficiently. An annotation is introduced by one of the following tokens: @Facts, @Rules, @Constraints or @Queries. It defines a section that spans to the next annotation or the end of the file. An annotation cannot be introduced inside a statement. It is an indication for the parser, which can choose to ignore it. 3 <document> ::= <directive-list> ::= <annotation> ::= <fact-note> ::= <rule-note> ::= <constraint-note> ::= <query-note> ::= 2 <directive-list> <statement> | <annotation> | <statement> <directive-list> | <annotation> <directive-list> <fact-note> | <rule-note> | <constraint-note> | <query-note> @Facts @Rules @Constraints @Queries File Examples A syntactically correct but not easy-to-read file: [f1] p(X),r(X,b),q(b). [f2]t(X,a), s(a,z). [c1] ! :- r(X,X) . % This is a comment [q1]?(X) :- p(X),r(X,Z), t(a,Z). [r1]r(X,Y):-p(X) , t(X,Z). [constraint 2]! :- X = X. [f3] q(a),r(a,X), % This is another comment t(a,z). s(X,Y),s(Y,Z) :q(X),t(X,Z). s(Z):-a=b,X=Y. @Facts t(X,a), r(Y,z). @Rules [rA 1]p(X):-q(X). !:-p(X),q(X). ? :- p(X). The same logical content organized in a more readable way: % facts [f1] p(X), r(X,b), q(b). [f2] t(X,a), s(a,z). [f3] q(a), r(a,X), t(a,z). t(X,a), r(Y,z). % constraints [c1] ! :- r(X,X). [constraint 2] ! :- X=X. ! :- p(X), q(X). % rules [r1] r(X,Y) :- p(X), t(X,Z). 4 s(X,Y), s(Y,Z) :- q(X),t(X,Z). s(Z) :- a=b, X=Y. [rA 1] p(X) :- q(X). % queries [q1] ? (X) :- p(X), r(X,Z), t(a,Z). ? :- p(X). Again the same logical content while using annotations instead of simple comments to distinguish between the different kinds of knowledge: @Facts [f1] p(X), r(X,b), q(b). [f2] t(X,a), s(a,z). [f3] q(a), r(a,X), t(a,z). t(X,a), r(Y,z). @Constraints [c1] ! :- r(X,X). [contrainte 2] ! :- X=X. ! :- p(X), q(X). @Rules [r1] r(X,Y) :- p(X), t(X,Z). s(X,Y), s(Y,Z) :- q(X),t(X,Z). s(Z) :- a=b, X=Y. [rA 1] p(X) :- q(X). @Queries [q1] ? (X) :- p(X), r(X,Z), t(a,Z). ? :- p(X). Note on the scope of variable identifiers. While the scope of a constant or a predicate identifier is the whole document, the scope of a variable is local to a <statement>. Thus two different facts, rules or constraints actually do not share any variable (more precisely, variables with the same name in different statements are each bound by their own quantifier). Example: p(X,a), q(X,Y). q(X,b). is logically interpreted as ∃X∃Y (p(X, a) ∧ q(X, Y )) ∧ ∃Xq(X, b) while: p(X,a), q(X,Y), q(X,b). is logically interpreted as ∃X∃Y (p(X, a) ∧ q(X, Y ) ∧ q(X, b)). 5