An extended Datalog Syntax for Existential Rules and Datalog+/-

GraphIK Team, LIRMM, Montpellier
July 2012
Abstract
This document defines a textual format for the existential rule
framework, also called Datalog+/-. It is meant to be an exchange
format at once human-friendly, concise and easy to parse. This format
can be seen as an extension of the commonly used format for plain
Datalog. It is called “dlgp” for “Datalog Plus”. A file may contain
four kinds of knowledge elements: facts, existential rules, negative constraints and conjunctive queries.
For a quick introduction to the framework, please see the document
“Brief Overview of the Existential Rule Framework”.
1 Syntax of the dlgp format
As usual in Datalog, variables begin with an uppercase letter and constants
with a lowercase letter. We distinguish between regular constants
(called constants hereafter) and literals, which are values belonging to some
datatype. Literals are given as double-quoted strings or numeric values
(integers and floats). Predicates are either simple strings (i.e., composed
of letters, digits and _), the equality symbol = (with infix notation) or
double-quoted strings (in this case, the newline, backslash and double-quote
characters that occur in the string are replaced by \n, \\ and \" when necessary).
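The escaping convention for double-quoted strings can be illustrated with a minimal Python sketch. The function names escape_dlgp_string and unescape_dlgp_string are hypothetical and not part of the format; the sketch only assumes the three replacements listed above (\n, \\ and \").

```python
def escape_dlgp_string(raw: str) -> str:
    """Wrap raw in double quotes, applying the three dlgp escapes."""
    # Backslashes are escaped first, so that the backslashes introduced
    # for newlines and double quotes are not doubled afterwards.
    escaped = (
        raw.replace("\\", "\\\\")
           .replace("\n", "\\n")
           .replace('"', '\\"')
    )
    return f'"{escaped}"'


def unescape_dlgp_string(token: str) -> str:
    """Inverse of escape_dlgp_string, for a well-formed quoted token."""
    body = token[1:-1]  # strip the surrounding double quotes
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # \n stands for a newline; \\ and \" stand for the second character.
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

Escaping and then unescaping a string should give back the original string, which is a convenient sanity check for any implementation of this convention.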
In the following, the syntax is specified by a grammar in BNF style:
non-terminals are enclosed between the pair <>, terminals are in bold, the |
symbol indicates a choice, parts enclosed between the pair [] are optional,
and choice1..choicen indicates a choice within an interval.
We first define elements used to build tokens, then tokens, then the global
grammar. Finally, comments and annotations are added.
A file name has the extension .dlgp or .dlp. Character encoding is
assumed to be UTF-8.
1.1 Elements used to define tokens
<uppercase-letter> ::= A..Z
<lowercase-letter> ::= a..z
<letter>           ::= <uppercase-letter> | <lowercase-letter>
<digit>            ::= 0..9
<simple-char>      ::= <letter> | <digit> | _
<char>             ::= any char except doublequote, newline, backslash | \" | \n | \\
<sign>             ::= + | -
<expo>             ::= e | E
<simple-string>    ::= <simple-char> | <simple-char><simple-string>
<normal-string>    ::= <char> | <char><normal-string>
<number>           ::= <digit> | <digit><number>
<exponent>         ::= <expo> [<sign>] <number>
1.2 Tokens
Characters ’.’, ’:’, ’!’, ’?’, ’(’, ’)’, ’,’, ’=’, ’[’, ’]’, ’%’, ’@’ are token
separators. Space, tab and newline characters are non-interpreted token
separators. However, ’.’ is not a separator inside a float, and no character
is a separator inside a quoted string, except a double-quote (unless it is
preceded by a \).
<u-ident>              ::= <uppercase-letter>[<simple-string>]
<l-ident>              ::= <lowercase-letter>[<simple-string>]
<double-quoted-string> ::= "<normal-string>"
<integer>              ::= 0 | [-] 1..9 [<number>]
<float>                ::= [-] <number> . [<number>] [<exponent>] |
                           [-] . <number> [<exponent>] |
                           [-] <number> <exponent>
equality mark:           =
end of statement mark:   .
absurd mark:             !
rule mark:               :-
question mark:           ?
start of statement name: [
end of statement name:   ]
start of terms:          (
end of terms:            )
separator:               ,
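As an illustration of the token definitions above, here is a minimal regex-based tokenizer sketch in Python. The token names (FLOAT, RULE, ABSURD, etc.) and the tokenize function are hypothetical choices made for this example. The sketch does not handle comments or annotations, and it resolves the ’.’ ambiguity greedily: a digit sequence directly followed by ’.’ is read as a float.

```python
import re

# One (name, pattern) pair per token kind; order matters because the first
# alternative that matches wins (FLOAT must be tried before INTEGER and DOT).
TOKEN_SPEC = [
    ("FLOAT", r"-?[0-9]+\.[0-9]*(?:[eE][+-]?[0-9]+)?"
              r"|-?\.[0-9]+(?:[eE][+-]?[0-9]+)?"
              r"|-?[0-9]+[eE][+-]?[0-9]+"),
    ("INTEGER", r"-?[1-9][0-9]*|0"),
    ("QUOTED", r'"(?:[^"\\\n]|\\["n\\])+"'),
    ("U_IDENT", r"[A-Z][A-Za-z0-9_]*"),
    ("L_IDENT", r"[a-z][A-Za-z0-9_]*"),
    ("RULE", r":-"),
    ("EQUALS", r"="),
    ("DOT", r"\."),
    ("ABSURD", r"!"),
    ("QUESTION", r"\?"),
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA", r","),
    ("SKIP", r"[ \t\r\n]+"),  # non-interpreted separators
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (token_name, lexeme) pairs for a dlgp fragment."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, tokenize('[r1] r(X,Y) :- p(X).') starts with ("LBRACKET", "["), contains ("RULE", ":-") and ends with ("DOT", "."). A production parser would need a more careful treatment of a number immediately followed by the end of statement mark.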
1.3 Global Grammar
A dlgp document is a sequence of statements, with each statement introducing a fact, a rule, a negative constraint or a (conjunctive) query.
<document>       ::= <statement-list>
<statement-list> ::= <statement> | <statement> <statement-list>
<statement>      ::= <fact> | <rule> | <constraint> | <query>
<fact>           ::= [ [<identifier>] ] <conjunction>.
<rule>           ::= [ [<identifier>] ] <head> :- <body>.
<head>           ::= <conjunction>
<body>           ::= <conjunction>
<constraint>     ::= [ [<identifier>] ] ! :- <conjunction>.
<query>          ::= [ [<identifier>] ] ? [(<var-list>)] :- <conjunction>.
<conjunction>    ::= <atom> | <atom>, <conjunction>
<atom>           ::= <std-atom> | <equality>
<equality>       ::= <term> = <term>
<std-atom>       ::= <predicate>(<term-list>)
<term-list>      ::= <term> | <term>, <term-list>
<term>           ::= <variable> | <constant> | <literal>
<var-list>       ::= <variable> | <variable>, <var-list>
<predicate>      ::= <l-ident> | <double-quoted-string>
<variable>       ::= <u-ident>
<constant>       ::= <l-ident>
<literal>        ::= <double-quoted-string> | <integer> | <float>
<identifier>     ::= <u-ident> | <l-ident>
1.4 Comments and Annotations
The symbol % (when it occurs outside of a double-quoted string) introduces
a comment: all characters after % are ignored until the end of the line. The
symbol @ (when it occurs outside of a double-quoted string) introduces an
annotation, i.e., a directive for the file parser. The role of an annotation
is to define a section in the file as well as the nature of statements in this
section. Annotations let tools know earlier what kind of statements they
are about to analyze. For instance, without annotations, it is not possible
to distinguish between a fact and a rule before the sequence of atoms
composing the fact or the rule head has been completely analyzed. If a
parser has to analyze a large fact, it is better for it to know this before
it starts, so that it can process the fact more efficiently.
An annotation is introduced by one of the following tokens: @Facts,
@Rules, @Constraints or @Queries. It defines a section that spans
to the next annotation or the end of the file. An annotation cannot be
introduced inside a statement. It is an indication for the parser, which can
choose to ignore it.
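The point about late recognition of facts and rules can be made concrete with a small Python sketch. The function classify_statement is hypothetical; it assumes each statement has already been split into a list of token strings ending with the end of statement mark. It returns the kind of statement together with the number of tokens that had to be read before that kind was known.

```python
def classify_statement(tokens):
    """Classify one statement given as a list of token strings.

    Returns (kind, tokens_read), where tokens_read counts the tokens
    scanned before the kind of the statement could be decided.
    """
    i = 0
    # Skip the optional statement name: [ identifier ]
    if tokens and tokens[0] == "[":
        i = tokens.index("]") + 1
    # Constraints and queries are recognized from their first mark.
    if tokens[i] == "!":
        return "constraint", i + 1
    if tokens[i] == "?":
        return "query", i + 1
    # A fact and a rule both start with a conjunction: the kind is only
    # known at the rule mark, or at the very end of the statement.
    for j in range(i, len(tokens)):
        if tokens[j] == ":-":
            return "rule", j + 1
    return "fact", len(tokens)
```

With this sketch, a fact is only identified after its last token has been read, whereas a parser that has seen an @Facts annotation can commit to the fact interpretation from the first token.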
<document>        ::= <directive-list>
<directive-list>  ::= <statement> | <annotation> | <statement> <directive-list> |
                      <annotation> <directive-list>
<annotation>      ::= <fact-note> | <rule-note> | <constraint-note> | <query-note>
<fact-note>       ::= @Facts
<rule-note>       ::= @Rules
<constraint-note> ::= @Constraints
<query-note>      ::= @Queries
2 File Examples
A syntactically correct but not easy-to-read file:
[f1] p(X),r(X,b),q(b).
[f2]t(X,a), s(a,z).
[c1] ! :- r(X,X) .
% This is a comment
[q1]?(X) :- p(X),r(X,Z), t(a,Z).
[r1]r(X,Y):-p(X) , t(X,Z).
[constraint_2]! :- X = X.
[f3] q(a),r(a,X), % This is another comment
t(a,z).
s(X,Y),s(Y,Z)
:-q(X),t(X,Z).
s(Z):-a=b,X=Y.
@Facts t(X,a), r(Y,z).
@Rules
[rA_1]p(X):-q(X).
!:-p(X),q(X).
? :- p(X).
The same logical content organized in a more readable way:
% facts
[f1] p(X), r(X,b), q(b).
[f2] t(X,a), s(a,z).
[f3] q(a), r(a,X), t(a,z).
t(X,a), r(Y,z).
% constraints
[c1] ! :- r(X,X).
[constraint_2] ! :- X=X.
! :- p(X), q(X).
% rules
[r1] r(X,Y) :- p(X), t(X,Z).
s(X,Y), s(Y,Z) :- q(X),t(X,Z).
s(Z) :- a=b, X=Y.
[rA_1] p(X) :- q(X).
% queries
[q1] ? (X) :- p(X), r(X,Z), t(a,Z).
? :- p(X).
Again the same logical content while using annotations instead of simple
comments to distinguish between the different kinds of knowledge:
@Facts
[f1] p(X), r(X,b), q(b).
[f2] t(X,a), s(a,z).
[f3] q(a), r(a,X), t(a,z).
t(X,a), r(Y,z).
@Constraints
[c1] ! :- r(X,X).
[constraint_2] ! :- X=X.
! :- p(X), q(X).
@Rules
[r1] r(X,Y) :- p(X), t(X,Z).
s(X,Y), s(Y,Z) :- q(X),t(X,Z).
s(Z) :- a=b, X=Y.
[rA_1] p(X) :- q(X).
@Queries
[q1] ? (X) :- p(X), r(X,Z), t(a,Z).
? :- p(X).
Note on the scope of variable identifiers. While the scope of a constant or a predicate identifier is the whole document, the scope of a variable
is local to a <statement>. Thus two different facts, rules or constraints
actually do not share any variable (more precisely, variables with the same
name in different statements are each bound by their own quantifier).
Example:
p(X,a), q(X,Y).
q(X,b).
is logically interpreted as ∃X∃Y (p(X,a) ∧ q(X,Y)) ∧ ∃X q(X,b),
while:
p(X,a), q(X,Y), q(X,b).
is logically interpreted as ∃X∃Y (p(X,a) ∧ q(X,Y) ∧ q(X,b)).
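One way for a tool to make this statement-local scope explicit is to rename variables apart, one statement at a time. The Python sketch below is only an illustration: the function rename_apart and the suffix scheme are hypothetical, and the regular expression assumes statements without double-quoted strings or annotations.

```python
import re

# A variable is a <u-ident>: an uppercase letter followed by letters,
# digits or underscores. The \b anchors keep the pattern from matching
# an uppercase letter inside a lowercase identifier such as rA_1.
VARIABLE_RE = re.compile(r"\b[A-Z][A-Za-z0-9_]*\b")


def rename_apart(statements):
    """Suffix every variable with the 1-based index of its statement."""
    renamed = []
    for index, statement in enumerate(statements, start=1):
        renamed.append(
            VARIABLE_RE.sub(lambda m: f"{m.group()}_{index}", statement)
        )
    return renamed
```

Applied to the two facts of the first example, rename_apart(["p(X,a), q(X,Y).", "q(X,b)."]) returns ["p(X_1,a), q(X_1,Y_1).", "q(X_2,b)."], which shows that the X of the second fact is a different variable from the X of the first.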