8 Extended Database Concepts

Lecture: Databases (Vorlesung Datenbanken)
Winter semester 2012/13
DOOD
• deductive and
• object–oriented databases
DOOD databases offer advanced features for
• data modelling and
• database programming
for complex data structures.
Prof. Dr. Dietmar Seipel
8.1 Deductive Databases and Logic Programming
The ease of handling the data structure of terms and the powerful built-in
control structure of backtracking are features that distinguish PROLOG from
other programming languages.
PROLOG is very well suited for embedded database programming.
In the database context, a restricted version called DATALOG is frequently
used; it is the basis of deductive databases.
• PROLOG and DATALOG are declarative languages; they can access
databases and XML documents.
• Relations and complex objects (e.g., XML documents) can be
represented as term structures.
• With the help of declarative rules, we can represent integrity constraints
and inference rules for deriving conclusions from given information.
8.1.1 PROLOG as a Database Language
1. PROLOG can be used for representing tables from relational databases.
The tuples of a table become PROLOG facts with the same predicate
symbol; usually, the table name is used.
2. The data dictionary of a relational database can also be represented
using PROLOG facts. This can be done using PROLOG terms that
correspond to an XML representation of the data dictionary.
3. Queries and integrity constraints can be represented as PROLOG rules.
Conjunctive queries are posed in the form of PROLOG goals, which are
then evaluated using the PROLOG rules.
4. DATALOG is a restricted version of PROLOG, which ensures termination
and the efficient evaluation of recursive queries.
5. The deductive database system DDBASE combines PROLOG and DATALOG.
Database Tables in MySQL:

EMPLOYEE
FNAME    | MINIT | LNAME   | SSN       | BDATE      | ADDRESS                  | SEX | SALARY | SUPERSSN  | DNO
John     | B     | Smith   | 444444444 | 1955-01-09 | 731 Fondren, Houston, TX | M   | 30000  | 222222222 | 5
Franklin | T     | Wong    | 222222222 | 1945-12-08 | 638 Voss, Houston, TX    | M   | 40000  | 111111111 | 5
Alicia   | J     | Zelaya  | 777777777 | 1958-07-19 | 3321 Castle, Spring, TX  | F   | 25000  | 333333333 | 4
Jennifer | S     | Wallace | 333333333 | 1931-06-20 | 291 Berry, Bellaire, TX  | F   | 43000  | 111111111 | 4
Ramesh   | K     | Narayan | 555555555 | 1952-09-15 | 975 Fire Oak, Humble, TX | M   | 38000  | 222222222 | 5
Joyce    | A     | English | 666666666 | 1962-07-31 | 5631 Rice, Houston, TX   | F   | 25000  | 222222222 | 5
Ahmad    | V     | Jabbar  | 888888888 | 1959-03-29 | 980 Dallas, Houston, TX  | M   | 25000  | 333333333 | 4
James    | E     | Borg    | 111111111 | 1927-11-10 | 450 Stone, Houston, TX   | M   | 55000  | NULL      | 1

A database table p can be represented by a set of PROLOG facts, namely one
fact p(t1, ..., tn) for each tuple (t1, ..., tn) of the table.
WORKS_ON
ESSN      | PNO | HOURS
111111111 | 20  | NULL
222222222 | 2   | 10.0
222222222 | 3   | 10.0
333333333 | 20  | 15.0
333333333 | 30  | 20.0
444444444 | 1   | 32.5
444444444 | 2   | 7.5
555555555 | 3   | 40.0
666666666 | 1   | 20.0
666666666 | 2   | 20.0
777777777 | 10  | 10.0
777777777 | 30  | 30.0
888888888 | 10  | 35.5
888888888 | 30  | 5.0

PROJECT
PNAME           | PNUMBER | PLOCATION | DNUM
ProductX        | 1       | Bellaire  | 5
ProductY        | 2       | Sugarland | 5
ProductZ        | 3       | Houston   | 5
Computerization | 10      | Stafford  | 4
Reorganization  | 20      | Houston   | 1
Newbenefits     | 30      | Stafford  | 4
Database Tables in PROLOG:
employee('John', 'B', 'Smith', 444444444,
   1955-01-09, '731 Fondren, Houston, TX',
   'M', 30000, 222222222, 5).
employee('Franklin', 'T', 'Wong', ...).
...
works_on(444444444, 1, 32.5).
works_on(444444444, 2, 7.5).
...
department('Research', 5, 222222222, 1978-05-22).
...
project('ProductX', 1, 'Bellaire', 5).
...
We do not quote the date values. Thus, they are terms, and we can access
their components conveniently without string parsing.
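The following minimal sketch illustrates why unquoted dates are convenient. It uses two rows from the EMPLOYEE table; the helper born_before/2 is hypothetical and not part of the lecture, and the atom '$null$' is assumed as the NULL representation.

```prolog
% Two sample facts from the EMPLOYEE table; the dates are unquoted terms.
employee('John', 'B', 'Smith', 444444444, 1955-01-09,
         '731 Fondren, Houston, TX', 'M', 30000, 222222222, 5).
employee('James', 'E', 'Borg', 111111111, 1927-11-10,
         '450 Stone, Houston, TX', 'M', 55000, '$null$', 1).

% born_before(+Year, -Lname): hypothetical helper.
% The date term is decomposed into year, month, and day by
% unification with the pattern Y-_M-_D; no string parsing is needed.
born_before(Year, Lname) :-
    employee(_, _, Lname, _, Y-_M-_D, _, _, _, _, _),
    Y < Year.
```

A goal such as ?- born_before(1950, Lname). then selects the employees born before 1950.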
Export from MySQL to XML
Using the PROLOG library DDK, we can also export a MySQL database or
table to XML:
?- mysql_database_to_xml(mysql, company, Xml),
   dwrite(xml, Xml).
<database name="company">
  <table name="employee"> ... </table>
  <table name="works_on">
    <row ESSN="111111111" PNO="20" HOURS="0.0"/>
    <row ESSN="222222222" PNO="2" HOURS="10.0"/> ...
  </table> ...
</database>
?- mysql_database_table_to_xml(
      mysql, company:employee, Xml).
Data Dictionary
Using the PROLOG library DDK, we can export the data dictionary of a
relational database from MySQL to an XML representation:
?- mysql_database_schema_to_xml(company, Xml),
   dwrite(xml, Xml).
This produces an XML element with one table sub-element for every table:
<database name="company">
  <table name="department"> ... </table>
  <table name="employee"> ... </table>
  <table name="dependent"> ... </table>
  <table name="dept_locations"> ... </table>
  <table name="project"> ... </table>
  <table name="works_on"> ... </table>
</database>
<table name="employee">
<attribute name="FNAME" type="varchar(15)" is_nullable="NO"/>
<attribute name="MINIT" type="char(1)" is_nullable="YES"/>
<attribute name="LNAME" type="varchar(15)" is_nullable="NO"/>
<attribute name="SSN" type="varchar(9)" is_nullable="NO"/>
<attribute name="BDATE" type="date" is_nullable="YES"/>
<attribute name="ADDRESS" type="varchar(30)" is_nullable="YES"/>
<attribute name="SEX" type="char(1)" is_nullable="YES"/>
<attribute name="SALARY" type="decimal(10,2)" is_nullable="YES"/>
<attribute name="SUPERSSN" type="varchar(9)" is_nullable="YES"/>
<attribute name="DNO" type="int(11)" is_nullable="NO"/>
<primary_key> <attribute name="SSN"/> </primary_key>
<foreign_key> <attribute name="SUPERSSN"/>
<references table="employee">
<attribute name="SSN"/> </references> </foreign_key>
<foreign_key> <attribute name="DNO"/>
<references table="department"> <attribute name="DNUMBER"/>
</references> </foreign_key>
</table>
Data Dictionary as a PROLOG Term
table:[name:employee]:[
   attribute:[name:'FNAME',
      type:'varchar(15)', is_nullable:'NO']:[],
   attribute:[name:'MINIT', ...]:[],
   attribute:[name:'LNAME', ...]:[],
   attribute:[name:'SSN', ...]:[],
   ...
   attribute:[name:'SUPERSSN', ...]:[],
   attribute:[name:'DNO', ...]:[],
   primary_key:[ attribute:[name:'SSN']:[] ],
   ...
   foreign_key:[ attribute:[name:'DNO']:[],
      references:[table:'department']:[
         attribute:[name:'DNUMBER']:[] ] ] ]
This PROLOG representation of XML can be queried and transformed using
the DDK library FNQuery.
A foreign key
foreign_key:[
   attribute:[name:A1]:[], ..., attribute:[name:An]:[],
   references:[table:T]:[
      attribute:[name:B1]:[], ..., attribute:[name:Bn]:[] ] ]
can be represented in short form as
[A1,...,An] -> T:[B1,...,Bn].
Then, the list of all foreign keys becomes a PROLOG term
foreign_keys:[fk1,...,fkm].
Similarly, the list of attributes and the primary key can be simplified to a
short form.
DDBASE stores a PROLOG fact schema(table:[...]:[...]) with
the simplified term representation for every database table.
Data Dictionary as PROLOG Facts (Short Form)
schema( table:[name:employee, database:company]:[
   attributes:['FNAME', 'MINIT', 'LNAME', 'SSN', 'BDATE',
      'ADDRESS', 'SEX', 'SALARY', 'SUPERSSN', 'DNO'],
   primary_key:['SSN'],
   foreign_keys:[ ['SUPERSSN']->employee:['SSN'],
      ['DNO']->department:['DNUMBER'] ] ] ).
schema( table:[name:works_on, database:company]:[
   attributes:['ESSN', 'PNO', 'HOURS'],
   primary_key:['ESSN', 'PNO'],
   foreign_keys:[ ['ESSN']->employee:['SSN'],
      ['PNO']->project:['PNUMBER'] ] ] ).
schema( table:[name:department, ...]:[...] ).
schema( table:[name:project, ...]:[...] ).
Views and Queries vs. Rules and Goals
• SQL VIEW:
CREATE VIEW QUERY_1 AS
   SELECT LNAME, PNAME, HOURS
   FROM   EMPLOYEE, WORKS_ON, PROJECT
   WHERE  EMPLOYEE.SSN = WORKS_ON.ESSN
   AND    PROJECT.PNUMBER = WORKS_ON.PNO
• PROLOG rule:
query_1(LNAME, PNAME, HOURS) :-
   employee(_,_, LNAME, SSN, _,_,_,_,_,_),
   project(PNAME, P, _,_),
   works_on(SSN, P, HOURS).
• SQL SELECT:
SELECT *
FROM   QUERY_1
The SELECT statement calls the view.
• PROLOG goal:
?- query_1(LNAME, PNAME, HOURS).
The query is submitted to the PROLOG interpreter as a goal.
The goal corresponds to the SELECT statement calling the view.
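As a small runnable illustration of the correspondence between views and rules, the following sketch loads a few facts from the tables above together with the rule for query_1. Only a subset of the rows is included to keep the example short.

```prolog
% A small excerpt of the COMPANY database as PROLOG facts.
employee('John', 'B', 'Smith', 444444444, 1955-01-09,
         '731 Fondren, Houston, TX', 'M', 30000, 222222222, 5).
project('ProductX', 1, 'Bellaire', 5).
project('ProductY', 2, 'Sugarland', 5).
works_on(444444444, 1, 32.5).
works_on(444444444, 2, 7.5).

% The rule plays the role of the SQL view QUERY_1:
% the shared variables SSN and P express the two join conditions.
query_1(LNAME, PNAME, HOURS) :-
    employee(_, _, LNAME, SSN, _, _, _, _, _, _),
    project(PNAME, P, _, _),
    works_on(SSN, P, HOURS).
```

The goal ?- query_1(LNAME, PNAME, HOURS). enumerates the result tuples one by one on backtracking; findall/3 can collect them into a list if a set-oriented result like that of SELECT is wanted.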
Recursive Queries: Transitive Closure (Version 1)
The following recursive rule set derives the transitive supervisor relation on
the social security numbers:
supervisor(SSN_1, SSN_2) :-
   direct_supervisor(SSN_1, SSN_2).
supervisor(SSN_1, SSN_2) :-
   direct_supervisor(SSN_1, SSN_3),
   supervisor(SSN_3, SSN_2).
[Figure: SSN_1 --direct_supervisor--> SSN_3 --supervisor--> SSN_2]
direct_supervisor(SSN_1, SSN_2) :-
   employee(_, _, _, SSN_2, _, _, _, _, SSN_1, _).
The following query assigns names to the social security numbers:
query_2(F1-M1-L1, F2-M2-L2) :-
   supervisor(SSN_1, SSN_2),
   employee(F1, M1, L1, SSN_1, _, _, _, _, _, _),
   employee(F2, M2, L2, SSN_2, _, _, _, _, _, _).
Transitive closure queries cannot be formulated in standard SQL systems.
Some relational database systems, however, offer limited forms of
recursion (cf. SQL-99).
CREATE RECURSIVE VIEW supervisor(Emp, Sup) AS
   SELECT Emp, Sup
   FROM   direct_supervisor
   UNION
   SELECT D.Emp, S.Sup
   FROM   direct_supervisor D, supervisor S
   WHERE  D.Sup = S.Emp
This assumes a table direct_supervisor with the attributes Emp
and Sup. This SQL implementation is structurally equivalent to
the following shorter rule implementation (";" means "or").
supervisor(Emp, Sup) :-
   ( direct_supervisor(Emp, Sup)
   ; direct_supervisor(Emp, X), supervisor(X, Sup) ).
Further Applications of Recursion
• computation of aggregate functions
• bill-of-materials resolution (parts explosion)
Meta-Predicates: Transitive Closure (Version 2)
Using the generic meta-predicate transitive_closure, the previous
two rules for supervisor can be replaced by a single, much more
compact and abstract rule:
supervisor(SSN_1, SSN_2) :-
   transitive_closure(
      direct_supervisor, SSN_1, SSN_2 ).
Aggregation Queries
The meta-predicate ddbase_aggregate/3 in the following query
groups over the employees:
• for every employee, given by FNAME, MINIT, LNAME, SSN,
• the corresponding list of all tuples [PNO,HOURS] is computed:
?- ddbase_aggregate( [F, M, L, S, list([P,H])],
      ( works_on(S, P, H),
        employee(F, M, L, S, _,_,_,_,_,_) ),
      Tuples ),
   Attributes =
      ['FNAME','MINIT','LNAME','SSN','[PNO,HOURS]'],
   xpce_display_table(Attributes, Tuples).
The result is displayed as a table in the XPCE extension of SWI-PROLOG.
Tuples = [
   ['Ahmad', 'V', 'Jabbar', '888888888', [[10, 35.5], [30, 5.0]]],
   ['Alicia', 'J', 'Zelaya', '777777777', [[10, 10.0], [30, 30.0]]],
   ... ]
Thus, DDBASE can produce nested (NF²) tables, which is not possible
in SQL.
Transitive Closure (Version 3)
We can also compute the list of subordinates for each employee in PROLOG:
?- findall( Boss-Emp,
      ( employee(_,_,_, Emp, _,_,_,_, Boss, _),
        Boss \= '$null$' ),
      Edges ),
   edges_to_ugraph(Edges, Graph),
   transitive_closure(Graph, Tc_Graph).
Tc_Graph = [
   '111111111'-['222222222', '333333333',
      '444444444', '555555555', '666666666',
      '777777777', '888888888'],
   '222222222'-['444444444', '555555555', '666666666'],
   '333333333'-['777777777', '888888888'],
   '444444444'-[], '555555555'-[], '666666666'-[],
   '777777777'-[], '888888888'-[] ].
Firstly, we compute a list Edges of pairs Boss-Emp of social security
numbers in DDBASE, such that Boss is the boss of Emp and Boss is not the
NULL value.
Secondly, we transform Edges to an adjacency representation Graph using
the predicate edges_to_ugraph/2 from SWI-PROLOG:
Graph = [
   '111111111'-['222222222', '333333333'],
   '222222222'-['444444444', '555555555', '666666666'],
   '333333333'-['777777777', '888888888'],
   '444444444'-[], '555555555'-[], '666666666'-[],
   '777777777'-[], '888888888'-[] ].
Thirdly, the predicate transitive_closure/2 from SWI-PROLOG
computes the transitive closure of Graph. It infers, e.g., that
'111111111' is the transitive supervisor of all the other employees.
In PROLOG, the edges of a graph G = (N, E), where
• nodes N = { a, ..., d } and
• edges E = { (a, b), (b, c), (c, a), (c, d) },
can be represented as a list
Edges = [ a-b, b-c, c-a, c-d ].
[Figure: graph G with the edges a -> b, b -> c, c -> a, c -> d]
In SWI-PROLOG, the call
edges_to_ugraph(Edges, Graph)
converts Edges to an adjacency list representation
Graph = [ a-[b], b-[c], c-[a,d], d-[] ].
For every node V, a tuple V-Vs is given, such that Vs consists of all
successor nodes of V.
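The conversion and the closure computation can be tried out directly with SWI-PROLOG's library(ugraphs). The sketch below is hedged: it uses vertices_edges_to_ugraph/3 and transitive_closure/2, both documented for that library, in place of edges_to_ugraph/2, whose availability may depend on the installation (for instance on the DDK being loaded). The wrapper closure_of_edges/3 is a hypothetical name.

```prolog
:- use_module(library(ugraphs)).

% closure_of_edges(+Edges, -Graph, -Closure): hypothetical wrapper.
% Graph is the adjacency list representation of Edges (no extra
% isolated vertices are supplied, hence the empty first argument),
% and Closure is its transitive closure in the same representation.
closure_of_edges(Edges, Graph, Closure) :-
    vertices_edges_to_ugraph([], Edges, Graph),
    transitive_closure(Graph, Closure).
```

For Edges = [a-b, b-c, c-a, c-d], the call closure_of_edges(Edges, Graph, Closure) yields the adjacency list Graph shown above. Because the computation works on a finite list rather than by resolution over facts, the cycle a, b, c does not cause non-termination.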
Termination Issues in PROLOG and DATALOG
• Version 1 can be evaluated both in PROLOG and in DATALOG.
The DATALOG evaluation always terminates, whereas the PROLOG
evaluation is only suitable for acyclic graphs; it may not terminate for
cyclic graphs.
• Versions 2 and 3 can only be evaluated in PROLOG.
The predicates transitive_closure/3 from the DDK and
transitive_closure/2 from SWI-PROLOG ensure termination for
arbitrary graphs.
Graph Representations
Versions 1 and 2 work on facts. Version 3 works on a list representation
of the graph edges.
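Regarding the termination issue of Version 1 on cyclic graphs, one further option, not mentioned in the lecture, is tabling. The sketch below assumes a SWI-PROLOG version that supports the table directive; the facts reuse the small cyclic example graph as a hypothetical direct_supervisor relation.

```prolog
% Tabling memoizes the answers for supervisor/2 and detects
% repeated calls, so the Version 1 rules terminate on cycles.
:- table supervisor/2.

direct_supervisor(a, b).
direct_supervisor(b, c).
direct_supervisor(c, a).   % this edge closes the cycle a, b, c
direct_supervisor(c, d).

supervisor(X, Y) :-
    direct_supervisor(X, Y).
supervisor(X, Y) :-
    direct_supervisor(X, Z),
    supervisor(Z, Y).
```

With the directive in place, ?- findall(Y, supervisor(a, Y), Ys). terminates; without it, the same goal would loop on the cycle. The order of the tabled answers is not specified, so the result should be sorted before it is compared.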
Basic Syntax of PROLOG
Constant Symbol: a, 10, 'Smith, John B.'
Variable Symbol: X, Lname (starts with a capital letter)
Term: f(t1, ..., tn), with function symbol f and terms ti
   a, X (constant and variable symbols are terms),
   f(g(a,b),X,10), a*(b+c) (complex terms),
   [LNAME, ..., DNO] (this is a list)
Predicate Symbol: employee, attributes, query_1, transitive_closure
Atom: p(t1, ..., tn), with predicate symbol p and terms ti.
Terms in Infix / Prefix Form
• The infix term 1955-01-09 representing a date has the prefix form
-(-(1955,01),09).
• The infix term a*(b+c) representing an arithmetic expression has the
prefix form *(a,+(b,c)).
[Figure: operator trees of the two terms. Left: root - with the children -(1955,01) and 09. Right: root * with the children a and +(b,c).]
Term Representation for XML
An XML element
<table name="employee">
  <attribute name="FNAME"/>
</table>
can be represented by a complex term in field notation (FN):
table:[name:employee]:[
   attribute:[name:'FNAME']:[] ].
This infix form uses the binary functor ":".
The sub-term name:employee could be equivalently represented in prefix
form as :(name, employee).
Lists are denoted as "[X1,...,Xn]", and "[]" is the empty list; above,
the list of sub-elements of the attribute element is empty.
Term Representation for Lists
In term notation, a non-empty list is represented as .(X, Xs), where
• X is the first element (head) and
• Xs represents the rest of the list (tail).
The list functor "." is binary, and the empty list is given by "[]".
[b] = .(b, [])
[a, b] = .(a, [b]) = .(a, .(b, []))
For communicating lists with the user, PROLOG uses the compact list
notation [X1,...,Xn], which is called syntactic sugar.
It helps the user to better comprehend the list.
When an infix operator ⊙ is used multiple times in a term a ⊙ b ⊙ c, then
there are rules in PROLOG that determine whether a and b or b and c are
joined first in the prefix form.
• The infix term 1955-01-09 representing a date has the prefix form
-(-(1955,01),09).
• The infix term T:As:Es representing an XML element has the prefix
form :(T,:(As,Es)).
[Figure: operator trees of the two terms. Left: root - with the children -(1955,01) and 09. Right: root : with the children T and :(As,Es).]
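The two groupings can be checked by unification with the prefix forms. The sketch below assumes SWI-PROLOG's default operator table, in which - is declared left-associative (type yfx) and : right-associative (type xfy); the predicate names date_parts/4 and element_parts/4 are hypothetical.

```prolog
% date_parts(+Date, -Y, -M, -D): hypothetical helper.
% A-B-C is read as (A-B)-C, since - is left-associative.
date_parts(Date, Y, M, D) :-
    Date = -(-(Y, M), D).

% element_parts(+Elem, -T, -As, -Es): hypothetical helper.
% T:As:Es is read as T:(As:Es), since : is right-associative.
element_parts(Elem, T, As, Es) :-
    Elem = :(T, :(As, Es)).
```

For example, ?- date_parts(1955-01-09, Y, M, D). binds Y, M, and D to the integers 1955, 1, and 9, since the leading zeros are not part of the numbers. The operator declarations themselves can be inspected with current_op/3.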
Thus, the term attribute:[name:'FNAME']:[] is equivalent
to :(attribute, :(.(:(name,'FNAME'), []), [])).
[Figure: operator tree of this term]
Facts, Rules, and Goals
Literal: atom A or negated atom not(A)
Fact: A, with atom A; e.g.,
   employee('John', 'B', 'Smith', ...)
Rule: A :- B1, ..., Bm
   with head A (an atom) and body B1, ..., Bm (literals Bi); example later
Goal: :- B1, ..., Bm
   with literals Bi
A set of facts for the same predicate symbol corresponds to a relation in
databases. Rules generalize views. Goals are used for expressing queries.
Argument Positions vs. Field Notation (FN)
• As in other programming languages, the arguments ti of an atom
p(t1, ..., tn) are passed by position in PROLOG. E.g., in
works_on(S, P, H),
the first position t1 = S is the social security number of an employee
who has worked on the project with the number t2 = P (second position)
for t3 = H hours (third position).
• In the database context, we could use a meta-interpreter for accessing
arguments in field notation, i.e., in a more abstract way by their
corresponding attribute name. Then, according to the database schema,
works_on('PNO':P, 'ESSN':S)
means that the employee with the social security number S has worked
on the project with the number P, independently of the order of the
arguments; and it is, e.g., not necessary to refer to the hours.
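A minimal sketch of such a meta-interpreter is given below. It is not the DDBASE implementation; the predicates attributes/2, fn_call/2, and bind_fields/3 are hypothetical names, and the attribute list stands in for the database schema.

```prolog
:- use_module(library(lists)).

works_on(444444444, 1, 32.5).
works_on(444444444, 2, 7.5).

% attributes(?Table, ?Names): hypothetical schema fact giving the
% attribute names of a table in positional order.
attributes(works_on, ['ESSN', 'PNO', 'HOURS']).

% fn_call(+Table, +Fields): calls the positional predicate Table,
% where Fields is a list of Name:Value pairs in any order.
% Positions that are not mentioned stay unbound variables.
fn_call(Table, Fields) :-
    attributes(Table, Names),
    length(Names, N),
    length(Args, N),
    bind_fields(Fields, Names, Args),
    Goal =.. [Table|Args],
    call(Goal).

% bind_fields(+Fields, +Names, ?Args): unifies each Value with the
% argument at the position of its attribute Name.
bind_fields([], _, _).
bind_fields([Name:Value|Fields], Names, Args) :-
    nth1(I, Names, Name),
    nth1(I, Args, Value),
    bind_fields(Fields, Names, Args).
```

With these definitions, ?- fn_call(works_on, ['PNO':P, 'ESSN':444444444]). enumerates the project numbers for that employee without mentioning the hours. The list-based call syntax is a simplification of the notation works_on('PNO':P, 'ESSN':S) used above.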
Integrity Constraints in PROLOG
• Primary Key Constraint for Employee:
primary_key_violation(employee, X, Y) :-
   X = employee(_,_,_, SSN, _,_,_,_,_,_),
   Y = employee(_,_,_, SSN, _,_,_,_,_,_),
   call(X), call(Y), X \= Y.
• Foreign Key Constraint for Employee:
foreign_key_violation(
      employee('DNO'), department('DNUMBER'), X) :-
   X = employee(_,_,_,_,_,_,_,_,_, DNO),
   call(X),
   not(department(_, DNO, _,_)).
In DDBASE, the primary and foreign key constraints of a relational database
are transformed to such rules, which are then tested on database updates.
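The primary key rule can be tried on a small fact base. In the sketch below, the second employee fact is a made-up row that deliberately reuses an existing SSN; it is not part of the COMPANY database, and '$null$' is assumed as the NULL representation.

```prolog
employee('John', 'B', 'Smith', 444444444, 1955-01-09,
         '731 Fondren, Houston, TX', 'M', 30000, 222222222, 5).
% Hypothetical row with a duplicate SSN, added to trigger a violation.
employee('Johnny', 'B', 'Smith', 444444444, 1960-02-02,
         '1 Main, Houston, TX', 'M', 20000, 222222222, 5).
employee('James', 'E', 'Borg', 111111111, 1927-11-10,
         '450 Stone, Houston, TX', 'M', 55000, '$null$', 1).

% Both templates share the variable SSN, so call/1 only pairs up
% facts with the same key; X \= Y excludes pairing a fact with itself.
primary_key_violation(employee, X, Y) :-
    X = employee(_,_,_, SSN, _,_,_,_,_,_),
    Y = employee(_,_,_, SSN, _,_,_,_,_,_),
    call(X), call(Y), X \= Y.
```

Each violating pair is reported twice, once in each order. If that is undesirable, the test X \= Y could be replaced by the standard order comparison X @< Y.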
In a less elegant, naive implementation, we have to assign variable symbols
for all the argument positions of the two violating employee facts:
• Primary Key Constraint for Employee:
primary_key_violation(employee, X, Y) :-
   employee(A,B,C, SSN, D,E,F,G,H,I),
   employee(J,K,L, SSN, M,N,O,P,Q,R),
   X = employee(A,B,C, SSN, D,E,F,G,H,I),
   Y = employee(J,K,L, SSN, M,N,O,P,Q,R),
   X \= Y.
Moreover, we have to repeat all these variable symbols when we define the
return arguments X and Y of the call
primary_key_violation(employee, X, Y).
The many variable symbols and their repetition make the rule more
error-prone.
In the shorter, first primary key rule above, we use the templates
X = employee(_,_,_, SSN, _,_,_,_,_,_),
Y = employee(_,_,_, SSN, _,_,_,_,_,_),
to avoid the naming and the repeated writing of all the arguments.
The conjunction
call(X), call(Y), X \= Y.
calls the templates in the PROLOG database, tries to assign values to all
argument positions (even the ones with anonymous variables "_"), and tests
whether X and Y represent two different database tuples.
If the primary key constraint is violated, then the instantiated templates are
returned.
We proceed analogously for the foreign key constraint.
As a general-purpose programming language, PROLOG offers rich
functionality for defining integrity constraints.
Semantic Constraints in Field Notation (FN)
• No employee should earn more than his manager:
trigger(salary, X, Y) :-
   employee('SSN':X, 'SALARY':S1, 'SUPERSSN':Y),
   employee('SSN':Y, 'SALARY':S2),
   S1 > S2.
• Which employee works on a foreign project?
trigger(employee_works_on_foreign_project, E, P) :-
   works_on('ESSN':E, 'PNO':P),
   employee('SSN':E, 'DNO':D1),
   project('PNUMBER':P, 'DNUM':D2),
   D1 \= D2.
FN abstracts from argument positions: employee('SSN':E, 'DNO':D1)
corresponds to employee(_,_,_, E, _,_,_,_,_, D1).
Bottom-Up Evaluation of DATALOG
• The set of all given facts for a predicate corresponds to a relation.
• A rule without function symbols corresponds to a VIEW statement
defining a relation for the head predicate.
• The relations for the body predicates are themselves derived using rules.
Thus, it can happen that a rule transitively helps to derive tuples for one
of its body predicates (recursion).
E.g., the second rule for supervisor is directly recursive.
• The bottom-up evaluation iteratively enlarges the relations for the
predicates by repeatedly evaluating all rules until a fixpoint is reached.
Thus, e.g., all transitive supervisors can be derived, which is provably
not possible using standard SQL.
Example (Recursion and Transitive Closure)
[Figure: supervisor hierarchy. 111111111 supervises 222222222 and 333333333; 222222222 supervises 444444444, 555555555, and 666666666; 333333333 supervises 777777777 and 888888888.]
The following recursive rule set derives the transitive supervisor relation on
the social security numbers:
supervisor(SSN_1, SSN_2) :-
   direct_supervisor(SSN_1, SSN_2).
supervisor(SSN_1, SSN_2) :-
   direct_supervisor(SSN_1, SSN_3),
   supervisor(SSN_3, SSN_2).
direct_supervisor(SSN_1, SSN_2) :-
   employee(_,_,_, SSN_2, _,_,_,_, SSN_1, _).
The first iteration derives the facts for direct_supervisor from the
facts for employee:
direct_supervisor(111111111, 222222222).
direct_supervisor(111111111, 333333333).
direct_supervisor(222222222, 444444444).
direct_supervisor(222222222, 555555555).
direct_supervisor(222222222, 666666666).
direct_supervisor(333333333, 777777777).
direct_supervisor(333333333, 888888888).
The second iteration translates these facts to the corresponding 7 facts for
supervisor.
supervisor(111111111, 222222222).
...
supervisor(333333333, 888888888).
The third iteration derives the 5 new facts that 111111111 is the transitive
(indirect) supervisor of the employees 444444444 to 888888888:
supervisor(111111111, 444444444).
supervisor(111111111, 555555555).
supervisor(111111111, 666666666).
supervisor(111111111, 777777777).
supervisor(111111111, 888888888).
Since the hierarchy is of limited depth 2 here, the relations corresponding to
these facts could also be derived in SQL.
For arbitrary hierarchies of unlimited depth, however, it is not possible to
derive the transitive supervisors in SQL.
In principle, all rules can be used in all iterations. However, a rule can only
fire and derive facts once facts for its body atoms have been derived in previous
iterations. From then on, the rule can be used again and again to derive the same facts.
One of the purposes of efficient bottom-up evaluation is to avoid these
redundant derivations, especially in the presence of recursion.
The rule for query_2 fires in iteration 3 for the first time and derives 7 facts
for direct supervisors:
query_2('James'-'E'-'Borg', 'Franklin'-'T'-'Wong').
...
query_2('Jennifer'-'S'-'Wallace', 'Ahmad'-'V'-'Jabbar').
Finally, in iteration 4, the 5 facts for transitive supervisors are derived.
Iteration 5 does not derive any new facts.
Thus, a fixpoint is reached, and the iteration terminates.
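The fixpoint iteration can be sketched in a few lines of PROLOG. This is a naive evaluation for the two supervisor rules only, not the DDBASE evaluator; the predicate names step/3, fixpoint/3, and supervisor_closure/2 are hypothetical, and relations are represented as lists of pairs X-Y.

```prolog
:- use_module(library(lists)).

% step(+Edges, +Known, -Next): one bottom-up iteration.
% Rule 1 contributes all Edges; rule 2 joins Edges with the
% supervisor pairs Known that were derived so far.
step(Edges, Known, Next) :-
    findall(X-Z,
            ( member(X-Y, Edges), member(Y-Z, Known) ),
            Derived),
    append([Edges, Known, Derived], All),
    sort(All, Next).              % sort/2 also removes duplicates

% fixpoint(+Edges, +Known, -Closure): iterate until nothing new
% is derived. Known is always sorted, so == detects the fixpoint.
fixpoint(Edges, Known, Closure) :-
    step(Edges, Known, Next),
    (   Next == Known
    ->  Closure = Known
    ;   fixpoint(Edges, Next, Closure)
    ).

% supervisor_closure(+Edges, -Closure): start from the empty relation.
supervisor_closure(Edges, Closure) :-
    fixpoint(Edges, [], Closure).
```

For the hierarchy above, with the social security numbers abbreviated to 1 to 8, the goal ?- supervisor_closure([1-2, 1-3, 2-4, 2-5, 2-6, 3-7, 3-8], C). derives the 7 direct and 5 transitive pairs. The sketch recomputes all earlier derivations in every iteration, which is exactly the redundancy that efficient bottom-up evaluation tries to avoid. Since the set of possible pairs is finite, the iteration also terminates on cyclic graphs.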
Comparison with SQL
• Non-recursive DATALOG could be simulated in SQL by mapping the
rules to VIEW statements, or to INSERT statements whose result is
computed using a SELECT statement.
• Recursion brings higher expressivity to DATALOG.
• There are DATALOG extensions which allow for default negation and
aggregate operations as well.
• The rule-based approach of DATALOG supports modularization:
instead of one single, complex VIEW or SELECT statement in SQL, a set
of simpler and more compact DATALOG rules can be used.
The deductive database system DDBASE also supports update operations
such as INSERT and DELETE, and it can connect to relational databases.
8.1.2 The Deductive Database System DDBASE
The deductive database system DDBASE, which is part of the DDK, can
process
• relational databases and
• XML documents
within the same query using ODBC and FNQuery, respectively:
[Figure: DDBASE accesses a relational database (RDB) via ODBC and XML documents via FNQuery]
This extends database programming languages (DBPL) by XML capabilities.
ODBC
The following PROLOG rule accesses a relational database, given by the
connection handle mysql, using the ODBC library of SWI-PROLOG.
generate_html_table(Salary, table:Rows) :-
   concat('SELECT fname, minit, lname, salary \
      FROM employee WHERE salary >= ', Salary, Query),
   Types = [types([atom,atom,atom,integer])],
   findall( Row,
      ( odbc_query(mysql, Query, row(F,M,L,S), Types),
        Row = tr:[td:[F], td:[M], td:[L], td:[S]] ),
      Rows ).
The query string Query is obtained by concatenating a partial SELECT
statement with the value for the salary. Types gives the types of the
components of the result tuples.
The findall Statement
• The call odbc_query(mysql, Query, row(F,M,L,S),
Types) returns the values F,M,L,S for the attributes fname,
minit, lname, salary of the table employee.
• By backtracking, the findall statement produces a list Rows of
PROLOG terms Row of the form tr:[td:[F], td:[M],
td:[L], td:[S]], which represent XML elements in FNQuery.
• For a given Salary, the call generate_html_table(Salary,
table:Rows) produces a PROLOG term table:Rows, which
represents the following HTML table in FNQuery.
The generated HTML table
<table>
<tr><th>Fname</th><th>Minit</th><th>Lname</th><th>Salary</th></tr>
<tr><td>John</td><td>B</td><td>Smith</td><td>30000</td></tr>
<tr><td>Franklin</td><td>T</td><td>Wong</td><td>40000</td></tr>
<tr><td>Jennifer</td><td>S</td><td>Wallace</td><td>43000</td></tr>
<tr><td>Ramesh</td><td>K</td><td>Narayan</td><td>38000</td></tr>
<tr><td>James</td><td>E</td><td>Borg</td><td>55000</td></tr>
</table>
can be rendered in a web browser.
Via ODBC, we can make SQL tables available in DDBASE:
employee(A,B,C,D,E,F,G,H,I,J) :-
   Goal = company:employee(A,B,C,D,E,F,G,H,I,J),
   ddbase_call(odbc(mysql), Goal).
works_on(A,B,C) :-
   Goal = company:works_on(A,B,C),
   ddbase_call(odbc(mysql), Goal).
It is also possible to generate these rules in DDBASE, which avoids the
error-prone, repeated use of so many variable symbols.
The call ddbase_connect(odbc(mysql), M, Database:Table)
asserts a corresponding rule in a PROLOG module M.
The following two aggregation statements refer to the predicate
employee/10 provided by ODBC. The facts for works_on/3 are derived
using FNQuery from an XML document works_on.xml.
Aggregation on RDB and XML
For every Ssn in the table employee, the following query groups all
corresponding entries from the document works_on.xml:
?- ddbase_aggregate( [Ssn, list([Pno, Hours])],
      ( employee(_,_,_, Ssn, _,_,_,_,_,_),
        Row := doc('works_on.xml')/row::[@'ESSN'=Ssn],
        Pno := Row@'PNO', Hours := Row@'HOURS' ),
      Tuples ).
Tuples = [
   ['111111111', [['20', '0.0']]],
   ['222222222', [['2', '10.0'], ['3', '10.0']]], ... ]
The resulting list Tuples represents an NF² relation.
A query optimizer could rearrange the Goal in ddbase_aggregate/3 by
changing the order of the calls to the predicate employee/10 and the XML
document works_on.xml.
In DDBASE, we can define arbitrary binary aggregation predicates.
ddbase_aggregate/3 groups over all variable symbols that occur
standalone in the result template [Ssn, list([Pno, Hours])];
in this case, this is Ssn.
• For every Ssn, the above call to ddbase_aggregate/3 computes
the list Xs of all corresponding pairs [Pno, Hours].
• Then, the call list(Xs, Pairs), which will be explained shortly,
simply passes Xs to Pairs.
• Thus, ddbase_aggregate/3 produces a nested tuple [Ssn,
Pairs] for every Ssn. Pairs is a list of lists; it represents a relation.
The resulting list Tuples is the output.
The following statement aggregates the working hours of the employees of
the departments:
?- ddbase_aggregate( [Dno, sum(Hours)],
      ( employee(_,_,_, Ssn, _,_,_,_,_, Dno),
        Row := doc('works_on.xml')/row::[@'ESSN'=Ssn],
        H := Row@'HOURS', atom_number(H, Hours) ),
      Tuples ).
Tuples = [[1, 0.0], [4, 115.5], [5, 140.0]]
The attribute value H of the attribute 'HOURS' of Row is an atom that has to
be converted to a number Hours.
The template [Dno, sum(Hours)] leads to a grouping on the department
numbers. For every Dno, first the list Xs of all corresponding Hours is
computed, and then the sum is computed by the call sum(Xs, Sum); thus,
we obtain a standard result tuple [Dno, Sum].
To explain the effect of the template [Dno, sum(Hours)], we
abstract the second argument of the call above as follows:
dno_hours(Dno, Hours) :-
   employee(_,_,_, Ssn, _,_,_,_,_, Dno),
   Row := doc('works_on.xml')/row::[@'ESSN'=Ssn],
   H := Row@'HOURS', atom_number(H, Hours).
The intermediate variable symbols Ssn, Row, and H do not become
arguments of dno_hours/2, since they are not used in the template.
Then, the following call has the same result as the call above:
?- ddbase_aggregate( [Dno, sum(Hours)],
      dno_hours(Dno, Hours),
      Tuples ).
E.g., for Dno=4, first the list Xs of all working hours of employees from
department 4 is computed by dno_hours(4, Hours) in the following
functional set notation, and then the sum Sum is computed:
?- Xs <= { Hours | dno_hours(4, Hours) },
   Sum <= sum(Xs).
Xs = [15.0, 20.0, 10.0, 30.0, 35.5, 5.0],
Sum = 115.5.
These functional notations, which are possible in the DDK, can even be
nested to get rid of the intermediate variable symbol Xs:
?- Sum <= sum({ Hours | dno_hours(4, Hours) }).
Sum = 115.5.
The functional notation Sum <= sum(Xs) is equivalent to the relational
notation sum(Xs, Sum), which includes the return value as the last
argument. Thus, sum should be defined as a binary predicate in PROLOG.
Aggregation Predicates
In DDBASE, arbitrary user-defined aggregation predicates can be used.
The predicate list/2 simply passes the input to the output:
list(Xs, Xs).
The predicate sum/2 calls sum/3 with an accumulator, which is initialized to 0.
sum/3 traverses the input list recursively. The list head X is added to the
accumulator Acc, and then sum/3 is called recursively on the list tail Xs and
the new accumulator Acc_2; if the list is empty, then Acc becomes the output:
sum(Xs, Sum) :-
   sum(Xs, 0, Sum).
sum([X|Xs], Acc, Sum) :-
   Acc_2 is Acc + X, sum(Xs, Acc_2, Sum).
sum([], Acc, Acc).
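Following the same accumulator pattern, a further binary aggregation predicate could look as follows. The predicate avg/2 is a hypothetical example and not part of DDBASE; it is written so that it could be used in a template such as [Dno, avg(Hours)], assuming that ddbase_aggregate/3 accepts any binary predicate in that position.

```prolog
% avg(+Xs, -Avg): hypothetical aggregation predicate.
% It fails for the empty list, since no average is defined then.
avg([X|Xs], Avg) :-
    avg([X|Xs], 0, 0, Avg).

% avg(+Xs, +Sum, +N, -Avg): Sum and N accumulate the sum and the
% number of the elements seen so far.
avg([X|Xs], Sum, N, Avg) :-
    Sum_2 is Sum + X,
    N_2 is N + 1,
    avg(Xs, Sum_2, N_2, Avg).
avg([], Sum, N, Avg) :-
    N > 0,
    Avg is Sum / N.
```

Applied to the list of working hours of department 4 shown above, avg/2 divides the sum 115.5 by the 6 list elements.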
Lightweight Fact Database
A relational database can also be imported into a lightweight fact
representation in PROLOG.
The following sequence of statements loads the data dictionary from the
MySQL database company into a module c. Subsequently, the corresponding
relations are imported from the MySQL database, and a summary is shown.
?- ddbase_load(odbc(mysql), company, c),
   ddbase_load_tables(c), ddbase_show_tables(c).
We can describe the schema of a database table based on the data dictionary
of MySQL:
?- ddbase_describe_table(company:works_on).
<table name="works_on">
<attribute name="ESSN" type="char(9)" is_nullable="NO"/>
<attribute name="PNO" type="int(11)" is_nullable="NO"/>
<attribute name="HOURS" type="decimal(3,1)" is_nullable="NO"/>
<primary_key>
<attribute name="ESSN"/> <attribute name="PNO"/>
</primary_key>
<foreign_key> <attribute name="ESSN"/>
<references table="employee"> <attribute name="SSN"/>
</references>
</foreign_key>
<foreign_key> <attribute name="PNO"/>
<references table="project"> <attribute name="PNUMBER"/>
</references>
</foreign_key>
</table>
true.
We can display a complete database or single relations.
?- ddbase_facts_to_display(c).
?- ddbase_facts_to_display(c:works_on/3).
Of course, the PROLOG representation of the relational database can be
queried in the standard way in PROLOG.
Moreover, we can also execute update statements, which respect the integrity
constraints of the relational database. After an insertion or deletion in the
database, the primary and foreign key constraints from the data dictionary are
checked.
?- ddbase_insert(c, works_on('666666666', 10, 3)),
   ddbase_insert(c, works_on('666666666', 10, 4)),
   ddbase_delete(c, works_on('666666666', 10, 3)).
The second update is rejected, since it violates a primary key constraint.
All tuples from all relations of a database can be deleted in one step.
The data dictionary remains unchanged.
?- ddbase_drop_database(c).
Complex Computations with FNQuery
element_to_subtree(Xml, Course_1, Course_2) :-
   [T] := Course_1/'Title'/content::'*',
   ( Ps := Course_1@'Prerequisites' ->
     let( Trees := Xml/'Course'::[
             @'CourseNr' = N, name_contains_name(Ps, N) ]
          /call::element_to_subtree(Xml) )
   ; Trees = [] ),
   Course_2 = 'Course':['Title':T]:Trees.
?- dread(xml, 'Uni.xml', [Xml]),
   let( Trees := Xml/descendant::'Course'
           /call::element_to_subtree(Xml) ),
   dwrite(xml, 'CourseHierarchy':Trees).
In FNQuery, the attribute value of an element C is accessed by C@A,
whereas in XPath, it is accessed by C/@A.
The call element_to_subtree(Xml, Course_1, Course_2)
takes an XML document and a course element Course_1 and produces
another course element Course_2:
• Firstly, T becomes the content of the Title element of the Course
element Course_1.
• If Course_1 has prerequisites, then we determine a list Trees of
XML terms using let.
For every course in the document, we check whether the course number
N is contained in the list Ps of prerequisites of Course_1.
In that case, we call the predicate element_to_subtree/3
recursively on that course to produce an element of Trees.
The global XML document is also a parameter of the call.
• If Course_1 has no prerequisites, then Trees becomes the empty list,
Trees = [].
The main call reads the XML document Uni.xml into a PROLOG variable
Xml using dread/3.
Subsequently, let/1 calls element_to_subtree/3 on every
descendant Course element.
The resulting list Trees represents a list of XML elements, which are then
packed into a CourseHierarchy element.
The corresponding PROLOG term 'CourseHierarchy':Trees
represents the XML output of the whole computation, and it can be written to
standard output (the screen) using dwrite/2.
8.1.3
P ROLOG as a Programming Language
In the following, we will present Prolog implementations of well–known
algorithms for searching in graphs and binary search trees.
The benefits of Prolog are
• the elegant handling of data structures (lists, trees, XML),
• (implicit) backtracking, and
• the compact representation of case distinctions in different rules.
The algorithms are typically recursive. Recursion can be formulated nicely
due to the compact list access.
Meta–predicates also support a compact and elegant encoding.
Graph Search
[Figure: labyrinth]
Computation of Simple Paths by Backtracking
The predicate graph_search/2 computes a simple path from a given
node to a sink of a graph:
% graph_search(+Node, ?Path) <-
graph_search(X, Path) :-
   graph_search(X, [X], Path).
It calls another predicate, graph_search/3, which has the same predicate
symbol but a different arity.
The graph is given by facts for the predicates graph_arc/2 and
graph_sink/1.
Notation for arguments in the comment line:
+: bound, -: free, ?: either bound or free
[Figure: the nodes in Visited lead to X; an edge connects X to Y; Path = [Y1 = Y, ..., Yn = Z] leads from Y to the sink Z]
• A call graph_search(X, Visited, Nodes) with a bound
argument X, which is not a sink, and a list Visited of already visited
nodes
– uses an edge from X to a not yet visited successor node Y, and then
– calculates a path Path from Y to a sink Z, which visits neither Y again
nor any node in Visited.
If no path from Y to a sink can be found, then another successor node of
X must be used (backtracking).
• The result Nodes = [X|Path] is a simple path from X to a sink of
the graph.
The predicate graph_search/3 is recursive, because of its second rule:
% graph_search(+Node, +Visited, ?Path) <-
graph_search(X, _, [X]) :-
   graph_sink(X).
graph_search(X, Visited, [X|Path]) :-
   graph_edge(X, Y),
   not(member(Y, Visited)),
   write(user, '->'), write(user, Y),
   graph_search(Y, [Y|Visited], Path).
Termination is ensured by the fact that already visited nodes cannot be visited
again.
• The initial call graph_search(X, [X], Path) calculates a
simple path from X to a sink of the graph.
– If X is a sink, then the first rule for graph_search/3 returns
the one–element path [X].
– Otherwise, the recursive, second rule chooses a successor node Y using
graph_edge(X, Y), and then it continues searching from there.
• Further paths can be searched for by backtracking.
– Alternative successor nodes Y can be used in the second rule.
– In the implementation above, we can continue searching beyond a
sink by using the second rule instead of the first one.
Implicit and Explicit Backtracking
In Prolog, backtracking is used automatically (implicitly).
In a procedural language, backtracking has to be implemented explicitly.
In a direct translation of the code above to a procedural environment, a call
graph_edge(X, Y) can only produce a single successor node Y of X –
if there is no path from Y to a sink, then the computation fails.
Moreover, at most one solution could be computed.
If we implement the graph search procedurally using explicit backtracking,
then we get more code than in Prolog.
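What explicit backtracking involves can be sketched even inside Prolog, by keeping the open alternatives in a list (an agenda) instead of relying on the built-in mechanism. This is only an illustration, assuming SWI-Prolog: the predicates explicit_search/2 and search_agenda/2 and the small example graph (edge/2, sink/1) are hypothetical and not part of the lecture code.

```prolog
% Sketch: depth-first search with an explicit agenda of partial paths.
% Each agenda entry is a reversed path [Node|Visited].

edge(s, a).  edge(s, b).
edge(a, d).                      % d is a dead end
edge(b, t).
sink(t).

% explicit_search(+Start, -Path): first simple path from Start to a sink.
explicit_search(Start, Path) :-
    search_agenda([[Start]], Reversed),
    reverse(Reversed, Path).

% Case 1: the first partial path already ends in a sink.
search_agenda([[Node|Visited]|_], [Node|Visited]) :-
    sink(Node), !.
% Case 2: replace the first partial path by all of its extensions.
% If it has none, it is simply dropped; this is the backtracking step.
search_agenda([[Node|Visited]|Rest], Path) :-
    findall([Next, Node|Visited],
            ( edge(Node, Next),
              \+ memberchk(Next, [Node|Visited]) ),
            Extensions),
    append(Extensions, Rest, Agenda),
    search_agenda(Agenda, Path).
```

A query such as ?- explicit_search(s, Path). first follows the dead end via a and d, drops that partial path from the agenda, and then continues with b. The bookkeeping that the agenda makes visible is what the implicit backtracking of Prolog does automatically.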
Representation of a Graph by Prolog Facts
[Figure: labyrinth with the cells a to i]
graph_arc(i, f).
graph_arc(i, h).
graph_arc(h, g).
graph_arc(g, d).
graph_arc(d, e).
graph_arc(d, a).
graph_arc(a, b).
graph_arc(b, c).
graph_sink(c).
The following rule symmetrizes the predicate graph_arc/2:
graph_edge(X, Y) :-
   ( graph_arc(X, Y)
   ; graph_arc(Y, X) ).
Thus, it is not necessary to list the inverse edges explicitly:
graph_edge(i, f).
graph_edge(f, i).
graph_edge(i, h).
graph_edge(h, i).
...
Computation
• The predicate graph_search/2 uses depth–first search, and it
calculates simple paths (without duplicate nodes).
• With the call graph_search(+Node, -Path), we can calculate
all simple paths from Node to a sink (graph_sink) by backtracking:
?- graph_search(i, Path).
->f->h->g->d->e->a->b->c
Path = [i, h, g, d, a, b, c]
?- graph_search(e, Path).
->d->a->b->c
Path = [e, d, a, b, c] ;
->g->h->i->f
No
• If we add another edge graph_arc(e, b) to the graph (i.e., we tear
down the wall between e and b), then there appears another simple path
[e, b, c] from e to the sink c.
• All results can be calculated by backtracking and findall/3:
graph_arc(e, b).
?- findall( Path,
graph_search(e, Path),
Paths ).
Paths = [[e, b, c], [e, d, a, b, c]]
Yes
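The pieces of the graph search can be assembled into one self-contained program. The following sketch assumes SWI-Prolog. It omits the write/2 trace, uses the standard \+ instead of not/1, and adds a hypothetical helper all_paths/2 that sorts the result, so that it does not depend on the order in which backtracking finds the paths.

```prolog
% Sketch: the labyrinth and the graph search as one program.
:- dynamic graph_arc/2.          % allows adding edges at run time

graph_arc(i, f).  graph_arc(i, h).  graph_arc(h, g).  graph_arc(g, d).
graph_arc(d, e).  graph_arc(d, a).  graph_arc(a, b).  graph_arc(b, c).
graph_sink(c).

% Edges can be used in both directions.
graph_edge(X, Y) :- ( graph_arc(X, Y) ; graph_arc(Y, X) ).

% graph_search(+Node, ?Path)
graph_search(X, Path) :- graph_search(X, [X], Path).

% graph_search(+Node, +Visited, ?Path)
graph_search(X, _, [X]) :- graph_sink(X).
graph_search(X, Visited, [X|Path]) :-
    graph_edge(X, Y),
    \+ member(Y, Visited),
    graph_search(Y, [Y|Visited], Path).

% all_paths(+Node, -Paths): all simple paths from Node to a sink, sorted.
all_paths(Node, Paths) :-
    findall(Path, graph_search(Node, Path), Paths0),
    msort(Paths0, Paths).
```

With the original labyrinth, ?- all_paths(e, Paths). yields the single path [e, d, a, b, c]. After ?- assertz(graph_arc(e, b)). the same query also contains [e, b, c].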
The Meta–Predicate findall/3
Finding all solutions of a goal:
findall( X,
   goal(X),
   Xs )
The DDK allows for the following equivalent set notation:
Xs <= { X | goal(X) }
Further important meta–predicates are checklist/2 and maplist/3 for
lists, as well as the predicates for loops (control structures) from the library
loops.pl (e.g., foreach-do).
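How such meta-predicates combine can be sketched with built-ins of SWI-Prolog (findall/3, maplist/3, forall/2). The facts goal/1 and the predicates double/2, doubled_solutions/1, and all_positive/1 are hypothetical names for this illustration; the DDK predicates checklist/2 and the loops from loops.pl are not used here.

```prolog
% Sketch: standard meta-predicates take goals or predicate names as arguments.
goal(1).  goal(2).  goal(3).

double(X, Y) :- Y is 2 * X.

% doubled_solutions(-Ys): collect all X with goal(X), then map double/2
% over the collected list.
doubled_solutions(Ys) :-
    findall(X, goal(X), Xs),
    maplist(double, Xs, Ys).

% all_positive(+Xs): succeeds if every element of Xs is greater than zero.
all_positive(Xs) :-
    forall(member(X, Xs), X > 0).
```

For example, ?- doubled_solutions(Ys). binds Ys to [2, 4, 6].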
Binary Search Trees
% search_in_tree(+Key, +Tree) <-
search_in_tree(Key, Tree) :-
   parse_tree(Tree, Root, Lson, Rson),
   ( Key = Root
   ; Key < Root ->
     search_in_tree(Key, Lson)
   ; Key > Root ->
     search_in_tree(Key, Rson) ).
arguments: +: bound, -: free, ?: either bound or free
Search Tree
XML Representation:
<node key="5">
  <node key="4"/>
  <node key="9">
    <node key="6"/>
    <node key="10"/>
  </node>
</node>
Prolog Representation:
node:[key:5]:[
   node:[key:4]:[],
   node:[key:9]:[
      node:[key:6]:[],
      node:[key:10]:[] ] ]
Alternative Prolog representation:
[5, [4], [9, [6], [10]]]
[Figure: binary search tree with root 5 and children 4 and 9; node 9 has the children 6 and 10]
Encapsulation of the Tree Access
% parse_tree(+Tree, ?Key, ?Lson, ?Rson) <-
% parse_tree(?Tree, +Key, +Lson, +Rson) <-
parse_tree(Tree, Key, Empty, Empty) :-
   Tree = node:[key:Key]:[],
   Empty = node:[]:[].
parse_tree(Tree, Key, Lson, Rson) :-
   Tree = node:[key:Key]:[Lson, Rson].
% binary_tree_empty(?Tree) <-
binary_tree_empty(node:[]:[]).
The same code for parse_tree/4 can be called both for extracting the
root key and the two subtrees of a binary tree (+,-,-,-) and for
constructing a binary tree from a key and two subtrees (-,+,+,+).
Examples:
?- Tree = node:[key:5]:[
node:[key:4]:[],
node:[key:9]:[node:[key:6]:[], ...] ],
parse_tree(Tree, Key, Lson, Rson).
Key = 5,
Lson = node:[key:4]:[],
Rson = node:[key:9]:[node:[key:6]:[], ...]
?- Key = 9,
Lson = node:[key:6]:[],
Rson = node:[key:10]:[],
parse_tree(Tree, Key, Lson, Rson).
Tree = node:[key:9]:[node:[key:6]:[], node:[key:10]:[]]
Alternative Encapsulation of the Tree Access
parse_tree([Root, Lson, Rson], Root, Lson, Rson).
parse_tree([Root], Root, [], []).
binary_tree_empty([]).
Example:
?- Tree = [5, [4], [9, [6], [10]]],
parse_tree(Tree, Root, Lson, Rson).
Root = 5, Lson = [4], Rson = [9, [6], [10]]
% insert_into_tree(+Key, +Tree, ?New_Tree) <-
insert_into_tree(Key, Tree, New_Tree) :-
   parse_tree(Tree, Root, Lson, Rson),
   ( Key = Root ->
     New_Tree = Tree
   ; Key < Root ->
     insert_into_tree(Key, Lson, L),
     parse_tree(New_Tree, Root, L, Rson)
   ; Key > Root ->
     insert_into_tree(Key, Rson, R),
     parse_tree(New_Tree, Root, Lson, R) ).
insert_into_tree(Key, _, New_Tree) :-
   binary_tree_empty(Empty),
   parse_tree(New_Tree, Key, Empty, Empty).
If the tree is empty, then parse_tree(Tree, Root, Lson, Rson)
fails. Then the second rule builds a new tree with two empty subtrees using
parse_tree(New_Tree, Key, Empty, Empty).
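Searching and inserting can be tried out with a small self-contained variant for the list representation, assuming SWI-Prolog. It is a sketch and not the lecture's code: it handles the empty tree [] in a separate first rule with a cut instead of relying on the failure of parse_tree/4, it compares keys arithmetically with =:=, and the helper list_to_tree/2 is a hypothetical addition.

```prolog
% Sketch: binary search trees in the list representation.
% [] is the empty tree, [Key] a leaf, [Key, Lson, Rson] an inner node.
parse_tree([Root, Lson, Rson], Root, Lson, Rson).
parse_tree([Root], Root, [], []).

% search_in_tree(+Key, +Tree): fails on the empty tree.
search_in_tree(Key, Tree) :-
    parse_tree(Tree, Root, Lson, Rson),
    (   Key =:= Root -> true
    ;   Key < Root   -> search_in_tree(Key, Lson)
    ;   search_in_tree(Key, Rson)
    ).

% insert_into_tree(+Key, +Tree, -New_Tree)
insert_into_tree(Key, [], [Key]) :- !.
insert_into_tree(Key, Tree, New_Tree) :-
    parse_tree(Tree, Root, Lson, Rson),
    (   Key =:= Root -> New_Tree = Tree
    ;   Key < Root   -> insert_into_tree(Key, Lson, L),
                        New_Tree = [Root, L, Rson]
    ;   insert_into_tree(Key, Rson, R),
        New_Tree = [Root, Lson, R]
    ).

% list_to_tree(+Keys, -Tree): insert the keys one after the other.
list_to_tree(Keys, Tree) :-
    foldl(insert_into_tree, Keys, [], Tree).
```

Inserting the keys 5, 4, 9, 6, 10 in this order with list_to_tree/2 rebuilds the example tree [5, [4], [9, [6], [10]]].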
Important Concepts
• Terms (for Data and Control Structures) and Unification
• Backtracking
• SLDNF–Resolution
Prolog allows for
• declarative programming,
• compact programs, and
• rapid prototyping and agile software development.
We are using the XPCE extension of SWI–Prolog.
Top–Down Evaluation of Prolog: SLDNF–Resolution
• Like in conventional programming, Prolog is evaluated top–down:
a call to a predicate looks for an applicable rule with the predicate in
its head and then successively calls the statements in the body.
• Unlike in conventional programming languages, there can be many such
rules, which are then used successively, comparable to the different
options of a case–statement. The evaluation of a call using a rule can
fail; then, the next applicable rule is used (backtracking). This is done
until finally the complete computation is successful.
• Since the arguments of a rule head can be partially instantiated, the
passing of parameters is done using unification, which suitably extends
the standard way of parameter passing.
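Parameter passing by unification can be illustrated with a single fact that is used in different binding modes. The predicates full_name/3, first_name_of/2, and make_person/3 and the term person/2 are hypothetical names for this sketch.

```prolog
% Sketch: one rule head, several binding modes.
full_name(person(First, Last), First, Last).

% Decompose a given term ...
first_name_of(Person, First) :-
    full_name(Person, First, _).

% ... or construct a term from its parts.
make_person(First, Last, Person) :-
    full_name(Person, First, Last).
```

A partially instantiated call works as well: ?- full_name(person(F, smith), john, L). binds F to john and L to smith in one unification step.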
• A negated call succeeds, if the corresponding positive call fails
(negation–as–finite–failure).
• Using backtracking, it is possible to compute the list of all answers to a
given call (query). This corresponds to query answering in relational
databases using SQL.
In practical Prolog systems, there exists a large collection of pre–defined
built–in predicates and also meta–predicates (i.e., predicates, some of whose
arguments can be predicates themselves).
Moreover, there can be side–effects – mainly for I/O and access to the
internal fact database (assert, retract).
Data Structures, Operations, and Control Structures
• The restriction to a few basic data types and a single complex data type,
namely the terms, which is generic and subsumes all the other types,
standardizes the data structures.
• There are no explicit type declarations.
• There exists a large collection of generic operations that are applicable to
terms – and thus to all data types.
• Frequently, meta–predicates are used.
• Actually, control structures are meta–predicates as well. In addition to
standard control structures, such as branching (if–then–else), loops (for,
while), and recursion, user–defined control structures can be built as
meta–predicates.
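A user-defined control structure can be sketched as an ordinary meta-predicate built on call/N. The name while_do/4 and the helper predicates below_100/1 and double/2 are hypothetical; this is not a predicate of the DDK.

```prolog
% Sketch: a while loop over an explicit state, defined as a meta-predicate.
% while_do(+Cond, +Step, +State0, -State):
% apply Step to the state as long as Cond holds for it.
while_do(Cond, Step, State0, State) :-
    (   call(Cond, State0)
    ->  call(Step, State0, State1),
        while_do(Cond, Step, State1, State)
    ;   State = State0
    ).

below_100(X) :- X < 100.
double(X, Y) :- Y is 2 * X.
```

The query ?- while_do(below_100, double, 1, X). doubles the state until the condition fails and returns X = 128.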
Software Engineering Aspects
Prolog supports abstraction and compact code, and thus stimulates
refactoring:
• The generic type of terms with generic operations supports abstraction
and code reuse.
• User–defined control structures allow for further abstraction.
• Unification, implicit backtracking, and abstaining from explicit type
declarations result in very compact code and support rapid prototyping.
• Declarativity makes the code much more readable and thus extensible.
Switching from conventional programming languages to the logic
programming paradigm is difficult and usually requires a lot of training
and effort.
Disjunctive Logic Programming
Sink in a Network
Fact Base: a network is represented as node facts and arc facts.
[Figure: network with the nodes a, b, c, and d]
node(a).
node(b).
node(c).
node(d).
arc(a,b); arc(a,c).
arc(b,d).
arc(c,d).
Either there exists an arc from node a to b or from a to c (disjunction).
Rule Base: A node X is a sink, if there is no other node Y for which there is
no transitive connection (transitive closure, tc) from Y to X; i.e., X can be
reached from every other node.
sink(X) :-
   node(X), not(not_sink(X)).
not_sink(X) :-
   node(X), node(Y), X \= Y, not(tc(Y, X)).
tc(X, Y) :-
   arc(X, Z), tc(Z, Y).
tc(X, Y) :-
   arc(X, Y).
Query:
?- sink(X).
X = d
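Plain Prolog has no disjunctive facts, but the answer can be checked with a sketch that treats the two choices of the disjunction arc(a,b) ; arc(a,c) as two separate worlds. The world names w1 and w2, the extra first argument of arc/3, tc/3, sink/2, and not_sink/2, and the predicate certain_sink/1 are assumptions of this illustration; it is not how a disjunctive logic programming system evaluates the program.

```prolog
% Sketch: one world per choice of the disjunctive fact.
node(a).  node(b).  node(c).  node(d).

arc(w1, a, b).                   % world w1 chooses arc(a,b)
arc(w2, a, c).                   % world w2 chooses arc(a,c)
arc(_, b, d).                    % definite facts hold in both worlds
arc(_, c, d).

tc(W, X, Y) :- arc(W, X, Y).
tc(W, X, Y) :- arc(W, X, Z), tc(W, Z, Y).

not_sink(W, X) :- node(X), node(Y), X \= Y, \+ tc(W, Y, X).
sink(W, X)     :- node(X), \+ not_sink(W, X).

% certain_sink(?X): X is a sink in every world.
certain_sink(X) :-
    node(X),
    forall(member(W, [w1, w2]), sink(W, X)).
```

In both worlds, d is reachable from a, b, and c, so ?- certain_sink(X). yields only X = d.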
Course on Deductive Databases
Topics:
• foundations and applications of Prolog and Datalog,
data modelling and programming;
• the deductive database system DDBase;
• efficient evaluation of Datalog programs;
• further language constructs in the DDK (DisLog Developers' Kit):
– complex data structures,
– default negation and disjunction;
• applications on the basis of Prolog and DDBase.
8.2 Semantic Web Databases
Knowledge Engineering in the Semantic Web (Web 3.0) is based on
ontologies and logic.
Reasoning Tasks:
• supporting the search (query answering);
• in knowledge engineering / modelling: analysis of the structure of the
ontologies for anomalies.
Knowledge engineering and reasoning in the Semantic Web can be supported
by deductive databases and logic programming techniques.
In the Semantic Web, it is possible to reason about
• the ontology / taxonomy (i.e., the schema) and
• the instances.
This is called terminological or assertional (T–Box or A–Box) reasoning,
respectively. This makes search in the Semantic Web more effective.
• In the following printer ontology, we could search for a printer from HP,
and the result could be a laser–jet printer from HP, since the system
knows that hpLaserJetPrinter is a sub–class of hpPrinter.
• It can also be derived that no laser–jet printer from HP is a laser
writer from Apple; in this case, this is very easy, since it is explicitly
stored in the ontology.
Moreover, we will show in the following how to support knowledge
engineering by detecting anomalies in OWL ontologies.
The Web Ontology Language (OWL)
In OWL, we can mix concepts from
• rdf (Resource Description Framework) for defining instances and
• rdfs (rdf Schema) for defining the schema
of an application. Moreover, tags with the namespace owl are allowed.
The Semantic Web Rule Language (SWRL) incorporates logic programming
rules into OWL ontologies.
There exist well–known, powerful tools for asking queries on and for
reasoning with OWL ontologies.
The Printer Ontology
[Figure: class hierarchy of the printer ontology with the classes product, hpProduct, printer, personalPrinter, ibmLaserPrinter, laserJetPrinter, appleLaserWriter, hpPrinter, hpLaserJetPrinter, and hpApplePrinter, and with a {disjoint} constraint]
The Printer Ontology in OWL
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns="file:/protege/Ontologies/p.owl#">
<owl:Ontology rdf:about="">
<owl:versionInfo> Printer Example, Version 1.3, 02.02.2013
</owl:versionInfo> </owl:Ontology>
<owl:Class rdf:ID="printer"/>
<owl:Class rdf:ID="laserJetPrinter">
<rdfs:subClassOf rdf:resource="#printer"/> </owl:Class>
...
</rdf:RDF>
The following owl:Class element defines the class appleLaserWriter:
<owl:Class rdf:ID="appleLaserWriter">
<rdfs:comment>
Apple laser writers are laser jet printers
</rdfs:comment>
<rdfs:subClassOf rdf:resource="#laserJetPrinter"/>
<owl:disjointWith rdf:resource="#hpLaserJetPrinter"/>
</owl:Class>
The rdfs:subClassOf sub–element states that appleLaserWriter is a
sub–class of laserJetPrinter. The owl:disjointWith sub–element
states that appleLaserWriter is disjoint from hpLaserJetPrinter.
Observe that a reference to a class uses the attribute rdf:resource and a “#”, whereas
the defining owl:Class element uses the attribute rdf:ID and no “#”.
The following owl:Class element defines a class of printers from a joint
venture of HP and Apple:
<owl:Class rdf:ID="hpApplePrinter">
<rdfs:comment>
Printers from a joint venture of HP and Apple
</rdfs:comment>
<rdfs:subClassOf rdf:resource="#hpLaserJetPrinter"/>
<rdfs:subClassOf rdf:resource="#appleLaserWriter"/>
</owl:Class>
The existence of such printers would contradict the disjointWith
restriction between the classes hpLaserJetPrinter and
appleLaserWriter.
The emptiness of the class hpApplePrinter can be detected by reasoners
in the ontology editor Protégé.
Every laserJetPrinter is a printer, and every hpPrinter is an
hpProduct:
<owl:Class rdf:ID="printer"/>
<owl:Class rdf:ID="laserJetPrinter">
<rdfs:subClassOf rdf:resource="#printer"/>
</owl:Class>
<owl:Class rdf:ID="hpProduct"/>
<owl:Class rdf:ID="hpPrinter">
<rdfs:subClassOf rdf:resource="#hpProduct"/>
</owl:Class>
Redundant subClassOf Relation
Since hpLaserJetPrinter is a sub–class of hpPrinter and hpPrinter
is a sub–class of hpProduct, it is redundant to explicitly state that
hpLaserJetPrinter is a sub–class of hpProduct.
<owl:Class rdf:ID="hpLaserJetPrinter">
<rdfs:subClassOf rdf:resource="#laserJetPrinter"/>
<rdfs:subClassOf rdf:resource="#hpPrinter"/>
<rdfs:subClassOf rdf:resource="#hpProduct"/>
<owl:disjointWith rdf:resource="#appleLaserWriter"/>
</owl:Class>
This redundancy is not an error. We could simply consider it as an anomaly
that should be reported to the knowledge engineer.
This anomaly is not reported by reasoners in the ontology editor Protégé.
Instances
Finally, we have some instances of some of the defined classes:
<appleLaserWriter rdf:ID="1001"/>
<appleLaserWriter rdf:ID="1002"/>
<hpLaserJetPrinter rdf:ID="1003"/>
<hpLaserJetPrinter rdf:ID="1004"/>
As mentioned before, there cannot exist instances of the class
hpApplePrinter.
The Ontology Editor Protégé
[Screenshot: Protégé session in which the class hpApplePrinter is inferred to be empty]
The ontology editor Protégé has some plugged–in reasoners, such as
• FaCT++,
• HermiT, and
• Racer.
In the session that is shown in the screenshot above, the emptiness of the class
hpApplePrinter was detected by the ontology reasoner FaCT++.
It is inferred that the class hpApplePrinter is EquivalentTo the
empty class Nothing. By clicking the question mark, an explanation can be
shown.
There are also databases for handling rdf data, so–called triple stores, such
as Sesame or Jena. They use SQL–like query languages, most notably SPARQL.
Declarative Queries in FnQuery
Complex XML data structures in Prolog:
'owl:Class':['rdf:ID':'appleLaserWriter']:[
   'rdfs:comment':['Apple laser ...'],
   'rdfs:subClassOf':[
      'rdf:resource':'#laserJetPrinter']:[],
   'owl:disjointWith':[
      'rdf:resource':'#hpLaserJetPrinter']:[] ]
An XML element is represented as a term structure T:As:C, called
FN–triple.
• T is the tag of the element,
• As is the list of the attribute/value pairs A:V of the element, and
• C is a list of FN–triples for the sub–elements.
FnSelect
In an OWL knowledge base Owl, there exists an isa relation between two
classes C1 and C2, if a subClassOf relation is stated explicitly, or
if C1 was defined as the intersection of C2 and some other classes:
% isa(+Owl, ?C1, ?C2) <-
isa(Owl, C1, C2) :-
   C := Owl/'owl:Class'::[@'rdf:ID'=C1],
   ( R2 := C/'rdfs:subClassOf'@'rdf:resource'
   ; R2 := C/'owl:intersectionOf'/'owl:Class'@'rdf:about' ),
   owl_reference_to_id(R2, C2).
% owl_reference_to_id(+Reference, ?Id) <-
owl_reference_to_id(Reference, Id) :-
   ( concat('#', Id, Reference)
   ; Id = Reference ).
Disjointness of Classes
% disjointWith(+Owl, ?C1, ?C2) <-
disjointWith(Owl, C1, C2) :-
   R2 := Owl/'owl:Class'::[@'rdf:about'=R1]
      /'owl:disjointWith'@'rdf:resource',
   owl_reference_to_id(R1, C1),
   owl_reference_to_id(R2, C2).
In the following, we often suppress the ontology argument Owl.
Transitive Closure of isa
% subClassOf(?C1, ?C2) <-
subClassOf(C1, C2) :-
   isa(C1, C2).
subClassOf(C1, C2) :-
   isa(C1, C), subClassOf(C, C2).
Anomalies in Ontologies
Cycle
?- isa(C1, C2), subClassOf(C2, C1).
C1 = personalPrinter,
C2 = printer
Partition Error
?- disjointWith(C1, C2),
subClassOf(C, C1), subClassOf(C, C2).
C = hpApplePrinter,
C1 = hpLaserJetPrinter,
C2 = appleLaserWriter
The class C is a sub–class of two disjoint classes C1 and C2.
Incompleteness
?- isa(C1, C), isa(C2, C), isa(C3, C),
disjointWith(C1, C2), not(disjointWith(C2, C3)).
C = laserJetPrinter,
C1 = hpLaserJetPrinter,
C2 = appleLaserWriter,
C3 = ibmLaserPrinter
The class C has three sub–classes C1, C2 and C3, of which only the two
sub–classes C1 and C2 are declared as disjoint in the knowledge base.
The fact that C2 and C3 are disjoint and that C1 and C3 are disjoint as well
was possibly forgotten by the knowledge engineer during the creation of the
knowledge base.
Redundant subClassOf/instanceOf Relations
% redundant_isa(?Chain) <-
redundant_isa(C1->C2->C3) :-
   isa(C1, C2), subClassOf(C2, C3),
   isa(C1, C3).
?- redundant_isa(Chain).
Chain = hpLaserJetPrinter -> hpPrinter -> hpProduct
The sub–class relation between C1 and C3 can be derived by transitivity over
the class C2.
Here, isa(C1, C2), subClassOf(C2, C3) requires that this deduction is
done over at least two levels.
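The anomaly checks can be tried without FnQuery by writing the isa and disjointWith relations as plain facts. The sketch below assumes SWI-Prolog and contains only an acyclic part of the printer ontology, transcribed by hand from the OWL fragments; the naive transitive closure would not terminate on a cyclic isa relation. The predicate partition_error/3 is a hypothetical name, and the chain is returned as a list instead of the term C1->C2->C3.

```prolog
% Sketch: anomaly detection over hand-written ontology facts.
isa(laserJetPrinter,   printer).
isa(hpPrinter,         hpProduct).
isa(hpLaserJetPrinter, laserJetPrinter).
isa(hpLaserJetPrinter, hpPrinter).
isa(hpLaserJetPrinter, hpProduct).
isa(appleLaserWriter,  laserJetPrinter).
isa(hpApplePrinter,    hpLaserJetPrinter).
isa(hpApplePrinter,    appleLaserWriter).

disjointWith(appleLaserWriter,  hpLaserJetPrinter).
disjointWith(hpLaserJetPrinter, appleLaserWriter).

% Transitive closure of isa (terminates only for acyclic isa facts).
subClassOf(C1, C2) :- isa(C1, C2).
subClassOf(C1, C2) :- isa(C1, C), subClassOf(C, C2).

% partition_error(?C, ?C1, ?C2): C is below two disjoint classes.
partition_error(C, C1, C2) :-
    disjointWith(C1, C2),
    subClassOf(C, C1), subClassOf(C, C2).

% redundant_isa(?Chain): Chain = [C1, C2, C3], where the direct
% isa(C1, C3) also follows by transitivity over C2.
redundant_isa([C1, C2, C3]) :-
    isa(C1, C2), subClassOf(C2, C3),
    isa(C1, C3).
```

On these facts, ?- redundant_isa(Chain). yields [hpLaserJetPrinter, hpPrinter, hpProduct], and partition_error/3 reports hpApplePrinter for both orderings of the two disjoint classes.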
Undefined Reference
During the development of an ontology in OWL, it is possible that we
reference a class that we have not yet defined.
% undefined_reference(+Owl, ?Ref) <-
undefined_reference(Owl, Ref) :-
   rdf_reference(Owl, Ref),
   not(owl_class(Owl, Ref)).
rdf_reference(Owl, Ref) :-
   ( R := Owl/descendant_or_self::'*'@'rdf:resource'
   ; R := Owl/descendant_or_self::'*'@'rdf:about' ),
   owl_reference_to_id(R, Ref).
owl_class(Owl, Ref) :-
   Ref := Owl/'owl:Class'@'rdf:ID'.
If we load such an ontology into Protégé, then the ontology reasoners may
produce wrong results, even for unrelated parts of the ontology.
8.3 Object–Oriented Databases
Application Domains
• engineering (CAD/CAM, CIM)
• image and graphics databases
• scientific applications
• geo–databases
• multimedia systems
• integration of heterogeneous databases
Influences and concepts from other areas of computer science:
• programming languages:
abstract data types and encapsulation
completeness (w.r.t. expressivity)
• software engineering:
modularisation, code extensibility and reuse
• artificial intelligence:
concepts for knowledge representation, classification
• databases:
(semantic) data modelling
8.3.1 Complex Objects
Every object has a unique object identifier OID.
This value is invisible for the user. It is only used internally by the system to
identify an object and to allow for references between different objects.
An object o is represented by a triple ⟨ i, c, v ⟩:
• i is the unique object identifier,
• c is a type constructor,
• v is the value of o.
Type Constructors: atom, tuple, set, list, array
Domains for atomic values: integer, real, string, boolean, date, . . .
Given an object o = ⟨ i, c, v ⟩.
• If c = atom, then v is an atomic value.
• If c = tuple, then
v = ⟨ a1 : i1, a2 : i2, ..., an : in ⟩
is a tuple with attribute names aj and OIDs ij.
• If c = set, then
v = { i1, i2, ..., in }
is a set of OIDs ij.
• If c = list/array, then v is an ordered list / an array of OIDs.
A complex object can be represented by a graph.
Two complex objects o1 = ⟨ i1, c1, v1 ⟩ and o2 = ⟨ i2, c2, v2 ⟩ are called
• deeply equal, if c1 = c2 and v1 = v2;
• shallow equal, if their graphs are isomorphic and the atomic values in
the corresponding leaves are the same.
OODDL: Object–Oriented Data Definition Language
Nowadays, complex objects are frequently stored and managed using XML
databases.
Example (Complex Objects)
The complex objects o1, o2, and o3 contain the atomic objects o4, o5, and o6
as sub–objects:
o1 = ⟨ i1, tuple, ⟨ a1 : i4, a2 : i6 ⟩ ⟩,
o2 = ⟨ i2, tuple, ⟨ a1 : i4, a2 : i6 ⟩ ⟩,
o3 = ⟨ i3, tuple, ⟨ a1 : i5, a2 : i6 ⟩ ⟩,
o4 = ⟨ i4, atom, 10 ⟩,
o5 = ⟨ i5, atom, 10 ⟩,
o6 = ⟨ i6, atom, 20 ⟩.
o1 and o2 are deeply equal; o1 and o3 are shallow equal.
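The two notions of equality can be sketched over facts object(OID, Constructor, Value). The encodings of o2 and o3 below are assumptions chosen to match the stated equalities, attribute/OID pairs are written as A-I, and only the constructors atom and tuple are handled; all predicate names are hypothetical.

```prolog
% Sketch: deep and shallow equality in the sense defined above.
object(i1, tuple, [a1-i4, a2-i6]).
object(i2, tuple, [a1-i4, a2-i6]).   % assumed: same references as o1
object(i3, tuple, [a1-i5, a2-i6]).   % assumed: refers to o5 instead of o4
object(i4, atom, 10).
object(i5, atom, 10).
object(i6, atom, 20).

% deeply_equal(+I1, +I2): same constructor and same value (same OIDs).
deeply_equal(I1, I2) :-
    object(I1, C, V), object(I2, C, V).

% resolve(+OID, -Term): replace all references by the referenced values.
resolve(I, V) :-
    object(I, atom, V).
resolve(I, tuple(Fields)) :-
    object(I, tuple, Refs),
    maplist(resolve_field, Refs, Fields).

resolve_field(A-I, A-V) :- resolve(I, V).

% shallow_equal(+I1, +I2): equal after resolving the references.
shallow_equal(I1, I2) :-
    resolve(I1, T), resolve(I2, T).
```

Here ?- deeply_equal(i1, i3). fails, because the values refer to the different OIDs i4 and i5, while ?- shallow_equal(i1, i3). succeeds, since both objects resolve to tuple([a1-10, a2-20]).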
[Figure: object graphs of o1 (OID i1) and o3 (OID i3). o1 refers to o4 via a1 and to o6 via a2; o3 refers to o5 via a1 and to o6 via a2. Both resolve to ⟨ a1 : 10, a2 : 20 ⟩.]
Identity after resolving the references
Nested Relations
Given a set U of attributes with domains dom(A), A ∈ U.
Formats R and domains dom(R) are defined recursively:
• R = (A1, ..., An, R1, ..., Rm) with Ai ∈ U, 1 ≤ i ≤ n, and formats
Rj, 1 ≤ j ≤ m, is a format with
dom(R) = dom(A1) × ... × dom(An) × 2^dom(R1) × ... × 2^dom(Rm).
• If m = 0, then R is a basic format.
A nested tuple over a format R is an element of dom(R).
A nested relation or NF²–relation (Non–First–Normal–Form) over R is a
subset of dom(R).
Example (NF²–Relation)
formats
Children = (Cname, Bdate, Sex)
Graduations = (Type, Date)
Employees = (Id, Name, Address, Children, Graduations)
NF²–relation over the format Employees:
Employees (Id, Name, Address):
100   Joe    LA
200   Theo   NY
Children (Cname, Bdate, Sex):
Mary    120261   F
Peter   041465   M
John    082270   M
Mary    051578   F
Laura   051578   F
Graduations (Type, Date):
driv_lic   121255
phd_cs     021565
driv_lic   082686
[Table: the grouping of the Children and Graduations tuples under the two employees is not recoverable from the flattened layout]
8.3.2 Features of Object–Orientation
Encapsulation of Structure and Behaviour
In the relational data model there exist generic operations for searching,
inserting, deleting, and updating tuples, which can be applied to arbitrary
relation schemas.
In object–oriented databases there are visible and hidden attributes.
• The visible attributes can be accessed by a declarative query language.
• The hidden attributes are accessed by sending messages (message
passing) between the objects.
Each object type “has” integrity conditions, which are realised in the access
operations.
Type and Class Hierarchies, Inheritance
A type is given by its
• type name,
• attributes, and
• operations (methods).
As a generalization of attribute and method we use the term function.
A type hierarchy is an acyclic, binary relation on the set of types:
[Figure: type hierarchy in which Person is the supertype of Student and Faculty, and Student is the supertype of Grad_Student]
specialization: ↓
generalization: ↑
A sub–type inherits the functions of the super–type (inheritance).
Additionally, the sub–type has its own functions.
→ multiple inheritance, selective inheritance
A class is a set of objects, which usually are of the same type.
Usually, the set of all stored objects of each type forms a class.
Classes can form hierarchies, too.
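Inheritance along a type hierarchy can be sketched with two kinds of facts and one recursive rule. The predicate names subtype/2, own_function/2, and has_function/2 as well as the selection of functions are assumptions made for this illustration.

```prolog
% Sketch: functions are inherited from all super-types.
subtype(student, person).
subtype(grad_student, student).
subtype(faculty, person).

own_function(person, age).
own_function(student, grade_point_average).
own_function(grad_student, advisor).
own_function(faculty, salary).

% has_function(?Type, ?F): F is defined for Type itself or inherited.
has_function(Type, F) :-
    own_function(Type, F).
has_function(Type, F) :-
    subtype(Type, Super),
    has_function(Super, F).
```

For instance, grad_student has its own function advisor and inherits grade_point_average from student and age from person, but it does not have the function salary of faculty.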
The type system in OODBs can be extended at run time.
Frequently, the non–standard data type BLOB (binary large object) is used for
• raster pixel pictures and
• long text strings.
These are supported as abstract data types with suitable access operations.
Polymorphism (Operator Overloading)
The same operator name can have different implementations.
The implementation which is suitable for a certain object is determined at run
time, when the type of the object is known (late binding).
E.g., the function “area” for calculating the area is implemented differently
for different geometrical objects.
GEOMETRY_OBJECT: Shape, Area, Centerpoint
RECTANGLE subtype-of GEOMETRY_OBJECT
(Shape=’rectangle’): Width, Height
TRIANGLE subtype-of GEOMETRY_OBJECT
(Shape=’triangle’): Side1, Side2, Angle
CIRCLE subtype-of GEOMETRY_OBJECT
(Shape=’circle’): Radius
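In Prolog, the same effect can be sketched with one predicate whose clauses are selected by the shape of the object at run time. The term encodings rectangle/2, triangle/3, and circle/1 and the predicate total_area/2 are assumptions of this sketch; Angle is taken to be the angle between Side1 and Side2, in radians.

```prolog
% Sketch: one function name, several implementations.
area(rectangle(Width, Height), A) :-
    A is Width * Height.
area(triangle(Side1, Side2, Angle), A) :-
    A is Side1 * Side2 * sin(Angle) / 2.
area(circle(Radius), A) :-
    A is pi * Radius * Radius.

% total_area(+Objects, -Total): works for any mix of shapes.
total_area(Objects, Total) :-
    maplist(area, Objects, Areas),
    sum_list(Areas, Total).
```

The caller of total_area/2 does not need to know which implementation of area/2 applies to which object.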
Multiple and Selective Inheritance
• Multiple inheritance occurs in a type hierarchy, if a type T is a sub–type
of several super–types T1, ..., Tn.
[Figure: T below its super–types T1, T2, ..., Tn]
Then T inherits the functions of T1, ..., Tn; this can lead to ambiguities.
• Selective inheritance occurs, if a type should only inherit some special
functions of one of its super–types T′. In this case, the undesired
functions are excluded (EXCEPT clause).
Versions and Configurations
Many database applications require the management of different
versions of complex objects:
• software projects
• CAD applications.
A version graph shows the relations between the different versions of an
object.
A configuration of a complex object is a composition of compatible versions
for the sub–objects.
8.3.3 Examples: COMPANY and UNIVERSITY Database
In the following we will see the
1. types,
2. classes,
3. methods, and
4. some queries
for two examples.
The COMPANY Database as an OODB
[Figure: object graph of the department object o8 (OID i8), a tuple with the components DNAME, DNUMBER, MGR, LOCATIONS, EMPLOYEES, and PROJECTS, together with the objects it refers to]
Complex Objects
o1 = ⟨ i1, atom, Houston ⟩,
o2 = ⟨ i2, atom, Bellaire ⟩,
o3 = ⟨ i3, atom, Sugarland ⟩,
o4 = ⟨ i4, atom, 5 ⟩,
o5 = ⟨ i5, atom, Research ⟩,
o6 = ⟨ i6, atom, 22-May-78 ⟩,
o7 = ⟨ i7, set, { i1, i2, i3 } ⟩,
o8 = ⟨ i8, tuple, ⟨ DNAME : i5, DNUMBER : i4, MGR : i9,
       LOCATIONS : i7, EMPLOYEES : i10, PROJECTS : i11 ⟩ ⟩,
o9 = ⟨ i9, tuple, ⟨ MANAGER : i12, MANAGERSTARTDATE : i6 ⟩ ⟩,
o10 = ⟨ i10, set, { i12, i13, i14 } ⟩,
o11 = ⟨ i11, set, { i15, i16, i17 } ⟩, ...
Data Types
define type Date:
tuple( year: integer, month: integer, day: integer );
define type Employee:
tuple( name: string, ssn: string,
birthdate: Date, sex: char, dept: Department );
define type Department:
tuple( dname: string, dnumber: integer,
mgr: tuple( manager: Employee, startdate: Date ),
locations: set(string),
employees: set(Employee),
projects: set(Project) );
Classes
define class Employee:
type tuple( name: string,
ssn: string,
birthdate: Date,
sex: char,
dept: Department );
operations
age(e: Employee): integer,
create_new_emp: Employee,
destroy_emp(e: Employee): boolean;
define class Department:
type tuple ( dname: string, dnumber: integer,
mgr: tuple( manager: Employee, startdate: Date ),
locations: set (string),
employees: set (Employee),
projects: set (Project) );
operations
number_of_emps(d: Department): integer,
create_new_dept: Department,
destroy_dept(d: Department): boolean,
add_emp(d: Department, e: Employee): boolean,
remove_emp(d: Department, e: Employee): boolean;
define class DepartmentSet:
type set (Department);
operations
create_dept_set: DepartmentSet,
destroy_dept_set(ds: DepartmentSet): boolean,
add_dept(ds: DepartmentSet, d: Department): boolean,
remove_dept(ds: DepartmentSet, d: Department): boolean;
persistent name AllDepartments: DepartmentSet;
/* AllDepartments is a persistent named object of type set(Department) */
...
d := create_new_dept;
/* creates new department object in the variable d */
b := add_dept(AllDepartments, d);
/* makes d persistent by adding it to a persistent named object */
The UNIVERSITY Database as an OODB
Data Types
type Phone: tuple(
area_code: integer,
number: integer );
type Date: tuple(
year: integer,
month: integer,
day: integer );
Classes
class Person
type tuple(
ssn: string,
name: tuple( firstname: string, middlename: string, lastname: string ),
address: tuple( number: integer, street: string, apt_no: string,
city: string, state: string, zipcode: string ),
birthdate: Date,
sex: character );
method age: integer
end
class Student inherit Person
type tuple(
class: string, majors_in: Department, minors_in: Department,
registered_in: set(Section),
transcript: set ( tuple(
grade: character, ngrade: real, section: Section ) ) );
method grade_point_average: real, change_class: boolean,
change_major(new_major: Department): boolean;
end
class Grad_Student inherit Student
type tuple(
degrees: set ( tuple ( college: string, degree: string, year: integer ) ),
advisor: Faculty );
end
class Faculty inherit Person
type tuple(
salary: real, rank: string, foffice: string, fphone: Phone,
belongs_to: set(Department),
grants: set(Grants), advises: set(Student) ),
method promote_faculty, give_raise(percent: real),
end
class Department
type tuple(
dname: string, office: string, dphone: Phone,
members: set(Faculty), majors: set(Student),
chairperson: Faculty, courses: set(Course) ),
method add_major(s: Student), remove_major(s: Student):boolean
end
class Section
type tuple(
sec_num: integer, qtr: Quarter, year: Year,
students: set ( tuple( stud: Student, grade: character ) ),
course: Course,
teacher: Instructor ),
method change_grade(s: Student, g: character);
end
class Course
type tuple(
cname: string, cnumber: string, cdescription: string,
sections: set(Section),
offering_dept: Department );
end
Methods
method body age: integer in class Person
{
int a;
Date d;
d=today();
a=d->year - self->birthdate->year;
if ((d->month < self->birthdate->month) ||
(d->month == self->birthdate->month) &&
(d->day < self->birthdate->day))
--a;
return a;
}
method body grade_point_average: real in class Student
{
float sum=0.0;
int count=0;
struct {
char gr;
float ngrade;
o2_Section sec;
} t;
for (t in self->transcript) {
/* increment sum by ngrade, count by 1 */
sum += t->ngrade; ++count;
}
return sum/count;
}
method body
change_major (new_major: Department): boolean in class Student
{
if (!self->majors_in->remove_major(self)) {
return 0;
}
else {
new_major->add_major(self);
self->majors_in=new_major;
return 1;
}
}
method body
remove_major(s: Student): boolean in class Department
{
if (s in self->majors) {
/* -= apply set difference to remove object s from set of majors */
self->majors -= set(s);
return 1;
}
else return 0;
}
method body
add_major(s: Student) in class Department
{
/* += apply set union to add object s to set of majors */
self->majors += set(s);
}
/* a persistent root to hold all persistent Person objects */
name All_Persons: set(Person);
/* a persistent root to hold a single Person object */
name John_Smith: Person;
run body {
/* create a new Person object p */
o2 Person p = new Person;
*p = tuple (
ssn: "222222222",
name: tuple(
firstname: "Franklin", middlename: "T", lastname: "Wong" ),
address: tuple( number: 638, street: "Voss Road",
city: "Houston", state: "Texas", zipcode: "77079" ),
birthdate: tuple( year: 1945, month: 12, day: 8 ),
sex: M );
/* p becomes persistent by attaching to persistent root */
All_Persons += set(p);
/* now put values in persistent named object John_Smith */
John_Smith->ssn = "444444444";
John_Smith->name = tuple(
firstname: "John", middlename: "B", lastname: "Smith" );
John_Smith->address = tuple( number: 731, street: "Fondren Road",
city: "Houston", state: "Texas", zipcode: "77036" );
John_Smith->birthdate = tuple( year: 1955, month: 1, day: 9 );
John_Smith->sex = M;
}
Queries
select tuple (
fname: s.name.firstname, lname: s.name.lastname )
from s in Student
where s.majors_in.dname = "Computer Science"
select tuple(
fname: s.name.firstname, lname: s.name.lastname,
transcript: select tuple(
cname: sc.section.course.cname, sec_no: sc.section.sec_num,
quarter: sc.section.qtr, year: sc.section.year, grade: sc.grade )
from sc in sec )
from s in Student, sec in s.transcript
where s.majors_in.dname = "Computer Science"