8 Extended Database Concepts
Database Lecture (Vorlesung Datenbanken), Winter Semester 2012/13
Prof. Dr. Dietmar Seipel

DOOD: deductive and object-oriented databases

DOOD databases offer advanced features for
• data modelling and
• database programming
for complex data structures.

8.1 Deductive Databases and Logic Programming

The ease of handling the data structure of terms and the powerful built-in control structure of backtracking are features that distinguish Prolog from other programming languages. Prolog is very well suited for embedded database programming.

In the database context, a restricted version called Datalog is frequently used; it is the basis of deductive databases.

• Prolog and Datalog are declarative languages; they can access databases and XML documents.
• Relations and complex objects (e.g., XML documents) can be represented as term structures.
• With the help of declarative rules, we can represent integrity constraints and inference rules for deriving conclusions from given information.

8.1.1 Prolog as a Database Language

1. Prolog can be used for representing tables from relational databases. The tuples of a table become Prolog facts with the same predicate symbol; usually, the table name is used.
2. The data dictionary of a relational database can also be represented using Prolog facts. This can be done using Prolog terms that correspond to an XML representation of the data dictionary.
3. Queries and integrity constraints can be represented as Prolog rules. Conjunctive queries are posed in the form of Prolog goals, which are then evaluated using the Prolog rules.
4. Datalog is a restricted version of Prolog, which ensures termination and the efficient evaluation of recursive queries.
5. The deductive database system DDBASE combines Prolog and Datalog.

Database Tables in MySQL

EMPLOYEE

FNAME    | MINIT | LNAME   | SSN       | BDATE      | ADDRESS                  | SEX | SALARY | SUPERSSN  | DNO
John     | B     | Smith   | 444444444 | 1955-01-09 | 731 Fondren, Houston, TX | M   | 30000  | 222222222 | 5
Franklin | T     | Wong    | 222222222 | 1945-12-08 | 638 Voss, Houston, TX    | M   | 40000  | 111111111 | 5
Alicia   | J     | Zelaya  | 777777777 | 1958-07-19 | 3321 Castle, Spring, TX  | F   | 25000  | 333333333 | 4
Jennifer | S     | Wallace | 333333333 | 1931-06-20 | 291 Berry, Bellaire, TX  | F   | 43000  | 111111111 | 4
Ramesh   | K     | Narayan | 555555555 | 1952-09-15 | 975 Fire Oak, Humble, TX | M   | 38000  | 222222222 | 5
Joyce    | A     | English | 666666666 | 1962-07-31 | 5631 Rice, Houston, TX   | F   | 25000  | 222222222 | 5
Ahmad    | V     | Jabbar  | 888888888 | 1959-03-29 | 980 Dallas, Houston, TX  | M   | 25000  | 333333333 | 4
James    | E     | Borg    | 111111111 | 1927-11-10 | 450 Stone, Houston, TX   | M   | 55000  | NULL      | 1

A database table p can be represented by a set of Prolog facts, namely one fact p(t1, ..., tn) for each tuple (t1, ..., tn) of the table.

WORKS_ON

ESSN      | PNO | HOURS
111111111 | 20  | NULL
222222222 | 2   | 10.0
222222222 | 3   | 10.0
333333333 | 20  | 15.0
333333333 | 30  | 20.0
444444444 | 1   | 32.5
444444444 | 2   | 7.5
555555555 | 3   | 40.0
666666666 | 1   | 20.0
666666666 | 2   | 20.0
777777777 | 10  | 10.0
777777777 | 30  | 30.0
888888888 | 10  | 35.5
888888888 | 30  | 5.0

PROJECT

PNAME           | PNUMBER | PLOCATION | DNUM
ProductX        | 1       | Bellaire  | 5
ProductY        | 2       | Sugarland | 5
ProductZ        | 3       | Houston   | 5
Computerization | 10      | Stafford  | 4
Reorganization  | 20      | Houston   | 1
Newbenefits     | 30      | Stafford  | 4

Database Tables in Prolog

    employee('John', 'B', 'Smith', 444444444, 1955-01-09,
        '731 Fondren, Houston, TX', 'M', 30000, 222222222, 5).
    employee('Franklin', 'T', 'Wong', ...).
    ...
    works_on(444444444, 1, 32.5).
    works_on(444444444, 2, 7.5).
    ...
    department('Research', 5, 222222222, 1978-05-22).
    ...
    project('ProductX', 1, 'Bellaire', 5).
    ...

We do not quote the date values.
Then, they are terms, and we can access their components more conveniently without string parsing.

Export from MySQL to XML

Using the Prolog library DDK, we can also export a MySQL database or table to XML:

    ?- mysql_database_to_xml(mysql, company, Xml),
       dwrite(xml, Xml).

    <database name="company">
      <table name="employee"> ... </table>
      <table name="works_on">
        <row ESSN="111111111" PNO="20" HOURS="0.0"/>
        <row ESSN="222222222" PNO="2" HOURS="10.0"/>
        ...
      </table>
      ...
    </database>

    ?- mysql_database_table_to_xml(
          mysql, company:employee, Xml).

Data Dictionary

Using the Prolog library DDK, we can export the data dictionary of a relational database from MySQL to an XML representation:

    ?- mysql_database_schema_to_xml(company, Xml),
       dwrite(xml, Xml).

This produces an XML element with one table sub-element for every table:

    <database name="company">
      <table name="department"> ... </table>
      <table name="employee"> ... </table>
      <table name="dependent"> ... </table>
      <table name="dept_locations"> ... </table>
      <table name="project"> ... </table>
      <table name="works_on"> ... </table>
    </database>

    <table name="employee">
      <attribute name="FNAME" type="varchar(15)" is_nullable="NO"/>
      <attribute name="MINIT" type="char(1)" is_nullable="YES"/>
      <attribute name="LNAME" type="varchar(15)" is_nullable="NO"/>
      <attribute name="SSN" type="varchar(9)" is_nullable="NO"/>
      <attribute name="BDATE" type="date" is_nullable="YES"/>
      <attribute name="ADDRESS" type="varchar(30)" is_nullable="YES"/>
      <attribute name="SEX" type="char(1)" is_nullable="YES"/>
      <attribute name="SALARY" type="decimal(10,2)" is_nullable="YES"/>
      <attribute name="SUPERSSN" type="varchar(9)" is_nullable="YES"/>
      <attribute name="DNO" type="int(11)" is_nullable="NO"/>
      <primary_key>
        <attribute name="SSN"/>
      </primary_key>
      <foreign_key>
        <attribute name="SUPERSSN"/>
        <references table="employee">
          <attribute name="SSN"/>
        </references>
      </foreign_key>
      <foreign_key>
        <attribute name="DNO"/>
        <references table="department">
          <attribute name="DNUMBER"/>
        </references>
      </foreign_key>
    </table>

Data Dictionary as a Prolog Term

    table:[name:employee]:[
       attribute:[name:'FNAME', type:'varchar(15)', is_nullable:'NO']:[],
       attribute:[name:'MINIT', ...]:[],
       attribute:[name:'LNAME', ...]:[],
       attribute:[name:'SSN', ...]:[],
       ...
       attribute:[name:'SUPERSSN', ...]:[],
       attribute:[name:'DNO', ...]:[],
       primary_key:[
          attribute:[name:'SSN']:[] ],
       ...
       foreign_key:[
          attribute:[name:'DNO']:[],
          references:[table:'department']:[
             attribute:[name:'DNUMBER']:[] ] ] ]

This Prolog representation of XML can be queried and transformed using the DDK library FNQuery.

A foreign key

    foreign_key:[
       attribute:[name:A1]:[], ..., attribute:[name:An]:[],
       references:[table:T]:[
          attribute:[name:B1]:[], ..., attribute:[name:Bn]:[] ] ]

can be represented in short form as [A1,...,An] -> T:[B1,...,Bn].
Then, the list of all foreign keys becomes a Prolog term foreign_keys:[fk1,...,fkm]. Similarly, the list of attributes and the primary key can be simplified to a short form. For every database table, DDBASE stores a Prolog fact schema(table:[...]:[...]) with the simplified term representation.

Data Dictionary as Prolog Facts (Short Form)

    schema( table:[name:employee, database:company]:[
       attributes:['FNAME', 'MINIT', 'LNAME', 'SSN', 'BDATE',
          'ADDRESS', 'SEX', 'SALARY', 'SUPERSSN', 'DNO'],
       primary_key:['SSN'],
       foreign_keys:[
          ['SUPERSSN']->employee:['SSN'],
          ['DNO']->department:['DNUMBER'] ] ] ).

    schema( table:[name:works_on, database:company]:[
       attributes:['ESSN', 'PNO', 'HOURS'],
       primary_key:['ESSN', 'PNO'],
       foreign_keys:[
          ['ESSN']->employee:['SSN'],
          ['PNO']->project:['PNUMBER'] ] ] ).

    schema( table:[name:department, ...]:[...] ).
    schema( table:[name:project, ...]:[...] ).

Views and Queries vs. Rules and Goals

• SQL view:

    CREATE VIEW QUERY_1 AS
    SELECT LNAME, PNAME, HOURS
    FROM EMPLOYEE, WORKS_ON, PROJECT
    WHERE EMPLOYEE.SSN = WORKS_ON.ESSN
    AND PROJECT.PNUMBER = WORKS_ON.PNO

• Prolog rule:

    query_1(LNAME, PNAME, HOURS) :-
       employee(_,_, LNAME, SSN, _,_,_,_,_,_),
       project(PNAME, P, _,_),
       works_on(SSN, P, HOURS).

• SQL SELECT:

    SELECT * FROM QUERY_1

  The SELECT statement calls the view.

• Prolog goal:

    ?- query_1(LNAME, PNAME, HOURS).

  The query is submitted to the Prolog interpreter as a goal. The goal corresponds to the SELECT statement calling the view.

Recursive Queries: Transitive Closure (Version 1)

The following recursive rule set derives the transitive supervisor relation on the social security numbers:

    supervisor(SSN_1, SSN_2) :-
       direct_supervisor(SSN_1, SSN_2).
    supervisor(SSN_1, SSN_2) :-
       direct_supervisor(SSN_1, SSN_3),
       supervisor(SSN_3, SSN_2).

[Figure: SSN_1 is the direct supervisor of SSN_3, and SSN_3 is a supervisor of SSN_2.]

    direct_supervisor(SSN_1, SSN_2) :-
       employee(_, _, _, SSN_2, _, _, _, _, SSN_1, _).

The following query assigns names to the social security numbers:

    query_2(F1-M1-L1, F2-M2-L2) :-
       supervisor(SSN_1, SSN_2),
       employee(F1, M1, L1, SSN_1, _, _, _, _, _, _),
       employee(F2, M2, L2, SSN_2, _, _, _, _, _, _).

Transitive closure queries cannot be formulated in standard SQL without recursion. Some relational database systems, however, offer limited forms of recursion (cf. SQL-99):

    CREATE RECURSIVE VIEW supervisor(Emp, Sup) AS
    SELECT Emp, Sup
    FROM direct_supervisor
    UNION
    SELECT D.Emp, S.Sup
    FROM direct_supervisor D, supervisor S
    WHERE D.Sup = S.Emp

This assumes a table direct_supervisor with the attributes Emp and Sup. This SQL implementation is structurally equivalent to the following shorter rule implementation (";" means "or"):

    supervisor(Emp, Sup) :-
       ( direct_supervisor(Emp, Sup)
       ; direct_supervisor(Emp, X),
         supervisor(X, Sup) ).

Further Applications of Recursion

• computation of aggregate functions
• resolution of parts lists (bills of materials)

Meta-Predicates: Transitive Closure (Version 2)

Using the generic meta-predicate transitive_closure, the previous two rules for supervisor can be replaced by a single, much more compact and abstract rule:

    supervisor(SSN_1, SSN_2) :-
       transitive_closure(
          direct_supervisor, SSN_1, SSN_2 ).

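The idea behind such a generic meta-predicate can be sketched in plain Prolog. The following is a minimal sketch, not the DDK implementation of transitive_closure/3: the names tc/3 and tc_/4 and the sample facts over the nodes a, b, c, d are assumptions made for illustration. The edge predicate is passed as an argument and invoked with call/3, and a list of already expanded nodes ensures that the search also terminates on cyclic relations.

```prolog
% Hypothetical sample relation with a cycle a -> b -> c -> a.
direct_supervisor(a, b).
direct_supervisor(b, c).
direct_supervisor(c, a).
direct_supervisor(c, d).

% tc(+P, ?X, ?Y): Y is reachable from X by one or more P-steps.
% P is the name of a binary predicate; it is invoked via call/3.
tc(P, X, Y) :-
    call(P, X, Z),
    tc_(P, Z, Y, [X]).

% tc_(+P, +Z, ?Y, +Expanded): node Z has been reached. Either stop
% at Z, or expand Z further unless it was expanded on this path.
tc_(_, Y, Y, _).
tc_(P, Z, Y, Expanded) :-
    \+ memberchk(Z, Expanded),
    call(P, Z, W),
    tc_(P, W, Y, [Z|Expanded]).

% supervisor/2 defined by a single rule, in the style of Version 2.
supervisor(X, Y) :-
    tc(direct_supervisor, X, Y).
```

On this sample relation, the goal ?- supervisor(a, Y). enumerates all nodes reachable from a and then fails finitely. In general, an answer may be reported more than once if several paths lead to the same node.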
Aggregation Queries

The meta-predicate ddbase_aggregate/3 in the following query groups over the employees:
• for every employee, given by FNAME, MINIT, LNAME, SSN,
• the corresponding list of all tuples [PNO,HOURS] is computed:

    ?- ddbase_aggregate( [F, M, L, S, list([P,H])],
          ( works_on(S, P, H),
            employee(F, M, L, S, _,_,_,_,_,_) ),
          Tuples ),
       Attributes = ['FNAME','MINIT','LNAME','SSN','[PNO,HOURS]'],
       xpce_display_table(Attributes, Tuples).

The result is displayed as a table in the XPCE extension of SWI-Prolog.

    Tuples = [
       ['Ahmad', 'V', 'Jabbar', '888888888',
          [[10, 35.5], [30, 5.0]]],
       ['Alicia', 'J', 'Zelaya', '777777777',
          [[10, 10.0], [30, 30.0]]],
       ... ]

Thus, DDBASE can produce nested (NF2) tables, which is not possible in SQL.

Transitive Closure (Version 3)

We can also compute the list of subordinates for each employee in Prolog:

    ?- findall( Boss-Emp,
          ( employee(_,_,_, Emp, _,_,_,_, Boss, _),
            Boss \= '$null$' ),
          Edges ),
       edges_to_ugraph(Edges, Graph),
       transitive_closure(Graph, Tc_Graph).

    Tc_Graph = [
       '111111111'-['222222222', '333333333', '444444444',
          '555555555', '666666666', '777777777', '888888888'],
       '222222222'-['444444444', '555555555', '666666666'],
       '333333333'-['777777777', '888888888'],
       '444444444'-[], '555555555'-[], '666666666'-[],
       '777777777'-[], '888888888'-[] ].

Firstly, we compute a list Edges of pairs Boss-Emp of social security numbers in DDBASE, such that Boss is the boss of Emp and Boss is not the NULL value.
Secondly, we transform Edges to an adjacency representation Graph using the predicate edges_to_ugraph/2 from SWI-Prolog:

    Graph = [
       '111111111'-['222222222', '333333333'],
       '222222222'-['444444444', '555555555', '666666666'],
       '333333333'-['777777777', '888888888'],
       '444444444'-[], '555555555'-[], '666666666'-[],
       '777777777'-[], '888888888'-[] ].

Thirdly, the predicate transitive_closure/2 from SWI-Prolog computes the transitive closure of Graph. It infers, e.g., that '111111111' is the transitive supervisor of all the other employees.

In Prolog, the edges of a graph G = (N, E), where
• nodes N = { a, ..., d } and
• edges E = { (a, b), (b, c), (c, a), (c, d) },
can be represented as a list

    Edges = [ a-b, b-c, c-a, c-d ].

[Figure: graph G with the edges a -> b, b -> c, c -> a, and c -> d.]

In SWI-Prolog, the call edges_to_ugraph(Edges, Graph) converts Edges to an adjacency list representation

    Graph = [ a-[b], b-[c], c-[a,d], d-[] ].

For every node V, a tuple V-Vs is given, such that Vs consists of all successor nodes of V.

Termination Issues in Prolog and Datalog

• Version 1 can be evaluated both in Prolog and in Datalog. The Datalog evaluation always terminates, whereas the Prolog evaluation is only suitable for acyclic graphs; it may not terminate for cyclic graphs.
• Versions 2 and 3 can only be evaluated in Prolog. The predicates transitive_closure/3 from the DDK and transitive_closure/2 from SWI-Prolog ensure termination for arbitrary graphs.

Graph Representations

Versions 1 and 2 work on facts. Version 3 works on a list representation of the graph edges.

Basic Syntax of Prolog

Constant Symbol: a, 10, 'Smith, John B.'
Variable Symbol: X, Lname (starts with a capital letter)
Term: f(t1, ..., tn), with function symbol f and terms ti
   a, X (constant and variable symbols are terms),
   f(g(a,b),X,10), a*(b+c) (complex terms),
   [LNAME, ..., DNO] (this is a list)
Predicate Symbol: employee, attributes, query_1, transitive_closure
Atom: p(t1, ..., tn), with predicate symbol p and terms ti.

Terms in Infix / Prefix Form

• The infix term 1955-01-09 representing a date has the prefix form -(-(1955,01),09).
• The infix term a*(b+c) representing an arithmetic expression has the prefix form *(a,+(b,c)).

[Figure: operator trees of the two terms. The root "-" has the children "-" (with the children 1955 and 01) and 09; the root "*" has the children a and "+" (with the children b and c).]

Term Representation for XML

An XML element

    <table name="employee">
      <attribute name="FNAME"/>
    </table>

can be represented by a complex term in field notation (FN):

    table:[name:employee]:[
       attribute:[name:'FNAME']:[] ].

This infix form uses the binary functor ":". The sub-term name:employee could equivalently be written in prefix form as :(name, employee).

Lists are denoted as "[X1,...,Xn]", and "[]" is the empty list; above, the list of sub-elements of the attribute element is empty.

Term Representation for Lists

In term notation, a non-empty list is represented as .(X, Xs), where
• X is the first element (head) and
• Xs represents the rest of the list (tail).

The list functor "." is binary, and the empty list is given by "[]".

    [b] = .(b, [])
    [a, b] = .(a, [b]) = .(a, .(b, []))

For communicating lists with the user, Prolog uses the compact list notation [X1,...,Xn], which is called syntactic sugar. It helps the user to better comprehend the list.

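The prefix forms above can be checked directly in Prolog. The following sketch uses only the standard built-in =../2 (univ) and unification; the predicate names date_parts/4, product_parts/3, and head_tail/3 are hypothetical helpers introduced for illustration. Note that the name of the list functor depends on the system: classical Prolog systems use '.', whereas SWI-Prolog 7 and later use '[|]', so the sketch relies on the portable [Head|Tail] notation instead.

```prolog
% date_parts(+Date, -Y, -M, -D): decompose a date term such as
% 1955-01-09, which is read as the term -(-(1955,1),9).
date_parts(Date, Y, M, D) :-
    Date =.. [(-), YM, D],     % outer minus: -(YM, D)
    YM   =.. [(-), Y, M].      % inner minus: -(Y, M)

% product_parts(+Term, -Left, -Right): a*(b+c) is the term *(a, +(b,c)).
product_parts(Term, Left, Right) :-
    Term =.. [(*), Left, Right].

% head_tail(+List, -Head, -Tail): [a,b] is the same term as [a|[b]].
head_tail([Head|Tail], Head, Tail).
```

For example, ?- date_parts(1955-01-09, Y, M, D). binds Y, M, and D to the integers 1955, 1, and 9; the leading zeros are not preserved, since 01 and 09 are read as numbers.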
When an infix operator ⊙ is used multiple times in a term a ⊙ b ⊙ c, then there are rules in Prolog (the associativity of the operator) that determine whether a and b or b and c are joined first in the prefix form.

• The infix term 1955-01-09 representing a date has the prefix form -(-(1955,01),09).
• The infix term T:As:Es representing an XML element has the prefix form :(T,:(As,Es)).

[Figure: operator trees of the two terms. The root "-" has the children "-" (with the children 1955 and 01) and 09; the root ":" has the children T and ":" (with the children As and Es).]

Thus, the term attribute:[name:'FNAME']:[], which is equivalent to

    :(attribute, :(.(:(name,'FNAME'), []), [])),

has a corresponding operator tree.

[Figure: operator tree of the term attribute:[name:'FNAME']:[].]

Facts, Rules, and Goals

Literal: atom A or negated atom not(A)
Fact: A, with atom A; e.g., employee('John', 'B', 'Smith', ...)
Rule: A :- B1, ..., Bm, with head A and body B1, ..., Bm, where A is an atom and the Bi are literals (example later)
Goal: :- B1, ..., Bm, with literals Bi

A set of facts for the same predicate symbol corresponds to a relation in databases. Rules generalize views. Goals are used for expressing queries.

Argument Positions vs. Field Notation (FN)

• As in other programming languages, the arguments ti of an atom p(t1, ..., tn) are passed by position in Prolog. E.g., in works_on(S, P, H), the first position t1 = S is the social security number of an employee who has worked on the project with the number t2 = P (second position) for t3 = H hours (third position).
• In the database context, we could use a meta-interpreter for accessing arguments in field notation, i.e., in a more abstract way, by their corresponding attribute name.
  Then, according to the database schema, works_on('PNO':P, 'ESSN':S) means that the employee with the social security number S has worked on the project with the number P, independently of the order of the arguments; it is, e.g., not necessary to refer to the hours.

Integrity Constraints in Prolog

• Primary Key Constraint for Employee:

    primary_key_violation(employee, X, Y) :-
       X = employee(_,_,_, SSN, _,_,_,_,_,_),
       Y = employee(_,_,_, SSN, _,_,_,_,_,_),
       call(X), call(Y), X \= Y.

• Foreign Key Constraint for Employee:

    foreign_key_violation(
          employee('DNO'), department('DNUMBER'), X) :-
       X = employee(_,_,_,_,_,_,_,_,_, DNO),
       call(X),
       not(department(_, DNO, _,_)).

In DDBASE, the primary and foreign key constraints of a relational database are transformed to such rules, which are then tested on database updates.

In a less elegant, naive implementation, we have to assign variable symbols for all the argument positions of the two violating employee facts:

• Primary Key Constraint for Employee:

    primary_key_violation(employee, X, Y) :-
       employee(A,B,C, SSN, D,E,F,G,H,I),
       employee(J,K,L, SSN, M,N,O,P,Q,R),
       X = employee(A,B,C, SSN, D,E,F,G,H,I),
       Y = employee(J,K,L, SSN, M,N,O,P,Q,R),
       X \= Y.

Moreover, we have to repeat all these variable symbols when we define the return arguments X and Y of the call primary_key_violation(employee, X, Y). The many variable symbols and their repetition make the rule more error-prone.

In the shorter, first primary key rule above, we use the templates

    X = employee(_,_,_, SSN, _,_,_,_,_,_),
    Y = employee(_,_,_, SSN, _,_,_,_,_,_),

to avoid the naming and the repeated writing of all the arguments. The conjunction

    call(X), call(Y), X \= Y

calls the templates in the Prolog database and tries to assign values to all argument positions, even the ones with anonymous variables "_", and tests whether X and Y represent two different database tuples. If the primary key constraint is violated, then the instantiated templates are returned. We proceed analogously for the foreign key constraint.

As a general-purpose programming language, Prolog offers rich functionality for defining integrity constraints.

Semantic Constraints in Field Notation (FN)

• No employee should earn more than his manager:

    trigger(salary, X, Y) :-
       employee('SSN':X, 'SALARY':S1, 'SUPERSSN':Y),
       employee('SSN':Y, 'SALARY':S2),
       S1 > S2.

• Which employee works on a foreign project?

    trigger(employee_works_on_foreign_project, E, P) :-
       works_on('ESSN':E, 'PNO':P),
       employee('SSN':E, 'DNO':D1),
       project('PNUMBER':P, 'DNUM':D2),
       D1 \= D2.

FN abstracts from argument positions: employee('SSN':E, 'DNO':D1) corresponds to employee(_,_,_, E, _,_,_,_,_, D1).

Bottom-Up Evaluation of Datalog

• The set of all given facts for a predicate corresponds to a relation.
• A rule without function symbols corresponds to a VIEW statement defining a relation for the head predicate.
• The relations for the body predicates are themselves derived using rules. Thus, it can happen that a rule transitively helps to derive tuples for one of its body predicates (recursion). E.g., the second rule for supervisor is directly recursive.
• The bottom-up evaluation iteratively enlarges the relations for the predicates by repeatedly evaluating all rules until a fixpoint is reached. Thus, e.g., all transitive supervisors can be derived, which is provably not possible using standard SQL without recursion.

Example (Recursion and Transitive Closure)

[Figure: supervisor hierarchy. 111111111 is the direct supervisor of 222222222 and 333333333; 222222222 is the direct supervisor of 444444444, 555555555, and 666666666; 333333333 is the direct supervisor of 777777777 and 888888888.]

The following recursive rule set derives the transitive supervisor relation on the social security numbers:

    supervisor(SSN_1, SSN_2) :-
       direct_supervisor(SSN_1, SSN_2).
    supervisor(SSN_1, SSN_2) :-
       direct_supervisor(SSN_1, SSN_3),
       supervisor(SSN_3, SSN_2).

    direct_supervisor(SSN_1, SSN_2) :-
       employee(_,_,_, SSN_2, _,_,_,_, SSN_1, _).

The first iteration derives the facts for direct_supervisor from the facts for employee:

    direct_supervisor(111111111, 222222222).
    direct_supervisor(111111111, 333333333).
    direct_supervisor(222222222, 444444444).
    direct_supervisor(222222222, 555555555).
    direct_supervisor(222222222, 666666666).
    direct_supervisor(333333333, 777777777).
    direct_supervisor(333333333, 888888888).

The second iteration translates these facts to the corresponding 7 facts for supervisor:

    supervisor(111111111, 222222222).
    ...
    supervisor(333333333, 888888888).

The third iteration derives the 5 new facts that 111111111 is the transitive (indirect) supervisor of the employees 444444444 to 888888888:

    supervisor(111111111, 444444444).
    supervisor(111111111, 555555555).
    supervisor(111111111, 666666666).
    supervisor(111111111, 777777777).
    supervisor(111111111, 888888888).

Since the hierarchy has the limited depth 2 here, the relations corresponding to these facts could also be derived in SQL. For arbitrary hierarchies of unlimited depth, however, it is not possible to derive the transitive supervisors in SQL without recursion.

In principle, all rules can be used in all iterations. However, a rule can only fire and derive facts once facts for its body atoms have been derived in previous iterations. From then on, the rule can always be used to derive the same facts.
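The iteration described above can be sketched as a naive fixpoint computation in plain Prolog. This is an illustrative sketch only, not the evaluation strategy of DDBASE: the predicate names direct_edges/1, step/3, fixpoint/3, and supervisor_closure/1 are assumptions, the direct_supervisor facts are given as input rather than derived from employee, and relations are represented as sorted lists of pairs Boss-Emp.

```prolog
:- use_module(library(lists)).

% The seven direct_supervisor facts of the example, as Boss-Emp pairs.
direct_edges([ 111111111-222222222, 111111111-333333333,
               222222222-444444444, 222222222-555555555,
               222222222-666666666, 333333333-777777777,
               333333333-888888888 ]).

% step(+Direct, +Sup0, -Sup): apply both supervisor rules once to the
% supervisor facts derived so far (Sup0) and add the derived facts.
step(Direct, Sup0, Sup) :-
    findall(X-Y,                        % rule 1: direct supervisors
            member(X-Y, Direct), Base),
    findall(X-Y,                        % rule 2: one direct step first
            ( member(X-Z, Direct), member(Z-Y, Sup0) ), Rec),
    append(Base, Rec, New),
    append(Sup0, New, All),
    sort(All, Sup).                     % sort/2 also removes duplicates

% fixpoint(+Direct, +Sup0, -Sup): iterate until no new fact is derived.
fixpoint(Direct, Sup0, Sup) :-
    step(Direct, Sup0, Sup1),
    (   Sup1 == Sup0
    ->  Sup = Sup0
    ;   fixpoint(Direct, Sup1, Sup)
    ).

% supervisor_closure(-Sup): all supervisor facts, starting from [].
supervisor_closure(Sup) :-
    direct_edges(Direct),
    fixpoint(Direct, [], Sup).
```

Every call of step/3 derives the facts of all earlier rounds again; sort/2 merely discards the duplicates afterwards.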
One of the purposes of efficient bottom-up evaluation is to avoid these redundant derivations, especially in the presence of recursion.

The rule for query_2 fires in iteration 3 for the first time and derives 7 facts for direct supervisors:

    query_2('James'-'E'-'Borg', 'Franklin'-'T'-'Wong').
    ...
    query_2('Jennifer'-'S'-'Wallace', 'Ahmad'-'V'-'Jabbar').

Finally, in iteration 4, the 5 facts for transitive supervisors are derived. Iteration 5 does not derive any new facts. Thus, a fixpoint is reached, and the iteration terminates.

Comparison with SQL

• Non-recursive Datalog could be simulated in SQL by mapping the rules to VIEW statements, or to INSERT statements whose result is computed using a SELECT statement.
• Recursion brings higher expressivity to Datalog.
• There are Datalog extensions which allow for default negation and aggregate operations as well.
• The rule-based approach of Datalog supports modularization: instead of one single, complex VIEW or SELECT statement in SQL, a set of simpler and more compact Datalog rules can be used.

The deductive database system DDBASE also supports update operations such as INSERT and DELETE, and it can connect to relational databases.

8.1.2 The Deductive Database System DDBASE

The deductive database system DDBASE, which is part of the DDK, can process
• relational databases and
• XML documents
within the same query using ODBC and FNQuery, respectively.

[Figure: DDBASE accesses the relational database (RDB) via ODBC and XML via FNQuery.]

This extends database programming languages (DBPL) by XML capabilities.

ODBC

The following Prolog rule accesses a relational database, given by the connection handle mysql, using the ODBC library of SWI-Prolog.
    generate_html_table(Salary, table:Rows) :-
       concat('SELECT fname, minit, lname, salary \
          FROM employee WHERE salary >= ', Salary, Query),
       Types = [types([atom,atom,atom,integer])],
       findall( Row,
          ( odbc_query(mysql, Query, row(F,M,L,S), Types),
            Row = tr:[td:[F], td:[M], td:[L], td:[S]] ),
          Rows ).

The query string Query is obtained by concatenating a partial SELECT statement with the value for the salary. Types gives the types of the components of the result tuples.

The findall Statement

• The call odbc_query(mysql, Query, row(F,M,L,S), Types) returns the values F, M, L, S for the attributes fname, minit, lname, salary of the table employee.
• By backtracking, the findall statement produces a list Rows of Prolog terms Row of the form tr:[td:[F], td:[M], td:[L], td:[S]], which represent XML elements in FNQuery.
• For a given Salary, the call generate_html_table(Salary, table:Rows) produces a Prolog term table:Rows, which represents the following HTML table in FNQuery.

The generated HTML table

    <table>
    <tr><th>Fname</th><th>Minit</th><th>Lname</th><th>Salary</th></tr>
    <tr><td>John</td><td>B</td><td>Smith</td><td>30000</td></tr>
    <tr><td>Franklin</td><td>T</td><td>Wong</td><td>40000</td></tr>
    <tr><td>Jennifer</td><td>S</td><td>Wallace</td><td>43000</td></tr>
    <tr><td>Ramesh</td><td>K</td><td>Narayan</td><td>38000</td></tr>
    <tr><td>James</td><td>E</td><td>Borg</td><td>55000</td></tr>
    </table>

can be rendered in a web browser.

[Figure: the HTML table rendered in a web browser.]

By ODBC, we can make SQL tables available in DDBASE:

    employee(A,B,C,D,E,F,G,H,I,J) :-
       Goal = company:employee(A,B,C,D,E,F,G,H,I,J),
       ddbase_call(odbc(mysql), Goal).

    works_on(A,B,C) :-
       Goal = company:works_on(A,B,C),
       ddbase_call(odbc(mysql), Goal).

It is also possible to generate these rules in DDBASE, which avoids the error-prone, repeated use of so many variable symbols. The call

    ddbase_connect(odbc(mysql), M, Database:Table)

asserts a corresponding rule in a Prolog module M.

The following two aggregation statements refer to the predicate employee/10 provided by ODBC. The facts for works_on/3 are derived using FNQuery from an XML document works_on.xml.

Aggregation on RDB and XML

For every Ssn in the table employee, the following query groups all corresponding entries from the document works_on.xml:

    ?- ddbase_aggregate( [Ssn, list([Pno, Hours])],
          ( employee(_,_,_, Ssn, _,_,_,_,_,_),
            Row := doc('works_on.xml')/row::[@'ESSN'=Ssn],
            Pno := Row@'PNO', Hours := Row@'HOURS' ),
          Tuples ).

    Tuples = [
       ['111111111', [['20', '0.0']]],
       ['222222222', [['2', '10.0'], ['3', '10.0']]],
       ... ]

The resulting list Tuples represents an NF2 relation. A query optimizer could rearrange the Goal in ddbase_aggregate/3 by changing the order of the calls to the predicate employee/10 and the XML document works_on.xml.

In DDBASE, we can define arbitrary binary aggregation predicates. ddbase_aggregate/3 groups over all variable symbols that occur standalone in the result template [Ssn, list([Pno, Hours])]; in this case, this is Ssn.

• For every Ssn, the above call to ddbase_aggregate/3 computes the list Xs of all corresponding pairs [Pno, Hours].
• Then, the call list(Xs, Pairs), which is explained below under Aggregation Predicates, simply passes Xs to Pairs.
• Thus, ddbase_aggregate/3 produces a nested tuple [Ssn, Pairs] for every Ssn. Pairs is a list of lists; it represents a relation.

The resulting list Tuples is the output.

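Standard Prolog offers a comparable grouping mechanism in bagof/3, which groups over the free variables of its goal. The following minimal sketch is not the implementation of ddbase_aggregate/3; the predicate name nested_tuples/1 is hypothetical, and the four sample facts are copied from the WORKS_ON table for illustration.

```prolog
% Sample facts from the WORKS_ON table (ESSN as quoted atoms).
works_on('888888888', 10, 35.5).
works_on('888888888', 30, 5.0).
works_on('777777777', 10, 10.0).
works_on('777777777', 30, 30.0).

% nested_tuples(-Tuples): one nested tuple [Ssn, Pairs] per Ssn.
% bagof/3 leaves Ssn free and therefore yields one solution per
% distinct Ssn; findall/3 collects these solutions into a list.
nested_tuples(Tuples) :-
    findall([Ssn, Pairs],
            bagof([Pno, Hours], works_on(Ssn, Pno, Hours), Pairs),
            Tuples).
```

The grouping variable Ssn plays the role of the standalone variable in the result template, and the list Pairs corresponds to the result of list([Pno, Hours]).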
The following statement aggregates the working hours of the employees of the departments:

    ?- ddbase_aggregate( [Dno, sum(Hours)],
          ( employee(_,_,_, Ssn, _,_,_,_,_, Dno),
            Row := doc('works_on.xml')/row::[@'ESSN'=Ssn],
            H := Row@'HOURS', atom_number(H, Hours) ),
          Tuples ).

    Tuples = [[1, 0.0], [4, 115.5], [5, 140.0]]

The value H of the attribute 'HOURS' of Row is an atom that has to be converted to a number Hours. The template [Dno, sum(Hours)] leads to a grouping on the department numbers. For every Dno, first the list Xs of all corresponding Hours is computed, and then the sum is computed by the call sum(Xs, Sum); thus, we obtain a standard result tuple [Dno, Sum].

To explain the effect of the template [Dno, sum(Hours)], we abstract the second argument of the call above as follows:

    dno_hours(Dno, Hours) :-
       employee(_,_,_, Ssn, _,_,_,_,_, Dno),
       Row := doc('works_on.xml')/row::[@'ESSN'=Ssn],
       H := Row@'HOURS', atom_number(H, Hours).

The intermediate variable symbols Ssn, Row, and H do not become arguments of dno_hours/2, since they are not used in the template. Then, the following call has the same result as the call above:

    ?- ddbase_aggregate( [Dno, sum(Hours)],
          dno_hours(Dno, Hours),
          Tuples ).

E.g., for Dno=4, first the list Xs of all working hours of employees from department 4 is computed by dno_hours(4, Hours) in the following functional set notation, and then the sum Sum is computed:

    ?- Xs <= { Hours | dno_hours(4, Hours) },
       Sum <= sum(Xs).

    Xs = [15.0, 20.0, 10.0, 30.0, 35.5, 5.0],
    Sum = 115.5.

These functional notations, which are possible in the DDK, can even be nested to get rid of the intermediate variable symbol Xs:

    ?- Sum <= sum({ Hours | dno_hours(4, Hours) }).

    Sum = 115.5.

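The two steps, collecting the list Xs and then summing it, can also be written with standard built-ins. The sketch below is an illustration under stated assumptions rather than DDBASE code: it replaces the ODBC and XML access by hypothetical local facts employee_dno/2 and works_on/3 copied from the EMPLOYEE and WORKS_ON tables (the NULL hours of 111111111 are stored as 0.0, as in the XML export), the predicate name dept_hours/2 is made up, and sum_list/2 is taken from SWI-Prolog's library(lists).

```prolog
:- use_module(library(lists)).

% employee_dno(Ssn, Dno): projection of EMPLOYEE (hypothetical helper).
employee_dno(444444444, 5).  employee_dno(222222222, 5).
employee_dno(777777777, 4).  employee_dno(333333333, 4).
employee_dno(555555555, 5).  employee_dno(666666666, 5).
employee_dno(888888888, 4).  employee_dno(111111111, 1).

% works_on(Essn, Pno, Hours): the WORKS_ON table as facts.
works_on(111111111, 20, 0.0).   works_on(222222222, 2, 10.0).
works_on(222222222, 3, 10.0).   works_on(333333333, 20, 15.0).
works_on(333333333, 30, 20.0).  works_on(444444444, 1, 32.5).
works_on(444444444, 2, 7.5).    works_on(555555555, 3, 40.0).
works_on(666666666, 1, 20.0).   works_on(666666666, 2, 20.0).
works_on(777777777, 10, 10.0).  works_on(777777777, 30, 30.0).
works_on(888888888, 10, 35.5).  works_on(888888888, 30, 5.0).

% dno_hours(?Dno, ?Hours): one solution per WORKS_ON entry.
dno_hours(Dno, Hours) :-
    employee_dno(Ssn, Dno),
    works_on(Ssn, _, Hours).

% dept_hours(?Dno, -Sum): group on Dno, collect Hours, then sum.
dept_hours(Dno, Sum) :-
    setof(D, S^employee_dno(S, D), Dnos),   % distinct departments
    member(Dno, Dnos),
    findall(H, dno_hours(Dno, H), Hs),      % the list Xs of the text
    sum_list(Hs, Sum).
```

A goal such as ?- dept_hours(4, Sum). then sums the hours of department 4, and ?- findall([D, S], dept_hours(D, S), Tuples). yields one [Dno, Sum] tuple per department.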
The functional notation Sum <= sum(Xs) is equivalent to the relational notation sum(Xs, Sum), which includes the return value as the last argument. Thus, sum should be defined as a binary predicate in Prolog.

Aggregation Predicates

In DDBASE, arbitrary user-defined aggregation predicates can be used.

The predicate list/2 simply passes the input to the output:

    list(Xs, Xs).

The predicate sum/2 calls sum/3 with an accumulator, which is initialized to 0. sum/3 traverses the input list recursively. The list head X is added to the accumulator Acc, and then sum/3 is called recursively on the list tail Xs and the new accumulator Acc_2; if the list is empty, then Acc becomes the output:

    sum(Xs, Sum) :-
       sum(Xs, 0, Sum).

    sum([X|Xs], Acc, Sum) :-
       Acc_2 is Acc + X,
       sum(Xs, Acc_2, Sum).
    sum([], Acc, Acc).

Lightweight Fact Database

A relational database can also be imported into a lightweight fact representation in Prolog. The following sequence of statements loads the data dictionary from the MySQL database company into a module c. Subsequently, the corresponding relations are imported from the MySQL database, and a summary is shown.

    ?- ddbase_load(odbc(mysql), company, c),
       ddbase_load_tables(c),
       ddbase_show_tables(c).

We can describe the schema of a database table based on the data dictionary of MySQL:

    ?- ddbase_describe_table(company:works_on).
  <table name="works_on">
     <attribute name="ESSN" type="char(9)" is_nullable="NO"/>
     <attribute name="PNO" type="int(11)" is_nullable="NO"/>
     <attribute name="HOURS" type="decimal(3,1)" is_nullable="NO"/>
     <primary_key>
        <attribute name="ESSN"/>
        <attribute name="PNO"/>
     </primary_key>
     <foreign_key>
        <attribute name="ESSN"/>
        <references table="employee">
           <attribute name="SSN"/>
        </references>
     </foreign_key>
     <foreign_key>
        <attribute name="PNO"/>
        <references table="project">
           <attribute name="PNUMBER"/>
        </references>
     </foreign_key>
  </table>
  true.

We can display a complete database or single relations.

  ?- ddbase_facts_to_display(c).
  ?- ddbase_facts_to_display(c:works_on/3).

Of course, the Prolog representation of the relational database can be queried in the standard way in Prolog. Moreover, we can also execute update statements, which respect the integrity constraints of the relational database. After an insertion or deletion in the database, the primary and foreign key constraints from the data dictionary are checked.

  ?- ddbase_insert(c, works_on('666666666', 10, 3)),
     ddbase_insert(c, works_on('666666666', 10, 4)),
     ddbase_delete(c, works_on('666666666', 10, 3)).

The second update is rejected, since it violates a primary key constraint.

All tuples from all relations of a database can be deleted in one step. The data dictionary remains unchanged.

  ?- ddbase_drop_database(c).

Complex Computations with FNQuery

  element_to_subtree(Xml, Course_1, Course_2) :-
     [T] := Course_1/'Title'/content::'*',
     ( Ps := Course_1@'Prerequisites' ->
       let( Trees := Xml/'Course'::[
               @'CourseNr' = N,
               name_contains_name(Ps, N) ]
            /call::element_to_subtree(Xml) )
     ; Trees = [] ),
     Course_2 = 'Course':['Title':T]:Trees.
  ?- dread(xml, 'Uni.xml', [Xml]),
     let( Trees := Xml/descendant::'Course'
        /call::element_to_subtree(Xml) ),
     dwrite(xml, 'CourseHierarchy':Trees).

In FNQuery, the attribute value of an element C is accessed by C@A, whereas in XPath, it is accessed by C/@A.

The call element_to_subtree(Xml, Course_1, Course_2) takes an XML document and a course element Course_1 and produces another course element Course_2:
• Firstly, T becomes the content of the Title element of the Course element Course_1.
• If Course_1 has prerequisites, then we determine a list Trees of XML terms using let. For every course in the document, we check whether the course number N is contained in the list Ps of prerequisites of Course_1. In that case, we call the predicate element_to_subtree/3 recursively on that course to produce an element of Trees. The global XML document is also a parameter of the call.
• If Course_1 has no prerequisites, then Trees becomes the empty list, Trees = [].

The main call reads the XML document Uni.xml into a Prolog variable Xml using dread/3. Subsequently, let/1 calls element_to_subtree/3 on every descendant Course element. The resulting list Trees represents a list of XML elements, which are then packed into a CourseHierarchy element. The corresponding Prolog term 'CourseHierarchy':Trees represents the XML output of the whole computation, and it can be written to standard output (the screen) using dwrite/2.

8.1.3 Prolog as a Programming Language

In the following, we will present Prolog implementations of well-known algorithms for searching in graphs and binary search trees.
The benefits of Prolog are
• the elegant handling of data structures (lists, trees, XML),
• (implicit) backtracking, and
• the compact representation of case distinctions in different rules.

The algorithms are typically recursive. Recursion can be formulated nicely due to the compact list access. Meta-predicates also support a compact and elegant encoding.

Graph Search

Labyrinth: [Figure: labyrinth]

Computation of Simple Paths by Backtracking

The predicate graph_search/2 computes a simple path from a given node to a sink of a graph:

  % graph_search(+Node, ?Path) <-
  graph_search(X, Path) :-
     graph_search(X, [X], Path).

Another predicate graph_search/3 with the same predicate symbol but a different arity is called. The graph is given by facts for the predicates graph_arc/2 and graph_sink/1.

Notation for arguments in the comment line: +: bound, -: free, ?: either bound or free

[Figure: the nodes in Visited, followed by X, its successor Y, and the path Path = [Y1 = Y, ..., Yn = Z] from Y to a sink Z]

• A call graph_search(X, Visited, Nodes) with a bound argument X, which is not a sink, and a list Visited of already visited nodes
  – uses an edge from X to a not yet visited successor node Y, and then
  – calculates a path Path from Y to a sink Z, which visits neither Y again nor the nodes in Visited.
  If no path from Y to a sink can be found, then another successor node of X must be used (backtracking).
• The result Nodes = [X|Path] is a simple path from X to a sink of the graph.

The predicate graph_search/3 is recursive because of its second rule:

  % graph_search(+Node, +Visited, ?Path) <-
  graph_search(X, _, [X]) :-
     graph_sink(X).
  graph_search(X, Visited, [X|Path]) :-
     graph_edge(X, Y),
     not(member(Y, Visited)),
     write(user, '->'), write(user, Y),
     graph_search(Y, [Y|Visited], Path).

Termination is ensured by the fact that already visited nodes cannot be visited again.

• The initial call graph_search(X, [X], Path) calculates a simple path from X to a sink of the graph.
  – If X is a sink, then the first rule for graph_search/3 returns the one-element path [X]; i.e., the remaining path after X is empty.
  – Otherwise, the recursive, second rule chooses a successor node Y using graph_edge(X, Y), and then it continues searching from there.
• Further paths can be searched for by backtracking.
  – Alternative successor nodes Y can be used in the second rule.
  – In the implementation above, we can continue searching beyond a sink by using the second rule instead of the first one.

Implicit and Explicit Backtracking

In Prolog, backtracking is used automatically (implicitly). In a procedural language, backtracking has to be implemented explicitly. In a direct translation of the code above to a procedural environment, a call graph_edge(X, Y) can only produce a single successor node Y of X; if there is no path from Y to a sink, then the computation fails. Moreover, at most one solution could be computed. If we implement the graph search procedurally using explicit backtracking, then we get more code than in Prolog.

Representation of a Graph by Prolog Facts

Labyrinth: [Figure: labyrinth with the cells a, b, c, d, e, f, g, h, i]

  graph_arc(i, f).
  graph_arc(i, h).
  graph_arc(h, g).
  graph_arc(g, d).
  graph_arc(d, e).
  graph_arc(d, a).
  graph_arc(a, b).
  graph_arc(b, c).

  graph_sink(c).
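To give an impression of what explicit backtracking involves, the following sketch manages the open alternatives in an explicit stack of partial paths instead of relying on Prolog's choice points. It is only an illustration, assuming SWI-Prolog: search_explicit/2 and search_stack/2 are hypothetical names, graph_edge/2 is assumed to make every arc usable in both directions, and only the first path is returned.

```prolog
:- use_module(library(lists)).

% Arcs and sink of the labyrinth, as listed above.
graph_arc(i, f).  graph_arc(i, h).  graph_arc(h, g).  graph_arc(g, d).
graph_arc(d, e).  graph_arc(d, a).  graph_arc(a, b).  graph_arc(b, c).
graph_sink(c).

% Assumption: arcs can be traversed in both directions.
graph_edge(X, Y) :-
    ( graph_arc(X, Y) ; graph_arc(Y, X) ).

% search_explicit(+Start, -Path): depth-first search with an explicit
% stack; every stack entry is a partial path in reverse order.
search_explicit(Start, Path) :-
    search_stack([[Start]], Path).

% The top entry ends in a sink: return it and stop.
search_stack([[X|Visited]|_], Path) :-
    graph_sink(X), !,
    reverse([X|Visited], Path).
% Otherwise replace the top entry by all its extensions; an entry without
% extensions simply disappears, which is the explicit form of backtracking.
search_stack([[X|Visited]|Stack], Path) :-
    findall([Y, X|Visited],
            ( graph_edge(X, Y), \+ memberchk(Y, [X|Visited]) ),
            Extensions),
    append(Extensions, Stack, Stack_2),
    search_stack(Stack_2, Path).

% ?- search_explicit(i, Path).
% Path = [i, h, g, d, a, b, c]
```

Compared with the two rules of graph_search/3, the bookkeeping of the alternatives (the stack, the reversal of the path) has to be written by hand, and further solutions would need additional code.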
The following rule symmetrizes the predicate graph_arc/2:

  graph_edge(X, Y) :-
     ( graph_arc(X, Y)
     ; graph_arc(Y, X) ).

Thus, it is not necessary to explicitly list the inverse edges:

  graph_edge(i, f).
  graph_edge(f, i).
  graph_edge(i, h).
  graph_edge(h, i).
  ...

Computation

• The predicate graph_search/2 uses depth-first search, and it calculates simple paths (without duplicate nodes).
• With the call graph_search(+Node, -Path), we can calculate all simple paths from Node to a sink (graph_sink) by backtracking:

  ?- graph_search(i, Path).
  ->f->h->g->d->e->a->b->c
  Path = [i, h, g, d, a, b, c]

  ?- graph_search(e, Path).
  ->d->a->b->c
  Path = [e, d, a, b, c] ;
  ->g->h->i->f
  No

• If we add another edge graph_arc(e, b) to the graph (i.e., we tear down the wall between e and b), then there appears another simple path [e, b, c] from e to the sink c.
• All results can be calculated by backtracking and findall/3:

  graph_arc(e, b).

  ?- findall( Path,
        graph_search(e, Path),
        Paths ).
  Paths = [[e, d, a, b, c], [e, b, c]]
  Yes

The Meta-Predicate findall/3

Finding all solutions for a goal:

  findall( X, goal(X), Xs )

The DDK allows for the following equivalent set notation:

  Xs <= { X | goal(X) }

Further important meta-predicates are checklist/2 and maplist/3 for lists, as well as the predicates for loops (control structures) from the library loops.pl (e.g., foreach-do).

Binary Search Trees

  % search_in_tree(+Key, +Tree) <-
  search_in_tree(Key, Tree) :-
     parse_tree(Tree, Root, Lson, Rson),
     ( Key = Root
     ; Key < Root ->
       search_in_tree(Key, Lson)
     ; Key > Root ->
       search_in_tree(Key, Rson) ).

arguments: +: bound, -: free, ?: either bound or free
Search Tree

XML Representation:

  <node key="5">
     <node key="4"/>
     <node key="9">
        <node key="6"/>
        <node key="10"/>
     </node>
  </node>

Prolog Representation:

  node:[key:5]:[
     node:[key:4]:[],
     node:[key:9]:[
        node:[key:6]:[],
        node:[key:10]:[] ] ]

alternative Prolog representation:

  [5, [4], [9, [6], [10]]]

[Figure: search tree with the root 5 and the sons 4 and 9; the node 9 has the sons 6 and 10]

Encapsulation of the Tree Access

  % parse_tree(+Tree, ?Key, ?Lson, ?Rson) <-
  % parse_tree(?Tree, +Key, +Lson, +Rson) <-
  parse_tree(Tree, Key, Empty, Empty) :-
     Tree = node:[key:Key]:[],
     Empty = node:[]:[].
  parse_tree(Tree, Key, Lson, Rson) :-
     Tree = node:[key:Key]:[Lson, Rson].

  % binary_tree_empty(?Tree) <-
  binary_tree_empty(node:[]:[]).

The same code for parse_tree/4 can be called both for extracting the root key and the two subtrees of a binary tree (+,-,-,-) and for constructing a binary tree from a key and two subtrees (-,+,+,+).

Examples:

  ?- Tree = node:[key:5]:[
        node:[key:4]:[],
        node:[key:9]:[node:[key:6]:[], ...] ],
     parse_tree(Tree, Key, Lson, Rson).

  Key = 5,
  Lson = node:[key:4]:[],
  Rson = node:[key:9]:[node:[key:6]:[], ...]

  ?- Key = 9,
     Lson = node:[key:6]:[],
     Rson = node:[key:10]:[],
     parse_tree(Tree, Key, Lson, Rson).

  Tree = node:[key:9]:[node:[key:6]:[], node:[key:10]:[]]

Alternative Encapsulation of the Tree Access

  parse_tree([Root, Lson, Rson], Root, Lson, Rson).
  parse_tree([Root], Root, [], []).

  binary_tree_empty([]).

Example:

  ?- Tree = [5, [4], [9, [6], [10]]],
     parse_tree(Tree, Root, Lson, Rson).

  Root = 5,
  Lson = [4],
  Rson = [9, [6], [10]]
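The list-based encapsulation is small enough to try out the search in a self-contained program. The following sketch assumes SWI-Prolog and numeric keys; it uses a deterministic if-then-else variant of search_in_tree/2, and tree_keys/2 is a hypothetical helper that is not part of the lecture code.

```prolog
:- use_module(library(lists)).

% List-based tree access, as in the alternative encapsulation above.
parse_tree([Root, Lson, Rson], Root, Lson, Rson).
parse_tree([Root], Root, [], []).

% search_in_tree(+Key, +Tree): succeeds if Key occurs in Tree.
% For the empty tree [], parse_tree/4 fails, and so does the search.
search_in_tree(Key, Tree) :-
    parse_tree(Tree, Root, Lson, Rson),
    (   Key =:= Root -> true
    ;   Key < Root   -> search_in_tree(Key, Lson)
    ;   search_in_tree(Key, Rson)
    ).

% tree_keys(+Tree, -Keys): in-order list of all keys (hypothetical helper).
tree_keys([], []).
tree_keys(Tree, Keys) :-
    parse_tree(Tree, Root, Lson, Rson),
    tree_keys(Lson, Ks_1),
    tree_keys(Rson, Ks_2),
    append(Ks_1, [Root|Ks_2], Keys).

% ?- search_in_tree(6, [5, [4], [9, [6], [10]]]).   % succeeds
% ?- search_in_tree(7, [5, [4], [9, [6], [10]]]).   % fails
% ?- tree_keys([5, [4], [9, [6], [10]]], Keys).
% Keys = [4, 5, 6, 9, 10]
```

Since search_in_tree/2 only uses parse_tree/4, the same code would also work with the node:[key:Key]:[...] representation once the two parse_tree/4 clauses are exchanged.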
  % insert_into_tree(+Key, +Tree, ?New_Tree) <-
  insert_into_tree(Key, Tree, New_Tree) :-
     parse_tree(Tree, Root, Lson, Rson),
     ( Key = Root ->
       New_Tree = Tree
     ; Key < Root ->
       insert_into_tree(Key, Lson, L),
       parse_tree(New_Tree, Root, L, Rson)
     ; Key > Root ->
       insert_into_tree(Key, Rson, R),
       parse_tree(New_Tree, Root, Lson, R) ).
  insert_into_tree(Key, _, New_Tree) :-
     binary_tree_empty(Empty),
     parse_tree(New_Tree, Key, Empty, Empty).

If the tree is empty, then parse_tree(Tree, Root, Lson, Rson) fails. Then the second rule builds a new tree with two empty subtrees using parse_tree(New_Tree, Key, Empty, Empty).

Important Concepts

• Terms (for Data and Control Structures) and Unification
• Backtracking
• SLDNF-Resolution

Prolog allows for
• declarative programming,
• compact programs, and
• rapid prototyping, agile software development.

We are using the XPCE extension of SWI-Prolog.

Top-Down Evaluation of Prolog: SLDNF-Resolution

• Like in conventional programming, Prolog is evaluated top-down: a call to a predicate looks for an applicable rule with the predicate in its head and then successively calls the statements in the body.
• Unlike in conventional programming languages, there can be many such rules, which are then used successively, comparable to the different options of a case statement. The evaluation of a call using a rule can fail; then, the next applicable rule is used (backtracking). This is done until finally the complete computation is successful.
• Since the arguments of a rule head can be partially instantiated, the passing of parameters is done using unification, which suitably extends the standard way of parameter passing.
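A standard illustration of parameter passing by unification is list concatenation: the same two rules can be called with different arguments bound. The sketch below assumes SWI-Prolog; the predicate is named conc/3 here only to avoid a clash with the library predicate append/3.

```prolog
% conc(?Xs, ?Ys, ?Zs): Zs is the concatenation of Xs and Ys.
conc([], Ys, Ys).
conc([X|Xs], Ys, [X|Zs]) :-
    conc(Xs, Ys, Zs).

% Mode (+,+,-): concatenate two lists.
% ?- conc([a, b], [c], Zs).
% Zs = [a, b, c]

% Mode (-,+,+): unification of the rule heads computes the missing prefix.
% ?- conc(Xs, [c], [a, b, c]).
% Xs = [a, b]

% Mode (-,-,+): backtracking over both rules enumerates all splits.
% ?- findall(Xs-Ys, conc(Xs, Ys, [a, b]), Splits).
% Splits = [[]-[a, b], [a]-[b], [a, b]-[]]
```

In the third call, each answer is produced by choosing a different rule at some level of the recursion, which is exactly the successive use of several applicable rules described above.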
• A negated call succeeds, if the corresponding positive call fails (negation-as-finite-failure).
• Using backtracking, it is possible to compute the list of all answers to a given call (query). This corresponds to query answering in relational databases using SQL.

In practical Prolog systems, there exists a large collection of pre-defined built-in predicates and also meta-predicates (i.e., predicates, some of whose arguments can be predicates themselves). Moreover, there can be side-effects, mainly for I/O and access to the internal fact database (assert, retract).

Data Structures, Operations, and Control Structures

• The restriction to a few basic data types and a single complex data type, namely the terms, which is generic and subsumes all the other types, standardizes the data structures.
• There are no explicit type declarations.
• There exists a large collection of generic operations that are applicable to terms, and thus to all data types.
• Frequently, meta-predicates are used.
• Actually, control structures are meta-predicates as well. In addition to standard control structures, such as branching (if-then-else), loops (for, while), and recursion, user-defined control structures can be built as meta-predicates.

Software Engineering Aspects

Prolog supports abstraction and compact code, and thus stimulates refactoring:
• The generic type of terms with generic operations supports abstraction and code reuse.
• User-defined control structures allow for further abstraction.
• Unification, implicit backtracking, and abstaining from explicit type declarations result in very compact code and support rapid prototyping.
• Declarativity makes the code much more readable and thus extensible.
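How a user-defined control structure can be built as a meta-predicate may be sketched as follows. The example assumes SWI-Prolog; for_all/2, map_list/3, and double/2 are hypothetical names chosen for illustration (the built-ins forall/2 and maplist/3 offer the same functionality).

```prolog
% for_all(+Generator, +Test): a loop as a meta-predicate.
% It succeeds if Test holds for every solution of Generator; it is defined
% by negation as failure: there is no solution of Generator violating Test.
for_all(Generator, Test) :-
    \+ ( call(Generator), \+ call(Test) ).

% map_list(+Pred, +Xs, -Ys): applies the binary predicate Pred element-wise.
map_list(_, [], []).
map_list(Pred, [X|Xs], [Y|Ys]) :-
    call(Pred, X, Y),
    map_list(Pred, Xs, Ys).

double(X, Y) :-
    Y is 2 * X.

% ?- map_list(double, [1, 2, 3], Ys).
% Ys = [2, 4, 6]
% ?- for_all(member(X, [2, 4, 6]), X mod 2 =:= 0).   % succeeds
% ?- for_all(member(X, [2, 3]), X mod 2 =:= 0).      % fails
```

Both predicates take goals or predicate names as arguments and invoke them with call/N, which is all that is needed to define new control structures in the language itself.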
Switching from conventional programming languages to the logic programming paradigm is difficult and usually requires a lot of training and effort.

Disjunctive Logic Programming

Sink in a Network

Fact Base: a network is represented as node facts and arc facts.

[Figure: network with the nodes a, b, c, and d]

  node(a). node(b). node(c). node(d).

  arc(a,b) ; arc(a,c).
  arc(b,d).
  arc(c,d).

Either there exists an arc from node a to b or from a to c (disjunction).

Rule Base: A node X is a sink if there is no other node Y without a transitive connection (transitive closure, tc) from Y to X.

  sink(X) :-
     node(X), not(not_sink(X)).

  not_sink(X) :-
     node(X), node(Y), X \= Y,
     not(tc(Y, X)).

  tc(X, Y) :-
     arc(X, Z), tc(Z, Y).
  tc(X, Y) :-
     arc(X, Y).

Query:

  ?- sink(X)
  X = d

Course on Deductive Databases

Topics:
• foundations and applications of Prolog and Datalog, data modelling and programming;
• the deductive database system DDBase;
• efficient evaluation of Datalog programs;
• further language constructs in the DDK (DisLog Developers' Kit):
  – complex data structures,
  – default negation and disjunction;
• applications on the basis of Prolog and DDBase.

8.2 Semantic Web Databases

Knowledge Engineering in the Semantic Web (Web 2.0) is based on ontologies and logic.

Reasoning Tasks:
• supporting the search (query answering);
• in knowledge engineering / modelling: analysis of the structure of the ontologies for anomalies.

Knowledge engineering and reasoning in the Semantic Web can be supported by deductive databases and logic programming techniques.
In the Semantic Web, it is possible to reason about
• the ontology / taxonomy (i.e., the schema) and
• the instances.
This is called terminological or assertional (T-Box or A-Box) reasoning, respectively.

This makes search in the Semantic Web more effective.
• In the following printer ontology, we could search for a printer from HP, and the result could be a laser-jet printer from HP, since the system knows that hpLaserJetPrinter is a sub-class of hpPrinter.
• It can also be derived that no laser-jet printer from HP is a laser writer from Apple; in this case, this is very easy, since it is explicitly stored in the ontology.

Moreover, we will show in the following how to support knowledge engineering by detecting anomalies in OWL ontologies.

The Web Ontology Language (OWL)

In OWL, we can mix concepts from
• rdf (Resource Description Framework) for defining instances and
• rdfs (rdf Schema) for defining the schema of an application.
Moreover, tags with the namespace owl are allowed.

The Semantic Web Rule Language (SWRL) incorporates logic programming rules into OWL ontologies.

There exist well-known, powerful tools for asking queries on and for reasoning with OWL ontologies.

The Printer Ontology

[Figure: class hierarchy of the printer ontology with the classes product, hpProduct, printer, personalPrinter, ibmLaserPrinter, laserJetPrinter, appleLaserWriter, hpPrinter, hpLaserJetPrinter, and hpApplePrinter, and a {disjoint} annotation]
The Printer Ontology in OWL

  <rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns="file:/protege/Ontologies/p.owl#">
     <owl:Ontology rdf:about="">
        <owl:VersionInfo>
           Printer Example, Version 1.3, 02.02.2013
        </owl:VersionInfo>
     </owl:Ontology>
     <owl:Class rdf:ID="printer"/>
     <owl:Class rdf:ID="laserJetPrinter">
        <rdfs:subClassOf rdf:resource="#printer"/>
     </owl:Class>
     ...
  </rdf:RDF>

The following owl:Class element defines the class appleLaserWriter:

  <owl:Class rdf:ID="appleLaserWriter">
     <rdfs:comment>
        Apple laser writers are laser jet printers
     </rdfs:comment>
     <rdfs:subClassOf rdf:resource="#laserJetPrinter"/>
     <owl:disjointWith rdf:resource="#hpLaserJetPrinter"/>
  </owl:Class>

The rdfs:subClassOf sub-element states that appleLaserWriter is a sub-class of laserJetPrinter. The owl:disjointWith sub-element states that appleLaserWriter is disjoint from hpLaserJetPrinter.

Observe that a class is referenced using the attribute rdf:resource and a leading "#", whereas the defining owl:Class element uses the attribute rdf:ID and no "#".

The following owl:Class element defines a class of printers from a joint venture of HP and Apple:

  <owl:Class rdf:ID="hpApplePrinter">
     <rdfs:comment>
        Printers from a joint venture of HP and Apple
     </rdfs:comment>
     <rdfs:subClassOf rdf:resource="#hpLaserJetPrinter"/>
     <rdfs:subClassOf rdf:resource="#appleLaserWriter"/>
  </owl:Class>

The existence of such printers would contradict the disjointWith restriction between the classes hpLaserJetPrinter and appleLaserWriter. The emptiness of the class hpApplePrinter can be detected by reasoners in the ontology editor Protégé.
Every laserJetPrinter is a printer, and every hpPrinter is an hpProduct:

  <owl:Class rdf:ID="printer"/>
  <owl:Class rdf:ID="laserJetPrinter">
     <rdfs:subClassOf rdf:resource="#printer"/>
  </owl:Class>

  <owl:Class rdf:ID="hpProduct"/>
  <owl:Class rdf:ID="hpPrinter">
     <rdfs:subClassOf rdf:resource="#hpProduct"/>
  </owl:Class>

Redundant subClassOf Relation

Since hpLaserJetPrinter is a sub-class of hpPrinter and hpPrinter is a sub-class of hpProduct, it is redundant to explicitly state that hpLaserJetPrinter is a sub-class of hpProduct.

  <owl:Class rdf:ID="hpLaserJetPrinter">
     <rdfs:subClassOf rdf:resource="#laserJetPrinter"/>
     <rdfs:subClassOf rdf:resource="#hpPrinter"/>
     <rdfs:subClassOf rdf:resource="#hpProduct"/>
     <owl:disjointWith rdf:resource="#appleLaserWriter"/>
  </owl:Class>

This redundancy is not an error. We could simply consider it an anomaly that should be reported to the knowledge engineer. This anomaly is not reported by reasoners in the ontology editor Protégé.

Instances

Finally, we have some instances of some of the defined classes:

  <appleLaserWriter rdf:ID="1001"/>
  <appleLaserWriter rdf:ID="1002"/>
  <hpLaserJetPrinter rdf:ID="1003"/>
  <hpLaserJetPrinter rdf:ID="1004"/>

As mentioned before, there cannot exist instances of the class hpApplePrinter.

The Ontology Editor Protégé

[Screenshot: a Protégé session with the printer ontology]

The ontology editor Protégé has some plugged-in reasoners, such as
• FaCT++,
• HermiT, and
• Racer.

In the session that is shown in the screenshot above, the emptiness of the class hpApplePrinter was detected by the ontology reasoner FaCT++. It is inferred that the class hpApplePrinter is EquivalentTo the empty class Nothing.
By clicking the question mark, an explanation can be shown.

There are also databases for handling rdf data, so-called triple stores, such as Sesame or Jena. They use SQL-like query languages, most notably SPARQL.

Declarative Queries in FNQuery

Complex XML data structures in Prolog:

  'owl:Class':['rdf:ID':'appleLaserWriter']:[
     'rdfs:comment':['Apple laser ...'],
     'rdfs:subClassOf':[
        'rdf:resource':'#laserJetPrinter']:[],
     'owl:disjointWith':[
        'rdf:resource':'#hpLaserJetPrinter']:[] ]

An XML element is represented as a term structure T:As:C, called FN-triple.
• T is the tag of the element,
• As is the list of the attribute/value pairs A:V of the element, and
• C is a list of FN-triples for the sub-elements.

FNSelect

In an OWL knowledge base Owl, there exists an isa relation between two classes C1 and C2, if a subClassOf relation is stated explicitly, or if C1 was defined as the intersection of C2 and some other classes:

  % isa(+Owl, ?C1, ?C2) <-
  isa(Owl, C1, C2) :-
     C := Owl/'owl:Class'::[@'rdf:ID'=C1],
     ( R2 := C/'rdfs:subClassOf'@'rdf:resource'
     ; R2 := C/'owl:intersectionOf'/'owl:Class'@'rdf:about' ),
     owl_reference_to_id(R2, C2).

  % owl_reference_to_id(+Reference, ?Id) <-
  owl_reference_to_id(Reference, Id) :-
     ( concat('#', Id, Reference)
     ; Id = Reference ).

Disjointness of Classes

  % disjointWith(+Owl, ?C1, ?C2) <-
  disjointWith(Owl, C1, C2) :-
     R2 := Owl/'owl:Class'::[@'rdf:about'=R1]
        /'owl:disjointWith'@'rdf:resource',
     owl_reference_to_id(R1, C1),
     owl_reference_to_id(R2, C2).

In the following, we often suppress the ontology argument Owl.

Transitive Closure of isa

  % subClassOf(?C1, ?C2) <-
  subClassOf(C1, C2) :-
     isa(C1, C2).
  subClassOf(C1, C2) :-
     isa(C1, C),
     subClassOf(C, C2).
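The transitive closure can be tried out without an OWL file by writing the isa relation down as facts. The following is a minimal sketch, assuming SWI-Prolog; the isa/2 facts only cover the rdfs:subClassOf elements that are shown in the excerpts of the printer ontology, not the complete ontology.

```prolog
% isa/2 facts, one per rdfs:subClassOf element shown in the excerpts.
isa(laserJetPrinter,   printer).
isa(appleLaserWriter,  laserJetPrinter).
isa(hpPrinter,         hpProduct).
isa(hpLaserJetPrinter, laserJetPrinter).
isa(hpLaserJetPrinter, hpPrinter).
isa(hpLaserJetPrinter, hpProduct).
isa(hpApplePrinter,    hpLaserJetPrinter).
isa(hpApplePrinter,    appleLaserWriter).

% subClassOf(?C1, ?C2): transitive closure of isa/2.
subClassOf(C1, C2) :-
    isa(C1, C2).
subClassOf(C1, C2) :-
    isa(C1, C),
    subClassOf(C, C2).

% All super-classes of hpApplePrinter (setof/3 sorts and removes duplicates):
% ?- setof(C, subClassOf(hpApplePrinter, C), Cs).
% Cs = [appleLaserWriter, hpLaserJetPrinter, hpPrinter, hpProduct,
%       laserJetPrinter, printer]
```

The recursion terminates here only because these facts are acyclic; with a cyclic isa relation, the top-down evaluation of subClassOf/2 in plain Prolog may loop when all answers are requested.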
Anomalies in Ontologies

Cycle

  ?- isa(C1, C2), subClassOf(C2, C1).

  C1 = personalPrinter, C2 = printer

Partition Error

  ?- disjointWith(C1, C2),
     subClassOf(C, C1), subClassOf(C, C2).

  C = hpApplePrinter,
  C1 = hpLaserJetPrinter, C2 = appleLaserWriter

The class C is a sub-class of two disjoint classes C1 and C2.

Incompleteness

  ?- isa(C1, C), isa(C2, C), isa(C3, C),
     disjointWith(C1, C2),
     not(disjointWith(C2, C3)).

  C  = laserJetPrinter,
  C1 = hpLaserJetPrinter,
  C2 = appleLaserWriter,
  C3 = ibmLaserPrinter

The class C has three sub-classes C1, C2 and C3, of which only the two sub-classes C1 and C2 are declared as disjoint in the knowledge base. The fact that C2 and C3 are disjoint and that C1 and C3 are disjoint as well was possibly forgotten by the knowledge engineer during the creation of the knowledge base.

Redundant subClassOf/instanceOf Relations

  % redundant_isa(?Chain) <-
  redundant_isa(C1->C2->C3) :-
     isa(C1, C2),
     subClassOf(C2, C3),
     isa(C1, C3).

  ?- redundant_isa(Chain).

  Chain = hpLaserJetPrinter -> hpPrinter -> hpProduct

The sub-class relation between C1 and C3 can be derived by transitivity over the class C2. Here, isa(C1, C2), subClassOf(C2, C3), requires that this deduction is done over at least two levels.

Undefined Reference

During the development of an ontology in OWL, it is possible that we reference a class that we have not yet defined.

  % undefined_reference(+Owl, ?Ref) <-
  undefined_reference(Owl, Ref) :-
     rdf_reference(Owl, Ref),
     not(owl_class(Owl, Ref)).

  rdf_reference(Owl, Ref) :-
     ( R := Owl/descendant_or_self::'*'@'rdf:resource'
     ; R := Owl/descendant_or_self::'*'@'rdf:about' ),
     owl_reference_to_id(R, Ref).

  owl_class(Owl, Ref) :-
     Ref := Owl/'owl:Class'@'rdf:ID'.
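Two of the anomaly checks of this section can be reproduced on a small fact base. The sketch below assumes SWI-Prolog and uses only the sub-class and disjointness statements that appear in the OWL excerpts; partition_error/3 and redundant_chain/1 are hypothetical names, and the chain is returned as a list instead of a term built with ->.

```prolog
% Facts taken from the OWL excerpts of the printer ontology.
isa(laserJetPrinter,   printer).
isa(appleLaserWriter,  laserJetPrinter).
isa(hpPrinter,         hpProduct).
isa(hpLaserJetPrinter, laserJetPrinter).
isa(hpLaserJetPrinter, hpPrinter).
isa(hpLaserJetPrinter, hpProduct).
isa(hpApplePrinter,    hpLaserJetPrinter).
isa(hpApplePrinter,    appleLaserWriter).

disjointWith(appleLaserWriter,  hpLaserJetPrinter).
disjointWith(hpLaserJetPrinter, appleLaserWriter).

subClassOf(C1, C2) :- isa(C1, C2).
subClassOf(C1, C2) :- isa(C1, C), subClassOf(C, C2).

% partition_error(?C, ?C1, ?C2): C is a sub-class of two disjoint classes.
partition_error(C, C1, C2) :-
    disjointWith(C1, C2),
    subClassOf(C, C1),
    subClassOf(C, C2).

% redundant_chain(?Chain): isa(C1, C3) already follows from C1 -> C2 -> ... -> C3.
redundant_chain([C1, C2, C3]) :-
    isa(C1, C2),
    subClassOf(C2, C3),
    isa(C1, C3).

% ?- setof(C, C1^C2^partition_error(C, C1, C2), Cs).
% Cs = [hpApplePrinter]
% ?- findall(Chain, redundant_chain(Chain), Chains).
% Chains = [[hpLaserJetPrinter, hpPrinter, hpProduct]]
```

Both answers agree with the results given above for the full ontology; the cycle and incompleteness checks are omitted, since they refer to classes (personalPrinter, ibmLaserPrinter) whose definitions are not part of the excerpts.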
If we load such an ontology into Protégé, then the ontology reasoners may produce wrong results, even for unrelated parts of the ontology.

8.3 Object-Oriented Databases

Application Domains
• engineering (CAD/CAM, CIM)
• image and graphics databases
• scientific applications
• geo-databases
• multimedia systems
• integration of heterogeneous databases

Influences and concepts from other areas of computer science:
• programming languages: abstract data types and encapsulation, completeness (w.r.t. expressivity)
• software engineering: modularisation, code extensibility and reuse
• artificial intelligence: concepts for knowledge representation, classification
• databases: (semantic) data modelling

8.3.1 Complex Objects

Every object has a unique object identifier OID. This value is invisible to the user. It is only used internally by the system to identify an object and to allow for references between different objects.

An object $o$ is represented by a triple $\langle i, c, v \rangle$:
• $i$ is the unique object identifier,
• $c$ is a type constructor,
• $v$ is the value of $o$.

Type Constructors: atom, tuple, set, list, array

Domains for atomic values: integer, real, string, boolean, date, ...

Given an object $o = \langle i, c, v \rangle$.
• If $c = \text{atom}$, then $v$ is an atomic value.
• If $c = \text{tuple}$, then $v = \langle a_1 : i_1, a_2 : i_2, \ldots, a_n : i_n \rangle$ is a tuple with attribute names $a_j$ and OIDs $i_j$.
• If $c = \text{set}$, then $v = \{ i_1, i_2, \ldots, i_n \}$ is a set of OIDs $i_j$.
• If $c = \text{list}$ or $c = \text{array}$, then $v$ is an ordered list or an array of OIDs, respectively.

A complex object can be represented by a graph.
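The triple representation can be written down directly as Prolog facts. The following sketch is only an illustration, assuming SWI-Prolog: the predicate names object/3 and resolve/2 as well as the OIDs and values are hypothetical. Following the references from an OID corresponds to traversing the graph of the complex object.

```prolog
:- use_module(library(apply)).

% object(OID, Constructor, Value): hypothetical sample objects.
object(i1, tuple, [dname:i2, locations:i3]).
object(i2, atom,  'Research').
object(i3, set,   [i4, i5]).
object(i4, atom,  'Houston').
object(i5, atom,  'Bellaire').

% resolve(+OID, -Term): replaces all OIDs by the values they refer to.
% Assumption: the object graph is acyclic; otherwise this does not terminate.
resolve(OID, Value) :-
    object(OID, atom, Value).
resolve(OID, tuple(Pairs)) :-
    object(OID, tuple, Refs),
    maplist(resolve_pair, Refs, Pairs).
resolve(OID, set(Values)) :-
    object(OID, set, OIDs),
    maplist(resolve, OIDs, Values).

% resolve_pair(+Attribute:OID, -Attribute:Term)
resolve_pair(A:OID, A:Value) :-
    resolve(OID, Value).

% ?- resolve(i1, Term).
% Term = tuple([dname:'Research', locations:set(['Houston', 'Bellaire'])])
```

The constructors list and array could be handled by further clauses in the same way as set.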
Two complex objects $o_1 = \langle i_1, c_1, v_1 \rangle$ and $o_2 = \langle i_2, c_2, v_2 \rangle$ are called
• deeply equal, if $c_1 = c_2$ and $v_1 = v_2$.
• shallow equal, if their graphs are isomorphic and the atomic values in the corresponding leaves are the same.

OODDL: Object-Oriented Data Definition Language

Nowadays, complex objects are frequently stored and managed using XML databases.

Example (Complex Objects)

The complex objects $o_1$, $o_2$, and $o_3$ contain the atomic objects $o_4$, $o_5$, and $o_6$ as sub-objects:

  $o_1 = \langle i_1, \text{tuple}, \langle a_1 : i_4, a_2 : i_6 \rangle \rangle$,
  $o_2 = \langle i_2, \text{tuple}, \langle a_1 : i_4, a_2 : i_6 \rangle \rangle$,
  $o_3 = \langle i_3, \text{tuple}, \langle a_1 : i_5, a_2 : i_6 \rangle \rangle$,
  $o_4 = \langle i_4, \text{atom}, 10 \rangle$,
  $o_5 = \langle i_5, \text{atom}, 10 \rangle$,
  $o_6 = \langle i_6, \text{atom}, 20 \rangle$.

$o_1$ and $o_2$ are deeply equal; $o_1$ and $o_3$ are shallow equal.

[Figure: the graphs of the tuple objects o1 and o3 with the atomic sub-objects o4, o5, and o6; both resolve to the tuple <a1:10, a2:20>]
Identity after resolving the references

Nested Relations

Given a set $U$ of attributes with domains $\mathrm{dom}(A)$, $A \in U$. Formats $R$ and domains $\mathrm{dom}(R)$ are defined recursively:
• $R = (A_1, \ldots, A_n, R_1, \ldots, R_m)$ with $A_i \in U$, $1 \le i \le n$, and formats $R_j$, $1 \le j \le m$, is a format with
  $\mathrm{dom}(R) = \mathrm{dom}(A_1) \times \ldots \times \mathrm{dom}(A_n) \times 2^{\mathrm{dom}(R_1)} \times \ldots \times 2^{\mathrm{dom}(R_m)}$.
• If $m = 0$, then $R$ is a basic format.

A nested tuple over a format $R$ is an element of $\mathrm{dom}(R)$. A nested relation or NF2-Relation (Non-First-Normal-Form) over $R$ is a subset of $\mathrm{dom}(R)$.

Example (NF2-Relation)

Formats:

  Children = (Cname, Bdate, Sex)
  Graduations = (Type, Date)
  Employees = (Id, Name, Address, Children, Graduations)

NF2-Relation over the format Employees (the extracted table no longer shows which nested tuples belong to which employee; the values are listed per column group):

  Id, Name, Address:             (100, Joe, LA), (200, Theo, NY)
  Children (Cname, Bdate, Sex):  (Mary, 120261, F), (Peter, 041465, M), (John, 082270, M), (Mary, 051578, F), (Laura, 051578, F)
  Graduations (Type, Date):      (driv_lic, 121255), (phd_cs, 021565), (driv_lic, 082686)

8.3.2 Features of Object-Orientation

Encapsulation of Structure and Behaviour

In the relational data model there exist generic operations for searching, inserting, deleting, and updating tuples, which can be applied to arbitrary relation schemas.

In object-oriented databases there are visible and hidden attributes.
• The visible attributes can be accessed by a declarative query language.
• The hidden attributes are accessed by sending messages (message passing) between the objects.

Each object type "has" integrity conditions, which are realised in the access operations.

Type and Class Hierarchies, Inheritance

A type is given by its
• type name,
• attributes, and
• operations (methods).

As a generalization of attribute and method we use the term function.

A type hierarchy is an acyclic, binary relation on the set of types:

[Figure: type hierarchy with the types Person, Student, Grad_Student, and Faculty; the arrows point from super-type to sub-type]

specialization: ↓
generalization: ↑

A sub-type inherits the functions of the super-type (inheritance). Additionally, the sub-type has its own functions.

→ multiple inheritance, selective inheritance

A class is a set of objects, which usually are of the same type. Usually, the set of all stored objects of each type forms a class. Classes can form hierarchies, too.

The type system in OODBs can be extended at run time.

Frequently, the non-standard data type BLOB (binary large object) is used for
• raster pixel pictures and
• long text strings.
Such BLOB data are supported as abstract data types with suitable access operations.

Polymorphism (Operator Overloading)

The same operator name can have different implementations. The implementation which is suitable for a certain object is determined at run time, when the type of the object is known (late binding).

E.g., the function "area" for calculating the area is implemented differently for different geometrical objects.

    GEOMETRY_OBJECT: Shape, Area, Centerpoint
    RECTANGLE subtype-of GEOMETRY_OBJECT (Shape='rectangle'): Width, Height
    TRIANGLE  subtype-of GEOMETRY_OBJECT (Shape='triangle'):  Side1, Side2, Angle
    CIRCLE    subtype-of GEOMETRY_OBJECT (Shape='circle'):    Radius

Multiple and Selective Inheritance

• Multiple inheritance occurs in a type hierarchy, if a type T is a sub–type of several super–types T1, ..., Tn:

      T1, T2, ..., Tn → T

  Then T inherits the functions of T1, ..., Tn; this can lead to ambiguities.
• Selective inheritance occurs, if a type should only inherit some special functions of one of its super–types T'. In this case, the undesired functions are excluded (EXCEPT clause).

Versions and Configurations

Many database applications require the management of different versions of complex objects:
• software projects
• CAD applications.

A version graph shows the relations between the different versions of an object. A configuration of a complex object is a composition of compatible versions for the sub–objects.

8.3.3 Examples: COMPANY and UNIVERSITY Database

In the following we will see the
1. types,
2. classes,
3. methods, and
4. some queries
for two examples.
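Looking back at the polymorphism slide, late binding of the function "area" can be sketched in Python. This is only an illustration, not part of the lecture: the class names mirror the GEOMETRY_OBJECT subtypes, the triangle formula assumes that Angle is the angle (in radians) enclosed by Side1 and Side2, and the helper total_area is a hypothetical name.

```python
import math
from abc import ABC, abstractmethod


class GeometryObject(ABC):
    """Supertype: every geometry object offers the function area()."""

    @abstractmethod
    def area(self) -> float:
        ...


class Rectangle(GeometryObject):
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Triangle(GeometryObject):
    def __init__(self, side1: float, side2: float, angle: float) -> None:
        # Assumption: angle is given in radians and lies between side1 and side2.
        self.side1 = side1
        self.side2 = side2
        self.angle = angle

    def area(self) -> float:
        return 0.5 * self.side1 * self.side2 * math.sin(self.angle)


class Circle(GeometryObject):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


def total_area(shapes: list[GeometryObject]) -> float:
    # The implementation of area() is selected at run time
    # from the actual type of each object (late binding).
    return sum(shape.area() for shape in shapes)
```

The caller of total_area never inspects the Shape attribute; the same operator name is resolved per object, which is the behaviour the slide describes.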
The COMPANY Database as an OODB

[Figure: object graph of the department object o8 (tuple with the attributes DNAME, DNUMBER, MGR, LOCATIONS, EMPLOYEES, PROJECTS) and its sub–objects o1, ..., o11.]

Complex Objects

    o1  = ⟨i1, atom, Houston⟩,
    o2  = ⟨i2, atom, Bellaire⟩,
    o3  = ⟨i3, atom, Sugarland⟩,
    o4  = ⟨i4, atom, 5⟩,
    o5  = ⟨i5, atom, Research⟩,
    o6  = ⟨i6, atom, 22-May-78⟩,
    o7  = ⟨i7, set, { i1, i2, i3 }⟩,
    o8  = ⟨i8, tuple, ⟨DNAME : i5, DNUMBER : i4, MGR : i9, LOCATIONS : i7, EMPLOYEES : i10, PROJECTS : i11⟩⟩,
    o9  = ⟨i9, tuple, ⟨MANAGER : i12, MANAGERSTARTDATE : i6⟩⟩,
    o10 = ⟨i10, set, { i12, i13, i14 }⟩,
    o11 = ⟨i11, set, { i15, i16, i17 }⟩,
    ...

Data Types

    define type Date:
        tuple( year:  integer,
               month: integer,
               day:   integer );

    define type Employee:
        tuple( name:      string,
               ssn:       string,
               birthdate: Date,
               sex:       char,
               dept:      Department );

    define type Department:
        tuple( dname:     string,
               dnumber:   integer,
               mgr:       tuple( manager:   Employee,
                                 startdate: Date ),
               locations: set(string),
               employees: set(Employee),
               projects:  set(Project) );

Classes

    define class Employee:
        type tuple( name:      string,
                    ssn:       string,
                    birthdate: Date,
                    sex:       char,
                    dept:      Department );
        operations
            age(e: Employee): integer,
            create_new_emp: Employee,
            destroy_emp(e: Employee): boolean;
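The complex objects listed above are triples of an object identifier, a constructor, and a value that refers to sub–objects by their identifiers. A minimal Python sketch of this representation and of resolving the references follows. The names ComplexObject, make_db and resolve are hypothetical, the encoding of resolved tuples as sorted (attribute, state) pairs is a choice made for this sketch, and it assumes that the references are acyclic.

```python
from typing import Any, NamedTuple


class ComplexObject(NamedTuple):
    oid: str          # object identifier i
    constructor: str  # "atom", "tuple" or "set"
    value: Any        # atomic value, dict attribute -> OID, or set of OIDs


def make_db(*objects: ComplexObject) -> dict[str, ComplexObject]:
    """Index the objects by their identifier."""
    return {obj.oid: obj for obj in objects}


def resolve(oid: str, db: dict[str, ComplexObject]) -> Any:
    """Replace all references below oid by the states they point to.

    Assumption: the reference graph is acyclic; a cycle would recurse forever.
    """
    obj = db[oid]
    if obj.constructor == "atom":
        return obj.value
    if obj.constructor == "tuple":
        return tuple((attr, resolve(ref, db))
                     for attr, ref in sorted(obj.value.items()))
    if obj.constructor == "set":
        return frozenset(resolve(ref, db) for ref in obj.value)
    raise ValueError(f"unknown constructor: {obj.constructor}")


# The LOCATIONS part of the COMPANY example (o1, o2, o3, o7).
company = make_db(
    ComplexObject("i1", "atom", "Houston"),
    ComplexObject("i2", "atom", "Bellaire"),
    ComplexObject("i3", "atom", "Sugarland"),
    ComplexObject("i7", "set", {"i1", "i2", "i3"}),
)

# The earlier example with o1, ..., o6.
example = make_db(
    ComplexObject("i1", "tuple", {"a1": "i4", "a2": "i6"}),
    ComplexObject("i2", "tuple", {"a1": "i4", "a2": "i6"}),
    ComplexObject("i3", "tuple", {"a1": "i5", "a2": "i6"}),
    ComplexObject("i4", "atom", 10),
    ComplexObject("i5", "atom", 10),
    ComplexObject("i6", "atom", 20),
)
```

In the earlier example, o1 and o2 already agree in constructor and value, whereas o1 and o3 only agree after the references have been resolved: resolve("i1", example) and resolve("i3", example) both yield the pairs for a1 : 10 and a2 : 20.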
    define class Department:
        type tuple( dname:     string,
                    dnumber:   integer,
                    mgr:       tuple( manager:   Employee,
                                      startdate: Date ),
                    locations: set(string),
                    employees: set(Employee),
                    projects:  set(Project) );
        operations
            number_of_emps(d: Department): integer,
            create_new_dept: Department,
            destroy_dept(d: Department): boolean,
            add_emp(d: Department, e: Employee): boolean,
            remove_emp(d: Department, e: Employee): boolean;

    define class DepartmentSet:
        type set(Department);
        operations
            create_dept_set: DepartmentSet,
            destroy_dept_set(ds: DepartmentSet): boolean,
            add_dept(ds: DepartmentSet, d: Department): boolean,
            remove_dept(ds: DepartmentSet, d: Department): boolean;

    persistent name AllDepartments: DepartmentSet;
    /* AllDepartments is a persistent named object of type set(Department) */
    ...
    d := create_new_dept;
    /* creates new department object in the variable d */
    b := add_dept(AllDepartments, d);
    /* makes d persistent by adding it to a persistent named object */

The UNIVERSITY Database as an OODB

Data Types

    type Phone:
        tuple( area_code: integer,
               number:    integer );

    type Date:
        tuple( year:  integer,
               month: integer,
               day:   integer );

Classes

    class Person
        type tuple( ssn:       string,
                    name:      tuple( firstname:  string,
                                      middlename: string,
                                      lastname:   string ),
                    address:   tuple( number:  integer,
                                      street:  string,
                                      apt_no:  string,
                                      city:    string,
                                      state:   string,
                                      zipcode: string ),
                    birthdate: Date,
                    sex:       character );
        method age: integer
    end
    class Student inherit Person
        type tuple( class:         string,
                    majors_in:     Department,
                    minors_in:     Department,
                    registered_in: set(Section),
                    transcript:    set( tuple( grade:   character,
                                               ngrade:  real,
                                               section: Section ) ) );
        method grade_point_average: real,
               change_class: boolean,
               change_major(new_major: Department): boolean;
    end

    class Grad_Student inherit Student
        type tuple( degrees: set( tuple( college: string,
                                         degree:  string,
                                         year:    integer ) ),
                    advisor: Faculty );
    end

    class Faculty inherit Person
        type tuple( salary:     real,
                    rank:       string,
                    foffice:    string,
                    fphone:     Phone,
                    belongs_to: set(Department),
                    grants:     set(Grants),
                    advises:    set(Student) );
        method promote_faculty,
               give_raise(percent: real);
    end

    class Department
        type tuple( dname:       string,
                    office:      string,
                    dphone:      Phone,
                    members:     set(Faculty),
                    majors:      set(Student),
                    chairperson: Faculty,
                    courses:     set(Course) );
        method add_major(s: Student),
               remove_major(s: Student): boolean;
    end

    class Section
        type tuple( sec_num:  integer,
                    qtr:      Quarter,
                    year:     Year,
                    students: set( tuple( stud:  Student,
                                          grade: character ) ),
                    course:   Course,
                    teacher:  Instructor );
        method change_grade(s: Student, g: character);
    end

    class Course
        type tuple( cname:         string,
                    cnumber:       string,
                    cdescription:  string,
                    sections:      set(Section),
                    offering_dept: Department );
    end

Methods

    method body age: integer in class Person
    {
        int a;
        Date d;
        d = today();
        a = d->year - self->birthdate->year;
        /* subtract one year if this year's birthday has not yet been reached */
        if ( (d->month < self->birthdate->month) ||
             ( (d->month == self->birthdate->month) &&
               (d->day < self->birthdate->day) ) )
            --a;
        return a;
    }
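The birthday test in the method age can be checked in isolation. The following Python sketch mirrors the logic of the method body; it is an illustration rather than the lecture's code, and it takes the current date as a parameter (a deviation from the call to today()) so that the result is reproducible.

```python
from datetime import date


def age(birthdate: date, today: date) -> int:
    """Completed years between birthdate and today."""
    years = today.year - birthdate.year
    # Same test as in the method body: the birthday of the current year
    # has not been reached if (month, day) is still smaller.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years
```

Comparing the pairs (month, day) lexicographically is equivalent to the condition "month is smaller, or month is equal and day is smaller" used in the method body.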
    method body grade_point_average: real in class Student
    {
        float sum = 0.0;
        int count = 0;
        struct { char gr; float ngrade; o2_Section sec; } t;
        for (t in self->transcript)
        {
            /* increment sum by ngrade, count by 1 */
            sum += t->ngrade;
            ++count;
        }
        return sum/count;
    }

    method body change_major(new_major: Department): boolean in class Student
    {
        /* remove_major returns 1 if the student was removed from the old major */
        if (self->majors_in->remove_major(self))
        {
            new_major->add_major(self);
            self->majors_in = new_major;
            return 1;
        }
        else
            return 0;
    }

    method body remove_major(s: Student): boolean in class Department
    {
        if (s in self->majors)
        {
            /* -= applies set difference to remove object s from the set of majors */
            self->majors -= set(s);
            return 1;
        }
        else
            return 0;
    }

    method body add_major(s: Student) in class Department
    {
        /* += applies set union to add object s to the set of majors */
        self->majors += set(s);
    }

    /* a persistent root to hold all persistent Person objects */
    name All_Persons: set(Person);

    /* a persistent root to hold a single Person object */
    name John_Smith: Person;

    run body
    {
        /* create a new Person object p */
        o2 Person p = new Person;
        *p = tuple( ssn:  "222222222",
                    name: tuple( firstname:  "Franklin",
                                 middlename: "T",
                                 lastname:   "Wong" ),
                    address: tuple( number:  638,
                                    street:  "Voss Road",
                                    city:    "Houston",
                                    state:   "Texas",
                                    zipcode: "77079" ),
                    birthdate: tuple( year: 1945, month: 12, day: 8 ),
                    sex: 'M' );

        /* p becomes persistent by attaching it to a persistent root */
        All_Persons += set(p);

        /* now put values in the persistent named object John_Smith */
        John_Smith->ssn = "444444444";
        John_Smith->name = tuple( firstname:  "John",
                                  middlename: "B",
                                  lastname:   "Smith" );
        John_Smith->address = tuple( number:  731,
                                     street:  "Fondren Road",
                                     city:    "Houston",
                                     state:   "Texas",
                                     zipcode: "77036" );
        John_Smith->birthdate = tuple( year: 1955, month: 1, day: 9 );
        John_Smith->sex = 'M';
    }

Queries

    select tuple( fname: s.name.firstname,
                  lname: s.name.lastname )
    from s in Student
    where s.majors_in.dname = "Computer Science"

    select tuple( fname: s.name.firstname,
                  lname: s.name.lastname,
                  transcript: select tuple( cname:   sc.section.course.cname,
                                            sec_no:  sc.section.sec_num,
                                            quarter: sc.section.qtr,
                                            year:    sc.section.year,
                                            grade:   sc.grade )
                              from sc in sec )
    from s in Student, sec in s.transcript
    where s.majors_in.dname = "Computer Science"
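The first query selects the first and last names of all students who major in Computer Science by following the path s.majors_in.dname. A rough in-memory analogue in Python is sketched below. The dataclasses keep only the attributes that the query touches, the function name cs_majors is hypothetical, and the sample students in the usage note are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Name:
    firstname: str
    lastname: str


@dataclass
class Department:
    dname: str


@dataclass
class Student:
    name: Name
    majors_in: Department


def cs_majors(students: list[Student]) -> list[dict[str, str]]:
    """select tuple(fname, lname) from s in Student where s.majors_in.dname = ..."""
    return [
        {"fname": s.name.firstname, "lname": s.name.lastname}
        for s in students
        if s.majors_in.dname == "Computer Science"
    ]
```

For instance, with the invented students Student(Name("Ann", "Sample"), Department("Computer Science")) and Student(Name("Bob", "Sample"), Department("Mathematics")), only Ann is selected. The path expression of the query becomes ordinary attribute navigation over object references.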