UCLouvain / ICTEAM, 2015-2016 Bernard Lambeau
[email protected]
LINGI2172 – Databases Relational Calculus & SQL
Textbook / Reading “SQL: A Comparative Survey”, H. Darwen
“Database Systems”, Elmasri & Navathe • Chapters 6 & 8 • Tuple and Domain Calculus, SQL
Basics of Tuple Calculus • Tuple variables – Ranges over a particular relation variable
• Satisfy COND(t) • Specify: – Range relation variable R of t – Select particular combinations of tuples – Set of attributes to be retrieved (requested attributes)
General form of Tuple Calculus • General expression of tuple relational calculus is of the form: { t1.Aj, t2.Ak, …, tn.Am | COND(t1 … tn) }
• Truth value of an atom – Evaluates to either TRUE or FALSE for a specific combination of tuples
• Formula (Boolean condition) – Made up of one or more atoms connected via logical operators AND, OR, and NOT
Towards first-order logic • Quantifiers from logic (in COND) – Universal quantifier (∀) – Existential quantifier (∃)
• Define a tuple variable in a formula as free or bound
Another Syntax & Example • Comparing Tutorial D with Tuple Calculus/SQL – In Hugh Darwen & Chris Date Books RANGEVAR C RANGES OVER COURSE; RANGEVAR I RANGES OVER IS_ENROLLED_ON; { I.StudentId, C.Title } WHERE I.CourseId = C.CourseId [ AND EXISTS/FORALL … ]
Calculus & Algebra are Equivalent Relational Algebra • You build the solution • Less declarative • Rather easy to implement – Operators map to what computers can easily do
X = 98 + 2
Tuple Calculus • You describe the solution • More declarative in style • Harder to implement – Transform to algebra
Find X such that X - 2 = 98
Precise equivalence criterion • A language is relationally complete if it is at least as powerful as Codd’s calculus • Codd showed that relational algebra is relationally complete – Codd’s reduction algorithm • A measure of the expressiveness of a language • !! At that time, on a simpler model !! – No support for tuple/relation-valued attributes – Tutorial D is more expressive than, e.g. SQL
(Postgre)SQL with_query is: with_query_name [ ( column_name [, ...] ) ] AS (select) select is: SELECT [ ALL | DISTINCT ] * | expression [ [ AS ] output_name ] [, ...] [ FROM from_item [, ...] ] [ WHERE condition ] [ GROUP BY expression [, ...] ] [ HAVING condition [, ...] ] [{ UNION | INTERSECT | EXCEPT } [ ALL ] select] [ ORDER BY expression [ ASC | DESC ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start ]
(Postgre)SQL (ctnd.) from_item can be one of: table_name [ [ AS ] alias ] ( select ) [ AS ] alias with_query_name [ [ AS ] alias ] from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ] join_type is one of: [ INNER | NATURAL | CROSS | LEFT [OUTER] | RIGHT | FULL ] JOIN condition is one of: any boolean expression expression [NOT] IN (subquery) EXISTS (subquery)
Let learn SQL through RA Logic AND
EXISTS
Tutorial D (relational) counterpart JOIN
WHERE (restriction, aka selection or σ) EXTEND (extension)
SUMMARIZE
and some more {...} (projection aka π)
OR UNION (AND) NOT MINUS (difference) NOT MATCHING (semi-difference) RENAME
JOIN (= AND) StudentId is called Name and is enrolled on CourseId
IS_CALLED JOIN IS_ENROLLED_ON SELECT C.student_id, C.name, I.course_id FROM is_called AS C NATURAL JOIN is_enrolled_on AS I
JOIN (= AND) StudentId is called Name and is enrolled on CourseId
IS_CALLED JOIN IS_ENROLLED_ON SELECT C.student_id, C.name, I.course_id FROM is_called AS C, is_enrolled_on AS I WHERE C.student_id = I.student_id
RENAME Sid1 is called Name IS_CALLED RENAME { StudentId as Sid1 SELECT C.student_id AS sid1, C.name FROM is_called AS C
}
RENAME + JOIN Sid1 is called Name and so is Sid2 IS_CALLED RENAME { StudentId AS Sid1 } JOIN IS_CALLED RENAME { StudentId AS Sid2 } SELECT C1.student_id AS sid1, C1.name, C2.student_id AS sid2 FROM is_called AS C1 JOIN is_called AS C2 ON C2.name = C1.name
Alternative, more SQL-minded Sid1 is called Name and so is Sid2 IS_CALLED RENAME { StudentId AS Sid1 } JOIN IS_CALLED RENAME { StudentId AS Sid2 } SELECT C1.student_id AS sid1, C1.name, C2.student_id AS sid2 FROM is_called AS C1, is_called AS C2 WHERE C2.name = C1.name
Projection (= EXISTS) StudentId is enrolled on some course. IS_ENROLLED_ON{StudentId} SELECT DISTINCT I.student_id FROM is_enrolled_on AS I
SQL’s EXISTS StudentId is enrolled on some course. IS_ENROLLED_ON{StudentId} SELECT S.student_id FROM students AS S WHERE EXISTS ( SELECT * FROM is_enrolled_on AS I WHERE I.student_id = S.student_id )
Under the assumption of a students relvar/table
Alternative with IN StudentId is enrolled on some course. IS_ENROLLED_ON{StudentId} SELECT S.student_id FROM students AS S WHERE S.student_id IN ( SELECT I.student_id FROM is_enrolled_on AS I)
Under the assumption of a students relvar/table
WHERE (= AND) StudentId is called Name and name is ‘Boris’. IS_CALLED WHERE Name=NAME('Boris') SELECT C.student_id, C.name FROM is_called AS C WHERE C.name = 'Boris'
WHERE (= AND) Sid1 has the same name as Sid2 (and Sid1 ≠ Sid2) (( ( IS_CALLED RENAME { StudentId AS Sid1 } ) JOIN ( IS_CALLED RENAME { StudentId AS Sid2 } ) ) WHERE Sid1 <> Sid2 ) { Sid1, Sid2 }
SELECT C1.student_id as C2.student_id as FROM is_called AS C1 JOIN is_called AS C2 ON AND NOT(C1.student_id =
sid1, sid2 C1.name = C2.name C2.student_id)
EXTEND (= AND) StudentId is called Name and Name begins with letter Initial. EXTEND IS_CALLED : { Initial := SUBSTRING (THE_NAME(Name),0,1) } SELECT C.student_id, C.name, substring(C.name, 1, 1) AS initial FROM is_called AS C
SUMMARIZE BY (= AND) The best exam mark for CourseId is TopScore. SUMMARIZE EXAM_MARK BY { CourseId } : { TopScore := MAX(Mark) } SELECT M.course_id, max(M.mark) AS top_score FROM exam_mark AS M GROUP BY M.course_id
SUMMARIZE BY (= AND) The best exam mark (among those greater than 50) for CourseId is TopScore. SUMMARIZE EXAM_MARK BY { CourseId } : { TopScore := MAX(Mark) } WHERE TopScore > 50 SELECT M.course_id, max(M.mark) AS top_score FROM exam_mark AS M GROUP BY M.course_id HAVING max(M.mark) > 50
SUMMARIZE BY (= AND) The best exam mark (among those greater than 50) for CourseId is TopScore. SUMMARIZE (EXAM_MARK WHERE Mark > 50) BY { CourseId } : { TopScore := MAX(Mark) } SELECT M.course_id, max(M.mark) AS top_score FROM exam_mark AS M WHERE M.mark > 50 GROUP BY M.course_id
SUMMARIZE PER (= AND) Takers is how many people took the exam on course CourseId. SUMMARIZE EXAM_MARK PER (COURSE{CourseId}) : { Takers := COUNT() }
SELECT C.course_id, count(*) AS takers FROM course AS C JOIN exam_mark AS M ON M.course_id = C.course_id GROUP BY C.course_id WRONG RESULT !!!, missing C4
SUMMARIZE PER (= AND) Takers is how many people took the exam on course CourseId. SUMMARIZE EXAM_MARK PER (COURSE{CourseId}) : { Takers := COUNT() } SELECT C.course_id, count(*) AS takers FROM course AS C LEFT JOIN exam_mark AS M ON C.course_id = M.course_id GROUP BY C.course_id
WRONG RESULT !!! C4 accounts for 1 taker
SUMMARIZE PER (= AND) Takers is how many people took the exam on course CourseId. SUMMARIZE EXAM_MARK PER (COURSE{CourseId}) : { Takers := COUNT() }
SELECT C.course_id, (SELECT count(*) FROM exam_mark WHERE course_id=C.course_id) AS takers FROM course AS C
SUMMARIZE PER (= AND) Takers is how many people took the exam on course CourseId. SUMMARIZE EXAM_MARK PER (COURSE{CourseId}) : { Takers := COUNT() } SELECT M.course_id, count(*) AS takers FROM exam_mark AS M GROUP BY M.course_id UNION SELECT C.course_id, 0 as takers FROM course AS C WHERE NOT EXISTS ( SELECT * FROM exam_mark WHERE course_id = C.course_id )
SUMMARIZE PER + WHERE Takers is how many people took the exam on course CourseId and Takers > 2. SUMMARIZE EXAM_MARK PER (COURSE{CourseId}) : { Takers := COUNT() } WHERE Takers > 2 SELECT C.course_id, (SELECT count(*) FROM exam_mark WHERE course_id=C.course_id) AS takers FROM course AS C WHERE takers > 2
WILL FAIL
SUMMARIZE PER + WHERE Takers is how many people took the exam on course CourseId and Takers > 2. SUMMARIZE EXAM_MARK PER (COURSE{CourseId}) : { Takers := COUNT() } WHERE Takers > 2 SELECT C.course_id, (SELECT count(*) FROM exam_mark WHERE course_id=C.course_id) AS takers FROM course AS C HAVING takers > 2
WILL FAIL
SUMMARIZE PER + WHERE Takers is how many people took the exam on course CourseId and Takers > 2. SUMMARIZE EXAM_MARK PER (COURSE{CourseId}) : { Takers := COUNT() } WHERE Takers > 2 SELECT C.course_id, (SELECT count(*) FROM exam_mark WHERE course_id=C.course_id) AS takers FROM course AS C WHERE (SELECT count(*) FROM exam_mark WHERE course_id=C.course_id) > 2
SUMMARIZE PER + WHERE Takers is how many people took the exam on course CourseId and Takers > 2. SUMMARIZE EXAM_MARK PER (COURSE{CourseId}) : { Takers := COUNT() } WHERE Takers > 2 SELECT X.course_id, X.takers FROM ( SELECT C.course_id, (SELECT count(*) FROM exam_mark WHERE course_id=C.course_id) AS takers FROM course AS C ) AS X WHERE X.takers > 2
UNION (= OR) StudentId is called Devinder or is enrolled on course C1. (IS_CALLED WHERE Name=NAME('Devinder')){StudentId} UNION (IS_ENROLLED_ON WHERE CourseId=CID('C1')){StudentId}
SELECT C.student_id FROM is_called AS C WHERE C.name='Devinder' UNION
SELECT I.student_id FROM is_enrolled_on AS I WHERE I.course_id='C1'
BEWARE of SQL’s positional UNION
Alternative with OR StudentId is called Devinder or is enrolled on course C1. (IS_CALLED WHERE Name=NAME('Devinder')){StudentId} UNION (IS_ENROLLED_ON WHERE CourseId=CID('C1')){StudentId}
SELECT C.student_id FROM is_called AS C WHERE C.name='Devinder' OR EXISTS ( SELECT * FROM is_enrolled_on AS I WHERE I.student_id=C.student_id AND I.course_id='C1' )
MINUS (= NOT) StudentId is not enrolled in any course. IS_CALLED{StudentId} MINUS IS_ENROLLED_ON{StudentId}
SELECT C.student_id FROM is_called AS C EXCEPT SELECT I.student_id FROM is_enrolled_on AS I
Alternative with NOT EXISTS StudentId is not enrolled in any course. IS_CALLED{StudentId} MINUS IS_ENROLLED_ON{StudentId} SELECT C.student_id FROM is_called AS C WHERE NOT EXISTS ( SELECT * FROM is_enrolled_on AS I WHERE I.student_id=C.student_id )
NOT MATCHING (= AND NOT) StudentId is called Name and is not enrolled in any course. IS_CALLED NOT MATCHING IS_ENROLLED_ON SELECT C.student_id, C.name FROM is_called AS C WHERE NOT EXISTS ( SELECT * FROM is_enrolled_on AS I WHERE I.student_id=C.student_id )
NOT MATCHING (= AND NOT) StudentId is called Name and is not enrolled in any course. IS_CALLED NOT MATCHING IS_ENROLLED_ON SELECT C.student_id, C.name FROM is_called AS C WHERE C.student_id NOT IN ( SELECT I.student_id FROM is_enrolled_on AS I )
MATCHING (= AND EXISTS) StudentId is called Name and is enrolled on some course. IS_CALLED MATCHING IS_ENROLLED_ON SELECT C.student_id, C.name FROM is_called AS C WHERE EXISTS ( SELECT * FROM is_enrolled_on AS I WHERE I.student_id=C.student_id )
MATCHING (= AND EXISTS) StudentId is called Name and is enrolled on some course. IS_CALLED MATCHING IS_ENROLLED_ON SELECT C.student_id, C.name FROM is_called AS C WHERE C.student_id IN ( SELECT I.student_id FROM is_enrolled_on AS I )
RVA comparisons (= FORALL) StudentId is enrolled on every course. (IS_ENROLLED_ON GROUP {CourseId} as EnrolledOn) WHERE EnrolledOn = COURSE{CourseId}
SELECT I.student_id FROM is_enrolled_on AS I WHERE NOT EXISTS ( SELECT * FROM course AS C WHERE NOT EXISTS ( SELECT * FROM is_enrolled_on AS I2 WHERE I2.student_id = I.student_id AND I2.course_id = C.course_id ) )
RVA comparisons ctnd. StudentId is enrolled on exactly C1 and C2. (IS_ENROLLED_ON GROUP {CourseId} as EnrolledOn) WHERE EnrolledOn = Relation{ Tuple{ CourseId CID('C1') }, Tuple{ CourseId CID('C2') } }
SELECT I.student_id FROM is_enrolled_on AS I WHERE EXISTS ( … C1 … ) AND EXISTS ( … C2 … ) AND NOT EXISTS ( … no other one … )
Beware of corner cases ! StudentId is enrolled on every course. (IS_ENROLLED_ON GROUP {CourseId} as EnrolledOn) WHERE EnrolledOn = COURSE{CourseId}
Is wrong when there are no course at all. Why? Is the SQL version wrong too? Rewrite both of them for correctness.
Advanced exercice StudentId is enrolled on no course (IS_ENROLLED_ON GROUP {CourseId} as EnrolledOn) WHERE EnrolledOn = Relation{CourseID CID}{}
Wrong ! Students following no course are not in IS_ENROLLED_ON in the first place. Rewrite it with a GROUP_PER that you define yourself
(Postgre)SQL with_query is: with_query_name [ ( column_name [, ...] ) ] AS (select) select is: SELECT [ ALL | DISTINCT ] * | expression [ [ AS ] output_name ] [, ...] [ FROM from_item [, ...] ] [ WHERE condition ] [ GROUP BY expression [, ...] ] [ HAVING condition [, ...] ] [{ UNION | INTERSECT | EXCEPT } [ ALL ] select] [ ORDER BY expression [ ASC | DESC ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start ]
(Postgre)SQL (ctnd.) from_item can be one of: table_name [ [ AS ] alias ] ( select ) [ AS ] alias with_query_name [ [ AS ] alias ] from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ] join_type is one of: [ INNER | NATURAL | CROSS | LEFT [OUTER] | RIGHT | FULL ] JOIN condition is one of: any boolean expression expression [NOT] IN (subquery) EXISTS (subquery)