LINGI2172 - Databases S1b - Introduction UCLouvain - EPL/ICTEAM 2015-2016
Database Landscape
credits (2015) to blogs.the451group.com
Database Landscape in the late 90’s
credits (2015) to blogs.the451group.com
What is a Database? What is your experience with them?
What is not a Database?
The File abstraction
• FILE *fopen(const char *filename, …)
• size_t fread(…, FILE *fp)
• size_t fwrite(…, FILE *fp)
• int fseek(FILE *fp, …)
• int fclose(FILE *fp)
More importantly, the specification (e.g. PRE/POST) of each
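Example
#include <stdio.h>

struct studata {
    int stu_id;
    char stu_name[10];
};

int main(void) {
    struct studata record;
    /* Only this program knows how the file's bytes are laid out */
    FILE *filehandle = fopen("data.txt", "rb");
    if (filehandle == NULL)
        return 1;
    for (int counter = 1; counter <= 10; counter++) {
        /* Read one fixed-size record; stop early at end of file or on error */
        if (fread(&record, sizeof(struct studata), 1, filehandle) != 1)
            break;
    }
    fclose(filehandle);
    return 0;
}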
What’s wrong with Files?
• Tight coupling between programs and physical data organization
  – If one changes, the other must change too!
• The “model” of data is hardcoded
  – What if the program is lost?
• Low level from an information standpoint
  – The same data often needs to be seen from many different points of view
• Hard to manage concurrent updates and failures
  – What if the data is used simultaneously by many users?
  – What if the computer gets unplugged?
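To make the first bullet concrete, here is a minimal sketch assuming two hypothetical versions of the `studata` record from the example: version 2 merely reorders the fields, yet bytes written under version 1 no longer decode to the same student id. `memcpy` stands in for the `fwrite`/`fread` round trip; the `_v1`/`_v2` names and the helper functions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout used by version 1 of the program */
struct studata_v1 {
    int stu_id;
    char stu_name[10];
};

/* Hypothetical version 2: same information, fields reordered */
struct studata_v2 {
    char stu_name[10];
    int stu_id;
};

/* Store a record as raw bytes, as fwrite would put it in the file.
   buf must hold at least sizeof(struct studata_v1) bytes. */
size_t encode_v1(int id, const char *name, unsigned char *buf) {
    struct studata_v1 rec;
    assert(name != NULL && buf != NULL);
    memset(&rec, 0, sizeof rec);              /* zero padding bytes too */
    rec.stu_id = id;
    strncpy(rec.stu_name, name, sizeof rec.stu_name - 1);
    memcpy(buf, &rec, sizeof rec);
    return sizeof rec;
}

/* Read with the original layout: the id comes back intact */
int decode_id_v1(const unsigned char *buf, size_t len) {
    struct studata_v1 rec;
    memset(&rec, 0, sizeof rec);
    memcpy(&rec, buf, len < sizeof rec ? len : sizeof rec);
    return rec.stu_id;
}

/* Read the same bytes with the version 2 layout, as fread would */
int decode_id_v2(const unsigned char *buf, size_t len) {
    struct studata_v2 rec;
    memset(&rec, 0, sizeof rec);
    memcpy(&rec, buf, len < sizeof rec ? len : sizeof rec);
    return rec.stu_id;
}
```

Nothing in the stored bytes records which layout produced them: the file format lives only in the program's source code, which is exactly the coupling the first bullet warns about.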
Lack of Physical Data Independence: what if the file changes? What if the program changes?
[Figure: the Program coupled directly to the Data]
Image extracted from “Integration Definition for Information Modeling (IDEF1X)” Standard, December 1993
Solution: an Abstract Data Model in the middle (ANSI/SPARC Three-Schema Architecture, 1975)
Changes of data structures at the physical level should be kept transparent to the logical level
The Database (Management System) should (!) carry its own model
Self-describing nature of a database: the “catalog” in relational databases
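A minimal sketch of the self-describing idea, assuming a toy in-memory catalog: the types `struct column` and `union cell` and the functions `format_row` and `find_column` are hypothetical, not the catalog API of any real DBMS. The metadata travels with the data, so generic code can interpret (and look up) columns it was never specifically written for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column types that this toy catalog can describe */
enum col_type { COL_INT, COL_TEXT };

/* One catalog entry: the name and type of a column */
struct column {
    const char *name;
    enum col_type type;
};

/* A cell holds an int or a string, as its catalog entry says */
union cell {
    int i;
    const char *s;
};

/* Look up a column by name in the catalog; -1 if absent */
int find_column(const struct column *cat, size_t ncols, const char *name) {
    for (size_t c = 0; c < ncols; c++)
        if (strcmp(cat[c].name, name) == 0)
            return (int)c;
    return -1;
}

/* Render one row as "name=value, name=value", driven only by the catalog:
   this function knows nothing about students, courses or offerings. */
void format_row(const struct column *cat, size_t ncols,
                const union cell *row, char *out, size_t outlen) {
    size_t used = 0;
    assert(cat != NULL && row != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t c = 0; c < ncols && used < outlen; c++) {
        const char *sep = (c > 0) ? ", " : "";
        int n = (cat[c].type == COL_INT)
            ? snprintf(out + used, outlen - used, "%s%s=%d", sep, cat[c].name, row[c].i)
            : snprintf(out + used, outlen - used, "%s%s=%s", sep, cat[c].name, row[c].s);
        if (n < 0)
            break;                  /* encoding error: stop rendering */
        used += (size_t)n;          /* may exceed outlen if output was truncated */
    }
}
```

Contrast this with the file example, where the meaning of each byte is known only to the program that wrote it.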
The Database (Management System) should be powerful enough to support multiple views, e.g. the declarative query language of relational databases
The ACID specification (1981)
• Guarantees that concurrent database transactions are processed reliably
  – Atomicity: all (success) or nothing (failure)
  – Consistency: from one valid state to another
  – Isolation: partial effects are invisible before commit
  – Durability: stored permanently after commit
• Provides a simple specification for application programs to rely on
Gray, Jim. "The transaction concept: Virtues and limitations." VLDB. Vol. 81. 1981.
These observations led to the Relational Model of Data (RM, early 70’s)
• Edgar F. Codd (1923-2003)
• Father of
  – The Relational Model (RM)
  – On-line Analytical Processing (OLAP)
• ACM Turing Award (1981)
Codd, Edgar Frank. "Relational database: a practical foundation for productivity." ACM Turing Lecture, in: Communications of the ACM 25.2 (1982): 109-117.
Very brief overview of history (timeline, 1950 to 2010)
• Pre-relational era: CODASYL
• Relational era: SQL
• Post-relational era: NoSQL, NewSQL
A Database Definition
• A database is a collection of related data
  – Where ‘data’ means facts that can be recorded and that have an implicit meaning
• The Universe of Discourse
  – Also called the Mini-World
  – It is all about what information you need to capture
R. Elmasri and S. Navathe, Database Systems: Models, Languages, Design and Application Programming, 6th Edition, Pearson Education, 2011
The Relational Model at a Glance
COURSE
LINGI2172 | DATABASES
LINGI2255 | SOFT.ENG
LELEC2885 | IMAGE PROC.
…
DEPARTMENT
INGI | EPL/INGI
ELEC | EPL/ELEC
…
OFFERING
INGI | LINGI2172 | WEDN. 10h30 | BARB01
ELEC | LELEC2885 | MONDAY 14h | BARB13
EVALUATION
S1 | LINGI2172 | A
S2 | LELEC2885 | C
STUDENT
S1 | SMITH
S2 | JONES
…
These are “relations”; we represent them as tables.
A Relation is all and only about giving meaning to (stored) data
OFFERING
INGI | LINGI2172 | WEDN. 10h30 | BARB01
ELEC | LELEC2885 | MONDAY 14h | BARB13
The first tuple means: INGI’s LINGI2172 course is taught on Wednesday at 10h30 in room BARB01.
The Relational Model per se says nothing about
• How relations are stored
• How data is to be distributed over nodes/servers
• What data types are available
• How tuples (records) are ordered, indexed, etc.
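A minimal sketch of "a relation gives meaning to data", assuming a hypothetical C encoding of the OFFERING tuples from the slide: `offering_fact` spells out the statement each tuple asserts, and `find_offering` retrieves a tuple by its content rather than by its position, since the model says nothing about tuple order. Treating (department, course) as the lookup key is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One tuple of the OFFERING relation from the slide */
struct offering {
    const char *dept;
    const char *course;
    const char *slot;
    const char *room;
};

/* The relation's predicate: each tuple asserts one true statement.
   Returns the length snprintf would have written. */
int offering_fact(const struct offering *t, char *out, size_t outlen) {
    assert(t != NULL && out != NULL);
    return snprintf(out, outlen, "%s's %s course is taught on %s in room %s",
                    t->dept, t->course, t->slot, t->room);
}

/* Find a tuple by its content (department and course), not by position */
const struct offering *find_offering(const struct offering *rel, size_t n,
                                     const char *dept, const char *course) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(rel[i].dept, dept) == 0 && strcmp(rel[i].course, course) == 0)
            return &rel[i];
    return NULL;
}
```

The array here is just one possible storage choice; a hash table or a B-tree would give the same answers, which is precisely what the model leaves open.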
BUT, the RM forbids this (nested, hierarchical data):
[
  { deptcode: 'INGI', deptname: 'EPL/INGI',
    offers: [
      { course: { code: 'LINGI2172', name: 'DATABASES' }, at: 'WEDN. 10h30', room: 'BARB01' },
      ...
    ] },
  { deptcode: 'ELEC', ... }
]
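A minimal sketch of how the nested document above maps to the flat OFFERING relation, assuming hypothetical C structs (`struct dept_doc`, `struct offer`, `struct offering_row`) and a hypothetical `flatten` helper. Each offer becomes one tuple of atomic values, and the nesting is replaced by repeating the department code. For brevity the course name is left out here; in the relational version it lives in COURSE, just as the department name lives in DEPARTMENT.

```c
#include <assert.h>
#include <stddef.h>

/* Nested, document-style data (simplified: no course name) */
struct offer {
    const char *course;
    const char *at;
    const char *room;
};

struct dept_doc {
    const char *deptcode;
    const char *deptname;
    const struct offer *offers;
    size_t noffers;
};

/* Flat OFFERING tuple: every value is atomic, and the nesting is
   replaced by repeating the department code in each tuple */
struct offering_row {
    const char *dept;
    const char *course;
    const char *at;
    const char *room;
};

/* Flatten the documents into at most cap rows; returns the row count */
size_t flatten(const struct dept_doc *docs, size_t ndocs,
               struct offering_row *out, size_t cap) {
    size_t n = 0;
    assert(docs != NULL || ndocs == 0);
    assert(out != NULL || cap == 0);
    for (size_t d = 0; d < ndocs; d++)
        for (size_t k = 0; k < docs[d].noffers && n < cap; k++) {
            struct offering_row r = { docs[d].deptcode, docs[d].offers[k].course,
                                      docs[d].offers[k].at, docs[d].offers[k].room };
            out[n++] = r;
        }
    return n;
}
```

The nested form privileges one access path (department, then its offers); the flat form lets any query start from courses, rooms or time slots equally.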
Data Independence, again
Expose Information to software layer(s)
Keep everything else hidden!
So, what happened???
(Timeline, 1950 to 2010: pre-relational era, relational era with SQL, post-relational era with NoSQL and NewSQL)
Main causes for NoSQL/NewSQL
• “Big Data” at Internet giants
  – Too much data for relational implementations (i.e. SQL products) to handle the load
  – Because of a lack of … physical data independence!
• The “relational pack” has been overused
  – Many applications simply require (smart) storage
  – Not all applications are multi-user
  – Sometimes you can even depart from ACID
• The “relational pack” requires upfront thinking
  – In a software engineering world whose current motto is rather “do, then think”
Main causes (ctd.)
• SQL has many flaws
  – E.g. lack of user-defined data types
  – A very old language; we know more about good language design principles these days
• Data independence is about weak coupling
  – Software engineering has evolved too, e.g. testing, agile techniques, architectural principles
  – E.g. you can meet the same objective by other means when your data layer offers a lower-level quality of service
Main causes (ctd.)
• The Relational Model is about working with high-level abstractions, on purpose
  – For the sake of weak coupling and long-term maintenance
  – For meeting the needs of every user (even unknown ones)
  – At the expense of all of them (no one is favoured regarding data access)
  – Yet this is not compatible with the way most “new” businesses (e.g. small startups) want to work
Key Messages
• Database definition
  – Focus on information, not (only) storage!
• Data Independence
  – Weak coupling for easier maintenance
• High-level Specification
  – Behavioral guarantees for meeting your requirements (e.g. ACID)
• Declarative vs. procedural
  – Information is meant to be queried!