RDBMS
Relational Database Management System
Fall 2016
Muhammad Adeel Rajput
Scientist/Instector
Contents
• Introduction to Database management
systems
• RDBMS
• Entity Relationship Model
• Normalization
• Introduction to SQL
• File Organization & Indexing
• Database Administration
• Recovery
Data, Database and Database
Management System
Data
• Data is numerical, character or other symbols which can be recorded in a
form suitable for processing by a computer. (e.g. names and addresses of
students enrolling onto a university course).
Database
• A Database is a collection of related data (such as an enrolling students
data) arranged for speedy search and retrieval.
Database Management System
• A Database Management System is a collection of programs that allows
users to specify the structure of a database, to create, query and modify the
data in the database and to control access to it. (e.g. limit access to the
database so that only relevant staff can access details of enrolling
students).
PREVIOUS HISTORY OF DBMS
 Before the concept of DBMS, they used to store the data
(i.e. information) in the form of written copies and store
them....
 This made the retrieve process very difficult.
 It made wastage of paper, files, storage and precious time.
What is DBMS ?
• A set of programs to access the
interrelated data.
• DBMS contains information about a
particular enterprise.
• Computerized record keeping system.
• Provides convenient environment to user
to perform operations:
-Creation, Insertion, Deletion,
Updating & Retrieval of information.
Examples of DBMS
• Some of the common used DBMSs are:
-Oracle, IBM’s DB2, Microsoft’s SQL Server,
MS-Access and Informix.
• Some of the desktop based DBMSs are:
-Microsoft FoxPro, Borland dBase and
Microsoft Access.
Advantages of DBMS
• Controlling Data Redundancy: Data is recorded in only
one place in the database and it is not duplicated.
• Data Consistency: Data item appears only once, and the
updated value is immediately available to all users.
• Control Over Concurrency : In a computer file-based
system in updating, one may overwrite the values recorded
by the other.
• Backup and Recovery Procedures: automatically
create the backup of data and restore data if required.
• Data Independence: Separation of data structure of
database from application program that uses the data is
called data independence.
Disadvantages of DBMS
• Cost of Hardware and Software: Processor with high speed of data
processing and memory of large size is required.
• Cost of Data Conversion: Very difficult and costly method to convert
data of data file into database.
• Cost of Staff Training: A lot of amount for the training of staff to run
the DBMS.
• Appointing Technical Staff: Trained technical persons such as
database administrator, application programmers, data entry operators
etc. are required to handle the DBMS.
• Database Damage: All data is integrated into a single database. If
database is damaged due to electric failure or database is corrupted
on the storage media, then your valuable data may be lost forever.
Applications of DBMS....
 Banking: for transactions
 Airlines: reservation and schedules
 Tele communications: for retrieving data of user
 Credit card: for transactions
 Universities: registration, retrieving marks, applications,
grades
 Human resources: employee records, salaries, tax
deductions,
RDBMS...
• Most popular database system.
• Simple and sound theoretical basis.
• Developed by E F Codd in the early 1970's.
• The model is based on tables, rows and columns and the
manipulation of data stored within.
• Relational database is a collection of these tables.
• First commercial system: MULTICS in 1978.
• Has overtaken Hierarchical and Network models.
• Main feature: Single database can be spread across several tables.
• Examples include: Oracle, IBM's DB2, Sybase, MySQL & Microsoft
Access.
RDBMS Advantages
• Increases the sharing of data and faster development of new applications
• Support a simple data structure, namely tables or relations
• Limit redundancy or replication of data
• Better integrity as data inconsistencies are avoided by storing data in one
place
• Provide physical data independence so users do not have to be aware of
underlying objects
• Offer logical database independence - data can be viewed in different ways
by different users.
• Expandability is relatively easy to achieve by adding new views of the data
as they are required.
• Support one off queries using SQL or other appropriate language.
• Better backup and recovery procedures
• Provides multiple interfaces
• Solves many problems created by other data models
• The ability to handle efficiently simple data types
• Multiple users can access which is not possible in DBMS
RDBMS Disadvantages
• Software is expensive
• Complex software means expensive hardware
• Requires skilled knowledge to implement
• Certain applications are slower processing
• Increased vulnerability
• More difficult to recover if data is lost
• Seen as a poor representation of the real world
• Difficult to represent hierarchies
• Difficult to represent complex data type.
Schema:
- Logical structure of the database.
- Doesn’t show the data in database.
- Classification:
1. Physical
2. Conceptual
3. External
Cont…
1. Physical Schema:
-Describes the physical storage of database.
-Not in terms of blocks or devices, but describes organization of files, access
path etc.
2. Conceptual Schema:
-Describes structure of whole database.
-Describes entities their relationships
and constraints.
3. External Schema:
-Provides a user’s view of data.
-Shows relevant info particular to
user, hides rest of the info.
-one or more levels.
 Instances: Actual data contained in
database at a particular point of time.
Differences betwen DBMS and
RDBMS
DBMS
• Data is stored in a single large table
• Single record modification affects the whole database
RDBMS (Codd 1980)
• Database is 'broken down' into smaller pieces
• The changes will not affect the entire database
The Future of RDBMS
• It is very difficult to see where RDBMS’s could go because alternatives and
new versions such as XML have been introduced. From delving through
forums it is plain to see that many people do not only prefer standard
RDBMS’s but do not even like the alternatives.
• Academics refer to relation models “It's such a simple and elegant model
that it can never become unfashionable”. - Andre Naess.
• Luke you may want to put something here about the incorporation of other
software such as CAD. etc... Hope these slides are ok. Got my script
written as well but you may want me to read different stuff out. X x x
Entity-Relationship (E-R) Model
College Principal
College
Student C
Student A
Student B
College 3
College 2
College 1
Course C
Course B
Course A
Student Course
Admissio
n
Stud_Nam
e
Stud_Roll
No
Course_Id
Course_Na
me
Relationships
E-R diagram
Normalization
• It is a technique for designing relational database
tables to minimize duplication of information.
• Normalization is a practice to safeguard the database
against logical and structural anomalies.
• Normalization is also termed as canonical synthesis by
the experts.
• It is used to keep data consistent and check that no
loss of data as well as data integrity is there.
• Its complexity may lead to higher degree of join
operations which sometimes lead to the degraded
throughput times.
• The normal forms like 1NF, 2NF, 3NF, BCNF, 4NF,
5NF, DKNF & 6NF are in practice.
DATABASE LANGUAGE
SQL…
The most basic Oracle Database utility
A Basic command-line interface
The first thing you work with it in Oracle DBMS
What is SQL?
– When a user wants to get some
information from a database file, he can
issue a query.
– A query is a user–request to retrieve
data or information with a certain
condition.
– SQL is a query language that allows user
to specify the conditions. (instead of
algorithms)
Introduction to SQL…
Concept of SQL
– The user specifies a certain condition.
– The result of the query will then be
stored in form of a table.
– Statistical information of the data.
– The program will go through all the
records in the database file and select
those records that satisfy the
condition.(searching).
Structured Query Language ...
Relational Database Management Systems
use the language known as SQL
SQL is a simple programming language used
for accessing and managing data in relational
databases.
Developed by IBM in 1970 to support its
various relational products.
29
File Organization
• The physical arrangement of data in a file into records and pages on
the disk
• File organization determines the set of access methods for
– Storing and retrieving records from a file
• Therefore, ‘file organization’ synonymous with ‘access method’
• We study three types of file organization
– Unordered or Heap files
– Ordered or sequential files
– Hash files
• We examine each of them in terms of the operations we perform on
the database
– Insert a new record
– Search for a record (or update a record)
– Delete a record
30
Unordered Or Heap File
• Records are stored in the same order in which they are
created
• Insert operation
– Fast – because the incoming record is written at the end of the
last page of the file
• Search (or update) operation
– Slow – because linear search is performed on pages
• Delete Operation
– Slow – because the record to be deleted is first searched for
– Deleting the record creates a hole in the page
– Periodic file compacting work required to reclaim the wasted
space
31
Ordered or Sequential File
• Records are sorted on the values of one or more fields
– Ordering field – the field on which the records are sorted
– Ordering key – the key of the file when it is used for record sorting
• Search (or update) Operation
– Fast – because binary search is performed on sorted records
– Update the ordering field?
• Delete Operation
– Fast – because searching the record is fast
– Periodic file compacting work is, of course, required
• Insert Operation
– Poor – because if we insert the new record in the correct position we need to
shift all the subsequent records in the file
– Alternatively an ‘overflow file’ is created which contains all the new records as a
heap
– Periodically overflow file is merged with the main file
– If overflow file is created search and delete operations for records in the overflow
file have to be linear!
32
Hash File
• Is an array of buckets
– Given a record, r a hash function, h(r) computes the index of the
bucket in which record r belongs
– h uses one or more fields in the record called hash fields
– Hash key - the key of the file when it is used by the hash function
• Example hash function
– Assume that the staff last name is used as the hash field
– Assume also that the hash file size is 26 buckets - each bucket
corresponding to each of the letters from the alphabet
– Then a hash function can be defined which computes the bucket
address (index) based on the first letter in the last name.
33
Hash File (2)
• Insert Operation
– Fast – because the hash function computes the index
of the bucket to which the record belongs
• If that bucket is full you go to the next free one
• Search Operation
– Fast – because the hash function computes the index
of the bucket
• Performance may degrade if the record is not found in the
bucket suggested by hash function
• Delete Operation
– Fast – once again for the same reason of hashing
function being able to locate the record quick
34
Indexing
• Can we do anything else to improve query performance other than
selecting a good file organization?
• Yes, the answer lies in indexing
• Index - a data structure that allows the DBMS to locate particular
records in a file more quickly
– Very similar to the index at the end of a book to locate various topics
covered in the book
• Types of Index
– Primary index – one primary index per file
– Clustering index – one clustering index per file – data file is ordered on
a non-key field and the index file is built on that non-key field
– Secondary index – many secondary indexes per file
• Sparse index – has only some of the search key values in the file
• Dense index – has an index corresponding to every search key
value in the file
35
Primary Indexes
• The data file is sequentially ordered on the key field
• Index file stores all (dense) or some (sparse) values of
the key field and the page number of the data file in which
the corresponding record is stored
B002 1
B003 1
B004 2
B005 2
B007 3
Branch
BranchNo Street City Postcode
B002 56 Clover Dr London NW10 6EU
B003 163 Main St Glasgow G11 9QX
B004 32 Manse Rd Bristol BS99 1NZ
B005 22 Deer Rd London SW1 4EH
B007 16 Argyll St Aberdeen AB2 3SU
Branch B002 record
Branch B003 record
Branch B004 record
Branch B005 record
Branch B007 record
1
2
3
4
36
Indexed Sequential Access Method
• ISAM – Indexed sequential access method
is based on primary index
• Default access method or table type in
MySQL, MyISAM is an extension of ISAM
• Insert and delete operations disturb the
sorting
– You need an overflow file which periodically
needs to be merged with the main file
37
Secondary Indexes
• An index file that uses a non primary field as an
index e.g. City field in the branch table
• They improve the performance of queries that
use attributes other than the primary key
• You can use a separate index for every attribute
you wish to use in the WHERE clause of your
select query
• But there is the overhead of maintaining a large
number of these indexes
• Database administration is the function of managing
and maintaining database management
systems (DBMS) software. Mainstream DBMS software
such as Oracle, IBM DB2 and Microsoft SQL
Server need ongoing management. As such,
corporations that use DBMS software often hire
specialized IT (Information Technology) personnel
called Database Administrators or DBAs.
• Installation, configuration and upgrading of Database server
software and related products.
• Evaluate Database features and Database related products.
• Establish and maintain sound backup and recovery policies and
procedures.
• Take care of the Database design and implementation.
• Implement and maintain database security (create and maintain
users and roles, assign privileges).
• Database tuning and performance monitoring.
• Application tuning and performance monitoring.
• Setup and maintain documentation and standards.
• Plan growth and changes (capacity planning).
• Work as part of a team and provide 24x7 support when required.
• Do general technical troubleshooting and give cons.
• Database recovery.
There are three types of DBAs:
• Systems DBAs: (also referred to as Physical DBAs, Operations DBAs or Production Support
DBAs): focus on the physical aspects of database administration such as DBMS installation,
configuration, patching, upgrades, backups, restores, refreshes, performance optimization,
maintenance and disaster recovery.
• Development DBAs: focus on the logical and development aspects of database administration
such as data model design and maintenance, DDL (data definition language) generation, SQL
writing and tuning, coding stored procedures, collaborating with developers to help choose the most
appropriate DBMS feature/functionality and other pre-production activities.
• Application DBAs: usually found in organizations that have purchased 3rd party application
software such as ERP (enterprise resource planning) and CRM (customer relationship
management) systems. Examples of such application software includes Oracle Applications, Siebel
and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the fence
between the DBMS and the application software and are responsible for ensuring that the
application is fully optimized for the database and vice versa. They usually manage all the
application components that interact with the database and carry out activities such as application
installation and patching, application upgrades, database cloning, building and running data
cleanup routines, data load process management, etc.
While individuals usually specialize in one type of database administration, in smaller organizations, it is
not uncommon to find a single individual or group performing more than one type of database
administration.
We give a short overview of how recovery might be implemented:
• Requirements for recovery
• A practical approach to recovery – keep a recovery log – must be write-
ahead
• Example showing system components with values in DB and in-memory
cache
• Checkpoint procedure: to aid processing of the very large recovery log
• Transaction categories for recovery
• An algorithm for the recovery manager
• Media failure, e.g. disc-head crash.
Part of persistent store is lost – need to restore it.
Transactions in progress may be using this area – abort uncommitted transactions.
• System failure e.g. crash - main memory lost.
Persistent store is not lost but may have been changed by uncommitted transactions.
Also, committed transactions’ effects may not yet have reached persistent objects.
• Transaction abort
Need to undo any changes made by the aborted transaction.
Our object model assumed all invocations are recorded with the object.
It was not made clear how this was to be implemented – synchronously in persistent store?
We need to optimise for performance reasons - not write-out every operation synchronously.
We consider one method – a recovery log. i.e. update data objects in place in persistent store, as
and when appropriate, and make a (recovery) log of the updates.
1. Assume a periodic (daily?) dump of the database (e.g. Op. Sys. backup)
2. Assume that a record of every change to the database is written to a log
{transaction-ID, data-object-ID, operation (arguments), old value, new value }
3. If a failure occurs the log can be used by the Recovery manager to REDO or UNDO
selected operations. UNDO and REDO must be idempotent (repeatable), e.g. contain before
and after values, not just “add 3”. Further crashes might occur at any time.
Transaction abort:
UNDO the operations – roll back the transaction
System failure
AIM: REDO committed transactions, UNDO uncommitted transactions
Media failure
reload the database from the last dump
REDO the operations of all the transactions that committed since then
But the log is very large to search for this information
so, to assist rapid recovery, take a CHECKPOINT at “small” time intervals
e.g. after 5 mins or after n log items – see 15
Two distinct operations:
• write a change to an object in the database
• write the log record of the change
A failure could occur between them – in which order should they be done?
If an object is updated in the database, there is no record of the previous value,
so no means of UNDOing the operation on abort.
The log must be written first.
Also, a transaction is not allowed to commit
until the log records for all its operations have been written out to the log.
Note: we can’t, and needn’t, take time to update in the database on every commit
the (few) objects involved in a transaction.
Note: a log can be written efficiently, because:
• there are enough records from the many transactions in progress at any time,
• the writes are to one place – the log file.
From 13:
The log is very large to search for this information on transactions
especially for abort of a single transaction,
so take a CHECKPOINT at “small” time intervals
e.g. After 5 mins or after n log items.
Checkpoint procedure :
• Force-write any log records in main memory out to the log (OS must do this)
• Force-write a checkpoint record to the log, containing:
- list of all transactions active (started but not committed) at the time of the checkpoint
- address within the log of each transaction’s most recent log record
- note: the log records of a given transaction are chained
• Force-write database buffers (database updates still in main memory) out to the database.
• Write the address of the checkpoint record within the log into a restart file.
the data manager keeps
object updates and log records
in its cache in main memory
main memory
log records
T1: x, add(1), 2 ->3
T2: a, add(2) 7->9
object values
x = 3
a = 9
persistent memory
log file
... many previous records ...
T1: x, add(1), 2 ->3
T2: a, add(2) 7->9
persistent system state
object values
x = 2
a = 7
checkpoint record
active Txs T1, T2
T1 most recent log location
T2 most recent log location
restart file
has the locations
of checkpoint records
in the log file
Checkpoint record says T2 and T3 are active
T1: its log records were written out before commit.
Any remaining DB updates were written out at checkpoint time. No action required.
T2: any updates made after the checkpoint are in the log and can be re-applied (REDO)
T4: log records are written on commit – can be re-applied (REDO is idempotent)
T3 and T5: any changes that might have been made can be found in the log
and previous state recovered (undone using UNDO operation)
T3 requires log to be searched before the checkpoint
– checkpoint contains pointer to previous log record.
Time checkpoint time failure time
T1: no action
T5
T4
T3
T2
T1
T2: REDO from checkpoint
T3: UNDO all
T4: REDO
T5: UNDO
Keeps: UNDO list - initially contains all transactions listed in the checkpoint record
REDO list – initially empty
Searches forward through the log starting from the checkpoint record, to the end of the log
• If it finds a start-transaction record it adds that transaction to the UNDO list
• If it finds a commit record it moves that transaction from the UNDO list to the REDO list
Then, works backwards through the log
UNDOing transactions on the UNDO list (restores state)
Finally, works forward again through the log
REDOing transactions on the REDO list

Rdbms

  • 1.
    RDBMS Relational Database ManagementSystem Fall 2016 Muhammad Adeel Rajput Scientist/Instector
  • 2.
    Contents • Introduction toDatabase management systems • RDBMS • Entity Relationship Model • Normalization • Introduction to SQL • File Organization & Indexing • Database Administration • Recovery
  • 4.
    Data, Database andDatabase Management System Data • Data is numerical, character or other symbols which can be recorded in a form suitable for processing by a computer. (e.g. names and addresses of students enrolling onto a university course). Database • A Database is a collection of related data (such as an enrolling students data) arranged for speedy search and retrieval. Database Management System • A Database Management System is a collection of programs that allows users to specify the structure of a database, to create, query and modify the data in the database and to control access to it. (e.g. limit access to the database so that only relevant staff can access details of enrolling students).
  • 5.
    PREVIOUS HISTORY OFDBMS  Before the concept of DBMS, they used to store the data (i.e. information) in the form of written copies and store them....  This made the retrieve process very difficult.  It made wastage of paper, files, storage and precious time.
  • 6.
    What is DBMS? • A set of programs to access the interrelated data. • DBMS contains information about a particular enterprise. • Computerized record keeping system. • Provides convenient environment to user to perform operations: -Creation, Insertion, Deletion, Updating & Retrieval of information.
  • 7.
    Examples of DBMS •Some of the common used DBMSs are: -Oracle, IBM’s DB2, Microsoft’s SQL Server, MS-Access and Informix. • Some of the desktop based DBMSs are: -Microsoft FoxPro, Borland dBase and Microsoft Access.
  • 8.
    Advantages of DBMS •Controlling Data Redundancy: Data is recorded in only one place in the database and it is not duplicated. • Data Consistency: Data item appears only once, and the updated value is immediately available to all users. • Control Over Concurrency : In a computer file-based system in updating, one may overwrite the values recorded by the other. • Backup and Recovery Procedures: automatically create the backup of data and restore data if required. • Data Independence: Separation of data structure of database from application program that uses the data is called data independence.
  • 9.
    Disadvantages of DBMS •Cost of Hardware and Software: Processor with high speed of data processing and memory of large size is required. • Cost of Data Conversion: Very difficult and costly method to convert data of data file into database. • Cost of Staff Training: A lot of amount for the training of staff to run the DBMS. • Appointing Technical Staff: Trained technical persons such as database administrator, application programmers, data entry operators etc. are required to handle the DBMS. • Database Damage: All data is integrated into a single database. If database is damaged due to electric failure or database is corrupted on the storage media, then your valuable data may be lost forever.
  • 10.
    Applications of DBMS.... Banking: for transactions  Airlines: reservation and schedules  Tele communications: for retrieving data of user  Credit card: for transactions  Universities: registration, retrieving marks, applications, grades  Human resources: employee records, salaries, tax deductions,
  • 12.
    RDBMS... • Most populardatabase system. • Simple and sound theoretical basis. • Developed by E F Codd in the early 1970's. • The model is based on tables, rows and columns and the manipulation of data stored within. • Relational database is a collection of these tables. • First commercial system: MULTICS in 1978. • Has overtaken Hierarchical and Network models. • Main feature: Single database can be spread across several tables. • Examples include: Oracle, IBM's DB2, Sybase, MySQL & Microsoft Access.
  • 13.
    RDBMS Advantages • Increasesthe sharing of data and faster development of new applications • Support a simple data structure, namely tables or relations • Limit redundancy or replication of data • Better integrity as data inconsistencies are avoided by storing data in one place • Provide physical data independence so users do not have to be aware of underlying objects • Offer logical database independence - data can be viewed in different ways by different users. • Expandability is relatively easy to achieve by adding new views of the data as they are required. • Support one off queries using SQL or other appropriate language. • Better backup and recovery procedures • Provides multiple interfaces • Solves many problems created by other data models • The ability to handle efficiently simple data types • Multiple users can access which is not possible in DBMS
  • 14.
    RDBMS Disadvantages • Softwareis expensive • Complex software means expensive hardware • Requires skilled knowledge to implement • Certain applications are slower processing • Increased vulnerability • More difficult to recover if data is lost • Seen as a poor representation of the real world • Difficult to represent hierarchies • Difficult to represent complex data type.
  • 15.
    Schema: - Logical structureof the database. - Doesn’t show the data in database. - Classification: 1. Physical 2. Conceptual 3. External
  • 16.
    Cont… 1. Physical Schema: -Describesthe physical storage of database. -Not in terms of blocks or devices, but describes organization of files, access path etc. 2. Conceptual Schema: -Describes structure of whole database. -Describes entities their relationships and constraints. 3. External Schema: -Provides a user’s view of data. -Shows relevant info particular to user, hides rest of the info. -one or more levels.  Instances: Actual data contained in database at a particular point of time.
  • 17.
    Differences betwen DBMSand RDBMS DBMS • Data is stored in a single large table • Single record modification affects the whole database RDBMS (Codd 1980) • Database is 'broken down' into smaller pieces • The changes will not affect the entire database
  • 18.
    The Future ofRDBMS • It is very difficult to see where RDBMS’s could go because alternatives and new versions such as XML have been introduced. From delving through forums it is plain to see that many people do not only prefer standard RDBMS’s but do not even like the alternatives. • Academics refer to relation models “It's such a simple and elegant model that it can never become unfashionable”. - Andre Naess. • Luke you may want to put something here about the incorporation of other software such as CAD. etc... Hope these slides are ok. Got my script written as well but you may want me to read different stuff out. X x x
  • 20.
    Entity-Relationship (E-R) Model CollegePrincipal College Student C Student A Student B College 3 College 2 College 1 Course C Course B Course A Student Course Admissio n Stud_Nam e Stud_Roll No Course_Id Course_Na me Relationships E-R diagram
  • 22.
    Normalization • It isa technique for designing relational database tables to minimize duplication of information. • Normalization is a practice to safeguard the database against logical and structural anomalies. • Normalization is also termed as canonical synthesis by the experts. • It is used to keep data consistent and check that no loss of data as well as data integrity is there. • Its complexity may lead to higher degree of join operations which sometimes lead to the degraded throughput times. • The normal forms like 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF & 6NF are in practice.
  • 23.
  • 24.
    SQL… The most basicOracle Database utility A Basic command-line interface The first thing you work with it in Oracle DBMS
  • 25.
    What is SQL? –When a user wants to get some information from a database file, he can issue a query. – A query is a user–request to retrieve data or information with a certain condition. – SQL is a query language that allows user to specify the conditions. (instead of algorithms)
  • 26.
    Introduction to SQL… Conceptof SQL – The user specifies a certain condition. – The result of the query will then be stored in form of a table. – Statistical information of the data. – The program will go through all the records in the database file and select those records that satisfy the condition.(searching).
  • 27.
    Structured Query Language... Relational Database Management Systems use the language known as SQL SQL is a simple programming language used for accessing and managing data in relational databases. Developed by IBM in 1970 to support its various relational products.
  • 29.
    29 File Organization • Thephysical arrangement of data in a file into records and pages on the disk • File organization determines the set of access methods for – Storing and retrieving records from a file • Therefore, ‘file organization’ synonymous with ‘access method’ • We study three types of file organization – Unordered or Heap files – Ordered or sequential files – Hash files • We examine each of them in terms of the operations we perform on the database – Insert a new record – Search for a record (or update a record) – Delete a record
  • 30.
    30 Unordered Or HeapFile • Records are stored in the same order in which they are created • Insert operation – Fast – because the incoming record is written at the end of the last page of the file • Search (or update) operation – Slow – because linear search is performed on pages • Delete Operation – Slow – because the record to be deleted is first searched for – Deleting the record creates a hole in the page – Periodic file compacting work required to reclaim the wasted space
  • 31.
    31 Ordered or SequentialFile • Records are sorted on the values of one or more fields – Ordering field – the field on which the records are sorted – Ordering key – the key of the file when it is used for record sorting • Search (or update) Operation – Fast – because binary search is performed on sorted records – Update the ordering field? • Delete Operation – Fast – because searching the record is fast – Periodic file compacting work is, of course, required • Insert Operation – Poor – because if we insert the new record in the correct position we need to shift all the subsequent records in the file – Alternatively an ‘overflow file’ is created which contains all the new records as a heap – Periodically overflow file is merged with the main file – If overflow file is created search and delete operations for records in the overflow file have to be linear!
  • 32.
    32 Hash File • Isan array of buckets – Given a record, r a hash function, h(r) computes the index of the bucket in which record r belongs – h uses one or more fields in the record called hash fields – Hash key - the key of the file when it is used by the hash function • Example hash function – Assume that the staff last name is used as the hash field – Assume also that the hash file size is 26 buckets - each bucket corresponding to each of the letters from the alphabet – Then a hash function can be defined which computes the bucket address (index) based on the first letter in the last name.
  • 33.
    33 Hash File (2) •Insert Operation – Fast – because the hash function computes the index of the bucket to which the record belongs • If that bucket is full you go to the next free one • Search Operation – Fast – because the hash function computes the index of the bucket • Performance may degrade if the record is not found in the bucket suggested by hash function • Delete Operation – Fast – once again for the same reason of hashing function being able to locate the record quick
  • 34.
    34 Indexing • Can wedo anything else to improve query performance other than selecting a good file organization? • Yes, the answer lies in indexing • Index - a data structure that allows the DBMS to locate particular records in a file more quickly – Very similar to the index at the end of a book to locate various topics covered in the book • Types of Index – Primary index – one primary index per file – Clustering index – one clustering index per file – data file is ordered on a non-key field and the index file is built on that non-key field – Secondary index – many secondary indexes per file • Sparse index – has only some of the search key values in the file • Dense index – has an index corresponding to every search key value in the file
  • 35.
    35 Primary Indexes • Thedata file is sequentially ordered on the key field • Index file stores all (dense) or some (sparse) values of the key field and the page number of the data file in which the corresponding record is stored B002 1 B003 1 B004 2 B005 2 B007 3 Branch BranchNo Street City Postcode B002 56 Clover Dr London NW10 6EU B003 163 Main St Glasgow G11 9QX B004 32 Manse Rd Bristol BS99 1NZ B005 22 Deer Rd London SW1 4EH B007 16 Argyll St Aberdeen AB2 3SU Branch B002 record Branch B003 record Branch B004 record Branch B005 record Branch B007 record 1 2 3 4
  • 36.
    36 Indexed Sequential AccessMethod • ISAM – Indexed sequential access method is based on primary index • Default access method or table type in MySQL, MyISAM is an extension of ISAM • Insert and delete operations disturb the sorting – You need an overflow file which periodically needs to be merged with the main file
  • 37.
    37 Secondary Indexes • Anindex file that uses a non primary field as an index e.g. City field in the branch table • They improve the performance of queries that use attributes other than the primary key • You can use a separate index for every attribute you wish to use in the WHERE clause of your select query • But there is the overhead of maintaining a large number of these indexes
  • 39.
    • Database administrationis the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft SQL Server need ongoing management. As such, corporations that use DBMS software often hire specialized IT (Information Technology) personnel called Database Administrators or DBAs.
  • 40.
    • Installation, configurationand upgrading of Database server software and related products. • Evaluate Database features and Database related products. • Establish and maintain sound backup and recovery policies and procedures. • Take care of the Database design and implementation. • Implement and maintain database security (create and maintain users and roles, assign privileges). • Database tuning and performance monitoring. • Application tuning and performance monitoring. • Setup and maintain documentation and standards. • Plan growth and changes (capacity planning). • Work as part of a team and provide 24x7 support when required. • Do general technical troubleshooting and give cons. • Database recovery.
  • 41.
    There are threetypes of DBAs: • Systems DBAs: (also referred to as Physical DBAs, Operations DBAs or Production Support DBAs): focus on the physical aspects of database administration such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery. • Development DBAs: focus on the logical and development aspects of database administration such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, coding stored procedures, collaborating with developers to help choose the most appropriate DBMS feature/functionality and other pre-production activities. • Application DBAs: usually found in organizations that have purchased 3rd party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems. Examples of such application software includes Oracle Applications, Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades, database cloning, building and running data cleanup routines, data load process management, etc. While individuals usually specialize in one type of database administration, in smaller organizations, it is not uncommon to find a single individual or group performing more than one type of database administration.
  • 43.
    We give ashort overview of how recovery might be implemented: • Requirements for recovery • A practical approach to recovery – keep a recovery log – must be write- ahead • Example showing system components with values in DB and in-memory cache • Checkpoint procedure: to aid processing of the very large recovery log • Transaction categories for recovery • An algorithm for the recovery manager
  • 44.
    • Media failure,e.g. disc-head crash. Part of persistent store is lost – need to restore it. Transactions in progress may be using this area – abort uncommitted transactions. • System failure e.g. crash - main memory lost. Persistent store is not lost but may have been changed by uncommitted transactions. Also, committed transactions’ effects may not yet have reached persistent objects. • Transaction abort Need to undo any changes made by the aborted transaction. Our object model assumed all invocations are recorded with the object. It was not made clear how this was to be implemented – synchronously in persistent store? We need to optimise for performance reasons - not write-out every operation synchronously. We consider one method – a recovery log. i.e. update data objects in place in persistent store, as and when appropriate, and make a (recovery) log of the updates.
  • 45.
    1. Assume aperiodic (daily?) dump of the database (e.g. Op. Sys. backup) 2. Assume that a record of every change to the database is written to a log {transaction-ID, data-object-ID, operation (arguments), old value, new value } 3. If a failure occurs the log can be used by the Recovery manager to REDO or UNDO selected operations. UNDO and REDO must be idempotent (repeatable), e.g. contain before and after values, not just “add 3”. Further crashes might occur at any time. Transaction abort: UNDO the operations – roll back the transaction System failure AIM: REDO committed transactions, UNDO uncommitted transactions Media failure reload the database from the last dump REDO the operations of all the transactions that committed since then But the log is very large to search for this information so, to assist rapid recovery, take a CHECKPOINT at “small” time intervals e.g. after 5 mins or after n log items – see 15
  • 46.
    Two distinct operations: •write a change to an object in the database • write the log record of the change A failure could occur between them – in which order should they be done? If an object is updated in the database, there is no record of the previous value, so no means of UNDOing the operation on abort. The log must be written first. Also, a transaction is not allowed to commit until the log records for all its operations have been written out to the log. Note: we can’t, and needn’t, take time to update in the database on every commit the (few) objects involved in a transaction. Note: a log can be written efficiently, because: • there are enough records from the many transactions in progress at any time, • the writes are to one place – the log file.
  • 47.
    From 13: The logis very large to search for this information on transactions especially for abort of a single transaction, so take a CHECKPOINT at “small” time intervals e.g. After 5 mins or after n log items. Checkpoint procedure : • Force-write any log records in main memory out to the log (OS must do this) • Force-write a checkpoint record to the log, containing: - list of all transactions active (started but not committed) at the time of the checkpoint - address within the log of each transaction’s most recent log record - note: the log records of a given transaction are chained • Force-write database buffers (database updates still in main memory) out to the database. • Write the address of the checkpoint record within the log into a restart file.
  • 48.
    the data managerkeeps object updates and log records in its cache in main memory main memory log records T1: x, add(1), 2 ->3 T2: a, add(2) 7->9 object values x = 3 a = 9 persistent memory log file ... many previous records ... T1: x, add(1), 2 ->3 T2: a, add(2) 7->9 persistent system state object values x = 2 a = 7 checkpoint record active Txs T1, T2 T1 most recent log location T2 most recent log location restart file has the locations of checkpoint records in the log file
  • 49.
    Checkpoint record saysT2 and T3 are active T1: its log records were written out before commit. Any remaining DB updates were written out at checkpoint time. No action required. T2: any updates made after the checkpoint are in the log and can be re-applied (REDO) T4: log records are written on commit – can be re-applied (REDO is idempotent) T3 and T5: any changes that might have been made can be found in the log and previous state recovered (undone using UNDO operation) T3 requires log to be searched before the checkpoint – checkpoint contains pointer to previous log record. Time checkpoint time failure time T1: no action T5 T4 T3 T2 T1 T2: REDO from checkpoint T3: UNDO all T4: REDO T5: UNDO
  • 50.
    Keeps: UNDO list- initially contains all transactions listed in the checkpoint record REDO list – initially empty Searches forward through the log starting from the checkpoint record, to the end of the log • If it finds a start-transaction record it adds that transaction to the UNDO list • If it finds a commit record it moves that transaction from the UNDO list to the REDO list Then, works backwards through the log UNDOing transactions on the UNDO list (restores state) Finally, works forward again through the log REDOing transactions on the REDO list

Editor's Notes

  • #13 A Relational database management system (RDBMS) is a database management system (DBMS) that is based on a relational model. The model was introduced in the early 1970's by E F Codd with his series of pioneering papers. They system is by far the most popular database system used in organisations today. The first system sold as an RDBMS was Multics Relational Data Store, first sold in 1978. Others have been Berkeley Ingres QUEL and IBM BS12. Short for relational database management system and pronounced as separate letters, a type of database management system (DBMS) that stores data in the form of related tables. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways. An important feature of relational systems is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table. Almost all full-scale database systems are RDBMS's. Small database systems, however, use other designs that provide less flexibility in posing queries.
  • #14 Increases the sharing of data Avoids duplication of information Improved data integrity and security Better data accessibility Represents complex relationships among data Enforces integrity constraints More control of concurrency (simultaneous operations within the computer) Better backup and recovery procedures Provides multiple interfaces Faster development of new applications Multiple users can access which is not possible in DBMS Uses the advantages of economies of scale Performs all Data Manipulation Language operations Supports data independence Solves many problems created by other data models The ability to handle efficiently simple data types