18CSC303J
Database Management System
Unit- IV
SRM
Institute of Science and Technology
Relational Algebra
The relational algebra is a procedural query language.
•Consists of a set of operations that take one or two relations
as input and produce a new relation as their result.
•The fundamental operations in the relational algebra are
select, project, union, set difference, Cartesian product,
and rename.
•In addition to the fundamental operations, there are several
other operations—namely, set intersection, natural join,
and assignment.
Fundamental Operations
The select, project, and rename operations are called
unary operations, because they operate on one relation.
The other three operations operate on pairs of relations
and are, called binary operations.
The Select Operation
• The select operation selects tuples that satisfy a given
predicate.
• Lowercase Greek letter sigma (σ) is used to denote
selection.
• The predicate appears as a subscript to σ.
• The argument relation is in parentheses after the σ.
• To select those tuples of the instructor relation where
the instructor is in the “Physics” department, we write:
The instructor
relation
Result of σdept_name
=“Physics”
(instructor )
Find all instructors with salary greater than
$90,000.
ID Name Dept_Name Salary
12121 Wu Finance 90000
22222 Einstein Physics 95000
83821 Brandt Comp. Sci. 92000
The Select Operation
• In general, we allow comparisons using =, ≠, <, ≤, >, and
≥ in the selection predicate.
• We can combine several predicates into a larger
predicate by using the connectives and (∧), or (∨), and
not (¬).
• To find the instructors in Physics with a salary greater
than $90,000
The Select Operation
• The selection predicate may include comparisons
between two attributes.
• To illustrate, consider the relation department.
• To find all departments whose name is the same as their
building name,
The Project Operation
• Suppose we want to list all instructors’ ID, name, and
salary, but do not care about the dept name.
• The project operation allows us to produce this relation.
• The project operation is a unary operation that returns
its argument relation, with certain attributes left out.
• Since a relation is a set, any duplicate rows are
eliminated.
• Projection is denoted by the uppercase Greek letter pi
(Π).
The Project Operation
• We list those attributes that we wish to appear in the
result as a subscript to Π.
• The argument relation follows in parentheses.
• We write the query to produce such a list
Composition of Relational
Operations
• The fact that the result of a relational operation is itself a
relation is important.
• Consider the more complicated query “Find the name of
all instructors in the Physics department.”
• Instead of giving the name of a relation as the
argument of the projection operation,
– Give an expression that evaluates to a relation.
Composition of Relational
Operations
• In general, the result of a relational-algebra operation
is of the same type (relation) as its inputs,
– relational-algebra operations can be composed together
into a relational-algebra expression.
• Composing relational-algebra operations into
relational-algebra expressions
– composing arithmetic operations(such as+,−, ∗, and÷)
into arithmetic expressions.
The Union Operation
• Consider a query to find the set of all courses taught in
the Fall 2009 semester, the Spring 2010 semester, or
both.
• The information is contained in the section relation as
shown in figure.
The section
To find the set of all courses taught in the Fall
2009 semester
To find the set of all courses taught in the Spring
2010 semester
• To answer the query, we need the union of these two
sets; that is, we need all section IDs that appear in either
or both of the two relations.
• We find these data by the binary operation union,
denoted, as in set theory, by ∪.
• The expression needed is
The Union Operation
• There are 8 tuples in the result, even though there are 3
distinct courses offered in the Fall 2009 semester and 6
distinct courses offered in the Spring 2010 semester.
• Since relations are sets, duplicate values such as CS-101,
which is offered in both semesters, are replaced by a
single occurrence.
• A union operation r ∪ s to be valid, we require that two
conditions hold:
The Union Operation
The Set-Difference Operation
• denoted by −
• allows us to find tuples that are in one relation but are
not in another.
• The expression r − s produces a relation containing those
tuples in r but not in s.
find all the courses taught in the Fall 2009 semester but not in Spring
2010 semester
The Set-Difference Operation
• must ensure that set differences are taken between
compatible relations.
• Therefore, for a set-difference operation r − s to be valid,
we require that the relations r and s be of the same
arity, and that the domains of the ith
attribute of r and
the ith
attribute of s be the same, for all i.
The Cartesian-Product Operation
• Denoted by a cross (×),
• Allows us to combine information from any two
relations.
• We write the Cartesian product of relations r1 and r2 as
r1 × r2.
• Recall that a relation is by definition a subset of a
Cartesian product of a set of domains.
The Cartesian-Product Operation
• From that definition, we should already have an intuition
about the definition of the Cartesian-product operation.
• However, since the same attribute name may appear in
both r1 and r2, we need to devise a naming schema to
distinguish between these attributes.
• We do so here by attaching to an attribute the name of
the relation from which the attribute originally came.
• For example, the relation schema for r = instructor ×
teaches is
• With this schema, we can distinguish instructor.ID from
teaches.ID.
• For those attributes that appear in only one of the two
schemas, we shall usually drop the relation-name prefix.
• This simplification does not lead to any ambiguity.
• We can then write the relation schema for r as:
The Rename Operation
• Unlike relations in the database, the results of
relational-algebra expressions do not have a name that
we can use to refer to them.
• It is useful to be able to give them names; the rename
operator, denoted by the lowercase Greek letter rho (ρ),
• Given a relational-algebra expression E, the expression
• returns the result of expression E under the name x.
The Rename Operation
• A relation r by itself is considered a (trivial)
relational-algebra expression.
• Apply the rename operation to a relation r to get the
same relation under a new name.
• A second form of the rename operation is as follows:
Assume that a relational algebra expression E has arity n.
Then, the expression
• returns the result of expression E under the name x, and
with the attributes renamed to A1
, A2
, . . . , An
.
Formal Definition of the Relational
Algebra
• The operations allow us to give a complete definition of
an expression in the relational algebra.
• A basic expression in the relational algebra consists of
either one of the following:
– A relation in the database
– A constant relation
• A constant relation is written by listing its tuples within {
},
• for example { (22222, Einstein, Physics, 95000), (76543,
Singh, Finance, 80000) }.
Formal Definition of the Relational
Algebra
• A general expression in the relational algebra is
constructed out of smaller subexpressions.
• Let E1 and E2 be relational-algebra expressions.
• Then, the following are all relational-algebra expressions
The Set-Intersection Operation
• The first additional relational-algebra operation that we
shall define is set intersection (∩).
Find the set of all courses taught in both the Fall 2009 and the
Spring 2010 semesters.
Courses offered in both the Fall 2009 and Spring 2010
semesters
The Set-Intersection Operation
• Rewrite any relational-algebra expression that uses set
intersection
• by replacing the intersection operation with a pair of
set-difference operations
• Set intersection is not a fundamental operation and does
not add any power to the relational algebra.
• It is simply more convenient to write r ∩ s than to write r
− (r − s).
The Natural-Join Operation
• The natural join is a binary operation that allows us to
combine certain selections and a Cartesian product into
one operation.
• It is denoted by the join symbol
• The natural-join operation forms a Cartesian product of
its two arguments,
• Performs a selection forcing equality on those attributes
that appear in both relation schemas, and finally
removes duplicate attributes.
The Natural-Join Operation
The natural join of the instructor relation with the teaches
relation
The Natural-Join Operation
• The result relation, has only 13 tuples, the ones that give
information about an instructor and a course that that
instructor actually teaches.
• We do not repeat those attributes that appear in the
schemas of both relations; rather they appear only once.
• Notice also the order in which the attributes are listed
– first the attributes common to the schemas of both
relations,
– second those attributes unique to the schema of the first
relation, and
– finally, those attributes unique to the schema of the
second relation.
The Natural-Join Operation
• The definition of natural join is complicated, the
operation is easy to apply.
Find the names of all instructors together with the course id of all
courses they taught.
The Natural-Join Operation
• The schemas for instructor and teaches have the
attribute ID in common,
– the natural-join operation considers only pairs of tuples
that have the same value on ID.
• It combines each such pair of tuples into a single tuple
on the union of the two schemas; that is, (ID, name,
dept name, salary, course id).
• After performing the projection, we obtain the relation
in Figure.
The Natural-Join Operation
• Consider two relation schemas R and S—which are, of
course, lists of attribute names.
• If we consider the schemas to be sets, rather than lists,
– Denote those attribute names that appear in both R and
S by R ∩ S,
– Denote those attribute names that appear in R, in S, or in
both by R ∪ S.
• Similarly, those attribute names that appear in R but not
S are denoted by R − S, whereas S − R denotes those
attribute names that appear in S but not in R.
The Natural-Join Operation
Formal definition of the natural
join
• Consider two relations r (R) and s(S).
• The natural join of r and s, denoted by r s, is a relation
on schema R ∪ S formally defined as follows
• where R ∩ S = {A1
, A2
, . . . , An
}.
• Please note that if r (R) and s(S) are relations without
any attributes in common,
R ∩ S = ∅, then r s = r × s.
The Natural-Join Operation
Find the names of all instructors in the Comp. Sci. department together
with the course titles of all the courses that the instructors teach.”
The Natural-Join Operation
• we wrote instructor teaches course without inserting
parentheses to specify the order in which the
natural-join operations on the three relations should be
executed.
• The natural join is associative
• The theta join operation is a variant of the natural-join
operation that allows us to combine a selection and a
Cartesian product into a single operation.
The Assignment Operation
• It is convenient at times to write a relational-algebra
expression by assigning parts of it to temporary relation
variables.
• The assignment operation, denoted by ←, works like
assignment in a programming language.
• To illustrate this operation, consider the definition of the
natural-join operation.
• We could write r s as:
The Assignment Operation
• The evaluation of an assignment does not result in any relation
being displayed to the user.
• Rather, the result of the expression to the right of the ← is
assigned to the relation variable on the left of the←.
• This relation variable may be used in subsequent expressions.
• With the assignment operation, a query can be written as a
sequential program consisting of
– a series of assignments
– followed by an expression
– whose value is displayed as the result of the query.
The Assignment Operation
• For relational-algebra queries, assignment must always
be made to a temporary relation variable.
• Assignments to permanent relations constitute a
database modification.
• The assignment operation does not provide any
additional power to the algebra.
• It is, a convenient way to express complex queries.
Outer join Operations
• An extension of the join operation to deal with missing
information.
• Suppose that there is some instructor who teaches no
courses.
• The tuple in the “instructor relation” for that particular
instructor would not satisfy the condition of a natural
join with the “teaches relation”
• instructor’s data would not appear in the result of the
natural join, shown in Figure.
Outer join Operations
The teaches
relation
The instructor
relation
The natural join of the instructor relation with the teaches
Outer join Operations
• For example,
• instructors Califieri, Gold, and Singh do not appear in the
result of the natural join, since they do not teach any
course.
• More generally, some tuples in either or both of the
relations being joined may be “lost” in this way.
• The outer join operation works in a manner similar to
the natural join operation we have already studied, but
preserves those tuples that would be lost in an join by
creating tuples in the result containing null values.
Outer join Operations
• use the outer-join operation to avoid this loss of
information.
• actually three forms of the operation:
– left outer join, denoted
– right outer join, denoted ; and
– full outer join, denoted .
• All three forms of outer join compute the join, and add
extra tuples to the result of the join.
• For example, the results of the expression instructor
teaches and teaches instructor appear in Figures
Left Outer join Operations
• The left outer join takes all tuples in the left relation
– did not match with any tuple in the right relation,
– pads the tuples with null values for all other attributes
from the right relation, and
– adds them to the result of the natural join.
• In Figure, tuple (58583, Califieri, History, 62000, null,
null, null, null), is such a tuple.
• All information from the left relation is present in the
result of the left outer join.
Right Outer join Operations
• The right outer join is symmetric with the left outer join
• It pads tuples from the right relation that
– did not match any from the left relation with nulls and
– adds them to the result of the natural join.
• In Figure 6.18, tuple (58583, null, null, null, null, Califieri,
History, 62000), is such a tuple.
• Thus, all information from the right relation is present in
the result of the right outer join.
Full Outer join Operations
• The full outer join does both the left and right outer join
operations,
– padding tuples from the left relation that did not match
any from the right relation,
– as well as tuples from the right relation that did not
match any from the left relation, and
– adding them to the result of the join.
Extended Relational-Algebra
Operations
• Relational-algebra operations provide the ability to write
queries that cannot be expressed using the basic
relational-algebra operations.
• These operations are called extended relational-algebra
operations.
Generalized Projection
• Extends the projection operation by allowing operations
such as arithmetic and string functions to be used in the
projection list.
• The generalized-projection operation has the form
• where
– E is any relational-algebra expression,
– each of F1
, F2
, . . . , Fn
is an arithmetic expression
involving constants and attributes in the schema of E.
Generalized Projection
• Generalized projection also permits operations on other
data types, such as concatenation of strings.
Aggregation
• The second extended relational-algebra operation is the
aggregate operation ,
• Permits the use of aggregate functions such as min or
average, on sets of values.
• Aggregate functions take a collection of values and
return a single value as a result.
– Sum, Average, Count, Min, Max
The collections on which aggregate functions
operate can have multiple occurrences of a value ;
the order in which the values appear is not relevant.
Such collections are called multisets . Sets are a
special case of multisets where there is only one copy
of each element.
Find out the sum of salaries of all instructors
The result of the expression is a relation with a
single attribute, containing a single row with a
numerical value corresponding to the sum of the
Aggregation
• There are cases where we must eliminate multiple
occurrences of a value before computing an aggregate
function.
• If we do want to eliminate duplicates,
– use the same function names
– with the addition of the hyphenated string “distinct”
appended to the end of the function name
– for example, count-distinct
Find the total number of instructors who teach a course in the
Spring 2010 semester.
The aggregate function “count-distinct” ensures that
even if an instructor teaches ore than one course, she
is counted only once in the result.
To apply the aggregate function not to a single set of
tuples, but instead to a group of sets of tuples.
Find the average salary in each department
Tuples of the instructor relation,
grouped by the dept name attribute
The result relation for the query
“Find the average salary in each
The Tuple Relational Calculus
The Tuple Relational Calculus
• When we write a relational-algebra expression, we
provide a sequence of procedures that generates the
answer to our query.
• The tuple relational calculus is a nonprocedural query
language.
• It describes the desired information without giving a
specific procedure for obtaining that information.
• A query in the tuple relational calculus is expressed as
The Tuple Relational Calculus
• It is the set of all tuples “t” such that predicate “P” is
true for “t”.
• Following our earlier notation, we use t[A] to denote the
value of tuple “t” on attribute A, and we use t ∈ r to
denote that tuple t is in relation r.
Find the ID, name, dept name, salary for instructors whose salary is
greater than $80,000
Pitfalls in Relational Database
Design
• Relational database design requires that we find a
“good” collection of relation schemas.
• A bad design may lead to
– Repetition of information.
– Inability to represent certain information.
• Design Goals:
– Avoid redundant data
– Ensure that relationships among attributes are
represented
– Facilitate the checking of updates for violation of
database integrity constraints
Example
• Consider the relation schema:
• Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
Example
• Redundancy:
– Data for branch-name, branch-city, assets are repeated
for each loan that a branch makes
– Wastes space and complicates updating
• Null values
– Cannot store information about a branch if no loans exist
– Can use null values, but they are difficult to handle
Decomposition
• Decompose the relation schema Lending-schema into:
– Branch-schema = (branch-name, branch-city, assets)
– Loan-info-schema = (customer-name, loan-number,
branch-name, amount)
• All attributes of an original schema (R) must appear in
the decomposition (R1, R2):
R = R1 U R2
• Lossless-join decomposition. For all possible relations r
on schema R
r = ΠR1 (r) R2 (r)
Goal
• Decide whether a particular relation R is in “good” form.
• In the case that a relation R is not in “good” form,
– decompose it into a set of relations {R1
, R2
, ..., Rn
} such
that each relation is in good form
• The decomposition is a lossless-join decomposition
• Our theory is based on:
• functional dependencies
• multivalued dependencies
Decomposition
▪ The only way to avoid the repetition-of-information problem in the in_dep schema is
to decompose it into two schemas – instructor and department schemas.
▪ Not all decompositions are good. Suppose we decompose
employee(ID, name, street, city, salary)
into
employee1 (ID, name)
employee2 (name, street, city, salary)
The problem arises when we have two employees with the same name
▪ The next slide shows how we lose information -- we cannot reconstruct the original
employee relation -- and so, this is a lossy decomposition
@ V.V.R 67
A Lossy Decomposition
@ V.V.R 68
Lossless Decomposition
▪ Let R be a relation schema and let R1
and R2
form a decomposition of R . That is R
= R1
U R2
▪ We say that the decomposition is a lossless decomposition if there is no loss of
information by replacing R with the two relation schemas R1
U R2
▪ Formally,
∏ R1
(r) ∏ R2
(r) = r
▪ And, conversely a decomposition is lossy if
r ⊂ ∏ R1
(r) ∏ R2
(r) = r
@ V.V.R 69
Examples of Lossless
Decomposition
▪ Decomposition of R = (A, B, C)
R1
= (A, B) R2
= (B, C)
@ V.V.R 70
Normalization theory
▪ Decide whether a particular relation R is in “good” form.
▪ In the case that a relation R is not in “good” form, decompose it into set of relations {R1
, R2
, ..., Rn
}
such that
• Each relation is in good form
• The decomposition is a lossless decomposition
▪ Our theory is based on:
• Functional dependencies
• Multivalued dependencies
@ V.V.R 71
Normalization
The process of normalization is a formal method that identifies relational schemas based upon their primary
or candidate keys and the functional dependencies that exists amongst their attributes.
Normalization is primarily a tool to validate and improve a logical design so that it satisfies certain
constraints that avoid unnecessary duplication of data.
Normalization is the process of decomposing relation with anomalies to produce smaller, well-structured
relations.
Normalisation should remove redundancy, but not at the expense of data integrity.
@ V.V.R 72
Transforming data from a problem into relations while ensuring data integrity and eliminating data
redundancy.
■ Data integrity : consistent and satisfies
data constraint rules
■ Data redundancy: if data can be found in
two places in a single database (direct
redundancy) or calculated using data
from different parts of the database
(indirect redundancy) then redundancy
exists.
Normalisation should remove redundancy, but not at the expense of data integrity.
@ V.V.R 73
First Normal form
▪ Domain is atomic if its elements are considered to be indivisible units
• Examples of non-atomic domains:
▪ Set of names, composite attributes
▪ Identification numbers like CS101 that can be broken up into parts
▪ A relational schema R is in first normal form if the domains of all attributes of R are atomic
▪ Non-atomic values complicate storage and encourage redundant (repeated) storage of data
• Example: Set of accounts stored with each customer, and set of owners stored with each account
@ V.V.R 74
First normal form (1NF) deals with
the `shape' of the record.
A relation is in 1NF if, and only if, it
contains no repeating attributes or
groups of attributes.
Example:
■ The Student table with the repeating group is not in 1NF
■ It has repeating groups, it is an `unnormalised table'.
To remove the repeating group,
either:
■ flatten the table and extend the key, or
■ decompose the relation- leading to First Normal Form
@ V.V.R 75
Flatten Table and Extend Primary
Key
The Student table with the repeating group can be written as:
Student(matric_no, name, date_of_birth, ( subject, grade ) )
If the repeating group was flattened, it would look something like:
Student(matric_no, name, date_of_birth, subject, grade )
This does not have repeating groups, but has redundancy. For every matric_no/subject
combination, the student name and date of birth is replicated. This can lead to errors.
Redundant data is the main cause of insertion, deletion, and updating anomalies.
■ Insertion anomaly – Subject is now in the primary key, we cannot add a student until
they have at least one subject. Remember, no part of a primary key can be NULL.
■ Update anomaly – changing the name of a student means finding all rows of the
database where that student exists and changing each one separately.
■ Deletion anomaly- for example deleting all database subject information also deletes
student 960145.
@ V.V.R 76
Decomposing Relation
The alternative approach is to split the table
into two parts, one for the repeating groups
and one of the non-repeating groups.
The primary key for the original relation is
included in both of the new relations
Record:
Student
@ V.V.R 77
We now have two relations, Student and Record.
■ Student contains the original non-repeating groups
■ Record has the original repeating groups and the matric_no
Student(matric_no, name, date_of_birth )
Record(matric_no, subject, grade )
This version of the relations does not have insertion, deletion, or update anomalies.
Without repeating groups, we say the relations are in First Normal Form (1NF).
@ V.V.R 78
Second Normal Form
A relation is in 2NF if, and only if, it is in 1NF and every
non-key attribute is fully functionally dependent on the
whole key.
Thus the relation is in 1NF with no repeating groups, and all
non-key attributes must depend on the whole key, not just
some part of it. Another way of saying this is that there must
be no partial key dependencies (PKDs).
The problems arise when there is a compound key, e.g. the
key to the Record relation - matric_no, subject. In this case it
is possible for non-key attributes to depend on only part of
the key - i.e. on only one of the two key attributes. This is
what 2NF tries to prevent.
@ V.V.R 79
Consider again the Student relation from the flattened table:
Student(matric_no, name, date_of_birth, subject, grade )
There are no repeating groups, so the relation is in 1NF
However, we have a compound primary key - so we must check all
of the non-key attributes against each part of the key to ensure they
are functionally dependent on it.
■ matric_no determines name and date_of_birth, but not grade.
■ subject together with matric_no determines grade, but not name or date_of_birth.
So there is a problem with potential redundancies
@ V.V.R 80
Dependency Diagram
A dependency diagram is used to show how
non-key attributes relate to each part or
combination of parts in the primary key.
matric_no grade
subject
date_of_bith
name
Student
Fully Dependent
PKD
This relation is not in 2NF
– It appears to be two tables squashed into
one.
– the solutions is to split the relation into
component parts.
separate out all the attributes that are solely dependent on matric_no - put them in a
new Student_details relation, with matric_no as the primary key
separate out all the attributes that are solely dependent on subject - in this case no
attributes are solely dependent on subject.
separate out all the attributes that are solely dependent on matric_no + subject - put
them into a separate Student relation, keyed on matric_no + subject
Student Details
matrix_no name date_of_birth
Student
matrix_no subject grade
All attributes in each relation are fully
functionally dependent upon its primary key
These relations are now in 2NF
Third Normal Form
3NF is an even stricter normal form and removes virtually all the redundant data
:
A relation is in 3NF if, and only if, it is in 2NF and there are no transitive
functional dependencies
Transitive functional dependencies arise:
– when one non-key attribute is
functionally dependent on another
non-key attribute:
• FD: non-key attribute -> non-key attribute
– and when there is redundancy in the
database
By definition transitive functional dependency can only occur if there is more
than one non-key field, so we can say that a relation in 2NF with zero or one
non-key field must automatically be in 3NF
project_no manager address
Project has more than one
non-key field so we must
check for transitive
dependency:
p1 Black,B 32 High Street
p2 Smith,J 11 New Street
p3 Black,B 32 High Street
p4 Black,B 32 High Street
Address depends on the value of manager.
From the table we can propose:
Project(project_no, manager, address)
manager -> address
In this case address is transitively dependent on manager. The primary key is
project_no, but the LHS and RHS have no reference to this key, yet both sides are
present in the relation.
Data redundancy arises from this
– we duplicate address if a manager is in
charge of more than one project
– causes problems if we had to change the
address- have to change several entries,
and this could lead to errors.
Eliminate transitive functional dependency by splitting the table
– create two relations - one with the
transitive dependency in it, and another
for all of the remaining attributes.
– split Project into Project and Manager.
the determinant attribute becomes the primary key in the new relation -
manager becomes the primary key to the Manager relation
the original key is the primary key to the remaining non-transitive attributes - in
this case, project_no remains the key to the new Projects table.
Now we need to store the address only once
If we need to know a manager's address we can look it up in the
Manager relation
The manager attribute is the link between the two tables, and in the
Projects table it is now a foreign key.
These relations are now in third normal form.
Project project_no manager
p1 Black,B
p2 Smith,J
p3 Black,B
p4 Black,B
Manager manager address
Black,B 32 High Street
Smith,J 11 New Street
BOYCE CODD NORMAL FORM
• Many practitioners argue that placing entities
in 3NF is generally sufficient because it is rare
that entities that are in 3NF are not also in
4NF and 5NF
– The advanced forms of normalization are:
• Boyce-Codd Normal Form
• Fourth Normal Form
• Fifth Normal Form
BOYCE CODD NORMAL FORM
• Boyce-Codd normal form (BCNF) is a more
rigorous version of 3NF.
• BCNF is based on the concept of
determinants. A determinant column is one
on which some of the columns are fully
functionally dependent.
• A relational table is in BCNF if and only if every
determinant is a candidate key.
Example 1 - Address (Not in BCNF)
Scheme {City, Street, ZipCode }
1. Key1 {City, Street }
2. Key2 {ZipCode, Street}
3. No non-key attribute hence 3NF
4. {City, Street} {ZipCode}
5. {ZipCode} {City}
6. Dependency between attributes belonging to a key
Boyce-Codd Normal Form (BCNF)
Examples
Example 2 - Movie (Not in BCNF)
Scheme {MovieTitle, MovieID, PersonName, Role, Payment }
1. Key1 {MovieTitle, PersonName}
2. Key2 {MovieID, PersonName}
3. Both role and payment functionally depend on both candidate keys thus 3NF
4. {MovieID} {MovieTitle}
5. Dependency between MovieID & MovieTitle Violates BCNF
Example 3 - Consulting (Not in BCNF)
Scheme {Client, Problem, Consultant}
6. Key1 {Client, Problem}
7. Key2 {Client, Consultant}
8. No non-key attribute hence 3NF
9. {Client, Problem} {Consultant}
10. {Client, Consultant} {Problem}
11. Dependency between attributess belonging to keys violates BCNF
Boyce Codd Normal Form (BCNF)
1. Place the two candidate primary keys in separate entities
2. Place each of the remaining data items in one of the
resulting entities according to its dependency on the
primary key.
Example 1 (Convert to BCNF)
Old Scheme {City, Street, ZipCode }
New Scheme1 {ZipCode, Street}
New Scheme2 {City, Street}
• Loss of relation {ZipCode} {City}
Alternate New Scheme1 {ZipCode, Street }
Alternate New Scheme2 {ZipCode, City}
BCNF - Decomposition
1. If decomposition does not cause any loss of information it is called a
lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost it is
called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a
collection of smaller schemas that are in BCNF form. However the
dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into 3rd
normal form
that also preserves the dependencies.
• 3NF may be better than BCNF in some cases
Decomposition – Loss of
Information
Use your own judgment when decomposing schemas
Example 2 (Convert to BCNF)
Old Scheme {MovieTitle, MovieID, PersonName, Role, Payment }
New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieTitle, PersonName}
• Loss of relation {MovieID} {MovieTitle}
New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieID, MovieTitle}
• We got the {MovieID} {MovieTitle} relationship back
Example 3 (Convert to BCNF)
Old Scheme {Client, Problem, Consultant}
New Scheme {Client, Consultant}
New Scheme {Client, Problem}
BCNF - Decomposition
• Fourth normal form eliminates independent many-to-one relationships
between columns.
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one multi-valued attribute.
Example (Not in 4NF)
Scheme {MovieName, ScreeningCity, Genre)
Primary Key: {MovieName, ScreeningCity, Genre)
1. All columns are a part of the only candidate key, hence BCNF
2. Many Movies can have the same Genre
3. Many Cities can have the same movie
4. Violates 4NF
Fourth Normal Form (4NF)
Movie ScreeningCity Genre
Hard Code Los Angles Comedy
Hard Code New York Comedy
Bill Durham Santa Cruz Drama
Bill Durham Durham Drama
The Code Warrier New York Horror
Example 2 (Not in 4NF)
Scheme {Manager, Child, Employee}
1. Primary Key {Manager, Child, Employee}
2. Each manager can have more than one child
3. Each manager can supervise more than one employee
4. 4NF Violated
Example 3 (Not in 4NF)
Scheme {Employee, Skill, ForeignLanguage}
5. Primary Key {Employee, Skill, Language }
6. Each employee can speak multiple languages
7. Each employee can have multiple skills
8. Thus violates 4NF
Fourth Normal Form (4NF)
Manager Child Employee
Jim Beth Alice
Mary Bob Jane
Mary NULL Adam
Employee Skill Language
1234 Cooking French
1234 Cooking German
1453 Carpentry Spanish
1453 Cooking Spanish
2345 Cooking Spanish
1. Move the two multi-valued relations to separate tables
2. Identify a primary key for each of the new entity.
Example 1 (Convert to 3NF)
Old Scheme {MovieName, ScreeningCity, Genre}
New Scheme {MovieName, ScreeningCity}
New Scheme {MovieName, Genre}
4NF - Decomposition
Movie Genre
Hard Code Comedy
Bill Durham Drama
The Code Warrier Horror
Movie ScreeningCity
Hard Code Los Angles
Hard Code New York
Bill Durham Santa Cruz
Bill Durham Durham
The Code Warrier New York
Example 2 (Convert to 4NF)
Old Scheme {Manager, Child, Employee}
New Scheme {Manager, Child}
New Scheme {Manager, Employee}
Example 3 (Convert to 4NF)
Old Scheme {Employee, Skill, ForeignLanguage}
New Scheme {Employee, Skill}
New Scheme {Employee, ForeignLanguage}
4NF - Decomposition
Manager Child
Jim Beth
Mary Bob
Manager Employee
Jim Alice
Mary Jane
Mary Adam
Employee Language
1234 French
1234 German
1453 Spanish
2345 Spanish
Employee Skill
1234 Cooking
1453 Carpentry
1453 Cooking
2345 Cooking
Multivalued Dependencies and
Fourth Normal Form
Definition:
• A multivalued dependency (MVD) X —>> Y specified on relation schema
R, where X and Y are both subsets of R, specifies the following constraint
on any relation state r of R: If two tuples t1
and t2
exist in r such that t1
[X]
= t2
[X], then two tuples t3
and t4
should also exist in r with the following
properties, where we use Z to denote (R 2 (X υ Y)):
– t3
[X] = t4
[X] = t1
[X] = t2
[X].
– t3
[Y] = t1
[Y] and t4
[Y] = t2
[Y].
– t3
[Z] = t2
[Z] and t4
[Z] = t1
[Z].
• An MVD X —>> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b)
X υ Y = R.
Multivalued Dependencies and
Fourth Normal Form
Definition:
A relation schema R is in 4NF with respect to a set of
dependencies F (that includes functional dependencies and
multivalued dependencies) if, for every nontrivial
multivalued dependency X —>> Y in F+
, X is a superkey for R.
– Note: F+
is the (complete) set of all dependencies (functional
or multivalued) that will hold in every relation state r of R that
satisfies F. It is also called the closure of F.
Join Dependency
What is join dependency?
• If a table can be recreated by joining multiple tables and each of this table have a
subset of the attributes of the table, then the table is in Join Dependency.
• It is a generalization of Multivalued Dependency
• If the join of R1 and R2 over C is equal to relation R, then we can say that a join
dependency (JD) exists.
• Where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given
relations R (A, B, C, D).
• Alternatively, R1 and R2 are a lossless decomposition of R.
• A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a
lossless-join decomposition.
• The *(A, B, C, D), (C, D) will be a JD of R if the join of join's attribute is equal to the
relation R.
• Here, *(R1, R2, R3) is used to indicate that relation R1, R2, R3 and so on are a JD of
R.
Example: <employee>
Empname EmpSkills EmpJob (Assigned Work)
Tom Networking EJ001
Harry Web Development EJ002
Katie Programming EJ002
Emp Name EmpSkills
Tom Networking
Harry Web Development
Katie Programming
The table can be decomposed into 3 tables
<EmployeeSkills>
EmpName EmpJob
Tom EJ001
Harry EJ002
Katie EJ002
<EmployeeJob>
<JobSkills>
EmpSkills EmpJob
Networking EJ001
Web Development EJ002
Programming EJ002
Join dependency
{(EmpName, EmpSkills ), ( EmpName, EmpJob),
(EmpSkills, EmpJob)}
Fifth Normal Form-5NF
• A relation is in 5NF if it is in 4NF and not contains any join dependency and
joining should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as
possible in order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
• Example:
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
5NF- Continued
• In the above table, John takes both Computer and Math class for Semester
1 but he doesn't take Math class for Semester 2. In this case, combination
of all these fields required to identify a valid data.
• Suppose we add a new Semester as Semester 3 but do not know about
the subject and who will be taking that subject so we leave Lecturer and
Subject as NULL. But all three columns together acts as a primary key, so
we can't leave other two columns blank.
• So to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3.
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
5NF-Continued
P1
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P2
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen

Unit 4 DATABASE MANAGEMENT SYSTEM SRMIST

  • 1.
    18CSC303J Database Management System Unit-IV SRM Institute of Science and Technology
  • 2.
  • 3.
    The relational algebrais a procedural query language. •Consists of a set of operations that take one or two relations as input and produce a new relation as their result. •The fundamental operations in the relational algebra are select, project, union, set difference, Cartesian product, and rename. •In addition to the fundamental operations, there are several other operations—namely, set intersection, natural join, and assignment.
  • 4.
    Fundamental Operations The select,project, and rename operations are called unary operations, because they operate on one relation. The other three operations operate on pairs of relations and are, called binary operations.
  • 5.
    The Select Operation •The select operation selects tuples that satisfy a given predicate. • Lowercase Greek letter sigma (σ) is used to denote selection. • The predicate appears as a subscript to σ. • The argument relation is in parentheses after the σ. • To select those tuples of the instructor relation where the instructor is in the “Physics” department, we write:
  • 6.
    The instructor relation Result ofσdept_name =“Physics” (instructor )
  • 7.
    Find all instructorswith salary greater than $90,000. ID Name Dept_Name Salary 12121 Wu Finance 90000 22222 Einstein Physics 95000 83821 Brandt Comp. Sci. 92000
  • 8.
    The Select Operation •In general, we allow comparisons using =, ≠, <, ≤, >, and ≥ in the selection predicate. • We can combine several predicates into a larger predicate by using the connectives and (∧), or (∨), and not (¬). • To find the instructors in Physics with a salary greater than $90,000
  • 9.
    The Select Operation •The selection predicate may include comparisons between two attributes. • To illustrate, consider the relation department. • To find all departments whose name is the same as their building name,
  • 10.
    The Project Operation •Suppose we want to list all instructors’ ID, name, and salary, but do not care about the dept name. • The project operation allows us to produce this relation. • The project operation is a unary operation that returns its argument relation, with certain attributes left out. • Since a relation is a set, any duplicate rows are eliminated. • Projection is denoted by the uppercase Greek letter pi (Π).
  • 11.
    The Project Operation •We list those attributes that we wish to appear in the result as a subscript to Π. • The argument relation follows in parentheses. • We write the query to produce such a list
  • 12.
    Composition of Relational Operations •The fact that the result of a relational operation is itself a relation is important. • Consider the more complicated query “Find the name of all instructors in the Physics department.” • Instead of giving the name of a relation as the argument of the projection operation, – Give an expression that evaluates to a relation.
  • 13.
    Composition of Relational Operations •In general, the result of a relational-algebra operation is of the same type (relation) as its inputs, – relational-algebra operations can be composed together into a relational-algebra expression. • Composing relational-algebra operations into relational-algebra expressions – composing arithmetic operations(such as+,−, ∗, and÷) into arithmetic expressions.
  • 14.
    The Union Operation •Consider a query to find the set of all courses taught in the Fall 2009 semester, the Spring 2010 semester, or both. • The information is contained in the section relation as shown in figure. The section
  • 15.
    To find theset of all courses taught in the Fall 2009 semester To find the set of all courses taught in the Spring 2010 semester
  • 16.
    • To answerthe query, we need the union of these two sets; that is, we need all section IDs that appear in either or both of the two relations. • We find these data by the binary operation union, denoted, as in set theory, by ∪. • The expression needed is The Union Operation
  • 17.
    • There are8 tuples in the result, even though there are 3 distinct courses offered in the Fall 2009 semester and 6 distinct courses offered in the Spring 2010 semester. • Since relations are sets, duplicate values such as CS-101, which is offered in both semesters, are replaced by a single occurrence. • A union operation r ∪ s to be valid, we require that two conditions hold: The Union Operation
  • 18.
    The Set-Difference Operation •denoted by − • allows us to find tuples that are in one relation but are not in another. • The expression r − s produces a relation containing those tuples in r but not in s. find all the courses taught in the Fall 2009 semester but not in Spring 2010 semester
  • 19.
    The Set-Difference Operation •must ensure that set differences are taken between compatible relations. • Therefore, for a set-difference operation r − s to be valid, we require that the relations r and s be of the same arity, and that the domains of the ith attribute of r and the ith attribute of s be the same, for all i.
  • 20.
    The Cartesian-Product Operation •Denoted by a cross (×), • Allows us to combine information from any two relations. • We write the Cartesian product of relations r1 and r2 as r1 × r2. • Recall that a relation is by definition a subset of a Cartesian product of a set of domains.
  • 21.
    The Cartesian-Product Operation •From that definition, we should already have an intuition about the definition of the Cartesian-product operation. • However, since the same attribute name may appear in both r1 and r2, we need to devise a naming schema to distinguish between these attributes. • We do so here by attaching to an attribute the name of the relation from which the attribute originally came. • For example, the relation schema for r = instructor × teaches is
  • 22.
    • With thisschema, we can distinguish instructor.ID from teaches.ID. • For those attributes that appear in only one of the two schemas, we shall usually drop the relation-name prefix. • This simplification does not lead to any ambiguity. • We can then write the relation schema for r as:
  • 23.
    The Rename Operation •Unlike relations in the database, the results of relational-algebra expressions do not have a name that we can use to refer to them. • It is useful to be able to give them names; the rename operator, denoted by the lowercase Greek letter rho (ρ), • Given a relational-algebra expression E, the expression • returns the result of expression E under the name x.
  • 24.
    The Rename Operation •A relation r by itself is considered a (trivial) relational-algebra expression. • Apply the rename operation to a relation r to get the same relation under a new name. • A second form of the rename operation is as follows: Assume that a relational algebra expression E has arity n. Then, the expression • returns the result of expression E under the name x, and with the attributes renamed to A1 , A2 , . . . , An .
  • 25.
    Formal Definition ofthe Relational Algebra • The operations allow us to give a complete definition of an expression in the relational algebra. • A basic expression in the relational algebra consists of either one of the following: – A relation in the database – A constant relation • A constant relation is written by listing its tuples within { }, • for example { (22222, Einstein, Physics, 95000), (76543, Singh, Finance, 80000) }.
  • 26.
    Formal Definition ofthe Relational Algebra • A general expression in the relational algebra is constructed out of smaller subexpressions. • Let E1 and E2 be relational-algebra expressions. • Then, the following are all relational-algebra expressions
  • 27.
    The Set-Intersection Operation •The first additional relational-algebra operation that we shall define is set intersection (∩). Find the set of all courses taught in both the Fall 2009 and the Spring 2010 semesters. Courses offered in both the Fall 2009 and Spring 2010 semesters
  • 28.
    The Set-Intersection Operation •Rewrite any relational-algebra expression that uses set intersection • by replacing the intersection operation with a pair of set-difference operations • Set intersection is not a fundamental operation and does not add any power to the relational algebra. • It is simply more convenient to write r ∩ s than to write r − (r − s).
  • 29.
    The Natural-Join Operation •The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. • It is denoted by the join symbol • The natural-join operation forms a Cartesian product of its two arguments, • Performs a selection forcing equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes.
  • 30.
    The Natural-Join Operation Thenatural join of the instructor relation with the teaches relation
  • 31.
    The Natural-Join Operation •The result relation, has only 13 tuples, the ones that give information about an instructor and a course that that instructor actually teaches. • We do not repeat those attributes that appear in the schemas of both relations; rather they appear only once. • Notice also the order in which the attributes are listed – first the attributes common to the schemas of both relations, – second those attributes unique to the schema of the first relation, and – finally, those attributes unique to the schema of the second relation.
  • 32.
    The Natural-Join Operation •The definition of natural join is complicated, the operation is easy to apply. Find the names of all instructors together with the course id of all courses they taught.
  • 33.
    The Natural-Join Operation •The schemas for instructor and teaches have the attribute ID in common, – the natural-join operation considers only pairs of tuples that have the same value on ID. • It combines each such pair of tuples into a single tuple on the union of the two schemas; that is, (ID, name, dept name, salary, course id). • After performing the projection, we obtain the relation in Figure.
  • 34.
    The Natural-Join Operation •Consider two relation schemas R and S—which are, of course, lists of attribute names. • If we consider the schemas to be sets, rather than lists, – Denote those attribute names that appear in both R and S by R ∩ S, – Denote those attribute names that appear in R, in S, or in both by R ∪ S. • Similarly, those attribute names that appear in R but not S are denoted by R − S, whereas S − R denotes those attribute names that appear in S but not in R.
  • 35.
  • 36.
    Formal definition ofthe natural join • Consider two relations r (R) and s(S). • The natural join of r and s, denoted by r s, is a relation on schema R ∪ S formally defined as follows • where R ∩ S = {A1 , A2 , . . . , An }. • Please note that if r (R) and s(S) are relations without any attributes in common, R ∩ S = ∅, then r s = r × s.
  • 37.
    The Natural-Join Operation Findthe names of all instructors in the Comp. Sci. department together with the course titles of all the courses that the instructors teach.”
  • 38.
    The Natural-Join Operation •we wrote instructor teaches course without inserting parentheses to specify the order in which the natural-join operations on the three relations should be executed. • The natural join is associative • The theta join operation is a variant of the natural-join operation that allows us to combine a selection and a Cartesian product into a single operation.
  • 39.
    The Assignment Operation •It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables. • The assignment operation, denoted by ←, works like assignment in a programming language. • To illustrate this operation, consider the definition of the natural-join operation. • We could write r s as:
  • 40.
    The Assignment Operation •The evaluation of an assignment does not result in any relation being displayed to the user. • Rather, the result of the expression to the right of the ← is assigned to the relation variable on the left of the←. • This relation variable may be used in subsequent expressions. • With the assignment operation, a query can be written as a sequential program consisting of – a series of assignments – followed by an expression – whose value is displayed as the result of the query.
  • 41.
    The Assignment Operation •For relational-algebra queries, assignment must always be made to a temporary relation variable. • Assignments to permanent relations constitute a database modification. • The assignment operation does not provide any additional power to the algebra. • It is, a convenient way to express complex queries.
  • 42.
    Outer join Operations •An extension of the join operation to deal with missing information. • Suppose that there is some instructor who teaches no courses. • The tuple in the “instructor relation” for that particular instructor would not satisfy the condition of a natural join with the “teaches relation” • instructor’s data would not appear in the result of the natural join, shown in Figure.
  • 43.
    Outer join Operations Theteaches relation The instructor relation The natural join of the instructor relation with the teaches
  • 44.
    Outer join Operations •For example, • instructors Califieri, Gold, and Singh do not appear in the result of the natural join, since they do not teach any course. • More generally, some tuples in either or both of the relations being joined may be “lost” in this way. • The outer join operation works in a manner similar to the natural join operation we have already studied, but preserves those tuples that would be lost in an join by creating tuples in the result containing null values.
  • 45.
    Outer join Operations •use the outer-join operation to avoid this loss of information. • actually three forms of the operation: – left outer join, denoted – right outer join, denoted ; and – full outer join, denoted . • All three forms of outer join compute the join, and add extra tuples to the result of the join. • For example, the results of the expression instructor teaches and teaches instructor appear in Figures
  • 47.
    Left Outer joinOperations • The left outer join takes all tuples in the left relation – did not match with any tuple in the right relation, – pads the tuples with null values for all other attributes from the right relation, and – adds them to the result of the natural join. • In Figure, tuple (58583, Califieri, History, 62000, null, null, null, null), is such a tuple. • All information from the left relation is present in the result of the left outer join.
  • 48.
    Right Outer joinOperations • The right outer join is symmetric with the left outer join • It pads tuples from the right relation that – did not match any from the left relation with nulls and – adds them to the result of the natural join. • In Figure 6.18, tuple (58583, null, null, null, null, Califieri, History, 62000), is such a tuple. • Thus, all information from the right relation is present in the result of the right outer join.
  • 49.
    Full Outer joinOperations • The full outer join does both the left and right outer join operations, – padding tuples from the left relation that did not match any from the right relation, – as well as tuples from the right relation that did not match any from the left relation, and – adding them to the result of the join.
  • 50.
    Extended Relational-Algebra Operations • Relational-algebraoperations provide the ability to write queries that cannot be expressed using the basic relational-algebra operations. • These operations are called extended relational-algebra operations.
  • 51.
    Generalized Projection • Extendsthe projection operation by allowing operations such as arithmetic and string functions to be used in the projection list. • The generalized-projection operation has the form • where – E is any relational-algebra expression, – each of F1 , F2 , . . . , Fn is an arithmetic expression involving constants and attributes in the schema of E.
  • 52.
    Generalized Projection • Generalizedprojection also permits operations on other data types, such as concatenation of strings.
  • 53.
    Aggregation • The secondextended relational-algebra operation is the aggregate operation , • Permits the use of aggregate functions such as min or average, on sets of values. • Aggregate functions take a collection of values and return a single value as a result. – Sum, Average, Count, Min, Max
  • 54.
    The collections onwhich aggregate functions operate can have multiple occurrences of a value ; the order in which the values appear is not relevant. Such collections are called multisets . Sets are a special case of multisets where there is only one copy of each element. Find out the sum of salaries of all instructors The result of the expression is a relation with a single attribute, containing a single row with a numerical value corresponding to the sum of the
  • 55.
    Aggregation • There arecases where we must eliminate multiple occurrences of a value before computing an aggregate function. • If we do want to eliminate duplicates, – use the same function names – with the addition of the hyphenated string “distinct” appended to the end of the function name – for example, count-distinct
  • 56.
    Find the totalnumber of instructors who teach a course in the Spring 2010 semester. The aggregate function “count-distinct” ensures that even if an instructor teaches ore than one course, she is counted only once in the result.
  • 57.
    To apply theaggregate function not to a single set of tuples, but instead to a group of sets of tuples. Find the average salary in each department Tuples of the instructor relation, grouped by the dept name attribute The result relation for the query “Find the average salary in each
  • 58.
  • 59.
    The Tuple RelationalCalculus • When we write a relational-algebra expression, we provide a sequence of procedures that generates the answer to our query. • The tuple relational calculus is a nonprocedural query language. • It describes the desired information without giving a specific procedure for obtaining that information. • A query in the tuple relational calculus is expressed as
  • 60.
    The Tuple RelationalCalculus • It is the set of all tuples “t” such that predicate “P” is true for “t”. • Following our earlier notation, we use t[A] to denote the value of tuple “t” on attribute A, and we use t ∈ r to denote that tuple t is in relation r. Find the ID, name, dept name, salary for instructors whose salary is greater than $80,000
  • 61.
    Pitfalls in RelationalDatabase Design
  • 62.
    • Relational databasedesign requires that we find a “good” collection of relation schemas. • A bad design may lead to – Repetition of information. – Inability to represent certain information. • Design Goals: – Avoid redundant data – Ensure that relationships among attributes are represented – Facilitate the checking of updates for violation of database integrity constraints
  • 63.
    Example • Consider therelation schema: • Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
  • 64.
    Example • Redundancy: – Datafor branch-name, branch-city, assets are repeated for each loan that a branch makes – Wastes space and complicates updating • Null values – Cannot store information about a branch if no loans exist – Can use null values, but they are difficult to handle
  • 65.
    Decomposition • Decompose therelation schema Lending-schema into: – Branch-schema = (branch-name, branch-city, assets) – Loan-info-schema = (customer-name, loan-number, branch-name, amount) • All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1 U R2 • Lossless-join decomposition. For all possible relations r on schema R r = ΠR1 (r) R2 (r)
  • 66.
    Goal • Decide whethera particular relation R is in “good” form. • In the case that a relation R is not in “good” form, – decompose it into a set of relations {R1 , R2 , ..., Rn } such that each relation is in good form • The decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies • multivalued dependencies
  • 67.
    Decomposition ▪ The onlyway to avoid the repetition-of-information problem in the in_dep schema is to decompose it into two schemas – instructor and department schemas. ▪ Not all decompositions are good. Suppose we decompose employee(ID, name, street, city, salary) into employee1 (ID, name) employee2 (name, street, city, salary) The problem arises when we have two employees with the same name ▪ The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition @ V.V.R 67
  • 68.
  • 69.
    Lossless Decomposition ▪ LetR be a relation schema and let R1 and R2 form a decomposition of R . That is R = R1 U R2 ▪ We say that the decomposition is a lossless decomposition if there is no loss of information by replacing R with the two relation schemas R1 U R2 ▪ Formally, ∏ R1 (r) ∏ R2 (r) = r ▪ And, conversely a decomposition is lossy if r ⊂ ∏ R1 (r) ∏ R2 (r) = r @ V.V.R 69
  • 70.
    Examples of Lossless Decomposition ▪Decomposition of R = (A, B, C) R1 = (A, B) R2 = (B, C) @ V.V.R 70
  • 71.
    Normalization theory ▪ Decidewhether a particular relation R is in “good” form. ▪ In the case that a relation R is not in “good” form, decompose it into set of relations {R1 , R2 , ..., Rn } such that • Each relation is in good form • The decomposition is a lossless decomposition ▪ Our theory is based on: • Functional dependencies • Multivalued dependencies @ V.V.R 71
  • 72.
    Normalization The process ofnormalization is a formal method that identifies relational schemas based upon their primary or candidate keys and the functional dependencies that exists amongst their attributes. Normalization is primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. Normalization is the process of decomposing relation with anomalies to produce smaller, well-structured relations. Normalisation should remove redundancy, but not at the expense of data integrity. @ V.V.R 72
  • 73.
    Transforming data froma problem into relations while ensuring data integrity and eliminating data redundancy. ■ Data integrity : consistent and satisfies data constraint rules ■ Data redundancy: if data can be found in two places in a single database (direct redundancy) or calculated using data from different parts of the database (indirect redundancy) then redundancy exists. Normalisation should remove redundancy, but not at the expense of data integrity. @ V.V.R 73
  • 74.
    First Normal form ▪Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: ▪ Set of names, composite attributes ▪ Identification numbers like CS101 that can be broken up into parts ▪ A relational schema R is in first normal form if the domains of all attributes of R are atomic ▪ Non-atomic values complicate storage and encourage redundant (repeated) storage of data • Example: Set of accounts stored with each customer, and set of owners stored with each account @ V.V.R 74
  • 75.
    First normal form(1NF) deals with the `shape' of the record. A relation is in 1NF if, and only if, it contains no repeating attributes or groups of attributes. Example: ■ The Student table with the repeating group is not in 1NF ■ It has repeating groups, it is an `unnormalised table'. To remove the repeating group, either: ■ flatten the table and extend the key, or ■ decompose the relation- leading to First Normal Form @ V.V.R 75
  • 76.
    Flatten Table andExtend Primary Key The Student table with the repeating group can be written as: Student(matric_no, name, date_of_birth, ( subject, grade ) ) If the repeating group was flattened, it would look something like: Student(matric_no, name, date_of_birth, subject, grade ) This does not have repeating groups, but has redundancy. For every matric_no/subject combination, the student name and date of birth is replicated. This can lead to errors. Redundant data is the main cause of insertion, deletion, and updating anomalies. ■ Insertion anomaly – Subject is now in the primary key, we cannot add a student until they have at least one subject. Remember, no part of a primary key can be NULL. ■ Update anomaly – changing the name of a student means finding all rows of the database where that student exists and changing each one separately. ■ Deletion anomaly- for example deleting all database subject information also deletes student 960145. @ V.V.R 76
  • 77.
    Decomposing Relation The alternativeapproach is to split the table into two parts, one for the repeating groups and one of the non-repeating groups. The primary key for the original relation is included in both of the new relations Record: Student @ V.V.R 77
  • 78.
    We now havetwo relations, Student and Record. ■ Student contains the original non-repeating groups ■ Record has the original repeating groups and the matric_no Student(matric_no, name, date_of_birth ) Record(matric_no, subject, grade ) This version of the relations does not have insertion, deletion, or update anomalies. Without repeating groups, we say the relations are in First Normal Form (1NF). @ V.V.R 78
  • 79.
    Second Normal Form Arelation is in 2NF if, and only if, it is in 1NF and every non-key attribute is fully functionally dependent on the whole key. Thus the relation is in 1NF with no repeating groups, and all non-key attributes must depend on the whole key, not just some part of it. Another way of saying this is that there must be no partial key dependencies (PKDs). The problems arise when there is a compound key, e.g. the key to the Record relation - matric_no, subject. In this case it is possible for non-key attributes to depend on only part of the key - i.e. on only one of the two key attributes. This is what 2NF tries to prevent. @ V.V.R 79
  • 80.
    Consider again theStudent relation from the flattened table: Student(matric_no, name, date_of_birth, subject, grade ) There are no repeating groups, so the relation is in 1NF However, we have a compound primary key - so we must check all of the non-key attributes against each part of the key to ensure they are functionally dependent on it. ■ matric_no determines name and date_of_birth, but not grade. ■ subject together with matric_no determines grade, but not name or date_of_birth. So there is a problem with potential redundancies @ V.V.R 80
  • 81.
    Dependency Diagram A dependencydiagram is used to show how non-key attributes relate to each part or combination of parts in the primary key. matric_no grade subject date_of_bith name Student Fully Dependent PKD
  • 82.
    This relation isnot in 2NF – It appears to be two tables squashed into one. – the solutions is to split the relation into component parts. separate out all the attributes that are solely dependent on matric_no - put them in a new Student_details relation, with matric_no as the primary key separate out all the attributes that are solely dependent on subject - in this case no attributes are solely dependent on subject. separate out all the attributes that are solely dependent on matric_no + subject - put them into a separate Student relation, keyed on matric_no + subject Student Details matrix_no name date_of_birth Student matrix_no subject grade All attributes in each relation are fully functionally dependent upon its primary key These relations are now in 2NF
  • 83.
    Third Normal Form 3NFis an even stricter normal form and removes virtually all the redundant data : A relation is in 3NF if, and only if, it is in 2NF and there are no transitive functional dependencies Transitive functional dependencies arise: – when one non-key attribute is functionally dependent on another non-key attribute: • FD: non-key attribute -> non-key attribute – and when there is redundancy in the database By definition transitive functional dependency can only occur if there is more than one non-key field, so we can say that a relation in 2NF with zero or one non-key field must automatically be in 3NF
  • 84.
    project_no manager address Projecthas more than one non-key field so we must check for transitive dependency: p1 Black,B 32 High Street p2 Smith,J 11 New Street p3 Black,B 32 High Street p4 Black,B 32 High Street
  • 85.
    Address depends onthe value of manager. From the table we can propose: Project(project_no, manager, address) manager -> address In this case address is transitively dependent on manager. The primary key is project_no, but the LHS and RHS have no reference to this key, yet both sides are present in the relation. Data redundancy arises from this – we duplicate address if a manager is in charge of more than one project – causes problems if we had to change the address- have to change several entries, and this could lead to errors.
  • 86.
    Eliminate transitive functionaldependency by splitting the table – create two relations - one with the transitive dependency in it, and another for all of the remaining attributes. – split Project into Project and Manager. the determinant attribute becomes the primary key in the new relation - manager becomes the primary key to the Manager relation the original key is the primary key to the remaining non-transitive attributes - in this case, project_no remains the key to the new Projects table.
  • 87.
    Now we needto store the address only once If we need to know a manager's address we can look it up in the Manager relation The manager attribute is the link between the two tables, and in the Projects table it is now a foreign key. These relations are now in third normal form. Project project_no manager p1 Black,B p2 Smith,J p3 Black,B p4 Black,B Manager manager address Black,B 32 High Street Smith,J 11 New Street
  • 88.
    BOYCE CODD NORMALFORM • Many practitioners argue that placing entities in 3NF is generally sufficient because it is rare that entities that are in 3NF are not also in 4NF and 5NF – The advanced forms of normalization are: • Boyce-Codd Normal Form • Fourth Normal Form • Fifth Normal Form
  • 89.
    BOYCE CODD NORMALFORM • Boyce-Codd normal form (BCNF) is a more rigorous version of 3NF. • BCNF is based on the concept of determinants. A determinant column is one on which some of the columns are fully functionally dependent. • A relational table is in BCNF if and only if every determinant is a candidate key.
  • 90.
    Example 1 -Address (Not in BCNF) Scheme {City, Street, ZipCode } 1. Key1 {City, Street } 2. Key2 {ZipCode, Street} 3. No non-key attribute hence 3NF 4. {City, Street} {ZipCode} 5. {ZipCode} {City} 6. Dependency between attributes belonging to a key Boyce-Codd Normal Form (BCNF) Examples
  • 91.
    Example 2 -Movie (Not in BCNF) Scheme {MovieTitle, MovieID, PersonName, Role, Payment } 1. Key1 {MovieTitle, PersonName} 2. Key2 {MovieID, PersonName} 3. Both role and payment functionally depend on both candidate keys thus 3NF 4. {MovieID} {MovieTitle} 5. Dependency between MovieID & MovieTitle Violates BCNF Example 3 - Consulting (Not in BCNF) Scheme {Client, Problem, Consultant} 6. Key1 {Client, Problem} 7. Key2 {Client, Consultant} 8. No non-key attribute hence 3NF 9. {Client, Problem} {Consultant} 10. {Client, Consultant} {Problem} 11. Dependency between attributess belonging to keys violates BCNF Boyce Codd Normal Form (BCNF)
  • 92.
    1. Place thetwo candidate primary keys in separate entities 2. Place each of the remaining data items in one of the resulting entities according to its dependency on the primary key. Example 1 (Convert to BCNF) Old Scheme {City, Street, ZipCode } New Scheme1 {ZipCode, Street} New Scheme2 {City, Street} • Loss of relation {ZipCode} {City} Alternate New Scheme1 {ZipCode, Street } Alternate New Scheme2 {ZipCode, City} BCNF - Decomposition
  • 93.
    1. If decompositiondoes not cause any loss of information it is called a lossless decomposition. 2. If a decomposition does not cause any dependencies to be lost it is called a dependency-preserving decomposition. 3. Any table scheme can be decomposed in a lossless way into a collection of smaller schemas that are in BCNF form. However the dependency preservation is not guaranteed. 4. Any table can be decomposed in a lossless way into 3rd normal form that also preserves the dependencies. • 3NF may be better than BCNF in some cases Decomposition – Loss of Information Use your own judgment when decomposing schemas
  • 94.
    Example 2 (Convertto BCNF) Old Scheme {MovieTitle, MovieID, PersonName, Role, Payment } New Scheme {MovieID, PersonName, Role, Payment} New Scheme {MovieTitle, PersonName} • Loss of relation {MovieID} {MovieTitle} New Scheme {MovieID, PersonName, Role, Payment} New Scheme {MovieID, MovieTitle} • We got the {MovieID} {MovieTitle} relationship back Example 3 (Convert to BCNF) Old Scheme {Client, Problem, Consultant} New Scheme {Client, Consultant} New Scheme {Client, Problem} BCNF - Decomposition
  • 95.
    • Fourth normalform eliminates independent many-to-one relationships between columns. • To be in Fourth Normal Form, – a relation must first be in Boyce-Codd Normal Form. – a given relation may not contain more than one multi-valued attribute. Example (Not in 4NF) Scheme {MovieName, ScreeningCity, Genre) Primary Key: {MovieName, ScreeningCity, Genre) 1. All columns are a part of the only candidate key, hence BCNF 2. Many Movies can have the same Genre 3. Many Cities can have the same movie 4. Violates 4NF Fourth Normal Form (4NF) Movie ScreeningCity Genre Hard Code Los Angles Comedy Hard Code New York Comedy Bill Durham Santa Cruz Drama Bill Durham Durham Drama The Code Warrier New York Horror
  • 96.
    Example 2 (Notin 4NF) Scheme {Manager, Child, Employee} 1. Primary Key {Manager, Child, Employee} 2. Each manager can have more than one child 3. Each manager can supervise more than one employee 4. 4NF Violated Example 3 (Not in 4NF) Scheme {Employee, Skill, ForeignLanguage} 5. Primary Key {Employee, Skill, Language } 6. Each employee can speak multiple languages 7. Each employee can have multiple skills 8. Thus violates 4NF Fourth Normal Form (4NF) Manager Child Employee Jim Beth Alice Mary Bob Jane Mary NULL Adam Employee Skill Language 1234 Cooking French 1234 Cooking German 1453 Carpentry Spanish 1453 Cooking Spanish 2345 Cooking Spanish
  • 97.
    1. Move thetwo multi-valued relations to separate tables 2. Identify a primary key for each of the new entity. Example 1 (Convert to 3NF) Old Scheme {MovieName, ScreeningCity, Genre} New Scheme {MovieName, ScreeningCity} New Scheme {MovieName, Genre} 4NF - Decomposition Movie Genre Hard Code Comedy Bill Durham Drama The Code Warrier Horror Movie ScreeningCity Hard Code Los Angles Hard Code New York Bill Durham Santa Cruz Bill Durham Durham The Code Warrier New York
  • 98.
    Example 2 (Convertto 4NF) Old Scheme {Manager, Child, Employee} New Scheme {Manager, Child} New Scheme {Manager, Employee} Example 3 (Convert to 4NF) Old Scheme {Employee, Skill, ForeignLanguage} New Scheme {Employee, Skill} New Scheme {Employee, ForeignLanguage} 4NF - Decomposition Manager Child Jim Beth Mary Bob Manager Employee Jim Alice Mary Jane Mary Adam Employee Language 1234 French 1234 German 1453 Spanish 2345 Spanish Employee Skill 1234 Cooking 1453 Carpentry 1453 Cooking 2345 Cooking
  • 99.
    Multivalued Dependencies and FourthNormal Form Definition: • A multivalued dependency (MVD) X —>> Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: If two tuples t1 and t2 exist in r such that t1 [X] = t2 [X], then two tuples t3 and t4 should also exist in r with the following properties, where we use Z to denote (R 2 (X υ Y)): – t3 [X] = t4 [X] = t1 [X] = t2 [X]. – t3 [Y] = t1 [Y] and t4 [Y] = t2 [Y]. – t3 [Z] = t2 [Z] and t4 [Z] = t1 [Z]. • An MVD X —>> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X υ Y = R.
  • 100.
    Multivalued Dependencies and FourthNormal Form Definition: A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X —>> Y in F+ , X is a superkey for R. – Note: F+ is the (complete) set of all dependencies (functional or multivalued) that will hold in every relation state r of R that satisfies F. It is also called the closure of F.
  • 101.
    Join Dependency What isjoin dependency? • If a table can be recreated by joining multiple tables and each of this table have a subset of the attributes of the table, then the table is in Join Dependency. • It is a generalization of Multivalued Dependency • If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists. • Where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relations R (A, B, C, D). • Alternatively, R1 and R2 are a lossless decomposition of R. • A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition. • The *(A, B, C, D), (C, D) will be a JD of R if the join of join's attribute is equal to the relation R. • Here, *(R1, R2, R3) is used to indicate that relation R1, R2, R3 and so on are a JD of R.
  • 102.
    Example: <employee> Empname EmpSkillsEmpJob (Assigned Work) Tom Networking EJ001 Harry Web Development EJ002 Katie Programming EJ002 Emp Name EmpSkills Tom Networking Harry Web Development Katie Programming The table can be decomposed into 3 tables <EmployeeSkills>
  • 103.
    EmpName EmpJob Tom EJ001 HarryEJ002 Katie EJ002 <EmployeeJob> <JobSkills> EmpSkills EmpJob Networking EJ001 Web Development EJ002 Programming EJ002 Join dependency {(EmpName, EmpSkills ), ( EmpName, EmpJob), (EmpSkills, EmpJob)}
  • 104.
    Fifth Normal Form-5NF •A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless. • 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy. • 5NF is also known as Project-join normal form (PJ/NF). • Example: SUBJECT LECTURER SEMESTER Computer Anshika Semester 1 Computer John Semester1 Math John Semester 1 Math Akash Semester 2 Chemistry Praveen Semester 1
  • 105.
    5NF- Continued • Inthe above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. In this case, combination of all these fields required to identify a valid data. • Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we can't leave other two columns blank. • So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3.
  • 106.
    SEMESTER SUBJECT Semester 1Computer Semester 1 Math Semester 1 Chemistry Semester 2 Math 5NF-Continued P1 SUBJECT LECTURER Computer Anshika Computer John Math John Math Akash Chemistry Praveen P2 P3 SEMESTER LECTURER Semester 1 Anshika Semester 1 John Semester 1 John Semester 2 Akash Semester 1 Praveen