Translating SQL queriesinto Relational Algebra
Relational Algebra
The basic set of operations for the relational model is known
as the relational algebra. These operations enable a user to
specify basic retrieval requests.
The result of a retrieval is a new relation, which may have
been formed from one or more relations.
The relational algebra can be divided into two groups.
set operations from mathematical set theory
Select and Project operations developed specifically for
relational databases
3
4.
Unary Relational Operations
SELECT Operation
SELECT operation is used to select a subset of the tuples from a
relation that satisfy a selection condition.
Denoted by <selection condition>(R)
Example: To select the EMPLOYEE tuples whose department number is
four
4
5.
Cont…
SELECT OperationProperties
The SELECT operation <selection condition>(R) produces a relation S that
has the same schema as R
The SELECT operation is commutative; i.e.,
<condition1>(< condition2> ( R)) = <condition2> (< condition1> ( R))
A cascaded SELECT operation may be replaced by a
single selection with a conjunction of all the conditions; i.e.,
<condition1>(< condition2> (<condition3> ( R))
= <condition1> AND < condition2> AND < condition3> ( R)))
5
6.
Unary Relational Operations(cont.)
PROJECT Operation
This operation selects certain columns from the table and discards
the other columns.
Denoted by <attribute list>(R)
The project operation removes any duplicate tuples,
Example: To list each employee’s first and last name and salary,
6
7.
Cont…
PROJECT OperationProperties
The number of tuples in the result of projection <list> (R)is
always less or equal to the number of tuples in R.
If the list of attributes includes a key of R, then the number of
tuples is equal to the number of tuples in R.
7
8.
Unary Relational Operations(cont.)
Rename Operation
We may want to apply several relational algebra operations one
after the other. Either we can write the operations as a single
relational algebra expression by nesting the operations, or we can
apply one operation at a time and create intermediate result
relations.
Example: To retrieve the first name, last name, and salary of all
employees who work in department number 5, we must apply a
select and a project operation.
8
9.
Employee
Fname Lname SSNBdate Address Sex Salary Supssn Dno
Fasika Hailu 333 1955-12-08 Goro F 4000 987 5
Jemal Kedir 987 1941-06-20 Sabian M 4300 333 4
Abel Moti 654 1947-06-14 Sabian M 6800 987 4
Sara Girma 125 1949-09-12 Sabian F 7800 146 2
Hussen Kifle 146 1943-02-20 Goro M 2300 333 3
…… ….. ….. …….. ….. ……. ……. …..
Robel Tura 667 1962-09-15 Kezira M 3800 333 5
9
10.
Exercise: On unaryoperations
To select the EMPLOYEE tuples whose salary is 6500
Retrieve the name and salary of all employees who
work for department 3.
To select the EMPLOYEE address and sex
Retrieve sex, address and salary of female employees
10
11.
Quiz
11
Retrieve the address,birth date and salary of all female employees who
either earn salary less than 5000 or work for department 5.
Retrieve first name of employees in department 5 whose salary is
between 3000 and 4000.
12.
Relational Algebra OperationsFrom
Set Theory
UNION Operation
Denoted by R S, is a relation that includes all tuples that are either in
R or in S or in both R and S. Duplicate tuples are eliminated.
Example: To retrieve the social security numbers of all employees who either
work in department 5 or directly supervise an employee who works in
department 5, we can use the union operation as follows:
DEP5_EMPS DNO=5 (EMPLOYEE)
RESULT1 SSN(DEP5_EMPS)
RESULT2(SSN) SUPERSSN(DEP5_EMPS)
RESULT RESULT1 RESULT2
The union operation produces the tuples that are in either RESULT1 or
RESULT2 or both. The two operands must be “type compatible”.
12
13.
Cont…
Type Compatibility
The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn)
must have the same number of attributes, and the domains of
corresponding attributes must be compatible; that is,
dom(Ai)=dom(Bi) for i=1, 2, ..., n.
The resulting relation for R1R2,R1 R2, or R1-R2 has the
same attribute names as the first operand relation R1 (by
convention).
13
Cont…
INTERSECTION OPERATION
Denotedby R S, is a relation that includes all tuples that are in both R
and S. The two operands must be "type compatible"
Example: The result of the intersection operation (figure below) includes only
those who are both students and instructors.
STUDENT INSTRUCTOR
15
16.
Cont…
Set Difference(or MINUS) Operation
Denoted by R - S, is a relation that includes all tuples that are in R but not
in S. The two operands must be "type compatible”.
Example: The figure shows the names of students who are not instructors,
and the names of instructors who are not students.
STUDENT-INSTRUCTOR INSTRUCTOR-STUDENT
17.
Cont…
Notice thatboth union and intersection are commutative
operations; that is
R S = S R, and R S = S R
Both union and intersection can be treated as n-arry operations
applicable to any number of relations as both are associative
operations; that is
R (S T) = (R S) T, and (R S) T = R (S T)
The minus operation is not commutative; that is, in general
R - S ≠ S – R
17
18.
Cont...
CARTESIAN (orcross product) Operation
This operation is used to combine tuples from two relations in a
combinatorial fashion. In general, the result of R(A1, A2, . . ., An)
x S(B1, B2, . . ., Bm) is a relation Q with degree n + m attributes
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order. The resulting
relation Q has one tuple for each combination of tuples-one
from R and one from S.
Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS
tuples, then
| R x S | will have nR * nS tuples.
The two operands do NOT have to be "type compatible”
Example:
FEMALE_EMPS SEX=’F’(EMPLOYEE)
EMPNAMES FNAME, LNAME, SSN (FEMALE_EMPS)
18
Binary Relational Operations
JOIN Operation
The sequence of cartesian product followed by select is used
quite commonly to identify and select related tuples from two
relations, a special operation, called JOIN. It is denoted by a
This operation is very important for any relational database with
more than a single relation, because it allows us to process
relationships among relations.
The general form of a join operation on two relations R(A1, A2, . .
., An) and S(B1, B2, . . ., Bm) is:
R <join condition>S
where R and S can be any relations that result from general
relational algebra expressions.
20
21.
Cont…
Example: Suppose thatwe want to retrieve the name of the
manager of each department. To get the manager’s name, we
need to combine each DEPARTMENT tuple with the EMPLOYEE
tuple whose SSN value matches the MGRSSN value in the
department tuple. We do this by using the join operation.
DEPT_MGR DEPARTMENT MGRSSN=SSN
EMPLOYEE
21
22.
What isQuery Processing?
transform query written in high-level language, into correct and
efficient execution strategy expressed in low-level language.
What is Query Optimization?
The activity of choosing a single “efficient” execution strategy
(from hundreds) as determined by database catalog statistics.
Which relational algebra expression, equivalent to the given
query, will lead to the most efficient solution plan?
How do operations pass data (main memory buffer, disk buffer,
…)?
Will this plan minimize resource usage? (CPU/Response
Time/Disk)
22
23.
Find all Managerswho work at a London branch.
SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = ‘Manager’ AND b.city = ‘London’);
23
Example:
• Three equivalent Relational Algebra queries are:
(1) (position='Manager') (city='London')
(Staff.branchNo=Branch.branchNo) (Staff X Branch)
(2) (position='Manager') (city='London')(
Staff Staff.branchNo=Branch.branchNo Branch)
(3) (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo
24.
Cont…
07/30/2025
Query Processing andOptimiztion
24
Assume:
1000 tuples in Staff; 50 tuples in Branch;
50 Managers; 5 London branches;
no indexes or sort keys;
results of any intermediate operations stored on disk;
cost of the final write is ignored;
tuples are accessed one at a time.
25.
Analysis of eachQuery Expression
07/30/2025
Query Processing and Optimiztion
25
Analysis 1:
i. read each tuple from the two relations n+m reads
ii. create a table of the Cartesian product nXm writes
iii. test each tuple of step 2nXm read
Total No. of Disk access: 2(nXm) +n+m
Analysis 2:
i. read each tuple from the two relations n+m reads
ii. create a table of the Join n writes
iii. test each tuple of step 2n read
Total No. of Disk access: 3(n) +m
Analysis 3:
i. read each tuple from the two relations n+m reads
ii. create a table of the Join n writes
iii. test each tuple of step 2n read
Total No. of Disk access: 3(n) +m
26.
Cost Comparison
07/30/2025
Query Processingand Optimiztion
26
Cost (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
Cartesian product and join operations much more
expensive than selection, and third option significantly
reduces size of relations being joined together.
27.
Query Processing Steps
1.Query Decomposition
It is the process of transforming a high level query into a relational
algebra query, and to check that the query is syntactically and
semantically correct. It Consists of parsing and validation
27
• Processing can be divided into :Decomposition, Optimization , and
Execution ,Code generation
28.
Typical stages inquery decomposition are:
Analysis: lexical and syntactical analysis of the
query(correctness) based on attributes, data type.. ,. Query
tree will be built for the query containing leaf node for base
relations, one or many non-leaf nodes for relations
produced by relational algebra operations and root node for
the result of the query. Sequence of operation is from the
leaves to the root.(SELECT * FROM Catalog c ,Author a
Where a.authorid = c.authorid AND c.price>200 AND
a.country= ‘ USA’ )
Normalization: convert the query into a normalized form.
The predicate WHERE will be converted to Conjunctive () or
Disjunctive (V ) Normal form.
28
29.
Cont…
Semantic Analysis: toreject normalized queries that are
not correctly formulated or contradictory. Incorrect if
components do not contribute to generate result.
Contradictory if the predicate can not be satisfied by any
tuple. Say for example,(Catalog =“BS” Catalog= “CS”)
since a given book can only be classified in either of the
category at a time
Simplification: to detect redundant qualifications,
eliminate common sub-expressions, and transform the
query to a semantically equivalent but more easily and
effectively computed form. For example, If a user don’t
29
30.
2. Query Optimization
Problem of query optimization is to find the sequence of steps
that produces the answer to user request in the most efficient
manner, given the database structure.
The performance of a query is affected by the tables or queries
that underlies the query and by the complexity of the query.
Given a request for data manipulation or retrieval, an
optimizer will choose an optimal plan for evaluating the
request from among the manifold alternative strategies. i.e.
there are many ways (access paths) for accessing desired file/
record.
hence ,DBMS is responsible to pick the best execution
strategy based on various considerations( Least amount of I/O and
CPU resources. )
30
31.
Approaches to QueryOptimization : Heuristics and Cost Function
A. Heuristics Approach
• Heuristics Approach uses the knowledge of the
characteristics of the relational algebra operations and the
relationship between the operators to optimize the query.
• Thus the heuristic approach of optimization will make use
of:
– Properties of individual operators
– Association between operators
– Query Tree: a graphical representation of the operators, relations, attributes and
predicates and processing sequence during query processing.
• It is composed of three main parts:
– The Leafs: the base relations used for processing the query/ extracting
the required information
– The Root: the final result/relation as an out put based on the operation
on the relations used for query processing
– Nodes: intermediate results or relations before reaching the final result.
• Sequence of execution of operation in a query tree will start from the leaves
and continues to the intermediate nodes and ends at the root.
31
32.
Using Heuristics inQuery Optimization
Process for heuristics optimization
1. The parser of a high-level query generates an initial internal
representation;
2. Apply heuristics rules to optimize the internal
representation.
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
The main heuristic is to apply first the operations that reduce
the size of intermediate results.
E.g. Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations.
Slide 15-
32
Query block: The basic unit that can be translated into the algebraic
operators and optimized.
A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if these are
part of the block.
Nested queries within a query are identified as separate query blocks.
33.
Example:
Forevery project located in ‘Stafford’, retrieve the
project number, the controlling department number and
the department manager’s last name, address and
birthdate.
SQL query:
Q2: SELECT
P.NUMBER,P.DNUM,E.LNAME,E.ADDRESS,
E.BDATE
FROM PROJECT AS P,DEPARTMENT
AS D, EMPLOYEE AS E
Slide 15-
33
• Query tree:
– A tree data structure that corresponds to a relational algebra expression.
It represents the input relations of the query as leaf nodes of the tree,
and represents the relational algebra operations as internal nodes.
• An execution of the query tree consists of executing an internal node
operation whenever its operands are available and then replacing that
internal node by the relation that results from executing the operation.
• Query graph:
– A graph data structure that corresponds to a relational calculus
expression. It does not indicate an order on which operations to
perform first. There is only a single graph corresponding to each query.
Heuristic Optimizationof Query Trees:
The same query could correspond to many different relational
algebra expressions — and hence many different query trees.
The task of heuristic optimization of query trees is to find a final
query tree that is efficient to execute.
Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’AND
PNMUBER=PNO AND ESSN=SSN
AND BDATE > ‘1957-12-31’;
Slide 15-
36