ASSESSMENT
QUIZ(2): 10%
LAB:15%
ASSIGNMENT:15%
MID:20%
FINAL:40%
ADVANCED DATABASE SYSTEM
Query Processing and Optimization
Chapter 1
Translating SQL queries into Relational Algebra
Relational Algebra
The basic set of operations for the relational model is known
as the relational algebra. These operations enable a user to
specify basic retrieval requests.
The result of a retrieval is a new relation, which may have
been formed from one or more relations.
The relational algebra can be divided into two groups.
 set operations from mathematical set theory
 Select and Project operations developed specifically for
relational databases
3
Unary Relational Operations
 SELECT Operation
SELECT operation is used to select a subset of the tuples from a
relation that satisfy a selection condition.

Denoted by <selection condition>(R)
Example: To select the EMPLOYEE tuples whose department number is
four
4
Cont…
 SELECT Operation Properties
 The SELECT operation <selection condition>(R) produces a relation S that
has the same schema as R
 The SELECT operation  is commutative; i.e.,
<condition1>(< condition2> ( R)) = <condition2> (< condition1> ( R))
 A cascaded SELECT operation may be replaced by a
single selection with a conjunction of all the conditions; i.e.,
<condition1>(< condition2> (<condition3> ( R))
= <condition1> AND < condition2> AND < condition3> ( R)))
5
Unary Relational Operations (cont.)
 PROJECT Operation
This operation selects certain columns from the table and discards
the other columns.

Denoted by <attribute list>(R)
 The project operation removes any duplicate tuples,
Example: To list each employee’s first and last name and salary,
6
Cont…
 PROJECT Operation Properties
 The number of tuples in the result of projection  <list> (R)is
always less or equal to the number of tuples in R.
 If the list of attributes includes a key of R, then the number of
tuples is equal to the number of tuples in R.
7
Unary Relational Operations (cont.)
 Rename Operation
We may want to apply several relational algebra operations one
after the other. Either we can write the operations as a single
relational algebra expression by nesting the operations, or we can
apply one operation at a time and create intermediate result
relations.
Example: To retrieve the first name, last name, and salary of all
employees who work in department number 5, we must apply a
select and a project operation.
8
Employee
Fname Lname SSN Bdate Address Sex Salary Supssn Dno
Fasika Hailu 333 1955-12-08 Goro F 4000 987 5
Jemal Kedir 987 1941-06-20 Sabian M 4300 333 4
Abel Moti 654 1947-06-14 Sabian M 6800 987 4
Sara Girma 125 1949-09-12 Sabian F 7800 146 2
Hussen Kifle 146 1943-02-20 Goro M 2300 333 3
…… ….. ….. …….. ….. ……. ……. …..
Robel Tura 667 1962-09-15 Kezira M 3800 333 5
9
Exercise: On unary operations
 To select the EMPLOYEE tuples whose salary is 6500
 Retrieve the name and salary of all employees who
work for department 3.
 To select the EMPLOYEE address and sex
 Retrieve sex, address and salary of female employees
10
Quiz
11
Retrieve the address, birth date and salary of all female employees who
either earn salary less than 5000 or work for department 5.
Retrieve first name of employees in department 5 whose salary is
between 3000 and 4000.
Relational Algebra Operations From
Set Theory
 UNION Operation
Denoted by R  S, is a relation that includes all tuples that are either in
R or in S or in both R and S. Duplicate tuples are eliminated.
Example: To retrieve the social security numbers of all employees who either
work in department 5 or directly supervise an employee who works in
department 5, we can use the union operation as follows:
DEP5_EMPS  DNO=5 (EMPLOYEE)
RESULT1   SSN(DEP5_EMPS)
RESULT2(SSN)   SUPERSSN(DEP5_EMPS)
RESULT  RESULT1  RESULT2
The union operation produces the tuples that are in either RESULT1 or
RESULT2 or both. The two operands must be “type compatible”.
12
Cont…
 Type Compatibility
 The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn)
must have the same number of attributes, and the domains of
corresponding attributes must be compatible; that is,
dom(Ai)=dom(Bi) for i=1, 2, ..., n.
 The resulting relation for R1R2,R1  R2, or R1-R2 has the
same attribute names as the first operand relation R1 (by
convention).
13
Cont…
 UNION Example
STUDENT  INSTRUCTOR
14
Cont…
 INTERSECTION OPERATION
Denoted by R  S, is a relation that includes all tuples that are in both R
and S. The two operands must be "type compatible"
Example: The result of the intersection operation (figure below) includes only
those who are both students and instructors.
STUDENT  INSTRUCTOR
15
Cont…
 Set Difference (or MINUS) Operation
Denoted by R - S, is a relation that includes all tuples that are in R but not
in S. The two operands must be "type compatible”.
Example: The figure shows the names of students who are not instructors,
and the names of instructors who are not students.
STUDENT-INSTRUCTOR INSTRUCTOR-STUDENT
Cont…
 Notice that both union and intersection are commutative
operations; that is
R  S = S  R, and R  S = S  R
 Both union and intersection can be treated as n-arry operations
applicable to any number of relations as both are associative
operations; that is
R  (S  T) = (R  S)  T, and (R  S)  T = R  (S  T)
 The minus operation is not commutative; that is, in general
R - S ≠ S – R
17
Cont...
 CARTESIAN (or cross product) Operation
 This operation is used to combine tuples from two relations in a
combinatorial fashion. In general, the result of R(A1, A2, . . ., An)
x S(B1, B2, . . ., Bm) is a relation Q with degree n + m attributes
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order. The resulting
relation Q has one tuple for each combination of tuples-one
from R and one from S.
 Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS
tuples, then
| R x S | will have nR * nS tuples.
 The two operands do NOT have to be "type compatible”
Example:
FEMALE_EMPS   SEX=’F’(EMPLOYEE)
EMPNAMES   FNAME, LNAME, SSN (FEMALE_EMPS)
18
19
Binary Relational Operations
 JOIN Operation
 The sequence of cartesian product followed by select is used
quite commonly to identify and select related tuples from two
relations, a special operation, called JOIN. It is denoted by a
 This operation is very important for any relational database with
more than a single relation, because it allows us to process
relationships among relations.
 The general form of a join operation on two relations R(A1, A2, . .
., An) and S(B1, B2, . . ., Bm) is:
R <join condition>S
where R and S can be any relations that result from general
relational algebra expressions.
20
Cont…
Example: Suppose that we want to retrieve the name of the
manager of each department. To get the manager’s name, we
need to combine each DEPARTMENT tuple with the EMPLOYEE
tuple whose SSN value matches the MGRSSN value in the
department tuple. We do this by using the join operation.
DEPT_MGR  DEPARTMENT MGRSSN=SSN
EMPLOYEE
21
 What is Query Processing?
 transform query written in high-level language, into correct and
efficient execution strategy expressed in low-level language.
 What is Query Optimization?
 The activity of choosing a single “efficient” execution strategy
(from hundreds) as determined by database catalog statistics.
 Which relational algebra expression, equivalent to the given
query, will lead to the most efficient solution plan?
 How do operations pass data (main memory buffer, disk buffer,
…)?
 Will this plan minimize resource usage? (CPU/Response
Time/Disk)
22
Find all Managers who work at a London branch.
SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = ‘Manager’ AND b.city = ‘London’);
23
Example:
• Three equivalent Relational Algebra queries are:
(1) (position='Manager')  (city='London') 
(Staff.branchNo=Branch.branchNo) (Staff X Branch)
(2) (position='Manager')  (city='London')(
Staff Staff.branchNo=Branch.branchNo Branch)
(3) (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo
Cont…
07/30/2025
Query Processing and Optimiztion
24
 Assume:
 1000 tuples in Staff; 50 tuples in Branch;
 50 Managers; 5 London branches;
 no indexes or sort keys;
 results of any intermediate operations stored on disk;
 cost of the final write is ignored;
 tuples are accessed one at a time.
Analysis of each Query Expression
07/30/2025
Query Processing and Optimiztion
25
Analysis 1:
i. read each tuple from the two relations n+m reads
ii. create a table of the Cartesian product nXm writes
iii. test each tuple of step 2nXm read
Total No. of Disk access: 2(nXm) +n+m
Analysis 2:
i. read each tuple from the two relations n+m reads
ii. create a table of the Join n writes
iii. test each tuple of step 2n read
Total No. of Disk access: 3(n) +m
Analysis 3:
i. read each tuple from the two relations n+m reads
ii. create a table of the Join n writes
iii. test each tuple of step 2n read
Total No. of Disk access: 3(n) +m
Cost Comparison
07/30/2025
Query Processing and Optimiztion
26
 Cost (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
 Cartesian product and join operations much more
expensive than selection, and third option significantly
reduces size of relations being joined together.
Query Processing Steps
1. Query Decomposition
 It is the process of transforming a high level query into a relational
algebra query, and to check that the query is syntactically and
semantically correct. It Consists of parsing and validation
27
• Processing can be divided into :Decomposition, Optimization , and
Execution ,Code generation
Typical stages in query decomposition are:
Analysis: lexical and syntactical analysis of the
query(correctness) based on attributes, data type.. ,. Query
tree will be built for the query containing leaf node for base
relations, one or many non-leaf nodes for relations
produced by relational algebra operations and root node for
the result of the query. Sequence of operation is from the
leaves to the root.(SELECT * FROM Catalog c ,Author a
Where a.authorid = c.authorid AND c.price>200 AND
a.country= ‘ USA’ )
Normalization: convert the query into a normalized form.
The predicate WHERE will be converted to Conjunctive () or
Disjunctive (V ) Normal form.
28
Cont…
Semantic Analysis: to reject normalized queries that are
not correctly formulated or contradictory. Incorrect if
components do not contribute to generate result.
Contradictory if the predicate can not be satisfied by any
tuple. Say for example,(Catalog =“BS”  Catalog= “CS”)
since a given book can only be classified in either of the
category at a time
Simplification: to detect redundant qualifications,
eliminate common sub-expressions, and transform the
query to a semantically equivalent but more easily and
effectively computed form. For example, If a user don’t
29
2. Query Optimization
 Problem of query optimization is to find the sequence of steps
that produces the answer to user request in the most efficient
manner, given the database structure.
 The performance of a query is affected by the tables or queries
that underlies the query and by the complexity of the query.
 Given a request for data manipulation or retrieval, an
optimizer will choose an optimal plan for evaluating the
request from among the manifold alternative strategies. i.e.
there are many ways (access paths) for accessing desired file/
record.
 hence ,DBMS is responsible to pick the best execution
strategy based on various considerations( Least amount of I/O and
CPU resources. )
30
Approaches to Query Optimization : Heuristics and Cost Function
A. Heuristics Approach
• Heuristics Approach uses the knowledge of the
characteristics of the relational algebra operations and the
relationship between the operators to optimize the query.
• Thus the heuristic approach of optimization will make use
of:
– Properties of individual operators
– Association between operators
– Query Tree: a graphical representation of the operators, relations, attributes and
predicates and processing sequence during query processing.
• It is composed of three main parts:
– The Leafs: the base relations used for processing the query/ extracting
the required information
– The Root: the final result/relation as an out put based on the operation
on the relations used for query processing
– Nodes: intermediate results or relations before reaching the final result.
• Sequence of execution of operation in a query tree will start from the leaves
and continues to the intermediate nodes and ends at the root.
31
Using Heuristics in Query Optimization
 Process for heuristics optimization
1. The parser of a high-level query generates an initial internal
representation;
2. Apply heuristics rules to optimize the internal
representation.
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
 The main heuristic is to apply first the operations that reduce
the size of intermediate results.
 E.g. Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations.
Slide 15-
32
 Query block: The basic unit that can be translated into the algebraic
operators and optimized.
 A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if these are
part of the block.
 Nested queries within a query are identified as separate query blocks.
 Example:
 For every project located in ‘Stafford’, retrieve the
project number, the controlling department number and
the department manager’s last name, address and
birthdate.
 SQL query:
Q2: SELECT
P.NUMBER,P.DNUM,E.LNAME,E.ADDRESS,
E.BDATE
FROM PROJECT AS P,DEPARTMENT
AS D, EMPLOYEE AS E
Slide 15-
33
• Query tree:
– A tree data structure that corresponds to a relational algebra expression.
It represents the input relations of the query as leaf nodes of the tree,
and represents the relational algebra operations as internal nodes.
• An execution of the query tree consists of executing an internal node
operation whenever its operands are available and then replacing that
internal node by the relation that results from executing the operation.
• Query graph:
– A graph data structure that corresponds to a relational calculus
expression. It does not indicate an order on which operations to
perform first. There is only a single graph corresponding to each query.
Slide 15-
34
Slide 15-
35
 Heuristic Optimization of Query Trees:
 The same query could correspond to many different relational
algebra expressions — and hence many different query trees.
 The task of heuristic optimization of query trees is to find a final
query tree that is efficient to execute.
 Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’AND
PNMUBER=PNO AND ESSN=SSN
AND BDATE > ‘1957-12-31’;
Slide 15-
36
Slide 15-
37
Slide 15-
38

Query and optimizing operating system.pptx

  • 1.
  • 2.
    ADVANCED DATABASE SYSTEM QueryProcessing and Optimization Chapter 1
  • 3.
    Translating SQL queriesinto Relational Algebra Relational Algebra The basic set of operations for the relational model is known as the relational algebra. These operations enable a user to specify basic retrieval requests. The result of a retrieval is a new relation, which may have been formed from one or more relations. The relational algebra can be divided into two groups.  set operations from mathematical set theory  Select and Project operations developed specifically for relational databases 3
  • 4.
    Unary Relational Operations SELECT Operation SELECT operation is used to select a subset of the tuples from a relation that satisfy a selection condition.  Denoted by <selection condition>(R) Example: To select the EMPLOYEE tuples whose department number is four 4
  • 5.
    Cont…  SELECT OperationProperties  The SELECT operation <selection condition>(R) produces a relation S that has the same schema as R  The SELECT operation  is commutative; i.e., <condition1>(< condition2> ( R)) = <condition2> (< condition1> ( R))  A cascaded SELECT operation may be replaced by a single selection with a conjunction of all the conditions; i.e., <condition1>(< condition2> (<condition3> ( R)) = <condition1> AND < condition2> AND < condition3> ( R))) 5
  • 6.
    Unary Relational Operations(cont.)  PROJECT Operation This operation selects certain columns from the table and discards the other columns.  Denoted by <attribute list>(R)  The project operation removes any duplicate tuples, Example: To list each employee’s first and last name and salary, 6
  • 7.
    Cont…  PROJECT OperationProperties  The number of tuples in the result of projection  <list> (R)is always less or equal to the number of tuples in R.  If the list of attributes includes a key of R, then the number of tuples is equal to the number of tuples in R. 7
  • 8.
    Unary Relational Operations(cont.)  Rename Operation We may want to apply several relational algebra operations one after the other. Either we can write the operations as a single relational algebra expression by nesting the operations, or we can apply one operation at a time and create intermediate result relations. Example: To retrieve the first name, last name, and salary of all employees who work in department number 5, we must apply a select and a project operation. 8
  • 9.
    Employee Fname Lname SSNBdate Address Sex Salary Supssn Dno Fasika Hailu 333 1955-12-08 Goro F 4000 987 5 Jemal Kedir 987 1941-06-20 Sabian M 4300 333 4 Abel Moti 654 1947-06-14 Sabian M 6800 987 4 Sara Girma 125 1949-09-12 Sabian F 7800 146 2 Hussen Kifle 146 1943-02-20 Goro M 2300 333 3 …… ….. ….. …….. ….. ……. ……. ….. Robel Tura 667 1962-09-15 Kezira M 3800 333 5 9
  • 10.
    Exercise: On unaryoperations  To select the EMPLOYEE tuples whose salary is 6500  Retrieve the name and salary of all employees who work for department 3.  To select the EMPLOYEE address and sex  Retrieve sex, address and salary of female employees 10
  • 11.
    Quiz 11 Retrieve the address,birth date and salary of all female employees who either earn salary less than 5000 or work for department 5. Retrieve first name of employees in department 5 whose salary is between 3000 and 4000.
  • 12.
    Relational Algebra OperationsFrom Set Theory  UNION Operation Denoted by R  S, is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated. Example: To retrieve the social security numbers of all employees who either work in department 5 or directly supervise an employee who works in department 5, we can use the union operation as follows: DEP5_EMPS  DNO=5 (EMPLOYEE) RESULT1   SSN(DEP5_EMPS) RESULT2(SSN)   SUPERSSN(DEP5_EMPS) RESULT  RESULT1  RESULT2 The union operation produces the tuples that are in either RESULT1 or RESULT2 or both. The two operands must be “type compatible”. 12
  • 13.
    Cont…  Type Compatibility The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n.  The resulting relation for R1R2,R1  R2, or R1-R2 has the same attribute names as the first operand relation R1 (by convention). 13
  • 14.
  • 15.
    Cont…  INTERSECTION OPERATION Denotedby R  S, is a relation that includes all tuples that are in both R and S. The two operands must be "type compatible" Example: The result of the intersection operation (figure below) includes only those who are both students and instructors. STUDENT  INSTRUCTOR 15
  • 16.
    Cont…  Set Difference(or MINUS) Operation Denoted by R - S, is a relation that includes all tuples that are in R but not in S. The two operands must be "type compatible”. Example: The figure shows the names of students who are not instructors, and the names of instructors who are not students. STUDENT-INSTRUCTOR INSTRUCTOR-STUDENT
  • 17.
    Cont…  Notice thatboth union and intersection are commutative operations; that is R  S = S  R, and R  S = S  R  Both union and intersection can be treated as n-arry operations applicable to any number of relations as both are associative operations; that is R  (S  T) = (R  S)  T, and (R  S)  T = R  (S  T)  The minus operation is not commutative; that is, in general R - S ≠ S – R 17
  • 18.
    Cont...  CARTESIAN (orcross product) Operation  This operation is used to combine tuples from two relations in a combinatorial fashion. In general, the result of R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm) is a relation Q with degree n + m attributes Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order. The resulting relation Q has one tuple for each combination of tuples-one from R and one from S.  Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then | R x S | will have nR * nS tuples.  The two operands do NOT have to be "type compatible” Example: FEMALE_EMPS   SEX=’F’(EMPLOYEE) EMPNAMES   FNAME, LNAME, SSN (FEMALE_EMPS) 18
  • 19.
  • 20.
    Binary Relational Operations JOIN Operation  The sequence of cartesian product followed by select is used quite commonly to identify and select related tuples from two relations, a special operation, called JOIN. It is denoted by a  This operation is very important for any relational database with more than a single relation, because it allows us to process relationships among relations.  The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is: R <join condition>S where R and S can be any relations that result from general relational algebra expressions. 20
  • 21.
    Cont… Example: Suppose thatwe want to retrieve the name of the manager of each department. To get the manager’s name, we need to combine each DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple. We do this by using the join operation. DEPT_MGR  DEPARTMENT MGRSSN=SSN EMPLOYEE 21
  • 22.
     What isQuery Processing?  transform query written in high-level language, into correct and efficient execution strategy expressed in low-level language.  What is Query Optimization?  The activity of choosing a single “efficient” execution strategy (from hundreds) as determined by database catalog statistics.  Which relational algebra expression, equivalent to the given query, will lead to the most efficient solution plan?  How do operations pass data (main memory buffer, disk buffer, …)?  Will this plan minimize resource usage? (CPU/Response Time/Disk) 22
  • 23.
    Find all Managerswho work at a London branch. SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND (s.position = ‘Manager’ AND b.city = ‘London’); 23 Example: • Three equivalent Relational Algebra queries are: (1) (position='Manager')  (city='London')  (Staff.branchNo=Branch.branchNo) (Staff X Branch) (2) (position='Manager')  (city='London')( Staff Staff.branchNo=Branch.branchNo Branch) (3) (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo
  • 24.
    Cont… 07/30/2025 Query Processing andOptimiztion 24  Assume:  1000 tuples in Staff; 50 tuples in Branch;  50 Managers; 5 London branches;  no indexes or sort keys;  results of any intermediate operations stored on disk;  cost of the final write is ignored;  tuples are accessed one at a time.
  • 25.
    Analysis of eachQuery Expression 07/30/2025 Query Processing and Optimiztion 25 Analysis 1: i. read each tuple from the two relations n+m reads ii. create a table of the Cartesian product nXm writes iii. test each tuple of step 2nXm read Total No. of Disk access: 2(nXm) +n+m Analysis 2: i. read each tuple from the two relations n+m reads ii. create a table of the Join n writes iii. test each tuple of step 2n read Total No. of Disk access: 3(n) +m Analysis 3: i. read each tuple from the two relations n+m reads ii. create a table of the Join n writes iii. test each tuple of step 2n read Total No. of Disk access: 3(n) +m
  • 26.
    Cost Comparison 07/30/2025 Query Processingand Optimiztion 26  Cost (in disk accesses) are: (1) (1000 + 50) + 2*(1000 * 50) = 101 050 (2) 2*1000 + (1000 + 50) = 3 050 (3) 1000 + 2*50 + 5 + (50 + 5) = 1 160  Cartesian product and join operations much more expensive than selection, and third option significantly reduces size of relations being joined together.
  • 27.
    Query Processing Steps 1.Query Decomposition  It is the process of transforming a high level query into a relational algebra query, and to check that the query is syntactically and semantically correct. It Consists of parsing and validation 27 • Processing can be divided into :Decomposition, Optimization , and Execution ,Code generation
  • 28.
    Typical stages inquery decomposition are: Analysis: lexical and syntactical analysis of the query(correctness) based on attributes, data type.. ,. Query tree will be built for the query containing leaf node for base relations, one or many non-leaf nodes for relations produced by relational algebra operations and root node for the result of the query. Sequence of operation is from the leaves to the root.(SELECT * FROM Catalog c ,Author a Where a.authorid = c.authorid AND c.price>200 AND a.country= ‘ USA’ ) Normalization: convert the query into a normalized form. The predicate WHERE will be converted to Conjunctive () or Disjunctive (V ) Normal form. 28
  • 29.
    Cont… Semantic Analysis: toreject normalized queries that are not correctly formulated or contradictory. Incorrect if components do not contribute to generate result. Contradictory if the predicate can not be satisfied by any tuple. Say for example,(Catalog =“BS”  Catalog= “CS”) since a given book can only be classified in either of the category at a time Simplification: to detect redundant qualifications, eliminate common sub-expressions, and transform the query to a semantically equivalent but more easily and effectively computed form. For example, If a user don’t 29
  • 30.
    2. Query Optimization Problem of query optimization is to find the sequence of steps that produces the answer to user request in the most efficient manner, given the database structure.  The performance of a query is affected by the tables or queries that underlies the query and by the complexity of the query.  Given a request for data manipulation or retrieval, an optimizer will choose an optimal plan for evaluating the request from among the manifold alternative strategies. i.e. there are many ways (access paths) for accessing desired file/ record.  hence ,DBMS is responsible to pick the best execution strategy based on various considerations( Least amount of I/O and CPU resources. ) 30
  • 31.
    Approaches to QueryOptimization : Heuristics and Cost Function A. Heuristics Approach • Heuristics Approach uses the knowledge of the characteristics of the relational algebra operations and the relationship between the operators to optimize the query. • Thus the heuristic approach of optimization will make use of: – Properties of individual operators – Association between operators – Query Tree: a graphical representation of the operators, relations, attributes and predicates and processing sequence during query processing. • It is composed of three main parts: – The Leafs: the base relations used for processing the query/ extracting the required information – The Root: the final result/relation as an out put based on the operation on the relations used for query processing – Nodes: intermediate results or relations before reaching the final result. • Sequence of execution of operation in a query tree will start from the leaves and continues to the intermediate nodes and ends at the root. 31
  • 32.
    Using Heuristics inQuery Optimization  Process for heuristics optimization 1. The parser of a high-level query generates an initial internal representation; 2. Apply heuristics rules to optimize the internal representation. 3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.  The main heuristic is to apply first the operations that reduce the size of intermediate results.  E.g. Apply SELECT and PROJECT operations before applying the JOIN or other binary operations. Slide 15- 32  Query block: The basic unit that can be translated into the algebraic operators and optimized.  A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clause if these are part of the block.  Nested queries within a query are identified as separate query blocks.
  • 33.
     Example:  Forevery project located in ‘Stafford’, retrieve the project number, the controlling department number and the department manager’s last name, address and birthdate.  SQL query: Q2: SELECT P.NUMBER,P.DNUM,E.LNAME,E.ADDRESS, E.BDATE FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E Slide 15- 33 • Query tree: – A tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes. • An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation. • Query graph: – A graph data structure that corresponds to a relational calculus expression. It does not indicate an order on which operations to perform first. There is only a single graph corresponding to each query.
  • 34.
  • 35.
  • 36.
     Heuristic Optimizationof Query Trees:  The same query could correspond to many different relational algebra expressions — and hence many different query trees.  The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.  Example: Q: SELECT LNAME FROM EMPLOYEE, WORKS_ON, PROJECT WHERE PNAME = ‘AQUARIUS’AND PNMUBER=PNO AND ESSN=SSN AND BDATE > ‘1957-12-31’; Slide 15- 36
  • 37.
  • 38.