Query Optimizer
When a SQL query is submitted to an Oracle database, Oracle query optimizer decide how to access the data. The process of making this decision is called query optimization, Oracle looks for the optimal way to retrieve the data, using the execution path.
1. Query Optimizer
When a SQL query is submitted to an Oracle database, Oracle decide how to access the data. The process of making this decision is called query optimization, Oracle looks for the optimal way to retrieve the data, using the execution path. The query optimizer has to determine which execution path will be the fastest.
Prior to Oracle7, Oracle had only a rule-based optimizer. A cost-based optimizer was introduced with Oracle7.
2. Types of Query Optimizer
A-Rule-based Optimizer
The rule-based optimizer Uses a set of predefined rules to determine query optimization.
|
Rule |
Meaning |
|---|---|
|
Single row by ROWID |
Use ROWID in the WHERE clause, or CURRENT OF CURSOR |
|
Single row by cluster join |
Use cluster key in the WHERE clause; query returns only one row |
|
Single row by hash cluster key with a unique or primary key |
Use all columns in hash cluster key in the WHERE clause; query returns only one row |
|
Single row by unique or primary key |
Use unique or primary key in the WHERE clause; query returns only single row |
| Clustered join | Use when all tables are in cluster and all columns in the cluster key are in the WHERE clause with equality condition |
| Hash cluster key | Use when all columns in hash cluster key are in the WHERE clause with equality condition |
| Indexed cluster key | Use when all columns in indexed cluster key are in the WHERE clause with equality condition |
| Composite index | Use when all columns in composite index are in the WHERE clause with equality condition |
| Single-column index | Use when all columns in single-column index are in the WHERE clause with equality condition |
| Bounded range search on an indexed column |
Use when indexed column is in the WHERE clause with bounded values specified |
| Unbounded range search on an indexed column |
Use when indexed column is in the WHERE clause without bounded values specified |
| Sort merge join | Use when columns from join of tables are in the WHERE clause with equality condition |
| MAX or MIN on indexed column |
Use MAX or MIN on an indexed column and when no WHERE or GROUP BY clause |
| ORDER BY on indexed column |
Use a single-column index or the leading portion of a multicolumn index in the ORDER BY clause when there is a PRIMARY KEY or NOT NULL clause on at least one column to ensure a value for all rows |
| Full table scan | Scan whole table if no prior rules apply |
Rule-base optimizer is better than a random optimization and better for simple queries, but it does have some weaknesses, if a query have several table joins and many indexes rules-based optimizer did not work good enough for complex queries wher many execution plans involves.
B- Cost-based Optimizer
The cost-based optimizer does more than simply look at a set of optimization rules. It selects the execution path that requires the least number of logical I/O operations and thus the lowest cost for the completion of the query. Oracle8 and later versions will use the cost-based optimizer to identify the optimal execution plan if any statistics for any table or index are present.
When only some tables contain CBO statistics, Oracle will use the cost-based optimization and estimate statistics for the other tables in the query at runtime. This can cause significant slowdown in the performance of the individual query.
3. Optimizer Operations(Query Processing)
For any SQL statement, the optimizer performs the operations listed in.
|
Operation |
Description |
|---|---|
|
Evaluation of expressions and conditions |
The optimizer first evaluates expressions. including |
|
Statement transformation |
The statements involving, correlated subqueries or views. |
|
Choice of optimizer goals |
The optimizer determines the goal of optimization. |
|
Choice of access paths |
Optimizer chooses one or more of the available access paths to obtain table data. |
|
Choice of join orders |
For a join statement, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result. |
Sometimes, the application designer can use hints in SQL statements to instruct the optimizer about how a statement should be executed.
4. Optimizer Goals
Throughput (amount of work in a particular period of time) :Default
The goal of query optimizer is set to chooses least amount of resources necessary to process statement.
Best for Batch applications (Report application) because the user concerned with the time necessary for the application to complete and user does not examine the results of individual statements while the application is running
Best response time: This means that it uses the least amount of resources necessary to process the first row accessed by a SQL statement. For interactive appication (Forms application, SQL Plus queries) because the interactive user is waiting to see the first row or first few rows accessed by the statement
The optimizer's behavior and goal for a SQL statement is affected by the following factors:
A- OPTIMIZER_MODE Initialization Parameter
Optimizer mode parameter associated with different modes to choose for quey optimization.
|
Value |
Description |
|---|---|
| RULE | The rule-based optimizer is used. |
| CHOOSE |
Causes optimizer to choose between rule-based and cost-based approaches. Rule-based = No statistics available for any table in the query Cost-based = Statistics are avaliable for some tables in the query, statistics of missing tables guessed by optimizer. |
|
ALL_ROWS |
Optimizer uses a cost-based approach with a goal of best throughput. Note: Presence of statistics are not required. Note: Most appropriate for data warehousing queries. |
|
FIRST_ROWS_n |
Optimizer uses a cost-based approach with a goal of best response time to return the first n number of rows; n can equal 1, 10, 100, or 1000. Note: Presence of statistics are not required. Note: This mode is most appropriate for OLTP type queries |
|
FIRST_ROWS |
The optimizer uses a mix of cost and heuristics(learning themselves) to find a best plan for fast delivery of the first few rows. Note: FIRST_ROWS is available for backward compatibility and plan stability; use FIRST_ROWS_n instead. |
1 |
ALTER SESSION SET optimizer_mode = first_rows_1; --current session |
If optimizer uses cost-based approach, and statement have no statistics, then the optimizer uses internal information (number of data blocks allocated to these tables) to estimate other statistics for these tables.
it is recommend that always use the cost-based optimizer. it is better in theory and practice, such as materialized views, are considered in the cost-based Optimizer, whereas the rule-based optimizer does not recognize them.
B- Optimizer SQL Hints for Changing the Query Optimizer Goal
Hints can be used to direct the Oracle query optimizer to use a specific optimization technique for a query and can override the OPTIMIZER_MODE initialization parameter for that SQL statement
Hints are embedded as comments in a SQL query, with the following syntax:
1 |
sql_action /*+ hint */ OR sql_action --+ hint
|
|
SELECT /*+ ALL_ROWS */ |
Cost-based approach with a goal of best response time regardless of the presence of statistic. |
|
SELECT /*+ FIRST_ROWS_1 */ empno, ename |
Cost-based approach with a goal of best throughput. |
If a hint is incorrect or invalid, Oracle ignores the hint without causing an error.
C- Query Optimizer Statistics in the Data Dictionary
The statistics used by the query optimizer are stored in the data dictionary. You can collect by using the DBMS_STATS package. This enables the query optimizer to choose the best execution plan.
If no statistics are available, the optimizer will do dynamic sampling. This may cause slower parse times. The dynamic sampling is controlled by initialization parameter OPTMIZER_DYNAMIC_SAMPLING.
The types of statistics used by the cost-based optimizer are shown.
|
Entity |
Type of Statistics |
|---|---|
| Table | Number of rows, blocks, unused blocks Average available free space per block Number of chained rows Average row length Remote average row length |
| Column | Number of distinct values per column (cardinality) Remote cardinality Second lowest column value Second highest column value Column density factor Number of NULLs for the column Data distribution factor |
| Index | Depth of index B*-tree structure Number of leaf blocks Number of distinct values Average number of leaf blocks per key Average number of data blocks per key Clustering factor |
|
System |
I/O performance and utilization (new in Oracle9i) |
C.i- Gathering Statistics
You should periodically collect statistics. Prior to Oracle8i, the ANALYZE command was used to collect statistics. Oracle8i introduced the DBMS_STATS package.
You can collect statistics for the complete database or for particular objects (manually or automatically). After creating a new index or doing a data load. Updating statistics should be a part of a general maintenance routine.
C.ii- Stored Statistics
If you feel that the cost-based optimizer is working appropriately and you don’t want new statistics to change execution plans, you can store the statistics for the database with the DBMS_STATS.EXPORT_SCHEMA_STATS procedure to save them before you update the statistics.
If the new set of statistics does not deliver the desired performance or better, you can use the DBMS_STATS.IMPORT_SCHEMA_STATS procedure to reimport the saved statistics. You can also store multiple versions of statistics in a statistics table you specify.
C.iii- Histograms
The CBO assumes that the values for a column are evenly distributed. For instance, if there are two distinct values in a column, the cost-based optimizer assumes that each value applies to 50% of the entries in the column. This assumption can be incorrect for columns with extreme data value skew and can result in the optimizer ’s making the wrong choice for an execution plan.
For table columns that contain values with large variations in number of duplicates, called skewed data. You can create histograms to avoid this potential problem. Introduced with Oracle8i, histograms give the optimizer a more detailed view of the distribution of data values in the column. You can create a histogram with procedures in the DBMS_STATS package. Histograms require some overhead, so you should not use them, by default, for all columns, but they can help to improve the accuracy of execution plans involving some columns with low selectivity.
5. Enabling and Controlling Query Optimizer Features
OPTIMIZER_FEATURES_ENABLE -Enabling parameter
The OPTIMIZER_FEATURES_ENABLE parameter acts as an umbrella parameter for the query optimizer. This parameter can be used to enable a series of optimizer-related features. It accepts release numbers, such as 8.0.4, 8.1.7, and 9.2.0.
This parameter was introduced with the main goal to preserve the old behavior of quey optimizer after upgration to new version.
For example, when you upgraded from release 8.1.5 to release 8.1.6, the default value of the OPTIMIZER_FEATURES_ENABLE would be 8.1.6. but you want to use 8.1.5 For plan stability or backward compatibility reasons, you might not want the query plans to change because of new optimizer features in a new release. In such a case, you can set the parameter to an earlier version.
OPTIMIZER_FEATURES_ENABLE=8.1.5;
Oracle does not recommend explicitly setting the parameter to an earlier release.
CURSOR_SHARING -Controlling parameter
This parameter converts literal values in SQL statements to bind variables. The optimizer generates the execution plan based on the presence of the bind variables and not the actual literal values.
DB_FILE_MULTIBLOCK_READ_COUNT -Controlling parameter
Specifies the number of blocks that are read in a single I/O during a full table scan or index fast full scan. Larger values result in a cheaper cost. If this parameter is not set explicitly (or is set is 0), it will use a default value of 8.
OPTIMIZER_INDEX_CACHING -Controlling parameter
This parameter controls the costing of an index probe in conjunction with a nested loop. The range of values 0 to 100 for OPTIMIZER_INDEX_CACHING indicates percentage of index blocks in the buffer cache, which modifies the optimizer's assumptions about index caching for nested loops and IN-list iterators. A value of 100 infers that 100% of the index blocks are likely to be found in the buffer cache and the optimizer adjusts the cost of an index probe or nested loop accordingly. Use caution when using this parameter because execution plans can change in favor of index caching.
OPTIMIZER_INDEX_COST_ADJ -Controlling parameter
This parameter can be used to adjust the cost of index probes. The range of values is 1 to 10000. The default value is 100, which means that indexes are evaluated as an access path based on the normal costing model. A value of 10 means that the cost of an index access path is one-tenth the normal cost of an index access path.
OPTIMIZER_MODE -Controlling parameter
This initialization parameter sets the mode of the optimizer at instance startup. The possible values are ALL_ROWS, FIRST_ROWS_n, and FIRST_ROWS.
PGA_AGGREGATE_TARGET -Controlling parameter
This parameter automatically controls the amount of memory allocated for sorts and hash joins. Larger amounts of memory reduce the optimizer cost of these operations.
STAR_TRANSFORMATION_ENABLED -Controlling parameter
If set to true, enables the query optimizer to cost a star transformation for star queries. The star transformation combines the bitmap indexes on the various fact table columns.
6. Access path
The query optimizer chooses an access path based on the following factors:
- The available access paths for the statement
- The estimated cost of executing the statement, using each access path.
The optimizer first determines which access paths are available by examining the conditions in the statement's WHERE clause and its FROM clause. The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan, using the statistics for the index, columns, and tables accessible to the statement. Finally, the optimizer chooses the execution plan with the lowest estimated cost.
When choosing an access path, the query optimizer is influenced by the following:
- Optimizer Hints
- Old Statistics
A- Full Table Scan
This type of scan reads all rows from a table up to the high water mark (HWM). The HWM marks the last block in the table that has ever had data written to it. During a full table scan, all blocks in the table that are under the high water mark are scanned. If you have deleted all the rows then you will still read up to the HWM. Truncate resets the HWM back to the start of the table. FTS uses multi-block i/o to read the blocks from disk.
When Oracle performs a full table scan, the blocks are read sequentially. Because the blocks are adjacent, I/O calls larger than a single block can be used to speed up the process. The size of the read calls range from one block to the number of blocks indicated by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT. Using multiblock reads means a full table scan can be performed very efficiently.
FTS is not recommended for large tables unless you are reading >5-10% of it (or so) or you intend to run in parallel.
Full table scans are cheaper than index range scans when accessing a large fraction of the blocks in a table. This is because full table scans can use larger I/O calls, and making fewer large I/O calls is cheaper than making many smaller calls.
When the Optimizer Uses Full Table Scans
Lack of Index : Indexes are not avalable or the query is unable to use any existing indexes. For example, if there is a function used on the indexed column in the query, the optimizer is unable to use the index.
Large Amount of Data : If optimizer thinks that query will access most of the blocks in the table, then it uses a full table scan, even though indexes might be available.
Small Table : If a table contains less than DB_FILE_MULTIBLOCK_READ_COUNT blocks under the high water mark, which can be read in a single I/O call, then a full table scan might be cheaper than an index range scan, regardless of the indexes are present.
High Degree of Parallelism : A high degree of parallelism for a table skews the optimizer toward full table scans over range scans. Examine the DEGREE column in ALL_TABLES for the table to determine the degree of parallelism.
Full Table Scan Hints : The hint FULL(table alias) instruct optimizer to use a full table scan.
Parallel Query Execution : When a full table scan is required, response time can be improved by using multiple parallel execution servers for scanning the table. Parallel queries are used generally in low-concurrency data warehousing environments
|
Small |
Number of blocks < 20 or 2% of total cached blocks, whichever is larger |
If STATISTICS_LEVEL is se to TYPICAL or higher, Oracle decides whether to cache a table depending on the table scan history. The table is cached only if a future table scan is likely to find the cached blocks. If STATISTICS_LEVEL is set to BASIC, the table is not cached. |
|
Medium |
Larger than a small table, but < 10% of total cached blocks |
Oracle decides whether to cache a table based on its table scan and workload history. It caches the table only if a future table scan is likely to find the cached blocks. |
|
Large |
> 10% of total cached blocks |
Not cached |
Automatic caching of small tables is disabled for tables that are created or altered with the CACHE attribute.
B- Rowid Scans
The rowid of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by specifying its rowid is the fastest way to retrieve a single row.
Oracle first obtains the rowids of the selected rows, either from the statement's WHERE clause or through an index scan. Oracle then locates each selected row in the table based on its rowid.
When the Optimizer Uses Rowids : This is generally the second step after retrieving the rowid from an index. The table access might be required for any columns in the statement not present in the index.
Access by rowid does not need to follow every index scan. If the index contains all the columns needed for the statement, then table access by rowid might not occur.
C- Index Scans
In this method, a row is retrieved by traversing the index. Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table.
The index contains not only the indexed value, but also the rowids of rows in the table having that value. Therefore, if the statement accesses other columns in addition to the indexed columns, then Oracle can find the rows in the table by using either a table access by rowid or a cluster scan.
C.i- Assessing I/O for Blocks, not Rows
Oracle does I/O by blocks. Therefore, the optimizer's decision to use full table scans is influenced by the percentage of blocks accessed, not rows. This is called the index clustering factor. If blocks contain single rows, then rows accessed and blocks accessed are the same.
However, most tables have multiple rows in each block. Consequently, the desired number of rows could be clustered together in a few blocks, or they could be spread out over a larger number of blocks.
Although the clustering factor is a property of the index, the clustering factor actually relates to the spread of similar indexed column values within data blocks in the table. A lower clustering factor indicates that the individual rows are concentrated within fewer blocks in the table. Conversely, a high clustering factor indicates that the individual rows are scattered more randomly across blocks in the table. Therefore, a high clustering factor means that it costs more to use a range scan to fetch rows by rowid, because more blocks in the table need to be visited to return the data. Example 13-3 shows how the clustering factor can affect cost.
Example 13-3 Effects of Clustering Factor on Cost Assume the following situation:
-
There is a table with 9 rows.
-
There is a non-unique index on col1 for table.
-
The c1 column currently stores the values A, B, and C.
-
The table only has three Oracle blocks.
Case 1: The index clustering factor is low for the rows as they are arranged in the following diagram.
1 |
Block 1 Block 2 Block 3 |
This is because the rows that have the same indexed column values for c1 are located within the same physical blocks in the table. The cost of using a range scan to return all of the rows that have the value A is low, because only one block in the table needs to be read.
Case 2: If the same rows in the table are rearranged so that the index values are scattered across the table blocks (rather than collocated), then the index clustering factor is higher.
1 |
Block 1 Block 2 Block 3 |
This is because all three blocks in the table must be read in order to retrieve all rows with the value A in col1.
C.ii- Index Unique Scans
This scan returns a single rowid if a statement contains a UNIQUE or a PRIMARY KEY constraint.
When Optimizer Uses : This access path is used when all columns of a unique (B-tree) index or primary key constraint are specified with equality conditions.
Index Unique Scan Hints : INDEX(alias index_name)
There might be cases where the table is across a database link and being accessed from a local table, or where the table is small enough for the optimizer to prefer a full table scan.
C.iii- Index Range Scans
This scan can return more than one row. Data is returned in the ascending order of index columns
When Optimizer Uses : Optimizer uses range scan when it finds range operations e.g.
1 |
* >, <, <>, >=, <=, BETWEEN |
Range scans can use unique or non-unique indexes. Range scans avoid sorting when index columns constitute the ORDER BY/GROUP BY clause.
Index Range Scan Hints : INDEX(table_alias index_name)
C.iv- Index Range Scans Descending
An index range scan descending is identical to an index range scan, except that the data is returned in descending order. Indexes, by default, are stored in ascending order.
When the Optimizer Uses : When an order by descending clause can be satisfied by an index.
Index Range Scan Descending Hints : INDEX_DESC(table_alias index_name)
C.v- Index Skip Scans
Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks.
Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped.
The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.
Consider, for example, a table employees (sex, employee_id, address) with a composite index on (sex, employee_id). Splitting this composite index would result in two logical subindexes, one for M and one for F.
For this example, suppose you have the following index data:
1 |
('F',98) ('F',100) ('F',102) ('F',104) ('M',101) ('M',103) ('M',105)
|
The index is split logically into the following two subindexes:
-
The first subindex has the keys with the value F.
-
The second subindex has the keys with the value M.
Figure 13-2 Index Skip Scan Illustration
The column sex is skipped in the following query:
1 |
SELECT * FROM employees WHERE employee_id = 101;
|
A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the value M.
C.vi- Full Scans
In certain circumstances it is possible for the whole index to be scanned as opposed to a range scan. We choose an index Full Scan when we have statistics that indicate that it is going to be more efficient than a Full table scan and a sort.
For example we may do a Full index scan when we do an unbounded scan of an index and want the data to be ordered in the index order. The optimizer may decide that selecting all the information from the index and not sorting is more efficient than doing a FTS or a Fast Full Index Scan and then sorting.
An Index full scan will perform single block i/o's and so it may prove to be inefficient.
A full scan is also available when there is no predicate, if both the following conditions are met:
- All of the columns in the table referenced in the query are included in the index.
- At least one of the index columns is not null.
A full scan can be used to eliminate a sort operation, because the data is ordered by the index key. It reads the blocks singly.
C.vii- Fast Full Index Scans
Fast full index scans are an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT NULL constraint. A fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation, because the data is not ordered by the index key. It reads the entire index using multiblock reads, unlike a full index scan, and can be parallelized.
You can specify fast full index scans with the initialization parameter OPTIMIZER_FEATURES_ENABLE or the INDEX_FFS hint. Fast full index scans cannot be performed against bitmap indexes.
A fast full scan is faster than a normal full index scan in that it can use multiblock I/O and can be parallelized just like a table scan.
Fast Full Index Scan Hints : INDEX_FFS, same format and arguments as the regular INDEX hint
C.viii- Index Joins
An index join is a hash join of several indexes that together contain all the table columns that are referenced in the query. If an index join is used, then no table access is needed, because all the relevant column values can be retrieved from the indexes. An index join cannot be used to eliminate a sort operation.
Index Join Hints : You can specify an index join with the INDEX_JOIN hint.
C.xi- Bitmap Indexes
A bitmap join uses a bitmap for key values and a mapping function that converts each bit position to a rowid. Bitmaps can efficiently merge indexes that correspond to several conditions in a WHERE clause, using Boolean operations to resolve AND and OR conditions.
D- Cluster Access
A cluster scan is used to retrieve, from a table stored in an indexed cluster, all rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data block. To perform a cluster scan, Oracle first obtains the rowid of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this rowid.
E- Hash Access
A hash scan is used to locate rows in a hash cluster, based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data block. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.
F Sample Table Scans
A sample table scan retrieves a random sample of data from a simple table or a complex SELECT statement, such as a statement involving joins and views. This access path is used when a statement's FROM clause includes the SAMPLE clause or the SAMPLE BLOCK clause. To perform a sample table scan when sampling by rows with the SAMPLE clause, Oracle reads a specified percentage of rows in the table. To perform a sample table scan when sampling by blocks with the SAMPLE BLOCK clause, Oracle reads a specified percentage of table blocks.
Example 13-6 uses a sample table scan to access 1% of the employees table, sampling by blocks.
1 |
SELECT * FROM employees SAMPLE BLOCK (1); |
7. Understanding Joins
Joins are statements that retrieve data from more than one table. A join is characterized by multiple tables in the FROM clause, and the relationship between the tables is defined through the existence of a join condition in the WHERE clause. In a join, one row set is called inner, and the other is called outer.
A- How the Query Optimizer Executes Join Statements
To choose an execution plan for a join statement, the optimizer must make these interrelated decisions:
- Access Paths: As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement.
-
Join Method : To join each pair of row sources, Oracle must perform a join operation. Join methods include nested loop, sort merge, cartesian, and hash joins.
-
Join Order: To execute a statement that joins more than two tables, Oracle joins two of the tables and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.
B- How the Query Optimizer Chooses Execution Plans for Joins
The query optimizer considers the following when choosing an execution plan:
-
Optimizer places tables first in the join order those have UNIQUE or PRIMARY KEY constraints and return at most one row. The optimizer then optimizes the join of the remaining set of tables.
-
The table with the outer join operator must come after the other table in the condition in the join order (optimizer does not consider that violate this rule)
The optimizer generates a set of execution plans, according to possible join orders, join methods, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in the following ways:
-
The cost of a nested loops operation is based on the cost of reading each selected row of the outer table and each of its matching rows of the inner table into memory using statistics in data dictionary.
-
The cost of a sort merge join is based largely on the cost of reading all the sources into memory and sorting them.
- he cost of a hash join is based largely on the cost of building a hash table on one of the input sides to the join and using the rows from the other of the join to probe it.
The optimizer also considers other factors when determining the cost of each operation. For example:
-
A smaller sort area size is likely to increase the cost for a sort merge join because sorting takes more CPU time and I/O in a smaller sort area.
-
A larger multiblock read count is likely to decrease the cost for a sort merge join in relation to a nested loop join. If a large number of sequential blocks can be read from disk in a single I/O, then an index on the inner table for the nested loop join is less likely to improve performance over a full table scan. The multiblock read count is specified by the initialization parameterDB_FILE_MULTIBLOCK_READ_COUNT.
With the query optimizer, the optimizer's choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for an outer join, then the optimizer ignores the hint and chooses the order. Also, you can override the optimizer's choice of join method with hints.
C- Nested Loop Joins
Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.
First optimizer returns all the rows from row source 1 and Then optimizer probe row source 2 once for each row returned from row source
The optimizer determines the driving table and designates it as the outer table.
The other table is designated as the inner table.
- The optimizer determines the driving table and designates it as the outer table.
- The other table is designated as the inner table.
- For every row in the outer table, Oracle accesses all the rows in the inner table. The outer loop is for every row in outer table and the inner loop is for every row in the inner table. The outer loop appears before the inner loop in the execution plan, as follows:
1 |
NESTED LOOPS outer_loop inner_loop |
For each relevant row in the first table (driving table), find all matching rows in the other table (probed table).
Accessing row source 2 is known a probing the inner table For nested loops to be efficient it is important that the first row source returns as few rows as possible as this directly controls the number of probes of the second row source. Also it helps if the access method for row source 2 is efficient as this operation is being repeated once for every row returned by row source 1.
When the Optimizer Uses : When joining small number of rows, with a good driving condition between the two tables.
Nested Loop Join Hints : USE_NL(table1 table2)
Nesting Nested Loops : The outer loop of a nested loop can be a nested loop itself. You can nest two or more outer loops together to join as many tables as needed.
D- Hash Joins
Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.
When the Optimizer uses: The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:
- A large amount of data needs to be joined.
- A large fraction of a small table needs to be joined.
Hash Join Hints: USE_HASH
E- Sort Merge Joins
Rows are produced by Row Source 1 and are then sorted Rows from Row Source 2 are then both sides are merged together (joined)
1 |
MERGE |
Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:
- The row sources are sorted already.
- A sort operation does not have to be done.
However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.
In a merge join, there is no concept of a driving table. The join consists of two steps:
- Sort join operation: Both the inputs are sorted on the join key.
- Merge join operation: The sorted lists are merged together.
If the input is already sorted by the join column, then a sort join operation is not performed for that row source.
When the Optimizer Uses : The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:
- The join condition between two tables is not an equi-join.
- Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
Sort Merge Join Hints : USE_MERGE
F- Cartesian Joins
A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.
When the Optimizer Uses : The optimizer uses Cartesian joins when it is asked to join two tables with no join conditions.
Cartesian Join Hints : ORDERED
G- Outer Joins
An outer join extends the result of a simple join. An outer join returns all rows that satisfy the join condition and also returns some or all of those rows from one table for which no rows from the other satisfy the join condition.
Nested Loop Outer Joins
This operation is used when an outer join is used between two tables. The outer join returns the outer (preserved) table rows, even when there are no corresponding rows in the inner (optional) table.
In a regular outer join, the optimizer chooses the order of tables (driving and driven) based on the cost. However, in a nested loop outer join, the order of tables is determined by the join condition. The outer table, with rows that are being preserved, is used to drive to the inner table.
The optimizer uses nested loop joins to process an outer join in the following circumstances:
-
It is possible to drive from the outer table to inner table.
-
Data volume is low enough to make the nested loop method efficient.
For an example of a nested loop outer join, you can add the USE_NL hint to Example 13-8 to instruct the optimizer to use a nested loop. For example:
1 |
SELECT /*+ USE_NL(c o) */ cust_last_name,sum(nvl2(o.customer_id,0,1)) "Count"
|
Hash Join Outer Joins
The optimizer uses hash joins for processing an outer join if the data volume is high enough to make the hash join method efficient or if it is not possible to drive from the outer table to inner table.
The order of tables is determined by the cost. The outer table, including preserved rows, may be used to build the hash table, or it may be used to probe one.
Example 13-8 shows a typical hash join outer join query. In this example, all the customers with credit limits greater than 1000 are queried. An outer join is needed so that you do not miss the customers who do not have any orders.
Hash Join Outer Joins
1 |
SELECT cust_last_name, sum(nvl2(o.customer_id,0,1)) "Count" |
The query looks for customers which satisfy various conditions. An outer join returns NULL for the inner table columns along with the outer (preserved) table rows when it does not find any corresponding rows in the inner table. This operation finds all the customers rows that do not have any orders rows.
In this case, the outer join condition is the following:
customers.customer_id = orders.customer_id(+)
The components of this condition represent the following:
-
The outer table is customers.
-
The inner table is orders.
-
The join preserves the customers rows, including those rows without a corresponding row in orders.
You could use a NOT EXISTS subquery to return the rows. However, because you are querying all the rows in the table, the hash join performs better (unless the NOT EXISTS subquery is not nested).
In Example 13-9, the outer join is to a multitable view. The optimizer cannot drive into the view like in a normal join or push the predicates, so it builds the entire row set of the view.
Outer Join to a Multitable View
1 |
SELECT c.cust_last_name, sum(revenue) |
When an outer join cannot drive from the outer (preserved) table to the inner (optional) table, it cannot use a hash join or nested loop joins. Then it uses the sort merge outer join for performing the join operation.
The optimizer uses sort merge for an outer join:
-
If a nested loop join is inefficient. A nested loop join can be inefficient because of data volumes.
-
The optimizer finds it is cheaper to use a sort merge over a hash join because of sorts already required by other operations.
Full Outer Joins
A full outer join acts like a combination of the left and right outer joins. In addition to the inner join, rows from both tables that have not been returned in the result of the inner join are preserved and extended with nulls. In other words, full outer joins let you join tables together, yet still show rows that do not have corresponding rows in the joined tables.
The query in Example 13-10 retrieves all departments and all employees in each department, but also includes:
-
Any employees without departments
-
Any departments without employees
Full Outer Join
1 |
SELECT d.department_id, e.employee_id |
Last Updated (Tuesday, 24 November 2009 10:28)



