Oracle8i Tuning
Release 8.1.5

A67775-01

Library

Product

Contents

Index

Prev Next

8
Tuning Distributed Queries

Oracle supports transparent distributed queries to access data from multiple databases. It also provides many other distributed features, such as transparent distributed transactions and a transparent, fully automatic two-phase commit. This chapter explains how the Oracle8 optimizer decomposes SQL statements, and how this affects performance of distributed queries. The chapter provides guidelines on how to influence the optimizer and avoid performance bottlenecks.

Topics include:

Remote and Distributed Queries

If a SQL statement references one or more remote tables, the optimizer first determines whether all remote tables are located at the same site. If all tables are located at the same remote site, Oracle sends the entire query to the remote site for execution. The remote site sends the resulting rows back to the local site. This is called a remote SQL statement. If the tables are located at more than one site, the optimizer decomposes the query into separate SQL statements to access each of the remote tables. This is called a distributed SQL statement. The site where the query is executed, called the "driving site," is normally the local site.

This section describes:

Remote Data Dictionary Information

If a SQL statement references multiple tables, then the optimizer must determine which columns belong to which tables before it can decompose the SQL statement. For example, for this query:

   SELECT DNAME, ENAME 
     FROM DEPT, EMP@REMOTE 
    WHERE DEPT.DEPTNO = EMP.DEPTNO 

The optimizer must first determine that the DNAME column belongs to the DEPT table and the ENAME column to the EMP table. Once the optimizer has the data dictionary information of all remote tables, it can build the decomposed SQL statements.

Column and table names in decomposed SQL statements appear between double quotes. You must enclose in double quotes any column and table names that contain special characters, reserved words, or spaces.

This mechanism also replaces an asterisk (*) in the select list with the actual column names. For example:

   SELECT * FROM DEPT@REMOTE; 

Results in the decomposed SQL statement

   SELECT A1."DEPTNO", A1."DNAME", A1."LOC" FROM "DEPT" A1; 


Note:

For simplicity, double quotes are not used in the remainder of this chapter.  


Remote SQL Statements

If the entire SQL statement is sent to the remote database, the optimizer uses table aliases A1, A2, and so on, for all tables and columns in the query, in order to avoid possible naming conflicts. For example:

   SELECT DNAME, ENAME 
     FROM DEPT@REMOTE, EMP@REMOTE 
   WHERE DEPT.DEPTNO = EMP.DEPTNO; 

is sent to the remote database as:

   SELECT A2.DNAME, A1.ENAME 
     FROM DEPT A2, EMP A1 
   WHERE A1.DEPTNO = A2.DEPTNO; 

Distributed SQL Statements

When a query accesses data on one or more databases, one site "drives" the execution of the query. This is known as the "driving site"; it is here that the data is joined, grouped and ordered. By default, the local Oracle server is the driving site. A hint called DRIVING_SITE enables you to manually specify the driving site.

The decomposition of SQL statements is important because it determines the number of records or even tables that must be sent through the network. A knowledge of how the optimizer decomposes SQL statements can help you achieve optimum performance for distributed queries.

If a SQL statement references one or more remote tables, the optimizer must decompose the SQL statement into separate queries to be executed on the different databases. For example:

   SELECT DNAME, ENAME 
   FROM DEPT, EMP@REMOTE 
   WHERE DEPT.DEPTNO = EMP.DEPTNO; 

Might be decomposed into

   SELECT DEPTNO, DNAME FROM DEPT; 

Which is executed locally, and

   SELECT DEPTNO, ENAME FROM EMP; 

which is sent to the remote database. The data from both tables is joined locally. All this is done automatically and transparently for the user or application.

In some cases, however, it might be better to send the local table to the remote database and join the two tables on the remote database. This can be achieved either by creating a view, or by using the DRIVING_SITE hint. If you decide to create a view on the remote database, a database link from the remote database to the local database is also needed.

For example (on the remote database):

   CREATE VIEW DEPT_EMP AS 
   SELECT DNAME, ENAME 
   FROM DEPT@LOCAL, EMP 
   WHERE DEPT.DEPTNO = EMP.DEPTNO; 

Then select from the remote view instead of the local and remote tables

   SELECT * FROM DEPT_EMP@REMOTE; 

Now the local DEPT table is sent through the network to the remote database, joined on the remote database with the EMP table, and the result is sent back to the local database.

See Also:

"DRIVING_SITE" for details about this hint.  

Rule-Based Optimization

Rule-based optimization does not have information about indexes for remote tables. It never, therefore, generates a nested loops join between a local table and a remote table with the local table as the outer table in the join. It uses either a nested loops join with the remote table as the outer table or a sort merge join, depending on the indexes available for the local table.

Cost-Based Optimization

Cost-based optimization can consider more execution plans than rule-based optimization. Cost-based optimization knows whether indexes on remote tables are available, and in which cases it would make sense to use them. Cost-based optimization considers index access of the remote tables as well as full table scans, whereas rule-based optimization considers only full table scans.

The particular execution plan and table access that cost-based optimization chooses depends on the table and index statistics. For example, with:

   SELECT DNAME, ENAME 
     FROM DEPT, EMP@REMOTE 
    WHERE DEPT.DEPTNO = EMP.DEPTNO 

The optimizer might choose the local DEPT table as the driving table and access the remote EMP table using an index, so the decomposed SQL statement becomes:

   SELECT ENAME FROM EMP WHERE DEPTNO = :1 

This decomposed SQL statement is used for a nested loops operation.

Using Views

If tables are on more than one remote site, it can be more effective to create a view than to use the DRIVING_SITE hint. If not all tables are on the same remote database, the optimizer accesses each remote table separately. For example:

   SELECT D.DNAME, E1.ENAME, E2.JOB 
     FROM DEPT D, EMP@REMOTE E1, EMP@REMOTE E2 
    WHERE D.DEPTNO = E1.DEPTNO 
      AND E1.MGR = E2.EMPNO; 

Results in the decomposed SQL statements

   SELECT EMPNO, ENAME FROM EMP; 

And

   SELECT ENAME, MGR, DEPTNO FROM EMP; 

If you want to join the two EMP tables remotely, you can create a view to accomplish this. Create a view with the join of the remote tables on the remote database. For example (on the remote database):

   CREATE VIEW EMPS AS 
   SELECT E1.DEPTNO, E1.ENAME, E2.JOB 
     FROM EMP E1, EMP E2 
    WHERE E1.MGR = E2.EMPNO; 

And now select from the remote view instead of the remote tables:

   SELECT D.DNAME, E.ENAME, E.JOB 
     FROM DEPT D, EMPS@REMOTE E 
   WHERE D.DEPTNO = E.DEPTNO; 

This results in the decomposed SQL statement

   SELECT DEPTNO, ENAME, JOB FROM EMPS; 

Using Hints

In a distributed query, all hints are supported for local tables. For remote tables, however, you can use only join order and join operation hints. (Hints for access methods, parallel hints, and so on, have no effect.) For remote mapped queries, all hints are supported.

See Also:

"Hints for Join Orders" and "Hints for Join Operations".  

EXPLAIN PLAN and SQL Decomposition

EXPLAIN PLAN gives information not only about the overall execution plan of SQL statements, but also about the way in which the optimizer decomposes SQL statements. EXPLAIN PLAN stores information in the PLAN_TABLE table. If remote tables are used in a SQL statement, the OPERATION column will contain the value REMOTE to indicate that a remote table is referenced, and the OTHER column will contain the decomposed SQL statement that will be sent to the remote database. For example:

   EXPLAIN PLAN FOR SELECT DNAME FROM DEPT@REMOTE 
   SELECT OPERATION, OTHER FROM PLAN_TABLE 

   OPERATION OTHER 
   --------- -------------------------------------
   REMOTE    SELECT A1."DNAME" FROM "DEPT" A1 

Note the table alias and the double quotes around the column and table names.

See Also:

Chapter 13, "Using EXPLAIN PLAN".  

Partition Views

You can use partition views to coalesce tables that have the same structure, but that also contain different partitions of data. This is useful for a distributed database where each partition resides on a database and the data in each partition has common geographical properties.

When a query is executed on such a partition view, and the query contains a predicate that contains the result set to a subset of the view's partitions, the optimizer chooses a plan which skips partitions that are not needed for the query. This partition elimination takes place at run time, when the execution plan references all partitions.

Rules for Use

This section describes the circumstances under which a UNION ALL view enables the optimizer to skip partitions. The Oracle server that contains the partition view must conform to the following rules:

Within a UNION ALL view there are multiple select statements, and each of these is called a "branch". A UNION ALL view is a partition view if each select statement it defines conforms to the following rules:

Partition elimination is based on column transitivity with constant predicates. The WHERE clause used in the query that accesses the partition view is pushed down to the WHERE clause of each of the branches in the UNION ALL view definition. Consider the following example:

   SELECT * FROM EMP_VIEW WHERE deptno=30; 

When the view EMP_VIEW is defined as:

   SELECT * FROM EMP@d10 WHERE deptno=10 
     UNION ALL 
   SELECT * FROM EMP@d20 WHERE deptno=20 
     UNION ALL 
   SELECT * FROM EMP@d30 WHERE deptno=30 
     UNION ALL 
   SELECT * FROM EMP@d40 WHERE deptno=40 

The "WHERE deptno=30" predicate used in the query is pushed down to the queries in the UNION ALL view. For a WHERE clause such as "WHERE deptno=10 and deptno=30", the optimizer applies transitivity rules to generate an extra predicate of "10=30". This extra predicate is always false, thus the table (EMP@d10) need not be accessed.

Transitivity applies to predicates which conform to the following rules:

EXPLAIN PLAN Output

To confirm that the system recognizes a partition view, check the EXPLAIN PLAN output. The following operations will appear in the OPERATIONS column of the EXPLAIN PLAN output, if a query was executed on a partition view:

VIEW  

This entry should include the optimizer cost in the COST column.  

UNION-ALL  

This entry should specify PARTITION in the OPTION column.  

FILTER  

When an operation is a child of the UNION-ALL operation, FILTER indicates that a constant predicate was generated that will always be FALSE. The partition will be eliminated.  

If PARTITION does not appear in the option column of the UNION-ALL operation, the partition view was not recognized, and no partitions were eliminated. Make sure that the UNION ALL view adheres to the rules as defined in "Rules for Use" .

Partition View Example

The following example shows a partition view CUSTOMER that is partitioned into two partitions. The EAST database contains the East Coast customers, and the WEST database contains the customers from the West Coast.

The WEST database contains the following table CUSTOMER_WEST:

   CREATE TABLE CUSTOMER_WEST 
   ( cust_no   NUMBER CONSTRAINT CUSTOMER_WEST_PK PRIMARY KEY, 
     cname     VARCHAR2(10), 
     location  VARCHAR2(10) 
    );

The EAST database contains the database CUSTOMER_EAST:

   CREATE TABLE CUSTOMER_EAST 
   ( cust_no   NUMBER CONSTRAINT CUSTOMER_EAST_PK PRIMARY KEY, 
     cname     VARCHAR2(10), 
     location  VARCHAR2(10) 
    ); 

The following partition view is created at the EAST database (you could create a similar view at the WEST database):

   CREATE VIEW customer AS 
     
   SELECT * FROM CUSTOMER_EAST 
   WHERE location='EAST'  
   UNION ALL 
   SELECT * FROM CUSTOMER_WEST@WEST 
   WHERE location='WEST'; 

If you execute the following statement, notice that the CUSTOMER_WEST table in the WEST database is not accessed:

EXPLAIN PLAN FOR SELECT * FROM CUSTOMER WHERE location='EAST'; 


Note:

The EAST database still needs column name and column datatype information for the CUSTOMER_WEST table, therefore it still needs a connection to the WEST database. In addition that the cost-based optimizer must be used. You could do this, for example, by issuing the statement ALTER SESSION SET OPTIMIZER_MODE=ALL_ROWS.  


As shown in the EXPLAIN PLAN output, the optimizer recognizes that the CUSTOMER_WEST partition need not be accessed:

SELECT LPAD(' ',LEVEL*3-3)||OPERATION OPERATION,COST,OPTIONS,
OBJECT_NODE, OTHER 
FROM PLAN_TABLE 
CONNECT BY PARENT_ID = PRIOR ID 
START WITH PARENT_ID IS NULL 
    

OPERATION                 COST OPTIONS    OBJECT_NOD OTHER 
------------------------- ---- ---------- ---------- ------------------------- 
SELECT STATEMENT             1 
   VIEW                      1 
      UNION-ALL                PARTITION 
         TABLE ACCESS        1 FULL 
         FILTER 
            REMOTE           1            WEST.WORLD SELECT "CUST_NO","CNAME", 
                                                     "LOCATION" FROM "CUSTOMER 
                                                     _WEST" "CUSTOMER_WEST" WH 
                                                     ERE "LOCATION"='EAST' AND 
                                                      "LOCATION"='WEST'   

Distributed Query Restrictions

Distributed queries within the same version of Oracle have these restrictions:

Transparent Gateways

The Transparent Gateways are used to access data from other data sources (relational databases, hierarchical databases, file systems, and so on). Transparent Gateways provide a means to transparently access data from a non-Oracle system, just as if it were another Oracle database.

Optimizing Heterogeneous Distributed SQL Statements

When a SQL statement accesses data from non-Oracle systems, it is said to be a heterogeneous distributed SQL statement. To optimize heterogeneous distributed SQL statements, follow the same guidelines as for optimizing distributed SQL statements that access Oracle databases only. However, you must take into consideration that the non-Oracle system usually does not support all the functions and operators that Oracle8 supports. The Transparent Gateways therefore tell Oracle (at connect time) which functions and operators they do support. If the other data source does not support a function or operator, Oracle will perform that function or operator. In this case Oracle obtains the data from the other data source and applies the function or operator locally. This affects the way in which the SQL statements are decomposed and can affect performance, especially if Oracle is not on the same machine as the other data source.

Gateways and Partition Views

You can use partition views with Oracle Transparent Gateways version 8 or higher. Make sure you adhere to the rules that are defined in "Rules for Use". In particular:

Summary: Optimizing Performance of Distributed Queries

You can improve performance of distributed queries in several ways:




Prev

Next
Oracle
Copyright © 1999 Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index