Oracle8i Tuning
Release 8.1.5

A67775-01

Library

Product

Contents

Index

Prev Next

26
Tuning Parallel Execution

Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with Decision Support Systems (DSS). You can also implement parallel execution on certain types of OLTP (Online Transaction Processing) and hybrid systems.

This chapter explains how to implement parallel execution and tune your system to optimize parallel execution performance.

See Also:

Oracle8i Concepts, for basic parallel execution concepts and your platform-specific Oracle documentation for more information about tuning parallel execution.  


Note:

Parallel execution is only available with the Oracle8i Enterprise Edition.  


This chapter discusses parallel execution in 4 phases. In Phase One, you determine whether to use automated or manual tuning. For many applications, automated tuning provides acceptable performance by automatically setting default values for parameters based on your system configuration.

Phase Two describes how to take advantage of parallelism and partitioning. In Phase Two, you determine the best type parallelism to use based on your needs. This phase also discusses how to take the best advantage of Oracle's partitioning features.

Phase Three describes how to create, populate, and refresh your database. Phase Four explains how to monitor and fine-tune parallel execution for optimal performance.

The phases and their contents are:

Phase One - Initializing and Tuning Parameters for Parallel Execution

Phase Two - Tuning Physical Database Layouts for Parallel Execution

Phase Three - Creating, Populating, and Refreshing the Database

Phase Four - Monitoring Parallel Execution Performance

Introduction to Parallel Execution Tuning

Parallel execution is useful for many types of operations accessing significant amounts of data. Parallel execution improves processing for:

You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access LOBs (Large Binary Objects).

Parallel execution benefits systems if they have all of the following characteristics:

If your system lacks any of these characteristics, parallel execution may not significantly improve performance. In fact, parallel execution can reduce system performance on over-utilized systems or systems with small I/O bandwidth.


Note:

The term "parallel execution server" designates server processes, or "threads" on NT systems, that perform parallel operations. This is not the same as the Oracle Parallel Server option that refers to multiple Oracle instances accessing the same database.  


See Also:

For more information about the Oracle Parallel Server, please refer to Oracle8i Parallel Server Concepts and Administration.  

When to Implement Parallel Execution

Parallel execution provides the greatest performance improvements in Decision Support Systems (DSS). However, Online Transaction Processing (OLTP) systems also benefit from parallel execution, but usually only during batch processing.

During the day, most OLTP systems should probably not use parallel execution. During off-hours, however, parallel execution can effectively process high-volume batch operations. For example, a bank might use parallelized batch programs to perform millions of updates to apply interest to accounts.

The more common example of using parallel execution is for DSS. Complex queries, such as those involving joins of several tables or searches for very large tables, are often best executed in parallel. It is for this reason that the remainder of this chapter discusses parallel execution with an emphasis on DSS environments.

Phase One - Initializing and Tuning Parameters for Parallel Execution

You can initialize and automatically tune parallel execution by setting the parameter PARALLEL_AUTOMATIC_TUNING to TRUE. Once enabled, automated parallel execution controls values for all parameters related to parallel execution. These parameters affect several aspects of server processing, namely, the DOP (degree of parallelism), the adaptive multi-user feature, and memory sizing.

With parallel automatic tuning enabled, Oracle determines parameter settings for each environment based on the number of CPUs on your system and the value set for PARALLEL_THREADS_PER_CPU. The default values Oracle sets for parallel execution processing when PARALLEL_AUTOMATIC_TUNING is TRUE are usually optimal for most environments. In most cases, Oracle's automatically derived settings are at least as effective as manually derived settings.

You can also manually tune parallel execution parameters, however, Oracle recommends using automated parallel execution. Manual tuning of parallel execution is more complex than using automated tuning for two reasons: Manual parallel execution tuning requires more attentive administration than automated tuning, and manual tuning is prone to user load and system resource miscalculations.

Initializing and tuning parallel execution involves the three steps described under the following headings. These steps are:

Step Three is a discussion of tuning general parameters. You may find the general parameters information useful if your parallel execution performance requires further tuning after you complete the first two steps.

Several examples describing parallel execution tuning appear at the end of this section. The example scenarios describe configurations that range from completely automated to completely manual systems.

Step One: Selecting Automated or Manual Tuning of Parallel Execution

There are several ways to initialize and tune parallel execution. You can make your environment fully automated for parallel execution, as mentioned, by setting PARALLEL_AUTOMATIC_TUNING to TRUE. You can further customize this type of environment by overriding some of the automatically derived values.

You can also leave PARALLEL_AUTOMATIC_TUNING at its default value of FALSE and manually set the parameters that affect parallel execution. For most OLTP environments and other types of systems that would not benefit from parallel execution, do not enable parallel execution.


Note:

Well established, manually tuned systems that achieve desired resource use patterns may not benefit from automated parallel execution.  


Automatically Derived Parameter Settings under Fully Automated Parallel Execution

When PARALLEL_AUTOMATIC_TUNING is TRUE, Oracle automatically sets other parameters as shown in Table 26-1. For most systems, you do not need to make further adjustments to have an adequately tuned, fully automated parallel execution environment.

Table 26-1 Parameters Affected by PARALLEL_AUTOMATIC_TUNING
Parameter  Default  Default if PARALLEL_
AUTOMATIC_
TUNING = TRUE
 
Comments 
PARALLEL_ADAPTIVE_
MULTI_USER  
FALSE   TRUE   -  
PROCESSES   6   The greater of: 1.2 x PARALLEL_ MAX_SERVERS or
PARALLEL_MAX_SERVERS
+ 6 + 5+ (CPUs x 4)  
Value is forced up to minimum if PARALLEL_AUTOMATIC_
TUNING is TRUE.  
SESSIONS   (PROCESSES x 1.1) + 5   (PROCESSES x 1.1) + 5   Automatic parallel tuning indirectly affects SESSIONS. If you do not set SESSIONS, Oracle sets it based on the value for PROCESSES.  
PARALLEL_MAX_
SERVERS  
0   CPU x 10   Use this limit to maximize the number of processes that parallel execution uses. The value for this parameter is port-specific so processing may vary from system to system.  
LARGE_POOL_SIZE   None   PARALLEL_
EXECUTION_
POOL+
MTS heap requirements +
Backup buffer requests +
600KB  
Oracle does not allocate parallel execution buffers from the SHARED_POOL.  
PARALLEL_EXECUTION_MESSAGE_SIZE   2KB
(port specific)  
4KB
(port specific)  
Default increased since Oracle allocates memory from the LARGE_POOL.  

As mentioned, you can manually adjust the parameters shown in Table 26-1, even if you set PARALLEL_AUTOMATIC_TUNING to TRUE. You might need to do this if you have a highly customized environment or if your system does not perform optimally using the completely automated settings.

Because parallel execution improves performance for a wide range of system types, you might want to use the examples at the end of this section as starting points. After observing your system's performance using these initial settings, you can further customize your system for parallel execution.

Step Two: Setting the Degree of Parallelism and Enabling Adaptive Multi-User

In this step, establish your system's degree of parallelism (DOP) and consider whether to enable adaptive multi-user.

Degree of Parallelism and Adaptive Multi-User and How They Interact

DOP specifies the number of available processes, or threads, used in parallel operations. Each parallel thread may use one or two query processes depending on the query's complexity.

The adaptive multi-user feature adjusts DOP based on user load. For example, you may have a table with a DOP of 5. This DOP may be acceptable with 10 users. But if 10 more users enter the system and you enable the PARALLEL_ADAPTIVE_MULTI_USER feature, Oracle reduces the DOP to spread resources more evenly according to the perceived system load.


Note:

Once Oracle determines the DOP for a query, the DOP does not change for the duration of the query.  


It is best to use the parallel adaptive multi-user feature when users process simultaneous parallel execution operations. If you enable PARALLEL_AUTOMATIC_TUNING, Oracle automatically sets PARALLEL_ADAPTIVE_MULTI_USER to TRUE.


Note:

Disable adaptive multi-user for single-user, batch processing systems or if your system already provides optimal performance.  


How the Adaptive Multi-User Algorithm Works

The adaptive multi-user algorithm has several inputs. The algorithm first considers the number of allocated threads as calculated by the database resource manager. The algorithm then considers the default settings for parallelism as set in INIT.ORA, as well as parallelism options used in CREATE TABLE and ALTER TABLE commands and SQL hints.

When a system is overloaded and the input DOP is larger than the default DOP, the algorithm uses the default degree as input. The system then calculates a reduction factor that it applies to the input DOP.

Enabling Parallelism for Tables and Queries

The DOP of tables involved in parallel operations affect the DOP for operations on those tables. Therefore, after setting parallel tuning-related parameters, enable parallel execution for each table you want parallelized using the PARALLEL option of the CREATE TABLE or ALTER TABLE commands. You can also use the PARALLEL hint with SQL statements to enable parallelism for that operation only.

When you parallelize tables, you can also specify the DOP or allow Oracle to set it automatically based on the value of PARALLEL_THREADS_PER_CPU.

Controlling Performance with PARALLEL_THREADS_PER_CPU

The initialization parameter PARALLEL_THREADS_PER_CPU affects algorithms controlling both the DOP and the adaptive multi-user feature. Oracle multiplies the value of PARALLEL_THREADS_PER_CPU by the number of CPUs per instance to derive the number of threads to use in parallel operations.

The adaptive multi-user feature also uses the default DOP to compute the target number of query server processes that should exist in a system. When a system is running more processes than the target number, the adaptive algorithm reduces the DOP of new queries as required. Therefore, you can also use PARALLEL_THREADS_PER_CPU to control the adaptive algorithm.

The default for PARALLEL_THREADS_PER_CPU is appropriate for most systems. However, if your I/O subsystem cannot keep pace with the processors, you may need to increase the value for PARALLEL_THREADS_PER_CPU. In this case, you need more processes to achieve better system scalability. If too many processes are running, reduce the number.

The default for PARALLEL_THREADS_PER_CPU on most platforms is 2. However, the default for machines with relatively slow I/O subsystems can be as high as 8.

Step Three: Tuning General Parameters

This section discusses the following types of parameters:

Parameters Establishing Resource Limits for Parallel Operations

The parameters that establish resource limits are:

PARALLEL_MAX_SERVERS

The recommended value is 2 x DOP x number_of_concurrent_users.

The PARALLEL_MAX_SEVERS parameter sets a resource limit on the maximum number of processes available for parallel execution. If you set PARALLEL_AUTOMATIC_TUNING to FALSE, you need to manually specify a value for PARALLEL_MAX_SERVERS.

Most parallel operations need at most twice the number of query server processes as the maximum DOP attributed to any table in the operation.

If PARALLEL_AUTOMATIC_TUNING is FALSE, the default value for PARALLEL_MAX_SERVERS is 5. This is sufficient for some minimal operations, but not enough for executing parallel execution. If you manually set the parameter PARALLEL_MAX_SERVERS, set it to 10 times the number of CPUs. This is a reasonable starting value.

To support concurrent users, add more query server processes. You probably want to limit the number of CPU-bound processes to be a small multiple of the number of CPUs: perhaps 4 to 16 times the number of CPUs. This would limit the number of concurrent parallel execution statements to be in the range of 2 to 8.

If a database's users initiate too many concurrent operations, Oracle may not have enough query server processes. In this case, Oracle executes the operations sequentially or displays an error if PARALLEL_MIN_PERCENT is set to another value other than the default value of 0 (zero).

When Users Have Too Many Processes

When concurrent users have too many query server processes, memory contention (paging), I/O contention, or excessive context switching can occur. This contention can reduce system throughput to a level lower than if parallel execution were not used. Increase the PARALLEL_MAX_SERVERS value only if your system has sufficient memory and I/O bandwidth for the resulting load. Limiting the total number of query server processes may restrict the number of concurrent users that can execute parallel operations, but system throughput tends to remain stable.

Increasing the Number of Concurrent Users

To increase the number of concurrent users, you could restrict the number of concurrent sessions that resource consumer groups can have. For example:

Limiting the Number of Resources for a User

You can limit the amount of parallelism available to a given user by establishing resource consumer group for the user. Do this to limit the number of sessions, concurrent logons, and the number of parallel processes that any one or group of users can have.

Each query server process working on a parallel execution statement is logged on with a session ID; each process counts against the user's limit of concurrent sessions. For example, to limit a user to 10 processes, set the user's limit to 11. One process is for the parallel coordinator and there remain 10 parallel processes that consist of two sets of query server servers. The user's maximum DOP would thus be 5.

See Also:

"Formula for Memory, Users, and Parallel Execution Server Processes" for more information on balancing concurrent users, DOP, and resources consumed. Also refer to the Oracle8i Administrator's Guide for more information about managing resources with user profiles and Oracle8i Parallel Server Concepts and Administration for more information on querying GV$ views.  

PARALLEL_MIN_SERVERS

The recommended value is 0 (zero).

The system parameter PARALLEL_MIN_SERVERS allows you to specify the number of processes to be started and reserved for parallel operations at startup in a single instance. The syntax is:

   PARALLEL_MIN_SERVERS=n 

Where n is the number of processes you want to start and reserve for parallel operations.

Setting PARALLEL_MIN_SERVERS balances the startup cost against memory usage. Processes started using PARALLEL_MIN_SERVERS do not exit until the database is shutdown. This way, when a query is issued the processes are likely to be available. It is desirable, however, to recycle query server processes periodically since the memory these processes use can become fragmented and cause the high water mark to slowly increase. When you do not set PARALLEL_MIN_SERVERS, processes exit after they are idle for 5 minutes.

LARGE_POOL_SIZE/SHARED_POOL_SIZE

The following discussion of how to tune the large pool is also true for tuning the shared pool, except as noted under the heading "SHARED_POOL_SIZE". You must also increase the value for this memory setting by the amount you determine

There is no recommended value for LARGE_POOL_SIZE. Instead, Oracle recommends leaving this parameter unset and having Oracle to set it for you by setting the PARALLEL_AUTOMATIC_TUNING parameter to TRUE. The exception to this is when the system-assigned value is inadequate for your processing requirements.


Note:

When PARALLEL_AUTOMATIC_TUNING is set to TRUE, Oracle allocates parallel execution buffers from the large pool. When this parameter is FALSE, Oracle allocates parallel execution buffers from the shared pool.  


Oracle automatically computes LARGE_POOL_SIZE if PARALLEL_AUTOMATIC_TUNING is TRUE. To manually set a value for LARGE_POOL_SIZE, query the V$SGASTAT view and increase or decrease the value for LARGE_POOL_SIZE depending on your needs.

For example, if Oracle displays the following error on startup:

   ORA-27102: out of memory 
   SVR4 Error: 12: Not enough space 

Consider reducing the value for LARGE_POOL_SIZE low enough so your database starts. If after lowering the value of LARGE_POOL_SIZE you see the error:

   ORA-04031: unable to allocate 16084 bytes of shared memory ("large 
   pool","unknown object","large pool hea","PX msg pool") 

Execute the following query to determine why Oracle could not allocate the 16,084 bytes:

   SELECT NAME, SUM(BYTES) FROM V$SGASTAT WHERE POOL='LARGE POOL' GROUP BY 
   ROLLUP (NAME) ; 

Oracle should respond with output similar to:

   NAME                       SUM(BYTES) 
   -------------------------- ---------- 
   PX msg pool                   1474572 
   free memory                    562132 
                                 2036704 
   3 rows selected. 

To resolve this, increase the value for LARGE_POOL_SIZE. This example shows the LARGE_POOL_SIZE to be about 2MB. Depending on the amount of memory available, you could increase the value of LARGE_POOL_SIZE to 4MB and attempt to start your database. If Oracle continues to display an ORA-4031 message, gradually increase the value for LARGE_POOL_SIZE until startup is successful.

Computing Additional Memory Requirements for Message Buffers

After you determine the initial setting for the large or shared pool, you must calculate additional memory requirements for message buffers and determine how much additional space you need for cursors.

Adding Memory for Message Buffers

You must increase the value for the LARGE_- or the SHARED_POOL_SIZE parameters to accommodate message buffers. The message buffers allow query server processes to communicate with each other. If you enable automatic parallel tuning, Oracle allocates space for the message buffer from the large pool. Otherwise, Oracle allocates space from the shared pool.

Oracle uses a fixed number of buffers per virtual connection between producer and consumer query servers. Connections increase as the square of the DOP increases. For this reason, the maximum amount of memory used by parallel execution is bound by the highest DOP allowed on your system. You can control this value using either the PARALLEL_MAX_SERVERS parameter or by using policies and profiles.

See Also:

Oracle8i Concepts, for information on how Oracle makes connections between servers.  

Calculate how much additional memory you need for message buffers according the following 5 steps. These 5 steps are nearly the same steps Oracle performs when you set the PARALLEL_AUTOMATIC_TUNING parameter to TRUE. If you enable automatic tuning and check the computed value, you will get the same result.

  1. Determine the Maximum DOP possible on your system. When determining this value, consider how you parallelize your batch jobs: you use more memory for a single job using a large DOP than you use for multiple jobs with smaller DOPs. Thus, to ensure you have enough memory for message buffers, calculate an "upper bound" DOP. This DOP should also take multiple instances into account. In other words, to use a degree of 4 in 2 instances, the number you calculate should be 8, not 4. A conservative way to compute the maximum value is to take the value of PARALLEL_MAX_SERVERS multiplied by the number of instances and divide by 4. This number is the DOP in the formula appearing after step 5.

  2. Determine the number of instances participating in the SQL statements. For most installations, this number will be 1. This value is INSTANCES in the formula.

  3. Estimate the maximum number of concurrent queries executing at this DOP. A number of 1 is a reasonable value if either PARALLEL_ADAPTIVE_MULTI_USER is set to TRUE or if you have set DOP to be a value which is either greater than or equal to the value for PARALLEL_MAX_SERVERS divided by 4. This is because your DOP is then bound by the number of servers. This number is USERS in the formula below.

  4. Calculate the maximum number of query server process groups per query. Normally, Oracle uses only one group of query server processes per query. Sometimes with subqueries, however, Oracle uses one group of query server processes for each subquery. A conservative starting value for this number is 2. This number is GROUPS in the formula appearing after step 5.

  5. Determine the parallel message size using the value for the parameter PARALLEL_MESSAGE_SIZE. This is usually 2KB or 4KB. Use the SQLPlus SHOW PARAMETERS command to see the current value for PARALLEL_MESSAGE_SIZE.

Memory Formula for SMP Systems

Most SMP systems use the following formula:


Where CONNECTIONS = (DOP2 + 2 x DOP).

Memory Formula for MPP Systems

If you are using OPS and the value for INSTANCES is greater than 1, use the following formula. This formula calculates the number of buffers needed for local virtual connections as well as for remote physical connections. You can use the value of REMOTE as the number of remote connections among nodes to help tune your operating system. The formula is:


Where:

Each instance uses the memory computed by the formula.

Add this amount to your original setting for the large or shared pool. However, before setting a value for either of these memory structures, you must also consider additional memory for cursors as explained under the following heading.

Calculating Additional Memory for Cursors

Parallel execution plans consume more space in the SQL area than serial execution plans. You should regularly monitor shared pool resource use to ensure both structures have enough memory to accommodate your system's processing requirements.

See Also:

For more information about execution plans, please refer to Chapter 13, "Using EXPLAIN PLAN". For more information about how query servers communicate, please refer to Oracle8i Concepts.  


Note:

If you used parallel execution in previous releases and now intend to manually tune it, reduce the amount of memory allocated for LARGE_POOL_SIZE to account for the decreased demand on this pool.  


Adjusting Memory After Processing Begins

The formulae in this section are just starting points. Whether you are using automated or manual tuning, you should monitor usage on an on-going basis to make sure the size of memory is not too large or too small. To do this, tune the large and shared pools pool after examining the size of structures in the large pool using a query syntax of:

   SELECT POOL, NAME, SUM(BYTES) FROM V$SGASTAT WHERE POOL LIKE '%pool%' 
   GROUP BY ROLLUP (POOL, NAME); 

Oracle responds with output similar to:

POOL        NAME                       SUM(BYTES) 
----------- -------------------------- ---------- 
large pool  PX msg pool                  38092812 
large pool  free memory                    299988 
large pool                               38392800 
shared pool Checkpoint queue                38496 
shared pool KGFF heap                        1964 
shared pool KGK heap                         4372 
shared pool KQLS heap                     1134432 
shared pool LRMPD SGA Table                 23856 
shared pool PLS non-lib hp                   2096 
shared pool PX subheap                     186828 
shared pool SYSTEM PARAMETERS               55756 
shared pool State objects                 3907808 
shared pool character set memory            30260 
shared pool db_block_buffers               200000 
shared pool db_block_hash_buckets           33132 
shared pool db_files                       122984 
shared pool db_handles                      52416 
shared pool dictionary cache               198216 
shared pool dlm shared memory             5387924 
shared pool enqueue_resources               29016 
shared pool event statistics per sess      264768 
shared pool fixed allocation callback        1376 
shared pool free memory                  26329104 
shared pool gc_*                            64000 
shared pool latch nowait fails or sle       34944 
shared pool library cache                 2176808 
shared pool log_buffer                      24576 
shared pool log_checkpoint_timeout          24700 
shared pool long op statistics array        30240 
shared pool message pool freequeue         116232 
shared pool miscellaneous                  267624 
shared pool processes                       76896 
shared pool session param values            41424 
shared pool sessions                       170016 
shared pool sql area                      9549116 
shared pool table columns                  148104 
shared pool trace_buffers_per_process     1476320 
shared pool transactions                    18480 
shared pool trigger inform                  24684 
shared pool                              52248968 
                                         90641768 
41 rows selected.

Evaluate the memory used as shown in your output and alter the setting for LARGE_POOL_SIZE based on your processing needs.

To obtain more memory usage statistics, execute the query:

   SELECT * FROM V$PX_PROCESS_SYSSTAT WHERE STATISTIC LIKE 'Buffers%';

Oracle responds with output similar to:

STATISTIC                      VALUE      
------------------------------ ---------- 
Buffers Allocated                   23225 
Buffers Freed                       23225 
Buffers Current                         0 
Buffers HWM                          3620 
4 Rows selected. 

The amount of memory used appears in the statistics "Buffers Current" and "Buffers HWM". Calculate a value for the bytes by multiplying the number of buffers by the value for PARALLEL_EXECUTION_MESSAGE_SIZE. Compare the high water mark to the parallel execution message pool size to determine if you allocated too much memory. For example, in the first output, the value for large pool as shown in 'px msg pool' is 38092812, or 38MB. The 'Buffers HWM' from the second output is 3,620, which when multiplied by a parallel execution message size of 4,096 is 14,827,520, or approximately 15MB. In this case, the high water mark has reached approximately 40% of its capacity.

SHARED_POOL_SIZE

As mentioned earlier, if PARALLEL_AUTOMATIC_TUNING is FALSE, Oracle allocates query server processes from the shared pool. In this case, tune the shared pool as described under the previous heading for large pool except for the following:

You must also take into account that using parallel execution generates more cursors. Look at statistics in the V$SQLAREA view to determine how often Oracle recompiles cursors. If the cursor hit ratio is poor, increase the size of the pool.

Use the following query to determine how much memory each component uses and thus to tune the value for SHARED_POOL_SIZE.

  SELECT POOL, NAME, SUM(BYTES) FROM V$SGASTAT WHERE POOL LIKE '%pool%' 
  GROUP BY ROLLUP (POOL, NAME); 

Oracle responds with output similar to:

POOL        NAME                       SUM(BYTES) 
----------- -------------------------- ---------- 
shared pool Checkpoint queue                38496 
shared pool KGFF heap                        1320 
shared pool KGK heap                         4372 
shared pool KQLS heap                      904528 
shared pool LRMPD SGA Table                 23856 
shared pool PLS non-lib hp                   2096 
shared pool PX msg pool                    373864 
shared pool PX subheap                      65188 
shared pool SYSTEM PARAMETERS               55828 
shared pool State objects                 3877520 
shared pool character set memory            30260 
shared pool db_block_buffers               200000 
shared pool db_block_hash_buckets           33132 
shared pool db_files                       122984 
shared pool db_handles                      36400 
shared pool dictionary cache               181792 
shared pool dlm shared memory             5387924 
shared pool enqueue_resources               25560 
shared pool event statistics per sess      189120 
shared pool fixed allocation callback        1376 
shared pool free memory                  36255072 
shared pool gc_*                            64000 
shared pool latch nowait fails or sle       34944 
shared pool library cache                  559676 
shared pool log_buffer                      24576 
shared pool log_checkpoint_timeout          24700 
shared pool long op statistics array        21600 
shared pool message pool freequeue         116232 
shared pool miscellaneous                  230016 
shared pool network connections             17280 
shared pool processes                       53736 
shared pool session param values            58684 
shared pool sessions                       121440 
shared pool sql area                      1232748 
shared pool table columns                  148104 
shared pool trace_buffers_per_process     1025232 
shared pool transactions                    18480 
shared pool trigger inform                  16456 
shared pool                              51578592 
                                         51578592 
40 rows selected.

You can then monitor the number of buffers used by parallel execution in the same way as explained previously, and compare the 'shared pool PX msg pool' to the current high water mark reported in output from the view V$PX_PROCESS_SYSSTAT.

PARALLEL_MIN_PERCENT

The recommended value for this parameter is 0 (zero).

This parameter allows users to wait for an acceptable DOP depending on the application in use. Setting this parameter to values other than 0 (zero) causes Oracle to return an error when the required minimum DOP cannot be satisfied by the system at a given time.

For example, if you set PARALLEL_MIN_PERCENT to 50, which translates to "50%", and the DOP is reduced because of the adaptive algorithm or because of a resource limitation, then oracle returns ORA-12827. For example:

   SELECT /*+ PARALLEL(e, 4, 1) */ d.deptno, SUM(SAL) 
   FROM emp e, dept d WHERE e.deptno = d.deptno 
   GROUP BY d.deptno ORDER BY d.deptno; 

Oracle responds with this message:

   ORA-12827: INSUFFICIENT PARALLEL QUERY SLAVES AVAILABLE

PARALLEL_SERVER_INSTANCES

The recommended value is to set this parameter equal to the number of instances in your parallel server environment.

The PARALLEL_SERVER_INSTANCES parameter specifies the number of instances configured in a parallel server environment. Oracle uses the value of this parameter to compute values for LARGE_POOL_SIZE when PARALLEL_AUTOMATIC_TUNING is set to TRUE.

Parameters Affecting Resource Consumption

The parameters that affect resource consumption are:

The first group of parameters discussed in this section affects memory and resource consumption for all parallel operations, and in particular for parallel execution. A second subset of parameters discussed in this section explains parameters affecting parallel DML and DDL. Chapter 27, "Understanding Parallel Execution Performance Issues" describes in detail how these parameters interrelate.

To control resource consumption, configure memory at two levels:

The SGA is typically part of real physical memory. The SGA is static and of fixed size; if you want to change its size, shut down the database, make the change, and restart the database. Oracle allocates the large and shared pools out of the SGA.

A large percentage of the memory used in data warehousing operations is more dynamic. This memory comes from process memory and both the size of process memory and the number of processes can vary greatly. This memory is controlled by the HASH_AREA_SIZE and SORT_AREA_SIZE parameters. Together these parameters control the amount of virtual memory used by Oracle.

Process memory, in turn, comes from virtual memory. Total virtual memory should be somewhat larger than available real memory, which is the physical memory minus the size of the SGA. Virtual memory generally should not exceed twice the size of the physical memory minus the SGA size. If you set virtual memory to a value several times greater than real memory, the paging rate may increase when the machine is overloaded.

As a general rule for memory sizing, each process requires adequate address space for hash joins. A dominant factor in high volume data warehousing operations is the relationship between memory, the number of processes, and the number of hash join operations. Hash joins and large sorts are memory-intensive operations, so you may want to configure fewer processes, each with a greater limit on the amount of memory it can use. Sort performance, however, degrades with increased memory use.

HASH_AREA_SIZE

Set HASH_AREA_SIZE using one of two approaches. The first approach examines how much memory is available after configuring the SGA and calculating the amount of memory processes the system uses during normal loads.

The total amount of memory that Oracle processes are allowed to use should be divided by the number of processes during the normal load. These processes include parallel execution servers. This number determines the total amount of working memory per process. This amount then needs to be shared among different operations in a given query. For example, setting HASH_AREA_SIZE or SORT_AREA_SIZE to half or one third of this number is reasonable.

Set these parameters to the highest number that does not cause swapping. After setting these parameters as described, you should watch for swapping and free memory. If there is swapping, decrease the values for these parameters. If a significant amount of free memory remains, you may increase the values for these parameters.

The second approach to setting HASH_AREA_SIZE requires a thorough understanding of the types of hash joins you execute and an understanding of the amount of data you will be querying against. If the queries and query plans you execute are well understood, this approach is reasonable.

HASH_AREA_SIZE should be approximately half of the square root of S, where S is the size in megabytes of the smaller of the inputs to the join operation. In any case, the value for HASH_AREA_SIZE should not be less than 1MB.

This relationship can be expressed as follows:


For example, if S equals 16MB, a minimum appropriate value for HASH_AREA_SIZE might be 2MB summed over all parallel processes. Thus if you have 2 parallel processes, a minimum value for HASH_AREA_SIZE might be 1MB. A smaller hash area is not advisable.

For a large data warehouse, HASH_AREA_SIZE may range from 8MB to 32MB or more. This parameter provides for adequate memory for hash joins. Each process performing a parallel hash join uses an amount of memory equal to HASH_AREA_SIZE.

Hash join performance is more sensitive to HASH_AREA_SIZE than sort performance is to SORT_AREA_SIZE. As with SORT_AREA_SIZE, too large a hash area may cause the system to run out of memory.

The hash area does not cache blocks in the buffer cache; even low values of HASH_AREA_SIZE will not cause this to occur. Too small a setting, however, could adversely affect performance.

HASH_AREA_SIZE is relevant to parallel execution operations and to the query portion of DML or DDL statements.

SORT_AREA_SIZE

The recommended values for this parameter fall in the range from 256KB to 4MB.

This parameter specifies the amount of memory to allocate per query server process for sort operations. If you have a lot of system memory, you can benefit from setting SORT_AREA_SIZE to a large value. This can dramatically increase the performance of sort operations because the entire process is more likely to be performed in memory. However, if memory is a concern for your system, you may want to limit the amount of memory allocated for sort and hash operations.

If the sort area is too small, an excessive amount of I/O is required to merge a large number of sort runs. If the sort area size is smaller than the amount of data to sort, then the sort will move to disk, creating sort runs. These must then be merged again using the sort area. If the sort area size is very small, there will be many runs to merge and multiple passes may be necessary. The amount of I/O increases as SORT_AREA_SIZE decreases.

If the sort area is too large, the operating system paging rate will be excessive. The cumulative sort area adds up quickly because each query server process can allocate this amount of memory for each sort. For such situations, monitor the operating system paging rate to see if too much memory is being requested.

SORT_AREA_SIZE is relevant to parallel execution operations and to the query portion of DML or DDL statements. All CREATE INDEX statements must do some sorting to generate the index. Commands that require sorting include:

PARALLEL_EXECUTION_MESSAGE_SIZE

The recommended value for PARALLEL_EXECUTION_MESSAGE_SIZE is 4KB. If PARALLEL_AUTOMATIC_TUNING is TRUE, the default is 4KB. If PARALLEL_AUTOMATIC_TUNING is FALSE, the default is slightly greater than 2KB.

The PARALLEL_EXECUTION_MESSAGE_SIZE parameter specifies the upper limit for the size of parallel execution messages. The default value is operating system specific and this value should be adequate for most applications. Larger values for PARALLEL_EXECUTION_MESSAGE_SIZE require larger values for LARGE_POOL_SIZE or SHARED_POOL_SIZE, depending on whether you've enabled parallel automatic tuning.

While you may experience significantly improved response time by increasing the value for PARALLEL_EXECUTION_ MESSAGE_SIZE, memory use also drastically increases. For example, if you double the value for PARALLEL_EXECUTION_ MESSAGE_SIZE, parallel execution requires a message source pool that is twice as large.

Therefore, if you set PARALLEL_AUTOMATIC_TUNING to FALSE, then you must adjust the SHARED_POOL_SIZE to accommodate parallel execution messages. If you have set PARALLEL_AUTOMATIC_TUNING to TRUE, but have set LARGE_POOL_SIZE manually, then you must adjust the LARGE_POOL_SIZE to accommodate parallel execution messages.

OPTIMIZER_PERCENT_PARALLEL

The recommended value is 100/number_of_concurrent_users.

This parameter determines how aggressively the optimizer attempts to parallelize a given execution plan. OPTIMIZER_PERCENT_PARALLEL encourages the optimizer to use plans with low response times because of parallel execution, even if total resource used is not minimized.

The default value of OPTIMIZER_PERCENT_PARALLEL is 0 (zero), which, if possible, parallelizes the plan using the fewest resources. Here, the execution time of the operation may be long because only a small amount of resource is used.


Note:

Given an appropriate index, Oracle can quickly select a single record from a table; Oracle does not require parallelism to do this. A full scan to locate the single row can be executed in parallel. Normally, however, each parallel process examines many rows. In this case, the response time of a parallel plan will be longer and total system resource use will be much greater than if it were done by a serial plan using an index. With a parallel plan, the delay is shortened because more resources are used. The parallel plan could use up to n times more resources where n is equal to the value set for the degree of parallelism. A value between 0 and 100 sets an intermediate trade-off between throughput and response time. Low values favor indexes; high values favor table scans.  


A nonzero setting of OPTIMIZER_PERCENT_PARALLEL is overridden if you use a FIRST_ROWS hint or set OPTIMIZER_MODE to FIRST_ROWS.

PARALLEL_BROADCAST_ENABLE

The default value is FALSE.

Set this parameter to TRUE if you are joining a very large join result set with a very small result set (size being measured in bytes, rather than number of rows). In this case, the optimizer has the option of broadcasting the small set's rows to each of the query server processes that are processing the rows of the larger set. The result is enhanced performance.

You cannot dynamically set the parameter PARALLEL_BROADCAST_ENABLE as it only affects only hash joins and merge joins.

Parameters Affecting Resource Consumption for Parallel DML and Parallel DDL

The parameters that affect parallel DML and parallel DDL resource consumption are:

Parallel inserts, updates, and deletes require more resources than serial DML operations require. Likewise, PARALLEL CREATE TABLE ... AS SELECT and PARALLEL CREATE INDEX may require more resources. For this reason you may need to increase the value of several additional initialization parameters. These parameters do not affect resources for queries.

See Also:

Oracle8i SQL Reference for complete information about parameters.  

TRANSACTIONS

For parallel DML and DDL, each query server process starts a transaction. The parallel coordinator uses the two-phase commit protocol to commit transactions; therefore the number of transactions being processed increases by the DOP. You may thus need to increase the value of the TRANSACTIONS initialization parameter.

The TRANSACTIONS parameter specifies the maximum number of concurrent transactions. The default assumes no parallelism. For example, if you have a DOP of 20, you will have 20 more new server transactions (or 40, if you have two server sets) and 1 coordinator transaction; thus you should increase TRANSACTIONS by 21 (or 41), if they are running in the same instance. If you do not set this parameter, Oracle sets it to 1.1 x SESSIONS.

ROLLBACK_SEGMENTS

The increased number of transactions for parallel DML and DDL requires more rollback segments. For example, one command with a DOP of 5 uses 5 server transactions distributed among different rollback segments. The rollback segments should belong to tablespaces that have free space. The rollback segments should also be unlimited, or you should specify a high value for the MAXEXTENTS parameter of the STORAGE clause. In this way they can extend and not run out of space.

FAST_START_PARALLEL_ROLLBACK

If a system crashes when there are uncommitted parallel DML or DDL transactions, you can speed up transaction recovery during startup by using the FAST_START_PARALLEL_ROLLBACK parameter.

This parameter controls the DOP used when recovering "dead transactions." Dead transactions are transactions that are active before a system crash. By default, the DOP is chosen to be at most two times the value of the CPU_COUNT parameter.

If the default DOP is insufficient, set the parameter to the HIGH. This gives a maximum DOP to be at most 4 times the value of the CPU_COUNT parameter. This feature is available by default.

LOG_BUFFER

Check the statistic "redo buffer allocation retries" in the V$SYSSTAT view. If this value is high, try to increase the LOG_BUFFER size. A common LOG_BUFFER size for a system generating numerous logs is 3 to 5MB. If the number of retries is still high after increasing LOG_BUFFER size, a problem may exist with the disk on which the log files reside. In that case, tune the I/O subsystem to increase the I/O rates for redo. One way of doing this is to use fine-grained striping across multiple disks. For example, use a stripe size of 16KB. A simpler approach is to isolate redo logs on their own disk.

DML_LOCKS

This parameter specifies the maximum number of DML locks. Its value should equal the total of locks on all tables referenced by all users. A parallel DML operation's lock and enqueue resource requirement is very different from serial DML. Parallel DML holds many more locks, so you should increase the value of the ENQUEUE_RESOURCES and DML_LOCKS parameters by equal amounts.

Table 26-2 shows the types of locks acquired by coordinator and query server processes for different types of parallel DML statements. Using this information, you can determine the value required for these parameters. A query server process can work on one or more partitions or subpartitions, but a partition or subpartition can only be worked on by one server process (this is different from parallel execution).

Table 26-2 Locks Acquired by Parallel DML Statements
Type of statement  Coordinator process acquires:  Each parallel execution server acquires: 

Parallel UPDATE or DELETE into partitioned table; WHERE clause pruned to a subset of partitions/subpartitions  

1 table lock SX

1 partition lock X per pruned (sub)partition  

1 table lock SX

1 partition lock NULL per pruned (sub)partition owned by the query server process

1 partition-wait lock S per pruned (sub)partition owned by the query server process  

Parallel row-migrating UPDATE into partitioned table; WHERE clause pruned to a subset of (sub)partitions  

1 table lock SX  

1 table lock SX  

1 partition X lock per pruned (sub)partition  

1 partition lock NULL per pruned (sub)partition owned by the query server process

1 partition-wait lock S per pruned partition owned by the query server process  

1 partition lock SX for all other (sub)partitions  

1 partition lock SX for all other (sub)partitions  

Parallel UPDATE, DELETE, or INSERT into partitioned table  

1 table lock SX

Partition locks X for all (sub)partitions  

1 table lock SX

1 partition lock NULL per (sub)partition owned by the query server process

1 partition-wait lock S per (sub)partition owned by the query server process  

Parallel INSERT into nonpartitioned table  

1 table lock X  

None  


Note:

Table, partition, and partition-wait DML locks all appear as TM locks in the V$LOCK view.  


Consider a table with 600 partitions running with a DOP of 100. Assume all partitions are involved in a parallel UPDATE/DELETE statement with no row-migrations.

The coordinator acquires:  

1 table lock SX.  

 

600 partition locks X.  

Total server processes acquire:  

100 table locks SX.  

 

600 partition locks NULL.  

 

600 partition-wait locks S.  

ENQUEUE_RESOURCES

This parameter sets the number of resources that can be locked by the lock manager. Parallel DML operations require many more resources than serial DML. Therefore, increase the value of the ENQUEUE_RESOURCES and DML_LOCKS parameters by equal amounts.

See Also:

"DML_LOCKS".  

Parameters Related to I/O

The parameters that affect I/O are:

These parameters also affect the optimizer which ensures optimal performance for parallel execution I/O operations.

DB_BLOCK_BUFFERS

When you perform parallel updates and deletes, the buffer cache behavior is very similar to any system running a high volume of updates. For more information, see "Tuning the Buffer Cache".

DB_BLOCK_SIZE

The recommended value is 8KB or 16KB.

Set the database block size when you create the database. If you are creating a new database, use a large block size.

DB_FILE_MULTIBLOCK_READ_COUNT

The recommended value is 8 for 8KB block size, or 4 for 16KB block size.

This parameter determines how many database blocks are read with a single operating system READ call. The upper limit for this parameter is platform-dependent. If you set DB_FILE_MULTIBLOCK_READ_COUNT to an excessively high value, your operating system will lower the value to the highest allowable level when you start your database. In this case, each platform uses the highest value possible. Maximum values generally range from 64KB to 1MB.

HASH_MULTIBLOCK_IO_COUNT

The recommended value is 4.

This parameter specifies how many blocks a hash join reads and writes at once. Increasing the value of HASH_MULTIBLOCK_IO_COUNT decreases the number of hash buckets. If a system is I/O bound, you can increase the efficiency of I/O by having larger transfers per I/O.

Because memory for I/O buffers comes from the HASH_AREA_SIZE, larger I/O buffers mean fewer hash buckets. There is a trade-off, however. For large tables (hundreds of gigabytes in size) it is better to have more hash buckets and slightly less efficient I/Os. If you find an I/O bound condition on temporary space during hash joins, consider increasing the value of HASH_MULTIBLOCK_IO_COUNT.

SORT_MULTIBLOCK_READ_COUNT

The recommended value is to use the default value.

The SORT_MULTIBLOCK_READ_COUNT parameter specifies the number of database blocks to read each time a sort performs a read from a temporary segment. Temporary segments are used by a sort when the data does not fit in SORT_AREA_SIZE of memory.

If the system is performing too many I/Os per second during sort operations and the CPUs are relatively idle during that time, consider increasing the SORT_MUTLIBLOCK_READ_COUNT parameter to force the sort operations to perform fewer, larger I/Os.

See Also:

For more information, please see "Tuning Sorts".  

DISK_ASYNCH_IO and TAPE_ASYNCH_IO

The recommended value is TRUE.

These parameters enable or disable the operating system's asynchronous I/O facility. They allow query server processes to overlap I/O requests with processing when performing table scans. If the operating system supports asynchronous I/O, leave these parameters at the default value of TRUE.

Figure 26-1 Asynchronous Read


Asynchronous operations are currently supported for parallel table scans, hash joins, sorts, and serial table scans. However, this feature may require operating system specific configuration and may not be supported on all platforms. Check your Oracle operating system specific documentation.

Example Parameter Setting Scenarios for Parallel Execution

The following examples describe a limited variety of parallel execution implementation possibilities. Each example begins by using either automatic or manual parallel execution tuning. Oracle automatically sets other parameters based on each sample system's characteristics and on how parallel execution tuning was initialized. The examples then describe setting the degree of parallelism and the enabling of the adaptive multi-user feature.

The effects that the parameter settings in these examples have on internally-derived settings and overall performance are only approximations. Your system's performance characteristics will vary depending on operating system dependencies and user workloads.

With additional adjustments, you can fine tune these examples to make them more closely resemble your environment. To further analyze the consequences of setting PARALLEL_AUTOMATIC_TUNING to TRUE, refer to Table 26-1.

In your production environment, after you set the DOP for your tables and enable the adaptive multi-user feature, you may want to analyze system performance as explained in "Phase Four - Monitoring Parallel Execution Performance". If your system performance remains poor, refer to Phase One's explanation of "Step Three: Tuning General Parameters".

The following four examples describe different system types in ascending order of size and complexity.

Example One: Small Datamart

In this example, the DBA has limited parallel execution experience and does not have time to closely monitor the system.

The database is mostly a star type schema with some summary tables and a few tables in third normal form. The workload is mostly "ad hoc" in nature. Users expect parallel execution to improve the performance of their high-volume queries.

Other facts about the system are:

The DBA makes the following settings:

Oracle automatically makes the following default settings:

Parameter Settings for DOP and the Adaptive Multi-User Feature

The DBA parallelizes every table having more than 10,000 rows using a command similar to the following:

   ALTER TABLE employee PARALLEL; 

In this example, because PARALLEL_THREADS_PER_CPU is 2 and the number of CPUs is 4, the DOP is 8. Because PARALLEL_ADAPTIVE_MULTI_USER is set to TRUE, Oracle may reduce this DOP in response to the system load that exists at the time of the query's initiation.

Example Two: Medium-sized Data Warehouse

In this example the DBA is experienced but is also busy with other responsibilities. The DBA knows how to organize users into resource consumer groups and uses views and other roles to control access to parallelism. The DBA also has experimented with manually adjusting the settings that automated parallel tuning generates and has chosen to use all of them except for the PARALLEL_ADAPTIVE_MULTI_USER parameter which the DBA sets to FALSE.

The system workload involves some adhoc queries and a high volume of batch operations to convert a central repository into summary tables and star schemas. Most queries on this system are generated by Oracle Express and other tools.

The database has source tables in third normal form and end-user tables in a star schema and summary form only.

Other facts about the system are:

The DBA makes the following settings:

The DBA also sets other parameters unrelated to parallelism. As a result, Oracle responds by automatically adjusting the following parameter settings:

Parameter Settings for DOP and the Adaptive Multi-User Feature

The DBA parallelizes some tables in the data warehouse while creating other views for special users:

   ALTER TABLE sales PARALLEL; 
   CREATE VIEW invoice_parallel AS SELECT /*+ PARALLEL(P) */ * FROM invoices P;

The DBA allows the system to use the PARALLEL_THREADS_PER_CPU setting of 4 with 8 CPUs. The DOP for the tables is 32. This means a simple query uses 32 processes while more complex queries use 64.

Example Three: Large Data Warehouse

In this example, the DBA is experienced and is occupied primarily with managing this system. The DBA has good control over resources and understands how to tune the system. The DBA schedules large queries in batch mode.

The workload includes some adhoc parallel queries. As well, a large number of serial queries are processed against a star schema. There is also some batch processing that generates summary tables and indexes. The database is completely denormalized and the Oracle Parallel Server option is in use.

Other facts about the system are:

The DBA uses manual parallel tuning by setting the following:

The DBA also sets other parameters unrelated to parallel execution. Because PARALLEL_AUTOMATIC_TUNING is set to FALSE, Oracle allocates parallel execution buffers from the SHARED_POOL.

Parameter Settings for DOP and the Adaptive Multi-User Feature

The DBA parallelizes tables in the data warehouse by explicitly setting the DOP using syntax similar to the following:

   ALTER TABLE department1 PARALLEL 10; 
   ALTER TABLE department2 PARALLEL 5; 
   CREATE VIEW current_sales AS SELECT /*+ PARALLEL(P, 20) */ * FROM sales P; 

In this example, Oracle does not make calculations for parallel execution because the DBA has manually set all parallel execution parameters.

EXAMPLE Four: Very Large Data Warehouse

In this example, the DBA is very experienced and is dedicated to administering this system. The DBA has good control over the environment, but the variety of users requires the DBA to devote constant attention to the system.

The DBA sets PARALLEL_AUTOMATIC_TUNING to TRUE which makes Oracle allocate parallel execution buffers from the large pool. PARALLEL_ADAPTIVE_MULTI_USER is automatically enabled. After gaining experience with the system, the DBA fine-tunes the system supplied defaults to further improve performance.

The database is a very large data warehouse with data marts residing on the same machine. The data marts are generated and refreshed from data in the warehouse. The warehouse is mostly normalized while the marts are mostly star schemas and summary tables. The DBA has carefully customized system parameters through experimentation.

Other facts about the system are:

The DBA makes the following settings:

Parameter Settings for DOP and the Adaptive Multi-User Feature

The DBA has carefully evaluated which users and tables require parallelism and has set the values according to their requirements. The DBA has taken all steps mentioned in the earlier examples, but in addition, the DBA also uses the following command during peak user hours to enable the adaptive DOP algorithms:

   ALTER SYSTEM SET PARALLEL_ADAPTIVE_MULTI_USER = TRUE; 

During off hours when batch processing is about to begin, the DBA disables adaptive processing by issuing the command:

   ALTER SYSTEM SET PARALLEL_ADAPTIVE_MULTI_USER = FALSE;

Phase Two - Tuning Physical Database Layouts for Parallel Execution

This section describes how to tune the physical database layout for optimal performance of parallel execution.

Types of Parallelism

Different parallel operations use different types of parallelism. The optimal physical database layout depends on what parallel operations are most prevalent in your application.

The basic unit of parallelism is a called a granule. The operation being parallelized (a table scan, table update, or index creation, for example) is divided by Oracle into granules. Query server processes execute the operation one granule at a time. The number of granules and their size affect the DOP (degree of parallelism) you can use. It also affects how well the work is balanced across query server processes.

Block Range Granules

Block range granules are the basic unit of most parallel operations. This is true even on partitioned tables; it is the reason why, on Oracle, the parallel degree is not related to the number of partitions.

Block range granules are ranges of physical blocks from a table. Because they are based on physical data addresses, Oracle can size block range granules to allow better load balancing. Block range granules permit dynamic parallelism that does not depend on static pre-allocation of tables or indexes. On SMP (symmetric multi-processors) systems, granules are located on different devices to drive as many disks as possible. On many MPP (massively parallel processing) systems, block range granules are preferentially assigned to query server processes that have physical proximity to the disks storing the granules. Block range granules are also used with global striping.

When block range granules are used predominantly for parallel access to a table or index, there are administrative considerations such as recovery or using partitions for deleting portions of data that may influence partition layout more than performance considerations. The number of disks that you stripe partitions over should be at least equal to the value of the DOP so that parallel execution operations can take advantage of partition pruning.

See Also:

For MPP systems, see your platform-specific documentation.  

Partition Granules

When partition granules are used, a query server process works on an entire partition or subpartition of a table or index. Because partition granules are statically determined when a table or index is created, partition granules do not allow as much flexibility in parallelizing an operation. This means that the allowable DOP might be limited, and that load might not be well balanced across query server processes.

Partition granules are the basic unit of parallel index range scans and parallel operations that modify multiple partitions of a partitioned table or index. These operations include parallel update, parallel delete, parallel direct-load insert into partitioned tables, parallel creation of partitioned indexes, and parallel creation of partitioned tables. Operations such as parallel DML and CREATE LOCAL INDEX, do not recognize block range granules.

When partition granules are used for parallel access to a table or index, it is important that there be a relatively large number of partitions (at least three times the DOP), so Oracle can effectively balance work across the query server processes.

See Also:

Oracle8i Concepts for information on disk striping and partitioning.  

Striping Data

To avoid I/O bottlenecks during parallel processing, tablespaces accessed by parallel operations should be striped. As shown in Figure 26-2, tablespaces should always stripe over at least as many devices as CPUs; in this example, there are four CPUs. As was mentioned for partitioned granules, the number of disks over which you stripe these tablespaces should be at least equal to the value set for DOP.

Stripe tablespaces for tables, tablespaces for indexes, rollback segments, and temporary tablespaces. You must also spread the devices over controllers, I/O channels, and/or internal busses.

Figure 26-2 Striping Objects Over at Least as Many Devices as CPUs


To stripe data during loads, use the FILE= clause of parallel loader to load data from multiple load sessions into different files in the tablespace. To make striping effective, ensure that enough controllers and other I/O components are available to support the bandwidth of parallel data movement into and out of the striped tablespaces.

The operating system or volume manager can perform striping (OS striping), or you can perform striping manually for parallel operations.

We recommend using a large stripe size of at least 64KB with OS striping when possible. This approach always performs better than manual striping, especially in multi-user environments.

Operating System Striping

Operating system striping is usually flexible and easy to manage. It supports multiple users running sequentially as well as single users running in parallel. Two main advantages make OS striping preferable to manual striping, unless the system is very small or availability is the main concern:

Stripe size must be at least as large as the I/O size. If stripe size is larger than I/O size by a factor of 2 or 4, then certain tradeoffs may arise. The large stripe size can be beneficial because it allows the system to perform more sequential operations on each disk; it decreases the number of seeks on disk. The disadvantage is that it reduces the I/O parallelism so fewer disks are simultaneously active. If you encounter problems, increase the I/O size of scan operations (going, for example, from 64KB to 128KB), instead of changing the stripe size. The maximum I/O size is platform-specific (in a range, for example, of 64KB to 1MB).

With OS striping, from a performance standpoint, the best layout is to stripe data, indexes, and temporary tablespaces across all the disks of your platform. In this way, maximum I/O performance (both in terms of throughput and number of I/Os per second) can be reached when one object is accessed by a parallel operation. If multiple objects are accessed at the same time (as in a multi-user configuration), striping automatically limits the contention. If availability is a major concern, associate this method with hardware redundancy, for example RAID5, which permits both performance and availability.

Manual Striping

You can use manual striping on all platforms. To do this, add multiple files to each tablespace, each on a separate disk. If you use manual striping correctly, your system will experience significant performance gains. However, you should be aware of several drawbacks that may adversely affect performance if you do not stripe correctly.

First, when using manual striping, the DOP is more a function of the number of disks than of the number of CPUs. This is because it is necessary to have one server process per datafile to drive all the disks and limit the risk of experiencing I/O bottlenecks. Also, manual striping is very sensitive to datafile size skew which can affect the scalability of parallel scan operations. Second, manual striping requires more planning and set up effort that operating system striping.

See Also:

Oracle8i Concepts for information on disk striping and partitioning. For MPP systems, see your platform-specific Oracle documentation regarding the advisability of disabling disk affinity when using operating system striping.  

Local and Global Striping

Local striping, which applies only to partitioned tables and indexes, is a form of non-overlapping disk-to-partition striping. Each partition has its own set of disks and files, as illustrated in Table 26-3. There is no overlapping disk access, and no overlapping of files.

An advantage of local striping is that if one disk fails, it does not affect other partitions. Moreover, you still have some striping even if you have data in only one partition.

A disadvantage of local striping is that you need many more disks to implement it--each partition requires a few disks of its own. Another major disadvantage is that after partition pruning to only a single or a few partitions, the system will have limited I/O bandwidth. As a result, local striping is not very practical for parallel operations. For this reason, consider local striping only if your main concern is availability, and not parallel execution. A good compromise might be to use global striping associated with RAID5, which permits both performance and availability.

Figure 26-3 Local Striping


Global striping, illustrated in Figure 26-4, entails overlapping disks and partitions.

Figure 26-4 Global Striping


Global striping is advantageous if you have partition pruning and need to access data only in one partition. Spreading the data in that partition across many disks improves performance for parallel execution operations. A disadvantage of global striping is that if one disk fails, all partitions are affected.

See Also:

"Striping and Media Recovery".  

Analyzing Striping

There are two considerations when analyzing striping issues for your applications. First, consider the cardinality of the relationships among the objects in a storage system. Second, consider what you can optimize in your striping effort: full table scans, general tablespace availability, partition scans, or some combinations of these goals. These two topics are discussed under the following headings.

Cardinality of Storage Object Relationships

To analyze striping, consider the following relationships:

Figure 26-5 Cardinality of Relationships


Figure 26-5 shows the cardinality of the relationships among objects in a typical Oracle storage system. For every table there may be:

Goals. You may wish to stripe an object across devices to achieve one of three goals: