4
How Oracle Implements Parallel Processing

This chapter gives a high-level view of how the Oracle Parallel Server (OPS) provides high performance parallel processing. Key issues include:

Enabling and Disabling Parallel Server

OPS can be enabled or disabled:

Oracle + Option

Parallel Server
Disabled

Parallel Server Enabled

Single Node

Multiple Nodes

OPS not installed

Yes: default

No

No

OPS installed

Yes: default

Yes: Single Shared

Yes: Multiple Shared

Oracle + Option	Parallel Server Disabled	Parallel Server Enabled
OPS not installed	Yes: default	No	No
OPS installed	Yes: default	Yes: Single Shared	Yes: Multiple Shared

When parallel server is disabled, only one Oracle instance can mount or open the database. This mode is necessary to create and completely recover a database. It is useful to install OPS and disable it if standard Oracle functionality meets your needs. You can later enable OPS.

When OPS is enabled, one or more instances of a parallel server mount the same database. All instances mount the database and read from and write to the same datafiles. Single shared mode describes an OPS configuration with only one instance. Global operations exist, but are not needed at the moment. The instance operates as though it is in a cluster with Integrated Distributed Lock Manager (IDLM) overhead, and so on, although there is no contention for resources. Multiple shared mode describes an OPS configuration with multiple instances running.

Figure 4-1 illustrates a typical OPS configuration with three instances on separate nodes accessing the same database.

Figure 4-1 Shared Mode Sharing Disks

Note:
Each instance can access the redo log files of the other instances.

See Also:
"Enabling Parallel Server and Starting Instances".

Synchronization

Inter-node synchronization is an issue that does not need to be addressed in standard Oracle. But with OPS you must have a broad understanding of the dimensions in which synchronization must occur. Some of these include:

In OPS exclusive mode, all synchronization is done within the instance. In shared mode, synchronization is accomplished with the help of the IDLM component.

Block Level Locking

Block access between instances is done on a per-block level. When an instance locks a block in exclusive mode, other instances cannot access the block. Every time Oracle tries to read a block from the database it needs to obtain an instance lock. Ownership of the lock is thus assigned to the instance.

Since OPS runs in environments with multiple memories, there can be multiple copies of the same data block in each instance's memory. Internode synchronization using the IDLM ensures the validity of all copies of the block: these block-level locks are the buffer cache locks.

Block level locking occurs only when OPS is enabled. It is transparent to the user and to the application. (Row level locking also operates, whether OPS is enabled or disabled.)

Row Level Locking

OPS, as well as single-instance Oracle, provides row level locking in addition to block level locking in the buffer cache. In fact, row level locks are stored within the block.

Consider the following example: Instance 1 reads file 2, block 10 to update row 1. Instance 2 also reads file 2, block 10, to update row 2. Here, instance 1 obtains an instance lock on block 10, then locks and updates row 1. (The row lock is implicit because of the UPDATE statement.)

Instance 2 then forces instance 1 to write the updated block to disk, and instance 1 relinquishes ownership of the lock on block 10 so instance 2 can assume ownership of it. Instance 2 then locks row 2 and performs an UPDATE.

Space Management

Free lists and free list groups optimize space management in OPS.

The problem of allocating space for inserts illustrates space management issues. When a table uses more space, how can you ensure no one else uses the same space? How can you ensure two nodes are not inserting into the same space on the same disk in the same file?

Consider the following example: Instance 1 reads file 2, block 10 to insert a row. Instance 2 reads file 3, block 20, to insert another row. Each instance proceeds to insert rows. If one block were responsible for assigning space for all these inserts, that block would constantly ping between the instances. Instance 1 would lose block ownership when instance 2 needs to insert, and so on. The situation involves a great deal of contention and performance suffers.

By contrast, free list groups make good space management possible. If two instances insert into the same object (such as a table), but each instance has its own set of free lists for that object, then contention for a single block is avoided. Each instance inserts into a different block belonging to the object.

System Change Number

In standard Oracle, the system change number (SCN) is maintained and incremented in the SGA by an exclusive mode instance. In OPS shared mode, the SCN must be maintained globally. Its implementation may vary from platform to platform. The SCN may be handled by the IDLM, by the Lamport SCN scheme, or by using a hardware clock or dedicated SCN server.

High Performance Features

OPS takes advantage of systems of linked processors sharing resources without sacrificing transaction processing features. The following sections discuss in more detail certain features that optimize performance on OPS.

Within a single instance, Oracle uses a buffer cache in memory to reduce the amount of disk I/O necessary for database operations. Since each node in the parallel server has its own memory that is not shared with other nodes, OPS must coordinate the buffer caches of different nodes while minimizing additional disk I/O that could reduce performance. The Oracle parallel cache management technology maintains the high-performance features of Oracle while coordinating multiple buffer caches.

Fast Commits, Group Commits, and Deferred Writes

Fast commits, group commits, and deferred writes operate on a per-instance basis in Oracle and work the same whether in exclusive or shared mode.

Oracle only reads data blocks from disk if they are not already in the buffer cache of the instance requesting the data. Because data block writes are deferred, they often contain modifications from multiple transactions.

Optimally, Oracle writes modified data blocks to disk only when necessary:

When the blocks have not been used recently and new data requires buffer cache space (in shared or exclusive mode)
During checkpoints (shared or exclusive mode)
When another instance needs the blocks (only in shared mode)

Oracle may also perform unnecessary writes to disk caused by forced reads or forced writes.

See Also:
"Detecting False Pinging".

Row Locking and Multiversion Read Consistency

The Oracle row locking feature allows multiple transactions on separate nodes to lock and update different rows of the same data block, without any of the transactions waiting for the others to commit. If a row has been modified but not yet committed, the original row values are available to all instances for read access. This is called multiversion read consistency.

Online Backup and Archiving

OPS supports all Oracle backup features in exclusive mode, including both online and offline backups of either an entire database or individual tablespaces.

If you operate Oracle in ARCHIVELOG mode, online redo log files are archived before they can be overwritten. In OPS, each instance can automatically archive its own redo log files or one or more instances can manually archive the redo log files for all instances.

In ARCHIVELOG mode, you can make both online and offline backups. If you operate Oracle in NOARCHIVELOG mode, you can only make offline backups. We strongly recommend operating production databases in ARCHIVELOG mode.

Cache Fusion

Cache Fusion improves inter-instance communication and reduces pinging for reader/writer cache coherency conflicts. When an instance requests a block for updating and another instance holds the block, Cache Fusion prepares a consistent read copy of the block and sends it directly to the requesting instance without pinging. Cache Fusion does this by copying blocks directly from the holding instance's memory cache to the requesting instance's memory cache.

Sequence Number Generators

OPS allows users on multiple instances to generate unique sequence numbers with minimal cooperation and contention among instances.

The sequence number generator allows multiple instances to access and increment a sequence without contention among instances for sequence numbers and without waiting for transactions to commit. Each instance can have its own sequence cache for faster access to sequence numbers. IDLM locks coordinate sequences across instances in OPS.

Lamport SCN Generation

The System Change Number (SCN) is a logical timestamp Oracle uses to order events within a single instance and across all instances. For example, Oracle assigns an SCN to each transaction. Conceptually, there is a global serial point that generates SCNs. In practice, however, SCNs can be read and generated in parallel. One of the SCN generation schemes is called the Lamport SCN generation scheme.

The Lamport SCN generation scheme is fast and scalable because it generates SCNs in parallel on all instances. In this scheme, all messages across instances, including lock messages, piggyback SCNs. Piggybacked SCNs propagate causalities within Oracle. As long as causalities are respected in this way, multiple instances can generate SCNs in parallel, with no need for extra communication among these instances.

On most platforms, Oracle uses the Lamport SCN generation scheme when the MAX_COMMIT_PROPAGATION_DELAY is larger than a platform-specific threshold. This is generally the default. This value is typically set to 7 seconds. You can examine the alert log after instance startup to see whether the Lamport SCN generation scheme is in use.

Free Lists

Standard Oracle can use multiple free lists as a way to reduce contention on blocks. A free list is a list of data blocks, located in extents, that have free space. These blocks with free space are used when inserts or updates are made to a database object such as a table or a cluster. No contention among instances occurs when different instances' transactions insert data into the same table. This is achieved by locating free space for the new rows using free space lists that are associated with one or more instances. The free list may be from a common pool of blocks, or multiple free lists may be partitioned so specific extents in files are allocated to objects.

With a single free list when multiple inserts are taking place, single threading occurs as these processes attempt to allocate space from the free list. The advantage of using multiple free lists is that it allows processes to search a specific pool of blocks when space is needed. This reduces contention among users for free space.

Free List Groups

OPS can use free list groups to eliminate contention among instances for access to a single block containing free lists. By default, only one free list group is available. This means all free lists for an object reside in the segment header block.

Therefore, if multiple free lists reside in a single block in an OPS environment the block with the free lists could have pinging, or forced reads/writes among all the instances. To avoid this problem, free lists can be grouped, with one group assigned to each instance. Each instance then has its own block containing free lists. Since each instance uses its own free lists, there is no contention among instances to access the same block containing free lists.

Disk Affinity

Disk affinity determines the instance that will perform parallelized DML or query operations. Affinity is especially important for parallel DML in OPS configurations. Affinity information that is consistent across statements improves buffer cache hit ratios and reduces forced reads/writes.

The granularity of parallelism for most PDML operations is by partition. For parallel query, however, granularity is by rowid. Parallel DML operations need a partition-to-instance mapping to implement affinity. The segment header of the partition is used to determine the affinity of the partition for MPP systems. You can achieve improved performance by having nodes access local devices. This provides a better buffer cache hit ratio for every node.

For other OPS configurations, a deterministic mapping of partitions to instances is used. Partition-to-instance affinity information is used to determine process allocation and work assignments for all OPS/MPP configurations.

Job and Instance Affinity

Use this feature to control which instances process which jobs. Using the package DBMS_JOB, you can distribute jobs across a cluster in a manner that makes the most sense given each job's functions. This improves load balancing and limits block contention since only the SNP processes of the selected instance can execute the job.

As an example, simultaneously using OPS and replication often results in pinging on the deferred transaction queue if all instances in a clustered environment propagate transactions from the deferred transaction queue. To limit activity against tables to only one instance, use DBMS_JOB to assign the work of processing jobs in the queue to a particular OPS instance.

Although the following examples use replication to illustrate job affinity, you can use this feature for other scenarios.

Using the DBMS_JOB Package

For this example, a constant in DBMS_JOB indicates "no mapping" among jobs and instances, that is, jobs can be executed by any instance.

DBMS_JOB.SUBMIT

To submit a job to the job queue, use the following syntax:

   DBMS_JOB.SUBMIT( JOB OUT BINARY_INTEGER,
   WHAT IN VARCHAR2, NEXT_DATE IN DATE DEFAULTSYSDATE, 
   INTERVAL IN VARCHAR2 DEFAULT 'NULL',
   NO_PARSE IN BOOLEAN DEFAULT FALSE,
   INSTANCE IN BINARY_INTEGER DEFAULT ANY_INSTANCE,
   FORCE IN BOOLEAN DEFAULT FALSE)

Use the parameters INSTANCE and FORCE to control job and instance affinity. The default value of INSTANCE is 0 (zero) to indicate that any instance can execute the job. To run the job on a certain instance, specify the INSTANCE value. Oracle displays error ORA-23319 if the INSTANCE value is a negative number or a NULL.

The FORCE parameter defaults to FALSE. If force is TRUE, any positive integer is acceptable as the job instance. If FORCE is FALSE, the specified instance must be running, or Oracle displays error number ORA-23428.

DBMS_JOB.INSTANCE

To assign a particular instance to execute a job, use the following syntax:

   DBMS_JOB.INSTANCE(  JOB IN BINARY_INTEGER,
   INSTANCE IN BINARY_INTEGER, 
   FORCE IN BOOLEAN DEFAULT FALSE)

The FORCE parameter in this example defaults to FALSE. If the instance value is 0 (zero), job affinity is altered and any available instance can execute the job despite the value of force. If the INSTANCE value is positive and the FORCE parameter is FALSE, job affinity is altered only if the specified instance is running, or Oracle displays error ORA-23428.

If the FORCE parameter is TRUE, any positive integer is acceptable as the job instance and the job affinity is altered. Oracle displays error ORA-23319 if the INSTANCE value is negative or NULL.

DBMS_JOB.CHANGE

To alter user-definable parameters associated with a job, use the following syntax:

   DBMS_JOB.CHANGE(  JOB IN BINARY_INTEGER,
   WHAT IN VARCHAR2 DEFAULT NULL,
   NEXT_DATE IN DATE DEFAULT NULL,
   INTERVAL IN VARCHAR2 DEFAULT NULL,
   INSTANCE IN BINARY_INTEGER DEFAULT NULL,
    FORCE IN BOOLEAN DEFAULT FALSE )

Two parameters, INSTANCE and FORCE, appear in this example. The default value of INSTANCE is NULL indicating that job affinity will not change.

The default value of FORCE is FALSE. Oracle displays error ORA-23428 if the specified instance is not running and error ORA-23319 if the INSTANCE number is negative.

DBMS_JOB.RUN

The FORCE parameter for DBMS_JOB.RUN defaults to FALSE. If force is TRUE, instance affinity is irrelevant for running jobs in the foreground process. If force is FALSE, the job can run in the foreground only in the specified instance. Oracle displays error ORA-23428 if force is FALSE and the connected instance is the incorrect instance.

   DBMS_JOB.RUN( JOB IN BINARY_INTEGER,
   FORCE IN BOOLEAN DEFAULT FALSE)

Transparent Application Failover

Application failover enables an application to automatically reconnect to a database if the connection is broken. Active transactions roll back, but the new database connection is identical to the original one. This is true regardless of how the connection was lost.

With transparent application failover, a client sees no loss of connection as long as there is one instance left serving the application. The DBA controls which applications run on which instances and also creates a failover order for each application.

Cache Coherency

Cache coherency is the technique of keeping multiple copies of an object consistent. This section describes:

Parallel Cache Management Issues

With OPS, separate Oracle instances run simultaneously on one or more nodes using a technology called parallel cache management (PCM).

PCM uses IDLM locks (IDLM) to coordinate access to resources required by the instances of OPS. Rollback segments, dictionary entries, and data blocks are examples of database resources. The most often required database resources are data blocks.

Cache coherency is provided by the Parallel Cache Manager for the buffer caches of instances located on separate nodes. The set of global constant (GC_*) initialization parameters associated with PCM buffer cache locks are not the same locks as those used with the dictionary cache, library cache, and so on.

PCM ensures that a master copy data block in an SGA has identical copies in other SGAs requiring a copy of the master. Thus, the most recent copy of the block in all SGAs contains all changes made to that block by all instances, regardless of whether any transactions on those instances have committed.

If a data block is modified in one buffer cache, then all existing copies in other buffer caches are no longer current. New copies can be obtained after the modification operation completes.

PCM enforces cache coherency while minimizing I/O and use of the IDLM. I/O and lock operations for cache coherency are only done when the current version of a data block is in one instance's buffer cache and another instance requests that block for update.

Multiple transactions running on a single OPS instance can share access to a set of data blocks for reading purposes without additional instance lock operations. In this case, there is no contention or conflict. This remains true as long as the blocks are not needed for writing by transactions running on other instances.

In shared mode, the IDLM maintains instance lock status. In exclusive mode, all locks are local and the IDLM does not coordinate database resources.

Instances use instance locks to indicate ownership of a resource master copy. When an instance becomes a database resource master or "owner", it also inherently becomes owner of the instance lock covering the resource, with fixed locking. However, releasable locks are, of course, released.

A master copy indicates it is an updatable copy of the resource. The instance only gives up the instance lock when another instance requests the resource for update. Once another instance owns the master copy of the resource, it becomes the owner of the instance lock.

Note:
Transactions and parallel cache management are autonomous mechanisms in Oracle. PCM locks function independently of any form of transaction lock.

Example

Consider the following example and the illustrations in Figure 4-2. This example assumes one PCM lock covers one block, although many blocks could be covered.

Instance X becomes the owner of a PCM lock covering data block n containing row 1 and updates the row
Instance Y requests the block to update row 4
Instance X writes the data block to disk and releases the PCM lock
Instance Y becomes the owner of the block and the PCM lock and then updates row 4
Instance X requests the block to update row 7
Instance Y writes the data block to disk and releases the block and the PCM lock
Instance X becomes the owner of the block and PCM lock and updates row 7
Instance X commits its transaction and still owns the PCM lock and the master copy of the block until another instance requests the block

See Also:
"How Buffer State and Lock Mode Change".

Figure 4-2 Multiple Instances Updating the Same Data Block

PCM Lock and Row Lock Independence

PCM locks and row locks operate independently. An instance can disown a PCM lock without affecting row locks held in the set of blocks covered by the PCM lock. A row lock is acquired during a transaction. A database resource such as a data block acquires a PCM lock when it is read for update by an instance. During a transaction, a PCM lock can therefore be disowned and owned many times if the blocks are needed by other instances.

In contrast, transactions do not release row locks until changes to the rows are either committed or rolled back. Oracle uses internal mechanisms for concurrency control to isolate transactions so modifications to data made by one transaction are not visible to other transactions until the transaction modifying the data commits. The row lock concurrency control mechanisms are independent of parallel cache management: concurrency control does not require PCM locks, and PCM lock operations do not depend on individual transactions committing or rolling back.

Instance Lock Modes

An instance can acquire the instance lock that covers a set of data blocks in either shared or exclusive mode, depending on the access type required.

Exclusive lock mode allows the instance to update a set of blocks.

If one instance needs to update a data block and a second instance already owns the instance lock covering the block, the first instance uses the IDLM to request that the second instance disown the instance lock, writing the block(s) to disk if necessary.
Read lock mode only allows the instance to read blocks.

Multiple instances can own an instance lock in shared mode as long as they only intend to read, not modify, blocks covered by that instance lock. Thus, all instances can be sure that their memory-resident copies of the block are current, or that they can read a current copy from disk without any instance lock operations to request the block from another instance. This means instances do not have to disown instance locks for the portion of a database accessed for read-only use, which may be a substantial portion of the time in many applications.
Null lock mode allows instances to keep a lock without any permissions on the block(s).

This mode is used so that locks need not be continually obtained and released. Locks are simply converted from one mode to another.

See Also:
Chapter 15, "Allocating PCM Instance Locks", for a detailed description of allocating PCM locks for datafiles.

Non-PCM Cache Management Issues

OPS ensures that all standard Oracle caches are synchronized across instances. Changing a block on one node, and its ramifications for the other nodes, is a familiar example of synchronization. However, synchronization has broader implications.

Understanding how OPS synchronizes caches across instances can help you understand the overhead affecting system performance. Consider a five-node parallel server where a user drops a table on one of the nodes. Since each of the five dictionary caches has a copy of the definition of the dropped table, the node dropping the table from its cache must also cause the other four dictionary caches to drop their copies of the dropped table. OPS handles this automatically through the IDLM. Users on the other nodes are notified of the change in lock status.

There are significant advantages to having each node store library and table information. Occasionally, the DROP TABLE command forces other caches to be flushed, but the brief effect this has on performance does not necessarily diminish the advantage of having multiple caches.

Oracle + Option	Parallel Server Disabled	Parallel Server Enabled
Oracle + Option	Parallel Server Disabled	Single Node	Multiple Nodes
OPS not installed	Yes: default	No	No
OPS installed	Yes: default	Yes: Single Shared	Yes: Multiple Shared

4How Oracle Implements Parallel Processing