Oracle8i Tuning
Release 8.1.5

A67775-01


11
Optimizing Data Warehouse Applications

This chapter introduces integrated Oracle features for tuning enterprise-scale data warehouses. By intelligently tuning the system, the data layout, and the application, you can build a high-performance, scalable data warehouse.

Topics in this chapter include:

Characteristics of Data Warehouse Applications
Building a Data Warehouse
Querying a Data Warehouse
Tuning Data Warehouse Applications
Backup and Recovery of the Data Warehouse
Tuning Fast-start Parallel Recovery

Characteristics of Data Warehouse Applications

Data warehousing applications process a substantial amount of data by means of many CPU- and I/O-bound, data-intensive tasks, such as loading data, building indexes and summary tables, and running complex queries against very large tables.

The resources required to complete these tasks on many gigabytes of data distinguish data warehousing applications from other types of data processing. The bulk and complexity of your data may clearly indicate that you need to deploy multiple CPUs, investigate parallel processing, or consider specific data processing features that are directly relevant to the tasks at hand.

For example, in a typical data warehousing application, data-intensive tasks might involve 100 or more gigabytes of data. At a processing speed of 2GB to 30GB of data per hour per CPU, a single CPU might need several hours to perform this size of a task.

With more than a single gigabyte of data, and certainly with upwards of 10GB, you need to consider increasing the number of CPUs.

Similarly, if you need to copy 10 gigabytes of data, consider that using Export/Import might take a single CPU several hours. By contrast, using parallel CREATE TABLE . . . AS SELECT on 10 CPUs or more might take less than an hour.

Actual processing time depends on many factors, such as the complexity of the queries, the processing speed to which a particular hardware configuration can be tuned, and so on. Always run simple tests on your own system to find out its performance characteristics with regard to particular operations.

Building a Data Warehouse

This section briefly describes features useful for building a data warehouse. It includes:

Materialized Views and Dimensions
Parallel CREATE TABLE . . . AS SELECT
Parallel Index Creation
Fast Full Index Scan
Partitioned Tables
ANALYZE Statement
Parallel Load
Constraints

Materialized Views and Dimensions

Materialized views are stored summaries of queries containing precomputed results. Materialized views greatly improve data warehouse query processing. Dimensions describe the relationships among data and are required for performing more complex types of query rewrites.
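
As an illustrative sketch only (the SALES and TIMES tables, their columns, and the object names are assumptions, not objects used elsewhere in this book), a summary materialized view and a dimension might be defined as follows:

    -- Hypothetical example: precompute sales totals by month and make the
    -- summary available for query rewrite.
    CREATE MATERIALIZED VIEW sales_by_month
      BUILD IMMEDIATE
      REFRESH COMPLETE
      ENABLE QUERY REWRITE
      AS
      SELECT t.calendar_month, SUM(s.amount) AS total_amount
      FROM   sales s, times t
      WHERE  s.time_id = t.time_id
      GROUP  BY t.calendar_month;

    -- Describe the calendar rollup among the TIMES columns so the optimizer
    -- can perform more complex query rewrites.
    CREATE DIMENSION times_dim
      LEVEL month   IS times.calendar_month
      LEVEL quarter IS times.calendar_quarter
      LEVEL year    IS times.calendar_year
      HIERARCHY calendar_rollup (
        month   CHILD OF
        quarter CHILD OF
        year);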

See Also:

For more information about materialized views, please see Section VI, "Materialized Views", Oracle8i Concepts, the Oracle8i Administrator's Guide, and the Oracle8i Supplied Packages Reference.  

Parallel CREATE TABLE . . . AS SELECT

The ability to perform CREATE TABLE . . . AS SELECT operations in parallel enables you to reorganize extremely large tables efficiently. You might find it prohibitive to take a serial approach to reorganizing or reclaiming space in a table containing tens or even thousands of gigabytes of data. Using Export/Import to perform such a task might result in an unacceptable amount of downtime. If you have a lot of temporary space available, you can use CREATE TABLE . . . AS SELECT to perform such tasks in parallel. With this approach, redefining integrity constraints is optional. This feature is often used for creating summary tables, which are precomputed results stored in the data warehouse.
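
For example (the SALES table, its size, and the degree of parallelism are assumptions), a large table can be copied in parallel and with minimal redo generation:

    -- Rebuild a hypothetical SALES table in parallel into a new segment.
    CREATE TABLE sales_reorg
      NOLOGGING
      PARALLEL (DEGREE 8)
      AS SELECT * FROM sales;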

See Also:

"Creating and Populating Tables in Parallel" and Oracle8i Concepts.  

Parallel Index Creation

The ability to create indexes in parallel benefits both data warehousing and OLTP applications. On extremely large tables, rebuilding an index can take a long time. DBAs may periodically load a large amount of data and then rebuild the index. With the ability to create indexes in parallel, you can drop an index before loading new data and re-create it quickly afterwards.
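
A sketch, assuming a hypothetical SALES table and index: the index is dropped before a large load and then re-created in parallel afterwards:

    DROP INDEX sales_cust_idx;

    -- ... load the new data ...

    CREATE INDEX sales_cust_idx
      ON sales (cust_id)
      NOLOGGING
      PARALLEL (DEGREE 8);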

See Also:

"Creating Indexes in Parallel".  

Fast Full Index Scan

A fast full scan of an index is a faster alternative to a full table scan when an existing index already contains all the columns needed for the query. It can use multiblock I/O and can be parallelized just like a table scan. The INDEX_FFS hint forces a fast full index scan.
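
For example (the table, index, and column names are hypothetical), if SALES_CUST_IDX contains every column the query references, the hint requests a fast full scan of the index rather than a full table scan:

    -- Assumes CUST_ID is declared NOT NULL, so every row appears in the index.
    SELECT /*+ INDEX_FFS(s sales_cust_idx) */ COUNT(DISTINCT cust_id)
    FROM   sales s;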

See Also:

"Using Fast Full Index Scans" and "INDEX_FFS".  

Partitioned Tables

You can avoid downtime with very large or mission-critical tables by using partitions. Partitioning lets you divide a large table into multiple physical partitions based on partitioning criteria. In a data warehouse you can manage historical data by partitioning by date. You can then perform at the partition level all of the operations you might normally perform at the table level, such as backup and restore. You can add space for new data by adding a new partition, and delete old data by dropping an existing partition. Queries that use a key range to select from a partitioned table retrieve only the partitions that fall within that range. In this way partitions offer significant improvements in availability, administration, and table scan performance.
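
The following sketch (a hypothetical SALES table partitioned by date) shows how historical data can be managed at the partition level:

    CREATE TABLE sales
      ( sale_date  DATE,
        cust_id    NUMBER,
        amount     NUMBER )
      PARTITION BY RANGE (sale_date)
      ( PARTITION sales_1998 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')),
        PARTITION sales_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')) );

    -- Add space for new data and remove old data one partition at a time.
    ALTER TABLE sales ADD PARTITION sales_2000
      VALUES LESS THAN (TO_DATE('01-JAN-2001','DD-MON-YYYY'));
    ALTER TABLE sales DROP PARTITION sales_1998;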


Note:

For performance reasons, use partitioned tables rather than partition views. Please see Oracle8i Migration for instructions on migrating from partition views to partitioned tables.


See Also:

"Partitioning Data" and Oracle8i Concepts for information about partitioned tables.  

ANALYZE Statement

You can use the ANALYZE statement to analyze the storage characteristics of tables, indexes, and clusters and to gather statistics, which are then stored in the data dictionary. The optimizer uses these statistics in a cost-based approach to determine the most efficient execution plan for the SQL statements you issue. Statistics can be either computed or estimated, depending on the amount of overhead you are willing to allow for this purpose.
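
For example (object names are hypothetical), statistics can be computed exactly or estimated from a sample, trading accuracy for lower overhead:

    ANALYZE TABLE sales COMPUTE STATISTICS;
    ANALYZE TABLE sales ESTIMATE STATISTICS SAMPLE 10 PERCENT;
    ANALYZE INDEX sales_cust_idx COMPUTE STATISTICS;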

See Also:

The Oracle8i Administrator's Guide.  

Parallel Load

When very large amounts of data must be loaded, the destination table may be unavailable for an unacceptable amount of time. The ability to load data in parallel can dramatically reduce the amount of downtime necessary.
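
As an illustrative sketch (the user name, control files, and data files are assumptions), several SQL*Loader direct path sessions can load into the same table concurrently from the operating system prompt:

    sqlldr USERID=scott/tiger CONTROL=load1.ctl DIRECT=TRUE PARALLEL=TRUE
    sqlldr USERID=scott/tiger CONTROL=load2.ctl DIRECT=TRUE PARALLEL=TRUE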

See Also:

Chapter 26, "Tuning Parallel Execution", especially "Phase Three - Creating, Populating, and Refreshing the Database" and Oracle8i Utilities for a description of SQL*Loader conventional and direct path loads.  

Constraints

Validated constraints degrade performance for DML statements, loads, and index maintenance. However, some query optimizations depend on the presence of validated constraints.

The fastest way to move a set of constraints from the DISABLE NOVALIDATE state to ENABLE VALIDATE is to first ENABLE NOVALIDATE them all, and then validate them individually. Validation still requires a long time, but you can query and modify all tables as soon as the ENABLE NOVALIDATE is finished. A direct load automatically re-enables constraints in this way.

Data warehouses sometimes benefit from the DISABLE VALIDATE state. This state allows the optimizer to recognize the validity of a unique or primary key, yet it does not require an index. Inserts, updates, and deletes are disallowed on keys in the DISABLE VALIDATE state.
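
The following sketch (table and constraint names are hypothetical) illustrates both techniques described above:

    -- Make the table usable immediately; existing rows are checked later.
    ALTER TABLE sales ENABLE NOVALIDATE CONSTRAINT sales_cust_fk;
    -- Validate each constraint individually once the table is back in use.
    ALTER TABLE sales ENABLE VALIDATE CONSTRAINT sales_cust_fk;

    -- Let the optimizer rely on the primary key without maintaining an index.
    -- DML on the key columns is disallowed while in this state.
    ALTER TABLE sales DISABLE VALIDATE CONSTRAINT sales_pk;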

See Also:

For more information, please refer to the chapter on "Managing Integrity" in the Oracle8i Administrator's Guide. The Oracle8i SQL Reference also contains information about the DISABLE VALIDATE and NOVALIDATE keywords.

Querying a Data Warehouse

This section summarizes Oracle features useful for querying a data warehouse. It includes:

Oracle Parallel Server Option
Parallel-Aware Optimizer
Parallel Execution
Bitmap Indexes
Domain Indexes
Star Queries
Star Transformation
Query Rewrites

Oracle Parallel Server Option

The Oracle Parallel Server option provides the following benefits important to both OLTP and data warehousing applications:

Improved availability through failover
Scalability in the number of concurrent users
Scalability of performance

Oracle Parallel Server failover capability (the ability of the application to reconnect automatically if the connection to the database is broken) results in improved availability, a primary benefit for OLTP applications. Likewise, scalability in the number of users that can connect to the database is a major benefit in OLTP environments. OLTP performance on Oracle Parallel Server can scale as well, if an application's data is isolated onto a single server.

For data warehousing applications, scalability of performance is a primary benefit of Oracle Parallel Server. The architecture of Oracle Parallel Server allows parallel execution to perform excellent load balancing of work at runtime. If a node in an Oracle Parallel Server cluster or MPP is temporarily slowed down, work that was originally assigned to parallel execution servers on that node (but not yet commenced by those servers) may be performed by servers on other nodes, hence preventing that node from becoming a serious bottleneck. Even though Oracle Parallel Server is a cornerstone of parallel execution on clusters and MPPs, in a mostly query environment the overhead on the distributed lock manager is minimal.

See Also:

Please refer to Oracle8i Parallel Server Concepts and Administration.

Parallel-Aware Optimizer

Knowledge about parallelism is incorporated into the Oracle cost-based optimizer. Parallel execution considerations are thus a fundamental component in arriving at query execution plans. In addition, you can control the trade-off of throughput for response time in plan selection.

The optimizer chooses intelligent defaults for the degree of parallelism based on available processors and the number of disk drives storing data the query will access. Access path choices (such as table scans vs. index access) take into account the degree of parallelism, resulting in plans that are optimized for parallel execution. Execution plans are more scalable, and there is improved correlation between optimizer cost and execution time for parallel execution.

The initialization parameter OPTIMIZER_PERCENT_PARALLEL defines the weighting that the optimizer uses to minimize response time in its cost functions.
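
For example, a setting such as the following in the initialization parameter file (the value 100 is purely illustrative) weights the optimizer fully toward minimizing response time with parallel plans:

    # init.ora fragment; 0 favors the best serial plan, 100 favors parallelism
    OPTIMIZER_PERCENT_PARALLEL = 100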

See Also:

"OPTIMIZER_PERCENT_PARALLEL".  

Parallel Execution

Parallel execution enables multiple processes to simultaneously process a single query or DML statement. By dividing the task among multiple server processes, Oracle executes the operation more quickly than if only one server process were used.

Parallel execution dramatically improves performance for data-intensive data warehousing operations. It helps systems scale in performance when adding hardware resources. The greatest performance benefits are on symmetric multiprocessing (SMP), clustered, or massively parallel (MPP) systems, where query processing can be spread effectively among many CPUs.
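
As a sketch (the SALES table and the degree of parallelism are assumptions), parallelism can be declared as a table attribute or requested for a single statement with a hint:

    -- Declare a default degree of parallelism for operations on the table.
    ALTER TABLE sales PARALLEL (DEGREE 8);

    -- Or request parallel execution for one statement only.
    SELECT /*+ PARALLEL(s, 8) */ cust_id, SUM(amount)
    FROM   sales s
    GROUP  BY cust_id;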

See Also:

Chapter 26, "Tuning Parallel Execution", Chapter 27, "Understanding Parallel Execution Performance Issues", and Oracle8i Concepts for more details on parallel execution.  

Bitmap Indexes

Regular B*-tree indexes work best when each key or key range references only a few records, such as employee names. Bitmap indexes, by contrast, work best when each key references many records, such as employee gender.

Bitmap indexing provides the same functionality as regular indexes but uses a different internal representation that makes it very fast and space efficient. Bitmap indexing benefits data warehousing applications with large amounts of data queried on an ad hoc basis but with a low level of concurrent transactions.

Bitmap indexes reduce the response time for many types of ad hoc queries. They also offer considerably reduced space usage compared to other indexing techniques and dramatic performance gains even on very low-end hardware. You can create bitmap indexes in parallel; they are completely integrated with cost-based optimization.
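
For example (a hypothetical CUSTOMERS table), a low-cardinality column such as gender is a good candidate for a bitmap index, and the index can be built in parallel:

    CREATE BITMAP INDEX customers_gender_bix
      ON customers (gender)
      NOLOGGING
      PARALLEL;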

Domain Indexes

You can create an application-specific index, known as a "domain index," by using the indextype schema object. Domain indexes are used for indexing data in application-specific domains. A domain index is an instance of an index that is created, managed, and accessed by routines supplied by an indextype, in contrast to ordinary B*-tree indexes, which are created and maintained by the Oracle server itself.
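
As an illustration only, and assuming the Oracle8i interMedia Text option is installed, its CTXSYS.CONTEXT indextype can build a domain index on a text column of a hypothetical DOCS table:

    -- The indextype's routines, not the ordinary B*-tree code, maintain this index.
    CREATE INDEX docs_text_idx ON docs (text)
      INDEXTYPE IS ctxsys.context;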

See Also:

"Using Domain Indexes".  

Star Queries

One type of data warehouse design is known as a "star" schema. This typically consists of one or more very large "fact" tables and a number of much smaller "dimension" or reference tables. A star query is one that joins several of the dimension tables, usually by predicates in the query, to one of the fact tables.

Oracle cost-based optimization recognizes star queries and generates efficient execution plans for them; indeed, you must use cost-based optimization to get efficient star query execution. To enable cost-based optimization, ANALYZE your tables and make sure not to set the OPTIMIZER_MODE parameter to RULE.
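
A sketch of a star query against a hypothetical SALES fact table and three dimension tables; the STAR hint asks the optimizer to consider a star plan:

    SELECT /*+ STAR */ t.calendar_month, p.product_name, SUM(s.amount)
    FROM   sales s, times t, products p, customers c
    WHERE  s.time_id = t.time_id
      AND  s.prod_id = p.prod_id
      AND  s.cust_id = c.cust_id
      AND  c.state   = 'CA'
    GROUP  BY t.calendar_month, p.product_name;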

See Also:

Oracle8i Concepts regarding optimization of star queries and "STAR" for information about the STAR hint.  

Star Transformation

Star transformation is a cost-based transformation designed to execute star queries efficiently. Whereas star optimization works well for schemas with a small number of dimensions and dense fact tables, star transformation works well for schemas with a large number of dimensions and sparse fact tables.

Star transformation is enabled by setting the initialization parameter STAR_TRANSFORMATION_ENABLED to TRUE. You can use the STAR_TRANSFORMATION hint to make the optimizer use the best plan in which the transformation has been used.
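
For example (the tables are the same hypothetical star schema sketched above), the feature can be enabled for a session and requested for a particular query:

    ALTER SESSION SET STAR_TRANSFORMATION_ENABLED = TRUE;

    SELECT /*+ STAR_TRANSFORMATION */ t.calendar_month, SUM(s.amount)
    FROM   sales s, times t, customers c
    WHERE  s.time_id = t.time_id
      AND  s.cust_id = c.cust_id
      AND  c.state   = 'CA'
    GROUP  BY t.calendar_month;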

See Also:

For more information about using this hint, see the discussion under the heading "STAR_TRANSFORMATION". Also refer to Oracle8i Concepts for a full discussion of star transformation. Oracle8i Reference describes the STAR_TRANSFORMATION_ENABLED initialization parameter.

Query Rewrites

If you have defined materialized views, Oracle can transparently rewrite queries to use summary tables rather than detail tables.
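
For example, assuming the hypothetical SALES_BY_MONTH materialized view sketched earlier and query rewrite enabled for the session, the application continues to query the detail tables while the optimizer may answer the query from the summary:

    ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;

    -- The statement names the detail tables, but the optimizer may rewrite it
    -- to read the much smaller SALES_BY_MONTH materialized view instead.
    SELECT t.calendar_month, SUM(s.amount)
    FROM   sales s, times t
    WHERE  s.time_id = t.time_id
    GROUP  BY t.calendar_month;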

Tuning Data Warehouse Applications

Tuning data warehouse applications involves both serial and parallel SQL statement tuning.

Shared SQL is not recommended with data warehousing applications. Use literal values in these SQL statements, rather than bind variables. If you use bind variables, the optimizer will make a blanket assumption about the selectivity of the column. If you specify a literal value, by contrast, the optimizer can use value histograms and so provide a better access plan.
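
For example (a hypothetical SALES table and SALE_TYPE column), a literal predicate lets the optimizer consult a histogram on the column, whereas a bind variable forces it to assume an average selectivity:

    -- Preferred in a data warehouse: the optimizer can use a histogram on SALE_TYPE.
    SELECT COUNT(*) FROM sales WHERE sale_type = 'RETURN';

    -- Less effective here: the selectivity of :stype is unknown at parse time.
    SELECT COUNT(*) FROM sales WHERE sale_type = :stype;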

See Also:

Chapter 10, "Managing SQL and Shared PL/SQL Areas".  

Backup and Recovery of the Data Warehouse

Recoverability has various levels: recovering from disk failure, human error, software failure, fire, and so on requires different procedures, and Oracle provides only part of the solution. Organizations must decide how much to spend on backup and recovery by weighing that cost against the business cost of a long outage.

The NOLOGGING option enables you to perform certain operations without the overhead of generating redo log entries. Even without logging, you can protect against data loss from disk failure by using disk mirroring or RAID technology. If you load your warehouse from tapes every day or week, you might satisfactorily recover from all failures simply by saving copies of the tapes in several remote locations and reloading from tape when something goes wrong.
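
As a sketch (object names are hypothetical), NOLOGGING can be specified when creating an index and on a table so that subsequent direct-path operations generate minimal redo:

    CREATE INDEX sales_cust_idx ON sales (cust_id) NOLOGGING;

    ALTER TABLE sales NOLOGGING;
    -- Direct-path insert into a NOLOGGING table generates minimal redo.
    INSERT /*+ APPEND */ INTO sales
      SELECT * FROM sales_staging;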

At the other end of the spectrum, you could mirror disks, take backups and archive logs, and maintain a remote standby system. The mirrored disks prevent loss of availability in the event of disk failure, while the backups and archived logs protect against total loss in the event of human error (such as dropping the wrong table) or software error (such as disk block corruption). In the event of fire, power failure, or other problems at the primary site, the remote standby system prevents long outages.

See Also:

For more information on recovery and the NOLOGGING option, see "[NO]LOGGING Option", the Oracle8i Administrator's Guide and Oracle8i SQL Reference.  

Tuning Fast-start Parallel Recovery

Fast-start Parallel Recovery improves recovery throughput and decreases the time required to apply rollback segments by using multiple, concurrent processes. The initialization parameter FAST_START_PARALLEL_ROLLBACK determines the upper limit for the degree of parallel recovery used.

Fast-start Parallel Recovery automatically starts enough recovery processes to have at least one process for each unrecovered rollback segment. When possible, this feature assigns multiple processes to roll back a single transaction. However, the number of active recovery processes never exceeds the upper limit set by the value for FAST_START_PARALLEL_ROLLBACK.

When to Use Fast-start Parallel Recovery

Typically, DSS environments are more likely to have large transactions than OLTP environments. Therefore, Fast-start Parallel Recovery is more often applicable to DSS systems.

For any instance, determine how aggressively to deploy Fast-start Parallel Recovery by monitoring your transaction recovery progress. Do this by examining the contents of two recovery-related V$ views as explained later in this section.

Determining Adequate Parallelism for Fast-start Parallel Recovery

The default setting of FAST_START_PARALLEL_ROLLBACK is adequate for most systems. However, you can tune this parameter more precisely using information in the V$FAST_START_SERVERS and V$FAST_START_TRANSACTIONS views, as described under the following headings. Also consider your overall recovery processing goals.

V$FAST_START_SERVERS View

This view provides data about the progress of the server processes performing Fast-start Parallel Recovery. Enable Fast-start Parallel Recovery by setting FAST_START_PARALLEL_ROLLBACK to LOW or HIGH. If the parameter is set to FALSE, SMON recovers transactions serially and V$FAST_START_SERVERS has only one row.

When Fast-start Parallel Recovery is enabled, the total number of rows in this view is one more than the number of active recovery processes; the additional row is for SMON. The setting of FAST_START_PARALLEL_ROLLBACK limits the number of recovery processes, and therefore the number of rows, in this view. Each row in the view corresponds to a recovery process. The V$FAST_START_SERVERS columns and their definitions are:

State  

The value of state is either "IDLE" or "RECOVERING".  

Undoblocksdone  

The percentage of the assigned work completed by the server.  

PID  

The Oracle process ID.

V$FAST_START_TRANSACTIONS View

This view provides information about recovery progress for each rollback transaction. This view has one row for each rollback transaction. The V$FAST_START_TRANSACTIONS columns and their definitions are:

Rollback segment number  

The undo segment number of the transaction.  

Slot  

The slot the transaction occupies within the rollback segment.  

Wrap  

The incarnation number of the slot.  

State  

The state of the transaction may be one of "TO BE RECOVERED", "RECOVERING", or "RECOVERED".  

Work_done  

The percentage of recovery completed on this transaction.  

Oracle PID  

The ID of the server process to which recovery of this transaction is currently assigned.

Time elapsed  

The amount of time in seconds that has elapsed since recovery on the transaction began.  

Parent xid  

The transaction ID of the parent transaction. Valid only for PDML transactions.  

Oracle updates both views continuously during Fast-start Parallel Recovery.
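
A minimal monitoring sketch using only the columns described above; the first query shows what each recovery server is doing, and the second shows how many rollback transactions remain in each state:

    SELECT state, undoblocksdone, pid
    FROM   v$fast_start_servers;

    SELECT state, COUNT(*)
    FROM   v$fast_start_transactions
    GROUP  BY state;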

Determining a Degree of Parallelism for Fast-start Parallel Recovery

Tune the parameter FAST_START_PARALLEL_ROLLBACK to balance the performance of recovery and system resource use. Heavy Fast-start Parallel Recovery processing can degrade response times of unrelated database operations due to CPU and disk use. If you need to adjust the processing of Fast-start Parallel Recovery, change the value for FAST_START_PARALLEL_ROLLBACK at any time, even while the server is running. When you change this value during recovery, SMON restarts recovery with the new degree of parallelism.

Change the value of FAST_START_PARALLEL_ROLLBACK to HIGH when you have more recovery work to do. Set it to FALSE when you do not want parallel recovery to occupy the CPU. Make sure the CPU_COUNT initialization parameter is set correctly, because Oracle spawns up to two or four times that number of recovery servers (for the LOW and HIGH settings, respectively).
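
For example, because the parameter is dynamic, the degree of parallel recovery can be adjusted while the instance is running, and SMON restarts recovery with the new setting:

    -- More recovery work to do: allow more recovery servers.
    ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = HIGH;

    -- Keep parallel recovery from occupying the CPUs.
    ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = FALSE;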



