1
Spatial Concepts

Oracle8i Spatial is an integrated set of functions and procedures that enables spatial data to be stored, accessed, and analyzed quickly and efficiently in an Oracle8i database.

Spatial data represents the essential location characteristics of real or conceptual objects as those objects relate to the real or conceptual space in which they exist.

1.1 What Is the Spatial Product?

Oracle8i Spatial, often referred to as Spatial, provides a standard SQL schema and functions that facilitate the storage, retrieval, update, and query of collections of spatial features in an Oracle8i database. It consists of four components:

A schema that prescribes the storage, syntax, and semantics of supported geometric data types
A spatial indexing mechanism
A set of operators and functions for performing area-of-interest and spatial join queries
Administrative utilities

The spatial attribute of a spatial feature is the geometric description of its shape in some coordinate space. This is referred to as its geometry.

This release of Spatial supports two mechanisms for representing geometry. The first, an object-relational scheme, uses a table with single column of type MDSYS.SDO_GEOMETRY and a single row per geometry instance. The second, a relational scheme, uses a table with a predefined set of columns of type NUMBER and one or more rows for each geometry instance. These mechanisms roughly correspond to two alternatives described in the OpenGIS ODBC/SQL specification for geospatial features. The first corresponds to a "SQL with Geometry Types" implementation of spatial feature tables, and the second an implementation of spatial feature tables using numeric SQL types for geometry storage.

Implementation-specific details are described in Part I "Object-Relational Model" and Part II "Relational Model" of this guide. The remainder of this chapter describes Spatial concepts and features, without reference to their implementation wherever possible.

1.2 Introduction to Spatial Data

The Spatial option is designed to make spatial data management easier and more natural to users or applications such as a Geographic Information System (GIS). Once this data is stored in an Oracle database, it can be easily manipulated, retrieved, and related to all the other data stored in the database.

A common example of spatial data can be seen in a road map. A road map is a two-dimensional object that contains points, lines, and polygons that can represent cities, roads, and political boundaries such as states or provinces. A road map is a visualization of geographic information. The location of cities, roads, and political boundaries that exist on the surface of the Earth are projected onto a two-dimensional display or piece of paper, preserving the relative positions and relative distances of the rendered objects.

The data that indicates the Earth location (latitude and longitude, or height and depth) of these rendered objects is the spatial data. When the map is rendered, this spatial data is used to project the locations of the objects on a two-dimensional piece of paper. A GIS is often used to store, retrieve, and render this Earth-relative spatial data.

Other types of spatial data that can be stored using the Spatial option besides GIS data include data from computer-aided design (CAD) and computer-aided manufacturing (CAM) systems. Instead of operating on objects on a geographic scale, CAD/CAM systems work on a smaller scale such as for an automobile engine or printed circuit boards.

The differences among these three systems are only in the scale of the data, not its complexity. They might all actually involve the same number of data points. On a geographic scale, the location of a bridge can vary by a few tenths of an inch without causing any noticeable problems to the road builders. Whereas, if the diameter of an engine's pistons are off by a few tenths of an inch, the engine will not run. A printed circuit board is likely to have many thousands of objects etched on its surface that are no bigger than the smallest detail shown on a road builder's blueprints.

These applications all store, retrieve, update, or query some collection of features that have both nonspatial and spatial attributes. Examples of nonspatial attributes are name, soil_type, landuse_classification, and part_number. The spatial attribute is a coordinate geometry, or vector-based representation of the shape of the feature. The spatial attribute, referred to as the geometry, is an ordered sequence of vertices that are connected by straight line segments or circular arcs. The semantics of the geometry are determined by its type, which may be one of point, line string, or polygon.

1.3 Geometric Types for Relational and Object-Relational Models

The relational model of the Spatial option supports three geometric primitive types and geometries composed of collections of these types. The three primitive types are as follows:

2-D Point and Point Cluster
2-D Line Strings
2-D N-Point Polygons

2-D points are elements composed of two ordinates, X and Y, often corresponding to longitude and latitude. Line strings are composed of one or more pairs of points that define line segments. Polygons are composed of connected line strings that form a closed ring and the interior of the polygon is implied. Figure 1-1 illustrates the supported geometric primitive types.

Figure 1-1 Geometric Primitive Types

Self-crossing polygons are not supported although self-crossing line strings are. If a line string crosses itself, it does not become a polygon. A self-crossing line string does not have any implied interior.

The object-relational implementation supports the types listed in Figure 1-1, as well as the types shown in Figure 1-2.

The object-relational model adds the following types to those previously listed:

2-D Arc Line Strings (All arcs are generated as circular arcs.)
2-D Arc Polygons
2-D Compound Polygons
2-D Compound Line Strings
2-D Circles
2-D Optimized Rectangles

Figure 1-2 New Geometry Types Using the Object-Relational Model

1.4 Data Model

The Spatial data model is a hierarchical structure consisting of elements, geometries, and layers, which correspond to representations of spatial data. Layers are composed of geometries which in turn are made up of elements.

For example, a point might represent a building location, a line string might be a road or flight path, and a polygon could be a state, city, zoning district, or city block.

1.4.1 Element

An element is the basic building block of a geometry. The supported spatial element types are points, line strings, and polygons. For example, elements might model star constellations (point clusters), roads (line strings), and county boundaries (polygons). Each coordinate in an element is stored as an X,Y pair. The exterior ring and the interior ring of a polygon with holes are considered as two distinct elements that together make up a complex polygon.

Point data¹ consists of one coordinate. Line data consists of two coordinates representing a line segment of the element. Polygon data consists of coordinate pair values, one vertex pair for each line segment of the polygon. Coordinates are defined in either a clockwise or counter-clockwise order around the polygon.

1.4.2 Geometry

A geometry is the representation of a user's spatial feature, modeled as an ordered set of primitive elements. In the relational model, each geometry is required to be uniquely identified by a geometry identifier (GID) associating it with the other attributes of the feature. This is not required in the object-relational model.

A geometry can consist of a single element, which is an instance of one of the supported primitive types, or a homogeneous or heterogeneous collection of elements. A multipolygon, such as one used to represent a set of islands is a homogeneous collection. A heterogeneous collection is one in which the elements are of different types.

In the relational model, a complex geometry such as a polygon with holes would be stored as a sequence of polygon elements. All subelements of a multielement polygon are wholly contained within the outermost element. This is not required using the object-relational model.

An example of a geometry might describe the buildable land in a town. This could be represented as a polygon with holes where water or zoning prevents construction.

1.4.3 Layer

A layer is a heterogeneous collection of geometries having the same attribute set. For example, one layer in a GIS might include topographical features, while another describes population density, and a third describes the network of roads and bridges in the area (lines and points). Each layer's geometries and their associated spatial index are stored in the database in standard tables.

1.5 Query Model

Spatial uses a two-tier query model to resolve spatial queries and spatial joins. The term is used to indicate that two distinct operations are performed in order to resolve queries. The output of both operations yields the exact result set.

The two operations are referred to as primary and secondary filter operations.

The primary filter permits fast selection of candidate records to pass along to the secondary filter. The primary filter compares geometry approximations to reduce computation complexity and is considered a lower cost filter. Because the primary filter compares geometric approximations, it returns a superset of the result set.
The secondary filter applies exact computations to geometries that result from the primary filter. The secondary filter yields an accurate answer to a spatial query. The secondary filter operation is computationally expensive, but it is only applied to the primary filter results, not the entire data set.

Figure 1-3 illustrates the relationship between the primary and secondary filters.

Figure 1-3 Query Model

Spatial uses a linear quadtree-based spatial index to implement the primary filter. This is described in detail in following sections.

The function SDO_GEOM.RELATE() is used as a secondary filter. It evaluates the topological relationship-- such as whether two given geometries are touching, covering each other, or have any interaction.

Spatial does not require the use of both the primary and secondary filters. In some cases, just using the primary filter is sufficient. For example, a zoom feature in a mapping application queries for data that overlaps a rectangle representing visible boundaries. The primary filter very quickly returns a superset of the query. The mapping application can then apply clipping routines to display the target area.

The purpose of the primary filter is to quickly create a subset of the data and reduce the processing burden on the secondary filter. The primary filter therefore should be as efficient, that is selective yet fast, as possible. This is determined by the characteristics of the spatial index on the data.

1.6 Indexing Methods

The introduction of spatial indexing capabilities into the Oracle database engine is a key feature of the Spatial product. A spatial index, like any other index, provides a mechanism to limit searches within tables (or data spaces) based on spatial criteria such as intersection, and containment. A spatial index is required to:

Find objects within an indexed data space that overlap a given point or area-of-interest (window query)
Find pairs of objects from within two indexed data spaces that spatially interact with each other (spatial join)

A spatial index is considered a logical index. The entries in the spatial index are dependent on the location of the geometries in a coordinate space, but the index values are in a different domain. Index entries take on values from a linearly ordered integer domain while the coordinates for a geometry may be pairs of integer, floating-point, or double-precision numbers. Spatial uses a linear quadtree-based indexing scheme, also known as z-ordering, which works as described in the following paragraphs.

The coordinate space (for the layer where all geometric objects are located) is subjected to a process called tessellation, which defines exclusive and exhaustive cover tiles for every stored geometry. Tessellation is done by decomposing the coordinate space in a regular hierarchical manner. The range of coordinates, the coordinate space, is viewed as a rectangle. At the first level of decomposition, the rectangle is divided into halves along each coordinate dimension generating four tiles. Each tile that interacts with the geometry being tessellated is further decomposed into four tiles. This process continues until some termination criteria, such as size of the tiles or the maximum number of tiles to cover the geometry, is met.

Spatial can use either fixed-size or variable-sized tiles to cover a geometry. Fixed-size tiles are controlled by tile resolution. Variable-sized tiling is controlled by the value supplied for the maximum number of tiles. If the resolution is the sole controlling factor, then tessellation terminates when the coordinate space has been decomposed a specific number of times. Therefore, each tile is of a fixed size and shape. If the number of tiles per geometry, N, is the sole controlling factor, the tessellation terminates when N tiles have been used to cover the given geometry.

Fixed-size tile resolution and the number of variable-sized tiles used to cover a geometry are user-selectable parameters called SDO_LEVEL and SDO_NUMTILES respectively. Smaller fixed-size tiles or more variable-sized tiles provides better geometry approximations. The fewer the number of tiles or the larger the tiles, the coarser the approximations.

Spatial supports two valid combinations of SDO_LEVEL and SDO_NUMTILES. The first, with a non-null SDO_LEVEL and a null SDO_NUMTILES value, results in fixed-sized tiles (called fixed indexing in this guide.) The second, with a non-null SDO_LEVEL and a non-null SDO_NUMTILES, results in hybrid indexing. Hybrid indexing generates two sets of tiles per geometry. One set contains fixed-size tiles and the other set contains variable-sized tiles.

1.6.1 Tessellation of a Layer During Indexing

The process of determining which tiles cover a given geometry is called tessellation. The tessellation process is a quadtree decomposition, where the two-dimensional coordinate space is broken down into four equal-sized covering tiles. Successive tessellations divide those tiles that interact with the geometry down into smaller tiles, and this process continues until the desired level or number of tiles has been achieved. The results of the tessellation process on a geometry are stored in a table, referred to as the SDOINDEX table.

The tiles at a particular level can be linearly sorted by systematically visiting tiles in an order determined by a space-filling curve as shown in Figure 1-4. The tiles can also be assigned unique numeric identifiers, known as Morton codes or z-values. The terms tile and tile code will be used interchangeably in this and other sections related to spatial indexing.

Figure 1-4 Quadtree Decomposition and Morton Codes

1.6.2 Fixed Indexing

Fixed-size tile spatial indexing is the preferred indexing method for the relational model. This method uses cover tiles of equal size to cover a geometry. Because all the tiles are the same size, they all have codes of the same length, and the standard SQL equality operator (=) can be used to compare tiles during a join operation. This results in excellent performance characteristics.

Two geometries are likely to interact, and hence pass the primary filter stage, if they share one or more tiles. The SQL statement for the primary filter stage is:

SELECT DISTINCT <select_list for geometry identifiers> 

  FROM table1_sdoindex A, table2_sdoindex B

  WHERE A.sdo_code = B.sdo_code

The effectiveness and efficiency of this indexing method depends on the tiling level and the variation in size of the geometries in the layer. If you select a small fixed-size tile to cover small geometries and then try to use the same size tile to cover a very large geometry, a large number of tiles would be required. However, if the chosen tile size is large, so that fewer tiles are generated in the case of a large geometry, then the index selectivity suffers because the large tiles do not approximate the small geometries very well. Figure 1-5 and Figure 1-6 illustrate the relationships between tile size, selectivity, and the number of cover tiles.

Using a small fixed-size tile as shown in Figure 1-5, selectivity is good, but a large number of tiles is needed to cover large geometries. A window query would easily identify geometries A and B, but would reject C.

Figure 1-5 Fixed-Size Tiling with Many Small Tiles

Using a large fixed-size tile as shown in Figure 1-6, fewer tiles are needed to cover the geometries, but the selectivity is not as good. A window query would likely pick up all three geometries. Any object that shares tile T1 or T2 would identify object C as a candidate, even though the objects may be far apart, such as objects B and C are in this figure.

The SDO_TUNE package has an ESTIMATE_TILING_LEVEL() function that helps determine an appropriate tiling level for your data set.

Figure 1-6 Fixed-Size Tiling with Fewer Large Tiles

Figure 1-7 illustrates geometry 1013 tessellated to three fixed-sized tiles at level 1. The codes for these cover tiles are then stored in an SDOINDEX table.

Figure 1-7 Tessellated Figure

Only three of the four tiles generated by the first tessellation interact with the geometry. Only those tiles that interact with the geometry are stored in the SDOINDEX table, as shown in Table 1-1. In this example, three fixed-size tiles are used. The table structure is shown for illustrative purposes only. The column names of this table differ depending on which implementation method, relational or object-relational, is in use. In the relational model, you have to directly access the index tables. In the object-relational model, this is both unnecessary and not recommended.

Table 1-1 SDOINDEX Table Using Fixed-Size Tiles

SDO_GID <number>	SDO_CODE <raw>
1013	T0
1013	T2
1013	T3

All elements in a geometry are tessellated. In a multielement geometry like 1013, Element 1 is already covered by tile T2 from the tessellation of Element 0. If, however, the specified tiling resolution were such that tile T2 were further subdivided and one of these smaller tiles were completely contained in Element 1 then that tile would be excluded because it would not interact with the geometry.

1.6.3 Hybrid Indexing

Hybrid indexing is the preferred method for indexing the object-relational model. Hybrid indexing uses a combination of fixed- and variable-sized tiles for spatially indexing a layer. Variable-sized tile spatial indexing uses tiles of different sizes to approximate a geometry. For each geometry, you will have a set of fixed-size tiles that fully cover the geometry, and a set of variable-sized tiles that fully cover the geometry.

In Figure 1-8, the variable-sized cover tiles closely approximate each geometry. This results in good selectivity. The number of variable tiles needed to cover a geometry is controlled using the SDO_NUMTILES parameter.

Figure 1-8 Variable-Sized Tile Spatial Indexing

A variable tile is subdivided if it interacts with the geometry, and subdivision will not result in tiles that are smaller than a predetermined size. This size, or tiling resolution, is determined by a default SDO_MAXLEVEL parameter. A user may modify this parameter, but it is not recommended.

Figure 1-9 illustrates how geometry OBJ_1, represented using the object-relational implementation, is approximated with hybrid indexing (SDO_LEVEL = 1 and SDO_NUMTILES = 4). These are not recommended values for SDO_LEVEL and SDO_NUMTILES; they were chosen to simplify this example. The cover tiles are stored in the SDOINDEX table as shown in Table 1-2. Note that the tiles have been numbered for simplicity and do not reflect the format used in Spatial.

Figure 1-9 Decomposition of the Geometry

In Figure 1-9, note which fixed-size tiles are associated with geometry OBJ_1. Only three (T0, T2, T3) of the four large tiles (T0, T1, T2, T3) generated by the tessellation actually interact with the geometry. Only those are stored in the SDOINDEX table. In examining which variable sized tiles are used, tile T0 shows a further tessellation to four smaller tiles, two of which (T02, T03) are used to cover a portion of the geometry. The variable-sized tiles are stored in the SDO_CODE column in the Spatial index table. The fixed-size tiles are stored in the SDO_GROUPCODE column. The spatial index structure is discussed in Section 2.4.

SDO_ROWID <RAW>	SDO_CODE <RAW>	SDO_MAXCODE <RAW>	SDO_GROUPCODE <RAW>	SDO_META <RAW>
GID_OBJ_1	T02	<binary data>	T0	<binary data>
GID_OBJ_1	T03	<binary data>	T0	<binary data>
GID_OBJ_1	T2	<binary data>	T2	<binary data>
GID_OBJ_1	T3	<binary data>	T3	<binary data>

Table 1-2 Section of the SDOINDEX Table

SDO_ROWID <RAW> SDO_CODE <RAW> SDO_MAXCODE <RAW> SDO_GROUPCODE <RAW> SDO_META <RAW>

GID_OBJ_1

T02

<binary data>

T0

<binary data>

GID_OBJ_1

T03

<binary data>

T0

<binary data>

GID_OBJ_1

T2

<binary data>

T2

<binary data>

GID_OBJ_1

T3

<binary data>

T3

<binary data>

As with the fixed-size tile model, all elements in a geometry are tessellated in one step. In a multielement geometry like OBJ_1, Element 1 is covered by a redundant tile from the tessellation of Element 0 but the tile, T2, is stored only once.

The SDO_TUNE package has some functions that help determine appropriate SDO_LEVEL and SDO_NUMTILES values. Appendix A contains suggestions on when hybrid indexing may be beneficial, and how to select values for the two required parameters.

1.7 Spatial Relations and Filtering

Spatial uses filter methods to determine the spatial relationship between entities in the database. The spatial relation is based on geometry locations. The most common spatial relations are based on topology and distance. For example, the boundary of an area consists of a set of curves that separate the area from the rest of the coordinate space. The interior of an area consists of all points in the area that are not on its boundary. Given this, two areas are said to be adjacent if they share part of a boundary but no points in their interior. Next, the distance between two spatial objects is the minimum distance between any points in them. Two objects are said to be within a given distance of one another if their distance is less than the given distance.

Spatial has two secondary filter methods. One method evaluates topological criteria and a second method determines if two spatial objects are within a Euclidean distance of each other. The secondary filter that evaluates topological criteria is called RELATE. The syntax is given in subsequent chapters that describe geometry functions and operators. RELATE implements a 9-intersection model for categorizing binary topological relations between points, lines, and polygons.

Each spatial object has an interior, a boundary, and an exterior. The boundary consists of points or lines that separate the interior from the exterior. The boundary of a line consists of its end-points. The boundary of a polygon is the line that describes its perimeter. The interior consists of points that are in the object but not on its boundary and the exterior consists of those points that are not in the object.

Given that an object A has three components -- a boundary Ab, an interior Ai, and an exterior Ae, any pair of objects will have nine possible interactions between their components. Pairs of components will have an empty (0), or a non-empty (1) set intersection. The set of interactions between two geometries is represented by a 9-intersection matrix that specifies which pairs of components intersect and which do not. Figure 1-10 shows the 9-intersection matrix for two polygons that are adjacent to one another. This matrix yields the following bit mask, generated in row-major form: "101001111".

Figure 1-10 The 9-Intersection Model

Some of the topological relationships identified in the seminal work by Dr. Egenhofer² and colleagues have names associated with them. Spatial uses the following names:

DISJOINT -- The boundaries and interiors do not intersect.
TOUCH --The boundaries intersect but the interiors do not.
OVERLAPBDYDISJOINT --The interior of one object intersects the boundary and interior of the other object but the two boundaries do not intersect. This relation occurs, for example, when a line originates outside a polygon and ends inside that polygon.
OVERLAPBDYINTERSECT -- The boundaries and interiors of the two objects intersect.
EQUAL -- The two objects have the same boundary and interior.
CONTAINS -- The interior and boundary of one object is completely contained in the interior of the other.
COVERS --The interior of one object is completely contained in the interior of the other and their boundaries intersect.
INSIDE -- The opposite of CONTAINS. A INSIDE B implies B CONTAINS A.
COVEREDBY -- The opposite of COVERS. A COVEREDBY B implies
B COVERS A
ANYINTERACT -- The objects are non-disjoint.

The other secondary filter, WITHIN_DISTANCE, determines if two spatial objects, A and B, are within a Euclidean distance of one another. First it constructs a distance buffer, Db, around the reference object B. It then checks that A and Db are non-disjoint.The distance buffer of an object consists of all points within the given distance from that object. Figure 1-11 shows the distance buffers for point, line, and area objects. Notice how the buffer is rounded near the corners of the objects.

Figure 1-11 Distance Buffers for Points, Lines, and Polygons

1.8 Partitioned Point Data

Point data, unlike line and polygon data, has the unique characteristic of always using only one tile per point. There are cases where this difference can be exploited.

Spatial has an enhanced spatial indexing mechanism capable of handling very large data sets consisting of complex geometries. For applications handling point data sets that are several tens of gigabytes or larger, further performance gains can be achieved by using Oracle8i table partitioning features.

Table partitioning is available only with the Partitioning Option of Oracle8i Enterprise Edition. If the Partitioning Option is available to you, the preferred method is to use Oracle8i table partitioning in conjunction with spatial indexing (using the relational model). See the Oracle8i Concepts guide for a description of Oracle8i Partitioning. See Section A.2.6.3 for a description of a sample script that uses table partitioning with point data.

A previous release of Spatial Data Option (from which the current Spatial product has evolved) utilized its own version of table partitioning instead of spatial indexing. Appendix C briefly describes the deprecated partitioning scheme for those customers with legacy point data sets. While this feature is still enabled in the current release, it may be removed in the future.

¹ Point data can also be stored in a partitioned table. See Appendix C, "Partitioning Legacy Point Data" for details.

² Dr. Max Egenhofer, University of Maine, Orono.

1 Spatial Concepts