Oracle8(TM) ConText(R) Cartridge Application Developer's Guide
Release 2.0

A54630-01


1
Introduction

This chapter provides an overview of the Oracle8 ConText Cartridge.

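This chapter covers the following topics: The ConText Cartridge Solution, Advantages of Oracle ConText, Text and Linguistic Features, Roles, and Creating the Text Processing Environment.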

The ConText Cartridge Solution

Most of today's business data is not stored as structured data; it is stored as unstructured text in thousands of formats: letters, memos, manuals, reports, news articles, electronic mail, notes, messages, and so on.

For many businesses, this huge volume of text is a vast, valuable and unmanageable information resource. Relevant documents are usually difficult to locate, hard to retrieve, and often impossible to digest. Oracle solves the text management problem with ConText.

ConText is built on the power and scalability of Oracle Universal Server. It uses advanced text analysis and retrieval technology to give users the exact information they need when they need it. With ConText, Oracle Universal Server is a complete solution for managing any data resource (relational, text, spatial, image, video, or audio) in any application, at any scale.

ConText manages unstructured text as quickly and as easily as structured data. It is an online text management system that uses SQL or PL/SQL to search through large volumes of text stored in either structured databases or system files.

Using ConText, developers can quickly build mission-critical applications that give hundreds or even thousands of concurrent users fast, efficient access to text-based information. And because text is now a supported datatype in Oracle Universal Server, new applications and extensions to existing Oracle applications are quick and easy to build with standard tools.

Advantages of Oracle ConText

The advantages of ConText include:

Powerful Text Handling Capabilities

Using ConText's advanced indexing, retrieval, reduction, and classification features, users can pinpoint and access the textual information they need quickly and easily, even in large volumes of text data.

Extensible Framework for Languages and Formats

ConText's extensible framework easily integrates new languages, formats, specialized search engines and text processing services. This adaptability to new requirements preserves an enterprise's investment in its text storage and retrieval applications and provides a healthy environment for long-term application development.

ConText currently recognizes, indexes, and retrieves text for most of the NLS-compliant, single-byte languages (7-bit and 8-bit character sets). All of these languages can be processed by the basic lexer provided with ConText.

ConText also supports query expansion, in the form of stemming, soundex, and fuzzy matching, for English and the following Western European languages: French, Spanish, Italian, German, and Dutch.
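This chapter does not describe how ConText implements soundex expansion. As a rough illustration of the idea, the following Python sketch uses the classic American Soundex algorithm (an assumption; ConText's variant may differ) to expand a query term into every word of a hypothetical vocabulary that sounds alike. The `soundex` and `expand_query` names are illustrative only and are not part of any ConText interface.

```python
from typing import Iterable, List

# Classic American Soundex digit for each consonant. Vowels, y, h, and w have
# no digit. This is the textbook table, not necessarily the one ConText uses.
_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as a letter
    prev = _CODES.get(letters[0], "")    # but its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w never separate equal digits
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code                      # a vowel resets prev to ""
    return result.ljust(4, "0")          # pad short codes with zeros


def expand_query(term: str, vocabulary: Iterable[str]) -> List[str]:
    """Return the vocabulary words that share the term's Soundex code."""
    key = soundex(term)
    return sorted(w for w in vocabulary if soundex(w) == key)


if __name__ == "__main__":
    words = ["Robert", "Rupert", "Rubin", "Smith", "Smyth"]
    print(expand_query("Robert", words))  # ['Robert', 'Rupert']
    print(expand_query("Smith", words))   # ['Smith', 'Smyth']
```

A production system would look codes up in a precomputed table rather than recomputing them for every word; the linear scan simply keeps the sketch short.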

For multi-byte languages, ConText provides the following lexers: Japanese, Korean (BETA), and Chinese (BETA). The Japanese lexer recognizes three of the Japanese writing systems: Kanji, Hiragana, and Katakana.

Database-quality Architecture for Managing Text

Because ConText is fully integrated with Oracle8, users can manage text with the same reliability, scalability, security, integrity, fault tolerance, and administrative ease they expect from an enterprise-caliber relational database system.

Standards-based Development Environment

ConText takes full advantage of Oracle's standard interfaces and of third-party tools such as PowerBuilder, SQL*Windows, OLE Automation tools, and Visual Basic. Once ConText is installed on one or more servers, client tools such as SQL*Plus, Oracle Forms, and Pro*C can access and manipulate text as easily and efficiently as structured data.

While standalone text-retrieval products often burden developers with separate development environments, ConText treats text and relational data as peers and uses standard SQL to locate and retrieve relevant text information.

Text and Linguistic Features

ConText features that facilitate text management and retrieval include:

Linguistic Analysis

ConText provides a sophisticated natural language parser that can analyze English-language text and return detailed thematic information about it. This theme information can be used in two distinct ways to manipulate text:

Theme Queries

Theme queries provide a powerful alternative or extension to text queries. In a text query, the occurrence of a word in a document is sufficient for the document to be returned in the results of the query. However, this type of query may generate more hits than the user wants.

Theme queries let the user search for documents based on the main ideas or concepts in the documents. In a theme query, only those documents in which a particular topic is developed sufficiently to be classified as a document-level theme are returned.
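The difference between the two kinds of query can be illustrated with a toy Python model. The documents and their document-level themes below are hypothetical, hand-assigned data; in ConText, themes come from the linguistic analysis of each document, and queries are issued through SQL rather than through illustrative functions like `text_query` and `theme_query`.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Document:
    doc_id: int
    text: str
    themes: FrozenSet[str]  # hypothetical document-level themes


# Toy collection. The themes are assigned by hand for illustration only.
DOCS = [
    Document(1, "Interest rates rose again as the central bank tightened policy.",
             frozenset({"interest rates", "central banks"})),
    Document(2, "The museum's new wing has sparked interest among art critics.",
             frozenset({"museums", "art"})),
    Document(3, "Lower interest rates could boost home sales this spring.",
             frozenset({"interest rates", "home sales"})),
]


def text_query(word: str, docs: List[Document]) -> List[int]:
    """Text query: a document matches if the word occurs anywhere in it."""
    word = word.lower()
    return [d.doc_id for d in docs
            if word in re.findall(r"[a-z]+", d.text.lower())]


def theme_query(theme: str, docs: List[Document]) -> List[int]:
    """Theme query: a document matches only if the topic is one of its themes."""
    return [d.doc_id for d in docs if theme in d.themes]


if __name__ == "__main__":
    print(text_query("interest", DOCS))         # [1, 2, 3]
    print(theme_query("interest rates", DOCS))  # [1, 3]
```

The text query returns document 2 because the word "interest" occurs in it, even though that document is about art; the theme query returns only the two documents whose main topic is interest rates.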

Theme Viewing

Themes and thematic content (Gists) can be generated on a per-document basis through the Linguistic Services. This information can then be used to view documents by their themes, as well as by their thematically relevant paragraphs.

The application developer uses the Linguistic Services to create short abstracts at various levels of detail, which users can read to quickly review the essential content of documents and determine their relevance.

Roles

The individuals involved in developing, supporting, maintaining and using ConText facilities are:

End User

An end user is the individual or organization that uses an application to locate, retrieve, and read text. The end user defines the data or information requirements that the application must satisfy, as well as the document environment from which text will be selected.

Application Developer

The application developer designs the application, defines the environment required to support it, works with the system administrator to create that environment, and writes the programs and procedures that satisfy user requirements. This book is written for application developers.

Database Administrator

The database administrator maintains the Oracle system facilities, the databases, and the system environment that supports a ConText application.

ConText Administrator

The ConText administrator maintains the ConText environment that supports text applications, for example, the policies and preferences that define text columns and indexes. The way in which your ConText administrator creates policies and preferences affects the way you, the application developer, execute your queries.

Creating the Text Processing Environment

The collection of text to be managed must be stored in an environment that is accessible to Oracle and ConText either as columns in an Oracle database or as pointers to system files outside the database.

Documents must be properly loaded into the database (or identified by external pointers) and indexed before text/theme queries can be executed.
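Indexing has to come first because a text query is answered from the index rather than by rescanning every document. The following Python sketch builds a generic inverted index (a mapping from each token to the set of documents that contain it). It illustrates the general technique only; this chapter does not describe ConText's actual index structures or tokenization rules, and `tokenize`, `build_index`, and `search` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Build an inverted index: token -> IDs of the documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[int]], *words: str) -> List[int]:
    """Return IDs of documents containing every query word (an AND query).

    Only the index is consulted; the document text is never rescanned.
    """
    postings = [index.get(w.lower(), set()) for w in words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = {
        1: "Quarterly sales report for the western region",
        2: "Memo: sales targets for next quarter",
        3: "Travel expense report",
    }
    index = build_index(docs)                # indexing happens before any query
    print(search(index, "sales"))            # [1, 2]
    print(search(index, "sales", "report"))  # [1]
```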

In addition, linguistic output must be generated for each document before the linguistic information can be viewed for the documents.

To index a document or generate linguistic output for the document, the column storing the document must be defined as a text column. ConText recognizes a text column in a table if the column has one or more policies attached to it.

A table can contain more than one text column, but each text column requires a separate policy.
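The rule above (a column is a text column exactly when at least one policy is attached to it, and a table with several text columns needs a separate policy for each) can be modeled in a few lines of Python. This is a conceptual sketch only: the `Policy` class and the `is_text_column` and `text_columns` functions are hypothetical, real policies are created through the ConText administration interfaces described in the Oracle8 ConText Cartridge Administrator's Guide, and they carry far more information (their preferences, for example) than the name and column modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a ConText policy: a name and the one column it covers."""
    name: str
    table: str
    column: str


def is_text_column(table: str, column: str, policies: Iterable[Policy]) -> bool:
    """A column is a text column if at least one policy is attached to it."""
    return any(p.table == table and p.column == column for p in policies)


def text_columns(table: str, policies: Iterable[Policy]) -> List[str]:
    """List the text columns of a table, once each, however many policies they have."""
    return sorted({p.column for p in policies if p.table == table})


if __name__ == "__main__":
    policies = [
        Policy("memo_body_pol", "memos", "body"),
        Policy("memo_body_pol2", "memos", "body"),  # a second policy on the same column
        Policy("memo_summary_pol", "memos", "summary"),
    ]
    print(text_columns("memos", policies))              # ['body', 'summary']
    print(is_text_column("memos", "author", policies))  # False
```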

The process of loading documents, defining text columns, and creating ConText indexes for the columns is documented in the Oracle8 ConText Cartridge Administrator's Guide.

In particular, the Oracle8 ConText Cartridge Administrator's Guide explains how to:




Copyright © 1997 Oracle Corporation.

All Rights Reserved.
