Development of a Three-Tier Metadata Documentation Scheme:
Examining Level I as an Internet Accessible Metadata Input and Search Tool

Kristine M. Kuhlman
University of Maryland Baltimore County
Department of Geography - Spatial Analysis Laboratory
1000 Hilltop Circle
Baltimore, MD, USA 21250
phone: (410) 455-3847
fax: (410) 455-1056
kuhlman@umbc.edu

Aya Soffer
Computer Science and Electrical Engineering Department
University of Maryland Baltimore County
1000 Hilltop Circle
Baltimore, MD, USA 21250
soffer@cs.umbc.edu

T.W. Foresman
Director, Spatial Analysis Lab - Department of Geography
University of Maryland Baltimore County
1000 Hilltop Circle
Baltimore, MD, USA 21250
foresman@umbc.edu

ABSTRACT

The University of Maryland at Baltimore County (UMBC) and the Baltimore-Washington Regional Collaboratory partners have developed a three-tier approach to documenting metadata for geospatial data. Level I contains the common, lowest-common-denominator fields needed by data browsers, general users, and producers. Level II extends Level I with more detailed, institution-specific metadata fields. Level III conforms to the original 1994 FGDC metadata standards. We present an overview of the three-tier approach, including a detailed example of the Level I fields, and describe UMBC's GEM! (General Entry Metadata), an Internet-accessible tool that provides a simple yet comprehensive interface for entering all Level I metadata fields. We also describe the metadata database and the Internet-accessible search tool used to store and search Level I metadata. The search engine is adaptable, so it can easily be modified to reflect changes in the metadata fields arising from case study activities. Furthermore, the search engine is dynamic and reflects the current holdings of the repository.


Copyright 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

1.0 Introduction

The University of Maryland at Baltimore County (UMBC) and the Baltimore-Washington Regional Collaboratory (Collaboratory) partners, under a National Spatial Data Infrastructure (NSDI) grant, have developed a three-tier approach to documenting metadata for geospatial data. As part of our investigation, we are questioning the practicality of the current published FGDC standards for the vast majority of the regional spatial data user community [1,2]. The multi-entity Collaboratory represents multiple governments, private businesses, and citizens using multiple data types, sources, and platforms, which makes the task of documenting our data sets onerous. We are therefore driven to develop an improved metadata standard and an efficient methodology for implementing this standard among members outside the typical NSDI community. In this paper we present our three-tier approach, demonstrate our Internet-accessible Level I metadata input and search tools, and report on the use of these tools.

Levels I-III represent subsets of the Federal Geographic Data Committee's (FGDC) metadata standards. Level I is built upon the research question: which metadata fields approximate 100% documentation? Level I therefore contains the common, lowest-common-denominator fields needed by data browsers and general users. Level II extends Level I with more detailed, institution-specific metadata fields. Level III conforms to the original FGDC metadata standards. We are gathering baseline data on the use of Levels I-III metadata from prototype implementations with our user community.
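The nesting of the tiers can be pictured as sets of field names in which each level is a superset of the one before. The Python sketch below uses only a handful of the Level I field names from Table 1; the Level II and Level III additions and the helper `documentation_level` are hypothetical illustrations, not the Collaboratory's actual field lists.

```python
from typing import Optional

# A few Level I field names taken from Table 1 (not the full list).
LEVEL_I = frozenset({
    "Description of Data Set",
    "Key Words",
    "Type of Data",
    "Originating Organization",
    "Metadata Contact Information",
})

# Hypothetical institution-specific fields added at Level II.
LEVEL_II = LEVEL_I | {"Internal Project Code", "Custodian Department"}

# Hypothetical stand-ins for the remaining FGDC fields added at Level III.
LEVEL_III = LEVEL_II | {"Positional Accuracy Report", "Entity and Attribute Overview"}

TIERS = [("Level I", LEVEL_I), ("Level II", LEVEL_II), ("Level III", LEVEL_III)]


def documentation_level(record: dict) -> Optional[str]:
    """Return the highest tier whose fields are all non-blank in `record`, or None."""
    achieved = None
    for name, fields in TIERS:
        # Tiers are nested, so stop at the first one that is incomplete.
        if all(str(record.get(f, "")).strip() for f in fields):
            achieved = name
        else:
            break
    return achieved
```

Because each tier contains the previous one, a data set documented at Level II is automatically complete at Level I, which matches the iterative path toward full FGDC documentation.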

Our research examines which FGDC fields should comprise Level I metadata for federal, state, and local spatial data users. A Level I metadata input tool has been developed and is being tested by users over the Internet (http://www.umbc.edu/bwrdc). We are interviewing federal, state, and local GIS users about their experience with this tool and identifying case study participants for a more in-depth study of it. We are also continuing development of a metadata search tool that is tightly coupled with our three-tier approach. This search tool lets users search for data using the metadata fields that they have identified as useful search criteria. The search engine is adaptable, so it can easily be modified to reflect changes in the metadata that our case studies may mandate. Furthermore, the search engine is dynamic and reflects the current holdings of the repository.

2.0 Three-Tier Metadata Scheme

UMBC is working with our Collaboratory partners (GIS users and data developers from federal, state, regional, and local agencies) to explore an alternative to the FGDC metadata standard. This effort is based on the observation that although metadata is invaluable to the geospatial data community, the costs associated with data documentation currently outweigh the benefits. Metadata allows for appropriate data use, facilitates data sharing, and eliminates the need to re-develop data whose properties are unknown or forgotten, thereby reducing costs by avoiding duplicated effort. However, there is considerable inertia associated with documenting data, since user-friendly tools with easily understandable terms are not readily available to the GIS community. UMBC has therefore developed an Internet-accessible tool that uses less technical terminology and is based on Level I of our three-tier metadata scheme [3].

The three levels of metadata were designed to be more intuitive and user-friendly than the existing FGDC standards and to reflect the different requirements associated with varying levels of technical expertise. Each level builds on the previous one and contains more detailed information. UMBC's approach thus takes iterative steps toward documentation, beginning with the Level I fields in Table 1:

Table 1. UMBC's GEM! (General Entry Metadata) Level I fields

| METADATA FIELD | DESCRIPTION | EXAMPLE |
|---|---|---|
| Organization Username | Organization entering the metadata | UMBC |
| File Name/Coverage | File or coverage name | urb1792a.zip |
| Data Structure | Political or natural geographic division by which the data are divided, such as county, watershed, or quadrangle (free text) | None |
| Status of Data Set | Level of completion of the data set ("Complete", "In Progress", "Planned"), update schedule, and date last updated | Complete |
| Data Layer Theme | Theme that describes the data layer | Land Cover |
| Description of Data Set | Short description of the data set (free text) | GIS coverage with polygons depicting built-up/urban areas in 1792 |
| Project | Project or research grant for which the data set was created (free text) | Baltimore-Washington Regional Collaboratory |
| Key Words | Words summarizing the content of the data set (free text) | Urban, Built-up, Historical, Temporal |
| Geographic Area Covered & Bounding Coordinates | Political or physical geographic boundary of the data set by state, county, and other geographic area (if applicable), plus latitude/longitude bounding coordinates | Baltimore-Washington Region; West Bounding Coordinate: -78.0; East Bounding Coordinate: -76.0; North Bounding Coordinate: 40.0; South Bounding Coordinate: 38.0 |
| Coordinate System, Units and Datum | Grid (and zone) or geographic coordinate system and units of measure (free text) | UTM, meters, NAD83 |
| Type of Data | vector/raster/point/database/text/map | vector |
| Format of Data | Data format | ArcInfo export file (.e00) |
| Software & Version | Software and version | ArcInfo 7.0.3 |
| File Size | Uncompressed file size in kilobytes, megabytes, or gigabytes (free text) | 2 Megs |
| Originating Organization | Agency or organization that created the data set (free text) | University of Maryland Baltimore County |
| Type of Source Material | Map/satellite/aerial photos/existing digital data/field collection (domain differs from FGDC) | Historical maps |
| Scale of Source Material | Map scale or pixel resolution | 1:24,000 to 1:1,000,000 |
| Date(s) of Source Material | dd/mm/yyyy format | 01/01/1790 - 01/01/1804 |
| Method of Data Collection | Phrase describing how the data for the database were created (free text) | Delineated built-up areas from historical maps onto Mylar overlays |
| Method of Digital Production | Phrase describing how the data were transferred to digital form (free text) | Digitized polygons into Arc/Info coverage |
| Accuracy Resolution | Any ground-truthing tests used, or the pixel resolution of raster data (free text) | Not yet field tested. Planned summer 1997. |
| Attribute Description | Short phrase describing any attributes in the database (free text) | Polygons attributed with Land Use/Land Cover codes |
| Distribution Information | Cost of data and ordering instructions | Free, available over the Internet |
| Available Media | On-line or hard-copy formats (if the data are on the Internet, please provide the address) | On-line at http://www.umbc.edu/bwrdc |
| Data Distribution Contact | Contact name and email address | Chris Steele, csteele@strabo.umbc.edu |
| Access and Use Constraints | Restrictions on accessing and using the data set | No constraints. The authors of this data set would appreciate acknowledgment in products derived from these data. |
| Metadata Contact Information | Organization, name, address, phone number, electronic mail address, and date for metadata contact | Kristine Kuhlman, UMBC - Spatial Analysis Laboratory, 1000 Hilltop Circle, (410) 455-3846, kuhlman@umbc.edu, 19970625 |
| Level II Metadata | Status of Level II metadata | Complete |
| Level III Metadata | Status of Level III metadata | Complete |

3.0 Metadata Entry Tool (GEM! - General Entry Metadata)

The next step toward metadata documentation is to provide an operational tool to assist users with documentation. UMBC's GEM! (General Entry Metadata) is an Internet-accessible tool built around the Level I metadata fields listed in Table 1. These fields correspond to the fields used by our search engine and to the mandatory FGDC metadata fields. The Level I fields are based on pilot projects and workshops with federal, state, regional, and local agencies. Each metadata field is hyperlinked to documentation that defines the field name and provides an example of an appropriate response.

Users must provide a username and password to access GEM!, which gives some protection to the database holdings. Users then enter their metadata through a combination of pull-down menus and free-text entry fields. Figure 1 shows the first screen of the input tool. Once the metadata fields are entered and submitted, a second screen appears so that the user can review the input (Figure 2). If changes are needed, users can simply go back to the previous screen and edit their entries. Once the user is satisfied with the input, an output format is selected. The results may be emailed or downloaded in "text" or "block" format. Text format is a pipe-delimited format in which each metadata field name is separated from the value entered, to facilitate import into spreadsheets and databases. Block format follows the FGDC numbering scheme, using the section number and indentation of the FGDC field that corresponds to each GEM! field. We are developing a third output format that includes a framework for all FGDC fields so that the user may fill out additional fields if necessary. In addition to returning the output to the user, GEM! immediately enters the metadata into the metadata database for search and retrieval.
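A minimal Python sketch of the two output formats follows, under stated assumptions: the text format is assumed to write one `field|value` line per field, and the block format is shown only for the four bounding-coordinate fields, mapped to what we take to be their CSDGM section numbers (1.5.1.1 to 1.5.1.4). The `FGDC_MAP` table and the function names are hypothetical; a full implementation would map every GEM! field to its FGDC counterpart.

```python
# Hypothetical mapping from GEM! field names to FGDC section numbers and
# element names; only the four bounding-coordinate elements are shown.
FGDC_MAP = {
    "West Bounding Coordinate": ("1.5.1.1", "West_Bounding_Coordinate"),
    "East Bounding Coordinate": ("1.5.1.2", "East_Bounding_Coordinate"),
    "North Bounding Coordinate": ("1.5.1.3", "North_Bounding_Coordinate"),
    "South Bounding Coordinate": ("1.5.1.4", "South_Bounding_Coordinate"),
}


def to_text_format(entry: dict) -> str:
    """Assumed text format: one 'field name|value' line per field."""
    lines = []
    for name, value in entry.items():
        if "|" in str(value):
            raise ValueError(f"value for {name!r} contains the delimiter")
        lines.append(f"{name}|{value}")
    return "\n".join(lines)


def _section_key(number: str) -> tuple:
    # "1.5.1.1" -> (1, 5, 1, 1) so sections sort numerically.
    return tuple(int(part) for part in number.split("."))


def to_block_format(entry: dict, indent: str = "  ") -> str:
    """Number each mapped field by FGDC section; indent by section depth."""
    mapped = [(FGDC_MAP[name], value) for name, value in entry.items() if name in FGDC_MAP]
    mapped.sort(key=lambda item: _section_key(item[0][0]))
    lines = []
    for (number, element), value in mapped:
        depth = number.count(".")  # "1.5.1.1" has depth 3
        lines.append(f"{indent * depth}{number} {element}: {value}")
    return "\n".join(lines)
```

Fields without an FGDC mapping (such as "Key Words" in this sketch) appear in the text output but are skipped in the block output.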

UMBC's GEM! offers a realistic alternative metadata standard that facilitates documentation for a large, extended GIS community. Agencies that have not historically documented their data are now using GEM!. The challenge is to couple statewide technical requirements with institutional requirements, which we hope to do through our involvement with the Standards Sub-Committee of MSGIC (Maryland State Government Geographic Information Coordinating Council).

Figure 1. UMBC's GEM! - General Entry Metadata Tool.

Figure 2. Form for reviewing metadata entry.

4.0 The Metadata Database and Search Tool

The metadata database stores all of the Level I metadata fields in relational tables. In addition to these fields, each record contains the username of the user who created the metadata entry and a timestamp that is generated automatically when the record is created. A metadata record is uniquely identified by the <username, timestamp> pair. Users may request to see all of the records they have created and modify them when necessary.
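The <username, timestamp> key can be expressed as a composite primary key. This sketch substitutes Python's built-in `sqlite3` for Postgres95, and the table name, column names, and helper functions are hypothetical; it shows a record being inserted with an automatically generated timestamp, listed for its creator, and updated through its key.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in for the Postgres95 table; only three Level I fields are shown.
SCHEMA = """
CREATE TABLE metadata (
    username     TEXT NOT NULL,   -- user who created the entry
    created_at   TEXT NOT NULL,   -- timestamp generated on insert
    file_name    TEXT,
    theme        TEXT,
    type_of_data TEXT,
    PRIMARY KEY (username, created_at)  -- the <username, timestamp> pair
)
"""


def add_record(conn, username, file_name, theme, type_of_data, created_at=None):
    """Insert a record and return its (username, timestamp) key."""
    if created_at is None:
        # Generated automatically; ISO 8601 strings sort chronologically.
        created_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO metadata (username, created_at, file_name, theme, type_of_data) "
        "VALUES (?, ?, ?, ?, ?)",
        (username, created_at, file_name, theme, type_of_data),
    )
    return username, created_at


def records_for_user(conn, username):
    """All records a user has created, oldest first."""
    return conn.execute(
        "SELECT created_at, file_name FROM metadata WHERE username = ? ORDER BY created_at",
        (username,),
    ).fetchall()


def update_theme(conn, key, theme):
    """Modify one record identified by its (username, timestamp) key."""
    username, created_at = key
    conn.execute(
        "UPDATE metadata SET theme = ? WHERE username = ? AND created_at = ?",
        (theme, username, created_at),
    )
```

Inserting a second record with the same user and timestamp raises `sqlite3.IntegrityError`, which is how the composite key enforces uniqueness.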

An interesting feature of the metadata database and the search tool is that they are self-describing. That is, all of the information about the metadata scheme is stored in the database itself, in a relational metadata descriptor table. This table includes the names of the fields, their types, and a flag that indicates whether each field is searchable. The search tool uses this information to create its user interface dynamically from the metadata scheme. As a result, changing the metadata scheme requires no changes to the search engine code, and the database and search engine can easily adapt to changes in the metadata fields that our case studies may mandate. We could also feasibly support several databases using different schemes, depending on user needs. Currently we have decided a priori which subset of the Level I metadata fields are useful search criteria and have flagged them as searchable. In future implementations, users will be able to select the searchable fields; their selections will be recorded in the metadata descriptor table and then automatically incorporated into the search tool.
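The self-describing design can be sketched as a descriptor table that the search tool reads at startup and turns into form elements, with no field names hard-coded in the interface code. The descriptor rows, type names, and widget labels below are hypothetical, and `sqlite3` stands in for the Postgres95 server.

```python
import sqlite3

# Hypothetical descriptor rows: (field name, field type, searchable flag).
DESCRIPTOR_ROWS = [
    ("Key Words", "text", 1),
    ("File Size", "number", 0),
    ("Date(s) of Source Material", "date", 1),
    ("Type of Data", "single_menu", 1),
    ("Data Layer Theme", "multi_menu", 1),
    ("Bounding Coordinates", "region", 1),
    ("Description of Data Set", "text", 0),
]

# Interface element chosen for each field type (labels are illustrative).
WIDGET_FOR_TYPE = {
    "text": "text box",
    "number": "validated text box",
    "date": "year pull-down",
    "single_menu": "pull-down menu",
    "multi_menu": "multi-select list",
    "region": "map",
}


def create_descriptor_table(conn, rows):
    conn.execute(
        "CREATE TABLE descriptor (name TEXT PRIMARY KEY, type TEXT NOT NULL, "
        "searchable INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO descriptor VALUES (?, ?, ?)", rows)


def build_search_form(conn):
    """Return (field, widget) pairs for every searchable field, in table order."""
    rows = conn.execute(
        "SELECT name, type FROM descriptor WHERE searchable = 1 ORDER BY rowid"
    ).fetchall()
    if sum(1 for _, ftype in rows if ftype == "region") > 1:
        raise ValueError("only one region field may be searchable")
    return [(name, WIDGET_FOR_TYPE[ftype]) for name, ftype in rows]
```

Making a field searchable is then a data change (an `UPDATE` on the descriptor table) rather than a code change, which is the property the text relies on.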

The type attribute in the metadata descriptor table determines which kind of search-interface element is used for the corresponding metadata field. Currently there are six field types: text, number, date, region, single-value menu, and multiple-value menu. Text fields (e.g., keywords) are specified using a simple text entry box. Number fields are also specified using a text entry box, and the number is checked for proper formatting before the query is submitted to the database. Date fields are specified using a date form in which the year is selected from a pull-down menu; only those years for which there is actual data in the database appear as choices in this menu. Single-value menus are specified using a pull-down menu whose choices are generated dynamically from the values in the database. For example, the "Type of Data" field has 5 possible choices in the input tool: vector, raster, point, database, text. If the database contains only records designated as either raster or vector, and "Type of Data" is designated as a searchable field, then it appears as a field in the search interface and only the choices raster and vector are listed in its pull-down menu. Fields of type multiple-value menu (e.g., theme) are specified using a list, since more than one selection is allowed. Again, the list contains only the values that actually exist in the database, not all of the choices available in the input tool. The region field is searched via the map, and only one region field can be designated as searchable. Creating the menus on the fly from the actual contents of the database is very powerful: the search tool also functions as a browser, helping users assess the current holdings of the database and formulate queries that are likely to return a small number of relevant hits.
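Generating menus from the current holdings amounts to a `SELECT DISTINCT` over the relevant column. In this sketch the `metadata` table, its column names, and the storage of dates as ISO `YYYY-MM-DD` strings are assumptions. Column names are taken to come from the descriptor table, never from user input, since they are interpolated into the SQL text. A multiple-value menu would work the same way over a table holding one row per record and theme.

```python
import sqlite3


def menu_choices(conn, column):
    """Distinct non-empty values actually present, so the menu mirrors current holdings."""
    # `column` must come from the descriptor table, not from user input.
    sql = (
        f"SELECT DISTINCT {column} FROM metadata "
        f"WHERE {column} IS NOT NULL AND {column} <> '' ORDER BY {column}"
    )
    return [row[0] for row in conn.execute(sql)]


def year_choices(conn, column):
    """Distinct years present in an ISO-formatted date column."""
    sql = (
        f"SELECT DISTINCT substr({column}, 1, 4) FROM metadata "
        f"WHERE {column} IS NOT NULL ORDER BY 1"
    )
    return [int(row[0]) for row in conn.execute(sql)]


def parse_number(text):
    """Validate a number field before the query is sent to the database."""
    try:
        return float(text)
    except ValueError:
        raise ValueError(f"not a valid number: {text!r}") from None
```

With only vector and raster records stored, `menu_choices(conn, "type_of_data")` offers just those two values, as in the example above.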

Figure 3 shows the web-based interface of the search engine. The user is searching for free, on-line Land Use/Land Cover data for Baltimore or Howard County, Maryland, dated from 1/1/1988 to 12/31/1995. As mentioned above, the search interface is generated dynamically from the metadata descriptor table: the fields designated as searchable appear in the search form, and the interface for each field is determined by its type. In addition to these search fields, the search engine displays a map that can be used to specify the region of interest for the query. This map is also constructed dynamically from the database. The map is essentially a collection of polygons, each with an associated name and, optionally, an associated subdivision. A subdivision is simply another map (i.e., another collection of polygons). If the user activates the "zoom in" button (a magnifying glass with a plus sign inside) and selects one of the polygons, the next level of subdivision is displayed. Activating the "zoom out" button and clicking anywhere on the map goes back one level. Currently the top-level map is the United States, where each polygon represents a state, and the next level of subdivision is a county map. Thus, activating zoom in and then selecting Maryland, for example, changes the map to a county map of Maryland. We could have alternate subdivisions for each state, such as census tracts or watersheds. Furthermore, additional levels of subdivision could be defined, such as subdividing a county by zip code.
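The nested map can be modeled as a tree of polygon collections, with a stack recording the levels the user has zoomed through. This is a hypothetical Python sketch: the class names, the toy square geometry, and the even-odd ray-casting test used to find the polygon under the pointer are illustrative choices, not taken from the applet's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Region:
    name: str
    outline: List[Point]                      # polygon vertices
    subdivision: Optional["MapLevel"] = None  # e.g. counties inside a state


@dataclass
class MapLevel:
    regions: List[Region] = field(default_factory=list)


def contains(poly: List[Point], p: Point) -> bool:
    """Even-odd ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class MapNavigator:
    def __init__(self, top: MapLevel):
        self.stack = [top]  # levels zoomed through; the last one is displayed

    @property
    def current(self) -> MapLevel:
        return self.stack[-1]

    def region_at(self, p: Point) -> Optional[Region]:
        """Polygon under the pointer, e.g. to show its name on the map."""
        return next((r for r in self.current.regions if contains(r.outline, p)), None)

    def zoom_in(self, p: Point) -> bool:
        region = self.region_at(p)
        if region is None or region.subdivision is None:
            return False
        self.stack.append(region.subdivision)
        return True

    def zoom_out(self) -> bool:
        if len(self.stack) == 1:
            return False  # already at the top-level map
        self.stack.pop()
        return True
```

Alternate subdivisions such as watersheds would only require attaching a different `MapLevel` to a region; the navigation logic stays the same.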

The region of interest can be specified in three ways. First, a textual description (e.g., a county name) can be entered as plain text in the "Geographic Area Covered" field. Second, a particular polygon representing a state, county, or other subdivision (depending on the current map) can be selected by pointing to it and clicking with the mouse. The name associated with the polygon under the pointer is displayed in the upper corner of the map and is updated as the mouse moves from polygon to polygon. Once a polygon is selected, its outline is highlighted and its name is inserted into the "Geographic Area Covered" field. Finally, a region of interest can be drawn interactively by activating the "select region" button (square symbol). Once a region is specified, the bounding box of the region of interest is computed, and the database is searched for all records whose "Bounding Box" area intersects this region of interest.
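The bounding-box search reduces to an axis-aligned overlap test on west, east, south, and north coordinates. The sketch below assumes longitude/latitude boxes that do not cross the 180th meridian; the `BBox` type and helper names are hypothetical, and the first sample record reuses the bounding coordinates from Table 1.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class BBox:
    west: float
    east: float
    south: float
    north: float


def bbox_of(points: Iterable[Tuple[float, float]]) -> BBox:
    """Bounding box of a drawn region given as (longitude, latitude) points."""
    pts = list(points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return BBox(min(xs), max(xs), min(ys), max(ys))


def intersects(a: BBox, b: BBox) -> bool:
    """Boxes overlap unless one lies entirely to one side of the other.

    Assumes neither box crosses the 180th meridian; touching edges count.
    """
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def matching_records(records: List[Tuple[str, BBox]], region_points) -> List[str]:
    roi = bbox_of(region_points)
    return [name for name, box in records if intersects(box, roi)]
```

Comparing only bounding boxes is cheap but approximate: a record can match even if its actual polygons do not touch the drawn region.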

The database is implemented using Postgres95 [4], an object-relational DBMS (ORDBMS) derived from the Berkeley Postgres database management system. It provides rich data types and easy extensibility. Postgres95 includes a backend SQL server that communicates with client applications through APIs. The search tool is a Java applet that runs on the client machine and communicates with the database server via Postgres95's Java API. When the search tool starts, it contacts the server to retrieve the metadata descriptor information and the initial geographic subdivision. Once a query is formulated, it is passed to the SQL server for processing. The resulting records are displayed in a table. If the metadata contains a URL for the actual data, the data can be downloaded directly. A demo of the search engine can be accessed at http://www.cs.umbc.edu/~zhan/demo.
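One way the applet's form selections might be turned into a query for the SQL server is a parameterized `WHERE` clause built only from the fields the user filled in. The paper does not show its SQL, so the `metadata` column names and the `build_query` helper are hypothetical; the region condition is the bounding-box overlap test expressed in SQL.

```python
from typing import List, Optional, Sequence, Tuple


def build_query(
    keywords: Optional[str] = None,
    type_of_data: Optional[str] = None,
    themes: Sequence[str] = (),
    year_range: Optional[Tuple[int, int]] = None,
    bbox: Optional[Tuple[float, float, float, float]] = None,  # (west, east, south, north)
) -> Tuple[str, List[object]]:
    """Return SQL text plus parameters; empty form fields add no condition."""
    clauses: List[str] = []
    params: List[object] = []
    if keywords:
        clauses.append("key_words LIKE ?")
        params.append(f"%{keywords}%")
    if type_of_data:  # single-value menu
        clauses.append("type_of_data = ?")
        params.append(type_of_data)
    if themes:  # multiple-value menu: any selected theme matches
        clauses.append("theme IN (" + ", ".join(["?"] * len(themes)) + ")")
        params.extend(themes)
    if year_range:
        clauses.append("source_year BETWEEN ? AND ?")
        params.extend(year_range)
    if bbox:  # record box overlaps the region of interest
        west, east, south, north = bbox
        clauses.append("west_bc <= ? AND east_bc >= ? AND south_bc <= ? AND north_bc >= ?")
        params.extend([east, west, north, south])
    sql = "SELECT * FROM metadata"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Passing values as parameters rather than splicing them into the SQL string keeps free-text input from altering the query.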

Figure 3. Metadata Search Tool.

5.0 Concluding Remarks

We have found through our pilot projects with federal, state, local, and regional data developers that even when a subset of FGDC metadata fields (mandatory and user-defined fields) is used, some fields are consistently left blank (e.g., accuracy reports and data resolution descriptions). In addition to satisfying the technical requirements of developing the metadata input and search tools, we face two further challenges: 1) reaching and satisfying an extended user community (especially local governments, which generate the majority of large-scale data sets) and 2) integrating this approach with the rigid federal standard.

6.0 References

[1] Executive Order 12906, published in the April 13, 1994, edition of the Federal Register, Volume 59, Number 71, pp. 17671-17674.

[2] Federal Geographic Data Committee. 1994. Content Standards for Digital Geospatial Metadata (June 8). Federal Geographic Data Committee, Washington, D.C.

[3] T. Foresman, H. Wiggins, and D. Porter. (1996). "Metadata Myth: Misunderstanding the Implications of Federal Metadata Standards". In Proceedings of the First IEEE Metadata Conference. Silver Spring, MD.

[4] Postgres95 home page, URL http://www.postgresql.org/.


ACKNOWLEDGEMENTS

This research is supported by the National Spatial Data Infrastructure under grant # 05-5-28166 and by USRA/CESDIS and NASA Goddard Space Flight Center.