Aya Soffer
Computer Science and Electrical Engineering Department
University of Maryland Baltimore County
1000 Hilltop Circle
Baltimore, MD, USA 21250
soffer@cs.umbc.edu
T.W. Foresman
Director, Spatial Analysis Lab - Department of Geography
University of Maryland Baltimore County
1000 Hilltop Circle
Baltimore, MD, USA 21250
foresman@umbc.edu
The University of Maryland at Baltimore County (UMBC) and the Baltimore-Washington Regional Collaboratory partners have developed a three-tier approach to documenting metadata for geospatial data. Level I contains common, lowest-common-denominator fields for data browsers, general users, and producers. Level II is an extension of Level I and contains more detailed metadata fields that are institution specific. Level III conforms to the original 1994 FGDC metadata standards. An overview of the three-tier approach is presented, including a detailed example of the Level I fields. UMBC's GEM! (General Entry Metadata), an Internet-accessible tool that provides a simple yet comprehensive interface for entering all Level I metadata fields, is described. The metadata database and an Internet-accessible search tool used to store and search Level I metadata are also described. The search engine is adaptable, so it can be easily modified to reflect changes in the metadata fields based on case study activities. Furthermore, the search engine is dynamic and reflects the current holdings of the repository.
The University of Maryland at Baltimore County (UMBC) and the Baltimore-Washington Regional Collaboratory (Collaboratory) partners, under a National Spatial Data Infrastructure (NSDI) grant, have developed a three-tier approach to documenting metadata for geospatial data. As part of our investigation, we are questioning the practicality of the currently published FGDC standards for the vast majority of the regional spatial data user community [1,2]. The multi-entity Collaboratory represents multiple governments, private businesses, and citizens using multiple data types, sources, and platforms, which makes the task of documenting our data sets onerous. We are therefore driven to develop an improved metadata standard, along with an efficient methodology for bringing this standard to members outside the typical NSDI community. We present our three-tier approach, demonstrate our Internet-accessible Level I metadata input and search tools, and report on the use of these tools.
Levels I-III represent subsets of the Federal Geographic Data Committee's (FGDC) metadata standards. Level I is built upon the research question: which metadata fields approximate 100% documentation? Accordingly, Level I contains common, lowest-common-denominator fields for data browsers and general users. Level II is an extension of Level I and contains more detailed metadata fields that are institution specific. Level III conforms to the original FGDC metadata standards. We are gathering baseline data on the use of Levels I-III metadata from prototype implementations with our user community.
Our research examines which FGDC fields comprise Level I metadata for federal, state, and local spatial data users. A Level I metadata input tool has been developed and is being tested by users over the Internet (http://www.umbc.edu/bwrdc). We are in the process of interviewing federal, state, and local GIS users about their experience with this tool and identifying case study participants for a more in-depth study of it. We are also continuing development of a metadata search tool that is tightly coupled with our three-tier approach. This search tool enables users to search for data based on those metadata fields that they have identified as useful search criteria. The search engine is adaptable, so it can be easily modified to reflect changes in the metadata that may be mandated based on our case studies. Furthermore, the search engine is dynamic and reflects the current holdings of the repository.
The three levels of metadata were designed to be more intuitive and user-friendly than existing FGDC standards and to reflect the different requirements associated with varying levels of technical expertise. Each level builds on the previous one and contains more detailed information. UMBC's approach takes iterative steps toward documentation:
METADATA FIELD | DESCRIPTION | EXAMPLE
Organization Username | Organization entering metadata | UMBC
File Name/Coverage | File/coverage name | urb1792a.zip
Data Structure | Political or natural geographic division by which data is divided (county, watershed, quadrangle) (free text) | None
Status of Data Set | Level of completion of data set ("Complete", "In Progress", "Planned"), updating schedule, and last updated | Complete
Data Layer Theme | Theme that describes the data layer | Land Cover
Description of Data Set | Short description of the data set | GIS coverage with polygons depicting built-up/urban areas in 1792
Project | Project/research grant for which data set was created (free text) | Baltimore-Washington Regional Collaboratory
Key Words | Words summarizing the content of the data set (free text) | Urban, Built-up, Historical, Temporal
Geographic Area Covered & Bounding Coordinates | The political or physical geographic boundary of the data set by state, county & other geographic area (if applicable) | Baltimore-Washington Region
Coordinate System, Units and Datum | Grid (and zone) or geographic coordinate system and units of measure (free text) | UTM, meters, NAD83
Type of Data | (vector/raster/point/database/text/map) | vector
Format of Data | Data format | ArcInfo export file (.e00)
Software & Version | Software and version | ArcInfo 7.0.3
File Size | Kilobytes/Megabytes/Gigabytes, uncompressed file size (free text) | 2 Megs
Originating Organization | Agency/organization that created the data set (free text) | University of Maryland Baltimore County
Type of Source Material | Map/satellite/aerial photos/existing digital data/field collection (domain different than FGDC) | Historical maps
Scale of Source Material | Map scale or pixel resolution | 1:24,000 to 1:1000,000
Date(s) of Source Material | dd/mm/yyyy format | 01/01/1790 - 01/01/1804
Method of Data Collection | Phrase which describes how data for the database was created (free text) | Delineated built-up areas from historical maps onto mylar overlays
Method of Digital Production | Phrase which describes how data was transferred to a digital form (free text) | Digitized polygons into Arc/Info coverage
Accuracy Resolution | Describe any ground-truthing tests used, or pixel resolution of raster data (free text) | Not yet field tested. Planned summer 1997.
Attribute Description | Short phrase describing any attributes in database (free text) | Polygons attributed with Land Use/Land Cover codes
Distribution Information | Cost of data, ordering instructions | Free, available over the Internet
Available Media | On-line or hard copy formats (if data is on Internet, please provide address) | On-line at http://www.umbc.edu/bwrdc
Data Distribution Contact | Contact name and email address | Chris Steele, csteele@strabo.umbc.edu
Access and Use Constraints | Restrictions for accessing and using the data set | No constraints. The authors of this data set would appreciate acknowledgment in products derived from these data.
Metadata Contact Information | Organization, name, address, phone number, electronic mail address, date for metadata contact | Kristine Kuhlman, UMBC - Spatial Analysis Laboratory, 1000 Hilltop Circle, (410) 455-3846, kuhlman@umbc.edu, 19970625
Level II Metadata | Status of Level II Metadata | Complete
Level III Metadata | Status of Level III Metadata | Complete
To provide some security for the database holdings, users must supply a username and password to access GEM!. Users can then enter their metadata through a combination of pull-down menus and free text entry fields. Figure 1 shows the first screen of the input tool. Once the metadata fields are entered and submitted, another screen appears so the user can review the input (Figure 2). If changes are needed, the user simply returns to the previous screen and makes them. Once the user is satisfied with the input, an output format is selected. The results may be emailed or downloaded in "text" or "block" format. Text format is a pipe-delimited format in which the metadata field name is separated from the input, to facilitate spreadsheet and database import. Block format follows the FGDC numbering scheme, using the section number and indentation of the FGDC field corresponding to each GEM! field. We are developing a third output format that includes a framework for all FGDC fields so the user may fill out more fields if necessary. In addition to being returned to the user, the metadata is immediately entered into the metadata database so that it is available for search and retrieval.
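As a rough illustration of the "text" output format, the sketch below writes each field name and its value separated by a pipe. The exact layout that GEM! emits is not specified here, so the one-field-per-line arrangement, the handling of pipes inside values, and the function name `to_text_format` are all assumptions; the sample values are taken from the Level I table.

```python
def to_text_format(record):
    """Render a metadata record as pipe-delimited lines: field name|value.

    Assumed layout (not confirmed by the paper): one field per line.
    """
    lines = []
    for field, value in record.items():
        # A literal pipe inside a value would break a later import,
        # so this sketch replaces it with a slash (an assumption).
        safe_value = str(value).replace("|", "/")
        lines.append(f"{field}|{safe_value}")
    return "\n".join(lines)


record = {
    "Organization Username": "UMBC",
    "File Name/Coverage": "urb1792a.zip",
    "Data Layer Theme": "Land Cover",
    "Type of Data": "vector",
}
text_output = to_text_format(record)
print(text_output)
```

A file in this shape can be read back with any delimiter-aware reader, which is the property that makes spreadsheet and database import straightforward.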
UMBC's GEM! offers a realistic alternative metadata standard that facilitates documentation for a large, extended GIS community. Agencies that have not historically documented their data are now using GEM!. The challenge is to couple state-wide technical requirements with institutional requirements. We hope to do this through our involvement with the MSGIC (Maryland State Government Geographic Information Coordinating Council) Standards Sub-Committee.
One of the interesting features of the metadata database and the search tool is that they are self-describing. That is, all of the information regarding the metadata scheme is stored in the database itself, in a metadata descriptor table. This information is used to dynamically create the user interface based on the metadata scheme. As a result, changing the metadata scheme does not require any changes in the search engine code, and the database and search engine can easily adapt to changes in the metadata fields that may be mandated based on our case studies. In addition, we could feasibly support several databases using different schemes based on user needs. The metadata descriptor table includes the names of the fields, their types, and a flag that indicates whether they are searchable fields. This descriptor is itself stored in a relational table. Currently we have decided a priori which subset of the Level I metadata fields are useful search criteria and have set them as searchable fields. In future implementations, users will be able to select the searchable fields; their choices will be recorded in the metadata descriptor table and automatically incorporated into the search tool.
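The self-describing idea can be sketched as follows. This is a minimal illustration, not the actual schema: it uses Python's built-in `sqlite3` module as a stand-in for Postgres95, and the table name `metadata_descriptor`, its column names, and the choice of which fields are flagged searchable are assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres95 backend.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata_descriptor ("
    "field_name TEXT, field_type TEXT, searchable INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata_descriptor VALUES (?, ?, ?)",
    [
        ("Key Words", "text", 1),
        ("Type of Data", "single-value menu", 1),
        ("Data Layer Theme", "multiple-value menu", 1),
        ("File Size", "text", 0),  # stored, but not offered in the search form
    ],
)


def searchable_fields(connection):
    """Return (name, type) pairs for the fields the search form should show."""
    rows = connection.execute(
        "SELECT field_name, field_type FROM metadata_descriptor "
        "WHERE searchable = 1 ORDER BY rowid"
    )
    return rows.fetchall()


form_fields = searchable_fields(conn)
for name, field_type in form_fields:
    print(f"{name}: {field_type}")
```

Because the form is derived from the table contents, flipping the `searchable` flag on a row changes the interface without touching the code that builds it.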
The type attribute in the metadata descriptor table determines which search interface element is used for the corresponding metadata field. Currently we have 6 types of fields: text, number, date, region, single-value menu, and multiple-value menu. Text fields (e.g., keywords) are specified using a simple text entry box. Number fields are also specified using a text entry box; proper formulation of the number is verified before the query is submitted to the database. Date fields are specified using a date "format". The year is selected by means of a pull-down menu, and only those years for which there is actual data in the database appear as choices in this menu. Single-value menus are specified using a pull-down menu. The choices in this menu are generated dynamically based on the values in the database. For example, the "Type of Data" field has 5 possible choices in the input tool: vector, raster, point, database, text. If the database only contains records that have been designated as either raster or vector, and "Type of Data" is designated as a searchable field, then it will appear as a field in the search interface and only the choices raster and vector will be listed in the pull-down menu. Fields of type multiple-value menu (e.g., theme) are specified using a list, as more than one selection is allowed in that case. Again, the list of choices contains only those values that actually exist in the database, not all of the choices available in the input tool. The region field is searchable via the map. Only one region field can be specified as a searchable field. This feature, whereby the menus are created on the fly from the actual contents of the database, is very powerful. As a result, the search tool also functions as a browser: it helps users assess the current holdings of the database and formulate queries that are likely to return a small number of relevant hits.
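The dynamic menu behavior described above can be sketched in a few lines. This is an illustration under stated assumptions: records are plain dictionaries, multiple-value fields are stored as lists, and the theme values other than "Land Cover" are hypothetical.

```python
def menu_choices(records, field):
    """Return the distinct values of `field` that occur in the stored records."""
    values = set()
    for record in records:
        value = record.get(field)
        if value is None:
            continue
        # Multiple-value fields are assumed to be stored as lists.
        if isinstance(value, (list, tuple)):
            values.update(value)
        else:
            values.add(value)
    return sorted(values)


# Hypothetical repository holdings: only vector and raster data are present.
records = [
    {"Type of Data": "vector", "Data Layer Theme": ["Land Cover"]},
    {"Type of Data": "raster", "Data Layer Theme": ["Land Cover", "Hydrology"]},
    {"Type of Data": "vector", "Data Layer Theme": ["Transportation"]},
]

type_menu = menu_choices(records, "Type of Data")
theme_menu = menu_choices(records, "Data Layer Theme")
print(type_menu)
print(theme_menu)
```

Here the "Type of Data" menu offers only the two values present in the holdings, even though the input tool allows more, which is what lets the search form double as a browser of the repository.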
Figure 3 shows the Web-based interface of the search engine. The user is searching for free on-line Land Use/Land Cover data for Baltimore or Howard County in Maryland dated from 1/1/1988 to 12/31/1995. As mentioned above, the search tool interface is dynamically generated from the metadata descriptor. Those fields that have been designated as searchable appear in the search form. The interface for each field is determined by the type of the field. In addition to these search fields, the search engine displays a map that can be used to specify the region of interest for the query. This map is also constructed dynamically from the database. The map is in essence a collection of polygons. Each polygon has a name associated with it. Additionally, each polygon may have a subdivision associated with it. The subdivision is simply another map (i.e., a collection of polygons). If the user activates the "zoom in" button (magnifying glass with a plus inside) and selects one of the polygons, then the next level of subdivision is displayed. Activating the "zoom out" button and clicking anywhere on the map goes back one level. Currently the top-level map is the United States, where each polygon represents a state. The next level of subdivision is a county map. Thus, activating zoom in and then selecting Maryland, for example, changes the map to a county map of Maryland. We could have alternate subdivisions for each state, such as census tracts or watersheds. Furthermore, additional levels of subdivision could also be defined, such as subdividing a county by zip codes.
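The nested map structure and the zoom behavior can be modeled roughly as shown below. Polygon geometry is omitted, and the class names `Polygon`, `RegionMap`, and `MapNavigator` are hypothetical; the sketch only shows how a polygon can point to a further subdivision and how zooming walks that hierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Polygon:
    name: str
    # A polygon may carry a subdivision, which is simply another map.
    subdivision: Optional["RegionMap"] = None


@dataclass
class RegionMap:
    name: str
    polygons: Dict[str, Polygon] = field(default_factory=dict)


class MapNavigator:
    """Tracks the currently displayed map as the user zooms in and out."""

    def __init__(self, top):
        self.stack = [top]

    @property
    def current(self):
        return self.stack[-1]

    def zoom_in(self, polygon_name):
        polygon = self.current.polygons[polygon_name]
        # Without a subdivision, the displayed map stays the same.
        if polygon.subdivision is not None:
            self.stack.append(polygon.subdivision)
        return self.current

    def zoom_out(self):
        # The top-level map has no parent to return to.
        if len(self.stack) > 1:
            self.stack.pop()
        return self.current


maryland = RegionMap(
    "Maryland counties",
    {"Baltimore": Polygon("Baltimore"), "Howard": Polygon("Howard")},
)
usa = RegionMap(
    "United States",
    {"Maryland": Polygon("Maryland", maryland), "Virginia": Polygon("Virginia")},
)

nav = MapNavigator(usa)
print(nav.zoom_in("Maryland").name)
print(nav.zoom_out().name)
```

Alternate subdivisions (census tracts, watersheds) or deeper levels (zip codes within a county) would fit the same structure by attaching a different `RegionMap` to a polygon.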
The region of interest can be specified in three ways. First, a textual description (e.g., a county name) can be entered in the "Geographic Area Covered" field as plain text. Second, a particular polygon that represents a state, county, or other subdivision (depending on the current map) can be selected by pointing to the polygon and clicking with the mouse. The name associated with the polygon under the pointer is displayed in the upper corner of the map and is updated as the mouse moves from polygon to polygon. Once a polygon is selected, its outline is highlighted and the associated name is inserted into the "Geographic Area Covered" field. Finally, a region of interest can be drawn interactively by activating the "select region" button (square symbol). Once a region is specified, the bounding box corresponding to the region of interest is computed, and the database is searched for all records whose "Bounding Box" field describes an area that intersects this region of interest.
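The bounding-box search can be sketched as a standard axis-aligned rectangle intersection test. The coordinates below are hypothetical longitude/latitude values chosen only for illustration, the second file name is invented, and the (west, south, east, north) tuple layout is an assumption about how the "Bounding Box" field might be stored.

```python
def bounding_box(points):
    """Return (west, south, east, north) enclosing a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    """True if two (west, south, east, north) boxes overlap or touch."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (
        a_west <= b_east
        and b_west <= a_east
        and a_south <= b_north
        and b_south <= a_north
    )


def search_by_region(records, region_points):
    """Return names of records whose bounding box intersects the drawn region."""
    region_box = bounding_box(region_points)
    return [r["name"] for r in records if boxes_intersect(r["bbox"], region_box)]


# Hypothetical holdings with made-up bounding coordinates.
records = [
    {"name": "urb1792a.zip", "bbox": (-77.5, 38.5, -76.0, 39.7)},
    {"name": "elsewhere.e00", "bbox": (-80.0, 36.0, -79.0, 37.0)},
]
# Two opposite corners of a region drawn by the user.
drawn_region = [(-76.9, 39.1), (-76.5, 39.4)]
hits = search_by_region(records, drawn_region)
print(hits)
```

Note that this test treats boxes that merely share an edge as intersecting; whether the actual search tool does so is not stated.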
The database is implemented using Postgres95 [4]. Postgres95 is an object-relational DBMS (ORDBMS) derived from the Berkeley Postgres database management system. It provides rich data types and easy extensibility. The Postgres95 database includes a backend SQL server that communicates with client applications via APIs. The search tool is a Java applet that runs on the client machine and communicates with the backend server via Postgres95's Java API. When the search tool is invoked, it communicates with the server to retrieve the metadata descriptor information and the initial geographic subdivision. Once a query is formulated, it is passed to the SQL server for processing. The resulting records are displayed in table format. If the metadata contains a URL for the actual data, the data can be downloaded directly. A demo of the search engine can be accessed at http://www.cs.umbc.edu/~zhan/demo.
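To make the query step concrete, the sketch below turns a filled-in search form into a parameterized SQL statement. It is only an illustration: Python's `sqlite3` module stands in for the Java applet and the Postgres95 server, the table `level1_metadata` and its columns are hypothetical, and all records other than `urb1792a.zip` are invented sample data. The criteria loosely mirror the Figure 3 search for Land Cover data in Baltimore or Howard County.

```python
import sqlite3


def build_query(criteria):
    """Build a parameterized SELECT from {column: value-or-list} criteria.

    Column names are assumed to come from the trusted descriptor table;
    user-supplied values are always passed as parameters.
    """
    clauses = []
    params = []
    for column, value in criteria.items():
        if isinstance(value, (list, tuple)):
            # A list means "any of these values" (multiple selection).
            placeholders = ", ".join("?" for _ in value)
            clauses.append(f'"{column}" IN ({placeholders})')
            params.extend(value)
        else:
            clauses.append(f'"{column}" = ?')
            params.append(value)
    sql = "SELECT file_name FROM level1_metadata"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE level1_metadata (file_name TEXT, theme TEXT, county TEXT)"
)
conn.executemany(
    "INSERT INTO level1_metadata VALUES (?, ?, ?)",
    [
        ("urb1792a.zip", "Land Cover", "Baltimore"),
        ("roads.e00", "Transportation", "Howard"),
        ("lulc_how.e00", "Land Cover", "Howard"),
    ],
)

sql, params = build_query(
    {"theme": "Land Cover", "county": ["Baltimore", "Howard"]}
)
results = [row[0] for row in conn.execute(sql + " ORDER BY file_name", params)]
print(sql)
print(results)
```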
We have found through our pilot projects with federal, state, local, and regional data developers that, even when using a subset of FGDC metadata fields (mandatory and user-defined fields), some fields are consistently left blank (e.g., accuracy reports and data resolution descriptions). In addition to satisfying the technical requirements associated with developing the metadata input and search tools, we are faced with two additional challenges: 1) reaching and satisfying an extended user community (especially local governments, who generate a majority of the large-scale data sets) and 2) integrating this approach with the rigid federal standard.
[1] Executive Order 12906. Federal Register, Volume 59, Number 71, pp. 17671-17674, April 13, 1994.
[2] Federal Geographic Data Committee. (1994). Content Standards for Digital Geospatial Metadata (June 8). Federal Geographic Data Committee, Washington, D.C.
[3] T. Foresman, H. Wiggins, and D. Porter. (1996). "Metadata Myth: Misunderstanding the Implications of Federal Metadata Standards". In Proceedings of the First IEEE Metadata Conference. Silver Spring, MD.
[4] Postgres95 home page, URL http://www.postgresql.org/.
This research is supported by the National Spatial Data Infrastructure under grant # 05-5-28166 and by USRA/CESDIS and NASA Goddard Space Flight Center.