Technology Seminar - the Future of the Web
Charles Nicholas, Associate Professor, Department of Computer Science and Electrical Engineering, UMBC
This talk was delivered at Northrop-Grumman on June 3, 1998.
If you like, you can look at this talk from my Web site, http://www.cs.umbc.edu/~nicholas.
Overview
- Introduction
- Browsers: MIME capability, plug-ins, executable content
- Near-term directions: the demise of URLs, search services, URNs
- Current and future applications: cooperative work, and
agent-based computation
Introduction
There's lots of good web-related material at the World-Wide Web Home.
The on-line documentation for Mosaic has a good
Introduction to HTML.
The growth of the net in general, and the Web in particular, is
extraordinary. So is the growth in public awareness of this topic.
Look at some
Internet connectivity maps.
We can also look at a chart showing the increasing number of
Internet hosts worldwide.
A lot of information is available in books for the general user.
More technical material is on the Net itself, in the form of RFCs, FYIs, Web
pages, etc. Internet Drafts are a useful way of learning about new
ideas under development. See, for example, the Internet Notes
Directory at ISI.
The variety of Internet tools is increasing all the time. Look at
December's list of
Network Information Retrieval Tools, or a local
copy I made earlier.
What does this all mean?
- Communications infrastructure is on a long-term boom: cable
modems, fiber optics, Digital Subscriber Loops, etc.
- New job descriptions: Webmeisters, HTML Artists -- and new
varieties of hackers!
- Our society is going to change in ways that we don't yet understand.
Browsers
Browsers are getting bigger and more powerful all the time. Some
view this as a good thing. Browsers are also becoming extensible - in
terms of data they can handle, and operations they can perform on that data.
The Web isn't just for HTML anymore! MIME (Multipurpose Internet
Mail Extensions) is defined in RFCs 1521 and
1522. (And there's also a
text version.) The set of MIME types seems to be expanding,
although slowly, e.g. PDF files, displayed with acroread.
A user's .mailcap file describes how files of each MIME type are
to be displayed on that user's workstation. Well-behaved Web browsers
pay attention to this file. NCSA has a
description of Mailcap files.
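To make the idea concrete, here is a minimal sketch of how a browser might consult a .mailcap file. It assumes the basic "type/subtype; command" line layout of RFC 1524, with %s standing for the file name, and it deliberately ignores line continuations, escaped semicolons, and optional flags. The function names and sample entries are hypothetical, not taken from any particular browser.

```python
def parse_mailcap(text):
    """Return (mime_pattern, view_command) pairs in file order.

    Simplified: no backslash continuations, no escaped ';', flags ignored.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split(";")]
        if len(fields) >= 2 and fields[1]:
            entries.append((fields[0].lower(), fields[1]))
    return entries


def find_viewer(entries, mime_type, filename):
    """First matching entry wins; 'image/*' matches any image subtype."""
    mime_type = mime_type.lower()
    wildcard = mime_type.split("/", 1)[0] + "/*"
    for pattern, command in entries:
        if pattern in (mime_type, wildcard):
            return command.replace("%s", filename)  # substitute the file name
    return None  # no viewer configured for this type


# Hypothetical entries for illustration only.
SAMPLE_MAILCAP = """
# PDF goes to acroread, any image type to xv
application/pdf; acroread %s
image/*; xv %s
"""

entries = parse_mailcap(SAMPLE_MAILCAP)
print(find_viewer(entries, "application/pdf", "talk.pdf"))  # acroread talk.pdf
```

Taking the first match in file order lets a user put specific entries (say, image/gif) ahead of a catch-all like image/*.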
Netscape
Navigator plug-ins take this notion of extensibility further.
Lots of plug-ins are available for Windows 95 and NT, many for Macs,
precious few for UNIX (why?). They're very easy to install (at least
on Macs) - just download the file into a plug-ins folder.
Executable Content
Java is generating a lot of excitement, to say the least. I read
recently that 20 Java books have been published, and another 70 are in
preparation. The Java language is pretty stable, and hardware
implementations of the basic language (i.e. the virtual machine) are
in the works. The API may still change over the next few months or
years.
Java is relatively easy to learn if you know C (and easier still
if you don't know C++ :-)
Netscape (among others) has plans to provide kits for building
Web applications. The underlying distributed object technology
(e.g. CORBA) is here or in the works. Take a look at the Java Beans
white paper.
Java security issues remain a hot topic. See, for example, the
Java Security page at Princeton. The high points:
- Java isn't bullet-proof
- Denial of service attacks are easy to do
- Covert channels exist, so rogue applets can communicate with anybody
- In the absence of a formal security policy, it will be difficult
to verify the security of any implementation (of Java or anything else)
Of course, security in general is a hot topic. Take a look at
the Virus
Encyclopedia.
Lots of organizations use firewalls. Remember that they reduce
the risk, but they don't eliminate it. Take a look at this
information on Firewalls.
The Demise of URLs
As we all know, URLs are long, clumsy, frequently broken, and
hard to remember. A URN is a symbolic name that gets resolved into a
specific URL. URN-to-URL servers perform this conversion. For much
more on this subject, look at
Parnes' M.S. Thesis.
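The point of the indirection is that the name stays fixed while the locations behind it change. The toy resolver below is a sketch under stated assumptions: it uses the "urn:<NID>:<NSS>" syntax (with a case-insensitive "urn" prefix and namespace ID), and the class, method names, URN, and URLs are all hypothetical. It does not model any particular URN server protocol.

```python
def parse_urn(urn):
    """Split 'urn:<NID>:<NSS>'; the 'urn' prefix and the NID are case-insensitive."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {urn!r}")
    return parts[1].lower(), parts[2]


class URNResolver:
    """Toy URN-to-URL server: one stable name, one or more changing locations."""

    def __init__(self):
        self._locations = {}  # (nid, nss) -> list of URLs

    def register(self, urn, url):
        self._locations.setdefault(parse_urn(urn), []).append(url)

    def move(self, urn, old_url, new_url):
        # The document moved; the URN that people cite does not change.
        urls = self._locations[parse_urn(urn)]
        urls[urls.index(old_url)] = new_url

    def resolve(self, urn):
        return list(self._locations.get(parse_urn(urn), []))


# Hypothetical name and locations for illustration.
r = URNResolver()
r.register("urn:example:future-of-the-web", "http://old.example.org/talk.html")
r.move("urn:example:future-of-the-web",
       "http://old.example.org/talk.html", "http://new.example.org/talk.html")
print(r.resolve("URN:Example:future-of-the-web"))  # ['http://new.example.org/talk.html']
```

Returning a list rather than a single URL leaves room for mirrors: a client could try each location in turn.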
There's a whole alphabet soup to be invented! On a more serious
note, the UR*
Page is devoted to this topic.
- URA: Uniform Resource Agent
- URC: Uniform Resource Citation
- URI: Uniform Resource Identifier
- URL: Uniform Resource Locator (everybody uses URL, but URI is
more correct)
- URN: Uniform Resource Name
A number of URN servers have been proposed and implemented. URN
resolution will probably be combined with search services. There are security
concerns with any network entity, but protocols for keeping URNs safe
have been developed. Take a look at the paper by
Rowe and Nicholas from WWW4.
Search Services
Been around forever - at least two years.
Centralized search servers are very nice, pretty fast, and
inherently limited. Why?
- Robots need time to traverse the Net - on the order of weeks
- Space concerns at the search site
- Comes at a price, in terms of server load; but worth it.
The next generation of search tools will use ideas from the Harvest Information
Discovery and Access System. You can also look at a
Technical Discussion of the Harvest System.
What is interesting about Harvest?
- Gatherers may be run at the provider's site, unlike robots
- Brokers can collect information from many Gatherers, and there
are multiple Brokers - many-to-many mapping
- Brokers can share information
- Gatherers can recognize certain types of files
- The Glimpse IR engine makes small, fast indices
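The gatherer/broker split is easier to see in a toy model. The sketch below is not Harvest's code or its summary format. It only illustrates, under simplified assumptions, that gatherers summarize documents at the provider's own site, that one broker can pull from many gatherers (and one gatherer can feed many brokers), and that brokers can pass index information to each other. Class names, method names, and hostnames are hypothetical, and the naive keyword index stands in for Glimpse.

```python
class Gatherer:
    """Runs at the provider's site and summarizes that site's own documents."""

    def __init__(self, site, documents):
        self.site = site
        self.documents = documents  # path -> document text

    def summaries(self):
        # Ship a small keyword summary per document instead of the whole file.
        for path, text in self.documents.items():
            yield {"url": f"http://{self.site}{path}",
                   "keywords": set(text.lower().split())}


class Broker:
    """Collects summaries from any number of gatherers and answers queries."""

    def __init__(self):
        self.index = {}  # keyword -> set of URLs

    def collect(self, gatherer):
        for summary in gatherer.summaries():
            for word in summary["keywords"]:
                self.index.setdefault(word, set()).add(summary["url"])

    def share_with(self, other):
        # Brokers can pass their index information on to other brokers.
        for word, urls in self.index.items():
            other.index.setdefault(word, set()).update(urls)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


g1 = Gatherer("site-a.example", {"/java.html": "Java security notes"})
g2 = Gatherer("site-b.example", {"/beans.html": "Java Beans overview"})
b1, b2 = Broker(), Broker()
b1.collect(g1)
b1.collect(g2)  # one broker, many gatherers
b2.collect(g2)  # one gatherer, many brokers
b1.share_with(b2)
print(b2.search("security"))  # ['http://site-a.example/java.html']
```

Because the gatherer runs next to the data, the broker never has to crawl the site itself, which is the main contrast with the robots described earlier.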
There are some demo
brokers running at the University of Colorado. We are running a Harvest broker at UMBC.
Another step forward in search engines is the SIFT system, developed at Stanford
by Yan and Garcia-Molina. The commercial version of SIFT is now
available. You can read their paper
on SIFT.
What is interesting about SIFT?
- Monitors USENET news all the time
- Subscribers enter keywords to create one or more profiles
- The profiles are collected into an IR engine
- Each arriving news article is used as a query against the
collection of profiles - if there's a match, that article is queued
for e-mailing to that subscriber
- It's free, at least for now.
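The reversal in the steps above (articles become queries, profiles become the indexed collection) can be sketched as follows. This is a simplified illustration, not SIFT's actual implementation: it assumes each profile is a plain conjunction of keywords (every keyword must appear in the article) and uses a crude alphanumeric tokenizer, so a keyword containing punctuation would never match. The class and function names and the e-mail addresses are hypothetical.

```python
import re
from collections import defaultdict


class ProfileIndex:
    """Inverted index over subscriber profiles rather than over articles."""

    def __init__(self):
        self.by_word = defaultdict(set)  # keyword -> ids of profiles using it
        self.size = {}                   # profile id -> number of distinct keywords
        self.owner = {}                  # profile id -> subscriber address

    def add_profile(self, subscriber, keywords):
        pid = len(self.size)
        words = {w.lower() for w in keywords}
        for w in words:
            self.by_word[w].add(pid)
        self.size[pid] = len(words)
        self.owner[pid] = subscriber
        return pid

    def match(self, article):
        """Use the article as a query: count each profile's keywords it contains."""
        hits = defaultdict(int)
        for word in set(re.findall(r"[a-z0-9]+", article.lower())):
            for pid in self.by_word.get(word, ()):
                hits[pid] += 1
        # A conjunctive profile matches only if all of its keywords appeared.
        return sorted({self.owner[pid] for pid, n in hits.items() if n == self.size[pid]})


def route(index, article, outbox):
    """Queue the article for e-mailing to each matching subscriber."""
    for subscriber in index.match(article):
        outbox.setdefault(subscriber, []).append(article)


idx = ProfileIndex()
idx.add_profile("alice@example.com", ["Java", "security"])
idx.add_profile("bob@example.com", ["MUD"])
outbox = {}
route(idx, "New Java security hole reported in applet loader", outbox)
print(sorted(outbox))  # ['alice@example.com']
```

Indexing the profiles means each arriving article costs one pass over its own words, no matter how many subscribers there are.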
The Nature of Knowledge Work is Changing
Collaboration via the Net is a fact. Lots of utilities for this
exist, including MUDs, IRC, etc. See this demo of the Downtown Baltimore Visual MUD.
I refer you to the paper "MUDs Grow
Up" by Pavel Curtis and David Nichols, and to Evard's paper on Collaborative Networked
Communication. You might want to look at the HTML version of the
MUD FAQ.
You can try out the LambdaMOO MUD mentioned in Evard's paper.
That MUD happens to run off lambda.xerox.com port 8888.
Use telnet, or MUD.el from within Emacs.
The Web is changing the way college teaching is done. Take a
look at
some of UMBC's courses on the Web.
Agent-Based Computation
What's the big deal with agents?
- "Intelligent agent" has become a buzzword.
- An agent has properties of autonomy, accountability,
communication, and specialized knowledge.
- There's a lot going on in this area. See UMBC's Agents page.
Conclusions
The Web isn't finished yet!
The distinction between client and server will continue to blur.
Search services will get faster and smarter.
Agents will play a larger role - but what role? Who knows.
This page is the responsibility of Charles
Nicholas.