Technology Seminar - the Future of the Web

Charles Nicholas, Associate Professor, Department of Computer Science and Electrical Engineering, UMBC

This talk was delivered at Northrop-Grumman on June 3, 1998.

If you like, you can look at this talk from my Web site, http://www.cs.umbc.edu/~nicholas.

Overview


Introduction

  • There's lots of good web-related material at the World-Wide Web Home.
  • The on-line documentation for Mosaic has a good Introduction to HTML.
  • The growth of the net in general, and the Web in particular, is extraordinary. So is the growth in public awareness of this topic. Look at some Internet connectivity maps. [Figure: world map showing net connectivity]
  • We can also look at a chart showing the increasing number of Internet hosts worldwide. [Figure: chart showing growth in sites]
  • A lot of information is available in books for the general user. More technical material is on the net itself, as RFCs, FYIs, Web pages, etc. Internet drafts are a useful way of learning about new ideas in development. See the Internet Notes Directory at ISI for example.
  • The variety of Internet tools is increasing all the time. Look at December's list of Network Information Retrieval Tools, or a local copy I made earlier.
  • What does this all mean?
    1. Communications infrastructure is on a long-term boom: cable modems, fiber optics, Digital Subscriber Loops, etc.
    2. New job descriptions: Webmeisters, HTML Artists -- and new varieties of hackers!
    3. Our society is going to change in ways that we don't yet understand.

    Browsers

  • Browsers are getting bigger and more powerful all the time. Some view this as a good thing. Browsers are also becoming extensible - in terms of data they can handle, and operations they can perform on that data.
  • The Web isn't just for HTML anymore! MIME (Multipurpose Internet Mail Extensions) is defined in RFCs 1521 and 1522. (And there's also a text version.) The set of MIME types seems to be expanding, although slowly, e.g. PDF files and acroread.
  • A user's .mailcap file describes how different files are to be displayed on that user's workstation. Well-behaved Web browsers pay attention to this file. NCSA has a description of Mailcap files.
  • Netscape Navigator plug-ins take this notion of extensibility further. Lots of plug-ins are available for Windows 95 and NT, many for Macs, precious few for UNIX (why?). They're very easy to install (at least on Macs) - just download the file into a plug-ins folder.
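
As a rough illustration of the mailcap idea above, the sketch below parses a few mailcap-style entries and picks a viewer command for a given MIME type. The sample entries, the viewer commands, and the function names are assumptions made for illustration; real mailcap files support more fields than this handles.

```python
# Minimal mailcap-style lookup. The sample entries and helper names are
# hypothetical; real mailcap files allow extra fields that are ignored here.

SAMPLE_MAILCAP = """\
# type; command
application/pdf; acroread %s
image/*; xv %s
text/html; netscape %s
"""

def parse_mailcap(text):
    """Return a list of (mime_type, command) pairs, in file order."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split(";")]
        if len(fields) >= 2:
            entries.append((fields[0].lower(), fields[1]))
    return entries

def find_viewer(entries, mime_type, filename):
    """Return the command for the first matching entry, or None."""
    mime_type = mime_type.lower()
    major = mime_type.split("/")[0]
    for pattern, command in entries:
        # Match either the exact type or a "major/*" wildcard.
        if pattern == mime_type or pattern == major + "/*":
            return command.replace("%s", filename)
    return None

entries = parse_mailcap(SAMPLE_MAILCAP)
print(find_viewer(entries, "application/pdf", "talk.pdf"))
print(find_viewer(entries, "image/gif", "map.gif"))
```

A browser that honors mailcap does essentially this lookup when it receives a document whose MIME type it cannot display itself.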

    Executable Content

  • Java is generating a lot of excitement, to say the least. I read recently that 20 Java books have been published, and another 70 are in preparation. The Java language is pretty stable, and hardware implementations of the basic language (i.e. the virtual machine) are in the works. The API may still change over the next few months or years.
  • Java is relatively easy to learn if you know C (and easier still if you don't know C++ :-)
  • Netscape (among others) has plans to provide kits for building Web applications. The underlying distributed object technology (e.g. CORBA) is here or in the works. Take a look at the Java Beans white paper.
  • Java security issues remain a hot topic. See, for example, the Java Security page at Princeton. The high points:
    1. Java isn't bullet-proof
    2. Denial of service attacks are easy to do
    3. Covert channels exist, so rogue applets can communicate with anybody
    4. In the absence of a formal security policy, it will be difficult to verify the security of any implementation (of Java or anything else)
  • Of course, security in general is a hot topic. Take a look at the Virus Encyclopedia.
  • Lots of organizations use firewalls. Remember that they reduce the risk, but they don't eliminate it. Take a look at this information on Firewalls.

    The Demise of URLs

  • As we all know, URLs are long, clumsy, frequently broken, and hard to remember. A URN is a symbolic name that gets resolved into a specific URL. URN to URL servers perform this conversion. For much more on this subject, look at Parnes' M.S. Thesis.
  • There's a whole alphabet soup to be invented! On a more serious note, the UR* Page is devoted to this topic.
    URA
    Uniform Resource Agent
    URC
    Uniform Resource Citation
    URI
    Uniform Resource Identifier
    URL
    Uniform Resource Locator (everybody uses URL, but URI is more correct)
    URN
    Uniform Resource Name
  • A number of URN servers have been proposed and implemented. This will probably be combined with search services. There are security concerns with any network entity, but protocols for keeping URNs safe have been developed. Take a look at Rowe and Nicholas from WWW4.
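
The URN-to-URL resolution described above can be sketched as a simple lookup table. This is only an illustration: the URN, the URLs, and the `UrnResolver` class are hypothetical, and the network protocols used by real resolution proposals are not modeled here.

```python
# Toy URN-to-URL resolver. All names and addresses below are made up
# for illustration; no real resolution protocol is implemented.

class UrnResolver:
    def __init__(self):
        self._table = {}  # URN (lowercased) -> list of URLs

    def register(self, urn, url):
        """Record one location (URL) for a name (URN)."""
        # Lowercasing the whole URN is a simplification for this sketch.
        self._table.setdefault(urn.lower(), []).append(url)

    def resolve(self, urn):
        """Return the known URLs for a URN, or an empty list."""
        return list(self._table.get(urn.lower(), []))

resolver = UrnResolver()
resolver.register("urn:example:talks:future-of-the-web",
                  "http://www.example.edu/talks/web.html")
resolver.register("urn:example:talks:future-of-the-web",
                  "http://mirror.example.org/talks/web.html")

print(resolver.resolve("URN:example:talks:future-of-the-web"))
```

The point of the indirection is that when a document moves, only the table entry changes; the name that people cite stays the same.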

    Search Services

  • Been around forever - at least two years.
  • Centralized search servers are very nice, pretty fast, and inherently limited. Why?
    1. Robots need time to traverse the Net - on the order of weeks
    2. Space concerns at the search site
    3. Comes at a price, in terms of server load; but worth it.
  • The next generation of search tools will use ideas from the Harvest Information Discovery and Access System. You can also look at a Technical Discussion of the Harvest System.
  • What is interesting about Harvest?
    1. Gatherers may be run at the provider's site, unlike robots
    2. Brokers can collect information from many Gatherers, and there are multiple Brokers - many-to-many mapping
    3. Brokers can share information
    4. Gatherers can recognize certain types of files
    5. The Glimpse IR engine makes small, fast indices
  • There are some demo brokers running at Colorado. We are running a Harvest broker at UMBC.
  • Another step forward in search engines is represented by the SIFT system developed at Stanford by Yan and Garcia-Molina. The commercial version of SIFT is now available. You can read the SIFT paper by Yan and Garcia-Molina.
  • What is interesting about SIFT?
    1. Monitors USENET news all the time
    2. Subscribers enter keywords to create one or more profiles
    3. The profiles are collected into an IR engine
    4. Each arriving news article is used as a query against the collection of profiles - if there's a match, that article is queued for e-mailing to that subscriber
    5. It's free! at least for now.
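
The SIFT matching step described above, where each arriving article acts as a query against the stored profiles, can be sketched roughly as follows. This is a deliberately simplified keyword match, not the actual SIFT algorithm; the subscriber addresses, profiles, and function names are assumptions.

```python
import re

# Hypothetical data: subscriber -> list of profiles. Each profile is a set
# of keywords that must all appear in an article for it to match.
PROFILES = {
    "alice@example.edu": [{"java", "security"}],
    "bob@example.edu": [{"harvest"}, {"urn", "resolver"}],
}

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_article(article_text, profiles):
    """Return, sorted, the subscribers with at least one matching profile."""
    words = tokenize(article_text)
    matched = []
    for subscriber, keyword_sets in profiles.items():
        # A profile matches when all of its keywords occur in the article.
        if any(keywords <= words for keywords in keyword_sets):
            matched.append(subscriber)
    return sorted(matched)

article = "New report on Java applet security and covert channels"
for subscriber in match_article(article, PROFILES):
    print("queue for e-mail:", subscriber)
```

Note the inversion relative to an ordinary search engine: the profiles are the stored collection, and the documents are the queries.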

    The Nature of Knowledge Work is Changing

  • Collaboration via the Net is a fact. Lots of utilities for this exist, including MUDs, IRCs, etc. See this demo of the Downtown Baltimore Visual MUD.
  • I refer you to a paper called MUDs Grow Up by Pavel Curtis and David Nichols; and Evard's paper on Collaborative Networked Communication. You might want to look at the HTML version of the MUD FAQ.
  • You can try out the LambdaMOO MUD mentioned in Evard's paper. That MUD happens to run off lambda.xerox.com port 8888. Use telnet or MUD.el from within emacs.
  • The Web is changing the way college teaching is done. Take a look at some of UMBC's courses on the Web.

    Agent-Based Computation

  • What's the big deal with agents?
    1. "Intelligent agents" has become a buzzword.
    2. An agent has properties of autonomy, accountability, communication, and specialized knowledge.
    3. There's a lot going on in this area. See UMBC's Agents page.

    Conclusions

  • The Web isn't finished yet!
  • The distinction between client and server will continue to blur
  • Search services will get faster, and smarter
  • Agents will play a larger role - but what role? Who knows.
    This page is the responsibility of Charles Nicholas.