One possible approach is to personalize the web space: create a system that responds to user queries by potentially aggregating information from several sources in a manner that depends on who the user is. As a trivial example, a European querying on casinos is probably better served by URLs pointing to Monaco, whereas someone in North America should get URLs pointing to Las Vegas. A biologist querying on cricket in all likelihood wants something different from what a sports enthusiast wants.
Existing commercial systems do some minimal personalization based on declarative information directly provided by the user, such as a zip code, keywords describing their interests, specific URLs, or even particular pieces of information they are interested in (e.g., the price of a particular stock). Our research aims at creating systems that (semi-)automatically tailor the content delivered to the user from a web site. We do so by mining the web: both its contents and the users' interactions with it.
Web mining, when viewed in data mining
terms, can be said to have three operations of interest: clustering (finding
natural groupings of users, pages, etc.), associations (which URLs tend
to be requested together), and sequential analysis (the order in which
URLs tend to be accessed). As in most real-world problems,
the clusters and associations in Web mining do not have crisp boundaries
and often overlap considerably.
In addition, bad exemplars (outliers) and incomplete data can easily occur
in the data set, due to a wide variety of reasons inherent to web browsing
and logging. Thus, Web mining and personalization require modeling of
an unknown number of overlapping sets in the presence of significant noise
and outliers. Moreover, the data sets in Web mining
are extremely large.
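
To make two of the three operations concrete, the following is a minimal sketch of association and sequential counting over user sessions. It is an illustration only, not the method used in this research: the `sessions` data, the URL paths, and the helper names `co_request_counts` and `transition_counts` are all hypothetical, and a real log would first need session extraction and noise handling, which are omitted here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: each is the ordered list of URLs one user requested.
sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/sports", "/scores"],
    ["/home", "/news"],
    ["/sports", "/scores"],
]

def co_request_counts(sessions):
    """Association: count how often two URLs occur in the same session."""
    counts = Counter()
    for session in sessions:
        # Sort the distinct URLs so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts

def transition_counts(sessions):
    """Sequential analysis: count how often one URL directly follows another."""
    counts = Counter()
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[(current, following)] += 1
    return counts

pairs = co_request_counts(sessions)
steps = transition_counts(sessions)

print(pairs.most_common(3))
print(steps.most_common(3))
```

The pair counts ignore order while the transition counts keep it, which is the distinction between associations and sequential analysis drawn above. Plain counting of this kind assigns crisp memberships, so it does not by itself address the overlap and outlier issues just described.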