Peer-to-Peer Client-Side Web Mining
Peer-to-peer (P2P) systems such as Gnutella, Napster, e-Mule, Kazaa, and Freenet are becoming increasingly
popular for many applications that go beyond downloading music without paying for it. Examples include P2P
systems for network storage, web caching, searching and indexing of relevant documents, and distributed
network-threat analysis. Novel data integration applications, such as P2P web mining over the data stored in
the browser caches of machines connected through a P2P network, may revolutionize the business of Internet
search engines. Consider a P2P data clustering algorithm that groups the URLs visited by each user into
clusters corresponding to different subjects by exchanging information with other peers, while protecting
each user's privacy. Such an algorithm can be very useful for discovering users' web-usage patterns and for
efficient web search. It may also help characterize each user by browsing pattern and form communities of
peers with similar interests. Many other information integration and knowledge discovery applications
involving data distributed in a P2P network are possible.
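To make the clustering idea concrete, here is a minimal Python sketch of one way peers could cluster URL feature vectors without a central server: each peer runs a k-means (Lloyd) update using per-cluster sums and counts gathered from its immediate neighbors. This is an illustrative technique chosen for this note, not necessarily the algorithm this project uses. The feature mapping (each visited URL reduced to a 2-D vector), the peer names, the overlay topology, and the helpers `nearest`, `local_stats`, and `p2p_kmeans` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each visited URL is mapped (locally, on the
# user's machine) to a 2-D feature vector, e.g. two topic scores.
Point = Tuple[float, float]


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2,
    )


def local_stats(points: List[Point], centroids: List[Point]) -> Tuple[List[List[float]], List[int]]:
    """Per-cluster coordinate sums and counts for one peer's private points.

    Only these aggregates would leave the peer; the raw URL vectors stay local.
    """
    sums = [[0.0, 0.0] for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = nearest(p, centroids)
        sums[j][0] += p[0]
        sums[j][1] += p[1]
        counts[j] += 1
    return sums, counts


def p2p_kmeans(
    peer_points: Dict[str, List[Point]],
    neighbors: Dict[str, List[str]],
    init: List[Point],
    rounds: int = 10,
) -> Dict[str, List[Point]]:
    """Each peer refines its own centroids using only its immediate neighbors."""
    k = len(init)
    centroids = {peer: list(init) for peer in peer_points}
    for _ in range(rounds):
        updated = {}
        for peer in peer_points:
            total_s = [[0.0, 0.0] for _ in range(k)]
            total_c = [0] * k
            # The peer sends its centroids to each neighbor; each neighbor replies
            # with sums/counts of its own points under those centroids (simulated here).
            for q in [peer] + neighbors[peer]:
                s, c = local_stats(peer_points[q], centroids[peer])
                for j in range(k):
                    total_s[j][0] += s[j][0]
                    total_s[j][1] += s[j][1]
                    total_c[j] += c[j]
            # Lloyd-style update; an empty cluster keeps its previous centroid.
            updated[peer] = [
                (total_s[j][0] / total_c[j], total_s[j][1] / total_c[j])
                if total_c[j]
                else centroids[peer][j]
                for j in range(k)
            ]
        centroids = updated  # synchronous rounds, for simplicity
    return centroids


if __name__ == "__main__":
    peers = {
        "A": [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)],
        "B": [(0.0, 1.0), (10.0, 11.0), (11.0, 10.0)],
        "C": [(1.0, 1.0), (11.0, 11.0)],
    }
    links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    for peer, cs in p2p_kmeans(peers, links, init=[(2.0, 2.0), (8.0, 8.0)]).items():
        print(peer, cs)
```

Exchanging only per-cluster sums and counts keeps each message proportional to the number of clusters rather than to the number of URLs, and keeps raw browsing history on the user's machine. Stronger privacy guarantees would require additional techniques that this sketch does not attempt.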
Data analysis plays an important role in most non-trivial information integration and retrieval
applications. However, most off-the-shelf data analysis and mining techniques are designed for
centralized applications in which all the data are stored in a single, central location. They do not work in a
highly decentralized, distributed environment such as a P2P network. We need distributed data mining
algorithms that are fundamentally decentralized, asynchronous, communication efficient, and scalable.
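One of the simplest primitives with these properties is computing a network-wide aggregate without any central node. The Python sketch below uses randomized pairwise gossip averaging, a standard decentralized technique swapped in here for illustration; it is not presented as this project's method. The function name `pairwise_gossip_average`, the peer names, and the per-peer values are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def pairwise_gossip_average(
    values: Dict[str, float],
    edges: List[Tuple[str, str]],
    steps: int = 2000,
    seed: int = 0,
) -> Dict[str, float]:
    """Randomized pairwise gossip: repeatedly pick one overlay link (u, v) and
    replace both peers' estimates with their mean.

    Each exchange involves only two peers and a constant-size message, there is
    no coordinator, and the sum of all estimates is preserved, so on a connected
    overlay every estimate converges to the network-wide average.
    """
    rng = random.Random(seed)  # seeded only to make the simulation reproducible
    est = dict(values)
    for _ in range(steps):
        u, v = rng.choice(edges)
        mean = (est[u] + est[v]) / 2.0
        est[u] = est[v] = mean
    return est


if __name__ == "__main__":
    # Hypothetical per-peer counts, e.g. locally observed visits to one topic.
    local = {"A": 4.0, "B": 0.0, "C": 8.0, "D": 4.0}
    overlay = [("A", "B"), ("B", "C"), ("C", "D")]  # a simple line topology
    print(pairwise_gossip_average(local, overlay))
```

Every estimate should end up very close to 4.0, the true mean, even though no peer ever sees more than one neighbor's value at a time. Because each step touches only one pair of peers, the same update can run asynchronously in a real network.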
This research is developing a novel P2P web mining system, including distributed algorithms for an early
prototype of a web-browser plug-in that supports P2P information retrieval and data analysis.
Please visit our