In what follows, I list information useful to researchers who are starting out in this field. The survey is slightly biased towards Web Usage Mining.
NGDM - *Usage Mining for and on the Semantic Web* - Gerd Stumme
- Discusses how the Semantic Web and ontologies can improve Web Usage Mining, and how mining techniques can in turn be used for the Semantic Web. In the Semantic Web, links carry meaning, and this can be used effectively for web structure mining. A more detailed explanation follows later, under Dr. Berendt.
Researchers active in this field are listed below, along with links to their highly cited papers. Papers that I was able to go through briefly and found interesting are marked like this: *"Publication"*.
*Mining Association Patterns in Web Usage Data* (2002) - Pang-Ning Tan, Vipin Kumar
- Compares current data mining techniques designed for non-Web data and explains why they are not sufficient for Web usage data. Proposes refinements to association rule mining and effective ways of removing "Web Robot" data from logs. Negative associations are also discussed in detail, as a way of classifying user groups.
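To make these ideas concrete, here is a minimal Python sketch of pairwise association rules over user sessions, with robot sessions filtered out first, plus the confidence of a negative rule "A → not B". The robot test (a session that fetches /robots.txt) is only a common rule of thumb I am assuming here, not the robot-detection method from the paper; the session data, thresholds, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def is_robot_session(pages):
    # Assumed heuristic (not the paper's method): well-behaved crawlers request /robots.txt.
    return "/robots.txt" in pages

def human_sessions(sessions):
    # Keep only sessions that do not look like robots, each as a set of pages.
    return [set(pages) for pages in sessions.values() if not is_robot_session(pages)]

def pair_rules(sessions, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs between page pairs."""
    human = human_sessions(sessions)
    n = len(human)
    page_counts = Counter(page for s in human for page in s)
    pair_counts = Counter(pair for s in human for pair in combinations(sorted(s), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

def negative_confidence(sessions, a, b):
    """Confidence of the negative rule a -> not b: share of sessions with a that lack b."""
    with_a = [s for s in human_sessions(sessions) if a in s]
    if not with_a:
        return 0.0
    return sum(1 for s in with_a if b not in s) / len(with_a)

# Hypothetical sessions; s4 looks like a crawler and is dropped.
sessions = {
    "s1": ["/index.html", "/courses.html", "/grad.html"],
    "s2": ["/index.html", "/courses.html"],
    "s3": ["/index.html", "/grad.html"],
    "s4": ["/robots.txt", "/index.html", "/courses.html", "/grad.html"],
    "s5": ["/index.html", "/news.html"],
}
rules = pair_rules(sessions)
neg = negative_confidence(sessions, "/index.html", "/news.html")
```

Here `rules` holds courses → index and grad → index (support 0.5, confidence 1.0), and `neg` is 0.75: three of the four human sessions that visit the index page never reach the news page.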
Is very active in this field. He is currently teaching a course on web data mining at DePaul, and a number of his papers are listed on his publications page.
4. Cyrus Shahabi, Leila Kaghazian, Soham Mehta, Amol Ghoting, Gautam Shanbhag, and Margaret L. McLaughlin, "Understanding of User Behavior in Immersive Environments," in Touch in Virtual Environments: Haptics and the Design of Interactive Systems, Margaret L. McLaughlin, Joao Hespanha, and Gaurav Sukhatme, Editors (all of the University of Southern California), Prentice Hall, ISBN 0-13-065097-8
Has worked on Semantic Web mining; a paper she co-authored is among the first three papers I list.
Her NGDM paper with Gerd Stumme, summarized at the top of this survey, is discussed in more detail here.
In the Usage Mining area, additional applications are discussed in detail. Requested URLs can be mapped to a site concept hierarchy (a taxonomy of the site) to uncover additional relationships. The authors argue that this approach helps in better understanding the user's requirements. A simple example: a user searches for particular information on a web page, say www.umbc.edu/search.html?search="graduate courses". There should be some way to map this request to an ontology concept such as "graduate courses"; knowing only that the user browsed "search.html" does not capture what the user needed. The key point is that *what was requested* by the user has to be captured in the ontology, not just *what was served* by the website.
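A minimal Python sketch of that mapping, assuming the logs store URL-encoded requests and that the site's taxonomy is available as a simple child-to-parent dictionary. The taxonomy, the handling of the `search` parameter, and the function names are hypothetical, not taken from the paper.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical site taxonomy: each concept maps to its parent (None = root).
SITE_TAXONOMY = {
    "graduate courses": "courses",
    "undergraduate courses": "courses",
    "courses": "academics",
    "admissions": "academics",
    "academics": None,
}

def concept_path(concept):
    """Walk from a concept up to the root of the taxonomy."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = SITE_TAXONOMY[concept]
    return path

def requested_concepts(url, param="search"):
    """Map the query terms of a logged request to taxonomy paths."""
    query = parse_qs(urlsplit(url).query)  # decodes %22 and '+' in the query string
    terms = [value.strip('"').lower() for value in query.get(param, [])]
    return [concept_path(t) for t in terms if t in SITE_TAXONOMY]

logged = "http://www.umbc.edu/search.html?search=%22graduate+courses%22"
print(urlsplit(logged).path)        # what was served
print(requested_concepts(logged))   # what was requested, as ontology concepts
```

The served path alone is just "/search.html"; the query term resolves to the path "graduate courses" → "courses" → "academics", which is the kind of additional relationship the paper is after.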
Other issues are also discussed, along with the use of mining techniques for the Semantic Web. A very good idea here is that usage navigation patterns can be used to add semantics to a website: from the usage patterns we learn which semantics and link structure the site's users expect, and this can guide the semantic annotation of the website.
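One simple way to turn usage patterns into link-structure suggestions is to count which pages users reach later in the same session and flag frequent pairs that have no direct link. This is my own simplified Python illustration of the idea, not a method from the paper; the sessions, the link set, and `min_count` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def suggest_links(sessions, existing_links, min_count=2):
    """Return ((from_page, to_page), count) for pairs often visited in that order but not linked."""
    counts = Counter()
    for pages in sessions:
        # dict.fromkeys drops repeat visits while keeping first-visit order,
        # so each pair (a, b) means "a was visited before b in this session".
        for a, b in combinations(dict.fromkeys(pages), 2):
            counts[(a, b)] += 1
    return sorted(
        (pair, n) for pair, n in counts.items()
        if n >= min_count and pair not in existing_links
    )

sessions = [
    ["/index.html", "/courses.html", "/grad.html", "/apply.html"],
    ["/index.html", "/grad.html", "/apply.html"],
    ["/courses.html", "/grad.html", "/apply.html"],
]
existing_links = {
    ("/index.html", "/courses.html"),
    ("/index.html", "/grad.html"),
    ("/courses.html", "/grad.html"),
    ("/grad.html", "/apply.html"),
}
suggestions = suggest_links(sessions, existing_links)
```

Here both the index page and the courses page get suggested direct links to /apply.html, since users repeatedly reach it from there through intermediate pages.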
I could not get hold of a recent paper: Berendt, B., Günther, O., & Spiekermann, S. (in press). Privacy in E-Commerce: Stated preferences vs. actual behavior. To appear in Communications of the ACM. The bottom line, though, is that privacy issues are being taken seriously.
Lan Yi, Bing Liu. "Eliminating Noisy Information in Web Pages for Data Mining." To appear in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2003), Washington, DC, USA, August 24-27, 2003.
- Discusses cleaning noisy content (for example navigation links and advertisements) out of HTML pages so that mining gives better results.
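The paper itself builds a Site Style Tree over the pages' presentation structure; the Python sketch below swaps in a much cruder frequency heuristic just to show the underlying intuition, that blocks repeated across most pages of a site (navigation bars, copyright lines) are template noise. The block splitting, `max_fraction`, and the page data are assumptions.

```python
from collections import Counter

def remove_template_blocks(pages, max_fraction=0.5):
    """pages: url -> list of text blocks. Drops blocks found on more than max_fraction of pages."""
    n = len(pages)
    # Count each block at most once per page.
    block_freq = Counter(block for blocks in pages.values() for block in set(blocks))
    noisy = {block for block, count in block_freq.items() if count / n > max_fraction}
    return {url: [b for b in blocks if b not in noisy] for url, blocks in pages.items()}

pages = {
    "/a.html": ["Home | Courses | Contact", "Graduate courses in data mining", "Copyright 2003"],
    "/b.html": ["Home | Courses | Contact", "Admissions deadlines for fall", "Copyright 2003"],
    "/c.html": ["Home | Courses | Contact", "Faculty research interests", "Copyright 2003"],
}
cleaned = remove_template_blocks(pages)
```

The navigation bar and copyright line appear on every page and are dropped, leaving only each page's own content block.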
Bing Liu, Chee Wee Chin, Hwee Tou Ng. "Mining Topic-Specific Concepts and Definitions on the Web." To appear in Proceedings of the Twelfth International World Wide Web Conference (WWW-2003), 20-24 May 2003, Budapest, Hungary.
- Uses web mining methods to find all information related to a particular topic (its concepts and their definitions), aiming to improve on what search engines provide.
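One common way to spot definitions is to match lexical patterns such as "X is a ..." or "X refers to ...". The Python sketch below shows that pattern-matching step in isolation; the pattern list is a small hypothetical one made up for illustration, not the set used in the paper, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical definition patterns; {c} is replaced by the escaped concept name.
DEFINITION_PATTERNS = [
    r"^{c}\s+(?:is|are)\s+(?:a|an|the)\b",
    r"^{c}\s+refers?\s+to\b",
    r"^{c},?\s+(?:also\s+)?(?:called|known\s+as)\b",
]

def find_definitions(concept, text):
    """Return sentences of text that look like definitions of concept."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    escaped = re.escape(concept)
    patterns = [re.compile(p.format(c=escaped), re.IGNORECASE) for p in DEFINITION_PATTERNS]
    return [s for s in sentences if any(p.search(s) for p in patterns)]

text = (
    "Web usage mining is the application of data mining to web logs. "
    "It has many uses. "
    "Web usage mining refers to discovering usage patterns from server logs. "
    "Many sites use it."
)
definitions = find_definitions("web usage mining", text)
```

Only the two sentences that open with the concept followed by a defining phrase are kept.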
Bing Liu, Kaidi Zhao, and Lan Yi. "*Visualizing Web site comparisons*." Proceedings of the Eleventh International World Wide Web Conference (WWW-2002), Honolulu, Hawaii, USA, 7-11 May 2002.
- Proposes comparing two websites to work out their structure and other attributes. Mostly useful for figuring out why your competitor is doing better business than you are.
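As a rough illustration of the comparison idea (not the paper's technique, which focuses on visualizing the comparison), the Python sketch below compares the vocabularies of two sites: shared terms, terms unique to each site, and their Jaccard similarity. The stopword list and page texts are hypothetical.

```python
import re

STOPWORDS = {"a", "and", "the", "with"}  # hypothetical, deliberately tiny

def site_terms(pages):
    """All lowercase word tokens across a site's pages, minus stopwords."""
    return {
        word
        for text in pages
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOPWORDS
    }

def compare_sites(pages_a, pages_b):
    a, b = site_terms(pages_a), site_terms(pages_b)
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }

your_site = ["Cheap laptops", "Tablets and phones"]
competitor = ["Cheap laptops and free shipping", "Laptops with warranty"]
report = compare_sites(your_site, competitor)
```

`report["only_b"]` lists what the competitor talks about that you do not (here free shipping and a warranty).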
Ming-Syan Chen - National Taiwan University
Some of his papers, mainly on mining web information hierarchies:
1. H.-Y. Kao, S.-H. Lin, J.-M. Ho and M.-S. Chen, "Mining Web Information Structures and Contents based on Entropy Analysis," IEEE Trans. on Knowledge and Data Engineering, Vol. 16, No. 1, January 2004
- Suggests web structure mining techniques to better organize a
website's link structure.
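The entropy idea can be sketched in a few lines of Python: a term spread evenly over all pages of a site (like a navigation label) has normalized entropy near 1 and carries little information, while a term concentrated on one page has entropy near 0. This is my own simplified illustration in the spirit of the paper, assuming pages are already tokenized; the tokens and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter

def term_entropies(pages):
    """pages: list of token lists (needs at least 2 pages).
    Returns term -> entropy of its frequency distribution over pages, normalized to [0, 1]."""
    n = len(pages)
    counts = [Counter(tokens) for tokens in pages]
    vocabulary = set().union(*counts)
    entropies = {}
    for term in vocabulary:
        freqs = [c[term] for c in counts]
        total = sum(freqs)
        # Log base n makes a perfectly even spread over all n pages score 1.
        entropies[term] = -sum((f / total) * math.log(f / total, n) for f in freqs if f > 0)
    return entropies

def informative_terms(pages, threshold=0.5):
    """Terms concentrated on few pages (low entropy) are treated as informative."""
    return sorted(t for t, h in term_entropies(pages).items() if h < threshold)

pages = [
    ["home", "contact", "mining", "logs"],
    ["home", "contact", "haptics"],
    ["home", "contact", "prefetching"],
]
```

With these pages, "home" and "contact" score 1 and are filtered out, while each page's own topic words score 0 and are kept as informative.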
1. Manolopoulos, Y., Morzy, M., Morzy, T., Nanopoulos, A., Wojciechowski, M., Zakrzewicz, M.: "Indexing Techniques for Web Access Logs," chapter in the book "Web Information Systems," IDEA Group Inc., to appear, 2003
2. Nanopoulos A., Katsaros
D. and Manolopoulos Y.: "A Data Mining
Algorithm for Generalized Web Prefetching", IEEE Transactions on
Knowledge and Data Engineering, vol. 15, no. 5, Sep./Oct. 2003
3. Nanopoulos A.,
Zakrzewicz M., Morzy T., Manolopoulos Y.: "Indexing Web
Access-Logs for Pattern Queries", Proceedings 4th International
Workshop on Web Information and Data Management (WIDM'02), pp.398-404,
McLean, VA, 2002.
4. Nanopoulos A., Katsaros D. and Manolopoulos Y.: "Exploiting Web Log Mining for Web Cache Enhancement", Lecture Notes in Artificial Intelligence (LNAI), Springer-Verlag, vol. 2356, pp. 68-87, 2002.
5. Nanopoulos A., Katsaros D. and Manolopoulos Y.: "Effective Prediction of Web-user Accesses: a Data Mining Approach", Proceedings Conference on Mining Log Data Across All Customer Touchpoints (WebKDD), San Francisco, 2001
Information about publications gathered from CiteSeer.
I have filtered out irrelevant ones and listed the rest under their authors where I have identified the authors separately.
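Several of the papers above predict a user's next request from access logs in order to prefetch or cache pages. A minimal Python baseline for that idea is a first-order Markov model: count page-to-page transitions and prefetch the pages whose estimated probability of being requested next passes a threshold. This is a simpler stand-in, not the algorithm from these papers; the sessions and the threshold are hypothetical.

```python
from collections import Counter, defaultdict

def build_transition_model(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            transitions[current][nxt] += 1
    return transitions

def pages_to_prefetch(model, current_page, threshold=0.5):
    """Return pages whose estimated P(next | current) is at least threshold."""
    followers = model.get(current_page)  # .get avoids adding empty entries to the defaultdict
    if not followers:
        return []
    total = sum(followers.values())
    return sorted(page for page, count in followers.items() if count / total >= threshold)

sessions = [
    ["/index.html", "/courses.html", "/grad.html"],
    ["/index.html", "/courses.html", "/apply.html"],
    ["/index.html", "/news.html"],
    ["/courses.html", "/grad.html"],
]
model = build_transition_model(sessions)
```

Two of the three requests after /index.html go to /courses.html, so with the default threshold a client on the index page would prefetch only that page; lowering the threshold trades more bandwidth for more cache hits.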
Papers published in the WebKDD workshops are a good resource; the most recent is WebKDD 2003.
WebKDD 2003
Proceedings available here:
http://www.acm.org/sigkdd/proceedings/webkdd03/