Here are some ideas for the final project. If you
want to work on any of these topics then talk to me and I will give you
some reading materials.
1. Data stream mining: Design and implement stream mining
algorithms (e.g., clustering, classification).
2. Design privacy-preserving distributed data mining
algorithms that can mine multi-party data from different sites
without looking at the data in its original form.
4. Searching for structures in graphs (e.g. social networks)
for counter-terrorism applications.
5. Design and develop a software architecture for a stream
mining/monitoring system that pays attention to the resource limitations.
6. Develop a distributed outlier detection algorithm.
7. Mining sensor networks.
8. Scientific data mining from distributed data.
9. Privacy-preserving data stream mining algorithms and
systems.
10. Web (click-stream or content) mining for
personalization.
Take a look at this link for some of the emerging data mining research directions.
More Suggestions:
1. Distributed Data Mining for Outlier Detection in Sensor Networks or P2P Networks
=================================================================================
Design asynchronous distributed algorithms for outlier detection. Experiment with
distributed data sets available from the DDMWiki or other sources.
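
As a starting point for this project, here is a minimal sketch of a centralized, distance-based outlier score: each point is scored by the distance to its k-th nearest neighbor. This is only an assumed baseline, not a distributed or asynchronous algorithm; the function names and the toy `readings` data are made up for illustration.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def top_outliers(points, k=2, n=1):
    """Return the indices of the n points with the largest scores."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]

# Hypothetical sensor readings: four close together, one far away.
readings = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
print(top_outliers(readings, k=2, n=1))
```

A distributed version would have to decide what each site exchanges with its neighbors (for example, local candidate outliers) instead of collecting all points in one place.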
2. Graphical Data Mining in P2P Applications
==========================================
Review the link analysis literature. Design an algorithm for distributed link analysis and experiment
with web-cache data from multiple sources.
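
One well-known link analysis technique you may come across in the literature is PageRank. The sketch below is a small single-machine power iteration, given only as an assumed baseline to compare a distributed design against; the `pagerank` function, its parameters, and the toy graph are illustrative and not part of any provided code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> list of outgoing links."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical four-page graph.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In a P2P setting no single peer holds the whole `links` dictionary, so the interesting design question is how peers exchange partial rank updates.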
3. Graphical Link Analysis for Web Mining
=====================================
Use the Microsoft Web click-stream data for performing link analysis. Develop algorithms and
experiment. This data set is available from the UCI KDD Archive.
4. Text Classification and Clustering using "Interesting" Representation Construction Techniques
==============================================================================================
Use the Text data sets available from UCI KDD Archive and apply innovative representation
construction techniques for clustering and classification applications. Develop algorithms and
experiment.
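
For reference, a conventional representation to compare your "interesting" ones against is a TF-IDF bag of words with cosine similarity. The sketch below assumes whitespace tokenization and a made-up three-document corpus; all names in it are illustrative.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical mini-corpus.
docs = ["stream mining of sensor data",
        "mining sensor networks",
        "netflix movie ratings"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Any new representation you construct can be plugged into the same similarity function and compared against this baseline on the clustering or classification task.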
4. Netflix Data Mining
====================
The Netflix 1 competition data set is publicly available. They offered a $1,000,000 Netflix Prize to the winner.
Take a shot at it: http://www.netflixprize.com/
Note: They just announced Netflix 2.
5. Power Sensitive Data Mining Algorithms for Mobile Applications
==============================================================
The goal of this project will be to analyze the power consumption
characteristics of data mining algorithms based on existing
models of power consumption and try to optimize those algorithms. If you
are interested in doing some hardware-related work using an oscilloscope,
I can provide you access to one.
6. Mining Geo-Spatial Emissions Data
====================================
Take a look at the following site: http://www.eia.doe.gov/
It offers lots of interesting data sets. I can also provide you more data if needed. You should be able to
cluster, learn predictive models, or detect outliers from these data sets.
7. Health Data Mining
==================
Apply data mining algorithms to analyze health care data sets: http://www.ehdp.com/vitalnet/datasets.htm
You can possibly link that with census data.
****************************
A Useful KDD Data Source:
UCI KDD Data Archive: http://kdd.ics.uci.edu/summary.data.application.html