Masters Thesis Defense

Approximation of Nonintegral Frequency Moments

Brian Pilz

10:00am 30 November 2011, ITE325b

Let a data stream have length m over an alphabet of n letters, with letter i occurring m_i times for i = 1,…,n. For any k, define the frequency moments F_k as F_k = sum_{i=1}^n m_i^k. Alon, Matias, and Szegedy showed how to estimate F_k for integers k>0 with a one-pass algorithm using O(n^{1-1/k}log n) space for given length m, accuracy, and confidence. Here we extend those results to non-integral k obtaining bounds on the variance giving accuracy and confidence estimates, and giving quantitative results on the algorithm’s space requirements with particular interest to when k is near 1. We also give some performance statistics of the algorithm for these cases and consider an application to entropy estimation. This algorithm is known as a sketching algorithm. Sketching algorithms are probabilistic algorithms generally requiring sublinear space vs. a "classical" O(n) (linear) space requirement, and may have applications for anomaly detection of systems or networks.

Committee:

  • Drs. Samuel Lomonaco
  • Brooke Stephens
  • Kostas Kalpakis (chair)
  • Larry Wagoner