Readings
Students are expected to read the following PDF documents and post a comment or question on the associated Piazza topic. Together, the readings count as a single homework grade (100 pts), with each reading weighted equally.
Due before the Midterm
Due before the Final
- Accumulo Manual: at minimum, Chapters 1-4 and 8; the rest is optional but recommended
- Select sections of the Storm Documentation:
Recommended Books
Students are not required to purchase or read the following books, but they are useful references for anyone interested in learning more about Hadoop technologies.
- Tom White. "Hadoop: The Definitive Guide. Third Edition." O'Reilly Media, 2012.
- Donald Miner and Adam Shook. "MapReduce Design Patterns." O'Reilly Media, 2012.
- Eric Sammer. "Hadoop Operations." O'Reilly Media, 2012.
- Alan Gates. "Programming Pig." O'Reilly Media, 2011.
Recommended Whitepapers
Students are encouraged to read the following papers. Students who read a paper and email a brief one-page summary of it, along with a question about it, will receive ten points of extra credit towards their homework grade. Summaries can be turned in at any time prior to the final exam.
- Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS Operating Systems Review. Vol. 37. No. 5. ACM, 2003.
- Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
- Chang, Fay, et al. "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008): 4.
- Elsayed, Tamer, Jimmy Lin, and Douglas W. Oard. "Pairwise document similarity in large collections with MapReduce." Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers. Association for Computational Linguistics, 2008.
- Lin, Jimmy, and Michael Schatz. "Design patterns for efficient graph algorithms in MapReduce." Proceedings of the Eighth Workshop on Mining and Learning with Graphs. ACM, 2010.