ADAPTIVE DATA REPLICATION IN DISTRIBUTED SYSTEMS

Prof. Ouri Wolfson
Department of Electrical Engineering and Computer Science
University of Illinois, Chicago
Chicago, IL 60607
Phone: (312) 996-6770
Fax : (312) 413-0024
Email: wolfson@eecs.uic.edu

Prof. K. Kalpakis
Department of Computer Science and Electrical Engineering
University of Maryland Baltimore County
Baltimore, MD 21250
Phone: (410) 455-3143
Fax : (410) 455-3969
Email: kalpakis@csee.umbc.edu

WWW PAGE

Adaptive Data Replication
http://www.eecs.uic.edu/~wolfson/html/replic.html
http://www.csee.umbc.edu/~kalpakis/ADR

Keywords

Adaptive data replication, data caching, dynamic file allocation, consistency and recovery, data dissemination.

Project Award Information

Project Summary

Adaptive replication is a generalization of the principle of caching. In adaptive replication the number of copies of an object, the location of these copies, and which updates are propagated to each copy, change dynamically depending on a given cost function.

This project establishes the fundamental principles of adaptive information replication and dissemination in distributed systems. It also  develops and analyzes a system for adaptive replication of objects. The system consists of a set of algorithms to be used for various cost functions. For the analysis the investigators are building a simulation testbed  which  enables comparison among the various adaptive and static replication algorithms. The project will improve the understanding, cost, and performance of distributed systems and distributed databases. It can lead to a more efficient Internet and World-Wide-Web. Currently the results are being extended to the wireless WWW.

Publications and Products

The principles of Adaptive Replication have been established (please see Project references 1-11 below). Due to space limitations, here we will elaborate only on some of our results. We developed two new algorithms for optimal placement of replicas in tree networks, taking into account the read, write, and storage costs. One is using a Minimum Spanning Tree write-propagation policy, and the other is using a Steiner Minimum Tree write-propagation policy (see references 4 and 5 below). These results shed light on the relationship between the write propagation policy and the  minimum cost replication scheme.  In addition, we developed a new algorithm for minimum read-write-storage cost replication on a large family of networks, namely networks with bounded treewidth (which include among others planar networks with bounded diameter). We published this result in reference 6 below. Furthermore, a new approach for maintaining replicated redirection services in Web-based information systems, aiming at minimizing client response time and network overhead while reducing the load imposed upon the data servers, has been developed. This approach is based on the ADR protocol and a protocol by Rabinovich et al (ICDCS 1999) (see reference 7 below). We have also investigated the relationships between adaptive replication and gossiping in wireless broadcast systems (references 8,  9, and 11). We determined that the cost based approach bridges between the two methodologies. The core modules of the simulation testbed have been implemented in Java, and the system is currently being used to experimentally study various replication protocols. A GUI for our simulator is under development. An extension of the simulator for wireless environments has been established.

Goals, Objectives, and Targeted Activities

The project has the following objectives. First, establish the fundamental principles of adaptive information replication and dissemination in distributed systems. Second, develop a system of algorithms for adaptive replication of objects, and a cost based system to compare the algorithms. The third objective  is to build a simulation testbed in which to evaluate the algorithms using real data such as WWW access traces. The last objective is to extend the results to the wireless internet.
 

Project Impact

Project References

1. P. Sistla, O. Wolfson, Y. Yesha, R. Sloan,"Towards a Theory of Cost Management for Digital Libraries and Electronic Commerce",  ACM Transactions on Database Systems (TODS),  Vol. 23(4),    Dec. 98,   pp. 411-452.
2. O. Wolfson, Y. Huang,"Competitive Analysis of Caching in Distributed Databases", IEEE Transactions on Parallel and Distributed Systems, 9(4), Apr. 1998, pp. 391-409.
3. P. Sistla, O. Wolfson, Y. Huang,"Minimization of Communication Cost Through Caching in Mobile Environments", IEEE Transactions on Parallel and Distributed Systems, 9(4), Apr. 1998, pp. 378-390.
4. K. Kalpakis, K. Dasgupta, and O. Wolfson, "Optimal Placement of Replicas in Trees with Read-Write Costs and Capacity Constraints", accepted for publication in the IEEE Transactions on Paralell and Distributed Systems (2001).
5. K. Kalpakis, K. Dasgupta, and O. Wolfson, Minimum Cost Placement of Replicas in Tree Networks using the Steiner--Write Policy, accepted for publication in the Proceedings of the International Database Engineering and Applications Symposium, Grenoble, France, July 16--18, 2001.
6. K. Kalpakis and K. Dasgupta, Data Replication on Networks with Bounded Treewidth, submitted for publication.
7. K. Dasgupta and K. Kalpakis, Maintaining Replicated Redirection Services in Web-based Information Systems, accepted for publication in the Proceedings of the 2nd IEEE Workshop on Internet Applications, San Jose, CA, July 23--24, 2001.
8. B. Xu, O. Wolfson, S. Chamberlain, "Cost Based Data Dissemination in  Broadcast Networks",  Springer Verlag Lecture Notes in Computer Science, number 1973, Proceedings of 8th International Conference on Database Theory(ICDT01), London, UK,  4-6 January, 2001, pp. 114-128.
9. B. Xu, O. Wolfson, S. Chamberlain, "Spatially Distributed Databases on Sensors", Proceedings of The 8th ACM Symposium on Advances in Geographic Information Systems, Washington DC, Nov. 2000, pp. 153-160.
10. O. Wolfson, S, Jajodia, and Y. Huang, An Adaptive Data Replication Algorithm, ACM Transactions on Database Systems, Vol. 22(4), June 1997, pp. 255-314.
11.  B. Xu, O. Wolfson, S. Chamberlain, N. Rishe,  "Cost Based Data Dissemination in Satellite Networks", accepted, to appear in ACM/Baltzer Journal on Special Topics in Mobile Networks and Applications, Special Issue on Satelite Based Information Systems, 6(2001), pp. 51-70.
12. O. Wolfson, "Infrastructure and cost models for digital libraries", ACM Computing Surveys (Electronic version). Vol. 28A, Dec. 1996.
13. "Strategic Directions in Electronic Commerce and Digital Libraries: Towards a Digital Agora", ACM Computing Surveys, Vol.28(4), Dec. 1996, pp. 818-835.
14. P. Sistla, O. Wolfson, et al, An Architecture for Consumer-Oriented Online Database Services. Proceedings of the 6th International Workshop on Research Issues in Data Engineering: Interoperability of Nontraditional Database Systems, New Orleans, LA, Feb. 1996
15. O. Wolfson and S. Jajodia, "An Algorithm for Dynamic Data Allocation in Distributed Systems" Information Processing Letters, Vol. 53, 1995, pp. 113-119.
16. Y. Huang, O. Wolfson, A Competitive Dynamic Data Replication Algorithm. Proceedings of the 9th International Conf. on Data Engineering, pp. 310-317, April 1993, Vienna, Austria
17. Y. Huang, R. Sloan, O. Wolfson, Divergence Caching in Client-Server Architectures. In Proceedings of th2:00 PM 3/29/2001e 1:50 PM 3/29/2001 third International Conference on Parallel and Distributed Information Systems (PDIS), Austin, TX, Sept. 1994, pp. 131-139.
18. O. Wolfson and S. Jajodia, An Algorithm for Dynamic Data Distribution. In Proceedings of the 2nd Workshop on the Management of Replicated Data (WMRD-II), Monterey, CA, Nov. 1992.
19. O. Wolfson and S. Jajodia, Distributed Algorithms for Adaptive Replication of Data. In Proceedings of the 11th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), San Diego, CA, June 1992.
20. S. Dao, E. Shek, A. Vellaikal, R. Muntz, L. Zhang, M. Potkonjak, O. Wolfson, "Semantic Multicast: Intelligently Sharing Collaborative Sessions", ACM Computing Surveys, Vol 3(2es), 1999.
 

Area Background

Adaptive replication is a generalization of the principle of caching. Caching is used to improve the performance and availability of data systems by maintaining multiple copies/replicas of objects in a dynamic fashion. A typical question in caching is: "When and where should objects be cached to optimize performance and availability?" Our contribution is that we add the concept of cost functions to caching and we attempt to distinguish the principles of caching from the associated cost functions.

Area References

A very short list of references (due to space limitations):
O. Wolfson, S. Jajodia, and Y. Huang, "An Adaptive Data Replication Algorithm", ACM Transactions on Database Systems, 22:4, 1997.
J. Sidel, P. Aoki, A. Sah, C. Staelin, M. Stonebraker, and A. Yu, "Data Replication in Mariposa", 12th Intl. Conf. on Data Engineering, Feb 1996.
C. Bowman, P. Danzig, D. Hardy, U. Manber, and M. Schwartz, "The Harvest Information Discovery and Access System", Computer Networks and ISDN Systems, 28:1, 1995.
A. El Abbadi, "Adaptive Protocols for Managing Replicated Distributed Databases", in the Symposium on Parallel and Distributed Processing, 1991.
R. Alonso, D. Barbaba, and H. Garcia-Molina, "Data caching issues in an information retrieval system", ACM Transactions on Database Systems, 15:3, 1990.
L. W. Dowdy and D.V. Foster, "Comparative Models of the File Assignment Problem", ACM Computing Surveys, 14:2, 1982.