ADAPTIVE DATA REPLICATION IN DISTRIBUTED
SYSTEMS
Prof. Ouri Wolfson
Department of Electrical Engineering and Computer Science
University of Illinois, Chicago
Chicago, IL 60607
Phone: (312) 996-6770
Fax : (312) 413-0024
Email: wolfson@eecs.uic.edu
Prof. K. Kalpakis
Department of
Computer Science and Electrical Engineering
University
of Maryland Baltimore County
Baltimore, MD 21250
Phone: (410) 455-3143
Fax : (410)
455-3969
Email: kalpakis@csee.umbc.edu
WWW PAGE
Adaptive Data Replication
http://www.eecs.uic.edu/~wolfson/html/replic.html
http://www.csee.umbc.edu/~kalpakis/ADR
Keywords
Adaptive data replication,
data caching, dynamic file allocation, consistency and recovery, data
dissemination.
Project Award Information
- Award Number: IRI-9712967 (Wolfson) and IRI-9729495
(Kalpakis)
- Duration: 10/01/1997-09/30/2001.
- Title: Adaptive Data Replication in Distributed
Systems
Project Summary
Adaptive replication
is a generalization of the principle of caching. In adaptive replication the
number of copies of an object, the location of these copies, and which updates
are propagated to each copy, change dynamically depending on a given cost
function.
This project establishes the fundamental principles of adaptive
information replication and dissemination in distributed systems. It also
develops and analyzes a system for adaptive replication of objects. The system
consists of a set of algorithms to be used for various cost functions. For the
analysis the investigators are building a simulation testbed which
enables comparison among the various adaptive and static replication algorithms.
The project will improve the understanding, cost, and performance of distributed
systems and distributed databases. It can lead to a more efficient Internet and
World-Wide-Web. Currently the results are being extended to the wireless
WWW.
Publications and Products
The
principles of Adaptive Replication have been established (please see Project
references 1-11 below). Due to space limitations, here we will elaborate only on
some of our results. We developed two new algorithms for optimal placement of
replicas in tree networks, taking into account the read, write, and storage
costs. One is using a Minimum Spanning Tree write-propagation policy, and the
other is using a Steiner Minimum Tree write-propagation policy (see references 4
and 5 below). These results shed light on the relationship between the write
propagation policy and the minimum cost replication scheme. In
addition, we developed a new algorithm for minimum read-write-storage cost
replication on a large family of networks, namely networks with bounded
treewidth (which include among others planar networks with bounded diameter). We
published this result in reference 6 below. Furthermore, a new approach for
maintaining replicated redirection services in Web-based information systems,
aiming at minimizing client response time and network overhead while reducing
the load imposed upon the data servers, has been developed. This approach is
based on the ADR protocol and a protocol by Rabinovich et al (ICDCS 1999) (see
reference 7 below). We have also investigated the relationships between adaptive
replication and gossiping in wireless broadcast systems (references 8, 9,
and 11). We determined that the cost based approach bridges between the two
methodologies. The core modules of the simulation testbed have been implemented
in Java, and the system is currently being used to experimentally study various
replication protocols. A GUI for our simulator is under development. An
extension of the simulator for wireless environments has been
established.
Goals, Objectives, and Targeted Activities
The project has the following objectives. First, establish the
fundamental principles of adaptive information replication and dissemination in
distributed systems. Second, develop a system of algorithms for adaptive
replication of objects, and a cost based system to compare the algorithms. The
third objective is to build a simulation testbed in which to evaluate the
algorithms using real data such as WWW access traces. The last objective is to
extend the results to the wireless internet.
Project Impact
- Human Resources: A graduate student is working as a Research
Assistant at UIC. Another Ph.D. student, which passed his qualifier exams, is
currently working on this project at UMBC.
- Education and curriculum development: Algorithms from this
project are being discussed in our graduate Operating/Distributed Systems
courses, and class projects based on this project have been assigned.
- Commercialization and technology transfer efforts are
currently under way.
Project References
1. P. Sistla, O.
Wolfson, Y. Yesha, R. Sloan,"Towards a Theory of
Cost Management for Digital Libraries and Electronic Commerce", ACM
Transactions on Database Systems (TODS), Vol. 23(4), Dec.
98, pp. 411-452.
2. O. Wolfson,
Y. Huang,"Competitive Analysis
of Caching in Distributed Databases", IEEE Transactions on Parallel and
Distributed Systems, 9(4), Apr. 1998, pp. 391-409.
3. P. Sistla, O. Wolfson, Y. Huang,"Minimization of
Communication Cost Through Caching in Mobile Environments", IEEE
Transactions on Parallel and Distributed Systems, 9(4), Apr. 1998, pp.
378-390.
4. K. Kalpakis, K. Dasgupta, and O. Wolfson,
"Optimal Placement of Replicas in Trees with Read-Write Costs and Capacity
Constraints", accepted for publication in the IEEE Transactions on Paralell
and Distributed Systems (2001).
5. K. Kalpakis, K.
Dasgupta, and O. Wolfson, Minimum Cost Placement of Replicas in Tree Networks
using the Steiner--Write Policy, accepted for publication in the Proceedings
of the International Database Engineering and Applications Symposium, Grenoble,
France, July 16--18, 2001.
6. K. Kalpakis and K.
Dasgupta, Data Replication on Networks with Bounded Treewidth, submitted
for publication.
7. K. Dasgupta and K. Kalpakis,
Maintaining Replicated Redirection Services in Web-based Information
Systems, accepted for publication in the Proceedings of the 2nd IEEE
Workshop on Internet Applications, San Jose, CA, July 23--24, 2001.
8. B. Xu, O. Wolfson, S. Chamberlain, "Cost Based Data
Dissemination in Broadcast Networks", Springer Verlag Lecture Notes
in Computer Science, number 1973, Proceedings of 8th International Conference on
Database Theory(ICDT01), London, UK, 4-6 January, 2001, pp.
114-128.
9. B. Xu, O. Wolfson, S. Chamberlain,
"Spatially Distributed Databases on Sensors", Proceedings of The 8th ACM
Symposium on Advances in Geographic Information Systems, Washington DC, Nov.
2000, pp. 153-160.
10. O. Wolfson, S,
Jajodia, and Y. Huang, An Adaptive Data Replication Algorithm, ACM
Transactions on Database Systems, Vol. 22(4), June 1997, pp. 255-314.
11. B. Xu, O. Wolfson, S. Chamberlain, N. Rishe,
"Cost Based Data Dissemination in Satellite Networks", accepted, to appear in
ACM/Baltzer Journal on Special Topics in Mobile Networks and Applications,
Special Issue on Satelite Based Information Systems, 6(2001), pp. 51-70.
12.
O. Wolfson, "Infrastructure and cost models for digital libraries", ACM
Computing Surveys (Electronic version). Vol. 28A, Dec. 1996.
13. "Strategic Directions in Electronic Commerce and Digital
Libraries: Towards a Digital Agora", ACM Computing Surveys, Vol.28(4),
Dec. 1996, pp. 818-835.
14. P. Sistla, O.
Wolfson, et al, An Architecture for Consumer-Oriented Online Database
Services. Proceedings of the 6th International Workshop on Research Issues
in Data Engineering: Interoperability of Nontraditional Database Systems, New
Orleans, LA, Feb. 1996
15. O. Wolfson and S. Jajodia,
"An Algorithm for Dynamic Data Allocation in Distributed Systems" Information
Processing Letters, Vol. 53, 1995, pp. 113-119.
16. Y. Huang, O. Wolfson,
A Competitive Dynamic Data Replication Algorithm. Proceedings of the 9th
International Conf. on Data Engineering, pp. 310-317, April 1993, Vienna,
Austria
17. Y. Huang, R. Sloan, O. Wolfson,
Divergence Caching in Client-Server Architectures. In Proceedings of th2:00 PM 3/29/2001e 1:50 PM 3/29/2001
third International Conference on Parallel and Distributed Information Systems
(PDIS), Austin, TX, Sept. 1994, pp. 131-139.
18. O. Wolfson and S. Jajodia, An
Algorithm for Dynamic Data Distribution. In Proceedings of the 2nd Workshop
on the Management of Replicated Data (WMRD-II), Monterey, CA, Nov. 1992.
19. O. Wolfson and S. Jajodia, Distributed
Algorithms for Adaptive Replication of Data. In Proceedings of the 11th ACM
SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), San
Diego, CA, June 1992.
20. S. Dao, E. Shek, A. Vellaikal, R. Muntz, L. Zhang, M. Potkonjak, O. Wolfson, "Semantic Multicast:
Intelligently Sharing Collaborative Sessions", ACM Computing Surveys, Vol
3(2es), 1999.
Area Background
Adaptive replication
is a generalization of the principle of caching. Caching is used to improve the
performance and availability of data systems by maintaining multiple
copies/replicas of objects in a dynamic fashion. A typical question in caching
is: "When and where should objects be cached to optimize performance and
availability?" Our contribution is that we add the concept of cost functions to
caching and we attempt to distinguish the principles of caching from the
associated cost functions.
Area References
A very short list of
references (due to space limitations):
O. Wolfson, S.
Jajodia, and Y. Huang, "An Adaptive Data
Replication Algorithm", ACM Transactions on Database Systems, 22:4,
1997.
J. Sidel, P. Aoki, A. Sah, C. Staelin, M.
Stonebraker, and A. Yu, "Data Replication in Mariposa", 12th Intl. Conf.
on Data Engineering, Feb 1996.
C. Bowman, P. Danzig, D.
Hardy, U. Manber, and M. Schwartz, "The Harvest Information Discovery and Access
System", Computer Networks and ISDN Systems, 28:1, 1995.
A. El Abbadi, "Adaptive Protocols for Managing Replicated Distributed
Databases", in the Symposium on Parallel and Distributed Processing,
1991.
R. Alonso, D. Barbaba, and H. Garcia-Molina,
"Data caching issues in an information retrieval system", ACM
Transactions on Database Systems, 15:3, 1990.
L. W.
Dowdy and D.V. Foster, "Comparative Models of the File Assignment
Problem", ACM Computing Surveys, 14:2, 1982.