The Taub Faculty of Computer Science Events and Talks
Matan Liram (M.Sc. Thesis Seminar)
Thursday, 30.03.2017, 11:00
Advisors: Prof. E. Yaakobi, G. Yadgar, Prof. A. Schuster
Erasure codes protect data in large-scale data centers against multiple concurrent
failures. However, in the frequent case of a single node failure, the amount of
data that must be read for recovery can be an order of magnitude larger than the
amount of data lost. Some existing codes successfully reduce these recovery costs
but increase the storage overhead considerably. Others, which are theoretically
optimal, minimize the amount of data required for recovery, but incur irregular
I/O patterns that may actually increase overall recovery time and cost. Thus,
while the theoretical results in this context continue to improve, many of them
are inapplicable to realistic system settings, and their benefit remains
theoretical as well.
This gap between theory and practice has been observed in previous studies that
applied theoretically optimal techniques to real systems. In this work, we
present a novel system-level approach to bridging this gap in the context of
reducing recovery costs. We optimize the sequentiality of the data read, at the
cost of a minor increase in its amount. We use Zigzag, a family of erasure codes
with minimal overhead and optimal recovery, and trade its theoretical optimality
for real performance gains. Our implementation of Zigzag and its optimizations
in Ceph reduces recovery costs with two, three and four parity nodes, for large
and small objects alike. We were able to cut down recovery time by up to 20%
compared to that of Reed-Solomon, and to reduce the amount of data read and
transferred by 18% to 37%.
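
The read amplification described above can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions, not the implementation discussed in the talk: it assumes k data nodes and r parity nodes, that Reed-Solomon rebuilds one lost chunk by reading k whole surviving chunks, and that an optimal-recovery code such as Zigzag reads a 1/r fraction of each of the k + r - 1 surviving chunks. The function names and the choice of k = 10 are hypothetical. The measured savings reported above come from the Ceph implementation and are not reproduced by this calculation.

```python
from fractions import Fraction


def rs_repair_reads(k: int) -> Fraction:
    """Chunks read to rebuild one lost chunk with Reed-Solomon:
    k whole surviving chunks (assumed conventional decoding)."""
    return Fraction(k)


def optimal_repair_reads(k: int, r: int) -> Fraction:
    """Chunks read at the optimal rebuilding ratio 1/r:
    each of the k + r - 1 surviving nodes contributes 1/r of its chunk."""
    return Fraction(k + r - 1, r)


if __name__ == "__main__":
    k = 10  # hypothetical number of data nodes, chosen for illustration
    for r in (2, 3, 4):  # parity counts mentioned in the abstract
        rs = rs_repair_reads(k)
        opt = optimal_repair_reads(k, r)
        saving = 1 - opt / rs  # theoretical reduction in data read
        print(f"k={k}, r={r}: Reed-Solomon reads {rs} chunks, "
              f"optimal recovery reads {float(opt):.2f} chunks "
              f"({float(saving):.1%} less)")
```

The 1/r fraction is usually scattered across each chunk rather than stored contiguously, which is the irregular I/O pattern that the approach above trades against a small increase in the amount of data read.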