Project: Statistical Analysis of Cloud Storage Integration

Team: Dr. Josef Spillner, Johannes Müller, Maximilian Quellmalz, David Apsel

Research institution: TU Dresden

Abstract: This infrastructure access proposal seeks to improve the understanding of measurable and calculable parameters in distributed cloud storage integration on clients. It employs iterative measurements to obtain statistically significant information about all parameters.


The Cloud Storage Lab at TU Dresden runs several connected experiments to ensure a smooth overall storage experience. In particular, we are interested in the long-term stability, coding and transport performance and efficiency of our prototypes with high volumes of data throughput, and in optimal redundancy configuration depending on the estimated availability per provider.

Progress: Availability

Nubisave, our cloud storage controller, ships with a user-friendly configuration GUI. It offers a slider to configure redundancy between 0% and 100%. Unfortunately, there are three issues: (1) If the number of targets is low, adding just a few percent of redundancy only wastes space without contributing to higher availability of data. (2) If the number of targets (n) is high, calculating the average availability takes a long time due to O(2^n) complexity, especially with large k (= significant n). (3) If targets have interdependencies, e.g. Dropbox, which actually uses Amazon S3, the perceived availability is higher than the actual one.
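To illustrate where the exponential cost in issue (2) comes from, here is a minimal sketch of a brute-force availability calculation. It is not Nubisave's actual code. It assumes a simplified model in which each target holds one fragment, targets fail independently, and the data is recoverable whenever at least k of the n targets are reachable. The function name and parameters are hypothetical.

```python
from itertools import product
from math import prod  # requires Python 3.8+


def availability_bruteforce(p, k):
    """Probability that at least k of the n targets are reachable.

    p: list of per-target availabilities in [0, 1] (assumed independent)
    k: minimum number of reachable targets needed to recover the data
    """
    total = 0.0
    # Enumerate every up/down combination of the n targets: 2^n states.
    for state in product((True, False), repeat=len(p)):
        if sum(state) >= k:
            # Probability of exactly this combination occurring.
            total += prod(pi if up else 1.0 - pi for pi, up in zip(p, state))
    return total
```

Every additional target doubles the number of states that the loop visits, which is why a GUI that recalculates on each slider movement becomes sluggish as n grows. Note that the independence assumption is exactly what issue (3) calls into question.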

Motivational Quote: We propose an improvement in the algorithm used to calculate availability for a set of heterogeneous storage nodes, decreasing worst-case and average complexity by two. This is sufficient to calculate availabilities at high speed for configurations with fewer than 40 nodes, which enables the Nubisave GUI to respond fluently to changes in the configuration.
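The improved algorithm itself is not described on this page. As a point of comparison only, the following sketch uses a different, well-known technique: a dynamic-programming recurrence over the number of reachable nodes (the Poisson binomial distribution). It assumes independent nodes, one equally sized fragment per node, and recoverability from any k nodes; these assumptions may not hold for Nubisave's configurations. The function name is hypothetical.

```python
def availability_dp(p, k):
    """Probability that at least k of the nodes are reachable.

    p: list of per-node availabilities in [0, 1] (assumed independent)
    k: minimum number of reachable nodes needed to recover the data
    """
    # dist[j] = probability that exactly j of the nodes seen so far are up.
    dist = [1.0]
    for pi in p:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - pi)   # this node is down
            nxt[j + 1] += q * pi       # this node is up
        dist = nxt
    return sum(dist[k:])
```

Under these assumptions the work grows quadratically with the number of nodes, so a call such as `availability_dp([0.99] * 40, 30)` returns immediately. Configurations with unequal fragment sizes per node or with interdependent providers would need a different model.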

Progress: Transport performance

Cloudfusion is a FUSE filesystem which connects Dropbox, Sugarsync or (in the future) additional storage services to the local machine. Nubisave uses it as one of several possible transports.

On the HPI FSOC server (Hewlett Packard DL980 G7 - 1), we ran a couple of experiments to improve the Cloudfusion performance. ql-fstest wrote 237531 MiB at 10.9524 MiB/s and read 728116 MiB at 30.3617 MiB/s.

Comparison between Cloudfusion and other services: Google Storage (with s3fuse) = ?? Mbps, Amazon S3 (with s3fuse) = 0.59 Mbps, T-Online (with davfs2) = 0.59 Mbps, Dropbox (with Cloudfusion, 3 threads) = 2.06 Mbps, Sugarsync (with Cloudfusion) = 0.51 Mbps

All numbers are preliminary.

Last modified on Mar 11, 2014 11:44:20 AM