2013.B.1.2 Distributed Data Storage on CubeSat Clusters
Obulapathi N Challa (1), Janise McNair (1)
(1) University of Florida, USA
CubeSat Clusters, Distributed File System
Small satellites have limited storage, processing and communication capacity, so these functions must be distributed across a cluster. In our previous work, we developed CubeSat MapReduce and CubeSat Torrent, which address the problems of distributed processing and distributed communication for CubeSat clusters. The CubeSat Distributed File System (CDFS) is designed to store large files and to facilitate large-scale data processing and distributed communication on small satellite clusters. CDFS addresses the goals of performance, scalability, reliability and availability, and has been designed around the constraints of small satellite clusters: an unreliable wireless network and limited power and bandwidth.
A CubeSat cluster consists of a master CubeSat and several slave CubeSats. The sensing node plays the master role. When the master node wants to store a file on the cluster, the file is divided into fixed-size chunks, which are distributed to the slaves. Each chunk is addressed by a unique, immutable “chunk id”. The master node stores and maintains the file system metadata, which includes information about each file, its constituent chunks, and the location of these chunks on the slaves. Slave CubeSats store chunks as normal files. The default chunk size is 64 kB and is configurable.
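The chunking and metadata bookkeeping described above can be sketched as follows. This is a minimal illustration, not the CDFS implementation: the class name `MasterMetadata`, the `chunk-NNNNNNNN` id format, and the round-robin placement of chunks on slaves are all assumptions made for the example, since the abstract does not specify them.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024  # default chunk size stated in the abstract (64 kB)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Divide a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


@dataclass
class MasterMetadata:
    """Hypothetical in-memory metadata kept by the master node."""
    slaves: list[str]
    file_chunks: dict[str, list[str]] = field(default_factory=dict)   # file -> ordered chunk ids
    chunk_location: dict[str, str] = field(default_factory=dict)      # chunk id -> assigned slave
    next_id: int = 0

    def store_file(self, name: str, data: bytes,
                   chunk_size: int = CHUNK_SIZE) -> dict[str, bytes]:
        """Record metadata for a file and return {chunk_id: payload} to transmit."""
        outgoing: dict[str, bytes] = {}
        self.file_chunks[name] = []
        for chunk in split_into_chunks(data, chunk_size):
            # Assumed id scheme: a counter that is never reused, so ids stay
            # unique and immutable.
            chunk_id = f"chunk-{self.next_id:08d}"
            # Assumed placement policy: simple round-robin over the slaves.
            slave = self.slaves[self.next_id % len(self.slaves)]
            self.next_id += 1
            self.file_chunks[name].append(chunk_id)
            self.chunk_location[chunk_id] = slave
            outgoing[chunk_id] = chunk
        return outgoing
```

For instance, calling `MasterMetadata(["s1", "s2", "s3"]).store_file("image.raw", payload)` with a 150 kB payload would register three chunks (two full and one partial) and assign one to each slave under the assumed round-robin policy.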
For reliability, each chunk is replicated on three slaves. When the master node transmits a chunk to its assigned slave, the chunk passes through multiple nodes, which copy it while it is being transmitted. This way, replicas are created without incurring additional energy or bandwidth consumption. The master node periodically pings the slaves with heartbeat messages to collect their state. If a slave fails, the chunks stored on that slave are replicated on other slaves. Using a single master simplifies the design, and the large chunk size reduces the total amount of metadata and control traffic.
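The failure-handling logic can be illustrated with a short sketch. It assumes a timeout-based reading of heartbeats (a slave is treated as failed if no heartbeat arrived within `timeout` seconds) and picks replacement slaves in list order; both choices, along with the function names, are hypothetical and not taken from CDFS. The copy-while-forwarding replication path is not modeled here.

```python
REPLICAS = 3  # replication factor stated in the abstract


def find_failed(last_seen: dict[str, float], now: float, timeout: float) -> set[str]:
    """Return slaves whose most recent heartbeat is older than `timeout` seconds."""
    return {slave for slave, t in last_seen.items() if now - t > timeout}


def plan_rereplication(replicas: dict[str, set[str]], failed: set[str],
                       alive: list[str], target: int = REPLICAS) -> dict[str, set[str]]:
    """Return an updated chunk -> holders map that restores `target` replicas.

    Replacement slaves are chosen in list order from `alive` (an assumed policy).
    """
    plan: dict[str, set[str]] = {}
    for chunk_id, holders in replicas.items():
        survivors = holders - failed
        missing = max(target - len(survivors), 0)
        # Only slaves that do not already hold the chunk are candidates.
        candidates = [s for s in alive if s not in survivors]
        plan[chunk_id] = survivors | set(candidates[:missing])
    return plan
```

If fewer live slaves remain than the target replication factor, this sketch simply places as many replicas as it can; how CDFS handles that case is not stated in the abstract.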
We simulated CDFS using CubeNet, a Python CubeSat network simulator. Our simulation results indicate that CDFS, with cluster sizes in the range of 5 – 25 CubeSats, enables 3.5 – 22 times faster processing and downlinking of images and videos than a single CubeSat, using CubeSat MapReduce and CubeSat Torrent. Even with an individual CubeSat failure rate of 10%, the data is available 99.95% of the time. Metadata overhead is minimal (< 0.1%) for large files (100 Mb – 5 Gb), and the amount of metadata increases almost linearly with file size. Our future work will include porting CDFS to the Raspberry Pi platform and emulating it on a Raspberry Pi cluster.
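To see why metadata overhead stays small and grows almost linearly with file size, consider a rough back-of-the-envelope sketch. The per-chunk and per-file record sizes below (48 and 256 bytes) are purely illustrative assumptions, not figures from the simulation; only the 64 kB chunk size comes from the abstract.

```python
CHUNK_SIZE = 64 * 1024       # default chunk size from the abstract
PER_CHUNK_RECORD = 48        # assumed bytes of metadata per chunk (id + location)
PER_FILE_RECORD = 256        # assumed bytes of metadata per file (name, size, etc.)


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed for a file, rounding the last chunk up."""
    return -(-file_size // chunk_size)


def metadata_bytes(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Metadata grows with the chunk count, hence almost linearly with file size."""
    return PER_FILE_RECORD + PER_CHUNK_RECORD * chunk_count(file_size, chunk_size)


def overhead_fraction(file_size: int, chunk_size: int = CHUNK_SIZE) -> float:
    """Metadata size as a fraction of the file size."""
    return metadata_bytes(file_size, chunk_size) / file_size
```

Under these assumed record sizes, a 100 MiB file splits into 1600 chunks and the overhead fraction stays below 0.1%. The fixed per-file record is what makes the growth "almost" rather than exactly linear.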