2012.A.2.6 Distributed Computing on Cubesat Clusters using MapReduce

Author(s)

Obulapathi N. Challa (1) and Janise McNair (1)

The University of Florida

Session

A.2 – Technology – Communications, Planning, Operations, and Computing Issues for Interplanetary CubeSat Missions

Keywords

CubeSat, Cluster, Parallel Processing, Map, Reduce

Abstract

The weight, volume, dimensions, and power generation capability of a CubeSat are constrained to 1000 grams, 1 liter, 10 x 10 x 10 cm, and 2.5 watts, respectively. These constraints severely limit the storage, processing, and communication capabilities of individual CubeSats. A typical CubeSat has about 1 GB of memory, 25 MIPS of processing capability at 25 MHz, and a 9.8 Kbps communication link. Emerging applications for CubeSats, such as remote sensing, will require more storage, processing, and communication capability. For interplanetary CubeSat missions these constraints pose even greater problems, because connectivity with ground stations will be very limited, intermittent, and costly. This paper proposes CubeSat MapReduce (CMR), which pools processing resources among the CubeSats in a cluster to speed up missions that require processing of image, video, and other multimedia data.

Inspired by the map and reduce primitives of Lisp, Google introduced the MapReduce framework, which is now widely used for processing vast amounts of data in parallel on large clusters of compute nodes. Map takes a function and a sub-problem as input and applies the function to the sub-problem to generate a sub-solution. Reduce stitches these sub-solutions into a full solution.
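The two primitives can be illustrated with a minimal Python sketch. This is not the CMR implementation; the helper names `map_phase` and `reduce_phase` and the toy row-sum workload are assumptions made only for illustration.

```python
from functools import reduce

def map_phase(func, sub_problems):
    """Apply func to each sub-problem independently; one sub-solution per input."""
    return [func(sub) for sub in sub_problems]

def reduce_phase(combine, sub_solutions, initial):
    """Stitch the sub-solutions into a single full solution."""
    return reduce(combine, sub_solutions, initial)

# Hypothetical workload: total brightness of a tiny 3 x 3 "image",
# where each row is treated as one sub-problem.
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
row_sums = map_phase(sum, rows)                        # one partial sum per row
total = reduce_phase(lambda a, b: a + b, row_sums, 0)  # combine partial sums
```

Because each call to `func` depends only on its own sub-problem, the map phase can run on separate nodes without coordination; only the reduce phase needs all sub-solutions in one place.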

Each CubeSat cluster has a master node that orchestrates CMR. All CubeSats in the cluster other than the master are pooled to form a worker pool. The master node splits the data set to be processed into a large number of fixed-size blocks called chunks. Chunks are distributed to and scheduled for execution on worker nodes to generate sub-solutions. As worker nodes finish jobs, they are assigned new ones. Once all the chunks are processed, the sub-solutions are sent to the master node, which stitches them together.
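The master/worker flow described above can be sketched roughly as follows. This is a single-machine approximation under stated assumptions: a thread pool stands in for the worker CubeSats, the function names (`split_into_chunks`, `run_cmr`, `invert_chunk`) are hypothetical, and the last chunk is allowed to be shorter than the fixed chunk size. It does not model the radio links or CubeNet.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, chunk_size):
    """Master step: cut the data set into fixed-size chunks (last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def run_cmr(data, chunk_size, num_workers, map_func, stitch):
    """Distribute chunks to a worker pool, then stitch the sub-solutions."""
    chunks = split_into_chunks(data, chunk_size)
    # Each pool thread stands in for one worker CubeSat. A worker that
    # finishes a chunk picks up the next pending one automatically.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in chunk order, which keeps stitching simple.
        sub_solutions = list(pool.map(map_func, chunks))
    return stitch(sub_solutions)

def invert_chunk(chunk):
    """Hypothetical per-chunk job: invert 8-bit pixel values."""
    return bytes(255 - b for b in chunk)

image = bytes(range(10))  # stand-in for raw image data
result = run_cmr(image, chunk_size=4, num_workers=4,
                 map_func=invert_chunk, stitch=b"".join)
```

Using many small chunks rather than one chunk per worker lets faster or less loaded workers take on more jobs, which is the load-balancing effect the scheduling description relies on.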

We simulated CMR using CubeNet, a Python-based CubeSat cluster simulator. Our simulations indicate that CMR, with cluster sizes in the range of 5-25 CubeSats, can process images and videos about 4-20 times faster than an individual CubeSat, with a power overhead of 250 mW/node/Mb (cluster size of 5) to 140 mW/node/Mb (cluster size of 25). CMR has a negligible memory overhead (< 0.01%) for storing metadata. These results indicate that CMR can speed up image and video processing missions by a factor of 4-20. Future work will include a distributed Reduce operation and integration with cluster-based distributed file systems, along with locality optimization, to improve the scalability and power efficiency of CMR.
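One way to read the reported speedups is as parallel efficiency (speedup divided by cluster size). The sketch below assumes that the endpoints of the two ranges correspond, that is, roughly 4x at 5 CubeSats and 20x at 25 CubeSats; the abstract gives only the ranges, so this pairing is an assumption and not a reported result.

```python
def parallel_efficiency(speedup, cluster_size):
    """Fraction of ideal linear speedup achieved by the cluster."""
    return speedup / cluster_size

# Assumed pairing of range endpoints: {cluster size: speedup}.
assumed_speedups = {5: 4.0, 25: 20.0}
efficiencies = {n: parallel_efficiency(s, n) for n, s in assumed_speedups.items()}
```

Under that assumption, both endpoints give an efficiency of about 0.8, meaning the speedup would scale close to linearly across the simulated cluster sizes.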