Date of Award
2008
Degree Type
Thesis
Degree Name
Doctor of Philosophy
Program
Computer Science
Supervisor
Dr. Michael Bauer
Abstract
To use a network of high performance computing clusters more effectively, allocating multi-process jobs across multiple connected clusters becomes an attractive possibility. This allocation process, which we refer to as co-allocation, entails dividing the processes of a job among several clusters. Co-allocation offers the possibility of more efficient use of computer resources, reduced turn-around time, and computations that use more processors than are available on any single cluster. Realizing these possibilities ultimately depends on the inter-cluster communication cost. In this thesis, we introduce a scalable co-allocation strategy called the Maximum Bandwidth Adjacent cluster Set (MBAS) strategy. The strategy makes use of two thresholds to control allocation: one to control the bandwidth levels on inter-cluster communication links and another to control how jobs are split. To evaluate the performance of the proposed strategy, a simulator that can simulate the dynamic behavior of jobs running across multiple clusters has also been developed and validated in this research. The simulation results indicate that by adjusting the thresholds for link saturation level control and chunk size control in splitting jobs, the MBAS co-allocation strategy can significantly improve both users’ satisfaction and system utilization. However, the situation is more complicated in reality, as the mix of communication patterns can vary. Being able to dynamically adjust the thresholds may provide a more effective approach to co-allocation. In the thesis we introduce the Adaptive Threshold Control System (ATCS). Based on fuzzy logic, ATCS can adjust the thresholds dynamically according to system states and jobs’ characteristics. The simulation results suggest that using ATCS during MBAS job co-allocation improves overall performance beyond what static thresholds alone achieve. Moreover, this improvement is much more tolerant of changes in job communication requirements, which are a problem when using static thresholds. In addition, ATCS provides the flexibility to enable a system to be tuned to achieve a more expressive co-allocation control in practice.
Recommended Citation
Qin, Jinhui, "Job Co-Allocation Strategies in Multiple HPC Clusters" (2008). Digitized Theses. 4904.
https://ir.lib.uwo.ca/digitizedtheses/4904