Techniques and evaluation of processor co-allocation in multi-cluster systems

Ngubiri, John

PhD Thesis (2.970Mb)

Date

2008-09-15

Abstract

Computer processing power is increasing at a very high rate. A computer considered fast today may not be considered fast in a few years. At the same time, the number and complexity of resource-intensive computer applications are increasing. This calls for bringing together multiple processing units to collectively service competing resource-intensive applications, and for employing good scheduling strategies that offer the maximum possible satisfaction to the owners of the jobs. In this thesis, we study the ways rigid jobs can be scheduled on a multi-cluster system that processes by pure space slicing and allows co-allocation. We study ways user satisfaction can be evaluated in a typical multi-cluster system. To a very large extent, this has been done using the average value of the performance metric. Given the nature of typical supercomputer workloads, jobs have varying resource requirements. This implies that some are more schedulable than others. At the same time, the scheduler may favor some jobs at the expense of others. Studies show that schedulable (small) jobs make up the majority of the jobs but the minority of the load. Schedulable jobs tend to have good performance while unschedulable ones have poor performance. The good performance of the schedulable jobs (which are the majority) makes the average metric value appear impressive. The impressive average metric value does not reveal the poor performance of the majority of the load. We study the differences in performance of the different groups (grouped by size, number of components and width of the widest component) and how the performance varies with changes in scheduler parameters. We also study the relationship between job characteristics and their (approximate) schedulability. We show that schedulability is strongly related to job size and the width of the widest component.
We further show that performance can be improved by partitioning the jobs in such a way that they are more schedulable. Another way we use job schedulability is in prioritization. We use the (approximate) job schedulability to enhance the scheduler prioritization scheme so as to improve the performance of the entire job stream and reduce the performance difference between schedulable and unschedulable jobs. We do this by giving a priority boost to unschedulable jobs on top of the time they have spent in the queue (seniority). We propose the greedy scheduler, which uses the new prioritization approach. We show that so long as the depth and maxJumps values are high enough, the greedy scheduler outperforms and is fairer than the FPFS scheduler. The differences in performance among jobs can be due to differences in job schedulability, scheduler favoritism or discrimination (unfairness), or a combination of the two. Compared to performance and scheduling techniques, fewer studies have been carried out on fairness in parallel job scheduling. We first study characteristics of existing fairness metrics used in parallel job scheduling. We investigate how they imply fairness or unfairness. We find that there are instances where the implied unfairness is not unfairness in practice. The deductions can therefore sometimes be misleading. The misleading deductions are mostly caused by failure to account for differences in the resource requirements of the jobs, differences in job seniority and differences in queue states at the time a job is submitted to the queue. We then propose a new approach to fairness evaluation for parallel job schedulers. Our approach considers the job-wise net benefit of using one scheduler instead of another. This caters for differences in performance that are not due to scheduler discrimination (like differences in resource requirements and traffic).
Broadly, rather than comparing a job to others under the same scheduler, our approach compares a job to itself under different schedulers. Our approach addresses the weaknesses found in the existing approaches. We use the performance and discrimination trends to validate our approach on selected multi-cluster schedulers. Our approach is able to deduce unfair treatment of jobs even if the unfairly treated job is not among the worst performing jobs. Factors like differences in resource requirements among jobs and jobs arriving during peak hours are adequately catered for by our approach as it evaluates scheduler fairness.
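
The job-wise comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual metric or formula, so everything here is an assumption made for illustration: the `JobResult` type, the `net_benefit` and `disadvantaged` helpers, the choice of response time as the metric, and the three made-up jobs. The only idea taken from the text is that each job is compared to itself under two schedulers instead of to other jobs under one scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    """One job's outcome under one scheduler (hypothetical record layout)."""
    job_id: str
    response_time: float  # assumed metric; lower is better


def net_benefit(baseline, candidate):
    """Per-job gain from using `candidate` instead of `baseline`.

    A positive value means the job is better off under `candidate`.
    Jobs missing from either run are skipped.
    """
    base = {r.job_id: r.response_time for r in baseline}
    return {
        r.job_id: base[r.job_id] - r.response_time
        for r in candidate
        if r.job_id in base
    }


def disadvantaged(benefits, tolerance=0.0):
    """Jobs that lose more than `tolerance` by switching schedulers."""
    return sorted(job for job, gain in benefits.items() if gain < -tolerance)


# Made-up numbers, not results from the thesis.
run_a = [JobResult("small", 10.0), JobResult("medium", 40.0), JobResult("large", 300.0)]
run_b = [JobResult("small", 18.0), JobResult("medium", 35.0), JobResult("large", 220.0)]

benefits = net_benefit(run_a, run_b)
print(benefits)                 # {'small': -8.0, 'medium': 5.0, 'large': 80.0}
print(disadvantaged(benefits))  # ['small']
```

In this toy data the small job still has the best response time under the second scheduler, yet it is the only job that is worse off than it was under the first. This is the kind of case the abstract refers to when it says unfair treatment can be deduced even if the affected job is not among the worst performing jobs.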

URI

http://hdl.handle.net/10570/715

Collections

School of Computing and Informatics Technology (CIT) Collection