TY - GEN
T1 - Enhanced parallel loop self-scheduling for heterogeneous multi-core cluster systems
AU - Wu, Chao Chin
AU - Huang, Liang Tsung
AU - Lai, Lien Fu
AU - Chen, Ming Lung
PY - 2009/12/1
Y1 - 2009/12/1
N2 - A growing number of studies have investigated how to handle heterogeneity in cluster systems composed of multi-core computing nodes. We previously proposed a hybrid MPI and OpenMP based loop self-scheduling approach for this kind of system, in which the allocation functions of several well-known schemes were modified for better performance. Although the previous approach improves system performance significantly, in this paper we present how to enhance the speedup further. First, we exploit thread-level parallelism on the multi-core master node. Second, we investigate how to design a loop self-scheduling scheme that assigns a proper chunk size according to each node's performance. At the beginning of dispatching, we prevent slow slaves from being assigned too many tasks. At the end of dispatching, the master avoids assigning too many small chunks to slaves. Experimental results show that our approach achieves a best speedup of 1.35.
AB - A growing number of studies have investigated how to handle heterogeneity in cluster systems composed of multi-core computing nodes. We previously proposed a hybrid MPI and OpenMP based loop self-scheduling approach for this kind of system, in which the allocation functions of several well-known schemes were modified for better performance. Although the previous approach improves system performance significantly, in this paper we present how to enhance the speedup further. First, we exploit thread-level parallelism on the multi-core master node. Second, we investigate how to design a loop self-scheduling scheme that assigns a proper chunk size according to each node's performance. At the beginning of dispatching, we prevent slow slaves from being assigned too many tasks. At the end of dispatching, the master avoids assigning too many small chunks to slaves. Experimental results show that our approach achieves a best speedup of 1.35.
UR - http://www.scopus.com/inward/record.url?scp=77949782929&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77949782929&partnerID=8YFLogxK
U2 - 10.1109/I-SPAN.2009.38
DO - 10.1109/I-SPAN.2009.38
M3 - Conference contribution
AN - SCOPUS:77949782929
SN - 9780769539089
T3 - I-SPAN 2009 - The 10th International Symposium on Pervasive Systems, Algorithms, and Networks
SP - 568
EP - 573
BT - I-SPAN 2009 - The 10th International Symposium on Pervasive Systems, Algorithms, and Networks
T2 - 10th International Symposium on Pervasive Systems, Algorithms, and Networks, I-SPAN 2009
Y2 - 14 December 2009 through 16 December 2009
ER -