Performance-based parallel loop self-scheduling on heterogeneous multicore PC clusters

Chao Tung Yang, Jen Hsiang Chang, Chao-Chin Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In recent years, multicore computers have been widely included in cluster systems. They adopt shared-memory architectures. However, previous research on parallel loop self-scheduling did not take the characteristics of multicore computers into account. OpenMP is a more suitable choice for parallel programming on shared-memory multiprocessors. In this paper, we propose a performance-based approach that partitions loop iterations according to the performance weighting of cluster nodes. Because the iterations assigned to one MPI process are processed in parallel by OpenMP threads running on the processor cores of the same computational node, the number of loop iterations allocated to a computational node at each scheduling step also depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes.
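The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, under stated assumptions: each node's weight is taken to be a hypothetical per-core benchmark score multiplied by its core count, and at each scheduling step a fixed fraction (`alpha`, also an assumption) of the remaining iterations is divided according to that weight. The names `Node`, `next_chunk`, and `simulate` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    core_score: float  # hypothetical per-core benchmark score (higher is faster)
    cores: int         # number of processor cores in the node

    @property
    def weight(self) -> float:
        # Assumed weighting: per-core score scaled by the core count,
        # since the node's OpenMP threads share the assigned chunk.
        return self.core_score * self.cores


def next_chunk(remaining: int, node: Node, nodes: list[Node], alpha: float = 0.5) -> int:
    """Iterations handed to `node` at one scheduling step.

    Assumption: a fraction `alpha` of the remaining iterations is available
    at each step, and the node receives its weight share of that amount,
    with a minimum of one iteration while any work remains.
    """
    if remaining <= 0:
        return 0
    total_weight = sum(n.weight for n in nodes)
    share = node.weight / total_weight
    chunk = int(remaining * alpha * share)
    return max(1, min(chunk, remaining))


def simulate(total_iterations: int, nodes: list[Node]) -> dict[str, int]:
    """Serve requests in round-robin order until the loop is exhausted.

    Round-robin ordering is a simplification; in a real master/worker
    scheduler, faster nodes would return and request new chunks sooner.
    """
    assigned = {n.name: 0 for n in nodes}
    remaining = total_iterations
    while remaining > 0:
        for node in nodes:
            if remaining == 0:
                break
            chunk = next_chunk(remaining, node, nodes)
            assigned[node.name] += chunk
            remaining -= chunk
    return assigned


if __name__ == "__main__":
    cluster = [Node("quad", 1.0, 4), Node("dual", 1.0, 2)]
    print(simulate(1000, cluster))
```

With two nodes of equal per-core score, the quad-core node receives larger chunks than the dual-core node at every step, and the chunk sizes shrink as the loop nears completion, which is what limits load imbalance at the end of the run.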

Original language: English
Title of host publication: High Performance Computing and Applications - Second International Conference, HPCA 2009, Revised Selected Papers
Pages: 509-514
Number of pages: 6
DOI: https://doi.org/10.1007/978-3-642-11842-5_71
Publication status: Published - 2010 May 3
Event: 2nd International Conference on High-Performance Computing and Applications, HPCA 2009 - Shanghai, China
Duration: 2009 Aug 10 to 2009 Aug 12

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5938 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd International Conference on High-Performance Computing and Applications, HPCA 2009
Country: China
City: Shanghai
Period: 2009-08-10 to 2009-08-12

Fingerprint

PC Cluster
Scheduling
Memory architecture
Parallel programming
OpenMP
Vertex of a graph
Iteration
Computer systems
Data storage equipment
Shared-memory multiprocessors
Parallel Programming
Shared Memory
Thread
Weighting
Partition
Experimental Results

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Yang, C. T., Chang, J. H., & Wu, C-C. (2010). Performance-based parallel loop self-scheduling on heterogeneous multicore PC clusters. In High Performance Computing and Applications - Second International Conference, HPCA 2009, Revised Selected Papers (pp. 509-514). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5938 LNCS). https://doi.org/10.1007/978-3-642-11842-5_71
@inproceedings{f0deea07792c45309cb650fe77f2953f,
  title = "Performance-based parallel loop self-scheduling on heterogeneous multicore PC clusters",
  abstract = "In recent years, multicore computers have been widely included in cluster systems. They adopt shared-memory architectures. However, previous research on parallel loop self-scheduling did not take the characteristics of multicore computers into account. OpenMP is a more suitable choice for parallel programming on shared-memory multiprocessors. In this paper, we propose a performance-based approach that partitions loop iterations according to the performance weighting of cluster nodes. Because the iterations assigned to one MPI process are processed in parallel by OpenMP threads running on the processor cores of the same computational node, the number of loop iterations allocated to a computational node at each scheduling step also depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes.",
  author = "Yang, {Chao Tung} and Chang, {Jen Hsiang} and Wu, Chao-Chin",
  year = "2010",
  month = "5",
  day = "3",
  doi = "10.1007/978-3-642-11842-5_71",
  language = "English",
  isbn = "3642118410",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  pages = "509--514",
  booktitle = "High Performance Computing and Applications - Second International Conference, HPCA 2009, Revised Selected Papers"
}


TY - GEN
T1 - Performance-based parallel loop self-scheduling on heterogeneous multicore PC clusters
AU - Yang, Chao Tung
AU - Chang, Jen Hsiang
AU - Wu, Chao-Chin
PY - 2010/5/3
Y1 - 2010/5/3
AB - In recent years, multicore computers have been widely included in cluster systems. They adopt shared-memory architectures. However, previous research on parallel loop self-scheduling did not take the characteristics of multicore computers into account. OpenMP is a more suitable choice for parallel programming on shared-memory multiprocessors. In this paper, we propose a performance-based approach that partitions loop iterations according to the performance weighting of cluster nodes. Because the iterations assigned to one MPI process are processed in parallel by OpenMP threads running on the processor cores of the same computational node, the number of loop iterations allocated to a computational node at each scheduling step also depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes.
UR - http://www.scopus.com/inward/record.url?scp=77951532714&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951532714&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-11842-5_71
DO - 10.1007/978-3-642-11842-5_71
M3 - Conference contribution
AN - SCOPUS:77951532714
SN - 3642118410
SN - 9783642118418
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 509
EP - 514
BT - High Performance Computing and Applications - Second International Conference, HPCA 2009, Revised Selected Papers
ER -
