Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism

Chao Chin Wu, Jenn Yang Ke, Heshan Lin, Wu Chun Feng

Research output: Chapter in Book/Report/Conference proceedingConference contribution

14 Citations (Scopus)

Abstract

Dynamic programming (DP) is an important computational method for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management. In general, DP is classified into four categories based on the characteristics of the optimization equation. Because applications that are classified in the same category of DP have similar program behavior, the research community has sought to propose general solutions for parallelizing each category of DP. However, most existing studies focus on running DP on CPU-based parallel systems rather than on accelerating DP algorithms on the graphics processing unit (GPU). This paper presents the GPU acceleration of an important category of DP problems called nonserial polyadic dynamic programming (NPDP). In NPDP applications, the degree of parallelism varies significantly in different stages of computation, making it difficult to fully utilize the compute power of hundreds of processing cores in a GPU. To address this challenge, we propose a methodology that can adaptively adjust the thread-level parallelism in mapping a NPDP problem onto the GPU, thus providing sufficient and steady degrees of parallelism across different compute stages. We realize our approach in a real-world NPDP application - the optimal matrix parenthesization problem. Experimental results demonstrate our method can achieve a speedup of 13.40 over the previously published GPU algorithm.

Original languageEnglish
Title of host publicationProceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Pages96-103
Number of pages8
DOIs
Publication statusPublished - 2011 Dec 1
Event2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011 - Tainan, Taiwan
Duration: 2011 Dec 72011 Dec 9

Publication series

NameProceedings of the International Conference on Parallel and Distributed Systems - ICPADS
ISSN (Print)1521-9097

Other

Other2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
CountryTaiwan
CityTainan
Period11-12-0711-12-09

Fingerprint

Dynamic programming
Graphics processing unit
Computational methods
Program processors
Packaging
Scheduling

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

Wu, C. C., Ke, J. Y., Lin, H., & Feng, W. C. (2011). Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism. In Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011 (pp. 96-103). [6121265] (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS). https://doi.org/10.1109/ICPADS.2011.92
Wu, Chao Chin ; Ke, Jenn Yang ; Lin, Heshan ; Feng, Wu Chun. / Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism. Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011. 2011. pp. 96-103 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS).
@inproceedings{725d7f563ab3412eb5107e928190b511,
title = "Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism",
abstract = "Dynamic programming (DP) is an important computational method for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management. In general, DP is classified into four categories based on the characteristics of the optimization equation. Because applications that are classified in the same category of DP have similar program behavior, the research community has sought to propose general solutions for parallelizing each category of DP. However, most existing studies focus on running DP on CPU-based parallel systems rather than on accelerating DP algorithms on the graphics processing unit (GPU). This paper presents the GPU acceleration of an important category of DP problems called nonserial polyadic dynamic programming (NPDP). In NPDP applications, the degree of parallelism varies significantly in different stages of computation, making it difficult to fully utilize the compute power of hundreds of processing cores in a GPU. To address this challenge, we propose a methodology that can adaptively adjust the thread-level parallelism in mapping a NPDP problem onto the GPU, thus providing sufficient and steady degrees of parallelism across different compute stages. We realize our approach in a real-world NPDP application - the optimal matrix parenthesization problem. Experimental results demonstrate our method can achieve a speedup of 13.40 over the previously published GPU algorithm.",
author = "Wu, {Chao Chin} and Ke, {Jenn Yang} and Heshan Lin and Feng, {Wu Chun}",
year = "2011",
month = "12",
day = "1",
doi = "10.1109/ICPADS.2011.92",
language = "English",
isbn = "9780769545769",
series = "Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS",
pages = "96--103",
booktitle = "Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011",

}

Wu, CC, Ke, JY, Lin, H & Feng, WC 2011, Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism. in Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011., 6121265, Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, pp. 96-103, 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011, Tainan, Taiwan, 11-12-07. https://doi.org/10.1109/ICPADS.2011.92

Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism. / Wu, Chao Chin; Ke, Jenn Yang; Lin, Heshan; Feng, Wu Chun.

Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011. 2011. p. 96-103 6121265 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism

AU - Wu, Chao Chin

AU - Ke, Jenn Yang

AU - Lin, Heshan

AU - Feng, Wu Chun

PY - 2011/12/1

Y1 - 2011/12/1

N2 - Dynamic programming (DP) is an important computational method for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management. In general, DP is classified into four categories based on the characteristics of the optimization equation. Because applications that are classified in the same category of DP have similar program behavior, the research community has sought to propose general solutions for parallelizing each category of DP. However, most existing studies focus on running DP on CPU-based parallel systems rather than on accelerating DP algorithms on the graphics processing unit (GPU). This paper presents the GPU acceleration of an important category of DP problems called nonserial polyadic dynamic programming (NPDP). In NPDP applications, the degree of parallelism varies significantly in different stages of computation, making it difficult to fully utilize the compute power of hundreds of processing cores in a GPU. To address this challenge, we propose a methodology that can adaptively adjust the thread-level parallelism in mapping a NPDP problem onto the GPU, thus providing sufficient and steady degrees of parallelism across different compute stages. We realize our approach in a real-world NPDP application - the optimal matrix parenthesization problem. Experimental results demonstrate our method can achieve a speedup of 13.40 over the previously published GPU algorithm.

AB - Dynamic programming (DP) is an important computational method for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management. In general, DP is classified into four categories based on the characteristics of the optimization equation. Because applications that are classified in the same category of DP have similar program behavior, the research community has sought to propose general solutions for parallelizing each category of DP. However, most existing studies focus on running DP on CPU-based parallel systems rather than on accelerating DP algorithms on the graphics processing unit (GPU). This paper presents the GPU acceleration of an important category of DP problems called nonserial polyadic dynamic programming (NPDP). In NPDP applications, the degree of parallelism varies significantly in different stages of computation, making it difficult to fully utilize the compute power of hundreds of processing cores in a GPU. To address this challenge, we propose a methodology that can adaptively adjust the thread-level parallelism in mapping a NPDP problem onto the GPU, thus providing sufficient and steady degrees of parallelism across different compute stages. We realize our approach in a real-world NPDP application - the optimal matrix parenthesization problem. Experimental results demonstrate our method can achieve a speedup of 13.40 over the previously published GPU algorithm.

UR - http://www.scopus.com/inward/record.url?scp=84863016584&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863016584&partnerID=8YFLogxK

U2 - 10.1109/ICPADS.2011.92

DO - 10.1109/ICPADS.2011.92

M3 - Conference contribution

AN - SCOPUS:84863016584

SN - 9780769545769

T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

SP - 96

EP - 103

BT - Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011

ER -

Wu CC, Ke JY, Lin H, Feng WC. Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism. In Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011. 2011. p. 96-103. 6121265. (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS). https://doi.org/10.1109/ICPADS.2011.92