A fine-grained scheduling strategy for improving the performance of parallel frequent itemsets mining

Chao Chin Wu, Lien Fu Lai, Liang Tsung Huang, Syun Sheng Jhan, Chung Lu

Research output: Contribution to journalArticle

4 Citations (Scopus)

Abstract

We propose a scheduling strategy in this paper to address the load imbalance problem of the distributed parallel apriori (DPA) algorithm published recently. We use fine grained tasks that are derived by dividing the tasks defined by DPA into smaller subtasks. The subtasks will be scheduled by a dynamic self-scheduling scheme for better load balance. Furthermore, we propose two different methods for data transmission from the master to workers. The first one broadcasts all the frequent k-itemsets to all work nodes while the second one transmits only the required data to each individual work node. Experimental results demonstrate the proposed two approaches both outperform DPA. The first one is more suitable for small datasets and the second one provides steadier performance improvement no matter which self-scheduling scheme is adopted.

Original languageEnglish
Pages (from-to)264-274
Number of pages11
JournalInternational Journal of Computational Science and Engineering
Volume6
Issue number4
DOIs
Publication statusPublished - 2011 Dec 1

Fingerprint

Frequent Itemsets
Mining
Scheduling
Parallel algorithms
Apriori Algorithm
Load Balance
Vertex of a graph
Data Transmission
Parallel Algorithms
Data communication systems
Broadcast
Experimental Results
Demonstrate
Strategy

All Science Journal Classification (ASJC) codes

  • Software
  • Modelling and Simulation
  • Hardware and Architecture
  • Computational Mathematics
  • Computational Theory and Mathematics

Cite this

@article{1d38ca9f1559425db177e4fdebb970ac,
title = "A fine-grained scheduling strategy for improving the performance of parallel frequent itemsets mining",
abstract = "We propose a scheduling strategy in this paper to address the load imbalance problem of the distributed parallel apriori (DPA) algorithm published recently. We use fine grained tasks that are derived by dividing the tasks defined by DPA into smaller subtasks. The subtasks will be scheduled by a dynamic self-scheduling scheme for better load balance. Furthermore, we propose two different methods for data transmission from the master to workers. The first one broadcasts all the frequent k-itemsets to all work nodes while the second one transmits only the required data to each individual work node. Experimental results demonstrate the proposed two approaches both outperform DPA. The first one is more suitable for small datasets and the second one provides steadier performance improvement no matter which self-scheduling scheme is adopted.",
author = "Wu, {Chao Chin} and Lai, {Lien Fu} and Huang, {Liang Tsung} and Jhan, {Syun Sheng} and Chung Lu",
year = "2011",
month = "12",
day = "1",
doi = "10.1504/IJCSE.2011.043925",
language = "English",
volume = "6",
pages = "264--274",
journal = "International Journal of Computational Science and Engineering",
issn = "1742-7185",
publisher = "Inderscience Enterprises Ltd",
number = "4",

}

A fine-grained scheduling strategy for improving the performance of parallel frequent itemsets mining. / Wu, Chao Chin; Lai, Lien Fu; Huang, Liang Tsung; Jhan, Syun Sheng; Lu, Chung.

In: International Journal of Computational Science and Engineering, Vol. 6, No. 4, 01.12.2011, p. 264-274.

Research output: Contribution to journalArticle

TY - JOUR

T1 - A fine-grained scheduling strategy for improving the performance of parallel frequent itemsets mining

AU - Wu, Chao Chin

AU - Lai, Lien Fu

AU - Huang, Liang Tsung

AU - Jhan, Syun Sheng

AU - Lu, Chung

PY - 2011/12/1

Y1 - 2011/12/1

N2 - We propose a scheduling strategy in this paper to address the load imbalance problem of the distributed parallel apriori (DPA) algorithm published recently. We use fine grained tasks that are derived by dividing the tasks defined by DPA into smaller subtasks. The subtasks will be scheduled by a dynamic self-scheduling scheme for better load balance. Furthermore, we propose two different methods for data transmission from the master to workers. The first one broadcasts all the frequent k-itemsets to all work nodes while the second one transmits only the required data to each individual work node. Experimental results demonstrate the proposed two approaches both outperform DPA. The first one is more suitable for small datasets and the second one provides steadier performance improvement no matter which self-scheduling scheme is adopted.

AB - We propose a scheduling strategy in this paper to address the load imbalance problem of the distributed parallel apriori (DPA) algorithm published recently. We use fine grained tasks that are derived by dividing the tasks defined by DPA into smaller subtasks. The subtasks will be scheduled by a dynamic self-scheduling scheme for better load balance. Furthermore, we propose two different methods for data transmission from the master to workers. The first one broadcasts all the frequent k-itemsets to all work nodes while the second one transmits only the required data to each individual work node. Experimental results demonstrate the proposed two approaches both outperform DPA. The first one is more suitable for small datasets and the second one provides steadier performance improvement no matter which self-scheduling scheme is adopted.

UR - http://www.scopus.com/inward/record.url?scp=84863181034&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863181034&partnerID=8YFLogxK

U2 - 10.1504/IJCSE.2011.043925

DO - 10.1504/IJCSE.2011.043925

M3 - Article

AN - SCOPUS:84863181034

VL - 6

SP - 264

EP - 274

JO - International Journal of Computational Science and Engineering

JF - International Journal of Computational Science and Engineering

SN - 1742-7185

IS - 4

ER -