A multivariate decision tree algorithm to mine imbalanced data

Cheng Jung Tsai, Chien I. Lee, Chiu Ting Chen, Wei Pang Yang

Research output: Contribution to journal, Article

3 Citations (Scopus)

Abstract

The class imbalance problem is an important issue in classification in data mining. Existing approaches have several drawbacks: some modify the class distribution of the original data, which can increase the computational burden or discard useful information; some are limited to specific datasets or apply only to datasets with numeric attributes; some require long training times because of the nature of their core techniques, such as neural networks; and some depend on a threshold that is hard to set when the user is unfamiliar with the domain. In this paper, we propose the HIerarchical Shrinking decision Tree (HIS-Tree) algorithm to address these problems. HIS-Tree uses a multivariate test derived from a geometric mean measure as its splitting criterion to group minority examples together. In this way, HIS-Tree avoids discovering rules dominated by the majority examples. The experiments show that HIS-Tree predicts minority (interesting) examples more accurately.
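The abstract does not define its geometric mean measure. As a hedged illustration only, the sketch below uses the geometric mean that is common in the imbalanced-learning literature (the square root of minority-class recall times majority-class recall); this is an assumption, not necessarily the measure HIS-Tree uses, and the function names `g_mean` and `accuracy` are hypothetical. It shows why plain accuracy rewards rules dominated by the majority class while a geometric mean does not.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def g_mean(y_true, y_pred, minority=1):
    """Geometric mean of minority-class recall and majority-class recall.

    Assumed definition (common in imbalance literature); the paper's own
    measure may differ.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data: 95 majority examples (label 0) and 5 minority examples (label 1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100

# A classifier that finds 4 of 5 minority examples at the cost of
# 10 false alarms among the majority examples.
minority_aware = [1] * 10 + [0] * 85 + [1] * 4 + [0]

for name, pred in [("always_majority", always_majority),
                   ("minority_aware", minority_aware)]:
    print(name, round(accuracy(y_true, pred), 3), round(g_mean(y_true, pred), 3))
```

On this toy data the always-majority classifier has the higher accuracy but a geometric mean of zero, because it never identifies a minority example.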

Original language: English
Pages (from-to): 50-58
Number of pages: 9
Journal: WSEAS Transactions on Information Science and Applications
Volume: 4
Issue number: 1
Publication status: Published - 2007 Jan 1

Fingerprint

Decision trees
Data mining
Neural networks
Experiments

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications

Cite this

Tsai, Cheng Jung ; Lee, Chien I. ; Chen, Chiu Ting ; Yang, Wei Pang. / A multivariate decision tree algorithm to mine imbalanced data. In: WSEAS Transactions on Information Science and Applications. 2007 ; Vol. 4, No. 1. pp. 50-58.
@article{fe1434be7dfd4b319b16c12d8e53237c,
title = "A multivariate decision tree algorithm to mine imbalanced data",
author = "Tsai, {Cheng Jung} and Lee, {Chien I.} and Chen, {Chiu Ting} and Yang, {Wei Pang}",
year = "2007",
month = "1",
day = "1",
language = "English",
volume = "4",
pages = "50--58",
journal = "WSEAS Transactions on Information Science and Applications",
issn = "1790-0832",
publisher = "World Scientific and Engineering Academy and Society",
number = "1",

}

TY - JOUR

T1 - A multivariate decision tree algorithm to mine imbalanced data

AU - Tsai, Cheng Jung

AU - Lee, Chien I.

AU - Chen, Chiu Ting

AU - Yang, Wei Pang

PY - 2007/1/1

Y1 - 2007/1/1

UR - http://www.scopus.com/inward/record.url?scp=37849185887&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=37849185887&partnerID=8YFLogxK

M3 - Article

VL - 4

SP - 50

EP - 58

JO - WSEAS Transactions on Information Science and Applications

JF - WSEAS Transactions on Information Science and Applications

SN - 1790-0832

IS - 1

ER -