A discretization algorithm based on Class-Attribute Contingency Coefficient

Cheng Jung Tsai, Chien I. Lee, Wei Pang Yang

Research output: Contribution to journal (Article)

134 Citations (Scopus)

Abstract

Discretization algorithms have played an important role in data mining and knowledge discovery. They not only produce a concise summarization of continuous attributes to help the experts understand the data more easily, but also make learning more accurate and faster. In this paper, we propose a static, global, incremental, supervised and top-down discretization algorithm based on Class-Attribute Contingency Coefficient. Empirical evaluation of seven discretization algorithms on 13 real datasets and four artificial datasets showed that the proposed algorithm could generate a better discretization scheme that improved the accuracy of classification. As to the execution time of discretization, the number of generated rules, and the training time of C5.0, our approach also achieved promising results.
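To make the idea of a supervised, top-down discretizer driven by a contingency coefficient concrete, here is a minimal sketch. The abstract does not give the paper's exact criterion, so this sketch substitutes the plain Pearson contingency coefficient, C = sqrt(chi2 / (chi2 + N)), computed on the class-by-interval table. The function names `contingency_coefficient` and `discretize`, the midpoint candidate cuts, and the stopping tolerance are all assumptions made for illustration, not the authors' algorithm.

```python
import math
from collections import Counter


def contingency_coefficient(values, labels, cuts):
    """Pearson's C for the class-by-interval table induced by `cuts`.

    Stand-in criterion (assumption): the paper's own coefficient is not
    specified in the abstract. Assumes a non-empty dataset.
    """
    n = len(values)
    bounds = sorted(cuts)

    def interval(v):
        # Index of the interval containing v: number of cut points below v.
        return sum(1 for c in bounds if v > c)

    cells = Counter((lab, interval(v)) for v, lab in zip(values, labels))
    rows = Counter(labels)                       # class totals
    cols = Counter(interval(v) for v in values)  # interval totals
    # Identity: chi2 = N * (sum q_ir^2 / (row_i * col_r) - 1)
    s = sum(q * q / (rows[i] * cols[r]) for (i, r), q in cells.items())
    chi2 = max(0.0, n * (s - 1.0))  # clamp tiny negative rounding error
    return math.sqrt(chi2 / (chi2 + n))


def discretize(values, labels, tol=1e-9):
    """Greedy top-down search: start with one interval and repeatedly add
    the candidate cut that raises the coefficient most; stop when no cut
    improves it by more than `tol` (hypothetical stopping rule)."""
    distinct = sorted(set(values))
    # Candidate cuts: midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts, best = [], 0.0
    while candidates:
        score, cut = max(
            (contingency_coefficient(values, labels, cuts + [c]), c)
            for c in candidates
        )
        if score <= best + tol:
            break
        cuts.append(cut)
        candidates.remove(cut)
        best = score
    return sorted(cuts)


if __name__ == "__main__":
    xs = [1, 2, 3, 10, 11, 12]
    ys = ["a", "a", "a", "b", "b", "b"]
    print(discretize(xs, ys))
```

On this toy input the two classes are separated by a single cut between 3 and 10, after which no further cut improves the coefficient, so the search stops with two intervals.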

Original language: English
Pages (from-to): 714-731
Number of pages: 18
Journal: Information Sciences
Volume: 178
Issue number: 3
DOI: 10.1016/j.ins.2007.09.004
Publication status: Published - 2008 Feb 1

Fingerprint

Discretization
Attribute
Coefficient
Data mining
Discretization Scheme
Summarization
Knowledge Discovery
Execution Time
Class
Contingency
Coefficients
Evaluation

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

Tsai, Cheng Jung ; Lee, Chien I. ; Yang, Wei Pang. / A discretization algorithm based on Class-Attribute Contingency Coefficient. In: Information Sciences. 2008 ; Vol. 178, No. 3. pp. 714-731.
@article{ac84400622c5440599c692f851a44ccd,
title = "A discretization algorithm based on Class-Attribute Contingency Coefficient",
author = "Tsai, {Cheng Jung} and Lee, {Chien I.} and Yang, {Wei Pang}",
year = "2008",
month = "2",
day = "1",
doi = "10.1016/j.ins.2007.09.004",
language = "English",
volume = "178",
pages = "714--731",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
number = "3",

}

TY - JOUR

T1 - A discretization algorithm based on Class-Attribute Contingency Coefficient

AU - Tsai, Cheng Jung

AU - Lee, Chien I.

AU - Yang, Wei Pang

PY - 2008/2/1

Y1 - 2008/2/1

UR - http://www.scopus.com/inward/record.url?scp=35748943218&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35748943218&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2007.09.004

DO - 10.1016/j.ins.2007.09.004

M3 - Article

AN - SCOPUS:35748943218

VL - 178

SP - 714

EP - 731

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

IS - 3

ER -