An efficient and sensitive decision tree approach to mining concept-drifting data streams

Cheng Jung Tsai, Chien I. Lee, Wei Pang Yang

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Data stream mining has emerged as a new research topic of growing interest in knowledge discovery. Most algorithms proposed for data stream mining assume that each data block is essentially a random sample from a stationary distribution, but many available databases violate this assumption. That is, the class of an instance may change over time, a phenomenon known as concept drift. In this paper, we propose the Sensitive Concept Drift Probing Decision Tree algorithm (SCRIPT), which is based on the statistical χ² test, to handle concept drift in data streams. Compared with previously proposed methods, SCRIPT has the following advantages: a) it avoids unnecessary system cost for stable data streams; b) it can immediately and efficiently correct the original classifier when data streams are unstable; c) it is better suited to applications that require sensitive detection of concept drift.
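The abstract says only that SCRIPT is a decision-tree algorithm built on the χ² test; it does not say exactly what is tested or where in the tree. As a minimal sketch of the kind of test involved, assuming drift is probed by comparing the class counts of an earlier data block with those of a newer one (a Pearson χ² test of homogeneity), the code below computes the statistic and its p-value using only the standard library. The function names `chi2_sf`, `chi2_homogeneity` and `drift_detected` are hypothetical, and the significance level α = 0.05 is an assumed default; this is not the authors' algorithm.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the recurrence Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def chi2_homogeneity(old: Sequence[int], new: Sequence[int]) -> tuple[float, int, float]:
    """Pearson chi-square test that two blocks share the same class distribution.

    old[k] and new[k] are the numbers of class-k instances in each block
    (hypothetical input format). Returns (statistic, degrees of freedom, p-value).
    """
    if len(old) != len(new):
        raise ValueError("blocks must use the same class index")
    # Drop classes seen in neither block: they carry no information.
    cols = [(a, b) for a, b in zip(old, new) if a + b > 0]
    n_old = sum(a for a, _ in cols)
    n_new = sum(b for _, b in cols)
    if n_old == 0 or n_new == 0 or len(cols) < 2:
        return 0.0, 0, 1.0                     # nothing to compare
    total = n_old + n_new
    stat = 0.0
    for a, b in cols:
        col = a + b
        for observed, row in ((a, n_old), (b, n_new)):
            expected = row * col / total       # expected count if no drift
            stat += (observed - expected) ** 2 / expected
    df = len(cols) - 1                         # (2 rows - 1) * (C - 1)
    return stat, df, chi2_sf(stat, df)


def drift_detected(old: Sequence[int], new: Sequence[int], alpha: float = 0.05) -> bool:
    """Flag drift only when the class distributions differ significantly at level alpha."""
    _, df, p = chi2_homogeneity(old, new)
    return df > 0 and p < alpha
```

For example, `drift_detected([50, 50], [48, 52])` is False (p ≈ 0.78), while `drift_detected([50, 50], [20, 80])` is True. Reacting only when the test rejects at a chosen significance level is one simple way to leave the model untouched while a stream is stable; a smaller α trades detection sensitivity for fewer false alarms.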

Original language: English
Pages (from-to): 135-156
Number of pages: 22
Journal: Informatica
Volume: 19
Issue number: 1
Publication status: Published - 2008

Fingerprint

Concept Drift
Decision trees
Data Streams
Decision tree
Mining
Statistical tests
Data mining
Classifiers
Tree Algorithms
Costs
Knowledge Discovery
Violate
Statistical test
Stationary Distribution
Immediately
Classifier
Concepts

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Applied Mathematics

Cite this

Tsai, Cheng Jung ; Lee, Chien I. ; Yang, Wei Pang. / An efficient and sensitive decision tree approach to mining concept-drifting data streams. In: Informatica. 2008 ; Vol. 19, No. 1. pp. 135-156.
@article{9066385762244367b920902789414556,
title = "An efficient and sensitive decision tree approach to mining concept-drifting data streams",
abstract = "Data stream mining has emerged as a new research topic of growing interest in knowledge discovery. Most algorithms proposed for data stream mining assume that each data block is essentially a random sample from a stationary distribution, but many available databases violate this assumption. That is, the class of an instance may change over time, a phenomenon known as concept drift. In this paper, we propose the Sensitive Concept Drift Probing Decision Tree algorithm (SCRIPT), which is based on the statistical $\chi^2$ test, to handle concept drift in data streams. Compared with previously proposed methods, SCRIPT has the following advantages: a) it avoids unnecessary system cost for stable data streams; b) it can immediately and efficiently correct the original classifier when data streams are unstable; c) it is better suited to applications that require sensitive detection of concept drift.",
author = "Tsai, {Cheng Jung} and Lee, {Chien I.} and Yang, {Wei Pang}",
year = "2008",
language = "English",
volume = "19",
pages = "135--156",
journal = "Informatica",
issn = "0868-4952",
publisher = "IOS Press",
number = "1",

}


TY - JOUR

T1 - An efficient and sensitive decision tree approach to mining concept-drifting data streams

AU - Tsai, Cheng Jung

AU - Lee, Chien I.

AU - Yang, Wei Pang

PY - 2008

Y1 - 2008

AB - Data stream mining has emerged as a new research topic of growing interest in knowledge discovery. Most algorithms proposed for data stream mining assume that each data block is essentially a random sample from a stationary distribution, but many available databases violate this assumption. That is, the class of an instance may change over time, a phenomenon known as concept drift. In this paper, we propose the Sensitive Concept Drift Probing Decision Tree algorithm (SCRIPT), which is based on the statistical χ² test, to handle concept drift in data streams. Compared with previously proposed methods, SCRIPT has the following advantages: a) it avoids unnecessary system cost for stable data streams; b) it can immediately and efficiently correct the original classifier when data streams are unstable; c) it is better suited to applications that require sensitive detection of concept drift.

UR - http://www.scopus.com/inward/record.url?scp=42149193745&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=42149193745&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:42149193745

VL - 19

SP - 135

EP - 156

JO - Informatica

JF - Informatica

SN - 0868-4952

IS - 1

ER -