A two-step method for clustering mixed categorical and numeric data

Ming-Yi Shih, Jar Wen Jheng, Lien-Fu Lai

Research output: Contribution to journal › Article

60 Citations (Scopus)

Abstract

Various clustering algorithms have been developed to group data into clusters in diverse domains. However, these algorithms work effectively on either pure numeric data or pure categorical data; most of them perform poorly on mixed categorical and numeric data. In this paper, a new two-step clustering method is presented to find clusters in this kind of data. In this approach, the items in categorical attributes are first processed to construct the similarities or relationships among them, based on the idea of co-occurrence; all categorical attributes are then converted into numeric attributes based on these constructed relationships. Once all categorical data have been converted into numeric data, existing clustering algorithms can be applied to the dataset directly. However, because existing clustering algorithms each have weaknesses, the proposed two-step method integrates hierarchical and partitioning clustering algorithms, adding attributes to the objects being clustered. This method defines the relationships among items and mitigates the weaknesses of applying a single clustering algorithm. Experimental evidence shows that robust results can be achieved by applying this method to cluster mixed numeric and categorical data.
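The conversion step described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the Jaccard-style co-occurrence similarity, the choice of one categorical column as a set of "base" items, the similarity-weighted sum used for the remaining items, and all function and parameter names. It is not the authors' implementation, and the subsequent hierarchical and partitioning clustering steps are not shown.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_similarity(rows):
    """Jaccard-style similarity between categorical items (an assumed measure).

    rows: list of tuples of categorical values, one tuple per object.
    Items are (column_index, value) pairs, so equal strings in different
    columns stay distinct.
    """
    members = defaultdict(set)  # item -> ids of the objects that contain it
    for obj_id, row in enumerate(rows):
        for col, value in enumerate(row):
            members[(col, value)].add(obj_id)

    sim = {}
    for a, b in combinations(members, 2):
        both = len(members[a] & members[b])
        either = len(members[a] | members[b])
        sim[(a, b)] = sim[(b, a)] = both / either
    for a in members:
        sim[(a, a)] = 1.0
    return sim, members


def categorical_to_numeric(rows, numeric, base_col=0):
    """Replace every categorical value with a number.

    numeric: one numeric value per object, assumed already normalised.
    base_col: column whose items act as reference points (hypothetical choice).
    """
    sim, members = cooccurrence_similarity(rows)
    base_items = [item for item in members if item[0] == base_col]
    # A base item's value is the mean of the numeric attribute over its objects.
    base_value = {
        item: sum(numeric[i] for i in members[item]) / len(members[item])
        for item in base_items
    }

    def value_of(item):
        if item in base_value:
            return base_value[item]
        # Other items: similarity-weighted sum of the base item values.
        return sum(sim[(item, b)] * base_value[b] for b in base_items)

    # Each output row holds the converted categorical columns plus the
    # original numeric attribute.
    return [
        [value_of((col, v)) for col, v in enumerate(row)] + [numeric[i]]
        for i, row in enumerate(rows)
    ]


# Toy data: colour and size co-occur perfectly in this example.
rows = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large")]
numeric = [0.0, 0.2, 0.8, 1.0]
converted = categorical_to_numeric(rows, numeric)
```

In this toy data "small" always co-occurs with "red", so it inherits the numeric value assigned to "red". The resulting all-numeric rows could then be passed to any existing numeric clustering algorithm.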

Original language: English
Pages (from-to): 11-19
Number of pages: 9
Journal: Tamkang Journal of Science and Engineering
Volume: 13
Issue number: 1
Publication status: Published - 2010 Mar 1

Fingerprint

Clustering algorithms

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

@article{f0b20b8e58344e739054320d932967dd,
title = "A two-step method for clustering mixed categroical and numeric data",
abstract = "Various clustering algorithms have been developed to group data into clusters in diverse domains. However, these clustering algorithms work effectively either on pure numeric data or on pure categorical data, most of them perform poorly on mixed categorical and numeric data types. In this paper, a new two-step clustering method is presented to find clusters on this kind of data. In this approach the items in categorical attributes are processed to construct the similarity or relationships among them based on the ideas of co-occurrence; then all categorical attributes can be converted into numeric attributes based on these constructed relationships. Finally, since all categorical data are converted into numeric, the existing clustering algorithms can be applied to the dataset without pain. Nevertheless, the existing clustering algorithms suffer from some disadvantages or weakness, the proposed two-step method integrates hierarchical and partitioning clustering algorithm with adding attributes to cluster objects. This method defines the relationships among items, and improves the weaknesses of applying single clustering algorithm. Experimental evidences show that robust results can be achieved by applying this method to cluster mixed numeric and categorical data.",
author = "Ming-Yi Shih and Jheng, {Jar Wen} and Lien-Fu Lai",
year = "2010",
month = "3",
day = "1",
language = "English",
volume = "13",
pages = "11--19",
journal = "Journal of Applied Science and Engineering",
issn = "1560-6686",
publisher = "Tamkang University",
number = "1",

}

A two-step method for clustering mixed categroical and numeric data. / Shih, Ming-Yi; Jheng, Jar Wen; Lai, Lien-Fu.

In: Tamkang Journal of Science and Engineering, Vol. 13, No. 1, 01.03.2010, p. 11-19.

TY - JOUR

T1 - A two-step method for clustering mixed categroical and numeric data

AU - Shih, Ming-Yi

AU - Jheng, Jar Wen

AU - Lai, Lien-Fu

PY - 2010/3/1

Y1 - 2010/3/1

N2 - Various clustering algorithms have been developed to group data into clusters in diverse domains. However, these clustering algorithms work effectively either on pure numeric data or on pure categorical data, most of them perform poorly on mixed categorical and numeric data types. In this paper, a new two-step clustering method is presented to find clusters on this kind of data. In this approach the items in categorical attributes are processed to construct the similarity or relationships among them based on the ideas of co-occurrence; then all categorical attributes can be converted into numeric attributes based on these constructed relationships. Finally, since all categorical data are converted into numeric, the existing clustering algorithms can be applied to the dataset without pain. Nevertheless, the existing clustering algorithms suffer from some disadvantages or weakness, the proposed two-step method integrates hierarchical and partitioning clustering algorithm with adding attributes to cluster objects. This method defines the relationships among items, and improves the weaknesses of applying single clustering algorithm. Experimental evidences show that robust results can be achieved by applying this method to cluster mixed numeric and categorical data.

UR - http://www.scopus.com/inward/record.url?scp=77950987307&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77950987307&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:77950987307

VL - 13

SP - 11

EP - 19

JO - Journal of Applied Science and Engineering

JF - Journal of Applied Science and Engineering

SN - 1560-6686

IS - 1

ER -