Model selection based on resampling approaches for cluster longitudinal data with missingness in outcomes

Chun Shu Chen, Chung Wei Shen

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In medical and health studies, longitudinal and cluster longitudinal data are often collected: the response variable of interest is observed repeatedly over time, along with a set of covariates. Model selection has become an active research topic, but it has not been explored extensively for such data, largely because of their complex correlation structure. To address this issue, this paper concentrates on model selection for cluster longitudinal data, especially when the outcomes are subject to missingness. Motivated by the expected weighted quadratic loss of a given model, we use data perturbation and bootstrapping methods to estimate this loss, and the model with the smallest estimated expected loss is selected as the best model. To justify the proposed method, we provide various numerical assessments and analyze a real asthma data set for illustration.
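The selection rule in the abstract (estimate each candidate model's expected weighted quadratic loss by data perturbation, then keep the model with the smallest estimate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' estimator: candidate models are weighted least-squares fits over covariate subsets, the weights are hypothetical inverse-probability weights for missing outcomes, and the within-cluster covariance is an exchangeable matrix treated as known. The loss estimate uses the covariance-penalty identity E[Σ w_i (y_i^new − ŷ_i)²] = E[Σ w_i (y_i − ŷ_i)²] + 2 Σ w_i Cov(ŷ_i, y_i), with the covariance term estimated by perturbing the outcomes. All function names and settings are hypothetical.

```python
import itertools

import numpy as np


def wls_fitted(X, y, w):
    """Weighted least-squares fitted values X @ beta_hat; rows with w == 0 do not affect beta_hat."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return X @ beta


def exchangeable_cov(cluster_ids, sigma2, rho):
    """Assumed block-diagonal covariance: variance sigma2, correlation rho within a cluster, 0 across."""
    same = cluster_ids[:, None] == cluster_ids[None, :]
    return sigma2 * np.where(same, rho, 0.0) + sigma2 * (1.0 - rho) * np.eye(len(cluster_ids))


def perturbation_loss(X, y, w, cov, tau=0.5, n_rep=200, seed=0):
    """Estimate E[sum_i w_i (y_i_new - yhat_i)^2] as apparent error + 2 * sum_i w_i Cov(yhat_i, y_i).

    Cov(yhat_i, y_i) is estimated by refitting on y + delta, delta ~ N(0, tau^2 * cov),
    and dividing the empirical Cov(yhat*_i, delta_i) by tau^2.
    """
    y = np.where(w > 0, y, 0.0)  # missing outcomes carry zero weight, so any fill value works
    rng = np.random.default_rng(seed)
    apparent = np.sum(w * (y - wls_fitted(X, y, w)) ** 2)
    deltas = rng.multivariate_normal(np.zeros(len(y)), tau**2 * cov, size=n_rep)
    fits = np.array([wls_fitted(X, y + d, w) for d in deltas])
    cov_hat = np.mean((fits - fits.mean(axis=0)) * (deltas - deltas.mean(axis=0)), axis=0) / tau**2
    return apparent + 2.0 * np.sum(w * cov_hat)


def select_model(X_full, y, w, cov, names, **kwargs):
    """Score every covariate subset (intercept always included) and return the minimiser."""
    n, p = X_full.shape
    scores = {}
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            X = np.column_stack([np.ones(n), X_full[:, np.array(subset, dtype=int)]])
            scores[tuple(names[j] for j in subset)] = perturbation_loss(X, y, w, cov, **kwargs)
    return min(scores, key=scores.get), scores


# Synthetic demo (hypothetical settings): 40 clusters x 4 visits, x0 and x1 truly matter,
# and about 20% of outcomes are missing completely at random.
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(40), 4)
X_full = rng.normal(size=(len(ids), 3))
cov = exchangeable_cov(ids, sigma2=1.0, rho=0.5)
y = 1.0 + 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + rng.multivariate_normal(np.zeros(len(ids)), cov)
observed = rng.random(len(ids)) > 0.2
y[~observed] = np.nan
w = observed / 0.8  # inverse-probability weights with a known response probability of 0.8
best, scores = select_model(X_full, y, w, cov, names=["x0", "x1", "x2"])
print(best, round(scores[best], 2))
```

Reusing the same seed for every candidate means all models are scored on the same perturbations, which reduces noise in the comparison. In practice the covariance matrix and the response probabilities would themselves have to be estimated.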

Original language: English
Pages (from-to): 2982-2997
Number of pages: 16
Journal: Statistics in Medicine
Volume: 37
Issue number: 20
DOI: 10.1002/sim.7801
Publication status: Published, 10 September 2018

Fingerprint

Longitudinal Data
Resampling
Model Selection
Data Perturbation
Quadratic Loss
Longitudinal Studies
Asthma
Longitudinal Study
Bootstrapping
Correlation Structure
Complex Structure
Justify
Covariates
Health
Research
Model
Estimate
Datasets

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Statistics and Probability

Cite this

@article{2d6f0e89af4349af9139eef94631fbaa,
title = "Model selection based on resampling approaches for cluster longitudinal data with missingness in outcomes",
author = "Chen, {Chun Shu} and Shen, {Chung Wei}",
year = "2018",
month = "9",
day = "10",
doi = "10.1002/sim.7801",
language = "English",
volume = "37",
pages = "2982--2997",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "20",

}

Model selection based on resampling approaches for cluster longitudinal data with missingness in outcomes. / Chen, Chun Shu; Shen, Chung Wei.

In: Statistics in Medicine, Vol. 37, No. 20, 10.09.2018, p. 2982-2997.

TY - JOUR

T1 - Model selection based on resampling approaches for cluster longitudinal data with missingness in outcomes

AU - Chen, Chun Shu

AU - Shen, Chung Wei

PY - 2018/9/10

Y1 - 2018/9/10

UR - http://www.scopus.com/inward/record.url?scp=85046553907&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046553907&partnerID=8YFLogxK

U2 - 10.1002/sim.7801

DO - 10.1002/sim.7801

M3 - Article

VL - 37

SP - 2982

EP - 2997

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 20

ER -