Assume y is a response variable, x is a risk factor of interest, and the z's are covariates, sometimes called "confounders of x" if they are correlated with both x and y. If the covariates are numerous, model selection procedures are applied to the z's, while x is usually forced into the model before or after the selection. In this situation, over-dispersion occurs and biases the inference on the relation between x and y. In a linear model, the over-dispersion comes from two sources: an underestimation of the mean-squared error, and a dependency between the estimator of the x-effect and its standard error. The author proposes a method that incorporates the ideas of Ye's generalized degrees of freedom and Rosenbaum and Rubin's propensity score. The method reduces the bias and the over-dispersion to acceptable levels. Data from the Georgia capital charging and sentencing study, which included 1077 observations and 295 covariates, were analyzed as an illustration.
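To make the problem concrete, here is a minimal NumPy sketch of the over-dispersion itself, not of the author's proposed correction. It assumes a null model in which y is pure N(0, 1) noise (no x-effect, true σ² = 1). It also uses greedy forward selection as a stand-in for "model selection procedures" on the z's, always keeps x in the model, and then refits as if the chosen model had been fixed in advance. The function names, sample sizes, and the choice of forward selection are illustrative assumptions.

```python
import numpy as np


def ols(D, y):
    """Ordinary least squares: return coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    return beta, float(resid @ resid)


def forward_select(base, Z, y, k):
    """Greedily add k columns of Z to the forced design `base` (intercept + x),
    each time choosing the column that most reduces the RSS (an assumed
    selection rule, used here only for illustration)."""
    selected, remaining = [], list(range(Z.shape[1]))
    for _ in range(k):
        rss = [ols(np.column_stack([base, Z[:, selected + [j]]]), y)[1]
               for j in remaining]
        best = remaining[int(np.argmin(rss))]
        selected.append(best)
        remaining.remove(best)
    return selected


def naive_fit(x, Z, y, k):
    """Select k covariates, then refit with the usual n - p degrees of freedom,
    ignoring the selection step. Returns (t statistic for x, naive MSE)."""
    n = len(y)
    base = np.column_stack([np.ones(n), x])
    sel = forward_select(base, Z, y, k)
    D = np.column_stack([base, Z[:, sel]])
    beta, rss = ols(D, y)
    mse = rss / (n - D.shape[1])
    se_x = np.sqrt(mse * np.linalg.inv(D.T @ D)[1, 1])
    return beta[1] / se_x, mse


def simulate(n=100, n_cov=30, k=5, reps=300, seed=0):
    """Null simulation: y is N(0, 1) noise, independent of x and the z's."""
    rng = np.random.default_rng(seed)
    t_stats, mses = [], []
    for _ in range(reps):
        Z = rng.standard_normal((n, n_cov))
        x = 0.5 * Z[:, 0] + rng.standard_normal(n)  # x correlated with one covariate
        y = rng.standard_normal(n)
        t, mse = naive_fit(x, Z, y, k)
        t_stats.append(t)
        mses.append(mse)
    return np.array(t_stats), np.array(mses)


if __name__ == "__main__":
    t_stats, mses = simulate()
    print(f"mean naive MSE  : {mses.mean():.3f}  (true sigma^2 = 1)")
    print(f"variance of t_x : {t_stats.var(ddof=1):.3f}  (about 1 without selection)")
```

Under these assumptions the naive MSE averages noticeably below the true σ² = 1, because the selected covariates absorb more residual variation than their nominal degrees of freedom account for; the t statistic for x is then typically more spread out than its nominal unit variance. Neither the generalized-degrees-of-freedom adjustment nor the propensity-score step is implemented here.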