Most scientific computing packages contain facilities for stepwise regression, and often for 'all subsets' and other techniques for finding 'best-fitting' subsets of regression variables. The application of standard theory can be very misleading when the model has been chosen not a priori but from the data. There is widespread awareness that considerable over-fitting occurs and that prediction equations obtained after extensive 'data dredging' often perform poorly when applied to new data.
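The problem is easy to see in a small simulation. The sketch below is illustrative only, not an example from the book; the sample size, number of candidate predictors, and subset size are arbitrary assumptions. It runs an 'all subsets' search over predictors that are pure noise by construction: the chosen subset fits the training data deceptively well, and that fit evaporates on fresh data.

```python
# Minimal sketch (not from the book): 'all subsets' search over pure-noise
# predictors. The best in-sample R^2 is inflated by the search itself and
# collapses when the same subset is refitted on fresh data.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k, subset_size = 50, 20, 3          # observations, candidates, subset size (assumed)
X = rng.standard_normal((n, k))        # predictors unrelated to y by construction
y = rng.standard_normal(n)             # pure-noise response

def r_squared(X_sub, y):
    """In-sample R^2 of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

# Exhaustive search over all subsets of the given size, keeping the best fit.
best = max(itertools.combinations(range(k), subset_size),
           key=lambda s: r_squared(X[:, list(s)], y))

# Refit the selected subset on new data drawn from the same (null) model.
X_new = rng.standard_normal((n, k))
y_new = rng.standard_normal(n)
print("in-sample R^2 :", r_squared(X[:, list(best)], y))        # inflated by the search
print("fresh-data R^2:", r_squared(X_new[:, list(best)], y_new))  # near zero
```

Even though no predictor carries any signal, the maximization over more than a thousand candidate subsets manufactures an apparently respectable fit, which is exactly the over-fitting the passage describes.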
This monograph relates almost entirely to least-squares methods of finding and fitting subsets of regression variables, though most of the concepts are presented in terms of the interpretation and statistical properties of orthogonal projections. An early chapter introduces these methods, which are still not widely known among users of least squares.
Existing methods are described for testing whether any useful improvement can be obtained by using any of a set of predictors. Spjøtvoll's method for comparing two arbitrary subsets of predictor variables is described and illustrated in detail.
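Spjøtvoll's method itself is not reproduced here. For orientation only, the sketch below shows one standard existing test of this kind, the classical extra-sum-of-squares F-test for whether a block of candidate predictors improves on a smaller nested model; the data and function names are illustrative assumptions, not the book's own treatment.

```python
# Hedged sketch of the classical extra-sum-of-squares F-test: does a block
# of extra predictors add any useful explanatory power to a base model?
# (Spjotvoll's method for two arbitrary, possibly non-nested subsets,
# covered in the book, is not shown here.)
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares and column count of a fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r, A.shape[1]

def block_f_test(X_base, X_extra, y):
    """F-test of H0: the extra predictors add no explanatory power."""
    rss0, p0 = rss(X_base, y)
    rss1, p1 = rss(np.column_stack([X_base, X_extra]), y)
    q = p1 - p0                         # number of added predictors
    df_resid = len(y) - p1
    f = ((rss0 - rss1) / q) / (rss1 / df_resid)
    return f, stats.f.sf(f, q, df_resid)

rng = np.random.default_rng(1)
n = 100
X_base = rng.standard_normal((n, 2))
X_extra = rng.standard_normal((n, 3))   # pure noise: H0 holds by construction
y = X_base @ np.array([1.0, -0.5]) + rng.standard_normal(n)
print(block_f_test(X_base, X_extra, y))  # large p-value expected under H0
```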
When the selected model is the 'best-fitting' in some sense, conventional fitting methods give estimates of regression coefficients which are usually biased in the direction of being too large. The extent of this bias is demonstrated for simple cases. Various ad hoc methods for correcting the bias are discussed (ridge regression, James-Stein shrinkage, jack-knifing, etc.), together with the author's maximum likelihood technique. Areas in which further research is needed are also outlined.
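The bias can be made concrete with a small simulation (an illustration under assumed settings, not the book's example): give every predictor the same small coefficient, keep only the one whose separately fitted slope is largest in magnitude, and the retained estimate is, on average, well above the truth.

```python
# Illustrative simulation (assumed settings, not the book's example) of
# selection bias: every predictor has the same small true coefficient, but
# the slope retained because it is largest in magnitude is, on average,
# estimated as substantially larger than the truth.
import numpy as np

rng = np.random.default_rng(2)
n, k, true_beta = 30, 10, 0.2            # small, equal effect for all predictors
selected = []
for _ in range(2000):
    X = rng.standard_normal((n, k))
    y = X @ np.full(k, true_beta) + rng.standard_normal(n)
    # Fit each predictor on its own and keep the 'best-fitting' slope.
    slopes = np.array([np.polyfit(X[:, j], y, 1)[0] for j in range(k)])
    selected.append(slopes[np.argmax(np.abs(slopes))])

print("true coefficient       :", true_beta)
print("mean selected estimate :", float(np.mean(selected)))  # noticeably inflated
```

Because the same noise that makes a slope look large also makes it likely to be selected, the conditional estimate is pulled away from zero; this is the effect that the shrinkage-style corrections listed above are meant to counteract.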
- ISBN-10: 0412353806
- ISBN-13: 9780412353802
- Publish Date: 1 May 1990
- Publish Status: Out of Stock
- Out of Print: 7 January 2000
- Publish Country: US
- Publisher: Taylor & Francis Ltd
- Imprint: Chapman & Hall/CRC
- Format: Hardcover
- Pages: 240
- Language: English