stochastics
Modeling
Model families and the machinery to fit them.
Categorical data modeling
Inputs → Outputs: contingency tables or categorical outcomes → fitted model, tests, estimates (often odds / log-odds scale).
Modeling categorical data that can be represented as a contingency table. Can be applied to:
- Linear model analysis
- Log-linear model analysis
- Logistic regression
- Repeated-measures analysis
- Analysis of variance
- Linear regression
- Ordinal logistic analysis
- Sample survey analysis
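This is a minimal sketch of two common summaries for a two-way table, assuming the table is a plain list of lists of counts. The first is the set of fitted counts and the likelihood-ratio statistic G² for the log-linear independence model log μ_ij = λ + λ_i^A + λ_j^B, whose MLE has the closed form (row total × column total) / n. The second is the log odds ratio of a 2×2 table with its Woolf (delta-method) standard error. The function names are illustrative. Models with more factors or interaction terms generally need iterative fitting instead of a closed form.

```python
import math

def independence_fit(table):
    """Fitted counts, G^2 and degrees of freedom for the log-linear independence model."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Under independence, mu_ij = (row total i) * (column total j) / n.
    fitted = [[r * c / n for c in col_tot] for r in row_tot]
    # Likelihood-ratio statistic G^2 = 2 * sum O * log(O / E); empty cells contribute 0.
    g2 = 2.0 * sum(
        o * math.log(o / e)
        for obs_row, fit_row in zip(table, fitted)
        for o, e in zip(obs_row, fit_row)
        if o > 0
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return fitted, g2, df

def log_odds_ratio(table):
    """Log odds ratio of a 2x2 table [[a, b], [c, d]] and its Woolf standard error."""
    (a, b), (c, d) = table
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se

table = [[30, 10], [20, 40]]
fitted, g2, df = independence_fit(table)
lor, se = log_odds_ratio(table)
print(fitted, round(g2, 3), df)
print(round(lor, 4), round(se, 4))
```

G² is compared with a chi-square distribution on (rows - 1)(columns - 1) degrees of freedom. The log odds ratio is the natural scale for the estimates mentioned above.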
References
- Haberman, S. J. (1972). “Algorithm AS 51: Log-Linear Fit for Contingency Tables.” Journal of the Royal Statistical Society, Series C 21:218–225.
- Forthofer, R. N., and Koch, G. G. (1973). “An Analysis of Compounded Functions of Categorical Data.” Biometrics 29:143–157.
- Koch, G. G., and Stokes, M. E. (1979). Annotated Computer Applications of Weighted Least Squares Methods for Illustrative Analyses of Examples Involving Health Survey Data. Technical report prepared for the U.S. National Center for Health Statistics.
- Guthrie, D. (1981). “Analysis of Dichotomous Variables in Repeated Measures Experiments.” Psychological Bulletin 90:189–195.
Least squares (general linear models)
Inputs → Outputs: response + predictors (+ optional weights) → coefficient estimates, standard errors, tests, fitted values.
Using the method of least squares to fit general linear models. Can be applied to:
- Simple regression
- Multiple regression
- ANOVA (especially unbalanced designs)
- ANCOVA
- Response surface models
- Weighted regression
- Polynomial regression
- Partial correlation
- MANOVA
- Repeated-measures ANOVA
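This is a minimal sketch of the least-squares fit that underlies the list above, written in plain Python so the normal equations stay visible. It solves X'WX b = X'Wy by Gaussian elimination and takes standard errors from σ²(X'WX)⁻¹. The function names and the list-of-rows layout for X are assumptions made for illustration. A production fit would typically use a QR decomposition for better numerical stability.

```python
import math

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(X, y, w=None):
    """(Weighted) least squares: solve X'WX b = X'Wy; return coefficients and standard errors.

    w=None gives ordinary least squares. Otherwise w holds per-observation weights
    (typically inverse variances), and sigma^2 is estimated from the weighted residuals.
    """
    n, p = len(X), len(X[0])
    if w is None:
        w = [1.0] * n
    XtWX = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    XtWy = [sum(w[i] * X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(XtWX, XtWy)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    sigma2 = sum(w[i] * resid[i] ** 2 for i in range(n)) / (n - p)
    # Var(beta) = sigma^2 (X'WX)^{-1}; column j of the inverse solves (X'WX) v = e_j.
    se = [math.sqrt(sigma2 * solve(XtWX, [1.0 if k == j else 0.0 for k in range(p)])[j])
          for j in range(p)]
    return beta, se

# Simple regression: an intercept column plus one predictor.
X = [[1.0, float(x)] for x in range(5)]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
beta, se = least_squares(X, y)
print(beta, se)
```

Multiple and polynomial regression reuse the same function with extra columns in X, such as x² for a quadratic term.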
References
- Box, G. E. P. (1953). “Non-normality and Tests on Variances.” Biometrika 40:318–335.
- Box, G. E. P. (1954). “Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems, Part 2: Effects of Inequality of Variance and of Correlation between Errors in the Two-Way Classification.” Annals of Mathematical Statistics 25:484–498.
Design matrix construction
Inputs → Outputs: factors + coding choices (contrasts) → design matrix X + parameter interpretation map.
Constructing a design matrix for a general linear model. This is essentially the model-building front end for fitting general linear models by least squares.
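The sketch below shows how the coding choice changes the parameter interpretation map. It handles a single factor with an intercept and assumes two common schemes: reference-cell (treatment) coding and sum-to-zero (effect) coding. The function name, the sorted level order and the column labels are hypothetical choices for this example. Interactions and continuous covariates are not handled.

```python
def design_matrix(levels, coding="treatment"):
    """Design matrix (intercept + one column per non-reference level) for a single factor."""
    cats = sorted(set(levels))  # level order: sorted, an arbitrary choice for this sketch
    if coding == "treatment":
        # Reference-cell coding: the first level is the baseline; coefficients are differences from it.
        ref, others = cats[0], cats[1:]
        rows = [[1.0] + [1.0 if v == c else 0.0 for c in others] for v in levels]
        names = [f"Intercept (mean of {ref})"] + [f"{c} - {ref}" for c in others]
    elif coding == "effect":
        # Sum-to-zero coding: the last level gets -1 in every column;
        # coefficients are deviations from the unweighted mean of the level means.
        last, others = cats[-1], cats[:-1]
        rows = [[1.0] + ([-1.0] * len(others) if v == last
                         else [1.0 if v == c else 0.0 for c in others])
                for v in levels]
        names = ["Intercept (mean of level means)"] + [f"{c} - mean of level means" for c in others]
    else:
        raise ValueError(f"unknown coding: {coding!r}")
    return rows, names

X_t, names_t = design_matrix(["a", "b", "c", "a"], coding="treatment")
X_e, names_e = design_matrix(["a", "b", "c", "a"], coding="effect")
print(names_t, X_t)
print(names_e, X_e)
```

Both matrices span the same column space, so fitted values are identical. Only the meaning of each coefficient changes.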
References
- Erdman, L. W. (1946). “Studies to Determine If Antibiosis Occurs among Rhizobia.” Journal of the American Society of Agronomy 38:251–258.
Generalized linear models (GLM)
Inputs → Outputs: exponential-family response + predictors (+ link) → fitted coefficients, deviance, predictions.
Fitting generalized linear models.
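As one concrete instance, a Poisson regression with the canonical log link can be fitted by iteratively reweighted least squares (IRLS), which for a canonical link coincides with Fisher scoring. The sketch below assumes a single predictor plus an intercept so that the weighted normal equations reduce to a 2×2 solve. The function name is illustrative.

```python
import math

def poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Poisson regression log(mu) = b0 + b1*x fitted by IRLS; returns (b0, b1) and the deviance."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Canonical log link: IRLS weight w = mu, working response z = eta + (y - mu) / mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted normal equations (X'WX) b = X'Wz for the design columns [1, x].
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # Deviance 2 * sum[y log(y / mu) - (y - mu)], taking y log(y / mu) = 0 when y = 0.
    dev = 2 * sum((yi * math.log(yi / m) if yi > 0 else 0.0) - (yi - m) for yi, m in zip(y, mu))
    return (b0, b1), dev

x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]
(b0, b1), dev = poisson_glm(x, y)
print(round(b0, 4), round(b1, 4), round(dev, 4))
```

At convergence, the score equations sum(y - mu) = 0 and sum(x * (y - mu)) = 0 hold, which makes a convenient sanity check.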
References
- Nelder, J. A., and Wedderburn, R. W. M. (1972). “Generalized Linear Models.” Journal of the Royal Statistical Society, Series A 135:370–384.
- Akaike, H. (1981). “Likelihood of a Model and Information Criteria.” Journal of Econometrics 16:3–14.
- Gamerman, D. (1997). “Sampling from the Posterior Distribution in Generalized Linear Models.” Statistics and Computing 7:57–68.
Quantal response models (discrete outcomes)
Inputs → Outputs: binary / quantal responses + explanatory variables → fitted regression parameters (MLE), predicted probabilities, and threshold-style summaries.
Investigating the relationship between discrete responses and explanatory variables. This covers classic dose–response setups (quantal responses in biological assays) as well as more general regression for binary outcomes.
A common goal is to estimate regression parameters and (when meaningful) a natural or threshold response rate, using maximum likelihood methods.
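This is a minimal sketch of a probit dose–response fit for grouped assay data (doses, group sizes n, responders r). It uses Fisher scoring for the maximum likelihood estimates and reports the ED50, the dose at which the fitted response probability is 0.5, as a threshold-style summary. It assumes no natural (background) response rate and relies only on the standard library's `statistics.NormalDist`. The function name is illustrative.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal, providing cdf, pdf and inv_cdf

def probit_fit(dose, n, r, max_iter=50, tol=1e-10):
    """Grouped-data probit model P(response) = Phi(a + b*dose), fitted by Fisher scoring."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        u_a = u_b = i_aa = i_ab = i_bb = 0.0
        for x, ni, ri in zip(dose, n, r):
            eta = a + b * x
            p = STD.cdf(eta)
            phi = STD.pdf(eta)
            # Score term (r - n p) * phi / (p (1 - p)); expected-information weight n * phi^2 / (p (1 - p)).
            s = (ri - ni * p) * phi / (p * (1 - p))
            w = ni * phi * phi / (p * (1 - p))
            u_a += s
            u_b += s * x
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        # Newton-type update theta += I^{-1} U using the 2x2 expected information.
        det = i_aa * i_bb - i_ab ** 2
        da = (i_bb * u_a - i_ab * u_b) / det
        db = (i_aa * u_b - i_ab * u_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    ed50 = -a / b  # dose at which the fitted response probability equals 0.5
    return a, b, ed50

a, b, ed50 = probit_fit([-1.0, 0.0, 1.0], [10, 10, 10], [2, 5, 8])
print(round(a, 6), round(b, 6), round(ed50, 6))
```

A natural response rate C would change the model to C + (1 - C)Φ(a + bx) and add a third parameter. That extension is not shown here.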
References
- Finney, D. J. (1947). “The Estimation from Individual Records of the Relationship between Dose and Quantal Response.” Biometrika 34:320–334.
- Brier, G. W. (1950). “Verification of Forecasts Expressed in Terms of Probability.” Monthly Weather Review 78:1–3.
- Youden, W. J. (1950). “Index for Rating Diagnostic Tests.” Cancer 3:32–35.
- Aitchison, J., and Silvey, S. (1957). “The Generalization of Probit Analysis to the Case of Multiple Responses.” Biometrika 44:131–140.
- Ashford, J. R. (1959). “An Approach to the Analysis of Data for Semi-Quantal Responses in Biological Assay.” Biometrics 15:573–581.
- Hubert, J. J., Bohidar, N. R., and Peace, K. E. (1988). “Assessment of Pharmacological Activity.” In Biopharmaceutical Statistics for Drug Development, edited by K. E. Peace, 83–145. New York: Marcel Dekker.
Bayesian generalized linear mixed models (GLMM)
Inputs → Outputs: clustered / hierarchical data + fixed & random effects + priors → posterior for effects and variance components.
Providing Bayesian inference for generalized linear mixed models.
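A full GLMM posterior usually needs Metropolis-type steps when the response is non-normal. As a simplified sketch, the Gibbs sampler below covers the normal-response, identity-link special case with one random intercept per group, written in the centered form y_ij ~ N(θ_i, σ²), θ_i ~ N(μ, τ²). It assumes a flat prior on μ and inverse-gamma priors on both variance components. The priors, hyperparameters and function name are illustrative assumptions.

```python
import random

def gibbs_random_intercept(groups, n_iter=4000, burn=1000, a0=1.0, b0=1.0, seed=1):
    """Gibbs sampler for y_ij ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2).

    Priors (illustrative assumptions): flat on mu, InvGamma(a0, b0) on sigma2 and tau2.
    The random intercept of group i is u_i = theta_i - mu.
    """
    rng = random.Random(seed)
    m = len(groups)
    N = sum(len(g) for g in groups)
    theta = [sum(g) / len(g) for g in groups]  # start at the group means
    mu = sum(theta) / m
    sigma2 = tau2 = 1.0
    draws = []
    for it in range(n_iter):
        # theta_i | rest: combine the group data (precision n_i / sigma2) with the N(mu, tau2) prior.
        for i, g in enumerate(groups):
            prec = len(g) / sigma2 + 1.0 / tau2
            mean = (sum(g) / sigma2 + mu / tau2) / prec
            theta[i] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # mu | rest with a flat prior: N(mean(theta), tau2 / m).
        mu = rng.gauss(sum(theta) / m, (tau2 / m) ** 0.5)
        # Variance components: InvGamma draws taken as 1 / Gamma(shape, scale = 1 / rate).
        sse = sum((y - theta[i]) ** 2 for i, g in enumerate(groups) for y in g)
        sigma2 = 1.0 / rng.gammavariate(a0 + N / 2, 1.0 / (b0 + sse / 2))
        ssu = sum((t - mu) ** 2 for t in theta)
        tau2 = 1.0 / rng.gammavariate(a0 + m / 2, 1.0 / (b0 + ssu / 2))
        if it >= burn:
            draws.append((mu, theta[:], sigma2, tau2))
    return draws

groups = [[9.0, 10.0, 11.0], [11.0, 12.0, 13.0], [13.0, 14.0, 15.0]]
draws = gibbs_random_intercept(groups)
post_mu = sum(d[0] for d in draws) / len(draws)
post_u = [sum(d[1][i] - d[0] for d in draws) / len(draws) for i in range(len(groups))]
print(round(post_mu, 2), [round(u, 2) for u in post_u])
```

The centered form is chosen here because it tends to mix well when each group carries substantial information about its own θ_i.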
References
- Jennrich, R. I., and Schluchter, M. D. (1986). “Unbalanced Repeated-Measures Models with Structured Covariance Matrices.” Biometrics 42:805–820.
- Eilers, P. H. C., and Marx, B. D. (1996). “Flexible Smoothing with B-Splines and Penalties.” Statistical Science 11:89–121. With discussion.
Bayesian discrete choice models
Inputs → Outputs: choice data + covariates + priors → posterior for utility parameters, predicted choice probabilities.
Performing Bayesian analysis for discrete choice models.
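This is a minimal sketch that assumes a conditional logit likelihood, in which choice probabilities within each choice set are proportional to exp(βx_j). It also assumes a single attribute such as price and a N(0, 10²) prior on β. Posterior draws come from a random-walk Metropolis sampler. The data, step size and function names are hypothetical.

```python
import math
import random

def choice_probs(beta, attrs):
    """Conditional logit: P(j) = exp(beta * x_j) / sum_k exp(beta * x_k) within one choice set."""
    v = [beta * x for x in attrs]
    vmax = max(v)  # subtract the max utility for numerical stability
    e = [math.exp(vi - vmax) for vi in v]
    total = sum(e)
    return [ei / total for ei in e]

def log_posterior(beta, data, prior_sd=10.0):
    """Conditional-logit log-likelihood plus a N(0, prior_sd^2) log prior, up to a constant."""
    ll = sum(math.log(choice_probs(beta, attrs)[chosen]) for attrs, chosen in data)
    return ll - 0.5 * (beta / prior_sd) ** 2

def metropolis(data, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for the scalar utility coefficient beta."""
    rng = random.Random(seed)
    beta, lp = 0.0, log_posterior(0.0, data)
    draws = []
    for _ in range(n_iter):
        prop = beta + rng.gauss(0.0, step)
        lp_prop = log_posterior(prop, data)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            beta, lp = prop, lp_prop
        draws.append(beta)
    return draws

# Each record: (prices of the alternatives in one choice set, index of the chosen alternative).
data = [
    ([1.0, 2.0, 3.0], 0), ([2.5, 1.5, 3.5], 1), ([3.0, 2.0, 1.0], 2), ([1.2, 1.8, 2.6], 0),
    ([2.2, 3.1, 1.4], 2), ([1.9, 1.1, 2.7], 1), ([1.0, 1.5, 2.0], 1), ([3.3, 2.4, 1.6], 2),
]
draws = metropolis(data)
kept = draws[1000:]  # discard burn-in
post_mean = sum(kept) / len(kept)
print(round(post_mean, 3), [round(p, 3) for p in choice_probs(post_mean, [1.0, 2.0, 3.0])])
```

Predicted choice probabilities can be averaged over the posterior draws instead of being evaluated only at the posterior mean.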
References
- McFadden, D. (1974). “Conditional Logit Analysis of Qualitative Choice Behavior.” In Frontiers in Econometrics, edited by P. Zarembka, 105–142. New York: Academic Press.
- McFadden, D. (1978). “Modelling the Choice of Residential Location.” In Spatial Interaction Theory and Planning Models, edited by A. Karlqvist, L. Lundqvist, F. Snickars, and J. Weibull, 75–96. Amsterdam: North-Holland.
- McFadden, D. (2001). “Economic Choices.” American Economic Review 91:351–378.
Finite mixture models
Inputs → Outputs: responses from a mixture distribution (+ optional covariates) → component parameters, mixing proportions, memberships.
Fitting statistical models to data for which the distribution of the response is a finite mixture of distributions; that is, each response is drawn with unknown probability from one of several distributions.
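For count data such as those in the Poisson-mixture references, a two-component mixture can be fitted by the EM algorithm. This sketch assumes exactly two Poisson components and crude starting values on either side of the sample mean. The E-step computes each count's membership probabilities, and the M-step updates the mixing proportion and the component means. Names and starting values are illustrative.

```python
import math

def poisson_logpmf(k, lam):
    """Log of the Poisson probability P(K = k) with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def em_poisson_mixture(y, max_iter=500, tol=1e-10):
    """EM for y ~ pi * Pois(lam1) + (1 - pi) * Pois(lam2); returns pi, lam1, lam2, log-likelihood."""
    mean = sum(y) / len(y)
    pi, lam1, lam2 = 0.5, 0.5 * mean, 1.5 * mean  # crude starts below and above the mean
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: probability that each count belongs to component 1 (log-sum-exp for stability).
        resp, ll = [], 0.0
        for k in y:
            a = math.log(pi) + poisson_logpmf(k, lam1)
            b = math.log(1 - pi) + poisson_logpmf(k, lam2)
            m = max(a, b)
            log_norm = m + math.log(math.exp(a - m) + math.exp(b - m))
            resp.append(math.exp(a - log_norm))
            ll += log_norm
        # M-step: mixing proportion and responsibility-weighted component means.
        r1 = sum(resp)
        r2 = len(y) - r1
        pi = r1 / len(y)
        lam1 = sum(r * k for r, k in zip(resp, y)) / r1
        lam2 = sum((1 - r) * k for r, k in zip(resp, y)) / r2
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, lam1, lam2, ll

counts = [0, 1, 1, 2, 1, 0, 2, 9, 10, 11, 12, 8, 10]
pi, lam1, lam2, ll = em_poisson_mixture(counts)
print(round(pi, 3), round(lam1, 3), round(lam2, 3))
```

The final responsibilities give each observation's estimated membership. EM only finds a local maximum, so several starting values are usually worth trying.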
References
- Pearson, K. (1915). “On Certain Types of Compound Frequency Distributions in Which the Components Can Be Individually Described by Binomial Series.” Biometrika 11:139–144.
- Fisher, R. A. (1921). “On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample.” Metron 1:3–32.
- Brier, S. S. (1980). “Analysis of Contingency Tables under Cluster Sampling.” Biometrika 67:591–596.
- Breslow, N. E. (1984). “Extra-Poisson Variation in Log-Linear Models.” Journal of the Royal Statistical Society, Series C 33:38–44.
- Dennis, J. E., and Mei, H. H. W. (1979). “Two New Unconstrained Optimization Algorithms Which Use Function and Gradient Values.” Journal of Optimization Theory and Applications 28:453–482.
- Wang, P., Puterman, M. L., Cockburn, I., and Le, N. (1996). “Mixed Poisson Regression Models with Covariate Dependent Rates.” Biometrics 52:381–400.
Censored regression (Tobit and limited dependent variables)
Inputs → Outputs: censored / limited response + covariates → parameter estimates and predictions under censoring.
Fitting parametric models to limited dependent variables where the response may be censored (e.g., at zero).
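This sketch shows the quantity that a Tobit fit maximizes. It assumes censoring at zero, normal errors and a list-of-rows design. Censored observations contribute log Φ(-x'β/σ), and uncensored ones contribute the log normal density of the residual. The sketch also includes the implied unconditional mean E[y | x]. Maximizing over (β, σ) would be handed to a numerical optimizer, which is not shown. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def tobit_loglik(beta, sigma, X, y):
    """Tobit log-likelihood for y = max(0, x'beta + e), e ~ N(0, sigma^2)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        if yi <= 0:
            # Censored at zero: P(y* <= 0) = Phi(-x'beta / sigma).
            ll += math.log(STD.cdf(-xb / sigma))
        else:
            # Uncensored: log of the N(x'beta, sigma^2) density at y, written out to avoid underflow.
            z = (yi - xb) / sigma
            ll += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(sigma)
    return ll

def tobit_mean(xb, sigma):
    """Unconditional mean of the censored response: Phi(z) * x'beta + sigma * phi(z), z = x'beta / sigma."""
    z = xb / sigma
    return STD.cdf(z) * xb + sigma * STD.pdf(z)

X = [[1.0, 0.0], [1.0, 1.0]]
y = [0.0, 1.0]
print(tobit_loglik([0.0, 1.0], 1.0, X, y), tobit_mean(0.0, 1.0))
```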
References
- Tobin, J. (1958). “Estimation of Relationships for Limited Dependent Variables.” Econometrica 26:24–36.
- Mroz, T. A. (1987). “The Sensitivity of an Empirical Model of Married Women’s Work to Economic and Statistical Assumptions.” Econometrica 55:765–799.
Cox proportional hazards (survival regression)
Inputs → Outputs: time-to-event + censoring + covariates → hazard ratios via partial likelihood, survival predictions.
Performing regression analysis of survival data based on the Cox proportional hazards model.
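This is a minimal sketch of a Cox fit for one covariate, assuming Breslow's approximation for tied event times. It computes the log partial likelihood, score and information over the risk sets and maximizes by Newton–Raphson. It returns β, its standard error and the hazard ratio exp(β). The function names are illustrative. Stratification, time-varying covariates and other tie-handling methods (e.g., Efron's) are not covered.

```python
import math

def cox_terms(beta, time, event, x):
    """Breslow log partial likelihood, score and information for a single covariate."""
    ll = score = info = 0.0
    for t in sorted({ti for ti, di in zip(time, event) if di}):
        # Risk set: subjects still under observation at t. Failures: subjects with an event at t.
        risk = [xi for ti, xi in zip(time, x) if ti >= t]
        dead = [xi for ti, di, xi in zip(time, event, x) if ti == t and di]
        s0 = sum(math.exp(beta * xi) for xi in risk)
        s1 = sum(xi * math.exp(beta * xi) for xi in risk)
        s2 = sum(xi * xi * math.exp(beta * xi) for xi in risk)
        d = len(dead)
        # Breslow approximation: all d tied failures at t share the same full risk-set denominator.
        ll += beta * sum(dead) - d * math.log(s0)
        score += sum(dead) - d * s1 / s0
        info += d * (s2 / s0 - (s1 / s0) ** 2)
    return ll, score, info

def cox_fit(time, event, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximization of the partial likelihood; returns beta, SE and hazard ratio."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = cox_terms(beta, time, event, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, _, info = cox_terms(beta, time, event, x)
    return beta, 1 / math.sqrt(info), math.exp(beta)

time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 0, 1, 1, 0]  # 0 = censored
x = [1, 0, 1, 0, 1, 0]
beta, se, hr = cox_fit(time, event, x)
print(round(beta, 4), round(se, 4), round(hr, 4))
```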
References
- Cox, D. R. (1972). “Regression Models and Life-Tables.” Journal of the Royal Statistical Society, Series B 34:187–220. With discussion.
- Breslow, N. E. (1972). “Discussion of Professor Cox’s Paper.” Journal of the Royal Statistical Society, Series B 34:216–217.
- Breslow, N. E. (1974). “Covariance Analysis of Censored Survival Data.” Biometrics 30:89–99.
- Cox, D. R. (1975). “Partial Likelihood.” Biometrika 62:269–276.
- Breslow, N. E., and Clayton, D. G. (1993). “Approximate Inference in Generalized Linear Mixed Models.” Journal of the American Statistical Association 88:9–25.