This course provides a thorough introduction to the General Linear Model, which incorporates analyses such as multiple regression, ANOVA, ANCOVA, and repeated-measures ANOVA. We will also cover extensions to linear mixed-effects models and logistic regression. All techniques will be discussed within a general framework of building and comparing statistical models. Practical experience in applying the methods will be developed through exercises with the statistics package SPSS.

Lecture 1

Avoid the cookbook approach; do model comparison instead.

General linear model.

Inference as attempted generalization from sample to population (non-Bayesian?).

Want estimators to be:

Efficient estimators:

$MSE = \sum (Y_i - \hat{Y}_i)^2 / (n - p)$, where $p$ is the number of estimated parameters.
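A sketch in numpy (the course uses SPSS; data here are made up):

```python
import numpy as np

# Toy data: simple regression of Y on X, so p = 2 parameters (intercept + slope).
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
Y = np.array([1.0, 2.1, 2.9, 4.2])

# Least-squares fit and fitted values.
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta

n, p = X.shape
sse = np.sum((Y - Y_hat) ** 2)
mse = sse / (n - p)  # divide by n - p, not n: p parameters were estimated from the data
```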

TODO Why degrees of freedom?



Lecture 2

Model is model of population (which implies that we can include sampling method in inference if we think we can accurately model the bias).

Sum of squares reduced $SSR = \operatorname{SSE}(C) - \operatorname{SSE}(A)$

Proportional reduction in error $PRE = \frac{\operatorname{SSE}(C) - \operatorname{SSE}(A)}{\operatorname{SSE}(C)}$. In the population this is usually denoted $\eta^2$.

F-score for GLM: $F = \frac{\mathrm{PRE} / (\mathrm{PA} - \mathrm{PC})}{(1-\mathrm{PRE})/(n - \mathrm{PA})} \sim F(\mathrm{PA} - \mathrm{PC}, n - \mathrm{PA})$
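The PRE-to-F pipeline in code, with hypothetical SSE values and model sizes:

```python
import numpy as np
from scipy import stats

# Hypothetical numbers: compact model C with PC = 1 parameter,
# augmented model A with PA = 3 parameters, n = 40 observations.
sse_c, sse_a = 100.0, 70.0
n, pa, pc = 40, 3, 1

pre = (sse_c - sse_a) / sse_c                   # proportional reduction in error
F = (pre / (pa - pc)) / ((1 - pre) / (n - pa))  # F statistic for the model comparison
p = stats.f.sf(F, pa - pc, n - pa)              # right-tail p-value under the null
```

Equivalently, $F = \frac{(\mathrm{SSE}(C)-\mathrm{SSE}(A))/(\mathrm{PA}-\mathrm{PC})}{\mathrm{SSE}(A)/(n-\mathrm{PA})}$; the SSE terms in PRE's numerator and denominator cancel the same way.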

F-test: reject the null if $P_\mathrm{null}(F > F_\mathrm{observed}) < \alpha$. This fixes $P_\mathrm{null}(\mathrm{Type\,1}) = \alpha$, and produces a tradeoff curve between $P(\mathrm{Type\,2})$ and the true effect size.

95% confidence interval of an estimate: across repeated samples, 95% of such intervals contain the true population value. Equivalently, reject the null iff the $(1-\alpha)$ confidence interval does not contain the null value.



Lecture 3

Multiple regression.

Test for unique effect of $X_i$ by comparing with model where $\beta_i=0$.

Omnibus test - testing multiple parameters at once. Prefer tests where $PA - PC = 1$ - easier to interpret success/failure.

$R^2$ - squared multiple correlation coefficient - ‘coefficient of determination’ - ‘proportion of variance explained’ - PRE of model over $Y_i = \beta_0 + \epsilon_i$.

$\eta^2$ - true value of PRE in population. Unbiased estimate $\hat{\eta}^2 = 1 - \frac{(1 - \mathrm{PRE})(n - \mathrm{PC})}{n - \mathrm{PA}}$.
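As a one-line helper (plain Python, no dependencies):

```python
def eta_sq_hat(pre, n, pa, pc):
    """Unbiased estimate of the population PRE (eta squared).

    pre: observed PRE; n: sample size; pa/pc: parameters in the
    augmented / compact models.
    """
    return 1 - (1 - pre) * (n - pc) / (n - pa)
```

The estimate shrinks the observed PRE toward zero, more aggressively when $n$ is small relative to the number of parameters.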


$1-\alpha$ confidence interval for slope: $b_j \pm \sqrt{\frac{F_{1,n-p;\alpha}\,\mathrm{MSE}}{(n-1)S^2_{X_j}(1-R^2_j)}}$ where:

$R^2_j$ is the $R^2$ from regressing $X_j$ on the other predictors; $(1 - R^2_j)$ is also called the tolerance - how uniquely useful $X_j$ is.
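A direct transcription of the interval formula (all input numbers hypothetical; $F_{1,n-p}$ equals the square of the usual $t_{n-p}$ critical value):

```python
import numpy as np
from scipy import stats

def slope_ci(b_j, mse, s2_xj, r2_j, n, p, alpha=0.05):
    """(1 - alpha) CI for slope b_j.

    mse: mean squared error of the full model; s2_xj: sample variance
    of X_j; r2_j: R^2 of X_j regressed on the other predictors
    (1 - r2_j is the tolerance); p: number of model parameters.
    """
    f_crit = stats.f.isf(alpha, 1, n - p)  # critical F with (1, n - p) df
    half = np.sqrt(f_crit * mse / ((n - 1) * s2_xj * (1 - r2_j)))
    return b_j - half, b_j + half

# Hypothetical values: n = 4 points, one predictor (p = 2), no collinearity.
lo, hi = slope_ci(b_j=1.04, mse=0.021, s2_xj=5 / 3, r2_j=0.0, n=4, p=2)
```

Note how the interval widens as the tolerance $(1 - R^2_j)$ shrinks toward zero.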

Model search:

Better to rely on theory

Note: for the null model $Y_i = b_0 + \epsilon_i$ we get $SSE = (n - 1)\operatorname{Var}(Y_i)$, with $\operatorname{Var}$ the sample variance (the least-squares $b_0$ is $\bar{Y}$).
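Quick numerical check of the identity (simulated data):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=50)

# SSE of the null model Y_i = b0 + e_i; least squares sets b0 = mean(y).
sse_null = np.sum((y - y.mean()) ** 2)

# (n - 1) times the sample variance (ddof=1 gives the n - 1 denominator).
identity = (len(y) - 1) * y.var(ddof=1)
```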


Lecture 4

GLM assumptions (sub-bullets give the consequence of violation):

  1. Normality - $\epsilon_i \sim \mathrm{Normal}$
    • Biased test results (parameter estimates remain unbiased)
  2. Unbiasedness - $\epsilon_i$ has mean 0
    • Biased predictions and parameter estimates
  3. Homoscedasticity - $\epsilon_i$ has constant variance (the same for every $i$)
    • Parameter estimates remain unbiased, but test results are biased
  4. Independence - $\epsilon_i$ are pairwise independent
    • Biased test results; often a symptom of model mis-specification

Histogram of residuals should be roughly normal (1).

Should be no relationship in residual vs predicted graph (2,3).

Quantile-quantile plot: plot sorted $Y_i$ against $Q_i$, where $Q_i$ is chosen so that $P(Y \leq Q_i) = \hat{p}_i \approx \hat{P}(Y \leq Y_i)$, i.e. empirical quantiles vs quantiles of the normal distribution. If the $Y_i$ are normal, the plot should be roughly straight.

Shapiro-Wilk or Kolmogorov-Smirnov tests for normality.
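Both tests are available in scipy; a minimal Shapiro-Wilk sketch on simulated residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(size=100)  # stand-in for residuals from a fitted model

# Null hypothesis: the data are normal. A small p-value is evidence
# against normality; a large p-value does NOT prove normality.
stat, p = stats.shapiro(residuals)
```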

Breusch-Pagan or Koenker or Levene tests for homoscedasticity.

Randomized design, or tests for sequential dependence, for independence.

Transform dependent variables to achieve 1,3. Transform predictor to achieve 2.

Outlier detection:

Outlier tests run on all data points, so need multiple comparison correction.

Multicollinearity - as $R^2_j \to 1$ the confidence interval width $\to \infty$. Detection:

Partial correlation between $Y$ and $X_i$ (controlling for the other predictors) is $\operatorname{sign}(\beta_i) \sqrt{\operatorname{PRE}(M, M-X_i)}$, where $\operatorname{PRE}(M, M-X_i) = \frac{\operatorname{PRE}(M, \mathrm{NULL}) - \operatorname{PRE}(M - X_i, \mathrm{NULL})}{1 - \operatorname{PRE}(M - X_i, \mathrm{NULL})}$ (the square root applies to the whole ratio: the ratio itself is the squared partial correlation).
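The identity can be checked numerically: $\operatorname{sign}(\beta_i)\sqrt{\mathrm{PRE}}$ should equal the correlation between the residuals of $Y$ and of $X_i$ after both are regressed on the remaining predictors (simulated data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.8 * x2 + rng.normal(size=n)

def fit_sse(X, y):
    """Least-squares fit; return (SSE, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2), beta

ones = np.ones(n)
X_red = np.column_stack([ones, x2])                       # model M - X1
sse_full, beta_full = fit_sse(np.column_stack([ones, x1, x2]), y)
sse_drop, _ = fit_sse(X_red, y)

pre = (sse_drop - sse_full) / sse_drop
partial_r = np.sign(beta_full[1]) * np.sqrt(pre)

# Cross-check: correlate the residuals of y and x1 after removing x2.
resid_y = y - X_red @ fit_sse(X_red, y)[1]
resid_x = x1 - X_red @ fit_sse(X_red, x1)[1]
direct = np.corrcoef(resid_y, resid_x)[0, 1]
```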


Lecture 5


Mediation (cf Mediation Analysis):

Caution - Don’t Expect An Easy Answer


Lecture 6

ANOVA - analysis of variance - modeling differences between group means.

Null model = same means.

Contrast codes:

With unequal cell sizes, orthogonal rows can still introduce redundancy (in the degenerate case of only one datapoint with $Y=0$, $X+Y$ and $X-Y$ are orthogonal but perfectly anti-correlated).

Helmert codes - $\lambda_{i,i} = m-i$, $\lambda_{i,j} = -1$ for $j > i$, and $\lambda_{i,j} = 0$ for $j < i$.

Orthogonal polynomial codes:

Dummy codes - $\lambda_{i,i+1} = 1$ and $\lambda_{i,j} = 0$ otherwise. Not contrast codes - interpret $\lambda_i$ as comparing case $i$ vs case $0$.
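The Helmert construction above can be sketched directly; each row sums to zero (a valid contrast) and rows are mutually orthogonal:

```python
import numpy as np

def helmert_codes(m):
    """(m-1) x m matrix of Helmert contrast codes for m groups.

    Contrast i compares group i with the pooled later groups:
    lambda_{i,i} = m - i, lambda_{i,j} = -1 for j > i, 0 for j < i.
    """
    L = np.zeros((m - 1, m))
    for i in range(m - 1):
        L[i, i] = m - 1 - i   # equals m - i with the notes' 1-based indexing
        L[i, i + 1:] = -1
    return L

L = helmert_codes(4)
# L is [[3, -1, -1, -1], [0, 2, -1, -1], [0, 0, 1, -1]]
```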

Unequal cell sizes are weird, because mean of group means is not mean of individuals.

Multiple comparisons abound.

For power analysis estimate:

\begin{align} \hat{\eta}^2 &= 1 - \frac{(1 - \mathrm{PRE})(n - \mathrm{PC})}{n - \mathrm{PA}} \cr &= \left( \frac{ m \sigma^2 \sum_k \lambda_k^2 / n_k }{ (\sum_k \lambda_k \mu_k)^2 } + 1 \right) ^{-1} \end{align}

where $\mu_k$ is predicted group mean and $\sigma^2$ is predicted within-cell variance.
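Transcribing the planning formula (all planning values hypothetical):

```python
import numpy as np

# Hypothetical planning values for m = 3 groups.
mu = np.array([0.0, 0.5, 1.0])    # predicted group means
lam = np.array([-1.0, 0.0, 1.0])  # contrast of interest
n_k = np.array([20, 20, 20])      # planned cell sizes
sigma2 = 1.0                       # predicted within-cell variance
m = len(mu)

# eta^2 = ( m*sigma^2 * sum(lambda_k^2 / n_k) / (sum lambda_k mu_k)^2 + 1 )^-1
eta2 = 1 / (m * sigma2 * np.sum(lam**2 / n_k) / np.dot(lam, mu) ** 2 + 1)
```

This estimate then feeds a standard power table or calculation for the planned $n$.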

With multiple categorical variables a good tactic is:

Including useful extra variables often increases power for testing the original variables, because it reduces error that would otherwise obscure small effects.

Tukey-Kramer to test all possible pairs of groups.

Lecture 7

ANCOVA - analysis of covariance - same as ANOVA but with continuous as well as categorical predictors.

Typical use case - control vs treatment whilst controlling for covariate. Similar to before, can increase power by reducing error that is obscuring small effects.

E.g. in a pre/post-test design this is typically more powerful than just modeling the difference score. The latter effectively fixes the pre-test coefficient at 1, so it is only more powerful if the ANCOVA estimate of that coefficient is close to 1.
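Since the difference-score model is the ANCOVA model with the pre-test slope pinned at 1, ANCOVA's SSE can never exceed it. A simulation sketch (hypothetical data; true pre-test slope deliberately far from 1):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
group = np.repeat([0.0, 1.0], n // 2)  # control vs treatment
pre = rng.normal(size=n)
# True pre-test slope is 0.4, not 1, so ANCOVA should win clearly.
post = 0.4 * pre + 0.5 * group + rng.normal(scale=0.8, size=n)

def sse(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
# ANCOVA: post ~ intercept + group + pre (pre-test slope estimated freely).
sse_ancova = sse(np.column_stack([ones, group, pre]), post)
# Difference score: (post - pre) ~ intercept + group, i.e. slope fixed at 1.
sse_diff = sse(np.column_stack([ones, group]), post - pre)
```

`sse_ancova <= sse_diff` holds by construction (nested models); the power question is whether the extra parameter costs more than the SSE reduction buys.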

Homogeneity of regression assumption = no interaction between categorical variable and continuous covariate.

Lecture 8

What if the $\epsilon_i$ are not independent? E.g. grouped or sequential data.

Repeated-measures ANOVA - for grouped data, compute a weighted mean (contrast score) per subject and analyze those scores.

I can’t find a reason to prefer this over a hierarchical model.

Lecture 9

Losing interest in the course by this point. Statistical Rethinking is much more useful.

Multi-level models.

Lecture 10

Bayes factors.

Logistic regression.