Stats blog 7 (GLMs: Logistic and Poisson regression)

Helen Li
2 min read · Mar 27, 2021

In this post I would like to introduce two of the most common generalized linear models (GLMs): logistic regression and Poisson regression.

Generalized Linear Models

Canonical link functions:

There are a few nice properties that come with using the canonical link:

  1. It ensures that μ stays in the range the outcome variable can take
  2. The two standard methods for finding the MLE, Newton’s method and Fisher scoring, are identical when using the canonical link
  3. The residuals sum to 0
  4. The Hessian of the log-likelihood equals its expected value (the Hessian is the matrix of second derivatives of the log-likelihood with respect to the parameters)
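
Properties 2 and 3 can be checked numerically. Below is a minimal sketch (using simulated data, not any real dataset) of fitting a logistic regression, whose logit link is canonical, by Newton’s method: with the canonical link, the Hessian equals the expected information, so the Newton step is also the Fisher-scoring step, and at convergence the residuals sum to zero because the intercept column of the score equation is exactly that sum.

```python
import numpy as np

# Simulated data: intercept plus one covariate, binary outcome.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.2])
p = 1 / (1 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)

beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))           # inverse logit link
    W = mu * (1 - mu)                          # variance function
    grad = X.T @ (y - mu)                      # score vector
    hess = X.T @ (W[:, None] * X)              # observed = expected information here
    beta = beta + np.linalg.solve(hess, grad)  # Newton / Fisher-scoring step

mu = 1 / (1 + np.exp(-X @ beta))
print(beta)              # estimated coefficients, near the true values
print((y - mu).sum())    # residuals sum to ~0 with the canonical link
```

Note that this equivalence is special to the canonical link: with a non-canonical link (say, probit), the observed and expected information differ, and Newton’s method and Fisher scoring take different steps.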

Logistic regression

  1. Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Examples of classification problems include email spam vs. not spam, online transactions being fraudulent or not, and tumors being malignant or benign. Logistic regression transforms its output with the logistic sigmoid function to return a probability value.
  2. As a statistical model, logistic regression uses the logistic function to model a conditional probability: the probability that Y=1 given X (and hence that Y=0 given X). An example of logistic regression is predicting whether or not a person will default on their credit card payment.
  3. Logistic regression is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables.
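
To make point 1 concrete, here is a tiny sketch of the sigmoid turning a linear predictor into a probability. The coefficients and the credit-card balance below are hypothetical numbers chosen purely for illustration, echoing the default example above:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Hypothetical fitted intercept and slope for P(default = 1 | balance).
beta0, beta1 = -10.0, 0.005
balance = 2500.0                              # hypothetical balance
prob_default = sigmoid(beta0 + beta1 * balance)
print(prob_default)                           # roughly 0.92
```

However large or small the linear predictor gets, the sigmoid keeps the output strictly between 0 and 1, which is exactly the canonical-link property from the list above.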

Poisson regression

Use case: Counts, which can also extend to rates, i.e. counts per some unit of time or space

Assumptions: a Poisson response, independent observations, mean equal to variance, and log(λ) a linear function of x

We interpret exponentiated coefficients as rate ratios: as in logistic regression, we interpret the coefficients on the multiplicative scale after transforming with the inverse of our link function.

Offsets: Accounts for different denominators in rates while allowing for counts to still be the response.
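
A sketch of how an offset works, on simulated data: for counts y with exposures t (say, person-years), we model log(μ) = log(t) + xᵝ, so the offset log(t) enters the linear predictor with its coefficient fixed at 1. Fitting again uses the Newton/Fisher-scoring step, which for the canonical log link has weights equal to μ:

```python
import numpy as np

# Simulated counts with varying exposures (the "different denominators").
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
exposure = rng.uniform(0.5, 5.0, size=n)
true_beta = np.array([0.2, 0.6])
y = rng.poisson(exposure * np.exp(X @ true_beta))

offset = np.log(exposure)                      # coefficient fixed at 1
beta = np.zeros(2)
for _ in range(50):
    mu = np.exp(offset + X @ beta)             # rate times exposure
    grad = X.T @ (y - mu)                      # score vector
    hess = X.T @ (mu[:, None] * X)             # canonical log link: W = mu
    beta = beta + np.linalg.solve(hess, grad)  # Newton / Fisher-scoring step

print(np.exp(beta))  # exponentiated coefficients, interpreted as rate ratios
```

The response stays a count, but the offset makes the coefficients describe the rate per unit of exposure rather than the raw count.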

Key concepts for Poisson regression

  1. Overdispersion: A key assumption of Poisson regression is that the mean and variance of our response are equal. If the variance is greater than the mean, we have overdispersion. If we ignore overdispersion, our standard errors will be falsely small, so our p-values will likely be falsely small too, which can lead us to choose an overly complicated model.
  2. Negative binomial regression: This is another approach to dealing with overdispersion. The negative binomial is not part of the one-parameter exponential family, since it requires a second parameter, but that extra parameter gives us more flexibility and an explicit likelihood function.
  3. Zero-inflated Poisson (ZIP): Some situations involve two underlying processes, one that generates Poisson counts and one that always produces zeros.
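
A quick simulated illustration of point 1: negative binomial draws with the same mean as Poisson draws have a strictly larger variance (Var = μ + μ²/r, versus Var = μ for the Poisson). The mean μ and dispersion r below are arbitrary illustration values; note that numpy parameterizes the negative binomial by (n, p), so hitting mean μ requires p = r / (r + μ):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, r = 4.0, 2.0                        # illustration values only
pois = rng.poisson(mu, size=100_000)
negbin = rng.negative_binomial(r, r / (r + mu), size=100_000)

print(pois.mean(), pois.var())          # variance ~ mean: equidispersion
print(negbin.mean(), negbin.var())      # variance > mean: overdispersion
```

Here both samples have mean near 4, but the negative binomial variance is near μ + μ²/r = 12, the signature of overdispersion that a plain Poisson model would miss.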
