
Today, I would like to share some projects from courses I have taken before. One of them is STA304, which covers the design of surveys, sources of bias, observational data, and sampling techniques such as stratification, clustering, estimates of population mean and variances and…

I would like to introduce GLMs, such as logistic and Poisson regression.

Generalized Linear Models

Canonical link functions:

There are a few nice properties that come with using the canonical link:

  1. It ensures that \mu stays within the range the outcome variable can take
  2. Two methods for finding the MLE, Newton’s method and…
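
To make the first property concrete, here is a minimal sketch, assuming the two GLMs mentioned above (logistic and Poisson regression). The function names `inverse_logit` and `inverse_log` and the sample values in `etas` are hypothetical, chosen only for illustration. It shows that the inverse of each canonical link maps any real-valued linear predictor back into the range the outcome's mean can take.

```python
import math


def inverse_logit(eta: float) -> float:
    """Inverse of the logit link (canonical for Bernoulli/binomial outcomes)."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_log(eta: float) -> float:
    """Inverse of the log link (canonical for Poisson outcomes)."""
    return math.exp(eta)


# Hypothetical values of the linear predictor eta = x' beta.
etas = [-5.0, -1.0, 0.0, 1.0, 5.0]

# Logistic regression: the mean is a probability, so it must lie in (0, 1).
probabilities = [inverse_logit(eta) for eta in etas]

# Poisson regression: the mean is a rate, so it must be positive.
rates = [inverse_log(eta) for eta in etas]

for eta, p, r in zip(etas, probabilities, rates):
    print(f"eta={eta:+.1f}  logistic mu={p:.4f}  Poisson mu={r:.4f}")
```

Whatever value the linear predictor takes, the logistic mean stays strictly between 0 and 1 and the Poisson mean stays positive.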


Distributions are quite important in statistics because a distribution provides a parameterized mathematical function that can be used to calculate the probability of any individual observation from the sample space. The function that describes the grouping or density of the observations is called the probability density function.

There are different statistical distributions…

Vocabulary (related to correlated data)

  1. Fixed effects: These are non-random quantities. All the coefficients we estimated in STA302 are examples of fixed effects, since we did not treat beta_1 as a random variable.
  2. Random effects: These are random quantities. These model parameters are treated as random variables.
  3. Mixed effects model: A model that…

Today, I would like to share some assignments and projects I have done in other courses I have taken before. …

Today, I would like to share some topics related to ethics and communication.

Ethical codes for statisticians / data scientists

As ethical statisticians, it is quite important to:

  1. be accurate in our analyses and conclusions
  2. be alert to possible consequences of our results/recommendations on others
  3. be honest in reporting results, even when we don’t get the results…

Today, I would like to share what I learned about tidy data and data wrangling in previous statistics classes and STA303.

Tidy data:

There are three interrelated rules that make a dataset tidy:

  1. Each variable must have its own column.
  2. Each observation must have its own row.
  3. Each value must…
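
As a rough illustration of the first two rules, the sketch below reshapes a small "wide" table into a tidy one using plain Python. The data, the column names (`country`, `year`, `cases`), and the helper `to_tidy` are all made up for this example. In the wide form, the year columns hold values of a variable rather than naming one; after reshaping, each variable has its own column and each observation its own row.

```python
# Hypothetical wide-format data: the year is spread across column names.
wide_rows = [
    {"country": "A", "1999": 10, "2000": 12},
    {"country": "B", "1999": 7, "2000": 9},
]


def to_tidy(rows, id_column, key_name, value_name):
    """Turn every non-id column into its own (key, value) row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_column:
                continue
            tidy.append({id_column: row[id_column], key_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_column="country", key_name="year", value_name="cases")

for row in tidy_rows:
    print(row)
```

Each printed row is now a single observation (one country in one year), with `country`, `year`, and `cases` as separate columns.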

Why model: Our goal in using a model is to describe data, to make inferences about a population, or to make predictions about the future.

Review of linear regression: We know that a model is linear if it is linear in the parameters, which is quite important when we would…

Helen Li
