Stats blog 8 (sharing some experiences related to statistics)

Helen Li
2 min read · Mar 28, 2021

Today, I would like to share some projects I completed in courses I took previously. One of them was STA304, which covers survey design, sources of bias, observational data, and sampling techniques such as stratification and clustering, along with estimating population means and variances. I also learned how to write a well-structured report and how to analyze data.

In STA304, I completed a project predicting the overall popular vote in the 2020 American federal election. In this analysis, I used a technique called post-stratification to classify the census data into post-strata using five different weighting factors, adjusting the weight within each post-stratum. Then, I applied a logistic regression model to estimate the proportion of voters who would vote for Donald Trump. I also illustrated the predicted result with a pie chart based on the post-stratification analysis and the logistic regression model. At the end of the analysis, I summarized the predicted result and discussed the weaknesses of the prediction and possible next steps.

I would like to share my GitHub repo: https://github.com/Hiraethwly/Forcasting_US_Election.git
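To make the post-stratification step more concrete, here is a minimal Python sketch of the idea: take the model's estimate for each post-stratum and average those estimates, weighting each cell by its census count. This is not the code from the repo above. The cell labels, counts, and probabilities are invented for illustration, and the function name `poststratify` is hypothetical.

```python
# Hypothetical post-strata: (label, census count, model-predicted probability).
# Every number here is made up for illustration only.
cells = [
    ("18-29", 2000, 0.35),
    ("30-44", 3000, 0.45),
    ("45-64", 3500, 0.52),
    ("65+", 1500, 0.55),
]


def poststratify(cells):
    """Return the census-weighted average of the per-cell estimates."""
    total = sum(count for _, count, _ in cells)
    weighted = sum(count * prob for _, count, prob in cells)
    return weighted / total


estimate = poststratify(cells)
print(f"Post-stratified estimate: {estimate:.4f}")
```

In a real analysis, the per-cell probabilities would come from the fitted logistic regression and the counts from the census data.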

I also completed a project analyzing data from the Canadian General Social Survey (GSS). In this analysis, I first introduced the survey program and its data by discussing the target population, the frame population, the sampling frame, the sampling method, non-response problems and their solutions, and the survey's strengths and weaknesses. Then, I selected the key features in the survey data using the Bayesian Information Criterion (BIC), choosing the best model by removing the least significant feature one at a time. I also showed the basic composition of the sample, the variable of interest, and the relationships between variables, using ggplot to produce pie charts, histograms, and boxplots, together with summary tables. Finally, I fitted a multiple linear regression model, ran diagnostics to check the model assumptions, and applied hypothesis testing.

I would like to share my GitHub repo: https://github.com/Hiraethwly/STA304PromblemSet2
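The BIC-based feature selection described above can be sketched roughly as follows. This is a Python illustration on synthetic data, not the code from the repo. It assumes a Gaussian linear model fitted by ordinary least squares, and it assumes that "removing the least significant feature" means dropping the feature whose removal lowers the BIC the most. The names `bic`, `backward_bic`, `x1`, `x2`, and `junk` are hypothetical.

```python
import numpy as np


def bic(X, y):
    """BIC of an OLS fit with Gaussian errors, up to an additive constant."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def backward_bic(X, y, names):
    """Drop one feature at a time for as long as that lowers the BIC."""
    keep = list(range(X.shape[1]))
    best = bic(X[:, keep], y)
    while len(keep) > 1:
        # Score every model with one column removed (column 0, the intercept, always stays).
        scores = [
            (bic(X[:, [c for c in keep if c != j]], y), j)
            for j in keep
            if j != 0
        ]
        candidate, drop = min(scores)
        if candidate >= best:
            break  # no single removal improves the BIC
        best = candidate
        keep.remove(drop)
    return [names[j] for j in keep], best


# Synthetic data: y depends on x1 and x2 but not on junk.
rng = np.random.default_rng(0)
n = 200
x1, x2, junk = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, junk])
names = ["intercept", "x1", "x2", "junk"]

selected, best_bic = backward_bic(X, y, names)
print("Selected features:", selected)
```

The features that remain after selection would then go into the multiple linear regression model for diagnostics and hypothesis testing.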
