Stats blog 3 (Ethics and communication)

Helen Li
3 min read · Mar 21, 2021

Today, I would like to share some topics related to ethics and communication.

Ethical codes for statisticians / data scientists

As ethical statisticians, it is quite important to:

  1. be accurate in our analyses and conclusions
  2. be alert to possible consequences of our results/recommendations on others
  3. be honest in reporting results, even when we don’t get the results we hoped for
  4. be respectful of other reasonable results (based on well-conducted research) even if they differ from our own
  5. share credit when our work is based on the ideas of others

Confounding and study design

Confounders (confounding factors or confounding variables):

  1. A confounding variable is a variable that influences both the explanatory variable and the response variable. If we fail to account for it, either by not measuring it or by not including it in the analysis, we can come to incorrect conclusions
  2. In an observational study, variables are “observed” (measured and recorded) without manipulation of variables or conditions by the researcher
  3. Two variables are confounded if their effects on the response variable are mixed together and there is no way to separate them out. If this is the case, we have no way of determining which variable is causing changes to the response

When we have data from an observational study, we can only conclude association between variables, not causation.
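To see why, here is a minimal simulation sketch. All of the data are made up: z plays the role of the confounder, x the explanatory variable and y the response, and by construction x has no effect on y at all. The helper names (pearson, residuals) are my own and only for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(a, z):
    """What is left of a after removing its straight-line relationship with z."""
    n = len(a)
    mean_a, mean_z = sum(a) / n, sum(z) / n
    slope = (sum((x - mean_a) * (w - mean_z) for x, w in zip(a, z))
             / sum((w - mean_z) ** 2 for w in z))
    return [x - mean_a - slope * (w - mean_z) for x, w in zip(a, z)]


# Hypothetical data: z is the confounder, x the explanatory variable, y the response.
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]      # x depends on z
y = [2 * zi + random.gauss(0, 1) for zi in z]  # y depends on z only, never on x

raw = pearson(x, y)                                   # ignores the confounder
adjusted = pearson(residuals(x, z), residuals(y, z))  # accounts for the confounder

print(f"correlation ignoring z:       {raw:.2f}")
print(f"correlation accounting for z: {adjusted:.2f}")
```

The first correlation should come out clearly positive even though x never influences y, while the second should be close to zero. Of course, the adjustment is only possible here because z was measured; an unmeasured confounder cannot be removed this way.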

Designing studies to avoid confounding

  1. In an experiment (or randomized trial or randomized controlled trial) variables and/or conditions are manipulated by the researcher and the impact on other variable(s) is measured and recorded
  2. If there is a significant difference in the outcome between the treatment group and the control group, we may have evidence that there is a causal relationship between the treatment and the outcome
  3. Although well-designed randomized trials are the best way to establish a causal relationship, observational studies can also help build evidence for causation
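The sketch below illustrates why random assignment helps. It is a toy simulation under assumptions I chose for illustration: a hidden "health" score affects the outcome, the true treatment effect is fixed at 1.0, and the function name difference_in_means is hypothetical. The same study is run twice, once with people choosing their own treatment and once with a coin flip deciding.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 1.0  # assumed causal effect of the treatment on the outcome


def mean(values):
    return sum(values) / len(values)


def difference_in_means(assign):
    """Simulate a study and return mean(treated) - mean(untreated).

    `assign` decides who is treated, given each person's hidden health score.
    """
    treated, untreated = [], []
    for _ in range(10_000):
        health = random.gauss(0, 1)  # hidden confounder
        is_treated = assign(health)
        outcome = TRUE_EFFECT * is_treated + 2 * health + random.gauss(0, 1)
        (treated if is_treated else untreated).append(outcome)
    return mean(treated) - mean(untreated)


# Observational study: healthier people are more likely to choose the treatment.
observational = difference_in_means(lambda health: health + random.gauss(0, 1) > 0)

# Randomized trial: a coin flip decides, so health cannot influence assignment.
randomized = difference_in_means(lambda health: random.random() < 0.5)

print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
print(f"true effect:            {TRUE_EFFECT:.2f}")
```

In this setup the observational estimate should overstate the effect, because the treated group is healthier to begin with, while the randomized estimate should land near the true effect.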

Human research ethics

The Nuremberg Code was formulated in August 1947 in Nuremberg, Germany, by American judges sitting in judgement of Nazi doctors accused of conducting murderous and torturous human experiments in concentration camps during the war.

The Nuremberg Code codified many of our standard principles of ethical research, including:

  1. research must appropriately balance risk and potential benefits
  2. researchers must be well-versed in their discipline and ground human experiments in animal trials

Principles of free and informed consent

  1. Information: Participants should be told about the research procedure, its risks and anticipated benefits, and any alternative procedures (where therapy is involved), and should be given a statement offering them the opportunity to ask questions and to withdraw from the research at any time
  2. Comprehension: The manner and context in which information is conveyed is as important as the information itself
  3. Voluntariness: An agreement to participate in research constitutes a valid consent only if it is voluntary; this requires conditions free of coercion and inappropriate influence

Web scraping and APIs

Web scraping (also known as web harvesting, web crawling or web data extraction) is any method of copying data from a webpage, usually to then store it in a spreadsheet or database.

An ethical scraper should follow the site’s terms and conditions and/or robots.txt, use an API when provided, rate limit their requests and credit their sources.
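As a rough sketch of the robots.txt and rate-limiting parts, Python's standard library includes urllib.robotparser. Everything site-specific below is made up for illustration: the robots.txt content is inlined so the example runs without network access, the user agent name and URLs are hypothetical, and the fetch function is a stand-in for a real HTTP request.

```python
import time
from urllib import robotparser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "stats-blog-example-bot"  # hypothetical user agent name


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_fetch(urls, parser, fetch, default_delay=1.0, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, pausing after each request."""
    delay = parser.crawl_delay(USER_AGENT) or default_delay
    results = {}
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # skip pages the site asks crawlers not to visit
        results[url] = fetch(url)
        sleep(delay)  # rate limit our requests
    return results


parser = build_parser(ROBOTS_TXT)
urls = [
    "https://example.com/data/table.html",
    "https://example.com/private/users.html",
]

waits = []  # record the pauses instead of really sleeping in this demo
pages = polite_fetch(
    urls,
    parser,
    fetch=lambda url: f"<html>placeholder for {url}</html>",  # stand-in for a real request
    sleep=waits.append,
)

print(list(pages))  # only the allowed URL is fetched
print(waits)        # the pause requested by the Crawl-delay line
```

In a real scraper the parser would be pointed at the site's actual robots.txt (for example with set_url and read), sleep would be left as time.sleep, and the terms and conditions would still need to be read by a human, since robots.txt does not cover them.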

API stands for application programming interface. It is a structured way for requests for data (broadly defined) to be made and fulfilled between computers.
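To make "structured" a little more concrete, here is a small sketch of the two halves of a typical web API call: building the request and reading the reply. The endpoint, parameter names and JSON fields are all hypothetical, and the response is a hard-coded string so that nothing is actually sent over the network.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/observations"  # hypothetical endpoint


def build_request_url(base_url, **params):
    """Encode query parameters into a request URL."""
    return f"{base_url}?{urlencode(params)}"


def parse_observations(body):
    """Turn a JSON response body into a list of (id, value) pairs."""
    payload = json.loads(body)
    return [(row["id"], row["value"]) for row in payload["observations"]]


# The request: what we ask for, in a form the server understands.
url = build_request_url(BASE_URL, station="toronto", year=2021)

# The response: a made-up JSON body standing in for what such a server might return.
SAMPLE_RESPONSE = '{"observations": [{"id": 1, "value": 1.5}, {"id": 2, "value": 2.5}]}'
records = parse_observations(SAMPLE_RESPONSE)

print(url)
print(records)
```

Compared with scraping a page's HTML, the request and the reply both have a documented shape, which is one reason to prefer an API when a site provides one.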

Indigenous data sovereignty

Data sovereignty: Countries and nations tend to want data collected and stored about them/their people to be subject to their laws.

As statisticians, we should:

  1. Be aware of Indigenous rights and interests in relation to data
  2. Understand protocols for consulting with Indigenous peoples about data collection, access and use
  3. Ensure data for and about Indigenous peoples we are given access to is safeguarded and protected
  4. Support quality and integrity of Indigenous data and its collection
  5. Advocate for Indigenous involvement in the governance of data repositories
  6. Support the development of Indigenous data infrastructure and security systems

Selection bias

Selection bias can occur in a range of ways, but the key feature is that your sample is not representative of the population.
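A quick simulation sketch shows the effect. The population and the selection rule are invented for illustration: scores are drawn from a normal distribution, and in the biased case only people scoring above 55 can end up in the sample (think of a survey that only high scorers bother to answer).

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible


def mean(values):
    return sum(values) / len(values)


# Hypothetical population of 100,000 scores.
population = [random.gauss(50, 10) for _ in range(100_000)]

# Simple random sample: every member is equally likely to be chosen.
random_sample = random.sample(population, 1_000)

# Biased sample: only people scoring above 55 can be selected.
volunteers = [score for score in population if score > 55]
biased_sample = random.sample(volunteers, 1_000)

print(f"population mean:    {mean(population):.1f}")
print(f"random sample mean: {mean(random_sample):.1f}")
print(f"biased sample mean: {mean(biased_sample):.1f}")
```

The random sample's mean should sit close to the population mean, while the biased sample's mean should be noticeably higher. Collecting more data from the same biased source would not fix this, because the problem is who gets sampled, not how many.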

Algorithmic bias: Prediction models are taught what they “know” from training data. Training data can be incomplete, biased, or skewed. This can result in algorithmic bias.
