STA111: Statistical Inference

Module 1: Introduction to Statistics and Data

This module covers the basic definitions and types of data/variables, which form the bedrock of the course.

Statistical Concepts and Definitions:
- Types of Statistics: Descriptive Statistics (summarizing and presenting data) vs. Inferential Statistics (making decisions about a population based on sample results).
- Core Properties: Understanding that a sample is a portion of the population, and that the sample mean is a statistic (a value computed from sample data).
Types of Data and Variables:
- Types of Data: Primary Data vs. Secondary Data.
- Types of Variables: Discrete Variable (takes countable values) vs. Continuous Variable (can take any value within an interval).
Data Collection Methods: For example, the Questionnaire Survey Method.

This module focuses on arranging raw data into a more manageable format for analysis.

Data Organization:
- Arrayed Data: Listing data in order.
- Stem and Leaf Plot: Creating and interpreting a stem and leaf display.
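
As a rough illustration of arrayed data and a stem and leaf display, the sketch below sorts a small set of two-digit values and groups them by tens digit. The data values and the helper name `stem_and_leaf` are invented for this example, and the helper assumes non-negative whole numbers with the tens digit as the stem.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem), keeping units digits as leaves."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return dict(stems)


# Hypothetical raw data, invented for illustration.
data = [23, 31, 25, 47, 38, 21, 35, 42, 29, 33]

arrayed = sorted(data)          # arrayed data: values listed in ascending order
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row shows one stem followed by its leaves, so the row for stem 2 represents the values 21, 23, 25 and 29.
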
Frequency Distributions:
- Frequency Table Components:
  - Class Mark (midpoint of class limits)
  - Lower Class Boundary
  - Frequency and Cumulative Frequency (CF)
  - Relative Frequency (RF) and Cumulative Relative Frequency (CRF)
  - Modal Class (the class with the highest frequency)
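
The table components listed above can be sketched in a few lines of Python. The scores, the class limits, and the helper name `frequency_table` are assumptions made for this example; the lower class boundary is computed as the lower limit minus 0.5, which only fits whole-number data with a gap of 1 between adjacent classes.

```python
def frequency_table(data, class_limits):
    """Build one row per class with the usual frequency table components."""
    n = len(data)
    rows = []
    cumulative = 0
    for lower, upper in class_limits:
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq
        rows.append({
            "class": f"{lower}-{upper}",
            "class_mark": (lower + upper) / 2,   # midpoint of the class limits
            "lower_boundary": lower - 0.5,       # assumes whole-number data
            "frequency": freq,
            "cf": cumulative,                    # cumulative frequency
            "rf": freq / n,                      # relative frequency
            "crf": cumulative / n,               # cumulative relative frequency
        })
    return rows


# Hypothetical scores and class limits, invented for illustration.
scores = [12, 15, 21, 22, 25, 27, 31, 33, 34, 36, 38, 41, 42, 45, 48]
limits = [(10, 19), (20, 29), (30, 39), (40, 49)]

table = frequency_table(scores, limits)
modal_class = max(table, key=lambda row: row["frequency"])["class"]

for row in table:
    print(row)
print("Modal class:", modal_class)
```
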

This module is critical and focuses on finding the "center" or average of a dataset.

Mean: Calculating the arithmetic mean for both raw and grouped data.
- Property of the Mean: The algebraic sum of deviations from the mean is zero: $\sum_{i=1}^n (x_i - \bar{x}) = 0$.
Median: Determining the middle value of the ordered data: the $\frac{n+1}{2}$-th observation when $n$ is odd, or the average of the two middle observations when $n$ is even.
Mode: Identifying the most frequently occurring value.
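
A minimal sketch of these three measures, using Python's standard `statistics` module. All data values here are invented for illustration, and the grouped mean uses the usual approximation $\bar{x} = \sum f m / \sum f$, where $m$ is the class mark and $f$ the class frequency.

```python
import statistics

# Hypothetical raw data, invented for illustration.
data = [4, 7, 7, 9, 10, 12, 14]

mean = statistics.mean(data)        # 63 / 7 = 9
median = statistics.median(data)    # middle (4th) value of 7 ordered values
mode = statistics.mode(data)        # 7 occurs twice, more than any other value

# The deviations from the mean cancel out.
deviation_sum = sum(x - mean for x in data)

# Even number of observations: the median averages the two middle values.
even_median = statistics.median([3, 5, 8, 10])   # (5 + 8) / 2 = 6.5

# Grouped data (hypothetical class marks and frequencies).
class_marks = [14.5, 24.5, 34.5, 44.5]
frequencies = [2, 4, 5, 4]
grouped_mean = (
    sum(f * m for f, m in zip(frequencies, class_marks)) / sum(frequencies)
)

print(mean, median, mode, deviation_sum, even_median, grouped_mean)
```
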

This module is about quantifying the spread or variability within the data.

Absolute Measures of Dispersion:
- Range: The difference between the highest and lowest values.
- Mean Deviation (MD): The average of the absolute deviations from the mean.
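- Variance: The average of the squared deviations from the mean; equal to the square of the standard deviation.
- Standard Deviation (S): The positive square root of the variance; calculation and interpretation of the most common measure of spread.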
Relative Measures of Dispersion:
- Coefficient of Variation (CV): $\text{CV} = (S / \bar{x}) \times 100$. Used for comparing variability between two or more datasets (unit-free).
Properties: Absolute measures of dispersion are never negative; they equal zero only when all observations are identical.
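
The measures above can be computed as in the following sketch. The data values are invented, and the sketch assumes the population formulas (dividing by $n$) via `statistics.pvariance` and `statistics.pstdev`; if the course defines $S$ with an $n - 1$ divisor, `statistics.variance` and `statistics.stdev` would be the matching functions.

```python
import statistics

# Hypothetical data, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)                      # 40 / 8 = 5
data_range = max(data) - min(data)                # highest minus lowest value

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Population variance and standard deviation (divisor n, an assumption here).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Coefficient of variation as a percentage; unit-free, so it can be
# compared across datasets measured in different units.
cv = std_dev / mean * 100

print(data_range, mean_deviation, variance, std_dev, cv)
```
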

This module deals with dividing the dataset into equal parts and describing the symmetry (or lack thereof) of the distribution.

Measures of Location and Position (Quantiles):
Note: Quantiles are values that divide the ordered observations into equal parts.
- Quartiles:
  - Lower Quartile ($Q_1$)
  - Upper Quartile ($Q_3$)
  - Interquartile Range (IQR) = $Q_3 - Q_1$
- Percentiles: Calculating $P_{25}$ (which equals $Q_1$), $P_{50}$ (which is the Median), and $P_{75}$ (which equals $Q_3$).
Skewness:
- Types of Skewness:
  - Negatively Skewed Distribution: typically Mean < Median < Mode (tail to the left).
  - Positively Skewed Distribution: typically Mean > Median > Mode (tail to the right).
- Coefficient of Skewness.
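
A short sketch of quartiles, percentiles, and skewness follows. The data values are invented, and two choices are assumptions rather than something the outline specifies: quantile positions use the default "exclusive" method of `statistics.quantiles` (textbooks differ on the convention), and the coefficient of skewness shown is Pearson's second coefficient, $3(\bar{x} - \text{Median}) / S$.

```python
import statistics

# Hypothetical data with one large value, giving a tail to the right.
data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 40]

# Quartiles: three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Percentiles: 99 cut points; P_k sits at index k - 1.
percentiles = statistics.quantiles(data, n=100)
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

mean = statistics.mean(data)
median = statistics.median(data)
std_dev = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)

# Pearson's second coefficient of skewness (an assumed choice of formula).
pearson_skew = 3 * (mean - median) / std_dev

print(q1, q2, q3, iqr)
print(p25, p50, p75)
print("positively skewed" if pearson_skew > 0 else "not positively skewed")
```

Here the mean exceeds the median, so the coefficient is positive, which matches the positively skewed case described above.
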

This module introduces the rules and methods for calculating the likelihood of events.

Calculating Probability: When all outcomes are equally likely, the ratio of favorable outcomes to total possible outcomes: $P(E) = \text{Number of Favorable Outcomes} / \text{Total Possible Outcomes}$.
Rules of Probability:
- Mutually Exclusive Events: $P(A \cap B) = 0$.
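- Independent Events: Calculating the probability of multiple independent events occurring: $P(A \cap B) = P(A) \times P(B)$.
- "At Least One": Solving problems involving the probability of "at least one" event, using the complement: $P(\text{at least one}) = 1 - P(\text{none})$.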
Contingency Tables: Working with Joint and Marginal Probabilities in a table format.
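
The rules in this module can be illustrated with exact fractions, as in the sketch below. The fair-die scenario, the contingency table counts, and the helper name `classical_probability` are all invented for this example, and the "at least one" calculation assumes independent rolls.

```python
from fractions import Fraction


def classical_probability(favorable, total):
    """Favorable outcomes over total outcomes, assuming equally likely outcomes."""
    return Fraction(favorable, total)


# One roll of a fair six-sided die.
p_even = classical_probability(3, 6)    # outcomes 2, 4, 6
p_six = classical_probability(1, 6)

# Independent events: multiply the individual probabilities.
p_two_sixes = p_six * p_six

# "At least one" six in three rolls: 1 minus the probability of no six.
p_at_least_one_six = 1 - (1 - p_six) ** 3

# Hypothetical contingency table of counts (section by exam result).
counts = {
    ("A", "pass"): 30, ("A", "fail"): 20,
    ("B", "pass"): 35, ("B", "fail"): 15,
}
total = sum(counts.values())

# Joint probabilities: each cell count divided by the grand total.
joint = {cell: Fraction(count, total) for cell, count in counts.items()}

# Marginal probabilities: sum the joint probabilities over the other variable.
p_pass = sum(p for (section, result), p in joint.items() if result == "pass")
p_section_a = sum(p for (section, result), p in joint.items() if section == "A")

print(p_even, p_two_sixes, p_at_least_one_six)
print(joint[("A", "pass")], p_pass, p_section_a)
```
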