STA111: Statistical Inference

An outline covering the foundational concepts of descriptive statistics, data organization, and basic probability.

Module 1: Introduction to Statistics and Data

This module covers the basic definitions and types of data/variables, which form the bedrock of the course.

  • Statistical Concepts and Definitions:
    • Types of Statistics: Descriptive vs. Inferential Statistics (making decisions about a population based on sample results).
    • Core Properties: Understanding that a sample is a portion of the population, and that the sample mean is a statistic (a value computed from sample data).
  • Types of Data and Variables:
    • Types of Data: Primary Data vs. Secondary Data.
    • Types of Variables: Discrete Variable (and implicitly Continuous Variable).
  • Data Collection Methods: E.g., Questionnaire Survey Method.

Module 2: Data Organization and Frequency Distributions

This module focuses on arranging raw data into a more manageable format for analysis.

  • Data Organization:
    • Arrayed Data: Listing data in order.
    • Stem and Leaf Plot: Creating and interpreting a stem and leaf display.
  • Frequency Distributions:
    • Frequency Table Components:
      • Class Mark (midpoint of class limits)
      • Lower Class Boundary
      • Frequency and Cumulative Frequency (CF)
      • Relative Frequency (RF) and Cumulative Relative Frequency (CRF)
      • Modal Class (the class with the highest frequency)
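As a rough illustration of how the frequency table components above fit together, the sketch below builds a table from a small set of raw scores. The data, the class limits, and the helper name `frequency_table` are all invented for this example; the lower class boundary assumes integer data with a gap of 1 between consecutive classes.

```python
# Hypothetical raw scores, invented purely for illustration.
scores = [12, 15, 21, 22, 25, 27, 31, 33, 34, 35, 38, 41, 42, 44, 47]

# Assumed inclusive class limits of width 10 for integer-valued data.
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]


def frequency_table(data, class_limits):
    """Return one dict per class holding the usual frequency table columns."""
    n = len(data)
    rows = []
    cumulative = 0
    for lower, upper in class_limits:
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "class": (lower, upper),
            "class_mark": (lower + upper) / 2,  # midpoint of the class limits
            "lower_boundary": lower - 0.5,      # assumes a gap of 1 between classes
            "frequency": freq,
            "cf": cumulative,                   # cumulative frequency
            "rf": freq / n,                     # relative frequency
            "crf": cumulative / n,              # cumulative relative frequency
        })
    return rows


table = frequency_table(scores, classes)

# The modal class is the class with the highest frequency.
modal_class = max(table, key=lambda row: row["frequency"])["class"]

for row in table:
    print(row)
print("Modal class:", modal_class)
```

The last cumulative frequency should equal the number of observations, and the last cumulative relative frequency should equal 1, which makes a quick check on any hand-built table.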

Module 3: Measures of Location (Central Tendency)

This module is critical and focuses on finding the "center" or average of a dataset.

  • Mean: Calculating the arithmetic mean for both raw and grouped data.
    • Mean Deviation Property: The algebraic sum of deviations from the mean is zero: $\sum_{i=1}^n (x_i - \bar{x}) = 0$.
  • Median: Determining the middle value (position and calculation for odd/even number of observations).
  • Mode: Identifying the most frequently occurring value.
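The three measures above can be sketched with Python's standard `statistics` module. The data values, class marks, and frequencies below are hypothetical; the grouped-data mean assumes the usual formula $\bar{x} = \sum f m / \sum f$, where $m$ is the class mark and $f$ the class frequency.

```python
import statistics

# Hypothetical raw data, invented for illustration.
data = [4, 7, 7, 9, 13]

mean = statistics.mean(data)      # (4 + 7 + 7 + 9 + 13) / 5
median = statistics.median(data)  # middle value of the ordered data (odd n)
mode = statistics.mode(data)      # most frequently occurring value

# With an even number of observations, the median is the average
# of the two middle values.
median_even = statistics.median([4, 7, 9, 13])

# The algebraic sum of deviations from the mean is zero.
deviation_sum = sum(x - mean for x in data)

# Grouped data: weight each class mark by its frequency (hypothetical table).
class_marks = [14.5, 24.5, 34.5, 44.5]
frequencies = [2, 4, 5, 4]
grouped_mean = sum(f * m for f, m in zip(frequencies, class_marks)) / sum(frequencies)

print(mean, median, mode, median_even, deviation_sum, grouped_mean)
```

The grouped mean is only an approximation of the raw-data mean, because every observation in a class is treated as if it sat at the class mark.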

Module 4: Measures of Dispersion (Spread)

This module is about quantifying the spread or variability within the data.

  • Absolute Measures of Dispersion:
    • Range: The difference between the highest and lowest values.
    • Mean Deviation (MD): The average of the absolute deviations from the mean.
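    • Variance: The mean of the squared deviations from the mean (divisor $n$ for a population, $n - 1$ for a sample); equal to the square of the standard deviation.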
    • Standard Deviation (S): Calculation and interpretation of the most common measure of spread.
  • Relative Measures of Dispersion:
    • Coefficient of Variation (CV): $\text{CV} = (S / \bar{x}) \times 100$. Used for comparing variability between two or more datasets (unit-free).
  • Properties: The absolute measures of dispersion above are never negative; they equal zero only when all observations are identical.
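A minimal sketch of the dispersion measures in this module, using a small hypothetical dataset. The outline does not say whether $S$ uses the population divisor $n$ or the sample divisor $n - 1$; this example assumes the sample form, which is what `statistics.variance` and `statistics.stdev` compute.

```python
import statistics

# Hypothetical data, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # highest value minus lowest value

mean = statistics.mean(data)

# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Sample variance and standard deviation (divisor n - 1, an assumption here).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# Coefficient of variation, expressed as a percentage of the mean.
cv = std_dev / mean * 100

print(data_range, mean_deviation, variance, std_dev, cv)
```

Because the CV divides by the mean, it is unit-free and can be used to compare, for example, the variability of heights in centimetres with that of weights in kilograms.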

Module 5: Measures of Location and Distribution Shape

This module deals with dividing the dataset into equal parts and describing the symmetry (or lack thereof) of the distribution.

  • Measures of Location and Position (Quantiles):

    Note: Quantiles are values that divide the ordered observations into equal parts.

    • Quartiles:
      • Lower Quartile ($Q_1$)
      • Upper Quartile ($Q_3$)
      • Interquartile Range (IQR) = $Q_3 - Q_1$
    • Percentiles: Calculating $P_{25}$, $P_{50}$ (which is the Median), and $P_{75}$.
  • Skewness:
    • Types of Skewness:
      • Negatively Skewed Distribution: typically Mean < Median < Mode (tail to the left).
      • Positively Skewed Distribution: typically Mean > Median > Mode (tail to the right).
    • Coefficient of Skewness.
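The quantile and skewness ideas above can be sketched as follows. Two assumptions are made that the outline does not fix: quartile positions follow the $(n + 1)$ convention (the default "exclusive" method of `statistics.quantiles`; other textbooks and software use slightly different rules), and the coefficient of skewness is taken to be Pearson's second coefficient, $3(\bar{x} - \text{Median}) / S$. The data are hypothetical.

```python
import statistics

# Hypothetical data with one large value, giving a tail to the right.
data = [2, 4, 5, 7, 8, 10, 12, 15, 30]

# Quartiles: three cut points dividing the ordered data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Percentiles: 99 cut points; P_k sits at index k - 1.
percentiles = statistics.quantiles(data, n=100)
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

mean = statistics.mean(data)
median = statistics.median(data)
std_dev = statistics.stdev(data)  # sample standard deviation (assumption)

# Pearson's second coefficient of skewness (one of several definitions).
skewness = 3 * (mean - median) / std_dev

print(q1, q2, q3, iqr)
print(p25, p50, p75)
print(skewness)
```

Here the mean exceeds the median, so the coefficient is positive, which matches the tail to the right.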

Module 6: Basic Probability

This module introduces the rules and methods for calculating the likelihood of events.

  • Calculating Probability: For equally likely outcomes, the ratio of favorable outcomes to total possible outcomes: $P(E) = \text{Number of Favorable Outcomes} / \text{Total Possible Outcomes}$.
  • Rules of Probability:
    • Mutually Exclusive Events: $P(A \cap B) = 0$.
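    • Independent Events: Calculating the probability that multiple independent events all occur, using $P(A \cap B) = P(A) \times P(B)$.
    • "At Least One": Solving problems involving the probability of "at least one" event, using the complement: $P(\text{at least one}) = 1 - P(\text{none})$.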
  • Contingency Tables: Working with Joint and Marginal Probabilities in a table format.
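The probability rules in this module can be sketched with exact fractions. The dice and coin scenarios and the contingency table counts are hypothetical, chosen only to show the arithmetic.

```python
from fractions import Fraction

# Classical probability: rolling an even number on one fair die.
p_even = Fraction(3, 6)

# Mutually exclusive events cannot occur together, so their
# probabilities add: P(roll a 1 or a 2) = P(1) + P(2).
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)

# Independent events: P(A and B) = P(A) * P(B), e.g. two heads in two fair tosses.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# "At least one": 1 - P(none), e.g. at least one six in three rolls.
p_at_least_one_six = 1 - Fraction(5, 6) ** 3

# Contingency table of hypothetical counts: (group, result) -> count.
counts = {
    ("male", "pass"): 30, ("male", "fail"): 10,
    ("female", "pass"): 45, ("female", "fail"): 15,
}
total = sum(counts.values())

# Joint probability: each cell count divided by the grand total.
joint = {cell: Fraction(count, total) for cell, count in counts.items()}

# Marginal probability: sum the joint probabilities across a row or column.
p_pass = sum(p for (group, result), p in joint.items() if result == "pass")
p_male = sum(p for (group, result), p in joint.items() if group == "male")

print(p_even, p_one_or_two, p_two_heads, p_at_least_one_six)
print(joint[("male", "pass")], p_pass, p_male)
```

In this particular table the joint probability of "male and pass" equals the product of the two marginals, so group and result happen to be independent; real tables usually are not this tidy.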