Module 1: Introduction to Statistics and Data
This module covers the basic definitions and types of data/variables, which form the bedrock of the course.
Statistical Concepts and Definitions:
- Types of Statistics: Descriptive vs. Inferential Statistics (making decisions about a population based on sample results).
- Core Properties: Understanding that a sample is a portion of the population, and that the sample mean is a statistic (a value computed from sample data).
Types of Data and Variables:
- Types of Data: Primary Data vs. Secondary Data.
- Types of Variables: Discrete Variable vs. Continuous Variable.
- Data Collection Methods: E.g., Questionnaire Survey Method.
Module 2: Data Organization and Frequency Distributions
This module focuses on arranging raw data into a more manageable format for analysis.
Data Organization:
- Arrayed Data: Listing data in order.
- Stem and Leaf Plot: Creating and interpreting a stem and leaf display.
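As a rough illustration of the two items above, the sketch below arrays a small dataset and builds a stem and leaf display. The scores are hypothetical values chosen for the example, and the helper name `stem_and_leaf` is an assumption; it treats the tens digit as the stem and the units digit as the leaf, which only suits non-negative whole numbers.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens) to its sorted leaves (units).

    Assumes non-negative integers; stems with no leaves are not shown.
    """
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical raw scores, used only for illustration.
scores = [23, 31, 25, 47, 38, 31, 29, 40]

arrayed = sorted(scores)          # arrayed data: values listed in order
display = stem_and_leaf(scores)   # stem -> list of leaves

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Reading a row back is the reverse step: stem 2 with leaves 3, 5, 9 stands for the values 23, 25, and 29.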
Frequency Distributions:
Frequency Table Components:
- Class Mark (midpoint of class limits)
- Lower Class Boundary
- Frequency and Cumulative Frequency (CF)
- Relative Frequency (RF) and Cumulative Relative Frequency (CRF)
- Modal Class (the class with the highest frequency)
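The components listed above can be computed from a grouped table in a few lines. This is a minimal sketch with hypothetical class limits and frequencies; it assumes integer class limits with a gap of 1 between classes, so each lower class boundary sits 0.5 below the lower limit.

```python
from itertools import accumulate

# Hypothetical class limits (lower, upper) and frequencies.
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [4, 9, 5, 2]

n = sum(freqs)  # total number of observations

# Class mark: midpoint of the class limits.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Lower class boundary, assuming a gap of 1 between adjacent classes.
lower_boundaries = [lower - 0.5 for lower, _ in classes]

cum_freqs = list(accumulate(freqs))            # CF
rel_freqs = [f / n for f in freqs]             # RF
cum_rel_freqs = [cf / n for cf in cum_freqs]   # CRF

# Modal class: the class with the highest frequency.
modal_class = classes[freqs.index(max(freqs))]

for row in zip(classes, class_marks, freqs, cum_freqs, rel_freqs, cum_rel_freqs):
    print(row)
print("Modal class:", modal_class)
```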
Module 3: Measures of Location (Central Tendency)
This module is critical and focuses on finding the "center" or average of a dataset.
- Mean: Calculating the arithmetic mean for both raw and grouped data.
- Property of the Mean: The algebraic sum of deviations from the mean is zero: $\sum_{i=1}^n (x_i - \bar{x}) = 0$.
- Median: Determining the middle value (position and calculation for odd/even number of observations).
- Mode: Identifying the most frequently occurring value.
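A short sketch of these measures using Python's standard `statistics` module. All numbers are hypothetical; the grouped mean uses the usual class-mark approximation $\bar{x} = \sum f x / \sum f$, which is assumed here rather than taken from the outline.

```python
import statistics

# Hypothetical raw data.
raw = [4, 7, 7, 9, 13]

mean_raw = statistics.mean(raw)                   # (4 + 7 + 7 + 9 + 13) / 5
median_odd = statistics.median(raw)               # middle value of 5 observations
median_even = statistics.median([4, 7, 9, 13])    # average of the two middle values
mode_raw = statistics.mode(raw)                   # most frequent value

# Deviations from the mean sum to zero.
deviation_sum = sum(x - mean_raw for x in raw)

# Grouped data: hypothetical class marks and frequencies.
class_marks = [14.5, 24.5, 34.5, 44.5]
freqs = [4, 9, 5, 2]
grouped_mean = sum(f * x for f, x in zip(freqs, class_marks)) / sum(freqs)

print(mean_raw, median_odd, median_even, mode_raw, deviation_sum, grouped_mean)
```

For an even number of observations the median falls between two values, which is why `median_even` is the average of 7 and 9.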
Module 4: Measures of Dispersion (Spread)
This module is about quantifying the spread or variability within the data.
Absolute Measures of Dispersion:
- Range: The difference between the highest and lowest values.
- Mean Deviation (MD): The average of the absolute deviations from the mean.
- Variance ($S^2$): The average of the squared deviations from the mean; equal to the square of the standard deviation.
- Standard Deviation (S): Calculation and interpretation of the most common measure of spread.
Relative Measures of Dispersion:
- Coefficient of Variation (CV): $\text{CV} = (S / \bar{x}) \times 100$. Used for comparing variability between two or more datasets (unit-free).
- Properties: The range, MD, variance, and S are never negative; they equal zero only when all observations are identical.
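The measures in this module can be sketched as follows on a small hypothetical dataset. One assumption to flag: the outline does not say whether S uses the sample ($n - 1$) or population ($n$) divisor, so this example uses the sample versions (`statistics.variance` and `statistics.stdev`); `statistics.pvariance` and `statistics.pstdev` would give the population versions.

```python
import statistics

# Hypothetical data.
data = [4, 7, 7, 9, 13]
mean = statistics.mean(data)

data_range = max(data) - min(data)

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Sample variance and standard deviation (divisor n - 1, an assumption).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# Coefficient of variation as a percentage: CV = (S / mean) * 100.
cv = std_dev / mean * 100

print(data_range, mean_deviation, variance, round(std_dev, 4), round(cv, 2))
```

Because CV divides S by the mean, it has no unit, which is what makes it usable for comparing datasets measured on different scales.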
Module 5: Measures of Location and Distribution Shape
This module deals with dividing the dataset into equal parts and describing the symmetry (or lack thereof) of the distribution.
Measures of Location and Position (Quantiles):
Note: These are measures that divide the ordered observations into equal parts.
Quartiles:
- Lower Quartile ($Q_1$)
- Upper Quartile ($Q_3$)
- Inter Quartile Range (IQR) = $Q_3 - Q_1$
- Percentiles: Calculating $P_{25}$, $P_{50}$ (which is the Median), and $P_{75}$.
Skewness:
Types of Skewness:
- Negatively Skewed Distribution: Mean < Median < Mode (tail to the left).
- Positively Skewed Distribution: Mean > Median > Mode (tail to the right).
- Coefficient of Skewness.
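A sketch of the quantile and skewness ideas above, on hypothetical data. Two assumptions are worth flagging: quartile conventions differ between textbooks, and `statistics.quantiles` defaults to the "exclusive" method (positions based on $n + 1$); and since the outline does not name a specific coefficient of skewness, the example uses Pearson's second coefficient, $3(\bar{x} - \text{Median}) / S$, as one common choice.

```python
import statistics

# Hypothetical data with one large value pulling the tail to the right.
data = [2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 30]

# Quartiles: three cut points dividing the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Percentiles: 99 cut points; P_k sits at index k - 1.
percentiles = statistics.quantiles(data, n=100)
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

mean = statistics.mean(data)
median = statistics.median(data)
std_dev = statistics.stdev(data)  # sample standard deviation (assumption)

# Pearson's second coefficient of skewness (assumed choice of coefficient).
skewness = 3 * (mean - median) / std_dev

print(q1, q2, q3, iqr)
print(p25, p50, p75)
print("positively skewed" if skewness > 0 else "negatively skewed or symmetric")
```

Here the mean exceeds the median, so the coefficient is positive, matching the "tail to the right" description.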
Module 6: Basic Probability
This module introduces the rules and methods for calculating the likelihood of events.
- Calculating Probability: For equally likely outcomes, the ratio of favorable outcomes to total possible outcomes: $P(E) = \text{Number of Favorable Outcomes} / \text{Total Possible Outcomes}$.
Rules of Probability:
- Mutually Exclusive Events: $P(A \cap B) = 0$.
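- Independent Events: Calculating the probability of multiple independent events occurring together, using $P(A \cap B) = P(A) \times P(B)$.
- "At Least One": Solving problems involving the probability of "at least one" event, typically via the complement: $P(\text{at least one}) = 1 - P(\text{none})$.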
- Contingency Tables: Working with Joint and Marginal Probabilities in a table format.
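The rules in this module can be tried out with exact fractions. This is a minimal sketch: the die, the coin tosses, and the contingency-table counts (shift by inspection result) are all hypothetical examples, not data from the course.

```python
from fractions import Fraction

# Classical probability: an even number on one fair six-sided die.
favorable = len([face for face in range(1, 7) if face % 2 == 0])
p_even = Fraction(favorable, 6)

# Independent events: heads on each of two fair coin tosses.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head

# "At least one" head in three tosses: 1 - P(no heads).
p_at_least_one_head = 1 - (1 - p_head) ** 3

# Hypothetical contingency table of counts.
table = {
    ("day", "defective"): 30, ("day", "ok"): 20,
    ("night", "defective"): 35, ("night", "ok"): 15,
}
total = sum(table.values())

# Joint probabilities: each cell divided by the grand total.
joint = {cell: Fraction(count, total) for cell, count in table.items()}

# Marginal probabilities: sum the joint probabilities over the other variable.
p_defective = sum(p for (_, result), p in joint.items() if result == "defective")
p_day = sum(p for (shift, _), p in joint.items() if shift == "day")

print(p_even, p_two_heads, p_at_least_one_head)
print(joint[("day", "defective")], p_defective, p_day)
```

Using `Fraction` keeps every result exact, so the values can be checked by hand against the formulas above.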