Probabilities and Statistics

Dawson 201-SN1-RE.

Last minute GG

1. Intro to stats

Four phases of a statistical study:

  1. Data collection
  2. Data analysis
  3. Data presentation
  4. Data interpretation

Definitions:

  • population: everyone you are interested in for the study
  • sample: some people you pick from population (because you cannot possibly ask everyone)
  • variable: a characteristic of the population/sample
  • parameter: a characteristic of a population
  • statistic: a characteristic of a sample

Distinguish data types:

| data type | explanation |
| --- | --- |
| qualitative | no numbers, only categories (e.g. red, green, blue) |
| quantitative | has numbers (e.g. 1000 meters) |

Level of measurement:

| level | explanation |
| --- | --- |
| nominal | like qualitative data, only categories, no numbers |
| ordinal | only the order matters, can't perform calculations (e.g. rate this guide from 1 to 10) |
| interval | numbers on a scale, no natural zero (zero is just a point on the scale, e.g. 0 °C) |
| ratio | same as interval but has a natural zero (zero means none) |

Sampling techniques:

  • simple random
    • select randomly out of the whole group. Everyone has the same chance of being selected
  • stratified
    • separate the population into groups (strata) based on shared details (age, sex etc) and randomly sample people in every group
  • cluster
    • separate the population into groups (clusters) that already exist, like classes or neighbourhoods, and
      • randomly select some groups and take everyone in them (single stage)
      • randomly select some groups and then randomly select people in those groups (two stage)
  • systematic
    • picking people from a list at equal steps, starting from a random point
  • convenience (non-probability sampling)
    • use when you are lazy: pick whoever is easiest to reach (e.g. choose one class to represent the whole school)

All sampling methods except convenience sampling are considered probability sampling.
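The four probability methods above can be sketched in a few lines of Python. This is only a rough illustration: the student IDs, the two strata, and the ten "classes" are made up, and the function names are not from the course.

```python
import random

def simple_random(population, k, rng):
    """Everyone has the same chance of being picked."""
    return rng.sample(population, k)

def stratified(strata, k_per_stratum, rng):
    """Randomly sample k people from every stratum."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_single_stage(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def systematic(population, step, rng):
    """Start at a random point in the first step, then take every step-th person."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(0)                    # fixed seed so the run is repeatable
students = list(range(1, 101))            # hypothetical student IDs 1..100
strata = {"first_year": students[:50], "second_year": students[50:]}
clusters = {f"class_{i}": students[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random(students, 10, rng)
strat = stratified(strata, 5, rng)
clus = cluster_single_stage(clusters, 2, rng)
syst = systematic(students, 10, rng)
print(srs, strat, clus, syst, sep="\n")
```

Note the difference that usually gets mixed up: stratified takes a few people from every group, cluster takes everyone from a few groups.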

Chart types:

  • bar
  • pie
  • Pareto
    • both lines and bars
    • bars: individual values
    • lines: cumulative total
  • scatter
  • histogram
    • like bar chart but for continuous data
  • dot
    • display distribution shape
  • stem and leaf
    • to compare two samples
  • box and whisker
    • display five number summary
    • min, q1, q2 (median), q3, max
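For the Pareto chart, the numbers behind the bars and the line can be built like this. A minimal sketch with made-up complaint counts; sorting the bars from largest to smallest is the usual Pareto convention and is assumed here.

```python
def pareto_table(counts):
    """Return (category, count, cumulative total) rows, largest count first."""
    rows = []
    running = 0
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for category, count in ordered:
        running += count                  # the line: cumulative total
        rows.append((category, count, running))
    return rows

# Made-up data, only for illustration
complaints = {"late delivery": 12, "wrong item": 30, "damaged": 8}
table = pareto_table(complaints)
for row in table:
    print(row)
```

The second column is what the bars show and the third column is what the line shows.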

You are cooked :(

2. Descriptive statistics

Frequency distribution table: a table of how many occurrences of each value

  • ungrouped: use when there are only a few different values
  • grouped: use when there are too many different values

Steps:

  1. Find the number of classes (5-20). If there are too many values to pick a count by eye, use the formula $\approx 1 + 3.3\log{n}$, where $n$ is the number of data values
  2. Class width
    • $\frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$, then round up
    • or subtract the lower limits of two consecutive classes
  3. Class limits
    • lower limit: start at the smallest value in the set, then keep adding the class width
    • upper limit: subtract one (or the appropriate unit, e.g. 0.1 for one-decimal data) from the lower limit of the next class
  4. Class boundaries
    • subtract 0.5 (or 0.05 for one-decimal data) from the smallest lower limit, then keep adding the class width
  5. Class mark (class midpoint)
    • average of lower and upper limit of one class
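The five steps can be sketched in Python for whole-number data. Assumptions on my part: the class count from the formula is rounded to the nearest integer (the notes do not say how to round it), the log is base 10, and the scores are made up.

```python
import math

def build_classes(data, k=None):
    """Class limits, boundaries, and marks for whole-number data."""
    n = len(data)
    if k is None:
        # Step 1: assumption, round the formula to the nearest class count
        k = round(1 + 3.3 * math.log10(n))
    # Step 2: class width, rounded up
    width = math.ceil((max(data) - min(data)) / k)
    classes = []
    lower = min(data)                     # Step 3: first lower limit
    for _ in range(k):
        upper = lower + width - 1         # one less than the next lower limit
        classes.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),   # Step 4
            "mark": (lower + upper) / 2,                # Step 5
        })
        lower += width
    return width, classes

# Hypothetical scores, n = 20
scores = [12, 15, 18, 21, 22, 24, 27, 29, 30, 31,
          33, 35, 36, 38, 40, 41, 43, 45, 47, 49]
width, classes = build_classes(scores)
for c in classes:
    print(c)
```

If the width divides the range exactly, the largest value can fall outside the last class, so check the last upper limit by hand.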

Frequencies:

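  • f. (frequency): how many values fall in that class
  • c.f. (cumulative frequency): running total, keep adding the previous f.
  • r.f. (relative frequency): divide f. of that class by n
  • r.c.f. (relative cumulative frequency): running total, keep adding the previous r.f.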
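A quick sketch of the four frequency columns, assuming the class limits are already known. The scores and the limits are made up for illustration.

```python
def frequency_table(data, limits):
    """Build f., c.f., r.f. and r.c.f. for each (lower, upper) class."""
    n = len(data)
    rows = []
    cf = 0
    for lower, upper in limits:
        f = sum(lower <= x <= upper for x in data)   # values in this class
        cf += f                                      # running total of f.
        rows.append({
            "class": (lower, upper),
            "f": f,
            "cf": cf,
            "rf": f / n,
            "rcf": cf / n,   # same as adding up r.f., without rounding drift
        })
    return rows

scores = [12, 15, 18, 21, 22, 24, 27, 29, 30, 31,
          33, 35, 36, 38, 40, 41, 43, 45, 47, 49]
limits = [(12, 19), (20, 27), (28, 35), (36, 43), (44, 51)]
rows = frequency_table(scores, limits)
for row in rows:
    print(row)
```

Sanity check: the last c.f. should equal n and the last r.c.f. should equal 1.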

Three Ms are all on the formula sheet

  • Mean

    • Population: $\mu$
    • Sample: $\bar{x}$
  • Median: middle value

    • if even, average of middle two
  • Mode: most frequent

  • Percentile: multiply the percentage (as a decimal) by n and round up to get the position in the sorted data

  • Quartile: Q1, Q2 and Q3 are P25, P50 and P75, round up the same way

  • Range: max - min

  • Outlier: first find IQR = Q3 - Q1, then a value is an outlier if it is

    • more than 1.5 IQR above Q3 or
    • more than 1.5 IQR below Q1
  • variance

    • calculate by hand with a table like this
| data | data |
| --- | --- |
| $x$ | xxxx |
| $P(x)$ | xxxx |
| $x P(x)$ | xxxx |
| $x^2 P(x)$ | xxxx |

And then population variance = $\sum x^2 P(x) - \mu^2$, where $\mu = \sum x P(x)$
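Here is the same table method as a small Python sketch. The values of x and P(x) are made up, and fractions are used so the arithmetic stays exact.

```python
from fractions import Fraction

# Made-up distribution: the P(x) values must add up to 1
xs = [0, 1, 2, 3]
ps = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
assert sum(ps) == 1

x_p = [x * p for x, p in zip(xs, ps)]        # row: x P(x)
x2_p = [x**2 * p for x, p in zip(xs, ps)]    # row: x^2 P(x)

mu = sum(x_p)                                # mean = sum of x P(x)
variance = sum(x2_p) - mu**2                 # sum of x^2 P(x) minus mu^2

print(mu, variance)  # 2 1
```

Each list is one row of the table, and the two sums are the only things you need from it.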

Use calculator
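If you want to double check the calculator, the three Ms, the round-up percentile rule, and the outlier fences can be sketched like this. The data is made up, and the percentile function follows the round-up rule from these notes only; your textbook may treat the case where the position is already a whole number differently.

```python
import math
import statistics

def percentile(sorted_data, p):
    """Value at position ceil(p% of n), counting from 1 (round-up rule)."""
    position = math.ceil(p * len(sorted_data) / 100)
    return sorted_data[max(position, 1) - 1]

data = sorted([2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 40])   # made-up sample, n = 11

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
value_range = max(data) - min(data)

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(mean, median, mode, value_range)
print(q1, q2, q3, iqr, outliers)
```

With this sample the fences are -8 and 24, so only 40 counts as an outlier.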

Empirical rule is on the formula sheet, ez
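As a reminder, the empirical rule says that for bell-shaped data about 68%, 95% and 99.7% of values fall within 1, 2 and 3 standard deviations of the mean. A tiny sketch, with a made-up mean of 100 and standard deviation of 15:

```python
def empirical_rule(mean, sd):
    """Intervals holding roughly 68%, 95%, 99.7% of bell-shaped data."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage.items()}

intervals = empirical_rule(100, 15)   # hypothetical mean and sd
for pct, (low, high) in intervals.items():
    print(f"about {pct} of values are between {low} and {high}")
```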

3. Linear regression type

Bombed lowkey, just read the textbook, it's not hard gang