Probabilities and Statistics

Last minute GG

1. Intro to stats

Four phases of a statistical study:

Definitions:

population: everyone (or everything) the study is interested in
sample: the part of the population you actually collect data from (because you cannot possibly ask everyone)
variable: a characteristic measured on each member of the population/sample
parameter: a number that describes a population (e.g. the population mean)
statistic: a number that describes a sample (e.g. the sample mean)

Distinguish data types:

| data type | explanation |
| --- | --- |
| qualitative | no numbers, only categories (e.g. red, green, blue) |
| quantitative | numbers that are counted or measured (e.g. 1000 meters) |

Level of measurement:

| level | explanation |
| --- | --- |
| nominal | like qualitative data: only categories, no numbers, no order |
| ordinal | the order matters, but the gaps between values do not, so you can't do arithmetic on it (e.g. rate this guide from 1 to 10) |
| interval | numbers on a scale with no natural zero (zero is just a point on the scale, e.g. 0 °C) |
| ratio | same as interval but with a natural zero (zero means none, e.g. 0 meters) |

Sampling techniques:

simple random
- select randomly out of the whole group. Everyone has the same chance of being selected
stratified
- separate the population into groups (strata) based on a shared characteristic (age, sex, etc.) and randomly sample people from every group
cluster
- separate the population into naturally occurring groups (clusters, e.g. classes or neighborhoods) and
  - randomly select whole groups and take everyone in them (single stage)
  - randomly select groups and then randomly select people within those groups (two stage)
systematic
- picking people from a list at equal steps, starting from a random point
convenience (non-probability sampling)
- use when you are lazy: pick whoever is easiest to reach (e.g. choose one class to represent the whole school)

All sampling methods except convenience sampling are considered probability sampling.
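
A minimal Python sketch of the four probability sampling methods above, using only the standard `random` module. The function names, the `students` list, and the `classes` grouping are made up for illustration; they are not from the textbook or the formula sheet.

```python
import random

def simple_random(population, n, rng):
    """Every member has the same chance of being picked."""
    return rng.sample(population, n)

def stratified(strata, n_per_stratum, rng):
    """Draw a random sample from every stratum."""
    return [p for group in strata.values() for p in rng.sample(group, n_per_stratum)]

def cluster(clusters, n_clusters, rng, n_per_cluster=None):
    """Pick whole clusters at random (single stage).
    If n_per_cluster is given, also sample inside each picked cluster (two stage)."""
    picked = []
    for name in rng.sample(sorted(clusters), n_clusters):
        members = clusters[name]
        if n_per_cluster is None:
            picked.extend(members)
        else:
            picked.extend(rng.sample(members, n_per_cluster))
    return picked

def systematic(population, step, rng):
    """Start at a random point in the first `step` items, then take every `step`-th item."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(0)            # fixed seed so the run is repeatable
students = list(range(1, 101))    # hypothetical population of 100 student IDs
classes = {                       # hypothetical grouping into 4 classes of 25
    "A": students[:25], "B": students[25:50],
    "C": students[50:75], "D": students[75:],
}

print(len(simple_random(students, 10, rng)))   # 10 students
print(len(stratified(classes, 3, rng)))        # 3 from each of 4 groups = 12
print(len(cluster(classes, 2, rng)))           # 2 whole classes = 50
print(len(cluster(classes, 2, rng, 5)))        # 5 from each of 2 classes = 10
print(systematic(students, 10, rng))           # every 10th student from a random start
```

The same `classes` dictionary is used for both stratified and cluster sampling on purpose: stratified takes a few people from every group, while cluster only looks at the groups that got picked.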

Chart types:

bar
pie
Pareto
- both lines and bars
- bars: individual values
- lines: cumulative total
scatter
histogram
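- like a bar chart but for continuous data grouped into classes
dot
- displays the shape of the distribution
stem and leaf
- displays the distribution while keeping the actual values; a back-to-back version compares two samples
box and whisker
- displays the five-number summary
- min, Q1, Q2 (median), Q3, max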
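
A rough sketch of how the five-number summary behind a box and whisker plot could be computed in Python. The function name and the data are made up. Quartile conventions differ between textbooks and calculators; the `"inclusive"` method of `statistics.quantiles` is just one choice assumed here, so Q1 and Q3 may differ slightly from what your calculator shows.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # n=4 splits the data into quartiles; "inclusive" is an assumed convention
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

values = [7, 1, 9, 3, 5, 2, 8, 4, 6]   # hypothetical data, order does not matter
print(five_number_summary(values))     # (1, 3.0, 5.0, 7.0, 9)
```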

You are cooked :(

Frequency distribution table: a table showing how many times each value (or class of values) occurs

Steps:

Find the number of classes (usually 5-20). If you are not sure how many to use, estimate with Sturges' rule: $k \approx 1 + 3.3\log_{10}{n}$, where $n$ is the number of data values
Class width
- $\frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$ then round up
- or, if the table is already given, subtract the lower limits of two consecutive classes
Class limits
- lower limits: start at the smallest value in the set, then keep adding the class width
- upper limits: subtract one unit (1 for whole numbers, 0.1 for one decimal place, etc.) from the lower limit of the next class
Class boundaries
- subtract half a unit (0.5 for whole numbers, 0.05 for one decimal place) from the first lower limit, then keep adding the class width
Class mark (class midpoint)
- average of the lower and upper limits of one class
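
The steps above can be sketched in Python like this. It assumes whole-number data (so one unit is 1 and half a unit is 0.5), and the function name, the dictionary keys, and the data are all made up for illustration.

```python
import math

def frequency_table(data, num_classes=None):
    """Build a frequency distribution for whole-number data."""
    n = len(data)
    if num_classes is None:
        # Rule of thumb from the notes: about 1 + 3.3 log10(n) classes
        num_classes = round(1 + 3.3 * math.log10(n))
    smallest, largest = min(data), max(data)
    # Round the width up; if the division is exact, still go one higher
    # so the largest value fits inside the last class.
    width = math.floor((largest - smallest) / num_classes) + 1
    rows = []
    for i in range(num_classes):
        lower = smallest + i * width          # lower limit: keep adding the width
        upper = lower + width - 1             # upper limit: one unit below the next lower limit
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "freq": sum(lower <= x <= upper for x in data),
        })
    return rows

data = [12, 15, 17, 21, 22, 24, 25, 28, 30, 31, 33, 35, 38, 40, 41]  # made-up values
for row in frequency_table(data, num_classes=5):
    print(row)
```

With this data the range is 29, so the class width rounds up to 6 and the classes are 12-17, 18-23, 24-29, 30-35, and 36-41, with frequencies 3, 2, 3, 4, 3.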

Frequencies:

The three Ms (mean, median, mode) are all on the formula sheet
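
If you want to check your calculator work, here is a small sketch of mean, median, and mode using Python's standard `statistics` module. The `scores` list is made up.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]        # hypothetical data

print(statistics.mean(scores))      # sum / count = 30 / 6 = 5
print(statistics.median(scores))    # middle two values are 3 and 5, so 4.0
print(statistics.mode(scores))      # 3 appears most often
```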

And then the variance of a discrete probability distribution: $\sigma^2 = \sum x^2 P(x) - \mu^2$, where $\mu = \sum x P(x)$
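
A quick sketch of that variance formula in Python, for a distribution given as value-to-probability pairs. The function name and the example distribution (number of heads in two fair coin flips) are just for illustration.

```python
def discrete_mean_variance(dist):
    """dist maps each value x to its probability P(x)."""
    mu = sum(x * p for x, p in dist.items())                  # mean: sum of x * P(x)
    var = sum(x**2 * p for x, p in dist.items()) - mu**2      # sum of x^2 * P(x), minus mu^2
    return mu, var

heads = {0: 0.25, 1: 0.5, 2: 0.25}   # P(0 heads), P(1 head), P(2 heads)
print(discrete_mean_variance(heads))  # (1.0, 0.5)
```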

Use calculator

Empirical rule is on the formula sheet, ez
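
For reference, the empirical rule says that for a roughly bell-shaped (normal) distribution, about 68% of the data lies within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. A small sketch that checks those numbers with `statistics.NormalDist` (the helper name `within` is made up):

```python
from statistics import NormalDist

def within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    z = NormalDist()                 # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))    # 0.6827, 0.9545, 0.9973
```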

Bombed lowkey just read text book its not hard gang