Probabilities and Statistics
Dawson 201-SN1-RE.
Last minute GG
1. Intro to stats
Four phases of a statistical study:
- Data collection
- Data analysis
- Data presentation
- Data interpretation
Definitions:
- population: everyone you are interested in for the study
- sample: some people you pick from the population (because you cannot possibly ask everyone)
- variable: a characteristic of the population/sample
- parameter: a characteristic of a population
- statistic: a characteristic of a sample
Distinguish data types:
| data type | explanation |
|---|---|
| qualitative | no numbers, only categories (e.g. red, green, blue) |
| quantitative | has numbers (e.g. 1000 meters) |
Level of measurement:
| lvl | explanation |
|---|---|
| nominal | like qualitative data, only categories no numbers |
| ordinal | only the order matters, can't perform calculations (e.g. rate this guide from 1 to 10) |
| interval | number on a scale, no natural zero (zero only means a point on a scale) |
| ratio | same as interval but has natural zero (zero means none) |
Sampling techniques:
- simple random
- select randomly out of the whole group. Everyone has the same chance of being selected
- stratified
- separate the population into groups (strata) based on details (age, sex, etc.) and randomly sample people from each group
- cluster
- separate the population into groups (clusters) and
- randomly select the groups (single stage)
- randomly select the groups and then randomly select people in those groups (two stage)
- systematic
- picking people from a list at equal steps, starting from a random point
- convenience (non-probability sampling)
- use when you are lazy (choose a class to represent the whole school)
All sampling methods above except convenience sampling are considered probability sampling.
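The four probability methods can be sketched in a few lines of Python to see how they differ. This is only a rough illustration: the function names, the made-up school of four classes, and the fixed seed are all assumptions, not something from the course.

```python
import random

def simple_random(population, n, rng):
    """Everyone has the same chance of being selected."""
    return rng.sample(population, n)

def stratified(strata, n_per_stratum, rng):
    """Randomly sample people from every group (stratum)."""
    return [p for group in strata.values() for p in rng.sample(group, n_per_stratum)]

def cluster_single_stage(clusters, n_clusters, rng):
    """Randomly pick whole groups and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for name in chosen for p in clusters[name]]

def cluster_two_stage(clusters, n_clusters, n_per_cluster, rng):
    """Randomly pick groups, then randomly pick people inside those groups."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for name in chosen for p in rng.sample(clusters[name], n_per_cluster)]

def systematic(population, step, rng):
    """Start at a random point within the first step, then take every step-th person."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical school: 4 classes of 10 students each (A1..A10, B1..B10, ...)
classes = {f"class_{c}": [f"{c}{i}" for i in range(1, 11)] for c in "ABCD"}
everyone = [p for group in classes.values() for p in group]

print(simple_random(everyone, 8, rng))
print(stratified(classes, 2, rng))             # 2 students from every class
print(cluster_single_stage(classes, 2, rng))   # 2 whole classes
print(cluster_two_stage(classes, 2, 3, rng))   # 3 students from each of 2 classes
print(systematic(everyone, 5, rng))            # every 5th student on the list
```

Note that stratified sampling touches every group, while cluster sampling only touches the groups that got picked.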
Chart types:
- bar
- pie
- Pareto
- both lines and bars
- bars: individual values
- lines: cumulative total
- scatter
- histogram
- like bar chart but for continuous data
- dot
- display distribution shape
- stem and leaf
- to compare two samples
- box and whisker
- display five number summary
- min, q1, q2 (median), q3, max
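For the Pareto chart in the list above, the numbers behind the bars and the line can be worked out like this. A small sketch with made-up complaint counts; `pareto_table` is a hypothetical helper, and sorting the bars from largest to smallest is the usual Pareto convention rather than something stated in these notes.

```python
from itertools import accumulate

def pareto_table(counts):
    """Rows of (category, count, cumulative percent), largest count first."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    running = accumulate(count for _, count in ordered)  # cumulative totals for the line
    return [(name, count, round(100 * cum / total, 1))
            for (name, count), cum in zip(ordered, running)]

complaints = {"damaged": 5, "late": 12, "wrong item": 3}  # made-up data
for row in pareto_table(complaints):
    print(row)
# ('late', 12, 60.0)
# ('damaged', 5, 85.0)
# ('wrong item', 3, 100.0)
```

The counts are the bars and the last column is the cumulative line.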
You are cooked :(
2. Descriptive statistics
Frequency distribution table: a table of how many times each value occurs
- ungrouped: use when there are few distinct values
- grouped: use when there are too many distinct values
Steps:
- Find number of classes (5-20). If too many, use formula
- Class width
- $\text{width} = \dfrac{\text{max} - \text{min}}{\text{number of classes}}$, then round up
- or subtract the lower limits of two consecutive classes
- Class limits
- lower limit: start at the smallest value in the set, then keep adding the class width
- upper limit: subtract one (or the appropriate unit, e.g. 0.1) from the lower limit of the next class
- Class boundaries
- subtract 0.5 (or 0.05, i.e. half a unit) from the smallest lower limit, then keep adding the class width
- Class mark (class midpoint)
- average of lower and upper limit of one class
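The steps above can be sketched in Python for whole-number data. This is a rough sketch under assumptions: `build_classes` is a made-up name, the data are invented, and widening the class by one when the division comes out exact (so the maximum still fits) is my own assumption, so check how your textbook handles that case.

```python
import math

def build_classes(data, num_classes):
    """Return the class width and, per class, its limits, boundaries and mark.

    Assumes whole-number data, so the unit is 1 and the boundary offset is 0.5.
    """
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_classes)   # range / classes, rounded up
    if low + num_classes * width <= high:           # exact division: max would not fit
        width += 1                                  # assumption: widen by one unit
    rows = []
    for i in range(num_classes):
        lower = low + i * width                     # keep adding the class width
        upper = lower + width - 1                   # one unit below the next lower limit
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,            # class midpoint
        })
    return width, rows

data = [12, 15, 21, 22, 27, 30, 34, 38, 41, 45, 49, 50]  # made-up values
width, rows = build_classes(data, num_classes=5)
print(width)  # 8
for row in rows:
    print(row)
```

With these values the range is 38, so 38 / 5 = 7.6 rounds up to a width of 8 and the classes come out as 12-19, 20-27, 28-35, 36-43 and 44-51.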
Frequencies:
- f.: how many values fall in that class
- c.f.: running total of f. (keep adding the previous f.)
- r.f.: f. of that class divided by n
- r.c.f.: running total of r.f. (keep adding the previous r.f.)
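Here is a short Python sketch of the four frequency columns. The class limits and data are made up and `frequency_table` is a hypothetical helper. The r.c.f. column is computed as c.f. divided by n, which gives the same result as adding up the r.f. values but avoids floating-point drift.

```python
from itertools import accumulate

def frequency_table(data, classes):
    """classes is a list of (lower limit, upper limit); returns f, c.f., r.f., r.c.f."""
    n = len(data)
    f = [sum(lo <= x <= hi for x in data) for lo, hi in classes]  # count per class
    cf = list(accumulate(f))                                     # running total of f
    rf = [count / n for count in f]                              # f divided by n
    rcf = [total / n for total in cf]                            # same as summing r.f.
    return f, cf, rf, rcf

data = [12, 15, 21, 22, 27, 30, 34, 38, 41, 45, 49, 50]          # made-up values
classes = [(12, 19), (20, 27), (28, 35), (36, 43), (44, 51)]
f, cf, rf, rcf = frequency_table(data, classes)
print(f)    # [2, 3, 2, 2, 3]
print(cf)   # [2, 5, 7, 9, 12]
```

Quick sanity checks: the last c.f. should equal n and the last r.c.f. should equal 1.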
Three Ms are all on the formula sheet
- Mean
- Population: $\mu = \dfrac{\sum x}{N}$
- Sample: $\bar{x} = \dfrac{\sum x}{n}$
- Median: middle value of the sorted data
- if n is even, average of the middle two
- Mode: most frequent value
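- Percentile: multiply the percentage (as a decimal) by n and round up to get the position in the sorted data
- Quartile: Q1 = P25, Q2 = P50 and Q3 = P75, round up the same way
- Range: max - min
- Outlier: with IQR = Q3 - Q1, a value is an outlier if it is
- more than 1.5 IQR above Q3 or
- more than 1.5 IQR below Q1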
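- Variance
- calculate by hand with a table like this
| data | data - mean | (data - mean)^2 |
|---|---|---|
And then population variance = $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$ (for a sample, use $\bar{x}$ and divide by $n - 1$)
Use calculator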
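To check hand calculations for the measures listed above, something like this Python sketch can help. Both data sets are made up, `percentile`, `outliers` and `variance_table` are hypothetical helpers, and the percentile function only implements the simple round-up rule from these notes (your textbook may treat a whole-number position differently).

```python
import math
from statistics import mean, median, multimode

def percentile(sorted_data, p):
    """Round-up rule: value at position ceil(p/100 * n), counting from 1."""
    position = math.ceil(p * len(sorted_data) / 100)
    return sorted_data[max(position, 1) - 1]

def outliers(sorted_data):
    """Values more than 1.5 IQR below Q1 or more than 1.5 IQR above Q3."""
    q1, q3 = percentile(sorted_data, 25), percentile(sorted_data, 75)
    iqr = q3 - q1
    return [x for x in sorted_data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

def variance_table(data):
    """Rows of (x, x - mean, (x - mean)^2), like the by-hand table."""
    m = sum(data) / len(data)
    return [(x, x - m, (x - m) ** 2) for x in data]

data = sorted([9, 4, 38, 2, 7, 12, 4, 10, 5, 11, 8])     # made-up values, n = 11

centre = (mean(data), median(data), multimode(data))      # mean, median, mode(s)
quartiles = [percentile(data, p) for p in (25, 50, 75)]   # Q1, Q2, Q3
data_range = data[-1] - data[0]                           # max - min
flagged = outliers(data)

scores = [2, 4, 4, 4, 5, 5, 7, 9]                         # second made-up set, mean 5
sum_sq = sum(sq for _, _, sq in variance_table(scores))   # total of the last column
population_variance = sum_sq / len(scores)                # divide by N
sample_variance = sum_sq / (len(scores) - 1)              # divide by n - 1

print(centre, quartiles, data_range, flagged)
print(population_variance, sample_variance)
```

For the first data set this gives mean 10, median 8, mode 4, quartiles 4, 8 and 11, range 36, and 38 flagged as an outlier (the fences are -6.5 and 21.5). For the second set the squared deviations add up to 32, so the population variance is 4.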
Empirical rule is on the formula sheet, ez
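As a reminder of what the empirical rule says: for roughly bell-shaped data, about 68%, 95% and 99.7% of values fall within 1, 2 and 3 standard deviations of the mean. A tiny Python sketch, with a made-up exam mean and standard deviation and a hypothetical helper name:

```python
def empirical_intervals(mean, sd):
    """Approximate share of bell-shaped data within 1, 2 and 3 standard deviations."""
    return {share: (mean - k * sd, mean + k * sd)
            for k, share in ((1, "68%"), (2, "95%"), (3, "99.7%"))}

intervals = empirical_intervals(mean=70, sd=5)  # made-up numbers
print(intervals)
# {'68%': (65, 75), '95%': (60, 80), '99.7%': (55, 85)}
```

So with these numbers, about 95% of the grades would be expected between 60 and 80, as long as the distribution really is bell-shaped.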
3. Linear regression type
Bombed lowkey, just read the textbook, it's not hard gang