Helpful terms (not from book)
- A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity. Within a population, a parameter is a fixed value which does not vary. Each sample drawn from the population has its own value of any statistic that is used to estimate this parameter. For example, the mean of the data in a sample is used to give information about the overall mean in the population from which that sample was drawn.
- A statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population. For example, the average of the data in a sample is used to give information about the overall average in the population from which that sample was drawn. It is possible to draw more than one sample from the same population and the value of a statistic will in general vary from sample to sample. For example, the average value in a sample is a statistic. The average values in more than one sample, drawn from the same population, will not necessarily be equal.
- A sample is a group of units selected from a larger group (the population). By studying the sample it is hoped to draw valid conclusions about the larger group. A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the general population. This is often best achieved by random sampling. Also, before collecting the sample, it is important that the researcher carefully and completely defines the population, including a description of the members to be included.
- A population is any entire collection of people, animals, plants or things from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about. In order to make any generalisations about a population, a sample, that is meant to be representative of the population, is often studied. For each population there are many possible samples. A sample statistic gives information about a corresponding population parameter. For example, the sample mean for a set of data would give information about the overall population mean.
- A random variable is a function that associates a unique numerical value with every outcome of an experiment. The value of the random variable will vary from trial to trial as the experiment is repeated. There are two types of random variable, discrete and continuous. A random variable has either an associated probability distribution (discrete random variable) or probability density function (continuous random variable).
- A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, … Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor’s surgery, the number of defective light bulbs in a box of ten.
- A continuous random variable is one which can take any value in an interval, so its possible values are uncountably infinite. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, the time required to run a mile.
- An estimator is any quantity calculated from the sample data which is used to give information about an unknown quantity in the population. For example, the sample mean is an estimator of the population mean.
- A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. If independent samples are taken repeatedly from the same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will include the unknown population parameter.
- The null hypothesis, H0, represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. We would write H0: there is no difference between the two drugs on average. We give special consideration to the null hypothesis. This is due to the fact that the null hypothesis relates to the statement being tested, whereas the alternative hypothesis relates to the statement to be accepted if / when the null is rejected. The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either “Reject H0 in favour of H1” or “Do not reject H0”; we never conclude “Reject H1”, or even “Accept H1”. If we conclude “Do not reject H0”, this does not necessarily mean that the null hypothesis is true, it only suggests that there is not sufficient evidence against H0 in favour of H1. Rejecting the null hypothesis then, suggests that the alternative hypothesis may be true.
- The probability value (p-value) of a statistical hypothesis test is the probability of getting a value of the test statistic as extreme as or more extreme than that observed by chance alone, if the null hypothesis H0 is true. It is equal to the significance level of the test for which we would only just reject the null hypothesis. It is not the probability of wrongly rejecting a true null hypothesis; that probability is the significance level α, which is chosen before the test. The p-value is compared with the chosen significance level of our test and, if it is smaller, the result is significant. That is, if the null hypothesis were to be rejected at the 5% significance level, this would be reported as “p < 0.05”. Small p-values suggest that the null hypothesis is unlikely to be true. The smaller the p-value, the more convincing the rejection of the null hypothesis. It indicates the strength of evidence for, say, rejecting the null hypothesis H0, rather than simply concluding “Reject H0” or “Do not reject H0”.
- A treatment (or level) is something that researchers administer to experimental units.
- A factor of an experiment is a controlled independent variable; a variable whose levels are set by the experimenter. A factor is a general type or category of treatments. Different treatments constitute different levels of a factor. For example, three different groups of runners are subjected to different training methods. The runners are the experimental units, the training methods, the treatments, where the three types of training methods constitute three levels of the factor ‘type of training’.
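The parameter/statistic distinction in the terms above can be sketched in a few lines of Python. This is only an illustration: the population values, the seed, and the helper name `sample_means` are made up for the example and do not come from the book.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples simple random samples and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population of 1000 measurements (made-up values from 40 to 60).
population = [50 + (i % 21) - 10 for i in range(1000)]

mu = statistics.mean(population)          # parameter: one fixed value
means = sample_means(population, 25, 5)   # statistic: one value per sample

print("population mean:", mu)
print("sample means:", means)
```

The population mean is computed once and never changes, while each of the five samples gives its own estimate of it.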
#**Chapter 1**
- empirical model (page 2,18) - experimentally determined models; an equation derived from the data that expresses the relationship between the response and the important design factors
- replication (page 12) is an independent repeat of each factor combination, shows sources of variance between and (potentially) within runs
- repeated measurements (page 13), in contrast, show the variability in measurements. Ex: measuring length of the same product twice and getting different values
- statistical model (page 2) - statistical formulations, or analysis which, when applied to data and found to fit, are used to verify the assumptions and parameters used in the analysis (linear model, polynomial model, two parameter); A statistical model is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables
(page 64) - describes observations from an experiment
(page 84) linear model, method of least squares, quadratic model
#**Chapter 2**
#**2.1 Introduction**
- experimental error/error/statistical error (page 25) - arises from variation that is uncontrolled and generally unavoidable, implies random variable
- Simple comparative experiments (page 23) - experiments that compare two conditions/treatments/levels of a factor
- discrete random variable (page 25) - random variable with a finite or countably infinite set of all possible values
- continuous random variable (page 25) - random variable whose set of all possible values is an interval
#**2.2 Basic Statistical Concepts**
Graphs that show variability (page 25)
dot diagram - small sets, shows tendency and spread; illustrate the major features of the distribution of the data in a convenient form, can also help detect any unusual observations (outliers), or any gaps in the data set.
histogram - larger sets, tendency, spread, distribution
box plot (box and whisker)
- displays minimum, maximum, lower quartile (25th percentile), upper quartile (75th percentile), and median (50th percentile, value halfway through the ordered data set)
- middle line at median, end of box lines at 25th and 75th percentiles, and ends of whiskers at min and max
- helpful for indicating whether a distribution is skewed and whether there are any unusual observations (outliers) in the data set, very useful when large numbers of observations are involved and when two or more data sets are being compared.
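The five values a box plot displays can be computed with the Python standard library. The sketch below is an assumption-laden illustration: the data are made up, `five_number_summary` is a hypothetical helper name, and `statistics.quantiles` uses its default interpolation rule, which may differ slightly from the quartile rule used in the book or in other software.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(data)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]

data = [12, 15, 11, 19, 14, 13, 30, 16, 15]  # made-up observations
low, q1, med, q3, high = five_number_summary(data)
print("min:", low, "Q1:", q1, "median:", med, "Q3:", q3, "max:", high)
```

A maximum that sits far above the upper quartile, as the value 30 does here, is the kind of unusual observation a box plot makes visible.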
#**2.3 Sampling and Sampling Distributions**
- statistic (page 28) - any function of the observations in a sample that does not contain unknown parameters (mean, variance)
- sample mean (page 29) - (ӯ) estimates population mean (μ), a measure of the central tendency of a sample
- sample variance (page 29) - (S²) estimates population variance (σ²), a measure of dispersion of a sample
- sample standard deviation (page 29) - (S) estimates (σ), a measure of dispersion of a sample
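The three sample statistics above can be computed directly from their definitions. This is a minimal sketch with made-up data; `sample_summary` is a hypothetical helper name, and the divisor n - 1 is the usual one for the sample variance.

```python
import math
import statistics

def sample_summary(y):
    """Return (sample mean, sample variance S^2, sample standard deviation S)."""
    n = len(y)
    y_bar = sum(y) / n
    s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)  # divide by n - 1
    return y_bar, s2, math.sqrt(s2)

y = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
y_bar, s2, s = sample_summary(y)
print("mean:", y_bar, "S^2:", s2, "S:", s)

# Cross-check against the standard library versions.
print(statistics.mean(y), statistics.variance(y), statistics.stdev(y))
```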
#**2.4 Inferences about the Differences in Means, Randomized Designs**
(most tests in this section)
- α (page 35, 38) = Pr(Type I error) = Pr(reject H0 | H0 is true), null hypothesis is rejected when it is true
significance level, level of significance,
- β = Pr(Type II error) = Pr(fail to reject H0 | H0 is false), null hypothesis is not rejected when it is false
- SE mean (bottom of page 38) - standard error of the mean, S/√n,
where S is the sample standard deviation (page 29) and n is the sample size
- P-value (page 38) - smallest level of significance that would lead to rejection of the null hypothesis; smallest level of α at which the data are significant;
- confidence coefficient (page 44) - (1-α)
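As a rough sketch of how these quantities fit together, the code below computes the standard error of the mean as S/√n and a two-sided P-value for a Z statistic, then compares the P-value with α. The data, the values of μ0 and σ, and the function names are assumptions made for the illustration; the normal distribution comes from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist, stdev

def standard_error(y):
    """Standard error of the mean: S / sqrt(n)."""
    return stdev(y) / math.sqrt(len(y))

def z_test_p_value(y_bar, mu0, sigma, n):
    """Z statistic and two-sided P-value for H0: mu = mu0 with sigma known."""
    z0 = (y_bar - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p

y = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3]  # made-up sample
se = standard_error(y)

# Made-up summary values: sample mean 10.5 from n = 16, known sigma = 1.
z0, p = z_test_p_value(y_bar=10.5, mu0=10.0, sigma=1.0, n=16)
alpha = 0.05
print("SE mean:", se)
print("z0:", z0, "P-value:", p, "reject H0:", p < alpha)
```

Here the P-value is the smallest α at which these data would lead to rejection, so H0 is rejected at α = 0.05 but would not be at α = 0.01.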
Comparing the means of two samples/populations
σ1² = σ2², variances unknown but assumed equal
- two sample t-test (page 36) sample based, see table on pg 48
σ1² and σ2² known (page 45)
- two sample Z-test (page 46) population based, see table on pg 47
σ1² ≠ σ2², variances unknown and not assumed equal
- modified t-test (page 45)
Comparing a single mean to a specified value (page 46)
- one sample Z-test (page 46) - population based, see table on page 47
- one sample t-test (page 47) - sample based, see table on page 48
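The two-sample statistics can be sketched as follows. The pooled version uses Sp² and n1 + n2 - 2 degrees of freedom; the unequal-variance version assumes that the "modified t-test" is the usual Welch-style statistic with approximate (Satterthwaite) degrees of freedom, which should be checked against the book's formula. Data and function names are made up, and critical values still come from the t table.

```python
import math
from statistics import mean, variance

def pooled_t(y1, y2):
    """Two-sample t statistic assuming equal variances; returns (t0, df)."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    t0 = (mean(y1) - mean(y2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2

def welch_t(y1, y2):
    """t statistic without assuming equal variances; returns (t0, approx df)."""
    n1, n2 = len(y1), len(y2)
    v1, v2 = variance(y1) / n1, variance(y2) / n2
    t0 = (mean(y1) - mean(y2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, df

y1 = [1, 2, 3, 4, 5]    # made-up sample 1
y2 = [2, 4, 6, 8, 10]   # made-up sample 2
t_pooled, df_pooled = pooled_t(y1, y2)
t_welch, df_welch = welch_t(y1, y2)
print("pooled:", t_pooled, df_pooled)
print("unequal variances:", t_welch, df_welch)
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ, which is why the choice matters most for unbalanced samples.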
#**2.5 Inferences about the Differences in Means, Paired Comparison Designs**
Difference between means
paired t-test (page 50) - tests if there is a difference in means between 2 treatments, confidence interval on page 52
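A minimal sketch of the paired t statistic: take the differences d = y1 - y2 within each pair, then test whether their mean is zero using t0 = d̄ / (Sd/√n) with n - 1 degrees of freedom. The data and the name `paired_t` are made up for illustration.

```python
import math
from statistics import mean, stdev

def paired_t(y1, y2):
    """Paired t statistic for H0: mean difference = 0; returns (t0, df)."""
    d = [a - b for a, b in zip(y1, y2)]  # within-pair differences
    n = len(d)
    t0 = mean(d) / (stdev(d) / math.sqrt(n))
    return t0, n - 1

y1 = [10, 12, 9, 11]  # made-up readings, treatment 1
y2 = [8, 11, 9, 8]    # made-up readings, treatment 2 (same units, same order)
t0, df = paired_t(y1, y2)
print("t0:", t0, "df:", df)
```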
#**2.6 Inferences About the Variances of Normal Distributions**
χ0² (page 53) is for testing whether the variance of a normal population equals some value
F0 (page 53) is for testing whether the variances of two normal populations are equal
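Both variance statistics are short to compute. The sketch below assumes the usual forms χ0² = (n - 1)S²/σ0² and F0 = S1²/S2²; the data, the hypothesized variance, and the function names are made up, and the critical values come from the χ² and F tables.

```python
from statistics import variance

def chi_square_stat(y, sigma0_sq):
    """Statistic for H0: sigma^2 = sigma0^2; returns (chi0_sq, df)."""
    n = len(y)
    return (n - 1) * variance(y) / sigma0_sq, n - 1

def f_stat(y1, y2):
    """Statistic for H0: sigma1^2 = sigma2^2; returns (F0, df1, df2)."""
    return variance(y1) / variance(y2), len(y1) - 1, len(y2) - 1

y = [1, 2, 3, 4, 5]  # made-up sample
chi0_sq, chi_df = chi_square_stat(y, sigma0_sq=2.0)
f0, df1, df2 = f_stat([2, 4, 6, 8, 10], y)
print("chi0^2:", chi0_sq, "df:", chi_df)
print("F0:", f0, "df:", df1, df2)
```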
#**Chapter 3**
#**3.2 The Analysis of Variance**
Linear statistical models (page 64)
- means model: yij = µi + εij
- effects model: yij = µ + τi + εij, where µi = µ + τi
a = number of treatments/levels (rows)
n = number of replications of a treatment (columns)
N = a*n
#**3.3 Analysis of the Fixed Effects Model**
- Hypothesis for equality of a treatment means (ANOVA) - Are the means equal? Do treatments affect means? Do treatments make a difference?
H0: µ1 = µ2 = … = µa
H1: µi ≠ µj for at least one pair (i,j)
- reject H0 if Fo > Fα,a-1,N-a
- confidence intervals on treatment means on page 73
yi. - sum of observations of ith level (add row)
ӯi. - average of observations of ith level (add row and divide by n)
y.. - total of all observations (add all rows and columns)
ӯ.. - average of all observations (add all rows and columns and divide by an=N)
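The dot notation above maps directly onto row and grand totals. The following is a sketch of the balanced one-way ANOVA computation under the layout described here (a rows of treatments, n columns of replicates); the data and the name `one_way_anova` are made up, and the critical value Fα,a-1,N-a still comes from the F table.

```python
def one_way_anova(data):
    """Balanced one-way ANOVA. data: a rows (treatments) of n observations."""
    a, n = len(data), len(data[0])
    N = a * n
    row_totals = [sum(row) for row in data]      # y_i.
    row_means = [t / n for t in row_totals]      # ybar_i.
    grand_total = sum(row_totals)                # y..
    correction = grand_total ** 2 / N
    ss_total = sum(y ** 2 for row in data for y in row) - correction
    ss_trt = sum(t ** 2 for t in row_totals) / n - correction
    ss_e = ss_total - ss_trt
    ms_trt = ss_trt / (a - 1)
    ms_e = ss_e / (N - a)
    return {
        "row_means": row_means,
        "grand_mean": grand_total / N,           # ybar..
        "SS_trt": ss_trt, "SS_E": ss_e, "SS_T": ss_total,
        "df_trt": a - 1, "df_E": N - a,
        "MS_trt": ms_trt, "MS_E": ms_e,
        "F0": ms_trt / ms_e,
    }

data = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up: a = 3 treatments, n = 3
result = one_way_anova(data)
print(result)
```

The returned F0 is then compared with the tabled value for (a - 1) and (N - a) degrees of freedom.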
#**3.4 Model Adequacy Checking**
- residuals/error (page 75) - (e) difference between an observation and the average of its treatment
eij = yij - ӯi.
- Bartlett’s test - test for equality of variance, like Fo but for variance between all treatments, doesn’t require ANOVA table but lots of calculations
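The residual definition eij = yij - ӯi. can be sketched as below, with each row holding the observations for one treatment. The data and the function name are made up for illustration.

```python
def residuals(data):
    """Return e_ij = y_ij - ybar_i. for each treatment row."""
    out = []
    for row in data:
        row_mean = sum(row) / len(row)
        out.append([y - row_mean for y in row])
    return out

data = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up: one row per treatment
e = residuals(data)
print(e)
```

By construction the residuals within each treatment sum to zero; plots of them against fitted values or run order are what the adequacy checks look at.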
Tests for equality between 2 sets of means
- Tukey (page 93), compares between all means, uses ANOVA
- Fisher LSD (page 94), compares between all means, uses ANOVA
- Dunnett (page 96), compares means to control, uses ANOVA
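As an illustration of how one of these comparisons uses the ANOVA table, the sketch below computes the Fisher least significant difference, assuming the usual form LSD = t(α/2, N-a) · √(MS_E (1/ni + 1/nj)). The critical t value is passed in from the table rather than computed, and the MS_E, sample sizes, and treatment means are made up.

```python
import math

def fisher_lsd(t_crit, ms_e, n_i, n_j):
    """Least significant difference for comparing treatments i and j."""
    return t_crit * math.sqrt(ms_e * (1 / n_i + 1 / n_j))

def differs(mean_i, mean_j, lsd):
    """True if the two treatment means differ by more than the LSD."""
    return abs(mean_i - mean_j) > lsd

# Made-up ANOVA results: MS_E = 1.0 with N - a = 6 error degrees of freedom.
# t_crit = 2.447 is the tabled t value for alpha/2 = 0.025 and 6 df.
lsd = fisher_lsd(t_crit=2.447, ms_e=1.0, n_i=3, n_j=3)
print("LSD:", lsd)
print("means 2 vs 3 differ:", differs(2.0, 3.0, lsd))
print("means 2 vs 6 differ:", differs(2.0, 6.0, lsd))
```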
#**`Which test do I use?`**
1 set of data
One sample Z-test
- page 46
- compares mean of a population (µ) to a specified value (µ0)
- variance (σ²) should already be known
- see table on page 47 for different hypotheses
One sample t-test
- page 47
- compares the mean (µ), estimated by the sample mean (ӯ), to a specified value (µ0)
- sample variance (S²) is used to estimate (σ²)
- see table on page 48 for different hypotheses
χ0² for chapter 2
- page 53
- tests the variance of one population against some value (σ0²)
- uses σ² in hypothesis but S² in calculation
2 sets of data
Two sample Z-test
- page 45-46
- compares means of two populations (μ1 and μ2)
- variances should be known
- see table on page 47 for different hypotheses
Two sample t-test
- page 36
- compares the means of two samples (y1 and y2)
- variances are unknown but assumed equal
- need to calculate Sp using 2 S values
- v= n1 + n2 -2
- see table on page 48 for different hypotheses
Modified t-test
- page 45
- compares means of two samples
- for when σ1 ≠ σ2
- hope you don’t have to use it because CI is unknown and book only lists one set of hypotheses
Differences in means
- page 50
- tests whether two means are equal or unequal OR tests whether the difference between the two means is zero or not, depending on how you look at it
- “paired t-test”
- d = y1 - y2
- confidence interval on page 52
F0 for chapter 2
- page 53
- test variances of 2 samples
- uses σ² in hypothesis but S² in calculation
3+ sets of data (ANOVA)
Fo for chapter 3
- page 70
- to be used with ANOVA table
- tests whether or not all means are equal or some differ
- confidence intervals on individual treatments or between two treatments page 73
- Fo = MS TRT/MS E
TO GET P VALUE
* must already have Fo
* find bounds on Fo in Fα table holding (a-1) and (N-a) constant
Bartlett’s test
- page 79
- doesn’t require ANOVA table
- tests whether or not all variances are equal or some differ
Ho: σ1² = σ2² = σ3² = … = σa²
H1: above not true for at least one σi, some variances are different
Tukey’s test
- page 93
- compares means of two individual treatments
- requires ANOVA table
Ho: μi=μj
H1: μi≠μj
Fisher LSD
- page 94
- compares means of two individual treatments
- requires ANOVA table
Ho: μi=μj
H1: μi≠μj
Dunnett test
- page 96
- compares means of two individual treatments
- usually compares all other treatments to a control
Ho: μi=μa
H1: μi≠μa
Helpful terms (not from book)
A randomized complete block design is a design in which the subjects are matched according to a variable which the experimenter wishes to control. The subjects are put into groups (blocks) of the same size as the number of treatments. The members of each block are then randomly assigned to different treatment groups.
Example
A researcher is carrying out a study of the effectiveness of four different skin creams for the treatment of a certain skin disease. He has eighty subjects and plans to divide them into 4 treatment groups of twenty subjects each. Using a randomised blocks design, the subjects are assessed and put in blocks of four according to how severe their skin condition is; the four most severe cases are the first block, the next four most severe cases are the second block, and so on to the twentieth block. The four members of each block are then randomly assigned, one to each of the four treatment groups.
#**Chapter 4**
#**4.1 The Randomized Complete Block Design**
nuisance factor (page 121) - a design factor that has some effect on the response but we’re not interested in this effect. Can be unknown and uncontrollable, known and uncontrollable, or known and controllable
blocking (page 121) - used to eliminate the effect of known and controllable nuisance factors in comparisons among treatments
randomized complete block design (RCBD) (page 122) - an experimental design in which each block contains all the treatments;
Ho: μ1 = μ2 = … = μa
H1: at least one μi ≠ μj
To compare treatment means (page 128) use any of the Ch. 3 methods but:
- replace n with b
- replace (N-a) with (a-1)(b-1)
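The substitutions above can be seen in a sketch of the RCBD ANOVA, where block totals give an extra sum of squares and the error degrees of freedom become (a-1)(b-1). The layout (rows are treatments, columns are blocks), the data, and the name `rcbd_anova` are assumptions for illustration.

```python
def rcbd_anova(data):
    """RCBD ANOVA. data[i][j]: observation for treatment i in block j."""
    a, b = len(data), len(data[0])
    N = a * b
    grand_total = sum(sum(row) for row in data)
    correction = grand_total ** 2 / N
    trt_totals = [sum(row) for row in data]
    blk_totals = [sum(data[i][j] for i in range(a)) for j in range(b)]
    ss_total = sum(y ** 2 for row in data for y in row) - correction
    ss_trt = sum(t ** 2 for t in trt_totals) / b - correction   # b replaces n
    ss_blk = sum(t ** 2 for t in blk_totals) / a - correction
    ss_e = ss_total - ss_trt - ss_blk
    df_e = (a - 1) * (b - 1)                                    # replaces N - a
    ms_trt = ss_trt / (a - 1)
    ms_e = ss_e / df_e
    return {
        "SS_trt": ss_trt, "SS_blocks": ss_blk, "SS_E": ss_e, "SS_T": ss_total,
        "df_trt": a - 1, "df_blocks": b - 1, "df_E": df_e,
        "MS_trt": ms_trt, "MS_E": ms_e, "F0": ms_trt / ms_e,
    }

data = [[1, 2, 4], [2, 3, 6], [5, 6, 7]]  # made-up: 3 treatments x 3 blocks
result = rcbd_anova(data)
print(result)
```

Removing the block sum of squares from the error term is what makes the treatment comparison more sensitive when the blocks really do differ.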
Randomization (page 121) - design technique
#**4.2 The Latin Square Design**
#**4.3 The Graeco-Latin Square Design**
#**4.4 Balanced Incomplete Block Designs**