to use this document yourself.
• Start

• ## Hypothesis Testing

• Rolke Given

• Troubleshooting

• R Concepts

• Useful Functions

• # Help

• Often during a session you create objects that you need only for a short time. When you no longer need them, use rm to remove them:

x <- 1:10
sum(x^2)
rm(x)

The <- symbol is the assignment operator in R: it assigns the value on the right to the name on the left.

• ## Data Types

• Comparing population proportions


• type I error

• type II error

• Now the first thing we need to recognize is

One of the principles of science is that it is impossible to prove that a theory is correct, but it is always possible to prove that a theory is false (a theory can be falsified).

• The <- symbol is the assignment operator in R: it assigns the value on the right to the name on the left.

Data Types

The most basic type of data in R is a vector, an ordered collection of values of the same type.

Say we want the numbers 1.5, 3.6, 5.1 and 4.0 in an R vector called x. Then we can type:

x <- c(1.5,3.6,5.1,4.0)

“c” stands for concatenate, meaning “put together”.

There are various ways to generate a vector. Here are some examples:

x <- 1:10
x <- 10:1
x <- 1:20*2
x <- c(1:10,1:10*2)

Sometimes you need parentheses:

n <- 10
1:n-1
1:(n-1)
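
The two results differ because the colon operator is evaluated before subtraction, so 1:n-1 is read as (1:n)-1. A minimal sketch of the difference (the names a and b are only for illustration):

```r
n <- 10
a <- 1:n-1      # evaluated as (1:n) - 1, giving 0, 1, ..., 9
b <- 1:(n-1)    # the sequence 1, 2, ..., 9
length(a)       # 10
length(b)       # 9
```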

The rep (“repeat”) command is very useful:

rep(1,10)
rep(1:3,10)
rep(1:3,each=3)
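
To see what separates the second and third calls, the sketch below uses a smaller number of repetitions so the results are easy to read. The second positional argument of rep is called times; it is written out here for clarity.

```r
rep(1:3, times = 2)    # 1 2 3 1 2 3  (the whole vector is repeated)
rep(1:3, each = 2)     # 1 1 2 2 3 3  (each element is repeated in turn)
length(rep(1:3, 10))   # 30
```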

To find out how many elements a vector has, use the length command:

length(x)

The elements of a vector are accessed with the bracket notation:

x <- 1:10*5
x
x[1:3]
x[c(1,3,8)]
x[-3]
x[-c(1,2,5)]

ls()

shows you a “listing” of the objects (data, routines etc.) in your workspace.

If you have worked for a while you might have things you need to save. Do that with

File > Save Workspace

If you quit the program without saving, everything you did will be lost. R has a somewhat unusual file system: everything belonging to the same project (data, routines, graphs etc.) is stored in a single file with the extension .RData.

To quit R, type

q()

or click the x in the upper right corner.

R has a nice recall feature, using the up and down arrow keys. Also, typing

history()

shows you the most recent things entered.

R is case-sensitive, so a and A are two different things.

Often during a session you create objects that you need only for a short time. When you no longer need them, use rm to remove them:

x <- 1:10
sum(x^2)
rm(x)

Instead of numbers, a vector can also consist of characters (letters, numbers, symbols etc.). These are identified by quotes:

x <- c("A","B","7","%")

A vector is either numeric or character, but never both. You can turn one into the other (if possible) as follows:

x <- 1:10
as.character(x)

x <- c("1","5")
as.numeric(x)
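
The qualifier "if possible" matters: a string that does not look like a number cannot be converted. A small sketch, using a made-up vector that contains one such string:

```r
x <- c("1", "5", "abc")
# as.numeric() warns about "abc"; the warning is suppressed here to keep the output clean
y <- suppressWarnings(as.numeric(x))
y           # 1 5 NA
is.na(y)    # FALSE FALSE TRUE
```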

A third type of data is logical, with values either TRUE or FALSE.

x <- 1:10
x>4
 FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE

These are often used as conditions:

x[x>4]
 5 6 7 8 9 10

Data Frames

Data frames are the basic format for data in R. They are essentially vectors of equal length put together as columns.
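
As a rough illustration, the sketch below builds a small data frame from three vectors. The data and the name students are invented for this example.

```r
# Hypothetical example data: three students with an age and a pass/fail flag
students <- data.frame(
  name   = c("Ann", "Bob", "Cat"),
  age    = c(21, 25, 19),
  passed = c(TRUE, FALSE, TRUE)
)
students$age                    # one column, returned as a vector
nrow(students)                  # number of rows
students[students$age > 20, ]   # rows that satisfy a condition
```

Each column keeps its own type (character, numeric, logical), which is what distinguishes a data frame from a single vector.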

• Error: could not find function __

There are a few things you should check:

Did you write the name of your function correctly? Names are case-sensitive.
Did you install the package that contains the function? install.packages("thePackage") (this only needs to be done once)
Did you attach that package to the workspace? require(thePackage) or library(thePackage) (this should be done every time you start a new R session)

If you’re not sure which package contains the function, you can do a few things:

If you’re sure you installed and attached/loaded the right package, type help.search("some.function") or ??some.function to get an information box that tells you which package contains it.
find and getAnywhere can also be used to locate functions.
If you have no clue about the package, you can use findFn in the sos package.
RSiteSearch("some.function") or searching with rseek are alternative ways to find the function.
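
A short sketch of some of these checks, using t.test as a function that is known to exist and ggExtra purely as an example package name that may or may not be installed on your machine:

```r
exists("t.test")    # TRUE if an attached package provides the function
find("t.test")      # which attached package it comes from, e.g. "package:stats"

# Attach a package only if it is installed; otherwise print a hint
if (requireNamespace("ggExtra", quietly = TRUE)) {
  library(ggExtra)
} else {
  message("ggExtra is not installed; run install.packages(\"ggExtra\") first")
}
```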

• To see which data sets are attached, use

search()

This also shows you which packages are attached.


• ggplot

• Student’s t-Test
Description

Performs one and two sample t-tests on vectors of data.

Usage

t.test(x, ...)

## Default S3 method:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

## S3 method for class 'formula'

t.test(formula, data, subset, na.action, ...)
Arguments

x
a (non-empty) numeric vector of data values.
y
an optional (non-empty) numeric vector of data values.
alternative
a character string specifying the alternative hypothesis, must be one of “two.sided” (default), “greater” or “less”. You can specify just the initial letter.
mu
a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
paired
a logical indicating whether you want a paired t-test.
var.equal
a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
conf.level
confidence level of the interval.
formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups.
data
an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
subset
an optional vector specifying a subset of observations to be used.
na.action
a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
...
further arguments to be passed to or from methods.
Details

The formula interface is only applicable for the 2-sample tests.

alternative = “greater” is the alternative that x has a larger mean than y.

If paired is TRUE then both x and y must be specified and they must be the same length. Missing values are silently removed (in pairs if paired is TRUE). If var.equal is TRUE then the pooled estimate of the variance is used. By default, if var.equal is FALSE then the variance is estimated separately for both groups and the Welch modification to the degrees of freedom is used.

If the input data are effectively constant (compared to the larger of the two means) an error is generated.

Value

A list with class “htest” containing the following components:

statistic
the value of the t-statistic.
parameter
the degrees of freedom for the t-statistic.
p.value
the p-value for the test.
conf.int
a confidence interval for the mean appropriate to the specified alternative hypothesis.
estimate
the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.
null.value
the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test.
alternative
a character string describing the alternative hypothesis.
method
a character string indicating what type of t-test was performed.
data.name
a character string giving the name(s) of the data.

See Also

prop.test

Examples

require(graphics)

t.test(1:10, y = c(7:20)) # P = .00001855
t.test(1:10, y = c(7:20, 200)) # P = .1245 — NOT significant anymore

## Classical example: Student’s sleep data

plot(extra ~ group, data = sleep)

with(sleep, t.test(extra[group == 1], extra[group == 2]))

## Formula interface

t.test(extra ~ group, data = sleep)
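
The components listed under Value can be extracted from the returned object with the $ operator. A minimal sketch using the sleep data from the example above (the name res is arbitrary):

```r
res <- t.test(extra ~ group, data = sleep)
class(res)       # "htest"
res$statistic    # the t statistic
res$parameter    # degrees of freedom (Welch approximation, since var.equal = FALSE)
res$p.value      # the p-value
res$conf.int     # confidence interval for the difference in means
res$estimate     # the two group means
```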


• But we are still flipping a fair coin, so we should not reject the theory at all; doing so is an error. Soon we will call this the type I error. The 27% will be called the type I error probability α.

• But there is also a downside to this. Let’s select a slightly unfair (p=0.6) coin. Now the coin is NOT fair, and we should reject the theory. But we do so only 46% of the time; the other 54% of the runs wrongly make the theory look OK. This mistake is called the type II error. The 54% is called the type II error probability β. The percentage of runs that correctly reject the theory (here 46%) is called the power of the test.
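
For a concrete decision rule, these probabilities can be computed exactly from the binomial distribution. The sketch below assumes a hypothetical rule that is not taken from the text (100 flips, reject the theory of a fair coin when fewer than 40 or more than 60 heads appear), so its numbers will differ from the 27% and 46% quoted above. The helper name reject_prob is also just illustrative.

```r
# Probability of rejecting "the coin is fair" when the true heads probability is p.
# Assumed rule: reject if the number of heads X is below `lower` or above `upper`.
reject_prob <- function(p, n = 100, lower = 40, upper = 60) {
  pbinom(lower - 1, n, p) +                    # P(X < lower)
    pbinom(upper, n, p, lower.tail = FALSE)    # P(X > upper)
}

alpha <- reject_prob(0.5)   # type I error probability: fair coin, theory rejected
power <- reject_prob(0.6)   # power: unfair coin, theory correctly rejected
beta  <- 1 - power          # type II error probability: unfair coin, theory not rejected
c(alpha = alpha, beta = beta, power = power)
```

Widening the interval makes alpha smaller but beta larger, which is the trade-off described above.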

• Because of this, a hypothesis test is set up so that the data can prove the theory to be false:

Example 1: Null Hypothesis H0: the new treatment is NOT better than the old one.

Example 2: Null Hypothesis H0: the theory of evolution is correct

Example 3: Null Hypothesis H0: the coin is fair

But not proving the theory false is not the same as accepting the theory as true! That is why we say we fail to reject the null hypothesis instead of saying we accept the null hypothesis.

• ## Paren {base} R Documentation

Parentheses and Braces

Description

Open parenthesis, (, and open brace, {, are .Primitive functions in R.

Effectively, ( is semantically equivalent to the identity function(x) x, whereas { is slightly more interesting, see examples.

Usage

( ... )

{ ... }
Value

For (, the result of evaluating the argument. This has visibility set, so will auto-print if used at top-level.

For {, the result of the last expression evaluated. This has the visibility of the last evaluation.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

if, return, etc. for other objects used in the R language itself.

Syntax for operator precedence.

Examples

f <- get("(")
e <- expression(3 + 2 * 4)
identical(f(e), e)

do <- get("{")
do(x <- 3, y <- 2*x-3, 6-x-y); x; y

## note the differences

(2+3)
{2+3; 4+5}
(invisible(2+3))
{invisible(2+3)}
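
A common practical use of this visibility behaviour is wrapping an assignment in parentheses so that the assigned value is printed as well. The sketch below checks the visibility flag with withVisible from base R:

```r
x <- 5      # an assignment on its own prints nothing
(x <- 5)    # wrapped in parentheses, the value 5 is printed

withVisible(x <- 5)$visible      # FALSE
withVisible((x <- 5))$visible    # TRUE
```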

• ggMarginal {ggExtra} R Documentation
Add marginal density/histogram to ggplot2 scatterplots

Description

Create a ggplot2 scatterplot with marginal density plots (default) or histograms, or add the marginal plots to an existing scatterplot.

Usage

ggMarginal(p, data, x, y, type = c("density", "histogram", "boxplot"),
           margins = c("both", "x", "y"), size = 5, ..., xparams, yparams)
Arguments

p
A ggplot2 scatterplot to add marginal plots to. If p is not provided, then all of data, x, and y must be provided.
data
The data.frame to use for creating the marginal plots. Optional if p is provided and the marginal plots are reflecting the same data.
x
The name of the variable along the x axis. Optional if p is provided and the x aesthetic is set in the main plot.
y
The name of the variable along the y axis. Optional if p is provided and the y aesthetic is set in the main plot.
type
What type of marginal plot to show. One of: [density, histogram, boxplot].
margins
Along which margins to show the plots. One of: [both, x, y].
size
Integer describing the relative size of the marginal plots compared to the main plot. A size of 5 means that the main plot is 5x wider and 5x taller than the marginal plots.
...
Extra parameters to pass to the marginal plots. Any parameter that geom_line(), geom_histogram(), or geom_boxplot() accepts can be used. For example, colour = "red" can be used for any marginal plot type, and binwidth = 10 can be used for histograms.
xparams
List of extra parameters to use only for the marginal plot along the x axis.
yparams
List of extra parameters to use only for the marginal plot along the y axis.
Value

An object of class ggExtraPlot. This object can be printed to show the plots or saved using any of the typical image-saving functions (for example, using png() or pdf()).

Note

The grid and gtable packages are required for this function.

Since the size parameter is used by ggMarginal, if you want to pass a size to the marginal plots, you cannot use the … parameter. Instead, you must pass size to both xparams and yparams. For example, ggMarginal(p, size = 2) will change the size of the main vs marginal plot, while ggMarginal(p, xparams = list(size=2), yparams = list(size=2)) will make the density plot outline thicker.

Demo Shiny app

Examples

# basic usage

p <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point()
ggMarginal(p)

# using some parameters

set.seed(30)
df <- data.frame(x = rnorm(500, 50, 10), y = runif(500, 0, 50))
p2 <- ggplot2::ggplot(df, ggplot2::aes(x, y)) + ggplot2::geom_point()
ggMarginal(p2)
ggMarginal(p2, type = "histogram")
ggMarginal(p2, margins = "x")
ggMarginal(p2, size = 2)
ggMarginal(p2, colour = "red")
ggMarginal(p2, colour = "red", xparams = list(colour = "blue", size = 3))
ggMarginal(p2, type = "histogram", bins = 10)

# specifying the data directly instead of providing a plot

ggMarginal(data = df, x = "x", y = "y")

# the main plot axis/margins/size/etc are changed

set.seed(30)
df2 <- data.frame(x = c(rnorm(250, 50, 10), rnorm(250, 100, 10)),
y = runif(500, 0, 50))
p2 <- ggplot2::ggplot(df2, ggplot2::aes(x, y)) + ggplot2::geom_point()
ggMarginal(p2)

p2 <- p2 + ggplot2::ggtitle("Random data") + ggplot2::theme_bw(30)
ggMarginal(p2)

p3 <- ggplot2::ggplot(df2, ggplot2::aes(log(x), y - 500)) + ggplot2::geom_point()
ggMarginal(p3)

p4 <- p3 + ggplot2::scale_x_continuous(limits = c(2, 6)) + ggplot2::theme_bw(50)
ggMarginal(p4)

• The elements of a vector are accessed with the bracket notation:

x <- 1:10*5
x
x[1:3]
x[c(1,3,8)]
x[-3]
x[-c(1,2,5)]

To find out how many elements a vector has, use the length command:

length(x)

Arithmetic on vectors is done element by element:

x <- 1:10

2*x

x^2

log(x)

sum(x)

y <- 21:30

x+y

x^2+y^2

mean(x+y)
