Often during a session you create objects that you need only for a short time. When you no longer need them use rm to get rid of them:
x<-1:10
sum(x^2)
rm(x)
The <- is the assignment operator in R; it assigns the value on the right to the symbol on the left.
Comparing population proportions
type I error
type II error
Now the first thing we need to recognize is
One of the principles of science is that it is impossible to prove that a theory is correct, but it is always possible to prove that a theory is false (a theory can be falsified).
Data Types
The most basic type of data in R is a vector, an ordered collection of values.
Say we want the numbers 1.5, 3.6, 5.1 and 4.0 in an R vector called x; then we can type
x <- c(1.5,3.6,5.1,4.0)
“c” stands for concatenate, meaning “put together”.
There are various ways to generate a vector, here are some examples:
x <- 1:10
x <- 10:1
x <- 1:202
x <- c(1:10,1:102)
Sometimes you need parentheses:
n <- 10
1:n-1
1:(n-1)
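The two commands above give different results because the colon operator binds more tightly than subtraction. A small sketch of the difference (the names a and b are only used here for illustration):

```r
n <- 10
a <- 1:n-1     # read by R as (1:n) - 1, so it runs from 0 to 9
b <- 1:(n-1)   # runs from 1 to 9
length(a)      # 10 elements
length(b)      # 9 elements
```

When in doubt, adding the parentheses costs nothing and makes the intent clear.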
The rep (“repeat”) command is very useful:
rep(1,10)
rep(1:3,10)
rep(1:3,each=3)
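To make the difference between the rep variants above visible, here is a short sketch with the results stored in illustrative names (r1, r2, r3 are not part of the original notes):

```r
r1 <- rep(1:3, times = 2)           # whole vector repeated: 1 2 3 1 2 3
r2 <- rep(1:3, each = 2)            # each element repeated: 1 1 2 2 3 3
r3 <- rep(1:3, times = c(3, 2, 1))  # element-wise counts:   1 1 1 2 2 3
```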
To find out how many elements a vector has, use the length command:
length(x)
The elements of a vector are accessed with the bracket notation:
x <- 1:10*5
x[3]
x[1:3]
x[c(1,3,8)]
x[-3]
x[-c(1,2,5)]
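The indexing commands above can be read as follows; this is just an annotated sketch of the same calls, with the results stored in illustrative names:

```r
x <- 1:10*5              # 5, 10, 15, ..., 50
third  <- x[3]           # the third element: 15
picked <- x[c(1, 3, 8)]  # elements 1, 3 and 8: 5 15 40
no3    <- x[-3]          # everything except the third element
rest   <- x[-c(1, 2, 5)] # everything except elements 1, 2 and 5
length(rest)             # 7 elements remain
```

A negative index removes elements from the result; it does not change x itself.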
Let’s start with
ls()
This shows you a “listing” of the objects (data, routines etc.) in your workspace.
If you have worked for a while you might have things you need to save. Do that with
File > Save Workspace
If you quit the program without saving, everything you did will be lost. R has a somewhat unusual file system: everything belonging to the same project (data, routines, graphs etc.) is stored in just one file, with the extension .RData.
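The same thing can be done from the command line. A minimal sketch, assuming you run it at the top level of an R session; the file name demo.RData and the temporary folder are only placeholders:

```r
x <- 1:10
ws_file <- file.path(tempdir(), "demo.RData")  # hypothetical location
save.image(file = ws_file)  # write all workspace objects to one file
rm(x)                       # x is gone from the workspace ...
load(ws_file)               # ... and comes back when the file is loaded
```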
To quit R, type
q()
or click the x in the upper right corner.
R has a nice recall feature, using the up and down arrow keys. Also, typing
history()
shows you the most recent things entered.
R is case-sensitive, so a and A are two different things.
Instead of numbers a vector can also consist of characters (letters, numbers, symbols etc.). These are identified by quotes:
x <- c("A","B","7","%")
A vector is either numeric or character, but never both. You can turn one into the other (if possible) as follows:
x <- 1:10
as.character(x)
x <- c("1","5")
as.numeric(x)
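The "if possible" above matters: text that does not look like a number cannot be converted. A small sketch (all names are illustrative):

```r
num  <- c(1, 5, 10)
chr  <- as.character(num)   # "1" "5" "10"
back <- as.numeric(chr)     # numbers again: 1 5 10

mixed <- c(1, "a")          # mixing types gives a character vector
# "five" is not a number, so it becomes NA (R also issues a warning,
# which is suppressed here to keep the output clean)
bad <- suppressWarnings(as.numeric(c("1", "five")))
```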
A third type of data is logical, with values either TRUE or FALSE.
x <- 1:10
x>4
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
These are often used as conditions:
x[x>4]
[1] 5 6 7 8 9 10
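Logical vectors can also be counted and combined. The following sketch uses the same x as above; the names big, n_big, where and big_even are only for illustration:

```r
x <- 1:10
big      <- x > 4                  # logical vector, one entry per element
n_big    <- sum(big)               # TRUE counts as 1, so this is 6
where    <- which(big)             # positions of the TRUE entries
big_even <- x[x > 4 & x %% 2 == 0] # two conditions combined with &
```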
Data Frames
Data frames are the basic format for data in R. They are essentially vectors of the same length put together as columns.
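As a minimal sketch of this idea, the following builds a small data frame from three vectors. The column names and values are made up for illustration:

```r
students <- data.frame(
  id    = 1:3,
  name  = c("Ann", "Bob", "Cy"),
  score = c(90.5, 72, 85)
)
dim(students)                         # 3 rows, 3 columns
students$score                        # one column, as a vector
good <- students[students$score > 80, ]  # rows that meet a condition
```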
Error: could not find function __
There are a few things you should check:
Did you write the name of your function correctly? Names are case-sensitive.
Did you install the package that contains the function? install.packages("thePackage") (this only needs to be done once)
Did you attach that package to the workspace? require(thePackage) or library(thePackage) (this should be done every time you start a new R session)
If you’re not sure which package contains the function, you can do a few things.
If you’re sure you installed and attached/loaded the right package, type help.search("some.function") or ??some.function to get an information box that can tell you in which package it is contained.
find and getAnywhere can also be used to locate functions.
If you have no clue about the package, you can use findFn in the sos package as explained in this answer.
RSiteSearch("some.function") or searching with rseek are alternative ways to find the function.
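Some of the checks above can be run directly in the console. This is only a sketch, using t.test from the stats package (which is attached by default) as the example function:

```r
is_there  <- exists("t.test")   # TRUE if R can currently see the function
found_in  <- find("t.test")     # attached package(s) that contain it
installed <- requireNamespace("stats", quietly = TRUE)  # installed at all?
attached  <- "package:stats" %in% search()              # attached right now?
```

If exists() returns FALSE but requireNamespace() returns TRUE, the package is installed and only needs to be attached with library().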
To see what data sets are attached use
search()
This also shows you which packages are attached.
ggplot
Student’s t-Test
Description
Performs one and two sample t-tests on vectors of data.
Usage
t.test(x, ...)
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)
t.test(formula, data, subset, na.action, ...)
Arguments
x
a (non-empty) numeric vector of data values.
y
an optional (non-empty) numeric vector of data values.
alternative
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
mu
a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
paired
a logical indicating whether you want a paired t-test.
var.equal
a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
conf.level
confidence level of the interval.
formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups.
data
an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
subset
an optional vector specifying a subset of observations to be used.
na.action
a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
...
further arguments to be passed to or from methods.
Details
The formula interface is only applicable for the 2-sample tests.
alternative = "greater" is the alternative that x has a larger mean than y.
If paired is TRUE then both x and y must be specified and they must be the same length. Missing values are silently removed (in pairs if paired is TRUE). If var.equal is TRUE then the pooled estimate of the variance is used. By default, if var.equal is FALSE then the variance is estimated separately for both groups and the Welch modification to the degrees of freedom is used.
If the input data are effectively constant (compared to the larger of the two means) an error is generated.
Value
A list with class "htest" containing the following components:
statistic
the value of the t-statistic.
parameter
the degrees of freedom for the t-statistic.
p.value
the p-value for the test.
conf.int
a confidence interval for the mean appropriate to the specified alternative hypothesis.
estimate
the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.
null.value
the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test.
alternative
a character string describing the alternative hypothesis.
method
a character string indicating what type of t-test was performed.
data.name
a character string giving the name(s) of the data.
See Also
prop.test
Examples
require(graphics)
t.test(1:10, y = c(7:20)) # P = .00001855
t.test(1:10, y = c(7:20, 200)) # P = .1245 -- NOT significant anymore
plot(extra ~ group, data = sleep)
with(sleep, t.test(extra[group == 1], extra[group == 2]))
t.test(extra ~ group, data = sleep)
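The components listed under Value can be pulled out of the result by name. A short sketch using the built-in sleep data from the examples above (the names res and pooled are illustrative):

```r
res <- t.test(extra ~ group, data = sleep)  # Welch test (var.equal = FALSE)
res$statistic    # the t-statistic
res$parameter    # the degrees of freedom
res$p.value      # the p-value
res$conf.int     # confidence interval for the difference in means
res$estimate     # the two group means

# with var.equal = TRUE the pooled variance is used, and the degrees
# of freedom are n1 + n2 - 2 = 10 + 10 - 2 = 18
pooled <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
pooled$parameter
```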
But we are still flipping a fair coin, so we should not reject the theory at all; doing so is an error. Soon we will call this the type I error. The 27% will be called the type I error probability α.
But there is also a downside to this. Let’s select a slightly unfair (p=0.6) coin. Now the coin is NOT fair, and we should reject the theory. But we are doing so only 46% of the time; the other 54% of the runs wrongly make the theory look OK. This mistake is called the type II error. The 54% is called the type II error probability β. The percentage of runs that correctly reject the theory is called the power of the test.
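These probabilities can also be computed exactly with the binomial distribution. The sketch below assumes a particular setup that is not taken from the text: 20 flips, and the theory "the coin is fair" is rejected if there are 7 or fewer, or 13 or more, heads. Because the setup is an assumption, the numbers need not match the 27% and 54% quoted above.

```r
n     <- 20   # assumed number of flips (hypothetical)
lower <- 7    # assumed rule: reject "fair" if heads <= 7 ...
upper <- 13   # ... or if heads >= 13

# probability of rejecting when the true probability of heads is p
reject_prob <- function(p) {
  pbinom(lower, n, p) + (1 - pbinom(upper - 1, n, p))
}

alpha <- reject_prob(0.5)  # type I error probability (coin is fair)
power <- reject_prob(0.6)  # chance to correctly reject when p = 0.6
beta  <- 1 - power         # type II error probability
```

Making the rejection rule stricter lowers alpha but raises beta, which is the trade-off described above.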
Because of this, a hypothesis test is set up so that the data can prove the theory to be false:
Example 1: Null Hypothesis H0: the new treatment is NOT better than the old one.
Example 2: Null Hypothesis H0: the theory of evolution is correct
Example 3: Null Hypothesis H0: the coin is fair
But NOT proving that the theory is false is not the same as accepting the theory as true! That is why we say we fail to reject the null hypothesis instead of just saying we accept the null hypothesis.
Parentheses and Braces
Description
Open parenthesis, (, and open brace, {, are .Primitive functions in R.
Effectively, ( is semantically equivalent to the identity function(x) x, whereas { is slightly more interesting, see examples.
Usage
( ... )
{ ... }
Value
For (, the result of evaluating the argument. This has visibility set, so will auto-print if used at top-level.
For {, the result of the last expression evaluated. This has the visibility of the last evaluation.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
if, return, etc for other objects used in the R language itself.
Syntax for operator precedence.
Examples
f <- get("(")
e <- expression(3 + 2 * 4)
identical(f(e), e)
do <- get("{")
do(x <- 3, y <- 2*x-3, 6-x-y); x; y
(2+3)
{2+3; 4+5}
(invisible(2+3))
{invisible(2+3)}
ggMarginal {ggExtra} R Documentation
Add marginal density/histogram to ggplot2 scatterplots
Description
Create a ggplot2 scatterplot with marginal density plots (default) or histograms, or add the marginal plots to an existing scatterplot.
Usage
ggMarginal(p, data, x, y, type = c("density", "histogram", "boxplot"),
  margins = c("both", "x", "y"), size = 5, ..., xparams, yparams)
Arguments
p
A ggplot2 scatterplot to add marginal plots to. If p is not provided, then all of data, x, and y must be provided.
data
The data.frame to use for creating the marginal plots. Optional if p is provided and the marginal plots are reflecting the same data.
x
The name of the variable along the x axis. Optional if p is provided and the x aesthetic is set in the main plot.
y
The name of the variable along the y axis. Optional if p is provided and the y aesthetic is set in the main plot.
type
What type of marginal plot to show. One of: [density, histogram, boxplot].
margins
Along which margins to show the plots. One of: [both, x, y].
size
Integer describing the relative size of the marginal plots compared to the main plot. A size of 5 means that the main plot is 5x wider and 5x taller than the marginal plots.
...
Extra parameters to pass to the marginal plots. Any parameter that geom_line(), geom_histogram(), or geom_boxplot() accepts can be used. For example, colour = "red" can be used for any marginal plot type, and binwidth = 10 can be used for histograms.
xparams
List of extra parameters to use only for the marginal plot along the x axis.
yparams
List of extra parameters to use only for the marginal plot along the y axis.
Value
An object of class ggExtraPlot. This object can be printed to show the plots or saved using any of the typical image-saving functions (for example, using png() or pdf()).
Note
The grid and gtable packages are required for this function.
Since the size parameter is used by ggMarginal, if you want to pass a size to the marginal plots, you cannot use the ... parameter. Instead, you must pass size to both xparams and yparams. For example, ggMarginal(p, size = 2) will change the size of the main vs marginal plot, while ggMarginal(p, xparams = list(size=2), yparams = list(size=2)) will make the density plot outline thicker.
See Also
Demo Shiny app
Examples
library(ggExtra) # provides ggMarginal; the ggplot2 package must be installed as well
p <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point()
ggMarginal(p)
set.seed(30)
df <- data.frame(x = rnorm(500, 50, 10), y = runif(500, 0, 50))
p2 <- ggplot2::ggplot(df, ggplot2::aes(x, y)) + ggplot2::geom_point()
ggMarginal(p2)
ggMarginal(p2, type = "histogram")
ggMarginal(p2, margins = "x")
ggMarginal(p2, size = 2)
ggMarginal(p2, colour = "red")
ggMarginal(p2, colour = "red", xparams = list(colour = "blue", size = 3))
ggMarginal(p2, type = "histogram", bins = 10)
ggMarginal(data = df, x = "x", y = "y")
set.seed(30)
df2 <- data.frame(x = c(rnorm(250, 50, 10), rnorm(250, 100, 10)),
y = runif(500, 0, 50))
p2 <- ggplot2::ggplot(df2, ggplot2::aes(x, y)) + ggplot2::geom_point()
ggMarginal(p2)
p2 <- p2 + ggplot2::ggtitle("Random data") + ggplot2::theme_bw(30)
ggMarginal(p2)
p3 <- ggplot2::ggplot(df2, ggplot2::aes(log(x), y - 500)) + ggplot2::geom_point()
ggMarginal(p3)
p4 <- p3 + ggplot2::scale_x_continuous(limits = c(2, 6)) + ggplot2::theme_bw(50)
ggMarginal(p4)
x <- 1:10
2*x
x^2
log(x)
sum(x)
y <- 21:30
x+y
x^2+y^2
mean(x+y)
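The commands above work element by element: an operation on a vector is applied to each of its values, and two vectors of the same length are combined position by position. A short annotated sketch, with results stored in illustrative names:

```r
x <- 1:10
y <- 21:30
doubled <- 2*x        # 2 4 6 ... 20
total   <- sum(x)     # 1 + 2 + ... + 10 = 55
squares <- sum(x^2)   # 1 + 4 + ... + 100 = 385
both    <- x + y      # 22 24 26 ... 40, added position by position
avg     <- mean(x+y)  # 31
```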