Daniel Vaulot
2020-01-19
Who has used R before?
What other programming languages have you used before?
For those who are experts in R
Your turn...
Warning
Experimental design course
print("Hello world")
[1] "Hello world"
Type directly in command window
Create a new script
Type in script window, select and execute (CTRL-R)
> x <- 1
> y <- 2
> x + y
[1] 3
> z <- x + y
> z
[1] 3
= can be used instead of <-, but avoid it (it is considered poor style)
> z = x + y
You can view the values of the objects in the RStudio environment window (top-right)
R is case-sensitive: z exists, Z does not
> Z
Error in eval(expr, envir, enclos): object 'Z' not found
Myvariable, Myvariable1, Myvariable.1, Myvariable_01 are OK
1Myvariable, My-variable, Myvariable@ are not OK
Five conventions
Prefer the third one, it is much easier to read
character: "Daniel", "This is a course in R", 'Donald'
numeric: 2, 15.5, 10e-3
integer: 2L (the L tells R to store this as an integer)
date: 2018-02-25
logical: TRUE, FALSE
complex: 1+4i (complex numbers with real and imaginary parts)
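To check how R actually stores each of these values, typeof() and class() can be used. The sketch below is only an illustration; the variable name course_day is an assumption made for the example. Note that a date has to be created explicitly with as.Date(), because a bare 2018-02-25 is read as a subtraction.

```r
# Inspect how R stores each kind of value
typeof("Daniel")   # "character"
typeof(15.5)       # "double" (numeric values are stored as doubles)
typeof(2L)         # "integer"
typeof(TRUE)       # "logical"
typeof(1+4i)       # "complex"

# A date must be created from a string (course_day is a made-up name)
course_day <- as.Date("2018-02-25")
class(course_day)  # "Date"

# Without quotes and as.Date(), this is just arithmetic: 2018 - 2 - 25
2018-02-25         # 1991
```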
No data "NA"
Not a number "NaN" (e.g. 0/0)
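A short sketch of how the two special values behave in practice. The vector measurements is made-up data used only for illustration.

```r
# Hypothetical data with one missing value
measurements <- c(1.2, NA, 3.4)

is.na(measurements)               # FALSE  TRUE FALSE
mean(measurements)                # NA: missing values propagate
mean(measurements, na.rm = TRUE)  # 2.3: the NA is dropped before averaging

0 / 0                             # NaN
1 / 0                             # Inf (not NaN)
is.nan(0 / 0)                     # TRUE
```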
Vector
List
Matrix
Data frames
Function
The basic R structure is a vector: [10 20 30]
A vector can contain only a single element: [10]
x <- 10
x
[1] 10
x <- c(10, 20, 30)
x
[1] 10 20 30
x <- 10:30
x
[1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
PoTU <- c("Donald", "Trump")
PoTU
[1] "Donald" "Trump"
flags <- c(TRUE, FALSE, TRUE)
flags
[1]  TRUE FALSE  TRUE
x[1]
[1] 10
x[1:5]
[1] 10 11 12 13 14
x[-1]
[1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
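A few more ways of indexing the same vector, as a minimal sketch. It only uses base R: a vector of positions, a negative range, and a logical condition.

```r
x <- 10:30

x[c(1, 3, 5)]   # 10 12 14: pick several positions at once
x[-(1:18)]      # 28 29 30: drop the first 18 elements
x[x > 27]       # 28 29 30: keep elements that satisfy a condition
x[length(x)]    # 30: last element
```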
Apply functions (we will come back to functions later)
typeof(x)
[1] "integer"
length(x)
[1] 21
What are the type and length of PoTU?
Operator | Description |
---|---|
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
^ or ** | exponentiation |
x %% y | modulus (x mod y) 5%%2 is 1 |
x %/% y | integer division 5%/%2 is 2 |
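The operators in the table can be tried directly at the console. The numbers below are arbitrary examples.

```r
7 + 2     # 9
7 - 2     # 5
7 * 2     # 14
7 / 2     # 3.5
7 ^ 2     # 49
7 %% 2    # 1 (remainder)
7 %/% 2   # 3 (integer division)

# Integer division and modulus together recover the original number
(7 %/% 2) * 2 + (7 %% 2)   # 7
```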
We are performing vector operations!
[1 2 3 ..] + [1 2 3 ..] = [2 4 6 ..]
Vector with one element
x <- 1
y <- 2
z <- x + y
z
[1] 3
Vector with several elements
# Two instructions on the same line
x <- 1:9; y <- 1:9
z <- x + y
z
[1]  2  4  6  8 10 12 14 16 18
Use the other operators
What happens when the vectors have different numbers of elements?
x <- 1:9
y <- 1
z <- x + y
z
[1]  2  3  4  5  6  7  8  9 10
Equivalent to
y <- c(1, 1, 1, 1, 1, 1, 1, 1, 1)
The recycling rule...
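Recycling also applies when the shorter vector has more than one element. The sketch below shows the shorter vector being repeated, and what happens when the longer length is not a multiple of the shorter one (R still computes a result but issues a warning, which is suppressed here).

```r
x <- 1:6
y <- c(10, 20)

# y is recycled as 10 20 10 20 10 20
x + y   # 11 22 13 24 15 26

# Lengths 5 and 2: R warns that the longer object length is not
# a multiple of the shorter one, but still recycles
z <- suppressWarnings(1:5 + c(10, 20))
z       # 11 22 13 24 15
```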
x <- TRUE
y <- FALSE
z <- x + y
z
[1] 1
No error but...
The resulting variable is converted to a numeric type
How would you show that?
typeof(x)
[1] "logical"
typeof(z)
[1] "integer"
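This implicit conversion is often useful, for example to count TRUE values. The sketch below also shows that mixing types inside c() converts everything to the most general type; the vector mixed is a made-up example.

```r
flags <- c(TRUE, FALSE, TRUE)

sum(flags)    # 2: TRUE counts as 1, FALSE as 0
mean(flags)   # 0.6666667: proportion of TRUE values

# Mixing types in one vector converts all elements to character
mixed <- c(1, TRUE, "a")
typeof(mixed) # "character"
mixed         # "1" "TRUE" "a"
```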
Operator | Description |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | exactly equal to |
!= | not equal to |
!x | Not x |
x \| y | x OR y |
x & y | x AND y |
isTRUE(x) | test if x is TRUE |
x <- TRUE
y <- FALSE
z1 <- x | y
z1
[1] TRUE
z2 <- x == y
z2
[1] FALSE
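Like the arithmetic operators, comparison and logical operators work element by element on vectors. A minimal sketch, using an arbitrary vector id:

```r
id <- 1:6

id > 2            # FALSE FALSE  TRUE  TRUE  TRUE  TRUE
id > 2 & id < 5   # FALSE FALSE  TRUE  TRUE FALSE FALSE
id < 2 | id > 5   #  TRUE FALSE FALSE FALSE FALSE  TRUE

any(id > 5)       # TRUE: at least one element satisfies the condition
all(id > 0)       # TRUE: every element satisfies the condition
isTRUE(id > 2)    # FALSE: isTRUE() only accepts a single TRUE value
```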
Do not mix
first <- "Donald"
last <- "Trump"
full <- first + last
Generates an error
Error in first + last: non-numeric argument to binary operator
What can we do?
Functions perform specific tasks on objects
paste0(first, last)
[1] "DonaldTrump"
Functions take arguments and return an object called the result
To see the arguments, use ?
? paste0() # Do not forget the parentheses
What happened?
We would like to write "Donald Trump" but we have:
paste0(first, last)
[1] "DonaldTrump"
Can you read the help and suggest a change in the way we call the function?
paste(first, last)
[1] "Donald Trump"
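The help page for paste() documents two further arguments, sep and collapse. A short sketch of how they change the result (the separator characters are arbitrary choices):

```r
first <- "Donald"
last <- "Trump"

# sep sets the separator placed between the arguments
paste(first, last, sep = "_")            # "Donald_Trump"

# paste0() is vectorized: one result per element
paste0("sample_", 1:3)                   # "sample_1" "sample_2" "sample_3"

# collapse joins the elements of a vector into a single string
paste(c("a", "b", "c"), collapse = "+")  # "a+b+c"
```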
my_sum <- function(a, b) {
  # Add the two arguments and return the result
  result <- a + b
  return(result)
}
my_sum(10, 20)
[1] 30
If you write the same piece of code three times, write a function...
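As an illustration of wrapping repeated code in a function, here is a sketch of a hypothetical helper, standardize(), which is not part of the course material. It also shows an argument with a default value (digits).

```r
# Hypothetical helper: centre a vector on its mean and scale by its
# standard deviation, then round the result
standardize <- function(v, digits = 2) {
  scaled <- (v - mean(v)) / sd(v)
  round(scaled, digits)
}

standardize(c(10, 20, 30))              # -1  0  1
standardize(c(1, 2, 4), digits = 1)     # digits can be overridden
```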
Most of the time you do not have to write functions because someone has already written one for what you want to do...
x <- 1:100
sum(x)
[1] 5050
y <- rnorm(10, mean = 0, sd = 1)
y
 [1] -1.13613929  0.63692448 -0.07460907  0.72597733 -1.24725670  0.07377771
 [7]  0.40098213  0.74694020 -0.12867386  0.45170224
mean(y)
[1] 0.04496252
sd(y)
[1] 0.7233617
Sample more points... 10,000 instead of 10
y <- rnorm(10000, mean = 0, sd = 1)
mean(y)
[1] 0.01063662
sd(y)
[1] 0.9987526
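Because rnorm() draws random numbers, the values above change at every run. A minimal sketch of how set.seed() makes a draw reproducible, and of why the larger sample gives a mean closer to 0 (the seed value 42 is an arbitrary choice):

```r
# Same seed, same random numbers
set.seed(42)
small <- rnorm(10)
set.seed(42)
small_again <- rnorm(10)
identical(small, small_again)   # TRUE

# The standard error of the mean is sd / sqrt(n)
1 / sqrt(10)      # about 0.32 for 10 points
1 / sqrt(10000)   # 0.01 for 10,000 points

large <- rnorm(10000)
mean(large)       # close to 0
sd(large)         # close to 1
```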
Histogram
library(graphics)
hist(y)
Packages are sets of functions that share a common goal
They are really the strength of R
And these are only the "official" packages. You can find more on GitHub
Download the package you need to your computer
Install the package stringr (to manipulate character strings)
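A package can be installed from the RStudio Packages pane or from the console with install.packages(). The sketch below first checks whether stringr is already available; the variable name has_stringr is an assumption made for the example, and the installation itself is left as a message so that nothing is downloaded unintentionally.

```r
# TRUE if stringr is already installed on this computer
has_stringr <- requireNamespace("stringr", quietly = TRUE)

if (!has_stringr) {
  # Installation is done once per computer (requires an internet connection)
  message('stringr is missing, run: install.packages("stringr")')
}
```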
To use functions from the package
package::function
stringr::str_c(first, last, sep = " ")
[1] "Donald Trump"
library(stringr)
str_c(first, last, sep = " ")
[1] "Donald Trump"
Sometimes functions from different packages have the same name; writing package::function removes the ambiguity
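The effect of a name clash can be sketched without installing anything. Here a user-defined function named sd stands in for a second package (this is a hypothetical example, not something a real package does); the prefix stats:: still reaches the original function.

```r
# A made-up function with the same name as stats::sd
sd <- function(x) "my own sd"

own <- sd(1:10)              # "my own sd": the latest definition is used
original <- stats::sd(1:10)  # 3.02765: the prefix selects the stats version

rm(sd)                       # remove the masking definition again
```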
List
Matrix
Factors
Data frames
df <- data.frame(label = letters[1:6],
                 id = 1:6,
                 value = rnorm(6, mean = 0, sd = 1),
                 flag = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)
df
  label id      value  flag
1     a  1  0.1176364  TRUE
2     b  2 -1.1298893 FALSE
3     c  3  0.5195048  TRUE
4     d  4 -0.7246693 FALSE
5     e  5  0.5171964  TRUE
6     f  6  1.3644754 FALSE
stringsAsFactors = FALSE keeps the label column as character instead of converting it to a factor
dim(df) # returns the dimensions of data frame
[1] 6 4
nrow(df) # number of rows
[1] 6
ncol(df) # number of columns
[1] 4
str(df) # structure of data frame - name, type and preview of data in each column
'data.frame':	6 obs. of  4 variables:
 $ label: chr  "a" "b" "c" "d" ...
 $ id   : int  1 2 3 4 5 6
 $ value: num  0.118 -1.13 0.52 -0.725 0.517 ...
 $ flag : logi  TRUE FALSE TRUE FALSE TRUE FALSE
colnames(df) # columns names
[1] "label" "id" "value" "flag"
$ notation
df$value
[1]  0.1176364 -1.1298893  0.5195048 -0.7246693  0.5171964  1.3644754
df[i,j] notation
df[, 3]
[1]  0.1176364 -1.1298893  0.5195048 -0.7246693  0.5171964  1.3644754
df[, "value"]
[1]  0.1176364 -1.1298893  0.5195048 -0.7246693  0.5171964  1.3644754
$ for the column, [i] for the row
df$label[5]
[1] "e"
df$label[1:5]
[1] "a" "b" "c" "d" "e"
df[i,j] notation, first index corresponds to row, second index to column
df[5, 1]
[1] "e"
df[1:5, "value"]
[1]  0.1176364 -1.1298893  0.5195048 -0.7246693  0.5171964
df[df$id <= 3, ]
  label id      value  flag
1     a  1  0.1176364  TRUE
2     b  2 -1.1298893 FALSE
3     c  3  0.5195048  TRUE
Select lines for which the label is c
df[df$label == "c", ]
  label id     value flag
3     c  3 0.5195048 TRUE
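Row conditions can be combined with the logical operators seen earlier. The sketch below rebuilds a similar data frame with fixed, made-up values (instead of rnorm()) so that the results are the same at every run.

```r
# Same structure as df above, but with made-up fixed values
df <- data.frame(label = letters[1:6],
                 id = 1:6,
                 value = c(0.1, -1.1, 0.5, -0.7, 0.5, 1.4),
                 flag = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)

# Rows where flag is TRUE and value is positive: labels a, c, e
df[df$flag & df$value > 0, ]

# Rows whose label is in a set, keeping only the id column
df[df$label %in% c("a", "f"), "id"]   # 1 6

# subset() is a base R alternative to the bracket notation
subset(df, id <= 3, select = c(label, value))
```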