Daniel Vaulot
2019-01-17
Who has used R before?
What other programming languages have you used before?
For those who are experts in R
Your turn...
Warning
Mid 1970s - S language for statistical computing conceived by John Chambers, Rick Becker, Trevor Hastie, Allan Wilks and others at Bell Labs
Early 1990s - R first implemented by Robert Gentleman and Ross Ihaka, both faculty members at the University of Auckland
1995 - Open Source Project
1997 - Managed by the R Core Group
2000 - First release of R (version 1.0.0)
2011 - First release of RStudio
print("Hello world")
[1] "Hello world"
Type directly in command window
Create a new script
Type in script window, select and execute (CTRL-R)
> x <- 1
> y <- 2
> x + y
[1] 3
> z <- x + y
> z
[1] 3
= can be used instead of <-, but avoid it (it is not considered good style)
> z = x + y
You can view the values of the objects in the RStudio Environment pane (top-right)
> Z
Error in eval(expr, envir, enclos): object 'Z' not found
Myvariable, Myvariable1, Myvariable.1, Myvariable_01 are OK
1Myvariable, My-variable, Myvariable@ are not OK
Five conventions
Prefer the third one, which is much easier to read
character: "Daniel", "This is a course in R", 'Donald'
numeric: 2, 15.5, 10e-3
integer: 2L (the L tells R to store this as an integer)
date: 2018-02-25
logical: TRUE, FALSE
complex: 1+4i (complex numbers with real and imaginary parts)
Missing value: NA
Not a number: NaN (e.g. 0/0; note that 1/0 gives Inf, not NaN)
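The types and special values listed above can be checked directly at the console. A minimal sketch; the variable names (d, missing_value, not_a_number, infinite) are chosen here for illustration only:

```r
# typeof() reports how R stores a value internally
typeof("Daniel")   # "character"
typeof(15.5)       # "double" (the storage type of ordinary numerics)
typeof(2L)         # "integer"
typeof(TRUE)       # "logical"
typeof(1+4i)       # "complex"

# Dates are usually created from a string; class() shows the "Date" label
d <- as.Date("2018-02-25")
class(d)           # "Date"

# Special values
missing_value <- NA      # missing data
not_a_number <- 0 / 0    # NaN
infinite <- 1 / 0        # Inf, not NaN

is.na(missing_value)     # TRUE
is.nan(not_a_number)     # TRUE
is.finite(infinite)      # FALSE
```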
Vector
List
Matrix
Data frames
Function
The basic R structure is a vector: [10 20 30]
A vector can have a single element only: [10]
x <- 10
x
[1] 10
x <- c(10, 20, 30)
x
[1] 10 20 30
x <- 10:30
x
[1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
PoTU <- c("Donald", "Trump")
PoTU
[1] "Donald" "Trump"
flags <- c(TRUE, FALSE, TRUE)
flags
[1] TRUE FALSE TRUE
x[1]
[1] 10
x[1:5]
[1] 10 11 12 13 14
x[-1]
[1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
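Beyond single positions, ranges and negative indices, a vector can also be indexed by a vector of positions or by a logical condition. A short sketch using the same x as above:

```r
x <- 10:30

# Several positions at once, given as a vector of indices
x[c(1, 3, 5)]      # 10 12 14

# Drop the first two elements
x[-(1:2)]

# Logical indexing: keep only the elements that satisfy a condition
x[x > 27]          # 28 29 30

# The last element, using length()
x[length(x)]       # 30
```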
Apply functions (we will come back to functions later)
typeof(x)
length(x)
[1] "integer"
[1] 21
What are the type and length of PoTU?
Operator | Description |
---|---|
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
^ or ** | exponentiation |
x %% y | modulus (x mod y) 5%%2 is 1 |
x %/% y | integer division 5%/%2 is 2 |
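Each operator in the table can be tried at the console. A small sketch with arbitrary numbers; the names q and r are illustrative only:

```r
7 + 2     # 9
7 - 2     # 5
7 * 2     # 14
7 / 2     # 3.5
7 ^ 2     # 49

q <- 7 %/% 2   # 3, integer division
r <- 7 %% 2    # 1, remainder

# Integer division and modulus together rebuild the original number
q * 2 + r      # 7
```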
We are performing vector operations!
[1 2 3 ...] + [1 2 3 ...] = [2 4 6 ...]
Vector with one element
x <- 1
y <- 2
z <- x + y
z
[1] 3
Vector with several elements
# Two instructions on the same line
x <- 1:9; y <- 1:9
z <- x + y
z
[1] 2 4 6 8 10 12 14 16 18
Use the other operators
What happens when the vectors have different numbers of elements?
x <- 1:9
y <- 1
z <- x + y
z
[1] 2 3 4 5 6 7 8 9 10
Equivalent to
y <- c(1, 1, 1, 1, 1, 1, 1, 1, 1)
The recycling rule...
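The recycling rule also applies when the shorter vector has more than one element. A minimal sketch with made-up vectors; suppressWarnings() is used only so the last case runs silently:

```r
x <- 1:6

# Length 1: the single value is reused for every element of x
x + 1            # 2 3 4 5 6 7

# Length 2: c(10, 20) is repeated three times to match length 6
x + c(10, 20)    # 11 22 13 24 15 26

# Length 4 does not divide 6: R still recycles, but emits a warning
z <- suppressWarnings(x + c(1, 2, 3, 4))
z                # 2 4 6 8 6 8
```

When the longer length is not a multiple of the shorter one, the warning is usually a sign of a mistake rather than something to silence.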
x <- TRUE
y <- FALSE
z <- x + y
z
[1] 1
It does not give an error but...
The resulting variable is converted to a number (TRUE counts as 1, FALSE as 0)
How would you show that?
typeof(x)
[1] "logical"
typeof(z)
[1] "integer"
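This conversion is often put to use for counting. A brief sketch, also showing how c() settles on one common type when elements of different types are combined:

```r
flags <- c(TRUE, FALSE, TRUE)

# In arithmetic, TRUE counts as 1 and FALSE as 0
sum(flags)        # 2, the number of TRUE values
mean(flags)       # proportion of TRUE values

# c() converts mixed elements to a single common type
typeof(c(TRUE, 2L))     # "integer"
typeof(c(2L, 1.5))      # "double"
typeof(c(1.5, "a"))     # "character"
```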
Operator | Description |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | exactly equal to |
!= | not equal to |
!x | Not x |
x \| y | x OR y |
x & y | x AND y |
isTRUE(x) | test if x is TRUE |
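Like the arithmetic operators, the comparison and logical operators work element by element on vectors. A short sketch with an arbitrary numeric vector:

```r
x <- c(5, 12, 8, 20)

# A comparison is applied element by element and returns a logical vector
x > 8               # FALSE TRUE FALSE TRUE

# Combine conditions with & (AND) and | (OR)
x > 6 & x < 15      # FALSE TRUE TRUE FALSE
x < 6 | x > 15      # TRUE FALSE FALSE TRUE

# ! negates every element
!(x > 8)            # TRUE FALSE TRUE FALSE

# isTRUE() is meant for a single value; it is FALSE for anything else
isTRUE(x[1] == 5)   # TRUE
isTRUE(x > 8)       # FALSE, because x > 8 has four elements
```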
x <- TRUE
y <- FALSE
z1 <- x | y
z1
[1] TRUE
z2 <- x == y
z2
[1] FALSE
Do not mix
first <- "Donald"
last <- "Trump"
full <- first + last
Generates an error
Error in first + last: non-numeric argument to binary operator
What can we do?
Functions perform specific tasks on objects
paste0(first, last)
[1] "DonaldTrump"
Functions take arguments and return an object called the result
To know the arguments use ?
? paste0() # Do not forget the parentheses
What happened?
We would like to write "Donald Trump" but we have:
paste0(first, last)
[1] "DonaldTrump"
Can you read the help and suggest a change in the way we call the function?
paste(first, last)
[1] "Donald Trump"
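The difference comes from the sep argument described in the help page. A short sketch of the main variations; the "sample" strings are made up for illustration:

```r
first <- "Donald"
last <- "Trump"

paste(first, last)                # "Donald Trump" (sep defaults to " ")
paste(first, last, sep = "_")     # "Donald_Trump"
paste0(first, last)               # "DonaldTrump" (no separator)

# paste() is vectorised: each element gets the same treatment
paste("sample", 1:3, sep = "_")   # "sample_1" "sample_2" "sample_3"

# collapse joins the elements of one vector into a single string
paste(c("a", "b", "c"), collapse = "-")   # "a-b-c"
```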
my_sum <- function(a, b) {
  c <- a + b
  return(c)
}
my_sum(10, 20)
[1] 30
If you write the same piece of code three times, write a function...
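As a slightly larger illustration, here is a sketch of a function with a default argument. The name rescale01 and its behaviour are hypothetical, chosen only for this example:

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)       # smallest and largest value
  (x - rng[1]) / (rng[2] - rng[1])     # the last evaluated expression is returned
}

rescale01(c(10, 20, 30))    # 0.0 0.5 1.0
rescale01(c(1, NA, 3))      # 0 NA 1
```

An explicit return() is optional: R returns the value of the last expression evaluated in the function body.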
End of lecture one
Most of the time you do not have to write functions because someone has already written one for what you want to do...
x <- 1:100
sum(x)
[1] 5050
y <- rnorm(100, mean = 0, sd = 1)
y[1:10]
 [1]  0.4885882 -0.6260146 -0.8855401 -1.2341267  0.3726551  0.8956950
 [7]  0.9124247  0.1755346  0.4628793 -1.5012981
mean(y)
[1] 0.01007783
sd(y)
[1] 0.8875528
Sample more points... 10,000 instead of 100
y <- rnorm(10000, mean = 0, sd = 1)
mean(y)
[1] 0.01251073
sd(y)
[1] 1.009886
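Because rnorm() draws random numbers, the values above change on every run. A sketch of how set.seed() makes the draws repeatable, and how sample size affects the estimates; the seed 42 and the names a, b, small and large are arbitrary choices for this example:

```r
set.seed(42)                       # arbitrary seed, chosen for illustration
a <- rnorm(5)
set.seed(42)
b <- rnorm(5)
identical(a, b)                    # TRUE: same seed, same draws

# Larger samples tend to give estimates closer to the true mean (0) and sd (1)
set.seed(42)
small <- rnorm(100, mean = 0, sd = 1)
large <- rnorm(100000, mean = 0, sd = 1)
c(mean(small), mean(large))
c(sd(small), sd(large))
```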
Histogram
library(graphics)
hist(y)
Packages are sets of functions that have a common goal
They are really the strength of R
And these are only the "official" packages. You can find more on GitHub
Download the package you need onto your computer
Install package stringr (to manipulate strings of characters)
To use functions from the package
package::function
stringr::str_c(first, last, sep = " ")
[1] "Donald Trump"
library(stringr)
str_c(first, last, sep = " ")
[1] "Donald Trump"
Sometimes functions from different libraries have similar names
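When two definitions share a name, the most recently defined or loaded one is used, and the package::function form removes the ambiguity. A minimal sketch that simulates the conflict with a user-defined function instead of a second package; the replacement sd below is hypothetical:

```r
x <- c(2, 4, 6)

# Hypothetical user function that happens to reuse the name sd
sd <- function(x) "my own sd"

sd(x)           # "my own sd": the most recent definition wins
stats::sd(x)    # 2: the package prefix always reaches the original

rm(sd)          # remove our version; sd() is the stats function again
sd(x)           # 2
```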
List
Matrix
Factors
Data frames
df <- data.frame(label = letters[1:6],
                 id = 1:6,
                 value = rnorm(6, mean = 0, sd = 1),
                 flag = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)
df
  label id      value  flag
1     a  1  0.8002749  TRUE
2     b  2 -0.1723698 FALSE
3     c  3  1.0188527  TRUE
4     d  4 -1.4748408 FALSE
5     e  5  0.5381787  TRUE
6     f  6  0.8350807 FALSE
stringsAsFactors = FALSE
dim(df) # returns the dimensions of data frame
[1] 6 4
nrow(df) # number of rows
[1] 6
ncol(df) # number of columns
[1] 4
str(df) # structure of data frame - name, type and preview of data in each column
'data.frame': 6 obs. of 4 variables:
 $ label: chr "a" "b" "c" "d" ...
 $ id   : int 1 2 3 4 5 6
 $ value: num 0.8 -0.172 1.019 -1.475 0.538 ...
 $ flag : logi TRUE FALSE TRUE FALSE TRUE FALSE
colnames(df) # columns names
[1] "label" "id" "value" "flag"
$ notation
df$value
[1] 0.8002749 -0.1723698 1.0188527 -1.4748408 0.5381787 0.8350807
df[i,j] notation
df[, 3]
[1] 0.8002749 -0.1723698 1.0188527 -1.4748408 0.5381787 0.8350807
df[, "value"]
[1] 0.8002749 -0.1723698 1.0188527 -1.4748408 0.5381787 0.8350807
$ notation
df$label[5]
[1] "e"
df$label[1:5]
[1] "a" "b" "c" "d" "e"
df[i,j] notation
df[5, 1]
[1] "e"
df[1:5, "value"]
[1] 0.8002749 -0.1723698 1.0188527 -1.4748408 0.5381787
df[df$id <= 3, ]
  label id      value  flag
1     a  1  0.8002749  TRUE
2     b  2 -0.1723698 FALSE
3     c  3  1.0188527  TRUE
Select rows for which the label is c
df[df$label == "c", ]
  label id    value flag
3     c  3 1.018853 TRUE
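The same df[rows, columns] pattern covers most everyday filtering. A sketch that rebuilds a comparable data frame; the seed is an arbitrary assumption, so the value column will differ from the numbers shown above:

```r
set.seed(1)   # arbitrary seed so the random column is reproducible
df <- data.frame(
  label = letters[1:6],
  id = 1:6,
  value = rnorm(6, mean = 0, sd = 1),
  flag = c(TRUE, FALSE),          # recycled to length 6
  stringsAsFactors = FALSE
)

# Rows where flag is TRUE (a logical column can be used directly)
df[df$flag, ]

# Two conditions combined with &
df[df$id > 2 & df$flag, ]

# Rows matching any of several labels, keeping only two columns
df[df$label %in% c("a", "f"), c("label", "value")]

# Add a new column computed from an existing one
df$value_squared <- df$value^2
colnames(df)
```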
What you will learn:
Please install the following packages and their dependencies
Read the installation instructions: https://bookdown.org/yihui/rmarkdown/installation.html