Metabarcoding Singapore
Phyloseq
Metabarcoding Singapore
Phyloseq
- 1 Aim
- 2 Initialize
- 3 Read the data
- 4 Map
- 5 Environmental data
- 6 Phyloseq analysis
- 6.1 Create phyloseq files for all taxa after filtering the data
- 6.2 Break up into photosynthetic and non-photosynthetic
- 6.3 Normalize number of reads in each sample using median sequencing depth.
- 6.4 Phyloseq files for abundant taxa
- 6.5 Create a list for the auto and hetero phyloseq files
- 6.6 Create tabular files for other plots
- 6.7 Treemaps at division and class levels
- 6.8 Most abundant taxa
- 6.9 Most abundant OTUs
- 6.10 Bar plots
- 6.11 Synthetic comparisons
- 6.12 Time series
- 6.13 Main species for each division (Eukaryotes - Autrotrophs)
- 6.14 Heatmaps
- 6.14.1 Abundant OTUs
- 6.14.2 Chlorophyta at species level
- 6.14.3 Mamiello (Only Bathy, Ostreo and Micromonas) at genus level
- 6.14.4 Mamiello (Only Bathy, Ostreo and Micromonas) at species level
- 6.14.5 Syndiniales at order level
- 6.14.6 FIG - Most abundant OTUs in Johor vs Singapre and Prokaryotes vs Eukaryotes
- 6.15 NMDS
- 6.16 Network analysis
- 6.17 Differential expression (DESeq2)
1 Aim
Create Phyloseq file from samples, metadata and OTU table (read from Excel file)
2 Initialize
This file defines all the necessary libraries and variables
3 Read the data
3.1 File names
3.2 Read the files
- The dada2 treatment has already removed the forward and reverse primers, so no need to remove them
- Work with the unrarefied data
3.2.1 Metadata and sample files
# Read the sample and metadata tables
sample_table <- read_excel(metadata_xlsx, sheet = "samples", range = "A1:D89")
metadata_table <- read_csv(metadata_csv, na = c("ND", "")) %>% dplyr::rename(sample_code = Sample,
day_number = Day_number, date = Date, location = Location, monsoon = `Monsoon period`) %>%
select(-Strait) %>% mutate(date = lubridate::parse_date_time(date, "dmy"),
monsoon = forcats::fct_relevel(monsoon, "NE", "IM-1", "SW", "IM-2"))
station_table <- read_excel(metadata_xlsx, sheet = "stations", na = c("ND",
""))
sample_table <- sample_table %>% left_join(metadata_table) %>% left_join(station_table) %>%
mutate(sample_label = str_c(location_label, date, monsoon, sep = "_"))
3.2.2 OTU table
NOTE: The OTU table is read from Excel file
The following editing has been done * Nanochloris changed to Picochlorum based on BLAST table * Ostreococcus sp. changed to O. lucimarinus based on the BLAST
4 Map
4.1 Leaflet map
- Visualize different layers: https://leaflet-extras.github.io/leaflet-providers/preview/
lng_center = mean(station_table$longitude)
lat_center = mean(station_table$latitude)
map <- leaflet(width = 1000, height = 1000) %>%
addTiles() %>% # Default
# addTiles(urlTemplate = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}' ) %>% # Satellite
# addTiles(urlTemplate = 'https://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}' ) %>% # Grey background
setView(lng=lng_center, lat=lat_center, zoom=11) %>%
addCircleMarkers(data = station_table, lat = ~ latitude, lng = ~ longitude,
radius = 5,
label = ~ location,
labelOptions = labelOptions(textsize = "10px", noHide = T),
clusterOptions = markerClusterOptions())
map
5 Environmental data
5.1 Per station
5.1.1 Temp
p <- ggplot(filter(sample_table, location_label != "BL")) + geom_line(aes(x = date,
y = Temperature), na.rm = TRUE) + geom_point(aes(x = date, y = Temperature,
color = monsoon), size = 5) + facet_grid(rows = vars(location)) + ylim(26,
34) + xlim(as.POSIXct(as.Date(c("2017-01-01", "2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01",
"2018-01-01", "2019-01-01")))) + scale_color_viridis(discrete = TRUE)
p
### Salinity
p <- ggplot(filter(sample_table, location_label != "BL")) + geom_line(aes(x = date,
y = Salinity), na.rm = TRUE) + geom_point(aes(x = date, y = Salinity, color = monsoon),
size = 5) + facet_grid(rows = vars(location)) + ylim(15, 35) + xlim(as.POSIXct(as.Date(c("2017-01-01",
"2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01",
"2018-01-01", "2019-01-01")))) + scale_color_viridis(discrete = TRUE)
p
5.1.2 Chlorophyll
p <- ggplot(filter(sample_table, location_label != "BL")) + geom_line(aes(x = date,
y = Chl), na.rm = TRUE) + geom_point(aes(x = date, y = Chl, color = monsoon),
size = 5) + facet_grid(rows = vars(location)) + xlim(as.POSIXct(as.Date(c("2017-01-01",
"2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01",
"2018-01-01", "2019-01-01")))) + scale_color_viridis(discrete = TRUE)
p
5.1.3 Rain over 7 days
p <- ggplot(filter(sample_table, location_label != "BL")) + geom_col(aes(x = date,
y = Rain7, fill = monsoon)) + facet_grid(rows = vars(location)) + xlim(as.POSIXct(as.Date(c("2017-01-01",
"2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01",
"2018-01-01", "2019-01-01")))) + scale_fill_viridis(discrete = TRUE)
p
5.2 Average
5.2.1 Rain over 7 days
rain <- sample_table %>% group_by(date, monsoon) %>% summarise(Rain7_mean = mean(Rain7,
na.rm = TRUE))
p <- ggplot(rain) + geom_point(aes(x = date, y = Rain7_mean, color = monsoon),
size = 5) + xlim(as.POSIXct(as.Date(c("2017-01-01", "2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01",
"2018-01-01", "2019-01-01")))) + scale_color_viridis(discrete = TRUE)
p
6 Phyloseq analysis
6.1 Create phyloseq files for all taxa after filtering the data
Filter the euk data to remove the low bootstraps values (threshold : bootstrap > 90% at the supergroup level) and create a phyloseq file
Note the bootstrap threshold 90% for ASVs
6.1.1 Filter the ASV table to remove low bootstraps
6.1.2 Create the phyloseq file
- Remove sequences with bad quality
- Remove Metazoa
- Remove Chloroplast
otu_mat <- otu_table_ps %>% select(otu = otu_id, matches("EC|PR|RM|SBW|STJ"),
-species, -species_boot)
tax_mat <- otu_table_ps %>% select(otu = otu_id, kingdom:species)
samples_df <- sample_table %>% rename(sample = sample_id)
row.names(otu_mat) <- otu_mat$otu
otu_mat <- otu_mat %>% select(-otu)
row.names(tax_mat) <- tax_mat$otu
tax_mat <- tax_mat %>% select(-otu)
row.names(samples_df) <- samples_df$sample
samples_df <- samples_df %>% select(-sample)
otu_mat <- as.matrix(otu_mat)
tax_mat <- as.matrix(tax_mat)
OTU = otu_table(otu_mat, taxa_are_rows = TRUE)
TAX = tax_table(tax_mat)
samples = sample_data(samples_df)
ps_all_raw <- phyloseq(OTU, TAX, samples)
# Filtering
ps_all_raw <- subset_samples(ps_all_raw, sequence_quality == "good")
ps_all_raw <- subset_taxa(ps_all_raw, !(division %in% c("Metazoa")))
ps_all_raw <- subset_taxa(ps_all_raw, !(kingdom %in% c("Eukaryota_plastid")))
6.2 Break up into photosynthetic and non-photosynthetic
Phyloseq All
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 2571 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 2571 taxa by 8 taxonomic ranks ]
ps_euk <- subset_taxa(ps_all, (kingdom %in% c("Eukaryota")))
cat("\nPhyloseq Eukaryotes \n========== \n")
Phyloseq Eukaryotes
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 683 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 683 taxa by 8 taxonomic ranks ]
ps_photo <- subset_taxa(ps_euk, (division %in% c("Chlorophyta", "Cryptophyta",
"Rhodophyta", "Haptophyta", "Ochrophyta")) | ((division == "Dinoflagellata") &
(class != "Syndiniales")) | (class == "Filosa-Chlorarachnea"))
cat("\nPhyloseq Photosynthetic Eukaryotes \n========== \n")
Phyloseq Photosynthetic Eukaryotes
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 271 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 271 taxa by 8 taxonomic ranks ]
ps_hetero <- subset_taxa(ps_euk, !(division %in% c("Chlorophyta", "Cryptophyta",
"Rhodophyta", "Haptophyta", "Ochrophyta")) & !((division == "Dinoflagellata") &
!(class == "Syndiniales")) & !(class == "Filosa-Chlorarachnea"))
cat("\nPhyloseq Heterotrophic Eukaryotes \n========== \n")
Phyloseq Heterotrophic Eukaryotes
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 412 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 412 taxa by 8 taxonomic ranks ]
ps_prok <- subset_taxa(ps_all, (kingdom %in% c("Bacteria", "Archaea")))
cat("\nPhyloseq Prokaryotes \n========== \n")
Phyloseq Prokaryotes
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 1888 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 1888 taxa by 8 taxonomic ranks ]
6.3 Normalize number of reads in each sample using median sequencing depth.
- ! If there no cells do not transform, just set column to 0
function(x, t=total_hetero) (if(sum(x) > 0){ t * (x / sum(x))} else {x})
# First define a function to normalize
ps_normalize_median <- function(ps, title) {
ps_median = median(sample_sums(ps))
cat(sprintf("\nThe median number of reads used for normalization of %s is %.0f",
title, ps_median))
normalize_median = function(x, t = ps_median) (if (sum(x) > 0) {
t * (x/sum(x))
} else {
x
})
ps = transform_sample_counts(ps, normalize_median)
cat(str_c("\nPhyloseq ", title, "\n========== \n"))
print(ps)
}
# Apply to all the phyloseq files
ps_all = ps_normalize_median(ps_all, "all")
The median number of reads used for normalization of all is 54948
Phyloseq all
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 2571 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 2571 taxa by 8 taxonomic ranks ]
The median number of reads used for normalization of eukaryotes (auto+hetero) is 4735
Phyloseq eukaryotes (auto+hetero)
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 683 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 683 taxa by 8 taxonomic ranks ]
The median number of reads used for normalization of eukaryotes autotrophs is 3038
Phyloseq eukaryotes autotrophs
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 271 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 271 taxa by 8 taxonomic ranks ]
The median number of reads used for normalization of eukaryotes heterotrophs is 983
Phyloseq eukaryotes heterotrophs
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 412 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 412 taxa by 8 taxonomic ranks ]
The median number of reads used for normalization of prokaryotes is 50346
Phyloseq prokaryotes
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 1888 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 1888 taxa by 8 taxonomic ranks ]
6.4 Phyloseq files for abundant taxa
- Remove taxa that are < 0.10 (euks) and <0.05 (proks) in any given sample
- Normalize again…
ps_abundant <- function(ps, contrib_min = 0.1, title) {
total_per_sample <- max(sample_sums(ps))
ps <- filter_taxa(ps, function(x) sum(x > total_per_sample * contrib_min) >
0, TRUE)
ps <- ps_normalize_median(ps, title)
}
cat("Remove taxa in low abundance \n\n")
Remove taxa in low abundance
The median number of reads used for normalization of All is 24049
Phyloseq All
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 45 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 45 taxa by 8 taxonomic ranks ]
The median number of reads used for normalization of eukaryotes (auto+hetero) is 3359
Phyloseq eukaryotes (auto+hetero)
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 61 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 61 taxa by 8 taxonomic ranks ]
The median number of reads used for normalization of eukaryotes autotrophs is 2767
Phyloseq eukaryotes autotrophs
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 60 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 60 taxa by 8 taxonomic ranks ]
The median number of reads used for normalization of eukaryotes heterotrophs is 688
Phyloseq eukaryotes heterotrophs
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 88 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 88 taxa by 8 taxonomic ranks ]
The median number of reads used for normalization of prokaryotes is 23865
Phyloseq prokaryotes
==========
phyloseq-class experiment-level object
otu_table() OTU Table: [ 44 taxa and 81 samples ]
sample_data() Sample Data: [ 81 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 44 taxa by 8 taxonomic ranks ]
6.5 Create a list for the auto and hetero phyloseq files
ps_list <- list(ps = c(ps_prok, ps_euk, ps_photo), title = c("Prokaryotes - all OTUs",
"Eukaryotes - Auto and Hetero - all OTUs", "Eukaryotes - Autotrophs - all OTUs"))
ps_list_abundant <- list(ps = c(ps_prok_abundant, ps_euk_abundant, ps_photo_abundant),
title = c("Prokaryotes - abundant OTUs (> 5%)", "Eukaryotes - Auto + Hetero - abundant OTUs (> 10%)",
"Eukaryotes - Autotrophs - abundant OTUs (> 10%)"))
6.6 Create tabular files for other plots
ps_to_long <- function(ps) {
otu_df <- data.frame(otu_table(ps)) %>% rownames_to_column(var = "otu_id")
taxo_df <- data.frame(tax_table(ps)) %>% rownames_to_column(var = "otu_id")
otu_df <- left_join(taxo_df, otu_df)
otu_df <- gather(otu_df, "sample", "n_seq", contains("X")) # All samples contain X
metadata_df <- data.frame(sample_data(ps)) %>% rownames_to_column(var = "sample")
otu_df <- left_join(otu_df, metadata_df)
}
long_all <- ps_to_long(ps_all)
long_euk <- ps_to_long(ps_euk)
long_photo <- ps_to_long(ps_photo)
long_euk_abundant <- ps_to_long(ps_euk_abundant)
long_photo_abundant <- ps_to_long(ps_photo_abundant)
long_prok <- ps_to_long(ps_prok)
6.7 Treemaps at division and class levels
6.7.1 Define function to draw treemaps
treemap_gg_dv <- function(df, group1, group2, title) {
group1 <- enquo(group1)
group2 <- enquo(group2)
df <- df %>% group_by(!!group1, !!group2) %>% summarise(n_seq = sum(n_seq))
g_treemap <- ggplot(df, aes(area = n_seq, fill = !!group2, label = !!group2,
subgroup = !!group1)) + ggtitle(title) + treemapify::geom_treemap() +
treemapify::geom_treemap_subgroup_border() + treemapify::geom_treemap_text(colour = "black",
place = "topleft", reflow = T, padding.x = grid::unit(3, "mm"), padding.y = grid::unit(3,
"mm")) + treemapify::geom_treemap_subgroup_text(place = "centre",
grow = T, alpha = 0.5, colour = "white", fontface = "italic", min.size = 0) +
scale_fill_viridis(discrete = TRUE) + theme(legend.position = "none",
plot.title = element_text(size = 16, face = "bold"))
print(g_treemap)
return(g_treemap)
}
6.7.2 Do the individual treemaps
array_treemap_all <- list()
array_treemap_by_kingdom <- list()
for (one_strait in c("Singapore", "Johor")) {
label <- str_c(one_strait, "-all")
array_treemap_all[[label]] <- treemap_gg_dv(filter(long_all, strait == one_strait),
kingdom, supergroup, str_c("", one_strait))
label <- str_c(one_strait, "-arch")
array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_prok, strait ==
one_strait & kingdom == "Archaea"), division, class, str_c("A - Archaea - ",
one_strait))
label <- str_c(one_strait, "-bact")
array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_prok, strait ==
one_strait & kingdom == "Bacteria"), division, class, str_c("B - Bacteria - ",
one_strait))
label <- str_c(one_strait, "-euks")
array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_euk, strait ==
one_strait), division, class, str_c("C - All Eulkaryotes - ", one_strait))
# label <- str_c(one_strait,'-photeuks') array_treemap[[label]] <-
# treemap_gg_dv(filter(long_photo, strait == one_strait), division, class,
# str_c('D - Photosynthetic Eukaryotes - ', one_strait ))
# treemap_dv(filter(long_euk, strait == one_strait), c('division',
# 'class'),'n_seq', str_c('All euks - ', one_strait ))
# treemap_dv(filter(long_photo, strait == one_strait), c('division',
# 'class'),'n_seq',str_c('Photo euks', one_strait ))
# treemap_dv(filter(long_prok, strait == one_strait & kingdom=='Bacteria'),
# c('division', 'class'),'n_seq',str_c('Bacteria - ', one_strait ))
# treemap_dv(filter(long_prok, strait == one_strait & kingdom=='Archaea'),
# c('division', 'class'),'n_seq',str_c('Archaea - ', one_strait ))
}
6.7.3 FIG - Arrange the different treemaps in a grid and save
# Main Fig, all groups
fig_grid_treemap_all <- gridExtra::grid.arrange(grobs = array_treemap_all, ncol = 2,
nrow = 1, clip = FALSE, padding = unit(0, "line"), as.table = FALSE)
TableGrob (1 x 2) "arrange": 2 grobs
z cells name grob
Singapore-all 1 (1-1,1-1) arrange gtable[layout]
Johor-all 2 (1-1,2-2) arrange gtable[layout]
ggsave("../fig/fig_treemap_all.pdf", fig_grid_treemap_all, height = 5, width = 10,
dpi = 300)
# Supp Fig, by kingdom
fig_grid_treemap_by_kingdom <- gridExtra::grid.arrange(grobs = array_treemap_by_kingdom,
ncol = 2, nrow = 3, clip = FALSE, padding = unit(0, "line"), as.table = FALSE)
TableGrob (3 x 2) "arrange": 6 grobs
z cells name grob
Singapore-arch 1 (1-1,1-1) arrange gtable[layout]
Singapore-bact 2 (2-2,1-1) arrange gtable[layout]
Singapore-euks 3 (3-3,1-1) arrange gtable[layout]
Johor-arch 4 (1-1,2-2) arrange gtable[layout]
Johor-bact 5 (2-2,2-2) arrange gtable[layout]
Johor-euks 6 (3-3,2-2) arrange gtable[layout]
6.8 Most abundant taxa
6.8.1 Most abundant eukaryote species
df <- filter(long_euk, strait %in% c("Johor", "Singapore")) %>% mutate(species_label = str_c(class,
species, sep = "-")) %>% group_by(division, class, species, species_label,
strait) %>% summarize(n_seq = sum(n_seq)) %>% arrange(desc(n_seq)) %>% filter(n_seq >
0) %>% ungroup()
df_strait <- df %>% select(species, strait) %>% group_by(species) %>% summarise(n_strait = n()) %>%
ungroup()
df <- df %>% left_join(df_strait, by = c(species = "species")) %>% mutate(species = case_when(n_strait ==
1 ~ str_c(species, " *"), TRUE ~ species))
array_euk <- list()
for (one_strait in c("Singapore", "Johor")) {
df_one <- filter(df, strait == one_strait)
array_euk[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(species,
n_seq), y = n_seq, fill = division)) + coord_flip() + xlab("") + ylab("Number of reads") +
scale_fill_manual(values = division_euk_colors) + ggtitle(one_strait) +
theme_bw() + theme(plot.title = element_text(size = 22, hjust = 0.5)) +
# theme(axis.text=element_text(size=14), legend.text =
# element_text(size=16)) +
theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top",
ncol = 4, nrow = 2, byrow = FALSE))
print(array_euk[[one_strait]])
}
6.8.2 Most abundant prokaryotes taxo6 (~ genus)
df <- filter(long_prok, strait %in% c("Johor", "Singapore")) %>% mutate(genus_label = str_c(division,
family, sep = "-")) %>% group_by(supergroup, division, class, family, genus_label,
strait) %>% summarize(n_seq = sum(n_seq)) %>% arrange(desc(n_seq)) %>% filter(n_seq >
0) %>% ungroup()
df_strait <- df %>% select(genus_label, strait) %>% group_by(genus_label) %>%
summarise(n_strait = n()) %>% ungroup()
df <- df %>% left_join(df_strait) %>% mutate(genus_label = case_when(n_strait ==
1 ~ str_c(genus_label, " *"), TRUE ~ genus_label))
array_prok <- list()
for (one_strait in c("Singapore", "Johor")) {
df_one <- filter(df, strait == one_strait)
array_prok[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(genus_label,
n_seq), y = n_seq, fill = supergroup)) + coord_flip() + xlab("") + ylab("Number of reads") +
scale_fill_manual(values = supergroup_colors) + ggtitle(one_strait) +
theme_bw() + theme(plot.title = element_text(size = 22, hjust = 0.5)) +
# theme(axis.text=element_text(size=14), legend.text =
# element_text(size=16)) +
theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top",
ncol = 4, nrow = 2, byrow = FALSE))
print(array_prok[[one_strait]])
}
6.8.3 Most abundant taxa - Save as Grid
6.9 Most abundant OTUs
6.9.1 Compute and plot separately Euk and Prok
df <- filter(long_all, strait %in% c("Johor", "Singapore")) %>% mutate(otu_label = case_when(kingdom ==
"Eukaryota" ~ str_c(otu_id, species, sep = "-"), TRUE ~ str_c(otu_id, family,
sep = "-"))) %>% group_by(kingdom, supergroup, division, class, genus, species,
otu_id, otu_label, strait) %>% summarize(n_seq = sum(n_seq)) %>% arrange(desc(n_seq)) %>%
filter(n_seq > 0) %>% ungroup()
df_strait <- df %>% select(otu_label, strait) %>% group_by(otu_label) %>% summarise(n_strait = n()) %>%
ungroup()
df <- df %>% left_join(df_strait) %>% mutate(otu_label = case_when(n_strait ==
1 ~ str_c(otu_label, " +"), TRUE ~ otu_label))
otu_strait <- df
array_euk <- list()
array_prok <- list()
# Do plots
for (one_strait in c("Singapore", "Johor")) {
# For euks
df_one <- filter(df, strait == one_strait & kingdom == "Eukaryota")
array_euk[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(otu_label,
n_seq), y = n_seq, fill = division)) + coord_flip() + theme_bw() + theme(axis.text.x = element_text(size = 16,
angle = 0, hjust = 1, vjust = 1)) + theme(axis.text.y = element_text(size = 16,
angle = 0, hjust = 0, vjust = 0)) + theme(legend.title = element_text(size = 24)) +
theme(legend.text = element_text(size = 16)) + xlab("") + ylab("Number of reads") +
scale_fill_manual(values = division_euk_colors) + ggtitle(one_strait) +
theme(plot.title = element_text(size = 22, hjust = 0.5)) + # theme(axis.text=element_text(size=14), legend.text =
# element_text(size=16)) +
theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top",
ncol = 2, nrow = 4, byrow = FALSE))
print(array_euk[[one_strait]])
# For proks
df_one <- filter(df, strait == one_strait & kingdom != "Eukaryota")
array_prok[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(otu_label,
n_seq), y = n_seq, fill = supergroup)) + coord_flip() + theme_bw() +
theme(axis.text.x = element_text(size = 16, angle = 0, hjust = 1, vjust = 1)) +
theme(axis.text.y = element_text(size = 16, angle = 0, hjust = 0, vjust = 0)) +
theme(legend.title = element_text(size = 24)) + theme(legend.text = element_text(size = 16)) +
xlab("") + ylab("Number of reads") + scale_fill_manual(values = supergroup_colors) +
ggtitle(one_strait) + theme(plot.title = element_text(size = 22, hjust = 0.5)) +
# theme(axis.text=element_text(size=14), legend.text =
# element_text(size=16)) +
theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top",
ncol = 2, nrow = 4, byrow = FALSE))
print(array_prok[[one_strait]])
}
6.9.2 FIG Most abundant ASVs - Save as Grid
6.10 Bar plots
6.10.1 Bar plot 20 most abundant divisions
- RM samples not considered
division_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom,
supergroup, division) %>% summarise(n_seq_division = sum(n_seq)) %>% arrange(-n_seq_division)
long_division_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>%
mutate(division = case_when(division %in% division_top$division[1:20] ~
division, TRUE ~ "Other")) %>% group_by(strait, sample_label, division) %>%
summarise(n_seq_division = sum(n_seq))
seq_stations <- long_division_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_division))
n_seq_station <- seq_stations$n_seq_station[1]
# Set up the colors
colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD",
"#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711",
"#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788",
"#999099", "#AA4455")
colours_name <- t(lapply(colours, plotrix::color.id))
# Read the colors from Excel file
top_division_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "D1:F22")
top_division_colors <- structure(top_division_colors.df$color_name, .Names = top_division_colors.df$division)
ggplot(long_division_top, aes(x = sample_label, y = n_seq_division/n_seq_station,
fill = factor(division, levels = top_division_colors.df$division))) + geom_col() +
theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) +
theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) +
# scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_division_colors) + scale_y_continuous(labels = scales::percent,
expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") +
guides(fill = guide_legend(title = "Division", title.position = "top", ncol = 5,
nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait,
scales = "free_x", space = "free_x")
6.10.2 FIG - Bar plot 20 most abundant classes
- RM samples not considered
class_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom,
supergroup, division, class) %>% summarise(n_seq_class = sum(n_seq)) %>%
arrange(-n_seq_class)
long_class_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>%
mutate(class = case_when(class %in% class_top$class[1:20] ~ class, TRUE ~
"Other")) %>% group_by(strait, sample_label, class) %>% summarise(n_seq_class = sum(n_seq))
seq_stations <- long_class_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_class))
n_seq_station <- seq_stations$n_seq_station[1]
# Set up the colors
colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD",
"#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711",
"#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788",
"#999099", "#AA4455")
colours_name <- t(lapply(colours, plotrix::color.id))
# RColorBrewer::display.brewer.pal(n=3, name='Reds')
colours <- RColorBrewer::brewer.pal(n = 3, name = "Reds")
colours_name_archeae <- data.frame(purrr::flatten_chr(purrr::map(colours, plotrix::color.id)))
# RColorBrewer::display.brewer.pal(n=9, name='Blues')
colours <- RColorBrewer::brewer.pal(n = 9, name = "Blues")
colours_name_bacteria_1 <- data.frame(purrr::flatten_chr(purrr::map(colours,
plotrix::color.id)))
# RColorBrewer::display.brewer.pal(n=9, name='Purples')
colours <- RColorBrewer::brewer.pal(n = 9, name = "Purples")
colours_name_bacteria_2 <- data.frame(purrr::flatten_chr(purrr::map(colours,
plotrix::color.id)))
# RColorBrewer::display.brewer.pal(n=4, name='Greens')
colours <- RColorBrewer::brewer.pal(n = 4, name = "Greens")
colours_name_euks <- data.frame(purrr::flatten_chr(purrr::map(colours, plotrix::color.id)))
colours_name <- bind_rows(colours_name_archeae, colours_name_bacteria_1, colours_name_bacteria_2,
colours_name_euks)
writeClipboard(colours_name[, 1])
# Read the colors from Excel file
top_class_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "L1:N22")
top_class_colors <- structure(top_class_colors.df$color_name, .Names = top_class_colors.df$class)
g <- ggplot(long_class_top, aes(x = sample_label, y = n_seq_class/n_seq_station,
fill = factor(class, levels = top_class_colors.df$class))) + geom_col() +
theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) +
theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) +
# scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_class_colors) + scale_y_continuous(labels = scales::percent,
expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") +
guides(fill = guide_legend(title = "Class", title.position = "top", ncol = 5,
nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait,
scales = "free_x", space = "free_x")
print(g)
6.10.3 Bar plot 20 most abundant orders
- RM samples not considered
order_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom,
supergroup, division, class, order) %>% summarise(n_seq_order = sum(n_seq)) %>%
arrange(-n_seq_order)
long_order_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>%
mutate(order = case_when(order %in% order_top$order[1:20] ~ order, TRUE ~
"Other")) %>% group_by(strait, sample_label, order) %>% summarise(n_seq_order = sum(n_seq))
seq_stations <- long_order_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_order))
n_seq_station <- seq_stations$n_seq_station[1]
# Set up the colors
colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD",
"#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711",
"#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788",
"#999099", "#AA4455")
colours_name <- t(lapply(colours, plotrix::color.id))
# Read the colors from Excel file
top_order_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "U1:W22")
top_order_colors <- structure(top_order_colors.df$color_name, .Names = top_order_colors.df$order)
ggplot(long_order_top, aes(x = sample_label, y = n_seq_order/n_seq_station,
fill = factor(order, levels = top_order_colors.df$order))) + geom_col() +
theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) +
theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) +
# scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_order_colors) + scale_y_continuous(labels = scales::percent,
expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") +
guides(fill = guide_legend(title = "Order", title.position = "top", ncol = 5,
nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait,
scales = "free_x", space = "free_x")
6.10.4 Bar plot of divisions per station
Note: some stations are completely missing heterotrophs (Only Opistokonta)
for (i in 1:3) {
p <- plot_bar(ps_list$ps[[i]], x = "sample_label", fill = "division") +
geom_bar(aes(color = division, fill = division), stat = "identity",
position = "stack") + ggtitle(str_c("Division level - ", ps_list$title[[i]])) +
theme(axis.text.y = element_text(size = 10)) + theme(axis.text.x = element_text(angle = 0,
hjust = 0.5)) + coord_flip() + scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE)
print(p)
}
6.10.5 Bar plot of class per station
Only consider the abundant taxa
for (i in 1:3) {
p <- plot_bar(ps_list_abundant$ps[[i]], x = "sample_label", fill = "class") +
geom_bar(aes(color = class, fill = class), stat = "identity", position = "stack") +
ggtitle(str_c("Class level - ", ps_list_abundant$title[[i]])) + theme(axis.text.y = element_text(size = 10)) +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5)) + coord_flip() +
scale_fill_viridis(discrete = TRUE) + scale_color_viridis(discrete = TRUE)
print(p)
}
6.11 Synthetic comparisons
6.11.1 Compare by Straight, Site, Moonsoon (abundant OTUs only)
for (factor in c("strait", "location", "monsoon")) {
for (i in 1:3) {
ps_aggregate <- merge_samples(ps_list_abundant$ps[[i]], factor)
ps_aggregate <- transform_sample_counts(ps_aggregate, function(x) 100 *
(x/sum(x)))
p <- plot_bar(ps_aggregate, fill = "division") + geom_col(aes(color = division,
fill = division)) + ggtitle(str_c(ps_list_abundant$title[[i]], " - ",
factor)) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5,
hjust = 1)) + ylab("%") + scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE)
print(p)
}
}
6.11.2 Compare by Moonsoon in Singapore straight (abundant OTUs)
for (i in 1:3) {
ps_aggregate <- subset_samples(ps_list_abundant$ps[[i]], strait == "Singapore")
ps_aggregate <- merge_samples(ps_aggregate, "monsoon")
ps_aggregate <- transform_sample_counts(ps_aggregate, function(x) 100 *
(x/sum(x)))
p <- plot_bar(ps_aggregate, fill = "division") + geom_col(aes(color = division,
fill = division)) + ggtitle(str_c(ps_list_abundant$title[[i]], " - ",
factor)) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5,
hjust = 1)) + ylab("%") + scale_fill_viridis(discrete = TRUE) + scale_color_viridis(discrete = TRUE)
print(p)
}
6.12 Time series
6.12.1 Time series (abundant OTUs) aggregated for Singapore strait only
for (factor in c("date")) {
for (i in 1:3) {
ps_aggregate <- subset_samples(ps_list_abundant$ps[[i]], strait == "Singapore")
ps_aggregate <- merge_samples(ps_aggregate, "date")
ps_aggregate <- transform_sample_counts(ps_aggregate, function(x) 100 *
(x/sum(x)))
p <- plot_bar(ps_aggregate, fill = "division") + geom_col(aes(color = division,
fill = division)) + ggtitle(str_c(ps_list_abundant$title[[i]], " - ",
factor)) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5,
hjust = 1)) + ylab("%") + scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE)
print(p)
}
}
6.12.2 Time series (abundant OTUs) - Division level
for (i in 1:3) {
ps_plot <- ps_list_abundant$ps[[i]]
p <- plot_bar(ps_plot, x = "date", fill = "division") + facet_grid(rows = vars(location)) +
geom_col(aes(color = division, fill = division)) + ggtitle(str_c(ps_list_abundant$title[[i]],
" - Date")) + theme(axis.text.x = element_text(angle = 45, vjust = 0.5,
hjust = 1)) + ylab("%") + xlim(as.POSIXct(as.Date(c("2017-01-01", "2019-01-01")))) +
geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", "2018-01-01",
"2019-01-01")))) + scale_fill_viridis(discrete = TRUE) + scale_color_viridis(discrete = TRUE)
print(p)
}
6.12.3 Time series (abundant OTUs) - Genus level for Chlorophyta
for (i in 3:3) {
ps_plot <- subset_taxa(ps_list_abundant$ps[[i]], division == "Chlorophyta")
p <- plot_bar(ps_plot, x = "date", fill = "genus") + facet_grid(rows = vars(location)) +
geom_col(aes(color = genus, fill = genus)) + ggtitle(str_c(ps_list_abundant$title[[i]],
" - Date")) + theme(axis.text.x = element_text(angle = 45, vjust = 0.5,
hjust = 1)) + ylab("%") + xlim(as.POSIXct(as.Date(c("2017-01-01", "2019-01-01")))) +
geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", "2018-01-01",
"2019-01-01")))) + scale_fill_viridis(discrete = TRUE) + scale_color_viridis(discrete = TRUE)
print(p)
}
6.13 Main species for each division (Eukaryotes - Autrotrophs)
p <- list()
for (one_division in c("Chlorophyta", "Dinoflagellata", "Ochrophyta", "Cryptophyta",
"Haptophyta")) {
ps_subset <- subset_samples(ps_photo_abundant, strait != "Raffles Mari")
ps_subset <- subset_taxa(ps_subset, division %in% one_division)
p[[one_division]] <- plot_bar(ps_subset, x = "species") + facet_grid(rows = vars(strait),
cols = vars(monsoon)) + geom_col() + theme(axis.text.x = element_text(angle = 90,
vjust = 0.5, hjust = 1)) + ggtitle(str_c(one_division, " - Abundant OTUs")) +
coord_flip()
print(p[[one_division]])
}
# grid_array <- list(p[['Chlorophyta']],p[['Ochrophyta']]) grid_layout <-
# rbind(c(NA,1,1,1,1,1), c( 2,2,2,2,2,2)) grid_fig <-
# gridExtra::grid.arrange(grobs=grid_array, layout_matrix=grid_layout,
# clip=FALSE, padding = unit(0, 'line'), as.table = FALSE)
grid_fig <- cowplot::plot_grid(p[["Chlorophyta"]], p[["Ochrophyta"]], labels = c("A",
"B"), align = "v", nrow = 2)
# ggsave('../fig/Fig_Species_Moonsoon.png', grid_fig, height = 10, width =
# 8, dpi = 300)
6.14 Heatmaps
6.14.1 Abundant OTUs
- Data are agglomarated at the genus level. Use function
tax_glom
for (i in c(1)) {
ps_heat <- tax_glom(ps_list_abundant$ps[[i]], taxrank = "family")
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "family",
taxa.order = "division", sample.label = "sample_label", sample.order = "sample_label",
low = "beige", high = "red", na.value = "beige", title = ps_list_abundant$title[[i]])
print(p)
}
for (i in 2:3) {
ps_heat <- tax_glom(ps_list_abundant$ps[[i]], taxrank = "genus")
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "genus",
taxa.order = "division", sample.label = "sample_label", sample.order = "sample_label",
low = "beige", high = "red", na.value = "beige", title = ps_list_abundant$title[[i]])
print(p)
}
6.14.2 Chlorophyta at species level
All ASVs considered (not only abundant)
ps_heat <- subset_taxa(ps_photo, division == "Chlorophyta")
ps_heat <- tax_glom(ps_heat, taxrank = "species")
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "species",
taxa.order = "species", sample.label = "sample_label", sample.order = "sample_label",
low = "beige", high = "red", na.value = "beige", trans = scales::log10_trans(),
title = "Mamiellophyceae in Singapore")
print(p)
6.14.3 Mamiello (Only Bathy, Ostreo and Micromonas) at genus level
All ASVs considered (not only abundant)
ps_heat <- subset_taxa(ps_photo, genus %in% c("Ostreococcus", "Bathycoccus",
"Micromonas"))
ps_heat <- tax_glom(ps_heat, taxrank = "genus")
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "genus",
taxa.order = "genus", sample.label = "sample_label", sample.order = "sample_label",
low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10),
title = "Mamiellophyceae in Singapore")
print(p)
6.14.4 Mamiello (Only Bathy, Ostreo and Micromonas) at species level
All ASVs considered (not only abundant)
ps_heat <- subset_taxa(ps_photo, genus %in% c("Ostreococcus", "Bathycoccus",
"Micromonas"))
ps_heat <- tax_glom(ps_heat, taxrank = "species")
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "species",
taxa.order = "species", sample.label = "sample_label", sample.order = "sample_label",
low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10),
title = "Mamiellophyceae in Singapore")
print(p)
6.14.5 Syndiniales at order level
All ASVs considered (not only abundant)
ps_heat <- subset_taxa(ps_euk, class %in% c("Syndiniales"))
ps_heat <- subset_samples(ps_heat, !is.na(Salinity))
ps_heat <- tax_glom(ps_heat, taxrank = "order")
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "order",
taxa.order = "order", sample.label = "Salinity", sample.order = "Salinity",
low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10),
title = "Syndiniales in Singapore") + facet_grid(~location, scales = "free_x")
print(p)
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "order",
taxa.order = "order", sample.label = "Temperature", sample.order = "Temperature",
low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10),
title = "Syndiniales in Singapore") + facet_grid(~location, scales = "free_x")
print(p)
ps_heat <- subset_taxa(ps_euk, class %in% c("Syndiniales"))
ps_heat <- subset_samples(ps_heat, !is.na(Salinity))
ps_heat <- tax_glom(ps_heat, taxrank = "family")
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "family",
taxa.order = "family", sample.label = "Salinity", sample.order = "Salinity",
low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10),
title = "Syndiniales in Singapore") + facet_grid(~location, scales = "free_x")
print(p)
6.14.6 FIG - Most abundant OTUs in Johor vs Singapre and Prokaryotes vs Eukaryotes
heat_prok <- list()
heat_euk <- list()
for (strait_one in c("Singapore", "Johor")) {
for (kingdom_one in list(c("Bacteria", "Archaea"), "Eukaryota")){
otu_strait_one <- otu_strait %>%
filter(strait == strait_one & kingdom %in% kingdom_one) %>%
top_n(30, n_seq)
ps_heat <- ps_all %>%
subset_samples(strait==strait_one) %>%
subset_taxa(kingdom %in% kingdom_one)
otu_table_keep <- data.frame(otu_table(ps_heat)) %>%
rownames_to_column("otu_id") %>%
filter(otu_id %in% otu_strait_one$otu_id) %>%
left_join(select(otu_strait_one, otu_id)) %>%
column_to_rownames("otu_id")
tax_table_keep <- data.frame(tax_table(ps_heat)) %>%
rownames_to_column("otu_id") %>%
filter(otu_id %in% otu_strait_one$otu_id) %>%
left_join(select(otu_strait_one, otu_id, otu_label)) %>%
column_to_rownames("otu_id")
OTU = otu_table(as.matrix(otu_table_keep), taxa_are_rows = TRUE)
TAX = tax_table(as.matrix(tax_table_keep))
samples = sample_data(ps_heat)
ps_heat <- phyloseq(OTU, TAX, samples)
print(ps_heat)
p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray",
taxa.label = "otu_label", taxa.order="supergroup",
sample.label = "sample_label", sample.order="sample_label",
low="beige", high="red", na.value="beige",trans = scales::log10_trans(),
title=strait_one) +
theme(axis.text.y = element_text(size = 16, angle = 0, hjust = 0, vjust = 0.5)) +
theme(axis.text.x = element_text(size = 16)) +
theme(axis.title = element_text(size = 0)) +
theme(plot.title = element_text(size=22, hjust = 0.5)) +
xlab("Sample") + ylab("")
print(p)
if (kingdom_one=="Eukaryota") heat_euk[[strait_one]] <- p
else heat_prok[[strait_one]] <- p
}}
phyloseq-class experiment-level object
otu_table() OTU Table: [ 30 taxa and 30 samples ]
sample_data() Sample Data: [ 30 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 30 taxa by 9 taxonomic ranks ]
phyloseq-class experiment-level object
otu_table() OTU Table: [ 30 taxa and 30 samples ]
sample_data() Sample Data: [ 30 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 30 taxa by 9 taxonomic ranks ]
phyloseq-class experiment-level object
otu_table() OTU Table: [ 30 taxa and 40 samples ]
sample_data() Sample Data: [ 40 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 30 taxa by 9 taxonomic ranks ]
phyloseq-class experiment-level object
otu_table() OTU Table: [ 30 taxa and 40 samples ]
sample_data() Sample Data: [ 40 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 30 taxa by 9 taxonomic ranks ]
grid_fig_heat <- cowplot::plot_grid(heat_prok[["Singapore"]],heat_prok[["Johor"]], align="hv",
# labels=c("A","B"),
ncol=1, nrow=2, label_size = 20)
ggsave("../fig/fig_heat_prok.pdf", grid_fig_heat,
height = 20, width = 15, dpi = 300)
grid_fig_heat <- cowplot::plot_grid(heat_euk[["Singapore"]],heat_euk[["Johor"]], align="hv",
# labels=c("A","B"),
ncol=1, nrow=2, label_size = 20)
ggsave("../fig/fig_heat_euk.pdf", grid_fig_heat,
height = 20, width = 15, dpi = 300)
6.15 NMDS
Sample removed because they were pulling the NMDS * PR2X16XS21 it has a single eukaryote (diatom bloom ) * RM13XS36 cause problem for bacteria * PR11XS25 cause problem for hetero euks * SBW11XS26 cause problem for hetero euks * SBW13XS37 cause problem for hetero euks * RM13XS36 cause problem for hetero euks
6.15.1 Define function
- See comments inside functions (saved plots are different from displayed plots)
ps_do_nmds <- function(ps_list) {
plot_array <- list()
for (i in 1:3) {
ps_nmds <- ps_list$ps[[i]]
# Remove samples with no reads
ps_nmds <- prune_samples(sample_sums(ps_nmds) > 0, ps_nmds)
# Remove samples from Raffles and Johor
ps_nmds <- subset_samples(ps_nmds, !(str_detect(strait, "Raffles Mari|Johor")))
# Remove samples that caused problems (1= prok, 2=euk, 3=euk auto)
if (i == 1)
{
ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("RM13XS36")),
ps_nmds)
} # Prokaryotes
if (i %in% c(2, 3))
{
ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR2X16SXS21")),
ps_nmds)
} # Eukaryotes
if (i == 4)
{
ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR11XS25",
"SBW11XS26", "SBW13XS37")), ps_nmds)
} # Heterotrophs not used
singa.ord <- ordinate(ps_nmds, "NMDS", "bray")
# Fit environmental parameters
env_var <- sample_variables(ps_nmds)
env_matrix <- get_variable(ps_nmds, c("Chl", "Temperature", "Salinity",
"Phosphate", "Silicate", "DIN", "BAC"))
env_fit <- vegan::envfit(singa.ord, env = env_matrix, perm = 999, na.rm = TRUE)
env_arrows <- data.frame(env_fit$vectors$arrows * sqrt(env_fit$vectors$r)) %>%
rownames_to_column(var = "parameter")
nmds_samples <- data.frame(singa.ord[["points"]], get_variable(ps_nmds,
c("monsoon", "strait", "location", "location_label"))) %>% rownames_to_column(var = "sample")
# Factor to move the labels
nudge_x <- max(nmds_samples$MDS1) * 0.08
nudge_y <- max(nmds_samples$MDS2) * 0.08
xy_max = max(c(nmds_samples$MDS1, nmds_samples$MDS2)) * 1.5
xy_min = min(c(nmds_samples$MDS1, nmds_samples$MDS2)) * 1.5
factor <- 3 # for vectors for euks
if (i == 1)
{
factor <- 1.5
} # for vectors for proks
print(factor)
p <- plot_ordination(ps_nmds, singa.ord, type = "samples", color = "monsoon",
shape = "strait", title = ps_list$title[[i]]) + geom_point(aes(shape = strait,
color = monsoon), size = 3.5) + scale_color_viridis(discrete = TRUE) +
scale_shape_manual(values = c(15, 16)) + geom_text(aes(label = location_label,
color = monsoon), nudge_x = nudge_x, nudge_y = nudge_y, check_overlap = TRUE,
size = 2) + theme_bw() + geom_segment(data = env_arrows, aes(x = 0,
xend = NMDS1 * factor, y = 0, yend = NMDS2 * factor), inherit.aes = FALSE,
arrow = arrow(length = unit(0.5, "cm")), colour = "black") + geom_text(data = env_arrows,
aes(x = NMDS1 * factor, y = NMDS2 * factor, label = parameter),
inherit.aes = FALSE, hjust = -0.2, vjust = -0.2, size = 3)
print(singa.ord)
plot_array[[i]] <- p
print(p)
# The following lines can be used if you want to avoid using the pjhyloseq
# functions to plot the data. Notes : - must use inherit.aes = FALSE to add
# some extra layers - the saved plots have a different scale for the added
# layer than the displaued plot can figure out ggplot()+ coord_fixed() +
# xlim(xy_min, xy_max) + ylim(xy_min, xy_max) +
# geom_point(data=nmds_samples, aes(x=MDS1, y=MDS2, shape=strait,
# color=monsoon), size=5) + geom_text(data=nmds_samples, aes(x=MDS1, y=MDS2,
# label=location_label, color=monsoon), nudge_x=nudge_x, nudge_y=nudge_y,
# check_overlap = FALSE, size=2) + ggtitle(ps_list$title[[i]]) +
p <- plot_ordination(ps_nmds, singa.ord, type = "taxa", color = "division",
title = ps_list$title[[i]]) + scale_color_viridis(discrete = TRUE) +
geom_point(size = 3) + theme_bw() + geom_segment(data = env_arrows,
aes(x = 0, xend = NMDS1 * factor, y = 0, yend = NMDS2 * factor),
inherit.aes = FALSE, arrow = arrow(length = unit(0.5, "cm")), colour = "black") +
geom_text(data = env_arrows, aes(x = NMDS1 * factor, y = NMDS2 *
factor, label = parameter), inherit.aes = FALSE, hjust = -0.2,
vjust = -0.2, size = 3)
print(p)
plot_array[[i + 3]] <- p
}
return(plot_array)
}
6.15.2 All OTUs
Square root transformation
Wisconsin double standardization
Run 0 stress 0.12
Run 1 stress 0.12
... New best solution
... Procrustes: rmse 0.03 max resid 0.13
Run 2 stress 0.12
... New best solution
... Procrustes: rmse 0.00031 max resid 0.0014
... Similar to previous best
Run 3 stress 0.12
... Procrustes: rmse 3e-04 max resid 0.0014
... Similar to previous best
Run 4 stress 0.12
Run 5 stress 0.12
... New best solution
... Procrustes: rmse 0.00024 max resid 0.00077
... Similar to previous best
Run 6 stress 0.12
Run 7 stress 0.12
... New best solution
... Procrustes: rmse 0.00013 max resid 0.00045
... Similar to previous best
Run 8 stress 0.12
Run 9 stress 0.12
... Procrustes: rmse 4.6e-05 max resid 0.00012
... Similar to previous best
Run 10 stress 0.12
Run 11 stress 0.12
... Procrustes: rmse 0.00011 max resid 0.00036
... Similar to previous best
Run 12 stress 0.15
Run 13 stress 0.18
Run 14 stress 0.12
... Procrustes: rmse 0.00033 max resid 0.0012
... Similar to previous best
Run 15 stress 0.15
Run 16 stress 0.12
Run 17 stress 0.12
Run 18 stress 0.16
Run 19 stress 0.12
... Procrustes: rmse 4.9e-05 max resid 0.00021
... Similar to previous best
Run 20 stress 0.12
... Procrustes: rmse 0.00013 max resid 0.00053
... Similar to previous best
*** Solution reached
[1] 1.5
Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(veganifyOTU(physeq)))
Distance: bray
Dimensions: 2
Stress: 0.12
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
Square root transformation
Wisconsin double standardization
Run 0 stress 0.22
Run 1 stress 0.24
Run 2 stress 0.27
Run 3 stress 0.27
Run 4 stress 0.22
... New best solution
... Procrustes: rmse 0.036 max resid 0.1
Run 5 stress 0.22
Run 6 stress 0.25
Run 7 stress 0.23
Run 8 stress 0.22
Run 9 stress 0.22
Run 10 stress 0.22
Run 11 stress 0.24
Run 12 stress 0.24
Run 13 stress 0.25
Run 14 stress 0.22
Run 15 stress 0.22
... Procrustes: rmse 0.03 max resid 0.14
Run 16 stress 0.23
Run 17 stress 0.22
Run 18 stress 0.23
Run 19 stress 0.24
Run 20 stress 0.22
... New best solution
... Procrustes: rmse 0.00012 max resid 0.00037
... Similar to previous best
*** Solution reached
[1] 3
Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(veganifyOTU(physeq)))
Distance: bray
Dimensions: 2
Stress: 0.22
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
Square root transformation
Wisconsin double standardization
Run 0 stress 0.2
Run 1 stress 0.19
... New best solution
... Procrustes: rmse 0.1 max resid 0.31
Run 2 stress 0.19
... New best solution
... Procrustes: rmse 0.031 max resid 0.11
Run 3 stress 0.21
Run 4 stress 0.18
... New best solution
... Procrustes: rmse 0.081 max resid 0.27
Run 5 stress 0.25
Run 6 stress 0.19
Run 7 stress 0.2
Run 8 stress 0.21
Run 9 stress 0.18
... Procrustes: rmse 0.0082 max resid 0.031
Run 10 stress 0.22
Run 11 stress 0.18
... New best solution
... Procrustes: rmse 0.018 max resid 0.072
Run 12 stress 0.19
Run 13 stress 0.22
Run 14 stress 0.19
Run 15 stress 0.19
Run 16 stress 0.22
Run 17 stress 0.18
... Procrustes: rmse 0.018 max resid 0.071
Run 18 stress 0.19
Run 19 stress 0.22
Run 20 stress 0.18
*** No convergence -- monoMDS stopping criteria:
20: stress ratio > sratmax
[1] 3
Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(veganifyOTU(physeq)))
Distance: bray
Dimensions: 2
Stress: 0.18
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
6.15.3 Abundant OTUs
Square root transformation
Wisconsin double standardization
Run 0 stress 0.13
Run 1 stress 0.17
Run 2 stress 0.13
... Procrustes: rmse 0.00069 max resid 0.0031
... Similar to previous best
Run 3 stress 0.22
Run 4 stress 0.13
... Procrustes: rmse 0.00047 max resid 0.0021
... Similar to previous best
Run 5 stress 0.13
... Procrustes: rmse 0.00049 max resid 0.0022
... Similar to previous best
Run 6 stress 0.13
... Procrustes: rmse 0.00019 max resid 0.00085
... Similar to previous best
Run 7 stress 0.13
... Procrustes: rmse 0.00026 max resid 0.0012
... Similar to previous best
Run 8 stress 0.14
Run 9 stress 0.13
... Procrustes: rmse 0.00081 max resid 0.0037
... Similar to previous best
Run 10 stress 0.13
... Procrustes: rmse 0.00077 max resid 0.0035
... Similar to previous best
Run 11 stress 0.13
... Procrustes: rmse 0.0011 max resid 0.0051
... Similar to previous best
Run 12 stress 0.13
... Procrustes: rmse 0.00031 max resid 0.0014
... Similar to previous best
Run 13 stress 0.13
... Procrustes: rmse 0.00037 max resid 0.0017
... Similar to previous best
Run 14 stress 0.18
Run 15 stress 0.13
... Procrustes: rmse 0.00062 max resid 0.0028
... Similar to previous best
Run 16 stress 0.14
Run 17 stress 0.13
... Procrustes: rmse 0.00062 max resid 0.0028
... Similar to previous best
Run 18 stress 0.39
Run 19 stress 0.14
Run 20 stress 0.13
... Procrustes: rmse 0.00053 max resid 0.0024
... Similar to previous best
*** Solution reached
[1] 1.5
Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(veganifyOTU(physeq)))
Distance: bray
Dimensions: 2
Stress: 0.13
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
Square root transformation
Wisconsin double standardization
Run 0 stress 0.21
Run 1 stress 0.22
Run 2 stress 0.25
Run 3 stress 0.21
... Procrustes: rmse 0.018 max resid 0.084
Run 4 stress 0.25
Run 5 stress 0.22
Run 6 stress 0.24
Run 7 stress 0.21
... Procrustes: rmse 0.00018 max resid 0.00056
... Similar to previous best
Run 8 stress 0.24
Run 9 stress 0.22
Run 10 stress 0.3
Run 11 stress 0.22
Run 12 stress 0.26
Run 13 stress 0.22
Run 14 stress 0.22
Run 15 stress 0.21
... New best solution
... Procrustes: rmse 3.3e-05 max resid 7.9e-05
... Similar to previous best
Run 16 stress 0.24
Run 17 stress 0.21
... Procrustes: rmse 0.018 max resid 0.084
Run 18 stress 0.25
Run 19 stress 0.21
... New best solution
... Procrustes: rmse 0.00013 max resid 0.00044
... Similar to previous best
Run 20 stress 0.24
*** Solution reached
[1] 3
Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(veganifyOTU(physeq)))
Distance: bray
Dimensions: 2
Stress: 0.21
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
Square root transformation
Wisconsin double standardization
Run 0 stress 0.21
Run 1 stress 0.23
Run 2 stress 0.21
... Procrustes: rmse 0.11 max resid 0.3
Run 3 stress 0.21
... Procrustes: rmse 0.11 max resid 0.3
Run 4 stress 0.21
Run 5 stress 0.24
Run 6 stress 0.2
... New best solution
... Procrustes: rmse 0.061 max resid 0.27
Run 7 stress 0.22
Run 8 stress 0.23
Run 9 stress 0.21
Run 10 stress 0.2
Run 11 stress 0.2
... Procrustes: rmse 0.0065 max resid 0.02
Run 12 stress 0.22
Run 13 stress 0.22
Run 14 stress 0.22
Run 15 stress 0.22
Run 16 stress 0.22
Run 17 stress 0.24
Run 18 stress 0.2
Run 19 stress 0.21
Run 20 stress 0.2
*** No convergence -- monoMDS stopping criteria:
20: stress ratio > sratmax
[1] 3
Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(veganifyOTU(physeq)))
Distance: bray
Dimensions: 2
Stress: 0.2
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
### MSDS graph
grid_fig <- cowplot::plot_grid(plot_array_abundant[[1]], plot_array_abundant[[4]],
plot_array_abundant[[3]], plot_array_abundant[[6]], ncol = 2, labels = c("A",
"B", "C", "D"), align = "hv", label_size = 20)
grid_fig
6.16 Network analysis
for (i in 1:3) {
ps_nmds <- ps_list_abundant$ps[[i]]
# Remove samples with no reads
ps_nmds <- prune_samples(sample_sums(ps_nmds) > 0, ps_nmds)
# Remove samples that caused problems (1= prok, 2=euk, 3=euk auto)
if (i == 1)
{
ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("RM13XS36")),
ps_nmds)
} # Prokaryotes
if (i %in% c(2, 3))
{
ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR2X16SXS21")),
ps_nmds)
} # Eukaryotes
if (i == 4)
{
ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR11XS25",
"SBW11XS26", "SBW13XS37")), ps_nmds)
} # Heterotrophs not used
if (i > 1) {
p <- plot_net(ps_nmds, distance = "(A+B-2*J)/(A+B)", type = "taxa",
maxdist = 0.4, color = "class", point_label = "genus") + ggtitle(ps_list_abundant$title[[i]])
} else {
p <- plot_net(ps_nmds, distance = "(A+B-2*J)/(A+B)", type = "taxa",
maxdist = 0.4, color = "class", point_label = "family") + ggtitle(ps_list_abundant$title[[i]])
}
print(p)
}
6.17 Differential expression (DESeq2)
Love, M.I., Huber, W. & Anders, S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550.
- Follow:
- Start from raw counts (not normalized) :
ps_all_raw
[1] '1.22.2'
Block 1 is need to avoid the following error > Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : every gene contains at least one zero, cannot compute log geometric means
See: https://github.com/joey711/phyloseq/issues/387
phyloseq-class experiment-level object
otu_table() OTU Table: [ 2571 taxa and 21 samples ]
sample_data() Sample Data: [ 21 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 2571 taxa by 8 taxonomic ranks ]
diagdds = phyloseq_to_deseq2(ps, ~monsoon)
# ps <- subset_samples(ps, strait %in% c('Singapore','Johor')) diagdds =
# phyloseq_to_deseq2(ps, ~ strait)
# Block 1 Start Estimate geometric mean with zeros
gm_mean = function(x, na.rm = TRUE) {
exp(sum(log(x[x > 0]), na.rm = na.rm)/length(x))
}
geoMeans = apply(counts(diagdds), 1, gm_mean)
diagdds = estimateSizeFactors(diagdds, geoMeans = geoMeans)
# Block 1 end
diagdds = DESeq(diagdds, fitType = "local")
res = results(diagdds, cooksCutoff = FALSE)
alpha = 0.01
sigtab = res[which(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(ps)[rownames(sigtab),
], "matrix"))
head(sigtab)
baseMean log2FoldChange lfcSE stat pvalue padj kingdom
asv_0005 2423.0 4.7 1.47 3.2 1.5e-03 6.8e-03 Archaea
asv_0025 39.8 8.9 2.60 3.4 6.3e-04 3.2e-03 Bacteria
asv_0054 596.5 1.4 0.42 3.3 9.2e-04 4.4e-03 Bacteria
asv_0074 245.0 4.9 1.24 4.0 7.3e-05 4.4e-04 Bacteria
asv_0100 5.6 21.2 2.96 7.1 9.4e-13 7.6e-12 Bacteria
supergroup division class
asv_0005 Thaumarchaeota Nitrososphaeria Nitrosopumilales
asv_0025 Actinobacteria Acidimicrobiia Microtrichales
asv_0054 Proteobacteria Alphaproteobacteria SAR11 clade
asv_0074 Proteobacteria Alphaproteobacteria SAR11 clade
asv_0100 Proteobacteria Gammaproteobacteria Betaproteobacteriales
order family
asv_0005 Nitrosopumilaceae Candidatus Nitrosopumilus
asv_0025 Ilumatobacteraceae Ilumatobacter
asv_0054 Clade I Clade Ib
asv_0074 Clade III Clade III
asv_0100 Nitrosomonadaceae IS-44
genus species
asv_0005 Candidatus Nitrosopumilus <NA>
asv_0025 Ilumatobacter <NA>
asv_0054 Clade Ib <NA>
asv_0074 Clade III <NA>
asv_0100 IS-44 <NA>
[ reached 'max' / getOption("max.print") -- omitted 1 rows ]
6.17.1 FIG
sigtab_plot <- sigtab %>% rownames_to_column(var = "otu_id") %>% top_n(-90,
wt = otu_id) %>% mutate(label = case_when(kingdom == "Eukaryota" ~ str_c(otu_id,
class, species, sep = "_"), TRUE ~ str_c(otu_id, class, genus, sep = "_")),
supergroup_label = str_c(str_sub(kingdom, 1, 1), supergroup, sep = "_")) %>%
arrange(desc(otu_id)) %>% rowid_to_column()
g <- ggplot(sigtab_plot, aes(x = reorder(label, log2FoldChange), y = log2FoldChange,
color = kingdom, alpha = log(rowid))) + theme_dviz_grid() + geom_point(size = 6) +
theme(axis.text.x = element_text(size = 20, angle = 0, hjust = 0.5, vjust = 0.5),
axis.title.x = element_text(size = 20), axis.text.y = element_text(size = 16,
angle = 0, hjust = 0, vjust = 0.5), legend.title = element_text(size = 24),
legend.text = element_text(size = 20)) + guides(alpha = FALSE) + ylab("log2 fold change - NE <--> SW") +
xlab("") + scale_color_viridis_d(option = "C", name = "Kingdom") + coord_flip()
g
6.17.2 Do a histogram plot for all ASV significantly different
p <- list()
for (asv_one in sigtab_plot$otu_id) {
ps_one <- prune_taxa(asv_one, ps)
p[[asv_one]] <- plot_bar(ps_one, x = "sample_label", fill = "class") + geom_bar(aes(color = class,
fill = class), stat = "identity", position = "stack") + ggtitle(asv_one) +
theme(axis.text.y = element_text(size = 10)) + theme(axis.text.x = element_text(size = 8,
angle = 0, hjust = 0.5)) + coord_flip() + scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE)
print(p[[asv_one]])
}