Metabarcoding Singapore
Phyloseq

1 Aim
2 Initialize
3 Read the data
- 3.1 File names
- 3.2 Read the files
  - 3.2.1 Metadata and sample files
  - 3.2.2 OTU table
4 Map
- 4.1 Leaflet map
5 Environmental data
- 5.1 Per station
- 5.2 Average
  - 5.2.1 Rain over 7 days
6 Phyloseq analysis

1 Aim

Create Phyloseq file from samples, metadata and OTU table (read from Excel file)
https://bioconductor.org/help/course-materials/2017/BioC2017/Day1/Workshops/Microbiome/MicrobiomeWorkflowII.html#abstract

2 Initialize

This file defines all the necessary libraries and variables

  source('Metabarcoding Singapore_init.R', echo=FALSE)

3 Read the data

3.1 File names

# Metadata and stations
metadata_xlsx <- "Singapore_metadata.xlsx"
metadata_csv <- "monsoonpaper_env_data.csv"

# OTUs
otu_excel_file <- "Singapore ASV_table.xlsx"

3.2 Read the files

The dada2 treatment has already removed the forward and reverse primers, so no need to remove them
Work with the unrarefied data

3.2.1 Metadata and sample files

# Read the sample and metadata tables
sample_table <- read_excel(metadata_xlsx, sheet = "samples", range = "A1:D89")

metadata_table <- read_csv(metadata_csv, na = c("ND", "")) %>% dplyr::rename(sample_code = Sample, 
    day_number = Day_number, date = Date, location = Location, monsoon = `Monsoon period`) %>% 
    select(-Strait) %>% mutate(date = lubridate::parse_date_time(date, "dmy"), 
    monsoon = forcats::fct_relevel(monsoon, "NE", "IM-1", "SW", "IM-2"))

station_table <- read_excel(metadata_xlsx, sheet = "stations", na = c("ND", 
    ""))

sample_table <- sample_table %>% left_join(metadata_table) %>% left_join(station_table) %>% 
    mutate(sample_label = str_c(location_label, date, monsoon, sep = "_"))

3.2.2 OTU table

NOTE: The OTU table is read from Excel file

The following editing has been done * Nanochloris changed to Picochlorum based on BLAST table * Ostreococcus sp. changed to O. lucimarinus based on the BLAST

# Read the OTU table

otu_table_final <- read_excel(otu_excel_file, sheet = "ASV")

4 Map

4.1 Leaflet map

Visualize different layers: https://leaflet-extras.github.io/leaflet-providers/preview/

  lng_center = mean(station_table$longitude)
  lat_center = mean(station_table$latitude)

  map <- leaflet(width = 1000, height = 1000) %>%
   addTiles() %>%  # Default
#    addTiles(urlTemplate = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}' ) %>%  # Satellite
#    addTiles(urlTemplate = 'https://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}' ) %>%  # Grey background
    setView(lng=lng_center, lat=lat_center, zoom=11) %>% 
    addCircleMarkers(data = station_table, lat = ~ latitude, lng = ~ longitude, 
                     radius = 5,
               label = ~ location, 
               labelOptions = labelOptions(textsize = "10px", noHide = T),
               clusterOptions = markerClusterOptions()) 

  map

5 Environmental data

5.1 Per station

5.1.1 Temp

p <- ggplot(filter(sample_table, location_label != "BL")) + geom_line(aes(x = date, 
    y = Temperature), na.rm = TRUE) + geom_point(aes(x = date, y = Temperature, 
    color = monsoon), size = 5) + facet_grid(rows = vars(location)) + ylim(26, 
    34) + xlim(as.POSIXct(as.Date(c("2017-01-01", "2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", 
    "2018-01-01", "2019-01-01")))) + scale_color_viridis(discrete = TRUE)
p

### Salinity

p <- ggplot(filter(sample_table, location_label != "BL")) + geom_line(aes(x = date, 
    y = Salinity), na.rm = TRUE) + geom_point(aes(x = date, y = Salinity, color = monsoon), 
    size = 5) + facet_grid(rows = vars(location)) + ylim(15, 35) + xlim(as.POSIXct(as.Date(c("2017-01-01", 
    "2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", 
    "2018-01-01", "2019-01-01")))) + scale_color_viridis(discrete = TRUE)
p

5.1.2 Chlorophyll

p <- ggplot(filter(sample_table, location_label != "BL")) + geom_line(aes(x = date, 
    y = Chl), na.rm = TRUE) + geom_point(aes(x = date, y = Chl, color = monsoon), 
    size = 5) + facet_grid(rows = vars(location)) + xlim(as.POSIXct(as.Date(c("2017-01-01", 
    "2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", 
    "2018-01-01", "2019-01-01")))) + scale_color_viridis(discrete = TRUE)
p

5.1.3 Rain over 7 days

p <- ggplot(filter(sample_table, location_label != "BL")) + geom_col(aes(x = date, 
    y = Rain7, fill = monsoon)) + facet_grid(rows = vars(location)) + xlim(as.POSIXct(as.Date(c("2017-01-01", 
    "2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", 
    "2018-01-01", "2019-01-01")))) + scale_fill_viridis(discrete = TRUE)
p

5.2 Average

5.2.1 Rain over 7 days

rain <- sample_table %>% group_by(date, monsoon) %>% summarise(Rain7_mean = mean(Rain7, 
    na.rm = TRUE))

p <- ggplot(rain) + geom_point(aes(x = date, y = Rain7_mean, color = monsoon), 
    size = 5) + xlim(as.POSIXct(as.Date(c("2017-01-01", "2019-01-01")))) + geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", 
    "2018-01-01", "2019-01-01")))) + scale_color_viridis(discrete = TRUE)
p

6 Phyloseq analysis

6.1 Create phyloseq files for all taxa after filtering the data

Filter the euk data to remove the low bootstraps values (threshold : bootstrap > 90% at the supergroup level) and create a phyloseq file

Note the bootstrap threshold 90% for ASVs

6.1.1 Filter the ASV table to remove low bootstraps

otu_table_euk_final <- otu_table_final %>% filter(supergroup_boot > 90)
otu_table_prok_final <- otu_table_final %>% filter(kingdom %in% c("Bacteria", 
    "Archaea"))

otu_table_ps <- bind_rows(otu_table_euk_final, otu_table_prok_final) %>% arrange(otu_id)

6.1.2 Create the phyloseq file

Remove sequences with bad quality
Remove Metazoa
Remove Chloroplast

otu_mat <- otu_table_ps %>% select(otu = otu_id, matches("EC|PR|RM|SBW|STJ"), 
    -species, -species_boot)
tax_mat <- otu_table_ps %>% select(otu = otu_id, kingdom:species)
samples_df <- sample_table %>% rename(sample = sample_id)

row.names(otu_mat) <- otu_mat$otu
otu_mat <- otu_mat %>% select(-otu)
row.names(tax_mat) <- tax_mat$otu
tax_mat <- tax_mat %>% select(-otu)
row.names(samples_df) <- samples_df$sample
samples_df <- samples_df %>% select(-sample)
otu_mat <- as.matrix(otu_mat)
tax_mat <- as.matrix(tax_mat)

OTU = otu_table(otu_mat, taxa_are_rows = TRUE)
TAX = tax_table(tax_mat)
samples = sample_data(samples_df)

ps_all_raw <- phyloseq(OTU, TAX, samples)

# Filtering
ps_all_raw <- subset_samples(ps_all_raw, sequence_quality == "good")
ps_all_raw <- subset_taxa(ps_all_raw, !(division %in% c("Metazoa")))
ps_all_raw <- subset_taxa(ps_all_raw, !(kingdom %in% c("Eukaryota_plastid")))

6.2 Break up into photosynthetic and non-photosynthetic

cat("\nPhyloseq All \n========== \n")


Phyloseq All 
==========

ps_all <- ps_all_raw
ps_all

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2571 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 2571 taxa by 8 taxonomic ranks ]

ps_euk <- subset_taxa(ps_all, (kingdom %in% c("Eukaryota")))
cat("\nPhyloseq Eukaryotes \n========== \n")


Phyloseq Eukaryotes 
==========

ps_euk

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 683 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 683 taxa by 8 taxonomic ranks ]

ps_photo <- subset_taxa(ps_euk, (division %in% c("Chlorophyta", "Cryptophyta", 
    "Rhodophyta", "Haptophyta", "Ochrophyta")) | ((division == "Dinoflagellata") & 
    (class != "Syndiniales")) | (class == "Filosa-Chlorarachnea"))
cat("\nPhyloseq Photosynthetic Eukaryotes \n========== \n")


Phyloseq Photosynthetic Eukaryotes 
==========

ps_photo

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 271 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 271 taxa by 8 taxonomic ranks ]

ps_hetero <- subset_taxa(ps_euk, !(division %in% c("Chlorophyta", "Cryptophyta", 
    "Rhodophyta", "Haptophyta", "Ochrophyta")) & !((division == "Dinoflagellata") & 
    !(class == "Syndiniales")) & !(class == "Filosa-Chlorarachnea"))
cat("\nPhyloseq Heterotrophic Eukaryotes \n========== \n")


Phyloseq Heterotrophic Eukaryotes 
==========

ps_hetero

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 412 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 412 taxa by 8 taxonomic ranks ]

ps_prok <- subset_taxa(ps_all, (kingdom %in% c("Bacteria", "Archaea")))
cat("\nPhyloseq Prokaryotes \n========== \n")


Phyloseq Prokaryotes 
==========

ps_prok

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1888 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 1888 taxa by 8 taxonomic ranks ]

6.3 Normalize number of reads in each sample using median sequencing depth.

! If there no cells do not transform, just set column to 0 function(x, t=total_hetero) (if(sum(x) > 0){ t * (x / sum(x))} else {x})

# First define a function to normalize

ps_normalize_median <- function(ps, title) {
    ps_median = median(sample_sums(ps))
    cat(sprintf("\nThe median number of reads used for normalization of %s is  %.0f", 
        title, ps_median))
    normalize_median = function(x, t = ps_median) (if (sum(x) > 0) {
        t * (x/sum(x))
    } else {
        x
    })
    ps = transform_sample_counts(ps, normalize_median)
    cat(str_c("\nPhyloseq ", title, "\n========== \n"))
    print(ps)
}

# Apply to all the phyloseq files
ps_all = ps_normalize_median(ps_all, "all")


The median number of reads used for normalization of all is  54948
Phyloseq all
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2571 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 2571 taxa by 8 taxonomic ranks ]

ps_euk = ps_normalize_median(ps_euk, "eukaryotes (auto+hetero)")


The median number of reads used for normalization of eukaryotes (auto+hetero) is  4735
Phyloseq eukaryotes (auto+hetero)
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 683 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 683 taxa by 8 taxonomic ranks ]

ps_photo = ps_normalize_median(ps_photo, "eukaryotes autotrophs")


The median number of reads used for normalization of eukaryotes autotrophs is  3038
Phyloseq eukaryotes autotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 271 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 271 taxa by 8 taxonomic ranks ]

ps_hetero = ps_normalize_median(ps_hetero, "eukaryotes heterotrophs")


The median number of reads used for normalization of eukaryotes heterotrophs is  983
Phyloseq eukaryotes heterotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 412 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 412 taxa by 8 taxonomic ranks ]

ps_prok = ps_normalize_median(ps_prok, "prokaryotes")


The median number of reads used for normalization of prokaryotes is  50346
Phyloseq prokaryotes
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1888 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 1888 taxa by 8 taxonomic ranks ]

6.4 Phyloseq files for abundant taxa

Remove taxa that are < 0.10 (euks) and <0.05 (proks) in any given sample
Normalize again…

ps_abundant <- function(ps, contrib_min = 0.1, title) {
    total_per_sample <- max(sample_sums(ps))
    ps <- filter_taxa(ps, function(x) sum(x > total_per_sample * contrib_min) > 
        0, TRUE)
    ps <- ps_normalize_median(ps, title)
}

cat("Remove taxa in low abundance \n\n")

Remove taxa in low abundance

ps_all_abundant = ps_abundant(ps_all, contrib_min = 0.05, "All")


The median number of reads used for normalization of All is  24049
Phyloseq All
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 45 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 45 taxa by 8 taxonomic ranks ]

ps_euk_abundant = ps_abundant(ps_euk, contrib_min = 0.1, "eukaryotes (auto+hetero)")


The median number of reads used for normalization of eukaryotes (auto+hetero) is  3359
Phyloseq eukaryotes (auto+hetero)
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 61 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 61 taxa by 8 taxonomic ranks ]

ps_photo_abundant = ps_abundant(ps_photo, contrib_min = 0.1, "eukaryotes autotrophs")


The median number of reads used for normalization of eukaryotes autotrophs is  2767
Phyloseq eukaryotes autotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 60 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 60 taxa by 8 taxonomic ranks ]

ps_hetero_abundant = ps_abundant(ps_hetero, contrib_min = 0.1, "eukaryotes heterotrophs")


The median number of reads used for normalization of eukaryotes heterotrophs is  688
Phyloseq eukaryotes heterotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 88 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 88 taxa by 8 taxonomic ranks ]

ps_prok_abundant = ps_abundant(ps_prok, contrib_min = 0.05, "prokaryotes")


The median number of reads used for normalization of prokaryotes is  23865
Phyloseq prokaryotes
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 44 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 44 taxa by 8 taxonomic ranks ]

6.5 Create a list for the auto and hetero phyloseq files

ps_list <- list(ps = c(ps_prok, ps_euk, ps_photo), title = c("Prokaryotes - all OTUs", 
    "Eukaryotes - Auto and Hetero - all OTUs", "Eukaryotes - Autotrophs - all OTUs"))
ps_list_abundant <- list(ps = c(ps_prok_abundant, ps_euk_abundant, ps_photo_abundant), 
    title = c("Prokaryotes - abundant OTUs (> 5%)", "Eukaryotes - Auto + Hetero - abundant OTUs (> 10%)", 
        "Eukaryotes - Autotrophs - abundant OTUs (> 10%)"))

6.6 Create tabular files for other plots

ps_to_long <- function(ps) {
    otu_df <- data.frame(otu_table(ps)) %>% rownames_to_column(var = "otu_id")
    taxo_df <- data.frame(tax_table(ps)) %>% rownames_to_column(var = "otu_id")
    otu_df <- left_join(taxo_df, otu_df)
    otu_df <- gather(otu_df, "sample", "n_seq", contains("X"))  # All samples contain X
    metadata_df <- data.frame(sample_data(ps)) %>% rownames_to_column(var = "sample")
    otu_df <- left_join(otu_df, metadata_df)
}

long_all <- ps_to_long(ps_all)
long_euk <- ps_to_long(ps_euk)
long_photo <- ps_to_long(ps_photo)
long_euk_abundant <- ps_to_long(ps_euk_abundant)
long_photo_abundant <- ps_to_long(ps_photo_abundant)
long_prok <- ps_to_long(ps_prok)

6.7 Treemaps at division and class levels

6.7.1 Define function to draw treemaps

treemap_gg_dv <- function(df, group1, group2, title) {
    group1 <- enquo(group1)
    group2 <- enquo(group2)
    df <- df %>% group_by(!!group1, !!group2) %>% summarise(n_seq = sum(n_seq))
    g_treemap <- ggplot(df, aes(area = n_seq, fill = !!group2, label = !!group2, 
        subgroup = !!group1)) + ggtitle(title) + treemapify::geom_treemap() + 
        treemapify::geom_treemap_subgroup_border() + treemapify::geom_treemap_text(colour = "black", 
        place = "topleft", reflow = T, padding.x = grid::unit(3, "mm"), padding.y = grid::unit(3, 
            "mm")) + treemapify::geom_treemap_subgroup_text(place = "centre", 
        grow = T, alpha = 0.5, colour = "white", fontface = "italic", min.size = 0) + 
        scale_fill_viridis(discrete = TRUE) + theme(legend.position = "none", 
        plot.title = element_text(size = 16, face = "bold"))
    print(g_treemap)
    return(g_treemap)
}

6.7.2 Do the individual treemaps

array_treemap_all <- list()
array_treemap_by_kingdom <- list()

for (one_strait in c("Singapore", "Johor")) {
    
    label <- str_c(one_strait, "-all")
    
    array_treemap_all[[label]] <- treemap_gg_dv(filter(long_all, strait == one_strait), 
        kingdom, supergroup, str_c("", one_strait))
    
    label <- str_c(one_strait, "-arch")
    array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_prok, strait == 
        one_strait & kingdom == "Archaea"), division, class, str_c("A - Archaea - ", 
        one_strait))
    
    label <- str_c(one_strait, "-bact")
    array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_prok, strait == 
        one_strait & kingdom == "Bacteria"), division, class, str_c("B - Bacteria - ", 
        one_strait))
    
    
    label <- str_c(one_strait, "-euks")
    array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_euk, strait == 
        one_strait), division, class, str_c("C - All Eulkaryotes - ", one_strait))
    
    
    # label <- str_c(one_strait,'-photeuks') array_treemap[[label]] <-
    # treemap_gg_dv(filter(long_photo, strait == one_strait), division, class,
    # str_c('D - Photosynthetic Eukaryotes - ', one_strait ))
    
    
    # treemap_dv(filter(long_euk, strait == one_strait), c('division',
    # 'class'),'n_seq', str_c('All euks - ', one_strait ))
    # treemap_dv(filter(long_photo, strait == one_strait), c('division',
    # 'class'),'n_seq',str_c('Photo euks', one_strait ))
    # treemap_dv(filter(long_prok, strait == one_strait & kingdom=='Bacteria'),
    # c('division', 'class'),'n_seq',str_c('Bacteria - ', one_strait ))
    # treemap_dv(filter(long_prok, strait == one_strait & kingdom=='Archaea'),
    # c('division', 'class'),'n_seq',str_c('Archaea - ', one_strait ))
}

6.7.3 FIG - Arrange the different treemaps in a grid and save

# Main Fig, all groups
fig_grid_treemap_all <- gridExtra::grid.arrange(grobs = array_treemap_all, ncol = 2, 
    nrow = 1, clip = FALSE, padding = unit(0, "line"), as.table = FALSE)

fig_grid_treemap_all

TableGrob (1 x 2) "arrange": 2 grobs
              z     cells    name           grob
Singapore-all 1 (1-1,1-1) arrange gtable[layout]
Johor-all     2 (1-1,2-2) arrange gtable[layout]

ggsave("../fig/fig_treemap_all.pdf", fig_grid_treemap_all, height = 5, width = 10, 
    dpi = 300)


# Supp Fig, by kingdom

fig_grid_treemap_by_kingdom <- gridExtra::grid.arrange(grobs = array_treemap_by_kingdom, 
    ncol = 2, nrow = 3, clip = FALSE, padding = unit(0, "line"), as.table = FALSE)

fig_grid_treemap_by_kingdom

TableGrob (3 x 2) "arrange": 6 grobs
               z     cells    name           grob
Singapore-arch 1 (1-1,1-1) arrange gtable[layout]
Singapore-bact 2 (2-2,1-1) arrange gtable[layout]
Singapore-euks 3 (3-3,1-1) arrange gtable[layout]
Johor-arch     4 (1-1,2-2) arrange gtable[layout]
Johor-bact     5 (2-2,2-2) arrange gtable[layout]
Johor-euks     6 (3-3,2-2) arrange gtable[layout]

ggsave("../fig/fig_treemap_separate.pdf", fig_grid_treemap_by_kingdom, height = 15, 
    width = 10, dpi = 300)

6.8 Most abundant taxa

6.8.1 Most abundant eukaryote species

df <- filter(long_euk, strait %in% c("Johor", "Singapore")) %>% mutate(species_label = str_c(class, 
    species, sep = "-")) %>% group_by(division, class, species, species_label, 
    strait) %>% summarize(n_seq = sum(n_seq)) %>% arrange(desc(n_seq)) %>% filter(n_seq > 
    0) %>% ungroup()

df_strait <- df %>% select(species, strait) %>% group_by(species) %>% summarise(n_strait = n()) %>% 
    ungroup()

df <- df %>% left_join(df_strait, by = c(species = "species")) %>% mutate(species = case_when(n_strait == 
    1 ~ str_c(species, " *"), TRUE ~ species))

array_euk <- list()


for (one_strait in c("Singapore", "Johor")) {
    
    df_one <- filter(df, strait == one_strait)
    
    array_euk[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(species, 
        n_seq), y = n_seq, fill = division)) + coord_flip() + xlab("") + ylab("Number of reads") + 
        scale_fill_manual(values = division_euk_colors) + ggtitle(one_strait) + 
        theme_bw() + theme(plot.title = element_text(size = 22, hjust = 0.5)) + 
        # theme(axis.text=element_text(size=14), legend.text =
    # element_text(size=16)) +
    theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top", 
        ncol = 4, nrow = 2, byrow = FALSE))
    
    print(array_euk[[one_strait]])
    
}

6.8.2 Most abundant prokaryotes taxo6 (~ genus)

df <- filter(long_prok, strait %in% c("Johor", "Singapore")) %>% mutate(genus_label = str_c(division, 
    family, sep = "-")) %>% group_by(supergroup, division, class, family, genus_label, 
    strait) %>% summarize(n_seq = sum(n_seq)) %>% arrange(desc(n_seq)) %>% filter(n_seq > 
    0) %>% ungroup()

df_strait <- df %>% select(genus_label, strait) %>% group_by(genus_label) %>% 
    summarise(n_strait = n()) %>% ungroup()

df <- df %>% left_join(df_strait) %>% mutate(genus_label = case_when(n_strait == 
    1 ~ str_c(genus_label, " *"), TRUE ~ genus_label))


array_prok <- list()

for (one_strait in c("Singapore", "Johor")) {
    
    df_one <- filter(df, strait == one_strait)
    
    array_prok[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(genus_label, 
        n_seq), y = n_seq, fill = supergroup)) + coord_flip() + xlab("") + ylab("Number of reads") + 
        scale_fill_manual(values = supergroup_colors) + ggtitle(one_strait) + 
        theme_bw() + theme(plot.title = element_text(size = 22, hjust = 0.5)) + 
        # theme(axis.text=element_text(size=14), legend.text =
    # element_text(size=16)) +
    theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top", 
        ncol = 4, nrow = 2, byrow = FALSE))
    
    print(array_prok[[one_strait]])
    
}

6.8.3 Most abundant taxa - Save as Grid

   grid_fig <-  cowplot::plot_grid(array_prok[["Singapore"]],array_prok[["Johor"]], 
                                   array_euk[["Singapore"]],array_euk[["Johor"]], align="hv",
                                   # labels=c("A","B", "C", "D"),
                                   ncol=2, nrow=2, label_size = 20)


   # ggsave("../fig/fig_top_all.png", grid_fig, 
   #        height = 20, width = 20, dpi = 300)

6.9 Most abundant OTUs

6.9.1 Compute and plot separately Euk and Prok

df <- filter(long_all, strait %in% c("Johor", "Singapore")) %>% mutate(otu_label = case_when(kingdom == 
    "Eukaryota" ~ str_c(otu_id, species, sep = "-"), TRUE ~ str_c(otu_id, family, 
    sep = "-"))) %>% group_by(kingdom, supergroup, division, class, genus, species, 
    otu_id, otu_label, strait) %>% summarize(n_seq = sum(n_seq)) %>% arrange(desc(n_seq)) %>% 
    filter(n_seq > 0) %>% ungroup()

df_strait <- df %>% select(otu_label, strait) %>% group_by(otu_label) %>% summarise(n_strait = n()) %>% 
    ungroup()

df <- df %>% left_join(df_strait) %>% mutate(otu_label = case_when(n_strait == 
    1 ~ str_c(otu_label, " +"), TRUE ~ otu_label))

otu_strait <- df

array_euk <- list()
array_prok <- list()

# Do plots

for (one_strait in c("Singapore", "Johor")) {
    
    # For euks
    
    df_one <- filter(df, strait == one_strait & kingdom == "Eukaryota")
    
    array_euk[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(otu_label, 
        n_seq), y = n_seq, fill = division)) + coord_flip() + theme_bw() + theme(axis.text.x = element_text(size = 16, 
        angle = 0, hjust = 1, vjust = 1)) + theme(axis.text.y = element_text(size = 16, 
        angle = 0, hjust = 0, vjust = 0)) + theme(legend.title = element_text(size = 24)) + 
        theme(legend.text = element_text(size = 16)) + xlab("") + ylab("Number of reads") + 
        scale_fill_manual(values = division_euk_colors) + ggtitle(one_strait) + 
        theme(plot.title = element_text(size = 22, hjust = 0.5)) + # theme(axis.text=element_text(size=14), legend.text =
    # element_text(size=16)) +
    theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top", 
        ncol = 2, nrow = 4, byrow = FALSE))
    
    print(array_euk[[one_strait]])
    
    # For proks
    
    
    df_one <- filter(df, strait == one_strait & kingdom != "Eukaryota")
    
    array_prok[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(otu_label, 
        n_seq), y = n_seq, fill = supergroup)) + coord_flip() + theme_bw() + 
        theme(axis.text.x = element_text(size = 16, angle = 0, hjust = 1, vjust = 1)) + 
        theme(axis.text.y = element_text(size = 16, angle = 0, hjust = 0, vjust = 0)) + 
        theme(legend.title = element_text(size = 24)) + theme(legend.text = element_text(size = 16)) + 
        xlab("") + ylab("Number of reads") + scale_fill_manual(values = supergroup_colors) + 
        ggtitle(one_strait) + theme(plot.title = element_text(size = 22, hjust = 0.5)) + 
        # theme(axis.text=element_text(size=14), legend.text =
    # element_text(size=16)) +
    theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top", 
        ncol = 2, nrow = 4, byrow = FALSE))
    
    print(array_prok[[one_strait]])
    
}

6.9.2 FIG Most abundant ASVs - Save as Grid

   grid_fig_ASV <-  cowplot::plot_grid(array_prok[["Singapore"]],array_prok[["Johor"]], 
                                   array_euk[["Singapore"]],array_euk[["Johor"]], align="hv",
                                   # labels=c("A","B", "C", "D"),
                                   ncol=2, nrow=2, label_size = 20)


   ggsave("../fig/fig_top_asv_all.png", grid_fig_ASV, 
          height = 20, width = 20, dpi = 300)

6.10 Bar plots

6.10.1 Bar plot 20 most abundant divisions

RM samples not considered

division_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom, 
    supergroup, division) %>% summarise(n_seq_division = sum(n_seq)) %>% arrange(-n_seq_division)



long_division_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% 
    mutate(division = case_when(division %in% division_top$division[1:20] ~ 
        division, TRUE ~ "Other")) %>% group_by(strait, sample_label, division) %>% 
    summarise(n_seq_division = sum(n_seq))

seq_stations <- long_division_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_division))

n_seq_station <- seq_stations$n_seq_station[1]

# Set up the colors

colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD", 
    "#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711", 
    "#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788", 
    "#999099", "#AA4455")

colours_name <- t(lapply(colours, plotrix::color.id))

# Read the colors from Excel file

top_division_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "D1:F22")

top_division_colors <- structure(top_division_colors.df$color_name, .Names = top_division_colors.df$division)


ggplot(long_division_top, aes(x = sample_label, y = n_seq_division/n_seq_station, 
    fill = factor(division, levels = top_division_colors.df$division))) + geom_col() + 
    theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) + 
    # scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_division_colors) + scale_y_continuous(labels = scales::percent, 
    expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") + 
    guides(fill = guide_legend(title = "Division", title.position = "top", ncol = 5, 
        nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait, 
    scales = "free_x", space = "free_x")

6.10.2 FIG - Bar plot 20 most abundant classes

RM samples not considered

class_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom, 
    supergroup, division, class) %>% summarise(n_seq_class = sum(n_seq)) %>% 
    arrange(-n_seq_class)



long_class_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% 
    mutate(class = case_when(class %in% class_top$class[1:20] ~ class, TRUE ~ 
        "Other")) %>% group_by(strait, sample_label, class) %>% summarise(n_seq_class = sum(n_seq))

seq_stations <- long_class_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_class))

n_seq_station <- seq_stations$n_seq_station[1]

# Set up the colors

colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD", 
    "#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711", 
    "#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788", 
    "#999099", "#AA4455")

colours_name <- t(lapply(colours, plotrix::color.id))


# RColorBrewer::display.brewer.pal(n=3, name='Reds')
colours <- RColorBrewer::brewer.pal(n = 3, name = "Reds")
colours_name_archeae <- data.frame(purrr::flatten_chr(purrr::map(colours, plotrix::color.id)))

# RColorBrewer::display.brewer.pal(n=9, name='Blues')
colours <- RColorBrewer::brewer.pal(n = 9, name = "Blues")
colours_name_bacteria_1 <- data.frame(purrr::flatten_chr(purrr::map(colours, 
    plotrix::color.id)))

# RColorBrewer::display.brewer.pal(n=9, name='Purples')
colours <- RColorBrewer::brewer.pal(n = 9, name = "Purples")
colours_name_bacteria_2 <- data.frame(purrr::flatten_chr(purrr::map(colours, 
    plotrix::color.id)))

# RColorBrewer::display.brewer.pal(n=4, name='Greens')
colours <- RColorBrewer::brewer.pal(n = 4, name = "Greens")
colours_name_euks <- data.frame(purrr::flatten_chr(purrr::map(colours, plotrix::color.id)))

colours_name <- bind_rows(colours_name_archeae, colours_name_bacteria_1, colours_name_bacteria_2, 
    colours_name_euks)
writeClipboard(colours_name[, 1])


# Read the colors from Excel file

top_class_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "L1:N22")

top_class_colors <- structure(top_class_colors.df$color_name, .Names = top_class_colors.df$class)


g <- ggplot(long_class_top, aes(x = sample_label, y = n_seq_class/n_seq_station, 
    fill = factor(class, levels = top_class_colors.df$class))) + geom_col() + 
    theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) + 
    # scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_class_colors) + scale_y_continuous(labels = scales::percent, 
    expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") + 
    guides(fill = guide_legend(title = "Class", title.position = "top", ncol = 5, 
        nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait, 
    scales = "free_x", space = "free_x")

print(g)

ggsave("../fig/Suppl_Taxa_all_barplot.pdf", g, height = 9, width = 9, dpi = 300)

6.10.3 Bar plot 20 most abundant orders

RM samples not considered

order_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom, 
    supergroup, division, class, order) %>% summarise(n_seq_order = sum(n_seq)) %>% 
    arrange(-n_seq_order)



long_order_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% 
    mutate(order = case_when(order %in% order_top$order[1:20] ~ order, TRUE ~ 
        "Other")) %>% group_by(strait, sample_label, order) %>% summarise(n_seq_order = sum(n_seq))

seq_stations <- long_order_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_order))

n_seq_station <- seq_stations$n_seq_station[1]

# Set up the colors

colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD", 
    "#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711", 
    "#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788", 
    "#999099", "#AA4455")

colours_name <- t(lapply(colours, plotrix::color.id))

# Read the colors from Excel file

top_order_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "U1:W22")

top_order_colors <- structure(top_order_colors.df$color_name, .Names = top_order_colors.df$order)


ggplot(long_order_top, aes(x = sample_label, y = n_seq_order/n_seq_station, 
    fill = factor(order, levels = top_order_colors.df$order))) + geom_col() + 
    theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) + 
    # scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_order_colors) + scale_y_continuous(labels = scales::percent, 
    expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") + 
    guides(fill = guide_legend(title = "Order", title.position = "top", ncol = 5, 
        nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait, 
    scales = "free_x", space = "free_x")

6.10.4 Bar plot of divisions per station

Note: some stations are completely missing heterotrophs (Only Opistokonta)

for (i in 1:3) {
    p <- plot_bar(ps_list$ps[[i]], x = "sample_label", fill = "division") + 
        geom_bar(aes(color = division, fill = division), stat = "identity", 
            position = "stack") + ggtitle(str_c("Division level - ", ps_list$title[[i]])) + 
        theme(axis.text.y = element_text(size = 10)) + theme(axis.text.x = element_text(angle = 0, 
        hjust = 0.5)) + coord_flip() + scale_fill_viridis(discrete = TRUE) + 
        scale_color_viridis(discrete = TRUE)
    print(p)
}

6.10.5 Bar plot of class per station

Only consider the abundant taxa

for (i in 1:3) {
    p <- plot_bar(ps_list_abundant$ps[[i]], x = "sample_label", fill = "class") + 
        geom_bar(aes(color = class, fill = class), stat = "identity", position = "stack") + 
        ggtitle(str_c("Class level - ", ps_list_abundant$title[[i]])) + theme(axis.text.y = element_text(size = 10)) + 
        theme(axis.text.x = element_text(angle = 0, hjust = 0.5)) + coord_flip() + 
        scale_fill_viridis(discrete = TRUE) + scale_color_viridis(discrete = TRUE)
    print(p)
}

6.11 Synthetic comparisons

6.11.1 Compare by Straight, Site, Moonsoon (abundant OTUs only)

for (factor in c("strait", "location", "monsoon")) {
    for (i in 1:3) {
        ps_aggregate <- merge_samples(ps_list_abundant$ps[[i]], factor)
        ps_aggregate <- transform_sample_counts(ps_aggregate, function(x) 100 * 
            (x/sum(x)))
        p <- plot_bar(ps_aggregate, fill = "division") + geom_col(aes(color = division, 
            fill = division)) + ggtitle(str_c(ps_list_abundant$title[[i]], " - ", 
            factor)) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, 
            hjust = 1)) + ylab("%") + scale_fill_viridis(discrete = TRUE) + 
            scale_color_viridis(discrete = TRUE)
        print(p)
    }
}

6.11.2 Compare by Moonsoon in Singapore straight (abundant OTUs)

for (i in 1:3) {
    ps_aggregate <- subset_samples(ps_list_abundant$ps[[i]], strait == "Singapore")
    ps_aggregate <- merge_samples(ps_aggregate, "monsoon")
    ps_aggregate <- transform_sample_counts(ps_aggregate, function(x) 100 * 
        (x/sum(x)))
    p <- plot_bar(ps_aggregate, fill = "division") + geom_col(aes(color = division, 
        fill = division)) + ggtitle(str_c(ps_list_abundant$title[[i]], " - ", 
        factor)) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, 
        hjust = 1)) + ylab("%") + scale_fill_viridis(discrete = TRUE) + scale_color_viridis(discrete = TRUE)
    print(p)
}

6.12 Time series

6.12.1 Time series (abundant OTUs) aggregated for Singapore strait only

for (factor in c("date")) {
    for (i in 1:3) {
        ps_aggregate <- subset_samples(ps_list_abundant$ps[[i]], strait == "Singapore")
        ps_aggregate <- merge_samples(ps_aggregate, "date")
        ps_aggregate <- transform_sample_counts(ps_aggregate, function(x) 100 * 
            (x/sum(x)))
        p <- plot_bar(ps_aggregate, fill = "division") + geom_col(aes(color = division, 
            fill = division)) + ggtitle(str_c(ps_list_abundant$title[[i]], " - ", 
            factor)) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, 
            hjust = 1)) + ylab("%") + scale_fill_viridis(discrete = TRUE) + 
            scale_color_viridis(discrete = TRUE)
        print(p)
    }
}

6.12.2 Time series (abundant OTUs) - Division level

for (i in 1:3) {
    ps_plot <- ps_list_abundant$ps[[i]]
    p <- plot_bar(ps_plot, x = "date", fill = "division") + facet_grid(rows = vars(location)) + 
        geom_col(aes(color = division, fill = division)) + ggtitle(str_c(ps_list_abundant$title[[i]], 
        " - Date")) + theme(axis.text.x = element_text(angle = 45, vjust = 0.5, 
        hjust = 1)) + ylab("%") + xlim(as.POSIXct(as.Date(c("2017-01-01", "2019-01-01")))) + 
        geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", "2018-01-01", 
            "2019-01-01")))) + scale_fill_viridis(discrete = TRUE) + scale_color_viridis(discrete = TRUE)
    print(p)
}

6.12.3 Time series (abundant OTUs) - Genus level for Chlorophyta

for (i in 3:3) {
    ps_plot <- subset_taxa(ps_list_abundant$ps[[i]], division == "Chlorophyta")
    p <- plot_bar(ps_plot, x = "date", fill = "genus") + facet_grid(rows = vars(location)) + 
        geom_col(aes(color = genus, fill = genus)) + ggtitle(str_c(ps_list_abundant$title[[i]], 
        " - Date")) + theme(axis.text.x = element_text(angle = 45, vjust = 0.5, 
        hjust = 1)) + ylab("%") + xlim(as.POSIXct(as.Date(c("2017-01-01", "2019-01-01")))) + 
        geom_vline(xintercept = as.POSIXct(as.Date(c("2017-01-01", "2018-01-01", 
            "2019-01-01")))) + scale_fill_viridis(discrete = TRUE) + scale_color_viridis(discrete = TRUE)
    print(p)
}

6.13 Main species for each division (Eukaryotes - Autrotrophs)

p <- list()
for (one_division in c("Chlorophyta", "Dinoflagellata", "Ochrophyta", "Cryptophyta", 
    "Haptophyta")) {
    ps_subset <- subset_samples(ps_photo_abundant, strait != "Raffles Mari")
    ps_subset <- subset_taxa(ps_subset, division %in% one_division)
    p[[one_division]] <- plot_bar(ps_subset, x = "species") + facet_grid(rows = vars(strait), 
        cols = vars(monsoon)) + geom_col() + theme(axis.text.x = element_text(angle = 90, 
        vjust = 0.5, hjust = 1)) + ggtitle(str_c(one_division, " - Abundant OTUs")) + 
        coord_flip()
    print(p[[one_division]])
}

# grid_array <- list(p[['Chlorophyta']],p[['Ochrophyta']]) grid_layout <-
# rbind(c(NA,1,1,1,1,1), c( 2,2,2,2,2,2)) grid_fig <-
# gridExtra::grid.arrange(grobs=grid_array, layout_matrix=grid_layout,
# clip=FALSE, padding = unit(0, 'line'), as.table = FALSE)
grid_fig <- cowplot::plot_grid(p[["Chlorophyta"]], p[["Ochrophyta"]], labels = c("A", 
    "B"), align = "v", nrow = 2)
# ggsave('../fig/Fig_Species_Moonsoon.png', grid_fig, height = 10, width =
# 8, dpi = 300)

6.14 Heatmaps

6.14.1 Abundant OTUs

Data are agglomarated at the genus level. Use function tax_glom

for (i in c(1)) {
    ps_heat <- tax_glom(ps_list_abundant$ps[[i]], taxrank = "family")
    p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "family", 
        taxa.order = "division", sample.label = "sample_label", sample.order = "sample_label", 
        low = "beige", high = "red", na.value = "beige", title = ps_list_abundant$title[[i]])
    print(p)
}

for (i in 2:3) {
    
    ps_heat <- tax_glom(ps_list_abundant$ps[[i]], taxrank = "genus")
    p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "genus", 
        taxa.order = "division", sample.label = "sample_label", sample.order = "sample_label", 
        low = "beige", high = "red", na.value = "beige", title = ps_list_abundant$title[[i]])
    print(p)
}

6.14.2 Chlorophyta at species level

All ASVs considered (not only abundant)

ps_heat <- subset_taxa(ps_photo, division == "Chlorophyta")

ps_heat <- tax_glom(ps_heat, taxrank = "species")

p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "species", 
    taxa.order = "species", sample.label = "sample_label", sample.order = "sample_label", 
    low = "beige", high = "red", na.value = "beige", trans = scales::log10_trans(), 
    title = "Mamiellophyceae in Singapore")
print(p)

6.14.3 Mamiello (Only Bathy, Ostreo and Micromonas) at genus level

All ASVs considered (not only abundant)

ps_heat <- subset_taxa(ps_photo, genus %in% c("Ostreococcus", "Bathycoccus", 
    "Micromonas"))

ps_heat <- tax_glom(ps_heat, taxrank = "genus")

p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "genus", 
    taxa.order = "genus", sample.label = "sample_label", sample.order = "sample_label", 
    low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10), 
    title = "Mamiellophyceae in Singapore")
print(p)

6.14.4 Mamiello (Only Bathy, Ostreo and Micromonas) at species level

All ASVs considered (not only abundant)

ps_heat <- subset_taxa(ps_photo, genus %in% c("Ostreococcus", "Bathycoccus", 
    "Micromonas"))

ps_heat <- tax_glom(ps_heat, taxrank = "species")

p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "species", 
    taxa.order = "species", sample.label = "sample_label", sample.order = "sample_label", 
    low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10), 
    title = "Mamiellophyceae in Singapore")
print(p)

6.14.5 Syndiniales at order level

All ASVs considered (not only abundant)

ps_heat <- subset_taxa(ps_euk, class %in% c("Syndiniales"))
ps_heat <- subset_samples(ps_heat, !is.na(Salinity))

ps_heat <- tax_glom(ps_heat, taxrank = "order")

p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "order", 
    taxa.order = "order", sample.label = "Salinity", sample.order = "Salinity", 
    low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10), 
    title = "Syndiniales in Singapore") + facet_grid(~location, scales = "free_x")
print(p)

p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "order", 
    taxa.order = "order", sample.label = "Temperature", sample.order = "Temperature", 
    low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10), 
    title = "Syndiniales in Singapore") + facet_grid(~location, scales = "free_x")
print(p)

ps_heat <- subset_taxa(ps_euk, class %in% c("Syndiniales"))
ps_heat <- subset_samples(ps_heat, !is.na(Salinity))

ps_heat <- tax_glom(ps_heat, taxrank = "family")

p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", taxa.label = "family", 
    taxa.order = "family", sample.label = "Salinity", sample.order = "Salinity", 
    low = "beige", high = "red", na.value = "beige", trans = scales::log_trans(10), 
    title = "Syndiniales in Singapore") + facet_grid(~location, scales = "free_x")
print(p)

6.14.6 FIG - Most abundant OTUs in Johor vs Singapre and Prokaryotes vs Eukaryotes

heat_prok <- list()
heat_euk <- list()

for (strait_one in c("Singapore", "Johor")) {
     
for (kingdom_one in list(c("Bacteria", "Archaea"), "Eukaryota")){

  otu_strait_one <- otu_strait %>% 
    filter(strait == strait_one & kingdom %in% kingdom_one) %>% 
    top_n(30, n_seq)

  ps_heat <- ps_all %>% 
    subset_samples(strait==strait_one) %>%
    subset_taxa(kingdom %in% kingdom_one)
  
  otu_table_keep <- data.frame(otu_table(ps_heat)) %>% 
    rownames_to_column("otu_id") %>% 
    filter(otu_id %in% otu_strait_one$otu_id) %>% 
    left_join(select(otu_strait_one, otu_id)) %>% 
    column_to_rownames("otu_id")
  
  tax_table_keep <- data.frame(tax_table(ps_heat)) %>% 
    rownames_to_column("otu_id") %>% 
    filter(otu_id %in% otu_strait_one$otu_id) %>% 
    left_join(select(otu_strait_one, otu_id, otu_label)) %>% 
    column_to_rownames("otu_id")

  OTU = otu_table(as.matrix(otu_table_keep), taxa_are_rows = TRUE)
  TAX = tax_table(as.matrix(tax_table_keep))
  samples = sample_data(ps_heat)
  
  ps_heat <- phyloseq(OTU, TAX, samples)    

  print(ps_heat)

  p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", 
               taxa.label = "otu_label",  taxa.order="supergroup",
               sample.label = "sample_label", sample.order="sample_label",
               low="beige", high="red", na.value="beige",trans = scales::log10_trans(), 
               title=strait_one) +
    theme(axis.text.y = element_text(size = 16, angle = 0, hjust = 0, vjust = 0.5)) +
    theme(axis.text.x = element_text(size = 16)) +
    theme(axis.title = element_text(size = 0)) +
    theme(plot.title = element_text(size=22, hjust = 0.5)) +
    xlab("Sample") + ylab("")

  print(p)
  if (kingdom_one=="Eukaryota") heat_euk[[strait_one]] <- p 
  else heat_prok[[strait_one]] <- p

}}

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 30 taxa and 30 samples ]
sample_data() Sample Data:       [ 30 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 30 taxa by 9 taxonomic ranks ]

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 30 taxa and 30 samples ]
sample_data() Sample Data:       [ 30 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 30 taxa by 9 taxonomic ranks ]

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 30 taxa and 40 samples ]
sample_data() Sample Data:       [ 40 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 30 taxa by 9 taxonomic ranks ]

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 30 taxa and 40 samples ]
sample_data() Sample Data:       [ 40 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 30 taxa by 9 taxonomic ranks ]

   grid_fig_heat <-  cowplot::plot_grid(heat_prok[["Singapore"]],heat_prok[["Johor"]], align="hv",
                                   # labels=c("A","B"),
                                   ncol=1, nrow=2, label_size = 20)


   ggsave("../fig/fig_heat_prok.pdf", grid_fig_heat, 
          height = 20, width = 15, dpi = 300)
   

   grid_fig_heat <-  cowplot::plot_grid(heat_euk[["Singapore"]],heat_euk[["Johor"]], align="hv",
                                   # labels=c("A","B"),
                                   ncol=1, nrow=2, label_size = 20)


   ggsave("../fig/fig_heat_euk.pdf", grid_fig_heat, 
          height = 20, width = 15, dpi = 300)

6.15 NMDS

Sample removed because they were pulling the NMDS * PR2X16XS21 it has a single eukaryote (diatom bloom ) * RM13XS36 cause problem for bacteria * PR11XS25 cause problem for hetero euks * SBW11XS26 cause problem for hetero euks * SBW13XS37 cause problem for hetero euks * RM13XS36 cause problem for hetero euks

6.15.1 Define function

See comments inside functions (saved plots are different from displayed plots)

ps_do_nmds <- function(ps_list) {
    
    plot_array <- list()
    
    for (i in 1:3) {
        ps_nmds <- ps_list$ps[[i]]
        
        # Remove samples with no reads
        ps_nmds <- prune_samples(sample_sums(ps_nmds) > 0, ps_nmds)
        
        # Remove samples from Raffles and Johor
        ps_nmds <- subset_samples(ps_nmds, !(str_detect(strait, "Raffles Mari|Johor")))
        
        # Remove samples that caused problems (1= prok, 2=euk, 3=euk auto)
        if (i == 1) 
            {
                ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("RM13XS36")), 
                  ps_nmds)
            }  # Prokaryotes
        if (i %in% c(2, 3)) 
            {
                ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR2X16SXS21")), 
                  ps_nmds)
            }  # Eukaryotes
        if (i == 4) 
            {
                ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR11XS25", 
                  "SBW11XS26", "SBW13XS37")), ps_nmds)
            }  # Heterotrophs not used
        
        singa.ord <- ordinate(ps_nmds, "NMDS", "bray")
        
        # Fit environmental parameters
        env_var <- sample_variables(ps_nmds)
        env_matrix <- get_variable(ps_nmds, c("Chl", "Temperature", "Salinity", 
            "Phosphate", "Silicate", "DIN", "BAC"))
        env_fit <- vegan::envfit(singa.ord, env = env_matrix, perm = 999, na.rm = TRUE)
        env_arrows <- data.frame(env_fit$vectors$arrows * sqrt(env_fit$vectors$r)) %>% 
            rownames_to_column(var = "parameter")
        
        
        nmds_samples <- data.frame(singa.ord[["points"]], get_variable(ps_nmds, 
            c("monsoon", "strait", "location", "location_label"))) %>% rownames_to_column(var = "sample")
        
        # Factor to move the labels
        nudge_x <- max(nmds_samples$MDS1) * 0.08
        nudge_y <- max(nmds_samples$MDS2) * 0.08
        xy_max = max(c(nmds_samples$MDS1, nmds_samples$MDS2)) * 1.5
        xy_min = min(c(nmds_samples$MDS1, nmds_samples$MDS2)) * 1.5
        factor <- 3  # for vectors for euks
        if (i == 1) 
            {
                factor <- 1.5
            }  # for vectors for proks
        print(factor)
        
        p <- plot_ordination(ps_nmds, singa.ord, type = "samples", color = "monsoon", 
            shape = "strait", title = ps_list$title[[i]]) + geom_point(aes(shape = strait, 
            color = monsoon), size = 3.5) + scale_color_viridis(discrete = TRUE) + 
            scale_shape_manual(values = c(15, 16)) + geom_text(aes(label = location_label, 
            color = monsoon), nudge_x = nudge_x, nudge_y = nudge_y, check_overlap = TRUE, 
            size = 2) + theme_bw() + geom_segment(data = env_arrows, aes(x = 0, 
            xend = NMDS1 * factor, y = 0, yend = NMDS2 * factor), inherit.aes = FALSE, 
            arrow = arrow(length = unit(0.5, "cm")), colour = "black") + geom_text(data = env_arrows, 
            aes(x = NMDS1 * factor, y = NMDS2 * factor, label = parameter), 
            inherit.aes = FALSE, hjust = -0.2, vjust = -0.2, size = 3)
        print(singa.ord)
        plot_array[[i]] <- p
        print(p)
        
        # The following lines can be used if you want to avoid using the pjhyloseq
        # functions to plot the data.  Notes : - must use inherit.aes = FALSE to add
        # some extra layers - the saved plots have a different scale for the added
        # layer than the displaued plot can figure out ggplot()+ coord_fixed() +
        # xlim(xy_min, xy_max) + ylim(xy_min, xy_max) +
        # geom_point(data=nmds_samples, aes(x=MDS1, y=MDS2, shape=strait,
        # color=monsoon), size=5) + geom_text(data=nmds_samples, aes(x=MDS1, y=MDS2,
        # label=location_label, color=monsoon), nudge_x=nudge_x, nudge_y=nudge_y,
        # check_overlap = FALSE, size=2) + ggtitle(ps_list$title[[i]]) +
        
        p <- plot_ordination(ps_nmds, singa.ord, type = "taxa", color = "division", 
            title = ps_list$title[[i]]) + scale_color_viridis(discrete = TRUE) + 
            geom_point(size = 3) + theme_bw() + geom_segment(data = env_arrows, 
            aes(x = 0, xend = NMDS1 * factor, y = 0, yend = NMDS2 * factor), 
            inherit.aes = FALSE, arrow = arrow(length = unit(0.5, "cm")), colour = "black") + 
            geom_text(data = env_arrows, aes(x = NMDS1 * factor, y = NMDS2 * 
                factor, label = parameter), inherit.aes = FALSE, hjust = -0.2, 
                vjust = -0.2, size = 3)
        
        print(p)
        plot_array[[i + 3]] <- p
    }
    return(plot_array)
}

6.15.2 All OTUs

plot_array_all <- ps_do_nmds(ps_list)

Square root transformation
Wisconsin double standardization
Run 0 stress 0.12 
Run 1 stress 0.12 
... New best solution
... Procrustes: rmse 0.03  max resid 0.13 
Run 2 stress 0.12 
... New best solution
... Procrustes: rmse 0.00031  max resid 0.0014 
... Similar to previous best
Run 3 stress 0.12 
... Procrustes: rmse 3e-04  max resid 0.0014 
... Similar to previous best
Run 4 stress 0.12 
Run 5 stress 0.12 
... New best solution
... Procrustes: rmse 0.00024  max resid 0.00077 
... Similar to previous best
Run 6 stress 0.12 
Run 7 stress 0.12 
... New best solution
... Procrustes: rmse 0.00013  max resid 0.00045 
... Similar to previous best
Run 8 stress 0.12 
Run 9 stress 0.12 
... Procrustes: rmse 4.6e-05  max resid 0.00012 
... Similar to previous best
Run 10 stress 0.12 
Run 11 stress 0.12 
... Procrustes: rmse 0.00011  max resid 0.00036 
... Similar to previous best
Run 12 stress 0.15 
Run 13 stress 0.18 
Run 14 stress 0.12 
... Procrustes: rmse 0.00033  max resid 0.0012 
... Similar to previous best
Run 15 stress 0.15 
Run 16 stress 0.12 
Run 17 stress 0.12 
Run 18 stress 0.16 
Run 19 stress 0.12 
... Procrustes: rmse 4.9e-05  max resid 0.00021 
... Similar to previous best
Run 20 stress 0.12 
... Procrustes: rmse 0.00013  max resid 0.00053 
... Similar to previous best
*** Solution reached
[1] 1.5

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.12 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'

Square root transformation
Wisconsin double standardization
Run 0 stress 0.22 
Run 1 stress 0.24 
Run 2 stress 0.27 
Run 3 stress 0.27 
Run 4 stress 0.22 
... New best solution
... Procrustes: rmse 0.036  max resid 0.1 
Run 5 stress 0.22 
Run 6 stress 0.25 
Run 7 stress 0.23 
Run 8 stress 0.22 
Run 9 stress 0.22 
Run 10 stress 0.22 
Run 11 stress 0.24 
Run 12 stress 0.24 
Run 13 stress 0.25 
Run 14 stress 0.22 
Run 15 stress 0.22 
... Procrustes: rmse 0.03  max resid 0.14 
Run 16 stress 0.23 
Run 17 stress 0.22 
Run 18 stress 0.23 
Run 19 stress 0.24 
Run 20 stress 0.22 
... New best solution
... Procrustes: rmse 0.00012  max resid 0.00037 
... Similar to previous best
*** Solution reached
[1] 3

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.22 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'

Square root transformation
Wisconsin double standardization
Run 0 stress 0.2 
Run 1 stress 0.19 
... New best solution
... Procrustes: rmse 0.1  max resid 0.31 
Run 2 stress 0.19 
... New best solution
... Procrustes: rmse 0.031  max resid 0.11 
Run 3 stress 0.21 
Run 4 stress 0.18 
... New best solution
... Procrustes: rmse 0.081  max resid 0.27 
Run 5 stress 0.25 
Run 6 stress 0.19 
Run 7 stress 0.2 
Run 8 stress 0.21 
Run 9 stress 0.18 
... Procrustes: rmse 0.0082  max resid 0.031 
Run 10 stress 0.22 
Run 11 stress 0.18 
... New best solution
... Procrustes: rmse 0.018  max resid 0.072 
Run 12 stress 0.19 
Run 13 stress 0.22 
Run 14 stress 0.19 
Run 15 stress 0.19 
Run 16 stress 0.22 
Run 17 stress 0.18 
... Procrustes: rmse 0.018  max resid 0.071 
Run 18 stress 0.19 
Run 19 stress 0.22 
Run 20 stress 0.18 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax
[1] 3

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.18 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'

6.15.3 Abundant OTUs

plot_array_abundant <- ps_do_nmds(ps_list_abundant)

Square root transformation
Wisconsin double standardization
Run 0 stress 0.13 
Run 1 stress 0.17 
Run 2 stress 0.13 
... Procrustes: rmse 0.00069  max resid 0.0031 
... Similar to previous best
Run 3 stress 0.22 
Run 4 stress 0.13 
... Procrustes: rmse 0.00047  max resid 0.0021 
... Similar to previous best
Run 5 stress 0.13 
... Procrustes: rmse 0.00049  max resid 0.0022 
... Similar to previous best
Run 6 stress 0.13 
... Procrustes: rmse 0.00019  max resid 0.00085 
... Similar to previous best
Run 7 stress 0.13 
... Procrustes: rmse 0.00026  max resid 0.0012 
... Similar to previous best
Run 8 stress 0.14 
Run 9 stress 0.13 
... Procrustes: rmse 0.00081  max resid 0.0037 
... Similar to previous best
Run 10 stress 0.13 
... Procrustes: rmse 0.00077  max resid 0.0035 
... Similar to previous best
Run 11 stress 0.13 
... Procrustes: rmse 0.0011  max resid 0.0051 
... Similar to previous best
Run 12 stress 0.13 
... Procrustes: rmse 0.00031  max resid 0.0014 
... Similar to previous best
Run 13 stress 0.13 
... Procrustes: rmse 0.00037  max resid 0.0017 
... Similar to previous best
Run 14 stress 0.18 
Run 15 stress 0.13 
... Procrustes: rmse 0.00062  max resid 0.0028 
... Similar to previous best
Run 16 stress 0.14 
Run 17 stress 0.13 
... Procrustes: rmse 0.00062  max resid 0.0028 
... Similar to previous best
Run 18 stress 0.39 
Run 19 stress 0.14 
Run 20 stress 0.13 
... Procrustes: rmse 0.00053  max resid 0.0024 
... Similar to previous best
*** Solution reached
[1] 1.5

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.13 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'

Square root transformation
Wisconsin double standardization
Run 0 stress 0.21 
Run 1 stress 0.22 
Run 2 stress 0.25 
Run 3 stress 0.21 
... Procrustes: rmse 0.018  max resid 0.084 
Run 4 stress 0.25 
Run 5 stress 0.22 
Run 6 stress 0.24 
Run 7 stress 0.21 
... Procrustes: rmse 0.00018  max resid 0.00056 
... Similar to previous best
Run 8 stress 0.24 
Run 9 stress 0.22 
Run 10 stress 0.3 
Run 11 stress 0.22 
Run 12 stress 0.26 
Run 13 stress 0.22 
Run 14 stress 0.22 
Run 15 stress 0.21 
... New best solution
... Procrustes: rmse 3.3e-05  max resid 7.9e-05 
... Similar to previous best
Run 16 stress 0.24 
Run 17 stress 0.21 
... Procrustes: rmse 0.018  max resid 0.084 
Run 18 stress 0.25 
Run 19 stress 0.21 
... New best solution
... Procrustes: rmse 0.00013  max resid 0.00044 
... Similar to previous best
Run 20 stress 0.24 
*** Solution reached
[1] 3

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.21 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'

Square root transformation
Wisconsin double standardization
Run 0 stress 0.21 
Run 1 stress 0.23 
Run 2 stress 0.21 
... Procrustes: rmse 0.11  max resid 0.3 
Run 3 stress 0.21 
... Procrustes: rmse 0.11  max resid 0.3 
Run 4 stress 0.21 
Run 5 stress 0.24 
Run 6 stress 0.2 
... New best solution
... Procrustes: rmse 0.061  max resid 0.27 
Run 7 stress 0.22 
Run 8 stress 0.23 
Run 9 stress 0.21 
Run 10 stress 0.2 
Run 11 stress 0.2 
... Procrustes: rmse 0.0065  max resid 0.02 
Run 12 stress 0.22 
Run 13 stress 0.22 
Run 14 stress 0.22 
Run 15 stress 0.22 
Run 16 stress 0.22 
Run 17 stress 0.24 
Run 18 stress 0.2 
Run 19 stress 0.21 
Run 20 stress 0.2 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax
[1] 3

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.2 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'

### MSDS graph

grid_fig <- cowplot::plot_grid(plot_array_abundant[[1]], plot_array_abundant[[4]], 
    plot_array_abundant[[3]], plot_array_abundant[[6]], ncol = 2, labels = c("A", 
        "B", "C", "D"), align = "hv", label_size = 20)
grid_fig

# ggsave('../fig/Fig_NMDS-Singapore.png', grid_fig, height = 12, width = 18,
# dpi = 300)

6.16 Network analysis

for (i in 1:3) {
    
    ps_nmds <- ps_list_abundant$ps[[i]]
    
    # Remove samples with no reads
    ps_nmds <- prune_samples(sample_sums(ps_nmds) > 0, ps_nmds)
    
    # Remove samples that caused problems (1= prok, 2=euk, 3=euk auto)
    if (i == 1) 
        {
            ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("RM13XS36")), 
                ps_nmds)
        }  # Prokaryotes
    if (i %in% c(2, 3)) 
        {
            ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR2X16SXS21")), 
                ps_nmds)
        }  # Eukaryotes
    if (i == 4) 
        {
            ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR11XS25", 
                "SBW11XS26", "SBW13XS37")), ps_nmds)
        }  # Heterotrophs not used
    
    if (i > 1) {
        p <- plot_net(ps_nmds, distance = "(A+B-2*J)/(A+B)", type = "taxa", 
            maxdist = 0.4, color = "class", point_label = "genus") + ggtitle(ps_list_abundant$title[[i]])
    } else {
        p <- plot_net(ps_nmds, distance = "(A+B-2*J)/(A+B)", type = "taxa", 
            maxdist = 0.4, color = "class", point_label = "family") + ggtitle(ps_list_abundant$title[[i]])
    }
    
    print(p)
}

6.17 Differential expression (DESeq2)

Love, M.I., Huber, W. & Anders, S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550.

Follow:
- https://joey711.github.io/phyloseq-extensions/DESeq2.html
- https://bioconductor.org/packages/devel/bioc/vignettes/phyloseq/inst/doc/phyloseq-mixture-models.html
Start from raw counts (not normalized) : ps_all_raw

library("DESeq2")
packageVersion("DESeq2")

[1] '1.22.2'

Block 1 is need to avoid the following error > Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : every gene contains at least one zero, cannot compute log geometric means

See: https://github.com/joey711/phyloseq/issues/387

ps <- ps_all_raw
ps <- subset_samples(ps, strait == "Singapore" & monsoon %in% c("NE", "SW"))

ps

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2571 taxa and 21 samples ]
sample_data() Sample Data:       [ 21 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 2571 taxa by 8 taxonomic ranks ]

diagdds = phyloseq_to_deseq2(ps, ~monsoon)

# ps <- subset_samples(ps, strait %in% c('Singapore','Johor')) diagdds =
# phyloseq_to_deseq2(ps, ~ strait)

# Block 1 Start Estimate geometric mean with zeros

gm_mean = function(x, na.rm = TRUE) {
    exp(sum(log(x[x > 0]), na.rm = na.rm)/length(x))
}
geoMeans = apply(counts(diagdds), 1, gm_mean)
diagdds = estimateSizeFactors(diagdds, geoMeans = geoMeans)

# Block 1 end

diagdds = DESeq(diagdds, fitType = "local")

res = results(diagdds, cooksCutoff = FALSE)
alpha = 0.01
sigtab = res[which(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(ps)[rownames(sigtab), 
    ], "matrix"))
head(sigtab)

         baseMean log2FoldChange lfcSE stat  pvalue    padj  kingdom
asv_0005   2423.0            4.7  1.47  3.2 1.5e-03 6.8e-03  Archaea
asv_0025     39.8            8.9  2.60  3.4 6.3e-04 3.2e-03 Bacteria
asv_0054    596.5            1.4  0.42  3.3 9.2e-04 4.4e-03 Bacteria
asv_0074    245.0            4.9  1.24  4.0 7.3e-05 4.4e-04 Bacteria
asv_0100      5.6           21.2  2.96  7.1 9.4e-13 7.6e-12 Bacteria
             supergroup            division                 class
asv_0005 Thaumarchaeota     Nitrososphaeria      Nitrosopumilales
asv_0025 Actinobacteria      Acidimicrobiia        Microtrichales
asv_0054 Proteobacteria Alphaproteobacteria           SAR11 clade
asv_0074 Proteobacteria Alphaproteobacteria           SAR11 clade
asv_0100 Proteobacteria Gammaproteobacteria Betaproteobacteriales
                      order                    family
asv_0005  Nitrosopumilaceae Candidatus Nitrosopumilus
asv_0025 Ilumatobacteraceae             Ilumatobacter
asv_0054            Clade I                  Clade Ib
asv_0074          Clade III                 Clade III
asv_0100  Nitrosomonadaceae                     IS-44
                             genus species
asv_0005 Candidatus Nitrosopumilus    <NA>
asv_0025             Ilumatobacter    <NA>
asv_0054                  Clade Ib    <NA>
asv_0074                 Clade III    <NA>
asv_0100                     IS-44    <NA>
 [ reached 'max' / getOption("max.print") -- omitted 1 rows ]

6.17.1 FIG

sigtab_plot <- sigtab %>% rownames_to_column(var = "otu_id") %>% top_n(-90, 
    wt = otu_id) %>% mutate(label = case_when(kingdom == "Eukaryota" ~ str_c(otu_id, 
    class, species, sep = "_"), TRUE ~ str_c(otu_id, class, genus, sep = "_")), 
    supergroup_label = str_c(str_sub(kingdom, 1, 1), supergroup, sep = "_")) %>% 
    arrange(desc(otu_id)) %>% rowid_to_column()




g <- ggplot(sigtab_plot, aes(x = reorder(label, log2FoldChange), y = log2FoldChange, 
    color = kingdom, alpha = log(rowid))) + theme_dviz_grid() + geom_point(size = 6) + 
    theme(axis.text.x = element_text(size = 20, angle = 0, hjust = 0.5, vjust = 0.5), 
        axis.title.x = element_text(size = 20), axis.text.y = element_text(size = 16, 
            angle = 0, hjust = 0, vjust = 0.5), legend.title = element_text(size = 24), 
        legend.text = element_text(size = 20)) + guides(alpha = FALSE) + ylab("log2 fold change - NE <--> SW") + 
    xlab("") + scale_color_viridis_d(option = "C", name = "Kingdom") + coord_flip()

g

ggsave(filename = "../fig/fig_DESeq2.png", g, height = 15, width = 15, dpi = 300)

6.17.2 Do a histogram plot for all ASV significantly different

p <- list()
for (asv_one in sigtab_plot$otu_id) {
    ps_one <- prune_taxa(asv_one, ps)
    p[[asv_one]] <- plot_bar(ps_one, x = "sample_label", fill = "class") + geom_bar(aes(color = class, 
        fill = class), stat = "identity", position = "stack") + ggtitle(asv_one) + 
        theme(axis.text.y = element_text(size = 10)) + theme(axis.text.x = element_text(size = 8, 
        angle = 0, hjust = 0.5)) + coord_flip() + scale_fill_viridis(discrete = TRUE) + 
        scale_color_viridis(discrete = TRUE)
    print(p[[asv_one]])
}

# p_grid <- gridExtra::grid.arrange(grobs=p, nrow=20, ncol=4)
# ggsave(filename='../fig/fig_DESeq2_grid.png', p_grid, height = 45, width =
# 20, dpi = 400)

Metabarcoding Singapore Phyloseq

Metabarcoding Singapore Phyloseq

1 Aim

2 Initialize

3 Read the data

3.1 File names

3.2 Read the files

3.2.1 Metadata and sample files

3.2.2 OTU table

4 Map

4.1 Leaflet map

5 Environmental data

5.1 Per station

5.1.1 Temp

5.1.2 Chlorophyll

5.1.3 Rain over 7 days

5.2 Average

5.2.1 Rain over 7 days

6 Phyloseq analysis

6.1 Create phyloseq files for all taxa after filtering the data

6.1.1 Filter the ASV table to remove low bootstraps

6.1.2 Create the phyloseq file

6.2 Break up into photosynthetic and non-photosynthetic

6.3 Normalize number of reads in each sample using median sequencing depth.

6.4 Phyloseq files for abundant taxa

6.5 Create a list for the auto and hetero phyloseq files

6.6 Create tabular files for other plots

6.7 Treemaps at division and class levels

6.7.1 Define function to draw treemaps

6.7.2 Do the individual treemaps

6.7.3 FIG - Arrange the different treemaps in a grid and save

6.8 Most abundant taxa

6.8.1 Most abundant eukaryote species

6.8.2 Most abundant prokaryotes taxo6 (~ genus)

6.8.3 Most abundant taxa - Save as Grid

6.9 Most abundant OTUs

6.9.1 Compute and plot separately Euk and Prok

6.9.2 FIG Most abundant ASVs - Save as Grid

6.10 Bar plots

6.10.1 Bar plot 20 most abundant divisions

6.10.2 FIG - Bar plot 20 most abundant classes

6.10.3 Bar plot 20 most abundant orders

6.10.4 Bar plot of divisions per station

6.10.5 Bar plot of class per station

6.11 Synthetic comparisons

6.11.1 Compare by Straight, Site, Moonsoon (abundant OTUs only)

6.11.2 Compare by Moonsoon in Singapore straight (abundant OTUs)

6.12 Time series

6.12.1 Time series (abundant OTUs) aggregated for Singapore strait only

6.12.2 Time series (abundant OTUs) - Division level

6.12.3 Time series (abundant OTUs) - Genus level for Chlorophyta

6.13 Main species for each division (Eukaryotes - Autrotrophs)

6.14 Heatmaps

6.14.1 Abundant OTUs

6.14.2 Chlorophyta at species level

6.14.3 Mamiello (Only Bathy, Ostreo and Micromonas) at genus level

6.14.4 Mamiello (Only Bathy, Ostreo and Micromonas) at species level

6.14.5 Syndiniales at order level

6.14.6 FIG - Most abundant OTUs in Johor vs Singapre and Prokaryotes vs Eukaryotes

6.15 NMDS

6.15.1 Define function

6.15.2 All OTUs

6.15.3 Abundant OTUs

6.16 Network analysis

6.17 Differential expression (DESeq2)

6.17.1 FIG

6.17.2 Do a histogram plot for all ASV significantly different

Metabarcoding Singapore
Phyloseq

Metabarcoding Singapore
Phyloseq