Metabarcoding Singapore
Phyloseq

1 Aim

2 Initialize

This file defines all the necessary libraries and variables

4 Map

4.1 Leaflet map

5 Environmental data

5.1 Per station

6 Phyloseq analysis

6.1 Create phyloseq files for all taxa after filtering the data

Filter the euk data to remove the low bootstraps values (threshold : bootstrap > 90% at the supergroup level) and create a phyloseq file

Note the bootstrap threshold 90% for ASVs

6.2 Break up into photosynthetic and non-photosynthetic


Phyloseq All 
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2571 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 2571 taxa by 8 taxonomic ranks ]

Phyloseq Eukaryotes 
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 683 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 683 taxa by 8 taxonomic ranks ]

Phyloseq Photosynthetic Eukaryotes 
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 271 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 271 taxa by 8 taxonomic ranks ]

Phyloseq Heterotrophic Eukaryotes 
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 412 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 412 taxa by 8 taxonomic ranks ]

Phyloseq Prokaryotes 
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1888 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 1888 taxa by 8 taxonomic ranks ]

6.3 Normalize number of reads in each sample using median sequencing depth.

  • ! If there no cells do not transform, just set column to 0 function(x, t=total_hetero) (if(sum(x) > 0){ t * (x / sum(x))} else {x})

The median number of reads used for normalization of all is  54948
Phyloseq all
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2571 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 2571 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes (auto+hetero) is  4735
Phyloseq eukaryotes (auto+hetero)
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 683 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 683 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes autotrophs is  3038
Phyloseq eukaryotes autotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 271 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 271 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes heterotrophs is  983
Phyloseq eukaryotes heterotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 412 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 412 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of prokaryotes is  50346
Phyloseq prokaryotes
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1888 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 1888 taxa by 8 taxonomic ranks ]

6.4 Phyloseq files for abundant taxa

  • Remove taxa that are < 0.10 (euks) and <0.05 (proks) in any given sample
  • Normalize again…
Remove taxa in low abundance 

The median number of reads used for normalization of All is  24049
Phyloseq All
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 45 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 45 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes (auto+hetero) is  3359
Phyloseq eukaryotes (auto+hetero)
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 61 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 61 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes autotrophs is  2767
Phyloseq eukaryotes autotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 60 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 60 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of eukaryotes heterotrophs is  688
Phyloseq eukaryotes heterotrophs
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 88 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 88 taxa by 8 taxonomic ranks ]

The median number of reads used for normalization of prokaryotes is  23865
Phyloseq prokaryotes
========== 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 44 taxa and 81 samples ]
sample_data() Sample Data:       [ 81 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 44 taxa by 8 taxonomic ranks ]

6.7 Treemaps at division and class levels

6.7.2 Do the individual treemaps

array_treemap_all <- list()
array_treemap_by_kingdom <- list()

for (one_strait in c("Singapore", "Johor")) {
    
    label <- str_c(one_strait, "-all")
    
    array_treemap_all[[label]] <- treemap_gg_dv(filter(long_all, strait == one_strait), 
        kingdom, supergroup, str_c("", one_strait))
    
    label <- str_c(one_strait, "-arch")
    array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_prok, strait == 
        one_strait & kingdom == "Archaea"), division, class, str_c("A - Archaea - ", 
        one_strait))
    
    label <- str_c(one_strait, "-bact")
    array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_prok, strait == 
        one_strait & kingdom == "Bacteria"), division, class, str_c("B - Bacteria - ", 
        one_strait))
    
    
    label <- str_c(one_strait, "-euks")
    array_treemap_by_kingdom[[label]] <- treemap_gg_dv(filter(long_euk, strait == 
        one_strait), division, class, str_c("C - All Eulkaryotes - ", one_strait))
    
    
    # label <- str_c(one_strait,'-photeuks') array_treemap[[label]] <-
    # treemap_gg_dv(filter(long_photo, strait == one_strait), division, class,
    # str_c('D - Photosynthetic Eukaryotes - ', one_strait ))
    
    
    # treemap_dv(filter(long_euk, strait == one_strait), c('division',
    # 'class'),'n_seq', str_c('All euks - ', one_strait ))
    # treemap_dv(filter(long_photo, strait == one_strait), c('division',
    # 'class'),'n_seq',str_c('Photo euks', one_strait ))
    # treemap_dv(filter(long_prok, strait == one_strait & kingdom=='Bacteria'),
    # c('division', 'class'),'n_seq',str_c('Bacteria - ', one_strait ))
    # treemap_dv(filter(long_prok, strait == one_strait & kingdom=='Archaea'),
    # c('division', 'class'),'n_seq',str_c('Archaea - ', one_strait ))
}

6.7.3 FIG - Arrange the different treemaps in a grid and save

TableGrob (1 x 2) "arrange": 2 grobs
              z     cells    name           grob
Singapore-all 1 (1-1,1-1) arrange gtable[layout]
Johor-all     2 (1-1,2-2) arrange gtable[layout]

TableGrob (3 x 2) "arrange": 6 grobs
               z     cells    name           grob
Singapore-arch 1 (1-1,1-1) arrange gtable[layout]
Singapore-bact 2 (2-2,1-1) arrange gtable[layout]
Singapore-euks 3 (3-3,1-1) arrange gtable[layout]
Johor-arch     4 (1-1,2-2) arrange gtable[layout]
Johor-bact     5 (2-2,2-2) arrange gtable[layout]
Johor-euks     6 (3-3,2-2) arrange gtable[layout]

6.8 Most abundant taxa

6.9 Most abundant OTUs

6.9.1 Compute and plot separately Euk and Prok

df <- filter(long_all, strait %in% c("Johor", "Singapore")) %>% mutate(otu_label = case_when(kingdom == 
    "Eukaryota" ~ str_c(otu_id, species, sep = "-"), TRUE ~ str_c(otu_id, family, 
    sep = "-"))) %>% group_by(kingdom, supergroup, division, class, genus, species, 
    otu_id, otu_label, strait) %>% summarize(n_seq = sum(n_seq)) %>% arrange(desc(n_seq)) %>% 
    filter(n_seq > 0) %>% ungroup()

df_strait <- df %>% select(otu_label, strait) %>% group_by(otu_label) %>% summarise(n_strait = n()) %>% 
    ungroup()

df <- df %>% left_join(df_strait) %>% mutate(otu_label = case_when(n_strait == 
    1 ~ str_c(otu_label, " +"), TRUE ~ otu_label))

otu_strait <- df

array_euk <- list()
array_prok <- list()

# Do plots

for (one_strait in c("Singapore", "Johor")) {
    
    # For euks
    
    df_one <- filter(df, strait == one_strait & kingdom == "Eukaryota")
    
    array_euk[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(otu_label, 
        n_seq), y = n_seq, fill = division)) + coord_flip() + theme_bw() + theme(axis.text.x = element_text(size = 16, 
        angle = 0, hjust = 1, vjust = 1)) + theme(axis.text.y = element_text(size = 16, 
        angle = 0, hjust = 0, vjust = 0)) + theme(legend.title = element_text(size = 24)) + 
        theme(legend.text = element_text(size = 16)) + xlab("") + ylab("Number of reads") + 
        scale_fill_manual(values = division_euk_colors) + ggtitle(one_strait) + 
        theme(plot.title = element_text(size = 22, hjust = 0.5)) + # theme(axis.text=element_text(size=14), legend.text =
    # element_text(size=16)) +
    theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top", 
        ncol = 2, nrow = 4, byrow = FALSE))
    
    print(array_euk[[one_strait]])
    
    # For proks
    
    
    df_one <- filter(df, strait == one_strait & kingdom != "Eukaryota")
    
    array_prok[[one_strait]] <- ggplot(top_n(df_one, 30, n_seq)) + geom_col(aes(x = reorder(otu_label, 
        n_seq), y = n_seq, fill = supergroup)) + coord_flip() + theme_bw() + 
        theme(axis.text.x = element_text(size = 16, angle = 0, hjust = 1, vjust = 1)) + 
        theme(axis.text.y = element_text(size = 16, angle = 0, hjust = 0, vjust = 0)) + 
        theme(legend.title = element_text(size = 24)) + theme(legend.text = element_text(size = 16)) + 
        xlab("") + ylab("Number of reads") + scale_fill_manual(values = supergroup_colors) + 
        ggtitle(one_strait) + theme(plot.title = element_text(size = 22, hjust = 0.5)) + 
        # theme(axis.text=element_text(size=14), legend.text =
    # element_text(size=16)) +
    theme(legend.position = "top", legend.box = "vertical") + guides(fill = guide_legend(title.position = "top", 
        ncol = 2, nrow = 4, byrow = FALSE))
    
    print(array_prok[[one_strait]])
    
}

6.10 Bar plots

6.10.1 Bar plot 20 most abundant divisions

  • RM samples not considered
division_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom, 
    supergroup, division) %>% summarise(n_seq_division = sum(n_seq)) %>% arrange(-n_seq_division)



long_division_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% 
    mutate(division = case_when(division %in% division_top$division[1:20] ~ 
        division, TRUE ~ "Other")) %>% group_by(strait, sample_label, division) %>% 
    summarise(n_seq_division = sum(n_seq))

seq_stations <- long_division_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_division))

n_seq_station <- seq_stations$n_seq_station[1]

# Set up the colors

colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD", 
    "#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711", 
    "#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788", 
    "#999099", "#AA4455")

colours_name <- t(lapply(colours, plotrix::color.id))

# Read the colors from Excel file

top_division_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "D1:F22")

top_division_colors <- structure(top_division_colors.df$color_name, .Names = top_division_colors.df$division)


ggplot(long_division_top, aes(x = sample_label, y = n_seq_division/n_seq_station, 
    fill = factor(division, levels = top_division_colors.df$division))) + geom_col() + 
    theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) + 
    # scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_division_colors) + scale_y_continuous(labels = scales::percent, 
    expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") + 
    guides(fill = guide_legend(title = "Division", title.position = "top", ncol = 5, 
        nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait, 
    scales = "free_x", space = "free_x")

6.10.2 FIG - Bar plot 20 most abundant classes

  • RM samples not considered
class_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom, 
    supergroup, division, class) %>% summarise(n_seq_class = sum(n_seq)) %>% 
    arrange(-n_seq_class)



long_class_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% 
    mutate(class = case_when(class %in% class_top$class[1:20] ~ class, TRUE ~ 
        "Other")) %>% group_by(strait, sample_label, class) %>% summarise(n_seq_class = sum(n_seq))

seq_stations <- long_class_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_class))

n_seq_station <- seq_stations$n_seq_station[1]

# Set up the colors

colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD", 
    "#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711", 
    "#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788", 
    "#999099", "#AA4455")

colours_name <- t(lapply(colours, plotrix::color.id))


# RColorBrewer::display.brewer.pal(n=3, name='Reds')
colours <- RColorBrewer::brewer.pal(n = 3, name = "Reds")
colours_name_archeae <- data.frame(purrr::flatten_chr(purrr::map(colours, plotrix::color.id)))

# RColorBrewer::display.brewer.pal(n=9, name='Blues')
colours <- RColorBrewer::brewer.pal(n = 9, name = "Blues")
colours_name_bacteria_1 <- data.frame(purrr::flatten_chr(purrr::map(colours, 
    plotrix::color.id)))

# RColorBrewer::display.brewer.pal(n=9, name='Purples')
colours <- RColorBrewer::brewer.pal(n = 9, name = "Purples")
colours_name_bacteria_2 <- data.frame(purrr::flatten_chr(purrr::map(colours, 
    plotrix::color.id)))

# RColorBrewer::display.brewer.pal(n=4, name='Greens')
colours <- RColorBrewer::brewer.pal(n = 4, name = "Greens")
colours_name_euks <- data.frame(purrr::flatten_chr(purrr::map(colours, plotrix::color.id)))

colours_name <- bind_rows(colours_name_archeae, colours_name_bacteria_1, colours_name_bacteria_2, 
    colours_name_euks)
writeClipboard(colours_name[, 1])


# Read the colors from Excel file

top_class_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "L1:N22")

top_class_colors <- structure(top_class_colors.df$color_name, .Names = top_class_colors.df$class)


g <- ggplot(long_class_top, aes(x = sample_label, y = n_seq_class/n_seq_station, 
    fill = factor(class, levels = top_class_colors.df$class))) + geom_col() + 
    theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) + 
    # scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_class_colors) + scale_y_continuous(labels = scales::percent, 
    expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") + 
    guides(fill = guide_legend(title = "Class", title.position = "top", ncol = 5, 
        nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait, 
    scales = "free_x", space = "free_x")

print(g)

6.10.3 Bar plot 20 most abundant orders

  • RM samples not considered
order_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% group_by(kingdom, 
    supergroup, division, class, order) %>% summarise(n_seq_order = sum(n_seq)) %>% 
    arrange(-n_seq_order)



long_order_top <- long_all %>% filter(!str_detect(location_label, "RM")) %>% 
    mutate(order = case_when(order %in% order_top$order[1:20] ~ order, TRUE ~ 
        "Other")) %>% group_by(strait, sample_label, order) %>% summarise(n_seq_order = sum(n_seq))

seq_stations <- long_order_top %>% group_by(sample_label) %>% summarise(n_seq_station = sum(n_seq_order))

n_seq_station <- seq_stations$n_seq_station[1]

# Set up the colors

colours <- c("#771155", "#AA4488", "#CC99BB", "#114477", "#4477AA", "#77AADD", 
    "#117777", "#44AAAA", "#77CCCC", "#117744", "#44AA77", "#88CCAA", "#777711", 
    "#AAAA44", "#DDDD77", "#774411", "#AA7744", "#DDAA77", "#771122", "#DD7788", 
    "#999099", "#AA4455")

colours_name <- t(lapply(colours, plotrix::color.id))

# Read the colors from Excel file

top_order_colors.df <- read_excel(metadata_xlsx, sheet = "colors", range = "U1:W22")

top_order_colors <- structure(top_order_colors.df$color_name, .Names = top_order_colors.df$order)


ggplot(long_order_top, aes(x = sample_label, y = n_seq_order/n_seq_station, 
    fill = factor(order, levels = top_order_colors.df$order))) + geom_col() + 
    theme_bw() + theme(axis.text.y = element_text(size = 10, hjust = 0, vjust = 0.5)) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 0.5)) + 
    # scale_fill_viridis(discrete=TRUE) +
scale_fill_manual(values = top_order_colors) + scale_y_continuous(labels = scales::percent, 
    expand = expand_scale()) + theme(legend.position = "top", legend.box = "vertical") + 
    guides(fill = guide_legend(title = "Order", title.position = "top", ncol = 5, 
        nrow = 5, byrow = FALSE)) + xlab("") + ylab("Proportion") + facet_grid(~strait, 
    scales = "free_x", space = "free_x")

6.12 Time series

6.14 Heatmaps

6.14.6 FIG - Most abundant OTUs in Johor vs Singapre and Prokaryotes vs Eukaryotes

heat_prok <- list()
heat_euk <- list()

for (strait_one in c("Singapore", "Johor")) {
     
for (kingdom_one in list(c("Bacteria", "Archaea"), "Eukaryota")){

  otu_strait_one <- otu_strait %>% 
    filter(strait == strait_one & kingdom %in% kingdom_one) %>% 
    top_n(30, n_seq)

  ps_heat <- ps_all %>% 
    subset_samples(strait==strait_one) %>%
    subset_taxa(kingdom %in% kingdom_one)
  
  otu_table_keep <- data.frame(otu_table(ps_heat)) %>% 
    rownames_to_column("otu_id") %>% 
    filter(otu_id %in% otu_strait_one$otu_id) %>% 
    left_join(select(otu_strait_one, otu_id)) %>% 
    column_to_rownames("otu_id")
  
  tax_table_keep <- data.frame(tax_table(ps_heat)) %>% 
    rownames_to_column("otu_id") %>% 
    filter(otu_id %in% otu_strait_one$otu_id) %>% 
    left_join(select(otu_strait_one, otu_id, otu_label)) %>% 
    column_to_rownames("otu_id")

  OTU = otu_table(as.matrix(otu_table_keep), taxa_are_rows = TRUE)
  TAX = tax_table(as.matrix(tax_table_keep))
  samples = sample_data(ps_heat)
  
  ps_heat <- phyloseq(OTU, TAX, samples)    

  print(ps_heat)

  p <- plot_heatmap(ps_heat, method = "NMDS", distance = "bray", 
               taxa.label = "otu_label",  taxa.order="supergroup",
               sample.label = "sample_label", sample.order="sample_label",
               low="beige", high="red", na.value="beige",trans = scales::log10_trans(), 
               title=strait_one) +
    theme(axis.text.y = element_text(size = 16, angle = 0, hjust = 0, vjust = 0.5)) +
    theme(axis.text.x = element_text(size = 16)) +
    theme(axis.title = element_text(size = 0)) +
    theme(plot.title = element_text(size=22, hjust = 0.5)) +
    xlab("Sample") + ylab("")

  print(p)
  if (kingdom_one=="Eukaryota") heat_euk[[strait_one]] <- p 
  else heat_prok[[strait_one]] <- p

}}
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 30 taxa and 30 samples ]
sample_data() Sample Data:       [ 30 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 30 taxa by 9 taxonomic ranks ]

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 30 taxa and 30 samples ]
sample_data() Sample Data:       [ 30 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 30 taxa by 9 taxonomic ranks ]

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 30 taxa and 40 samples ]
sample_data() Sample Data:       [ 40 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 30 taxa by 9 taxonomic ranks ]

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 30 taxa and 40 samples ]
sample_data() Sample Data:       [ 40 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 30 taxa by 9 taxonomic ranks ]

6.15 NMDS

Sample removed because they were pulling the NMDS * PR2X16XS21 it has a single eukaryote (diatom bloom ) * RM13XS36 cause problem for bacteria * PR11XS25 cause problem for hetero euks * SBW11XS26 cause problem for hetero euks * SBW13XS37 cause problem for hetero euks * RM13XS36 cause problem for hetero euks

6.15.1 Define function

  • See comments inside functions (saved plots are different from displayed plots)
ps_do_nmds <- function(ps_list) {
    
    plot_array <- list()
    
    for (i in 1:3) {
        ps_nmds <- ps_list$ps[[i]]
        
        # Remove samples with no reads
        ps_nmds <- prune_samples(sample_sums(ps_nmds) > 0, ps_nmds)
        
        # Remove samples from Raffles and Johor
        ps_nmds <- subset_samples(ps_nmds, !(str_detect(strait, "Raffles Mari|Johor")))
        
        # Remove samples that caused problems (1= prok, 2=euk, 3=euk auto)
        if (i == 1) 
            {
                ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("RM13XS36")), 
                  ps_nmds)
            }  # Prokaryotes
        if (i %in% c(2, 3)) 
            {
                ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR2X16SXS21")), 
                  ps_nmds)
            }  # Eukaryotes
        if (i == 4) 
            {
                ps_nmds <- prune_samples(!(sample_names(ps_nmds) %in% c("PR11XS25", 
                  "SBW11XS26", "SBW13XS37")), ps_nmds)
            }  # Heterotrophs not used
        
        singa.ord <- ordinate(ps_nmds, "NMDS", "bray")
        
        # Fit environmental parameters
        env_var <- sample_variables(ps_nmds)
        env_matrix <- get_variable(ps_nmds, c("Chl", "Temperature", "Salinity", 
            "Phosphate", "Silicate", "DIN", "BAC"))
        env_fit <- vegan::envfit(singa.ord, env = env_matrix, perm = 999, na.rm = TRUE)
        env_arrows <- data.frame(env_fit$vectors$arrows * sqrt(env_fit$vectors$r)) %>% 
            rownames_to_column(var = "parameter")
        
        
        nmds_samples <- data.frame(singa.ord[["points"]], get_variable(ps_nmds, 
            c("monsoon", "strait", "location", "location_label"))) %>% rownames_to_column(var = "sample")
        
        # Factor to move the labels
        nudge_x <- max(nmds_samples$MDS1) * 0.08
        nudge_y <- max(nmds_samples$MDS2) * 0.08
        xy_max = max(c(nmds_samples$MDS1, nmds_samples$MDS2)) * 1.5
        xy_min = min(c(nmds_samples$MDS1, nmds_samples$MDS2)) * 1.5
        factor <- 3  # for vectors for euks
        if (i == 1) 
            {
                factor <- 1.5
            }  # for vectors for proks
        print(factor)
        
        p <- plot_ordination(ps_nmds, singa.ord, type = "samples", color = "monsoon", 
            shape = "strait", title = ps_list$title[[i]]) + geom_point(aes(shape = strait, 
            color = monsoon), size = 3.5) + scale_color_viridis(discrete = TRUE) + 
            scale_shape_manual(values = c(15, 16)) + geom_text(aes(label = location_label, 
            color = monsoon), nudge_x = nudge_x, nudge_y = nudge_y, check_overlap = TRUE, 
            size = 2) + theme_bw() + geom_segment(data = env_arrows, aes(x = 0, 
            xend = NMDS1 * factor, y = 0, yend = NMDS2 * factor), inherit.aes = FALSE, 
            arrow = arrow(length = unit(0.5, "cm")), colour = "black") + geom_text(data = env_arrows, 
            aes(x = NMDS1 * factor, y = NMDS2 * factor, label = parameter), 
            inherit.aes = FALSE, hjust = -0.2, vjust = -0.2, size = 3)
        print(singa.ord)
        plot_array[[i]] <- p
        print(p)
        
        # The following lines can be used if you want to avoid using the pjhyloseq
        # functions to plot the data.  Notes : - must use inherit.aes = FALSE to add
        # some extra layers - the saved plots have a different scale for the added
        # layer than the displaued plot can figure out ggplot()+ coord_fixed() +
        # xlim(xy_min, xy_max) + ylim(xy_min, xy_max) +
        # geom_point(data=nmds_samples, aes(x=MDS1, y=MDS2, shape=strait,
        # color=monsoon), size=5) + geom_text(data=nmds_samples, aes(x=MDS1, y=MDS2,
        # label=location_label, color=monsoon), nudge_x=nudge_x, nudge_y=nudge_y,
        # check_overlap = FALSE, size=2) + ggtitle(ps_list$title[[i]]) +
        
        p <- plot_ordination(ps_nmds, singa.ord, type = "taxa", color = "division", 
            title = ps_list$title[[i]]) + scale_color_viridis(discrete = TRUE) + 
            geom_point(size = 3) + theme_bw() + geom_segment(data = env_arrows, 
            aes(x = 0, xend = NMDS1 * factor, y = 0, yend = NMDS2 * factor), 
            inherit.aes = FALSE, arrow = arrow(length = unit(0.5, "cm")), colour = "black") + 
            geom_text(data = env_arrows, aes(x = NMDS1 * factor, y = NMDS2 * 
                factor, label = parameter), inherit.aes = FALSE, hjust = -0.2, 
                vjust = -0.2, size = 3)
        
        print(p)
        plot_array[[i + 3]] <- p
    }
    return(plot_array)
}

6.15.2 All OTUs

Square root transformation
Wisconsin double standardization
Run 0 stress 0.12 
Run 1 stress 0.12 
... New best solution
... Procrustes: rmse 0.03  max resid 0.13 
Run 2 stress 0.12 
... New best solution
... Procrustes: rmse 0.00031  max resid 0.0014 
... Similar to previous best
Run 3 stress 0.12 
... Procrustes: rmse 3e-04  max resid 0.0014 
... Similar to previous best
Run 4 stress 0.12 
Run 5 stress 0.12 
... New best solution
... Procrustes: rmse 0.00024  max resid 0.00077 
... Similar to previous best
Run 6 stress 0.12 
Run 7 stress 0.12 
... New best solution
... Procrustes: rmse 0.00013  max resid 0.00045 
... Similar to previous best
Run 8 stress 0.12 
Run 9 stress 0.12 
... Procrustes: rmse 4.6e-05  max resid 0.00012 
... Similar to previous best
Run 10 stress 0.12 
Run 11 stress 0.12 
... Procrustes: rmse 0.00011  max resid 0.00036 
... Similar to previous best
Run 12 stress 0.15 
Run 13 stress 0.18 
Run 14 stress 0.12 
... Procrustes: rmse 0.00033  max resid 0.0012 
... Similar to previous best
Run 15 stress 0.15 
Run 16 stress 0.12 
Run 17 stress 0.12 
Run 18 stress 0.16 
Run 19 stress 0.12 
... Procrustes: rmse 4.9e-05  max resid 0.00021 
... Similar to previous best
Run 20 stress 0.12 
... Procrustes: rmse 0.00013  max resid 0.00053 
... Similar to previous best
*** Solution reached
[1] 1.5

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.12 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Square root transformation
Wisconsin double standardization
Run 0 stress 0.22 
Run 1 stress 0.24 
Run 2 stress 0.27 
Run 3 stress 0.27 
Run 4 stress 0.22 
... New best solution
... Procrustes: rmse 0.036  max resid 0.1 
Run 5 stress 0.22 
Run 6 stress 0.25 
Run 7 stress 0.23 
Run 8 stress 0.22 
Run 9 stress 0.22 
Run 10 stress 0.22 
Run 11 stress 0.24 
Run 12 stress 0.24 
Run 13 stress 0.25 
Run 14 stress 0.22 
Run 15 stress 0.22 
... Procrustes: rmse 0.03  max resid 0.14 
Run 16 stress 0.23 
Run 17 stress 0.22 
Run 18 stress 0.23 
Run 19 stress 0.24 
Run 20 stress 0.22 
... New best solution
... Procrustes: rmse 0.00012  max resid 0.00037 
... Similar to previous best
*** Solution reached
[1] 3

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.22 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Square root transformation
Wisconsin double standardization
Run 0 stress 0.2 
Run 1 stress 0.19 
... New best solution
... Procrustes: rmse 0.1  max resid 0.31 
Run 2 stress 0.19 
... New best solution
... Procrustes: rmse 0.031  max resid 0.11 
Run 3 stress 0.21 
Run 4 stress 0.18 
... New best solution
... Procrustes: rmse 0.081  max resid 0.27 
Run 5 stress 0.25 
Run 6 stress 0.19 
Run 7 stress 0.2 
Run 8 stress 0.21 
Run 9 stress 0.18 
... Procrustes: rmse 0.0082  max resid 0.031 
Run 10 stress 0.22 
Run 11 stress 0.18 
... New best solution
... Procrustes: rmse 0.018  max resid 0.072 
Run 12 stress 0.19 
Run 13 stress 0.22 
Run 14 stress 0.19 
Run 15 stress 0.19 
Run 16 stress 0.22 
Run 17 stress 0.18 
... Procrustes: rmse 0.018  max resid 0.071 
Run 18 stress 0.19 
Run 19 stress 0.22 
Run 20 stress 0.18 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax
[1] 3

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.18 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

6.15.3 Abundant OTUs

Square root transformation
Wisconsin double standardization
Run 0 stress 0.13 
Run 1 stress 0.17 
Run 2 stress 0.13 
... Procrustes: rmse 0.00069  max resid 0.0031 
... Similar to previous best
Run 3 stress 0.22 
Run 4 stress 0.13 
... Procrustes: rmse 0.00047  max resid 0.0021 
... Similar to previous best
Run 5 stress 0.13 
... Procrustes: rmse 0.00049  max resid 0.0022 
... Similar to previous best
Run 6 stress 0.13 
... Procrustes: rmse 0.00019  max resid 0.00085 
... Similar to previous best
Run 7 stress 0.13 
... Procrustes: rmse 0.00026  max resid 0.0012 
... Similar to previous best
Run 8 stress 0.14 
Run 9 stress 0.13 
... Procrustes: rmse 0.00081  max resid 0.0037 
... Similar to previous best
Run 10 stress 0.13 
... Procrustes: rmse 0.00077  max resid 0.0035 
... Similar to previous best
Run 11 stress 0.13 
... Procrustes: rmse 0.0011  max resid 0.0051 
... Similar to previous best
Run 12 stress 0.13 
... Procrustes: rmse 0.00031  max resid 0.0014 
... Similar to previous best
Run 13 stress 0.13 
... Procrustes: rmse 0.00037  max resid 0.0017 
... Similar to previous best
Run 14 stress 0.18 
Run 15 stress 0.13 
... Procrustes: rmse 0.00062  max resid 0.0028 
... Similar to previous best
Run 16 stress 0.14 
Run 17 stress 0.13 
... Procrustes: rmse 0.00062  max resid 0.0028 
... Similar to previous best
Run 18 stress 0.39 
Run 19 stress 0.14 
Run 20 stress 0.13 
... Procrustes: rmse 0.00053  max resid 0.0024 
... Similar to previous best
*** Solution reached
[1] 1.5

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.13 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Square root transformation
Wisconsin double standardization
Run 0 stress 0.21 
Run 1 stress 0.22 
Run 2 stress 0.25 
Run 3 stress 0.21 
... Procrustes: rmse 0.018  max resid 0.084 
Run 4 stress 0.25 
Run 5 stress 0.22 
Run 6 stress 0.24 
Run 7 stress 0.21 
... Procrustes: rmse 0.00018  max resid 0.00056 
... Similar to previous best
Run 8 stress 0.24 
Run 9 stress 0.22 
Run 10 stress 0.3 
Run 11 stress 0.22 
Run 12 stress 0.26 
Run 13 stress 0.22 
Run 14 stress 0.22 
Run 15 stress 0.21 
... New best solution
... Procrustes: rmse 3.3e-05  max resid 7.9e-05 
... Similar to previous best
Run 16 stress 0.24 
Run 17 stress 0.21 
... Procrustes: rmse 0.018  max resid 0.084 
Run 18 stress 0.25 
Run 19 stress 0.21 
... New best solution
... Procrustes: rmse 0.00013  max resid 0.00044 
... Similar to previous best
Run 20 stress 0.24 
*** Solution reached
[1] 3

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.21 
Stress type 1, weak ties
Two convergent solutions found after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

Square root transformation
Wisconsin double standardization
Run 0 stress 0.21 
Run 1 stress 0.23 
Run 2 stress 0.21 
... Procrustes: rmse 0.11  max resid 0.3 
Run 3 stress 0.21 
... Procrustes: rmse 0.11  max resid 0.3 
Run 4 stress 0.21 
Run 5 stress 0.24 
Run 6 stress 0.2 
... New best solution
... Procrustes: rmse 0.061  max resid 0.27 
Run 7 stress 0.22 
Run 8 stress 0.23 
Run 9 stress 0.21 
Run 10 stress 0.2 
Run 11 stress 0.2 
... Procrustes: rmse 0.0065  max resid 0.02 
Run 12 stress 0.22 
Run 13 stress 0.22 
Run 14 stress 0.22 
Run 15 stress 0.22 
Run 16 stress 0.22 
Run 17 stress 0.24 
Run 18 stress 0.2 
Run 19 stress 0.21 
Run 20 stress 0.2 
*** No convergence -- monoMDS stopping criteria:
    20: stress ratio > sratmax
[1] 3

Call:
metaMDS(comm = veganifyOTU(physeq), distance = distance) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
Distance: bray 

Dimensions: 2 
Stress:     0.2 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))' 

### MSDS graph

6.17 Differential expression (DESeq2)

Love, M.I., Huber, W. & Anders, S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550.

[1] '1.22.2'

Block 1 is need to avoid the following error > Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : every gene contains at least one zero, cannot compute log geometric means

See: https://github.com/joey711/phyloseq/issues/387

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2571 taxa and 21 samples ]
sample_data() Sample Data:       [ 21 samples by 26 sample variables ]
tax_table()   Taxonomy Table:    [ 2571 taxa by 8 taxonomic ranks ]
         baseMean log2FoldChange lfcSE stat  pvalue    padj  kingdom
asv_0005   2423.0            4.7  1.47  3.2 1.5e-03 6.8e-03  Archaea
asv_0025     39.8            8.9  2.60  3.4 6.3e-04 3.2e-03 Bacteria
asv_0054    596.5            1.4  0.42  3.3 9.2e-04 4.4e-03 Bacteria
asv_0074    245.0            4.9  1.24  4.0 7.3e-05 4.4e-04 Bacteria
asv_0100      5.6           21.2  2.96  7.1 9.4e-13 7.6e-12 Bacteria
             supergroup            division                 class
asv_0005 Thaumarchaeota     Nitrososphaeria      Nitrosopumilales
asv_0025 Actinobacteria      Acidimicrobiia        Microtrichales
asv_0054 Proteobacteria Alphaproteobacteria           SAR11 clade
asv_0074 Proteobacteria Alphaproteobacteria           SAR11 clade
asv_0100 Proteobacteria Gammaproteobacteria Betaproteobacteriales
                      order                    family
asv_0005  Nitrosopumilaceae Candidatus Nitrosopumilus
asv_0025 Ilumatobacteraceae             Ilumatobacter
asv_0054            Clade I                  Clade Ib
asv_0074          Clade III                 Clade III
asv_0100  Nitrosomonadaceae                     IS-44
                             genus species
asv_0005 Candidatus Nitrosopumilus    <NA>
asv_0025             Ilumatobacter    <NA>
asv_0054                  Clade Ib    <NA>
asv_0074                 Clade III    <NA>
asv_0100                     IS-44    <NA>
 [ reached 'max' / getOption("max.print") -- omitted 1 rows ]

Daniel Vaulot, Adriana Lopes dos Santos

19 08 2019