BIOSOPE - step 4 - Compare dada2 vs. mothur
BIOSOPE - step 4 - Compare dada2 vs. mothur
- 1 Aim
- 2 Directory structure
- 3 Analysis
- 3.1 To do
- 3.2 Initialize
- 3.3 Read the phyloseq and database files
- 3.4 Make Simple tables from the phyloseq
- 3.5 Merge and convert to long format
- 3.6 Aggregate by Division, Class, Genus, Species
- 3.7 Compare overall composition
- 3.8 Compare Chlorophyta composition
- 3.9 Compare estimates of taxon abundance
- 3.10 Comparaison per station for the whole community
- 3.11 Comparaison per station for Chlorophyta
- 4 Conclusions
## Warning: package 'rmdformats' was built under R version 3.4.4
1 Aim
Comparing the community composition with dada2 vs mothur
2 Directory structure
- /cutadapt : fastq files treated by cutadapt
- /dada2 : dada2 files
- /mothur : mothur files
3 Analysis
3.1 To do
- Move from Species to Genus
- Color dots by division
- Do the treemaps for the two methods
- Do station histograms at genus level for chlorophyta
3.2 Initialize
- BIOSOPE_init.R defines all the necessary libraries and variables # r, eval=TRUE, message=FALSE, warning=FALSE
source("BIOSOPE_init.R", echo = FALSE)3.3 Read the phyloseq and database files
ps_dada2 <- readRDS(file = str_c(dir_dada2, "ps_biosope_dada2.rds"))
ps_mothur <- readRDS(file = str_c(dir_mothur, "ps_biosope_mothur.rds"))3.4 Make Simple tables from the phyloseq
Note: for Dada2 only consider the samples common to mothur and dada2merge and tranform to long format
samples <- data.frame(sample_data(ps_dada2), stringsAsFactors = FALSE)
# Dada2 : only consider the samples common to mothur and dada2
otus_dada2 <- data.frame(otu_table(subset_samples(ps_dada2, mothur_missing ==
0)))
taxa_dada2 <- data.frame(tax_table(subset_samples(ps_dada2, mothur_missing ==
0)))
# It is necessary to create a column from the row names
otus_dada2 <- rownames_to_column(otus_dada2, var = "OTUNumber")
taxa_dada2 <- rownames_to_column(taxa_dada2, var = "OTUNumber")
# Mothur
otus_mothur <- data.frame(otu_table(ps_mothur))
taxa_mothur <- data.frame(tax_table(ps_mothur))
otus_mothur <- rownames_to_column(otus_mothur, var = "OTUNumber")
taxa_mothur <- rownames_to_column(taxa_mothur, var = "OTUNumber")3.5 Merge and convert to long format
Note: when converting to from phyloseq format to dataframe format, an X is added at the start of column name for samples because they are starting with a digit. This X is removed.
dada2_long <- left_join(taxa_dada2, otus_dada2) %>% gather(key = "sample", value = "n_seq",
starts_with("X")) %>% mutate(sample = str_replace(sample, "X", ""))
mothur_long <- left_join(taxa_mothur, otus_mothur) %>% gather(key = "sample",
value = "n_seq", starts_with("X")) %>% mutate(sample = str_replace(sample,
"X", ""))3.6 Aggregate by Division, Class, Genus, Species
species_dada2 <- dada2_long %>% group_by(Division, Class, Genus, Species) %>%
summarize(n_seq = sum(n_seq))
write_tsv(species_dada2, str_c(dir_R_out, "species_dada2.txt"), na = "")
genus_dada2 <- dada2_long %>% group_by(Division, Class, Genus) %>% summarize(n_seq = sum(n_seq))
genus_dada2 <- genus_dada2 %>% ungroup() %>% arrange(desc(n_seq))
knitr::kable(head(species_dada2, 20), caption = "Genera from dada2 - first 20 lines") %>%
kable_styling(latex_options = "scale_down", font_size = 7)| Division | Class | Genus | Species | n_seq |
|---|---|---|---|---|
| Apicomplexa | Apicomplexa_X | Colpodella | Colpodella_pontica | 133 |
| Apusomonadidae | Apusomonadidae_Group-1 | Amastigomonas | Amastigomonas_debruynei | 660 |
| Apusomonadidae | Apusomonadidae_Group-2 | Apusomonadidae_Group-2A_XX | Apusomonadidae_Group-2A_XX_sp. | 619 |
| Apusomonadidae | Apusomonadidae_Group-2 | Apusomonadidae_Group-2B_XX | Apusomonadidae_Group-2B_XX_sp. | 8338 |
| Cercozoa | Cercozoa_X | Cercozoa_XXXX | Cercozoa_XXXX_sp. | 45 |
| Cercozoa | Endomyxa | Endo4-lineage_X | Endo4-lineage_X_sp. | 2777 |
| Cercozoa | Filosa | Discomonas | Discomonas_retusa | 3 |
| Cercozoa | Filosa-Chlorarachnea | Lotharella | Lotharella_sp. | 4 |
| Cercozoa | Filosa-Chlorarachnea | Minorisa | Minorisa_minuta | 249 |
| Cercozoa | Filosa-Chlorarachnea | Minorisa-lineage_X | Minorisa-lineage_X_sp. | 27 |
| Cercozoa | Filosa-Chlorarachnea | NPK2-lineage_X | NPK2-lineage_X_sp. | 1346 |
| Cercozoa | Filosa-Granofilosea | Massisteria | Massisteria_marina | 93 |
| Cercozoa | Filosa-Granofilosea | Massiteriidae_X | Massiteriidae_X_sp. | 383 |
| Cercozoa | Filosa-Imbricatea | Novel-clade-2_X | Novel-clade-2_X_sp. | 1728 |
| Cercozoa | Filosa-Sarcomonadea | Helkesimastix | Helkesimastix_marina | 13 |
| Cercozoa | Filosa-Thecofilosea | Botuliforma | Botuliforma_benthica | 7 |
| Cercozoa | Filosa-Thecofilosea | Protaspa | Protaspa_grandis | 183 |
| Cercozoa | Filosa-Thecofilosea | Ventrifissura | Ventrifissura_foliiformis | 5 |
| Cercozoa | Phaeodarea | Challengeron | Challengeron_tizardi | 265 |
| Cercozoa | Phaeodarea | Entocannula | Entocannula_infundibulum | 4 |
species_mothur <- mothur_long %>% group_by(Division, Class, Genus, Species) %>%
summarize(n_seq = sum(n_seq))
write_tsv(species_mothur, str_c(dir_R_out, "species_mothur.txt"), na = "")
genus_mothur <- mothur_long %>% group_by(Division, Class, Genus) %>% summarize(n_seq = sum(n_seq))
genus_mothur <- genus_mothur %>% ungroup() %>% arrange(desc(n_seq))
knitr::kable(head(genus_mothur, 20), caption = "Genera from mothur - first 20 lines") %>%
kable_styling(latex_options = "scale_down", font_size = 7)| Division | Class | Genus | n_seq |
|---|---|---|---|
| Radiolaria | RAD-A | RAD-A_XXX | 11871 |
| Chlorophyta | Prasino-Clade-VII | Prasino-Clade-VII-B-1 | 8517 |
| Dinoflagellata | Syndiniales | Dino-Group-I-Clade-1_X | 7171 |
| Radiolaria | RAD-B | RAD-B-Group-I_X | 6598 |
| Chlorophyta | Mamiellophyceae | Bathycoccus | 5949 |
| Radiolaria | Acantharea | Acanthometron | 4472 |
| Apusomonadidae | Apusomonadidae_Group-2 | Apusomonadidae_Group-2B_XX | 4450 |
| Radiolaria | RAD-B | RAD-B-Group-IV_X | 4223 |
| Chlorophyta | Prasino-Clade-9 | Prasino-Clade-9-B_X | 3827 |
| Dinoflagellata | Dinophyceae | Prorocentrum | 3539 |
| Dinoflagellata | Dinophyceae | Dinophyceae_unclassified | 3356 |
| Eukaryota_unclassified | Eukaryota_unclassified | Eukaryota_unclassified | 2184 |
| Radiolaria | RAD-B | RAD-B-Group-II_X | 2116 |
| Radiolaria | Polycystinea | Spongotrochus | 2036 |
| Chlorophyta | Prasino-Clade-VII | Prasino-Clade-VII-A-4 | 2027 |
| Radiolaria | RAD-B | Sticholonche | 2000 |
| Radiolaria | Polycystinea | Spumellarida-Group-I_X | 1990 |
| Dinoflagellata | Syndiniales | Dino-Group-III_XX | 1876 |
| Cercozoa | Endomyxa | Endo4-lineage_X | 1817 |
| Radiolaria | Polycystinea | Pterocorys | 1787 |
3.7 Compare overall composition
dv_treemap(species_mothur, c("Division", "Class"), "n_seq", "Mothur")
dv_treemap(species_dada2, c("Division", "Class"), "n_seq", "Dada2")3.8 Compare Chlorophyta composition
dv_treemap(filter(species_mothur, Division == "Chlorophyta"), c("Class", "Genus"),
"n_seq", "Mothur")
dv_treemap(filter(species_dada2, Division == "Chlorophyta"), c("Class", "Genus"),
"n_seq", "Dada2")3.9 Compare estimates of taxon abundance
3.9.1 Merge the two lists to compute the relation between mothur and dada2 estimates
both_genus <- rbind(genus_mothur, genus_dada2) %>% group_by(Division, Class,
Genus) %>% summarize(n_methods = n()) %>% left_join(genus_dada2) %>% dplyr::rename(dada2 = n_seq) %>%
left_join(genus_mothur) %>% dplyr::rename(mothur = n_seq) %>% arrange(desc(mothur))
n_genus <- nrow(both_genus)
both_class <- both_genus %>% group_by(Division, Class) %>% summarise(dada2 = sum(dada2,
na.rm = TRUE), mothur = sum(mothur, na.rm = TRUE)) %>% arrange(desc(dada2))3.9.2 Class level
ggplot(both_class) + geom_point(aes(mothur, dada2, color = Division)) + geom_smooth(aes(mothur,
dada2), method = "lm") + ggtitle("Class level count")
summary(lm(both_class$dada2 ~ both_class$mothur))
Call:
lm(formula = both_class$dada2 ~ both_class$mothur)
Residuals:
Min 1Q Median 3Q Max
-10628 -207 -167 1 13462
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 199.5926 373.1680 0.53 0.59
both_class$mothur 2.3935 0.0897 26.69 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2950 on 72 degrees of freedom
Multiple R-squared: 0.908, Adjusted R-squared: 0.907
F-statistic: 712 on 1 and 72 DF, p-value: <2e-16
3.9.3 Genus level
ggplot(both_genus) + geom_point(aes(mothur, dada2, color = Division)) + geom_smooth(aes(mothur,
dada2), method = "lm") + ggtitle("Genus level count")
summary(lm(both_genus$dada2 ~ both_genus$mothur))
Call:
lm(formula = both_genus$dada2 ~ both_genus$mothur)
Residuals:
Min 1Q Median 3Q Max
-2932.4 -447.0 -340.0 69.9 4902.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 455.67600 123.53326 3.689 0.000352 ***
both_genus$mothur 1.95003 0.05936 32.848 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1163 on 110 degrees of freedom
(225 observations deleted due to missingness)
Multiple R-squared: 0.9075, Adjusted R-squared: 0.9066
F-statistic: 1079 on 1 and 110 DF, p-value: < 2.2e-16
# Idem for Chlorophyta
ggplot(filter(both_genus, Division == "Chlorophyta")) + geom_point(aes(mothur,
dada2)) + geom_smooth(aes(mothur, dada2), method = "lm") + ggtitle("Genus level count for Chlorophyta")3.9.4 Genus only found with one method
There are 9% of the genera that are only found with mothur while 57% of the genera are only found with dada2. So mothur is missing a lot of genera.
genus_mothur_only <- both_genus %>% filter(is.na(dada2)) %>% arrange(desc(mothur))
n_genus_mothur_only = nrow(genus_mothur_only)
knitr::kable(head(genus_mothur_only, 30), caption = "Genera found only with mothur - top 30") %>%
kable_styling(latex_options = "scale_down", font_size = 7)| Division | Class | Genus | n_methods | dada2 | mothur |
|---|---|---|---|---|---|
| Dinoflagellata | Dinophyceae | Dinophyceae_unclassified | 1 | NA | 3356 |
| Eukaryota_unclassified | Eukaryota_unclassified | Eukaryota_unclassified | 1 | NA | 2184 |
| Metazoa | Ctenophora | Ctenophora_XX_unclassified | 1 | NA | 1463 |
| Radiolaria | Polycystinea | Spumellarida-Group-I_unclassified | 1 | NA | 1213 |
| Radiolaria | Acantharea | Acantharea_unclassified | 1 | NA | 1042 |
| Radiolaria | Acantharea | Arthracanthida-Symphyacanthida_X_unclassified | 1 | NA | 797 |
| Radiolaria | Polycystinea | Pterocoryothoidea_unclassified | 1 | NA | 617 |
| Radiolaria | Acantharea | Dorataspis | 1 | NA | 568 |
| Radiolaria | Polycystinea | Spumellarida_unclassified | 1 | NA | 559 |
| Radiolaria | Polycystinea | Nassellaria_unclassified | 1 | NA | 402 |
| Apusomonadidae | Apusomonadidae_unclassified | Apusomonadidae_unclassified | 1 | NA | 358 |
| Radiolaria | Polycystinea | Sphaerozoidae_unclassified | 1 | NA | 208 |
| Archaeplastida_unclassified | Archaeplastida_unclassified | Archaeplastida_unclassified | 1 | NA | 123 |
| Haptophyta | Prymnesiophyceae | Prymnesiophyceae_unclassified | 1 | NA | 118 |
| Radiolaria | Acantharea | Acantharea_XX_unclassified | 1 | NA | 108 |
| Dinoflagellata | Syndiniales | Dino-Group-II_unclassified | 1 | NA | 90 |
| Chlorophyta | Pyramimonadales | Pyramimonadales_XX_unclassified | 1 | NA | 86 |
| Apicomplexa | Apicomplexa_X | Colpodellidae_X_unclassified | 1 | NA | 83 |
| Radiolaria | Acantharea | Chaunacanthida_X_unclassified | 1 | NA | 83 |
| Chlorophyta | Prasino-Clade-VII | Prasino-Clade-VII-A_unclassified | 1 | NA | 75 |
| Radiolaria | RAD-B | RAD-B_X_unclassified | 1 | NA | 67 |
| Cercozoa | Cercozoa_unclassified | Cercozoa_unclassified | 1 | NA | 62 |
| Radiolaria | Acantharea | Acantharea-Group-VI_X_unclassified | 1 | NA | 50 |
| Dinoflagellata | Dinophyceae | Gymnodiniaceae_unclassified | 1 | NA | 39 |
| Radiolaria | Radiolaria_unclassified | Radiolaria_unclassified | 1 | NA | 34 |
| Chlorophyta | Chlorophyta_unclassified | Chlorophyta_unclassified | 1 | NA | 33 |
| Rhizaria_unclassified | Rhizaria_unclassified | Rhizaria_unclassified | 1 | NA | 32 |
| Radiolaria | RAD-B | RAD-B-Group-IV_unclassified | 1 | NA | 23 |
| Alveolata_unclassified | Alveolata_unclassified | Alveolata_unclassified | 1 | NA | 15 |
| Radiolaria | Polycystinea | Plagiacanthoidea_unclassified | 1 | NA | 14 |
genus_dada2_only <- both_genus %>% filter(is.na(mothur)) %>% arrange(desc(dada2))
n_genus_dada2_only = nrow(genus_dada2_only)
knitr::kable(head(genus_dada2_only, 30), caption = "Genera found only with dada2 - top 30") %>%
kable_styling(latex_options = "scale_down", font_size = 7)| Division | Class | Genus | n_methods | dada2 | mothur |
|---|---|---|---|---|---|
| Dinoflagellata | Syndiniales | Dino-Group-II-Clade-26_X | 1 | 11300 | NA |
| Ochrophyta | Synurophyceae | Tessellaria | 1 | 11146 | NA |
| Dinoflagellata | Dinophyceae | Heterocapsa | 1 | 6179 | NA |
| Radiolaria | Acantharea | Coleaspis | 1 | 1234 | NA |
| Radiolaria | Polycystinea | Spumellarida-Group-III_X | 1 | 838 | NA |
| Apusomonadidae | Apusomonadidae_Group-1 | Amastigomonas | 1 | 660 | NA |
| Dinoflagellata | Dinophyceae | Azadinium | 1 | 654 | NA |
| Radiolaria | Polycystinea | Sphaerozoum | 1 | 645 | NA |
| Apusomonadidae | Apusomonadidae_Group-2 | Apusomonadidae_Group-2A_XX | 1 | 619 | NA |
| Chlorophyta | Prasino-Clade-VII | Prasino-Clade-VII-A-1 | 1 | 580 | NA |
| Dinoflagellata | Dinophyceae | Lepidodinium | 1 | 499 | NA |
| Radiolaria | Acantharea | Acanthometra | 1 | 432 | NA |
| Radiolaria | Polycystinea | Dictyocoryne | 1 | 368 | NA |
| Ochrophyta | Chrysophyceae | Chrysophyceae_Clade-G_X | 1 | 279 | NA |
| Cercozoa | Phaeodarea | Challengeron | 1 | 265 | NA |
| Cercozoa | Filosa-Chlorarachnea | Minorisa | 1 | 249 | NA |
| Radiolaria | Polycystinea | Spongodiscus | 1 | 244 | NA |
| Stramenopiles_X | MAST | MAST-1C_X | 1 | 243 | NA |
| Radiolaria | Acantharea | Phractopelta | 1 | 240 | NA |
| Radiolaria | Polycystinea | Spongocore | 1 | 231 | NA |
| Radiolaria | Acantharea | Symphyacanthida | 1 | 204 | NA |
| Chlorophyta | Chlorophyceae | Scenedesmus | 1 | 186 | NA |
| Cercozoa | Filosa-Thecofilosea | Protaspa | 1 | 183 | NA |
| Dinoflagellata | Syndiniales | Dino-Group-II-Clade-1_X | 1 | 176 | NA |
| Choanoflagellida | Choanoflagellida_X | Choanoflagellida_XX_Clade-3_X | 1 | 169 | NA |
| Dinoflagellata | Dinophyceae | Blastodinium | 1 | 159 | NA |
| Dinoflagellata | Dinophyceae | Lessardia | 1 | 145 | NA |
| Dinoflagellata | Syndiniales | Dino-Group-II-Clade-23_X | 1 | 145 | NA |
| Radiolaria | Acantharea | Chaunacanthid | 1 | 145 | NA |
| Radiolaria | Acantharea | Staurolithium | 1 | 145 | NA |
3.10 Comparaison per station for the whole community
Compute sum per division and per station
division_sample_dada2 <- dada2_long %>% group_by(Division, sample) %>% summarize(n_seq = sum(n_seq)) %>%
group_by(sample) %>% mutate(n_seq_pct = n_seq/sum(n_seq))
division_sample_mothur <- mothur_long %>% group_by(Division, sample) %>% summarize(n_seq = sum(n_seq)) %>%
group_by(sample) %>% mutate(n_seq_pct = n_seq/sum(n_seq))Collapse divisions that are less than 5% of the total to “Others”
# dada2
# Compute the percentage of sequences in each division and rename the
# sequences with less 1% to others
division_dada2 <- dada2_long %>% group_by(Division) %>% summarize(n_seq = sum(n_seq)) %>%
mutate(n_seq_pct = n_seq/sum(dada2_long$n_seq) * 100) %>% mutate(Division_collapsed = if_else(n_seq_pct <
5, "Others", Division))
# Merge the new column with 'Others'
division_sample_dada2 <- left_join(division_sample_dada2, select(division_dada2,
Division, Division_collapsed))
# mothur
# Compute the percentage of sequences in each division and rename the
# sequences with less 1% to others
division_mothur <- mothur_long %>% group_by(Division) %>% summarize(n_seq = sum(n_seq)) %>%
mutate(n_seq_pct = n_seq/sum(mothur_long$n_seq) * 100) %>% mutate(Division_collapsed = if_else(n_seq_pct <
5, "Others", Division))
# Merge the new column with 'Others'
division_sample_mothur <- left_join(division_sample_mothur, select(division_mothur,
Division, Division_collapsed))Doing the graphs for the absolute abundance
ggplot(division_sample_mothur, aes(x = sample, y = n_seq, fill = Division_collapsed)) +
geom_bar(stat = "identity") + xlab("Stations") + ylab("Sequences") + ylim(0,
35000) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) + ggtitle("Mothur - absolute")
ggplot(division_sample_dada2, aes(x = sample, y = n_seq, fill = Division_collapsed)) +
geom_bar(stat = "identity") + xlab("Stations") + ylab("Sequences") + ylim(0,
35000) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) + ggtitle("dada2 - absolute")Doing the graphs for the relative abundance
ggplot(division_sample_mothur, aes(x = sample, y = n_seq_pct, fill = Division_collapsed)) +
geom_bar(stat = "identity") + xlab("Stations") + ylab("Sequences") + theme(axis.text.x = element_text(angle = 90,
vjust = 0.5)) + ggtitle("Mothur - relative")
ggplot(division_sample_dada2, aes(x = sample, y = n_seq_pct, fill = Division_collapsed)) +
geom_bar(stat = "identity") + xlab("Stations") + ylab("Sequences") + theme(axis.text.x = element_text(angle = 90,
vjust = 0.5)) + ggtitle("dada2 - relative") Notes Some samples have different composition with dada2 than with mothur
Sample 074_070 has a large population of Ochrophyta (collapsed into “Others” - visible by changing the threshold for others). They correspond to Chrysophyceae clade I (wrongly assigned by dada2 to Synurophyceae).
Sample 135_005 has more Dinoflagellata than in mothur
3.11 Comparaison per station for Chlorophyta
Compute sum per division and per station
chlo_genus_sample_dada2 <- dada2_long %>% filter(Division == "Chlorophyta") %>%
group_by(Genus, sample) %>% summarize(n_seq = sum(n_seq)) %>% group_by(sample) %>%
mutate(n_seq_pct = n_seq/sum(n_seq))
chlo_genus_sample_mothur <- mothur_long %>% filter(Division == "Chlorophyta") %>%
group_by(Genus, sample) %>% summarize(n_seq = sum(n_seq)) %>% group_by(sample) %>%
mutate(n_seq_pct = n_seq/sum(n_seq))Collapse genera that are less than 1% of the total to “Others”
# dada2
# Compute the percentage of sequences in each division and rename the
# sequences with less 1% to others
chlo_dada2_long <- dada2_long %>% filter(Division == "Chlorophyta")
chlo_genus_dada2 <- chlo_dada2_long %>% group_by(Genus) %>% summarize(n_seq = sum(n_seq)) %>%
mutate(n_seq_pct = n_seq/sum(chlo_dada2_long$n_seq) * 100) %>% mutate(Genus_collapsed = if_else(n_seq_pct <
1, "Others", Genus))
# Merge the new column with 'Others'
chlo_genus_sample_dada2 <- left_join(chlo_genus_sample_dada2, select(chlo_genus_dada2,
Genus, Genus_collapsed))
# mothur
# Compute the percentage of sequences in each division and rename the
# sequences with less 1% to others
chlo_mothur_long <- mothur_long %>% filter(Division == "Chlorophyta")
chlo_genus_mothur <- chlo_mothur_long %>% group_by(Genus) %>% summarize(n_seq = sum(n_seq)) %>%
mutate(n_seq_pct = n_seq/sum(chlo_mothur_long$n_seq) * 100) %>% mutate(Genus_collapsed = if_else(n_seq_pct <
1, "Others", Genus))
# Merge the new column with 'Others'
chlo_genus_sample_mothur <- left_join(chlo_genus_sample_mothur, select(chlo_genus_mothur,
Genus, Genus_collapsed))Doing the graphs for the relative abundance
ggplot(chlo_genus_sample_mothur, aes(x = sample, y = n_seq_pct, fill = Genus_collapsed)) +
geom_bar(stat = "identity") + xlab("Stations") + ylab("Sequences") + theme(axis.text.x = element_text(angle = 90,
vjust = 0.5)) + ggtitle("Mothur - Chlorophyta relative")
ggplot(chlo_genus_sample_dada2, aes(x = sample, y = n_seq_pct, fill = Genus_collapsed)) +
geom_bar(stat = "identity") + xlab("Stations") + ylab("Sequences") + theme(axis.text.x = element_text(angle = 90,
vjust = 0.5)) + ggtitle("dada2 - Chlorophyta relative")4 Conclusions
The comparison has been made only for stations where both mothur and dada2 were used. There is no overall dramatic differnces in the two methods but since there almost 3 times more sequences for dada2 than for mothur, I propose to use the dada2 data. Moreover with the dada2 sequences we have recovered the stations for which there was a problem with the barcodes.
Mothur
OTUs build at 98%
[1] "Number of sequences with mothur = 120 343"
[1] "Number of Otus with mothur = 747"
[1] "Number of Species with mothur = 155"
[1] "Number of Genera with mothur = 143"
[1] "Number of genera found only with mothur = 31 (9.20 %)"
Dada2
The raw sequences have been processed with cut-adapt
[1] "Number of sequences with dada2 = 302 809"
[1] "Number of Otus with dada2 = 1783"
[1] "Number of Species with dada2 = 358"
[1] "Number of Genera with dada2 = 306"
[1] "Number of genera found only with dada2 = 194 (57.57 %)"