Tutorial - Compare results of Dada2 vs Mothur

1 Aim

Compare the results of the analysis by mothur vs dada2

  • Prerequisite : R_dada2_tutorial must have been run first

2 Directory structure

Paths are relative to the main directory of the GitHub repository

  • ../fastq : fastq files
  • ../dada2 : dada2 processed files
  • ../R_dada2 : This tutorial

4 Phyloseq

4.1 Read phyloseq object for dada2

The file was created by R_dada2_tutorial.Rmd
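A minimal sketch of this reading step, assuming the phyloseq object was saved as an RDS file. The file name `phyloseq_dada2.rds` and its location in `../dada2` are hypothetical, since the actual name is not given here.

```r
# Hypothetical file name and location (not stated in this tutorial)
ps_file <- file.path("..", "dada2", "phyloseq_dada2.rds")

if (file.exists(ps_file)) {
  # readRDS() restores a single R object saved with saveRDS()
  ps_dada2 <- readRDS(ps_file)
  # Load the phyloseq package before working with the object
  print(class(ps_dada2))
} else {
  message("File not found: ", ps_file, " (run R_dada2_tutorial first)")
}
```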

4.2 Create a phyloseq object for the mothur results
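One possible way to build such an object is sketched below, assuming the mothur results are available as an OTU abundance matrix and a taxonomy matrix sharing the same OTU names. The OTU names, sample names and counts are invented for illustration, and the way the original tutorial imported the mothur output may differ.

```r
# Toy OTU abundance matrix (invented values): OTUs in rows, samples in columns
otu_mat <- matrix(
  c(10, 0, 5,
     3, 7, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Otu001", "Otu002"),
                  c("sample_A", "sample_B", "sample_C"))
)

# Toy taxonomy matrix: same row names as the abundance matrix
tax_mat <- matrix(
  c("Chlorophyta", "Mamiellophyceae", "Bathycoccus",  "Bathycoccus_prasinos",
    "Chlorophyta", "Mamiellophyceae", "Ostreococcus", "Ostreococcus_tauri"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Otu001", "Otu002"),
                  c("Division", "Class", "Genus", "Species"))
)

# Assemble the phyloseq object only if the package is installed
if (requireNamespace("phyloseq", quietly = TRUE)) {
  ps_mothur <- phyloseq::phyloseq(
    phyloseq::otu_table(otu_mat, taxa_are_rows = TRUE),
    phyloseq::tax_table(tax_mat)
  )
  print(ps_mothur)
}
```

The row names of the two matrices must match, because phyloseq uses them to link abundances to taxonomy.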

5 Comparison between dada2 and mothur

5.1 Compare at the division level

## Compare by aggregation

5.1.2 Aggregate by Division, Class, Genus, Species

Division        Class             Genus            Species                     n_seq
Centroheliozoa  Centroheliozoa_X  Pterocystida_XX  Pterocystida_XX_sp.            20
Chlorophyta     Mamiellophyceae   Bathycoccus      Bathycoccus_prasinos          367
Chlorophyta     Mamiellophyceae   Micromonas       Micromonas_Clade-B..4          19
Chlorophyta     Mamiellophyceae   Micromonas       Micromonas_Clade-B.E.3         20
Chlorophyta     Mamiellophyceae   Ostreococcus     Ostreococcus_lucimarinus       36
Chlorophyta     Mamiellophyceae   Ostreococcus     Ostreococcus_tauri            222

Division        Class             Genus            Species                     n_seq
Centroheliozoa  Centroheliozoa_X  Pterocystida_XX  Pterocystida_XX_sp.            24
Chlorophyta     Mamiellophyceae   Bathycoccus      Bathycoccus_prasinos          249
Chlorophyta     Mamiellophyceae   Dolichomastix    Dolichomastix_tenuilepis        3
Chlorophyta     Mamiellophyceae   Micromonas       Micromonas_Clade-A.ABC.1-2     21
Chlorophyta     Mamiellophyceae   Micromonas       Micromonas_Clade-B..4          10
Chlorophyta     Mamiellophyceae   Micromonas       Micromonas_Clade-B.E.3         11
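Tables like the two above can be obtained by summing read counts over all sequences sharing the same taxonomy. The sketch below uses base R `aggregate()` on a toy table with invented counts. The data frame name `taxo` is hypothetical, and the original tutorial may have used a different function for this step.

```r
# Toy table with one row per sequence variant (invented counts)
taxo <- data.frame(
  Division = c("Chlorophyta", "Chlorophyta", "Chlorophyta", "Centroheliozoa"),
  Class    = c("Mamiellophyceae", "Mamiellophyceae", "Mamiellophyceae",
               "Centroheliozoa_X"),
  Genus    = c("Bathycoccus", "Bathycoccus", "Ostreococcus", "Pterocystida_XX"),
  Species  = c("Bathycoccus_prasinos", "Bathycoccus_prasinos",
               "Ostreococcus_tauri", "Pterocystida_XX_sp."),
  n_seq    = c(100, 50, 30, 20),
  stringsAsFactors = FALSE
)

# Sum n_seq within each Division / Class / Genus / Species combination
by_species <- aggregate(n_seq ~ Division + Class + Genus + Species,
                        data = taxo, FUN = sum)

# Sort alphabetically by taxonomy, as in the tables above
by_species <- by_species[order(by_species$Division, by_species$Class,
                               by_species$Genus, by_species$Species), ]
print(by_species)
```

Aggregating at a coarser level (for example Class only) just means shortening the right-hand side of the formula.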

5.1.4 Plot at Class level


Call:
lm(formula = both_class$dada2 ~ both_class$mothur)

Residuals:
    Min      1Q  Median      3Q     Max 
-96.108 -15.934   3.425  11.841 113.903 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        4.14811   11.64032   0.356    0.726    
both_class$mothur  1.68365    0.05887  28.597   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 39.53 on 18 degrees of freedom
Multiple R-squared:  0.9785,    Adjusted R-squared:  0.9773 
F-statistic: 817.8 on 1 and 18 DF,  p-value: < 2.2e-16
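A regression of this kind can be sketched as follows, assuming one table of read counts per class for each pipeline. The class names and counts are invented, and only the column names `dada2` and `mothur` are taken from the output above.

```r
# Toy per-class read counts (invented values)
dada2_class  <- data.frame(Class = c("A", "B", "C", "D"),
                           n_seq = c(170, 85, 35, 340))
mothur_class <- data.frame(Class = c("A", "B", "C", "D"),
                           n_seq = c(100, 50, 20, 200))

# Join the two tables on Class, then give the count columns short names
both_class <- merge(dada2_class, mothur_class, by = "Class",
                    suffixes = c("_dada2", "_mothur"))
names(both_class) <- c("Class", "dada2", "mothur")

# Linear regression of dada2 counts on mothur counts
fit_class <- lm(dada2 ~ mothur, data = both_class)
print(coef(fit_class))

# Scatter plot with the fitted line (only drawn in an interactive session)
if (interactive()) {
  plot(both_class$mothur, both_class$dada2,
       xlab = "mothur reads", ylab = "dada2 reads")
  abline(fit_class)
}
```

The slope of the fit estimates how many dada2 reads are obtained per mothur read, which is the quantity reported in the Estimate column above.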

5.1.5 Plot at Species level


Call:
lm(formula = both_species$dada2 ~ both_species$mothur)

Residuals:
    Min      1Q  Median      3Q     Max 
-63.624  -6.747   2.490  13.762  42.152 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          1.47566    5.57181   0.265    0.793    
both_species$mothur  1.72349    0.04935  34.922   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.23 on 26 degrees of freedom
  (36 observations deleted due to missingness)
Multiple R-squared:  0.9791,    Adjusted R-squared:  0.9783 
F-statistic:  1220 on 1 and 26 DF,  p-value: < 2.2e-16
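The line "36 observations deleted due to missingness" means that `lm()` dropped rows containing `NA`. One plausible cause, assumed here, is that the two species tables were joined so as to keep species found by only one pipeline, which leaves the other count missing. The sketch below reproduces this behaviour on toy data with invented species names and counts.

```r
# Toy per-species counts: sp_d is only in dada2, sp_e only in mothur
dada2_species  <- data.frame(Species = c("sp_a", "sp_b", "sp_c", "sp_d"),
                             dada2   = c(34, 17, 51, 9))
mothur_species <- data.frame(Species = c("sp_a", "sp_b", "sp_c", "sp_e"),
                             mothur  = c(20, 10, 30, 4))

# Full outer join: species missing from one table get NA in that column
both_species <- merge(dada2_species, mothur_species,
                      by = "Species", all = TRUE)

# lm() omits rows with NA by default (na.action = na.omit)
fit_species <- lm(dada2 ~ mothur, data = both_species)

n_total   <- nrow(both_species)             # rows after the join
n_used    <- nobs(fit_species)              # rows actually used in the fit
n_dropped <- length(na.action(fit_species)) # rows dropped because of NA
print(c(total = n_total, used = n_used, dropped = n_dropped))
```

Under this assumption, the regression only describes species detected by both pipelines. Species seen by a single pipeline do not influence the slope.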

6 Conclusion on the dada2 pipeline

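  • The dada2 pipeline yields about 1.7 times as many reads as mothur (regression slopes of 1.68 at the class level and 1.72 at the species level)
  • Read counts from the two pipelines are strongly correlated at both the class and species levels (R-squared of about 0.98 in both regressions)
  • The dada2 pipeline is very fast; the longest step is the taxonomy assignment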
  • It offers the advantage of having everything performed under R.

Daniel Vaulot

23 11 2018