NZ 2017 - Step 06
Create Phyloseq and Fasta files

Load the variables common to the different scripts and the necessary libraries

source("00-NZ_2017_init.R", echo = FALSE)

1 Steps

  • Read Mothur database file (Excel format)
  • Creating phyloseq_mothur and fasta files.
  • Reassign the otus using dada2
  • Create a new phyloseq_dada2 file based on the dada2 assignement.

2 Directory structure

  • /blast : BLAST output
  • /mothur : mothur output

3 Files used

  • NZ_2017_mothur.xlsx : contains the mothur database file with otus at 98% similarity
  • phyloseq_nz_2017_mothur.rds : phyloseq file based on the mothur taxonomic assignement
  • nz_2017_otu_mothur_taxo.fas : otu fasta file with mothur taxo on defination line
  • nz_2017_otu_mothur.fas : otu fasta file without taxonomy
  • phyloseq_nz_2017_dada2.rds : phyloseq file based on dada2 taxonomic assignement

4 Processing

4.1 To do

Note for mater : Make a function * phyloseq_import_mothur(phyloseq_file, excel_file, otu_sheet, sample_sheet, sample_reg_ex, taxo_reg_ex).

4.2 Read the Mothur Excel file

mothur_file_xls <- paste0(path_files, "NZ_2017_mothur.xlsx")
samples_metadata <- read_excel(mothur_file_xls, sheet = "samples")
otu_database <- read_excel(mothur_file_xls, sheet = "otus_0.98")

4.3 Input data from XLS file

# 1. samples table : row names are labelled by pcr_codes
samples_df <- samples_metadata
row.names(samples_df) <- samples_df$sample_code_mothur

# 2. otu table : select only OTU column and abundance columns (Use a regular
# expression...)
otu <- otu_database %>% select(OTUNumber, matches("^(BT|CS|CTD|MS|N|UW|STR)."))
row.names(otu) <- otu$OTUNumber
otu <- otu %>% select(-OTUNumber)

# 3. Taxonomy table
tax <- otu_database %>% select(OTUNumber, starts_with("Taxo"))
row.names(tax) <- tax$OTUNumber
tax <- tax %>% select(-OTUNumber)

4.4 Create and save to phyloseq object

# Transform into matrixes
# ---------------------------------------------------
otu_mat <- as.matrix(otu)
tax_mat <- as.matrix(tax)

# Transform to phyloseq object and save to Rdata file
# --------------------------
OTU = otu_table(otu_mat, taxa_are_rows = TRUE)
TAX = tax_table(tax_mat)
samples = sample_data(samples_df)

phyloseq_mothur <- phyloseq(OTU, TAX, samples)
saveRDS(phyloseq_mothur, file = "phyloseq_nz_2017_mothur.rds")  # Can be loaded with readRDS(file = 'phyloseq_nz_2017_mothur.rds')

4.5 Save sequences in fasta files with and without mothur taxonomy

File in compressed format is called nz_2017_otu_mothur.fas.gz in the /mothur directory

# 4. sequences
otu_sequence <- otu_database %>% select(OTUNumber, matches("^Taxo."), repSeq) %>% 
    transmute(seq_name = OTUNumber, sequence = repSeq, supergroup = Taxo2, division = Taxo3, 
        class = Taxo4, order = Taxo5, family = Taxo6, genus = Taxo7, species = Taxo8)
fasta_write(otu_sequence, str_c(path_mothur, "nz_2017_otu_mothur_taxo.fas"), 
    compress = FALSE)
[1] TRUE
fasta_write(otu_sequence, str_c(path_mothur, "nz_2017_otu_mothur_taxo.fas.gz"), 
    compress = TRUE)
[1] TRUE
fasta_write(otu_sequence, str_c(path_mothur, "nz_2017_otu_mothur.fas"), compress = FALSE, 
    taxo_include = FALSE)
[1] TRUE
fasta_write(otu_sequence, str_c(path_mothur, "nz_2017_otu_mothur.fas.gz"), compress = TRUE, 
    taxo_include = FALSE)
[1] TRUE

4.6 Reassign the sequences using dada2 assigner

See the BLAST analysis file (07). Mothur assingment add Unassigned which is really a problem for next step with phyloseq_mothur, so the otus are reassigned with dada2 and the phyloseq file written again with the new taxonomy (file “phyloseq_dada2_nz_2017.rds”)

tax_list_dada2 <- dada2_assign(seq_file_name = str_c(path_mothur, "nz_2017_otu_mothur.fas"))

4.7 Write a new phyloseq file with taxonomy assigned by dada2

tax <- read_tsv(str_c(path_mothur, "nz_2017_otu_mothur.dada2.taxo"))
row.names(tax) <- tax$seq_name
tax <- tax %>% select(-seq_name)
tax_mat <- as.matrix(tax)
TAX = tax_table(tax_mat)
phyloseq_dada2 <- phyloseq(OTU, TAX, samples)
saveRDS(phyloseq_dada2, file = "phyloseq_nz_2017_dada2.rds")  # Can be loaded with readRDS(file = 'phyloseq_nz_2017_dada2.rds')

Daniel Vaulot

07 05 2018