NZ 2017 - Step 06
Create Phyloseq and Fasta files
NZ 2017 - Step 06
Create Phyloseq and Fasta files
Load the variables common to the different scripts and the necessary libraries
source("00-NZ_2017_init.R", echo = FALSE)1 Steps
- Read Mothur database file (Excel format)
- Creating phyloseq_mothur and fasta files.
- Reassign the otus using dada2
- Create a new phyloseq_dada2 file based on the dada2 assignement.
2 Directory structure
- /blast : BLAST output
- /mothur : mothur output
3 Files used
- NZ_2017_mothur.xlsx : contains the mothur database file with otus at 98% similarity
- phyloseq_nz_2017_mothur.rds : phyloseq file based on the mothur taxonomic assignement
- nz_2017_otu_mothur_taxo.fas : otu fasta file with mothur taxo on defination line
- nz_2017_otu_mothur.fas : otu fasta file without taxonomy
- phyloseq_nz_2017_dada2.rds : phyloseq file based on dada2 taxonomic assignement
4 Processing
4.1 To do
Note for mater : Make a function * phyloseq_import_mothur(phyloseq_file, excel_file, otu_sheet, sample_sheet, sample_reg_ex, taxo_reg_ex).
4.2 Read the Mothur Excel file
mothur_file_xls <- paste0(path_files, "NZ_2017_mothur.xlsx")
samples_metadata <- read_excel(mothur_file_xls, sheet = "samples")
otu_database <- read_excel(mothur_file_xls, sheet = "otus_0.98")4.3 Input data from XLS file
# 1. samples table : row names are labelled by pcr_codes
samples_df <- samples_metadata
row.names(samples_df) <- samples_df$sample_code_mothur
# 2. otu table : select only OTU column and abundance columns (Use a regular
# expression...)
otu <- otu_database %>% select(OTUNumber, matches("^(BT|CS|CTD|MS|N|UW|STR)."))
row.names(otu) <- otu$OTUNumber
otu <- otu %>% select(-OTUNumber)
# 3. Taxonomy table
tax <- otu_database %>% select(OTUNumber, starts_with("Taxo"))
row.names(tax) <- tax$OTUNumber
tax <- tax %>% select(-OTUNumber)4.4 Create and save to phyloseq object
# Transform into matrixes
# ---------------------------------------------------
otu_mat <- as.matrix(otu)
tax_mat <- as.matrix(tax)
# Transform to phyloseq object and save to Rdata file
# --------------------------
OTU = otu_table(otu_mat, taxa_are_rows = TRUE)
TAX = tax_table(tax_mat)
samples = sample_data(samples_df)
phyloseq_mothur <- phyloseq(OTU, TAX, samples)
saveRDS(phyloseq_mothur, file = "phyloseq_nz_2017_mothur.rds") # Can be loaded with readRDS(file = 'phyloseq_nz_2017_mothur.rds')4.5 Save sequences in fasta files with and without mothur taxonomy
File in compressed format is called nz_2017_otu_mothur.fas.gz in the /mothur directory
# 4. sequences
otu_sequence <- otu_database %>% select(OTUNumber, matches("^Taxo."), repSeq) %>%
transmute(seq_name = OTUNumber, sequence = repSeq, supergroup = Taxo2, division = Taxo3,
class = Taxo4, order = Taxo5, family = Taxo6, genus = Taxo7, species = Taxo8)
fasta_write(otu_sequence, str_c(path_mothur, "nz_2017_otu_mothur_taxo.fas"),
compress = FALSE)[1] TRUE
fasta_write(otu_sequence, str_c(path_mothur, "nz_2017_otu_mothur_taxo.fas.gz"),
compress = TRUE)[1] TRUE
fasta_write(otu_sequence, str_c(path_mothur, "nz_2017_otu_mothur.fas"), compress = FALSE,
taxo_include = FALSE)[1] TRUE
fasta_write(otu_sequence, str_c(path_mothur, "nz_2017_otu_mothur.fas.gz"), compress = TRUE,
taxo_include = FALSE)[1] TRUE
4.6 Reassign the sequences using dada2 assigner
See the BLAST analysis file (07). Mothur assingment add Unassigned which is really a problem for next step with phyloseq_mothur, so the otus are reassigned with dada2 and the phyloseq file written again with the new taxonomy (file “phyloseq_dada2_nz_2017.rds”)
tax_list_dada2 <- dada2_assign(seq_file_name = str_c(path_mothur, "nz_2017_otu_mothur.fas"))4.7 Write a new phyloseq file with taxonomy assigned by dada2
tax <- read_tsv(str_c(path_mothur, "nz_2017_otu_mothur.dada2.taxo"))
row.names(tax) <- tax$seq_name
tax <- tax %>% select(-seq_name)
tax_mat <- as.matrix(tax)
TAX = tax_table(tax_mat)
phyloseq_dada2 <- phyloseq(OTU, TAX, samples)
saveRDS(phyloseq_dada2, file = "phyloseq_nz_2017_dada2.rds") # Can be loaded with readRDS(file = 'phyloseq_nz_2017_dada2.rds')