T-BAS Taxonomy Resolver
The T-BAS Taxonomy Resolver generates a single, consolidated taxonomy from one or more T-BAS placement runs by reconciling phylogenetic placement with multiple reference taxonomies and packaging the results for downstream analysis.
This page documents the full functionality exposed in the resolver interface, including:
- accession-based input
- taxonomy strategy presets
- downstream input files and exports
- DESeq2 analysis options
- abundance and core microbiome outputs
- advanced taxonomy-resolution controls
- PopPhy-CNN labeling options
- Faith's phylogenetic diversity options
- species-resolution limits and interpretation
What the resolver does
At a high level, the resolver:
- accepts one or more T-BAS accession IDs
- retrieves taxonomy and placement information linked to those accessions
- reconciles phylogenetic placement with one or more reference taxonomies
- applies confidence- and conflict-aware rules to produce a single best taxonomy per feature
- optionally prepares outputs for downstream tools such as QIIME / phyloseq / BIOM / iTOL / DESeq2 / PopPhy-CNN / Faith's PD
This makes the resolver more than a simple extractor. It is a taxonomy consolidation and downstream export tool.
Interpreting resolver output columns
The resolver output is intentionally more detailed than a simple taxonomy table. It is designed to show not only what taxonomy was assigned, but also why that assignment was made, which sources agreed or disagreed, and whether the tree forced the resolver to be more conservative.
This section explains the main output columns in practical terms and gives examples of how to read them.
Why these columns matter
A user might expect the resolver to do something simple such as:
- read a T-BAS placement
- look up a database taxonomy
- return a single name
But the resolver is doing something more sophisticated. It is trying to answer questions such as:
- Did T-BAS, SILVA, RDP, and NCBI all agree?
- Did they only agree to genus, but not species?
- Was the database genus even present in the reference tree?
- Did the tree suggest that the database label was too specific or phylogenetically inconsistent?
- Should species be retained, or should the assignment stop at genus?
The output columns are there to make those decisions visible and auditable.
A worked example
Consider a row like this:
FeatureID: OTU1
best_taxonomy: Bacteria; Bacillota; Bacilli; Bacillales; Caryophanaceae; Bhargavaea; cecembensis
best_taxonomy_confidence_filtered: Bacteria; Bacillota; Bacilli; Bacillales; Bacillaceae; Neobacillus; Unclassified
deepest_confident_rank: GENUS
tbas_proxy_lineage: TRUE
tbas_proxy_rank: FAMILY
tree_contains_tbas_genus: TRUE
tree_contains_database_genus: FALSE
tbas_forced_placement: TRUE
sources_used: T-BAS,SILVA,RDP,NCBI
conflict_flag: SILVA
conflict_rank: CLASS
agreement_sources: NCBI,RDP,T-BAS
taxonomy_change_type: tree_forced_adjustment
change_reason_detail: database_genus_absent_from_tree
taxonomy_change_summary: FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus
confidence: 0.884274023
This should be interpreted as follows:
- The resolver initially had a more detailed lineage in best_taxonomy.
- After applying tree-aware checks and confidence filtering, the final recommended lineage changed to Neobacillus and the species label was removed.
- tree_contains_database_genus = FALSE means the database genus supporting the original label was not represented in the reference tree.
- tbas_forced_placement = TRUE means the tree prevented the resolver from simply keeping the database label.
- deepest_confident_rank = GENUS means genus was considered defensible, but species was not.
- taxonomy_change_type = tree_forced_adjustment means the final taxonomy differs from the original because the tree overruled the database-supported label.
- change_reason_detail = database_genus_absent_from_tree gives the specific reason for that adjustment.
This is exactly the kind of situation where the output columns are essential. Without them, the user would only see that the taxonomy changed and would not know why.
Core taxonomy fields
FeatureID
The feature identifier from the input counts table or pretty report.
Examples:
- OTU1
- ASV_24
This is the key used to link taxonomy to counts, metadata, QIIME outputs, phyloseq tables, and downstream analyses.
best_taxonomy
This is the full selected taxonomy before conservative filtering is applied.
It reflects the best-supported lineage that the resolver could assemble, using the selected policies and source logic.
This field is useful when:
- you want to inspect the most detailed interpretation
- you want to see what the resolver would keep before species or lower-rank confidence filtering is applied
- you want to compare detailed assignments against the more conservative final result
Example
best_taxonomy:
Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Lelliottia; amnigena
This means the resolver found enough information to support a detailed lineage down to species before filtering.
best_taxonomy_confidence_filtered
This is the final recommended taxonomy and is usually the most important field for downstream use.
It is built by taking best_taxonomy and then applying:
- confidence filtering
- tree-aware conflict handling
- species-mode rules
- synonym-aware reconciliation
- rank padding, if enabled
This is typically the field used for:
- QIIME
- phyloseq
- PopPhy-CNN
- bundled exports
- publication-facing summaries
Example
best_taxonomy_confidence_filtered:
Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Enterobacter; Unclassified
This means:
- the resolver accepts the lineage to genus
- species is not considered reliable enough to keep
- the final export should stop at genus
deepest_confident_rank
This field tells you the deepest (most specific) rank that the resolver considers reliable after filtering.
It is extremely useful because it summarizes, in one word, where the taxonomy should stop.
Possible values include:
- PHYLUM
- CLASS
- ORDER
- FAMILY
- GENUS
- SPECIES
Example 1
deepest_confident_rank: GENUS
Interpretation:
- taxonomy is trusted down to genus
- species was dropped or not defensible
Example 2
deepest_confident_rank: FAMILY
Interpretation:
- genus is not considered safe
- the assignment should only be interpreted at family level
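As a minimal sketch of how deepest_confident_rank can be applied downstream, the snippet below trims a lineage string at the stated rank. It assumes the seven-field, semicolon-separated layout used in the examples on this page; the name DOMAIN for the first field and the helper truncate_lineage are illustrative and not part of the resolver.

```python
# Rank order is an assumption based on the seven-field lineages shown on this page.
RANKS = ["DOMAIN", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"]


def truncate_lineage(lineage: str, deepest_rank: str) -> str:
    """Keep ranks down to and including deepest_rank and drop everything below it."""
    fields = [field.strip() for field in lineage.split(";")]
    keep = RANKS.index(deepest_rank.upper()) + 1
    return "; ".join(fields[:keep])


lineage = (
    "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; "
    "Enterobacteriaceae; Enterobacter; Unclassified"
)
genus_level = truncate_lineage(lineage, "GENUS")
family_level = truncate_lineage(lineage, "FAMILY")
print(genus_level)
print(family_level)
```

With GENUS the lineage ends at Enterobacter; with FAMILY it ends at Enterobacteriaceae.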
confidence
This is the numeric support value associated with the selected assignment, often based on the EPA likelihood weight ratio (LWR) or a related confidence metric.
A higher value usually indicates stronger placement support.
Example
confidence: 0.995802794
This would usually be interpreted as very strong support.
Example
confidence: 0.51
This would be much weaker and may be more likely to trigger conservative fallback behavior.
Important:
- a high confidence value does not automatically guarantee species-level support
- the species-mode rules and tree checks can still remove species if the lineage is not considered defensible
Tree-aware interpretation columns
These columns explain how the resolver used the phylogenetic tree to constrain or reinterpret taxonomy.
tbas_proxy_lineage
This indicates whether the T-BAS lineage should be interpreted as a proxy lineage rather than a definitive direct match.
A proxy lineage means:
- the placement indicates the nearest lineage available in the tree
- but the exact taxon of interest may not actually be represented in the tree
This is common when:
- the reference tree does not contain every species
- placements are being made against a broader lineage framework
Example
tbas_proxy_lineage: TRUE
Interpretation:
- the T-BAS lineage is being used as the nearest supported phylogenetic lineage
- it should not automatically be read as exact species identity
tbas_proxy_rank
This tells you the deepest rank at which the T-BAS lineage is acting as a reasonable proxy.
Example
tbas_proxy_rank: FAMILY
Interpretation:
- the resolver considers the lineage informative only down to family in a proxy sense
- deeper ranks such as genus or species are less trustworthy as direct identity labels
Example
tbas_proxy_rank: GENUS
Interpretation:
- the placement likely supports genus-level interpretation
- but not necessarily species
tree_contains_tbas_genus
This tells you whether the genus implied by the T-BAS lineage is actually present in the reference tree.
Example
tree_contains_tbas_genus: TRUE
Interpretation:
- the T-BAS genus is represented in the tree
- the placement has a phylogenetic anchor at genus level
Example
tree_contains_tbas_genus: FALSE
Interpretation:
- the genus implied by T-BAS is not literally represented in the tree
- the assignment may only be interpretable at a higher rank or as a proxy lineage
tree_contains_database_genus
This tells you whether the genus coming from the database-supported taxonomy is present in the reference tree.
This is a critical tree-vs-database consistency check.
Example
tree_contains_database_genus: TRUE
Interpretation:
- the database genus is represented in the tree
- there is no immediate tree-representation conflict at genus level
Example
tree_contains_database_genus: FALSE
Interpretation:
- the database genus is not represented in the tree
- the resolver may consider the database assignment too specific or not phylogenetically supported by the current reference tree
tbas_forced_placement
This indicates that the final filtered taxonomy was affected by a tree-forced placement decision.
When this is TRUE, it usually means:
- a database label was available
- but the tree did not support simply keeping it
- so the resolver adjusted the result to maintain phylogenetic consistency
Example
tbas_forced_placement: TRUE
Interpretation:
- the tree overruled a more naive database-based interpretation
- the final taxonomy reflects phylogenetic constraint, not just label agreement
Conflict and agreement columns
These columns show whether sources agreed, which ones disagreed, and where the disagreement started.
sources_used
This is the list of taxonomy sources that contributed non-empty information for the feature.
Examples:
- T-BAS,SILVA,RDP,NCBI
- T-BAS,UNITE,NCBI
This tells you which sources actually participated in the decision.
Example
sources_used: T-BAS,SILVA,RDP,NCBI
Interpretation:
- all four of those sources had usable taxonomy for this feature
- consensus and conflict logic considered all of them
agreement_sources
These are the sources that agree with the final retained taxonomy at the decisive conflict point.
This is one of the most important transparency fields because it tells you which sources support the final answer.
Example
agreement_sources: NCBI,RDP,T-BAS
Interpretation:
- the final lineage is supported by those three sources
- another source, such as SILVA, disagreed in a meaningful way
conflict_flag
This field identifies which source or sources are in conflict.
It does not list agreeing sources. It is designed to flag the disagreeing source(s).
Example
conflict_flag: SILVA
Interpretation:
- SILVA disagreed with the supported consensus
Example
conflict_flag: none
Interpretation:
- all relevant sources agreed after synonym handling
conflict_rank
This tells you the first rank at which a meaningful disagreement begins.
Example
conflict_rank: GENUS
Interpretation:
- sources agree above genus
- disagreement begins at genus
Example
conflict_rank: CLASS
Interpretation:
- disagreement begins much higher
- this is a more substantial conflict
This is useful because it tells you whether the conflict is:
- minor and late in the lineage
- or broad and deep in the taxonomy
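The idea behind conflict_rank can be sketched as a rank-by-rank comparison of the source lineages. This is a simplified illustration, not the resolver's logic: it ignores synonym handling and confidence, and both the rank names and the function first_conflict_rank are assumptions made for the example.

```python
from typing import Dict, Optional

# Rank order is an assumption based on the seven-field lineages shown on this page.
RANKS = ["DOMAIN", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"]


def first_conflict_rank(lineages: Dict[str, str]) -> Optional[str]:
    """Return the first rank at which the sources do not share a single name."""
    split = [[f.strip() for f in lin.split(";")] for lin in lineages.values()]
    for i, rank in enumerate(RANKS):
        # Only compare sources that actually report a name at this rank.
        names = {fields[i] for fields in split if i < len(fields) and fields[i]}
        if len(names) > 1:
            return rank
    return None


sources = {
    "T-BAS": "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Lelliottia",
    "NCBI": "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Enterobacter",
    "RDP": "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Enterobacter",
}
rank = first_conflict_rank(sources)
print(rank)
```

Here the sources agree through family and first diverge at genus, which corresponds to the "minor and late" kind of conflict.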
Change-tracking columns
These explain why the final confidence-filtered taxonomy differs from the more detailed raw choice.
taxonomy_change_type
This is a high-level label describing the reason the final taxonomy changed.
Common values include:
- none
- synonym_replacement
- consensus_override
- tree_forced_adjustment
Example
taxonomy_change_type: none
Interpretation:
- the filtered taxonomy is effectively the same as the raw selected taxonomy
Example
taxonomy_change_type: consensus_override
Interpretation:
- the final taxonomy changed because a better-supported cross-source consensus was chosen
Example
taxonomy_change_type: tree_forced_adjustment
Interpretation:
- the tree forced the resolver to step away from the more naive or database-driven label
change_reason_detail
This is a more specific explanation nested inside taxonomy_change_type.
The values in this field are terse internal labels that are rarely self-explanatory, so each common value is explained here.
Examples include:
- database_genus_absent_from_tree
- multi_source_consensus_override
- consensus_override_with_genus_reclassification
Example
change_reason_detail: database_genus_absent_from_tree
Interpretation:
- the database genus was not represented in the reference tree
- therefore the resolver could not simply keep that database genus as the final answer
- a tree-aware correction was applied
Example
change_reason_detail: multi_source_consensus_override
Interpretation:
- several sources supported a different lineage than the raw detailed choice
- the final taxonomy followed the stronger consensus
Example
change_reason_detail: consensus_override_with_genus_reclassification
Interpretation:
- the consensus changed the genus, and the visible change was best described as a genus-level reclassification
taxonomy_change_summary
This is a compact, human-readable summary of the visible taxonomic changes.
Example
taxonomy_change_summary: GENUS:Lelliottia->Enterobacter
Interpretation:
- the final confidence-filtered lineage changed the genus from Lelliottia to Enterobacter
Example
taxonomy_change_summary: FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus
Interpretation:
- both family and genus changed
- this was not a tiny species-level tweak but a more substantial lineage correction
This field is especially useful when scanning large result tables by eye.
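For programmatic use, the summary string can be split into structured records. The sketch below assumes the RANK:old->new format with semicolon separators seen in the examples above; RankChange and parse_change_summary are illustrative names, not resolver functions.

```python
from typing import List, NamedTuple


class RankChange(NamedTuple):
    rank: str
    old: str
    new: str


def parse_change_summary(summary: str) -> List[RankChange]:
    """Split 'RANK:old->new;RANK:old->new' into one record per changed rank."""
    changes = []
    for part in summary.split(";"):
        part = part.strip()
        if not part:
            continue
        rank, _, names = part.partition(":")
        old, _, new = names.partition("->")
        changes.append(RankChange(rank.strip(), old.strip(), new.strip()))
    return changes


changes = parse_change_summary(
    "FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus"
)
for change in changes:
    print(f"{change.rank}: {change.old} -> {change.new}")
```

Counting the records per row gives a quick way to separate single-rank adjustments from broader lineage corrections.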
synonym_applied
This indicates whether an explicit synonym or reclassification rule contributed to the final result.
Example
synonym_applied: TRUE
Interpretation:
- part of the final taxonomy depended on a synonym or updated taxonomic equivalence
- for example, an old and new lineage name may have been reconciled
Example
synonym_applied: FALSE
Interpretation:
- the final result did not require synonym handling in a meaningful way
Special example: database_genus_absent_from_tree
This is one of the most important terms to document because users often do not immediately understand it.
What it means
The database-supported genus was not represented as a tip or lineage in the current reference tree.
That does not automatically mean the database genus is wrong in an absolute sense. It means:
- the current tree cannot directly support that genus-level label
- the resolver must decide whether to:
- follow the tree
- follow the database
- or back off to a more conservative rank
Why this matters
The resolver is designed to be phylogeny-aware, not just database-aware.
So if the database says:
Genus = Bhargavaea
but the tree does not contain Bhargavaea at all, then the resolver cannot simply claim that the phylogenetic placement supports that genus.
Instead, it may:
- keep a broader lineage that the tree does support
- switch to a neighboring supported lineage
- or fall back to genus or family-level confidence filtering
Example interpretation
Suppose you see:
tree_contains_database_genus: FALSE
tbas_forced_placement: TRUE
taxonomy_change_type: tree_forced_adjustment
change_reason_detail: database_genus_absent_from_tree
This means:
- a database-based genus was available
- that genus was not represented in the tree
- the tree forced a more conservative or alternative lineage
- the final answer was adjusted to stay phylogenetically defensible
This is exactly the kind of case where the resolver is doing something more careful than a database lookup.
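To audit these cases in bulk, the flag combination above can be used as a row filter. The following sketch assumes a tab-separated export with the column names documented on this page; the inline two-row table is made up for illustration, and real exports may use a different delimiter or column order.

```python
import csv
import io

# Hypothetical two-row excerpt of a resolver table (tab-separated is an assumption).
TABLE = (
    "FeatureID\ttree_contains_database_genus\ttbas_forced_placement\t"
    "taxonomy_change_type\tchange_reason_detail\n"
    "OTU1\tFALSE\tTRUE\ttree_forced_adjustment\tdatabase_genus_absent_from_tree\n"
    "OTU2\tTRUE\tFALSE\tnone\t\n"
)


def tree_forced_features(handle):
    """Yield FeatureIDs where the tree overruled a database genus absent from the tree."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if (
            row["tree_contains_database_genus"].upper() == "FALSE"
            and row["tbas_forced_placement"].upper() == "TRUE"
            and row["change_reason_detail"] == "database_genus_absent_from_tree"
        ):
            yield row["FeatureID"]


flagged = list(tree_forced_features(io.StringIO(TABLE)))
print(flagged)
```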
Single-source comparison columns
When enabled, the resolver can add source-specific comparison columns.
These are designed for method comparison, not for ordinary default use.
Compact source-specific fields
In the cleaner version of the exporter, the preferred comparison columns are:
- tbas_only_best_taxonomy_confidence_filtered
- ncbi_only_best_taxonomy_confidence_filtered
- rdp_only_best_taxonomy_confidence_filtered
- unite_only_best_taxonomy_confidence_filtered
- silva_only_best_taxonomy_confidence_filtered
These show what the final confidence-filtered taxonomy would look like if only that source were used.
Example
If you see:
best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
tbas_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
ncbi_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
Interpretation:
- the consensus and the single-source methods agree
- the assignment is stable across methods
If instead you see:
best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
tbas_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Lelliottia; Unclassified
ncbi_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
Interpretation:
- T-BAS and NCBI disagree at genus level
- the consensus logic resolved that disagreement in favor of Enterobacter
This kind of comparison is extremely useful when auditing why a consensus taxonomy was chosen.
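The comparison described above can be automated across a table. The sketch below reports which single-source columns disagree with the consensus at genus level. It assumes padded lineages in which genus is the second-to-last field, and the helper names genus_of and disagreeing_sources are illustrative.

```python
from typing import Dict, List

SUFFIX = "_only_best_taxonomy_confidence_filtered"


def genus_of(lineage: str) -> str:
    """Return the genus, assuming it is the second-to-last field of a padded lineage."""
    fields = [field.strip() for field in lineage.split(";")]
    return fields[-2]


def disagreeing_sources(row: Dict[str, str]) -> List[str]:
    """List source prefixes whose single-source genus differs from the consensus genus."""
    consensus = genus_of(row["best_taxonomy_confidence_filtered"])
    return sorted(
        column[: -len(SUFFIX)]
        for column, value in row.items()
        if column.endswith(SUFFIX) and value and genus_of(value) != consensus
    )


prefix = "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; "
row = {
    "best_taxonomy_confidence_filtered": prefix + "Enterobacter; Unclassified",
    "tbas_only_best_taxonomy_confidence_filtered": prefix + "Lelliottia; Unclassified",
    "ncbi_only_best_taxonomy_confidence_filtered": prefix + "Enterobacter; Unclassified",
}
differing = disagreeing_sources(row)
print(differing)
```

An empty result means the assignment is stable across the compared sources at genus level.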
Why the documentation must explain examples, not just field names
A simple glossary is not enough for this tool because many of the fields are not ordinary taxonomy labels. They are decision-trace fields.
For example:
database_genus_absent_from_tree is not just a label. It represents a whole logic pathway:
- database genus present
- tree genus missing
- tree-aware override
- conservative fallback
The documentation therefore needs to explain:
- what the term literally means
- why the resolver uses it
- what kind of biological or analytical situation produces it
- how a user should interpret it
That is why these worked examples are so important.
Supported reference taxonomies
The resolver can incorporate the following sources when available:
- SILVA for 16S/18S
- UNITE for ITS
- RDP for 16S
- NCBI RefSeq
- T-BAS phylogenetic placement
The final taxonomy can reflect agreement across these sources, optionally weighted or prioritized according to the advanced settings described below.
Interface overview
The page is organized into major sections:
- Accession input
- Downstream analysis inputs and exports
- DESeq2 analysis
- Annotation
- Taxonomy strategy
- Advanced / expert settings
- Faith's PD mode and inputs
- Faith's PD analysis settings
The documentation link shown near the page header should point here so users can understand what each option does before running the tool.
Resolver workflow
The normal workflow is:
- submit one or more T-BAS accessions
- choose a taxonomy strategy
- optionally upload counts, metadata, or a consolidated Newick tree
- optionally enable downstream analyses and exports
- run the resolver
- review the consolidated taxonomy and downstream outputs
1. Accession input
Purpose
This is the main required input area for the resolver.
How to use it
Enter one T-BAS accession per line.
Example:
4D2ZPWQH
X8K91M2A
Why multiple accessions are important
The resolver is designed to work with one or more accessions, not just a single run. This is especially important when:
- a large study is analyzed in subsets
- multiple T-BAS placement runs were carried out separately on the same reference tree
- placements from several runs need to be consolidated into one taxonomy table
This enables scalable analysis of very large datasets while preserving a common phylogenetic framework.
Validation
The interface validates accession formatting and shows an error box if one or more entries do not appear to be valid.
2. Downstream analysis inputs and exports
This section is labeled:
Downstream analysis inputs and exports
It explains what additional files are needed to produce analysis-ready outputs.
Why this section matters
The resolver can produce a taxonomy table without counts or metadata, but most meaningful downstream analyses require more than taxonomy alone.
In practice:
- counts are needed for abundance-based analyses
- metadata are needed for grouping, labeling, DESeq2, PopPhy-CNN, and many figure outputs
- a consolidated Newick tree may be needed when placements from multiple runs are merged
Counts file
Upload a counts table (.csv, .tsv, or .txt) if the T-BAS run did not already include one.
Used for:
- phyloseq exports
- BIOM exports
- iTOL count-linked outputs
- DESeq2
- abundance plots
- Faith's PD
Without counts, the taxonomy can still be resolved, but abundance-aware downstream analyses are not possible.
Metadata file
Upload sample metadata if it was not included in the T-BAS run.
Used for:
- sample grouping
- DESeq2 design variables
- PopPhy-CNN label creation
- figure grouping
- paired or blocked analyses
- Faith's PD grouping/blocking
Without metadata, many downstream analyses can still run in limited form, but group-aware interpretation is reduced or impossible.
Consolidated Newick tree (optional)
Upload a Newick tree when results are being resolved across:
- multiple T-BAS runs
- merged placement subsets
- a consolidated tree not stored in any single run folder
This is especially useful when you have placed subsets of a very large dataset separately and then combined them into a grand phylogeny. In that case, the resolver may need access to a Newick tree that does not exist in the original T-BAS run directories.
Export types referenced in this section
The interface specifically highlights:
- Phyloseq / BIOM
- PopPhy-CNN
- iTOL
These are downstream products that may depend on counts, metadata, tree context, or all three.
3. DESeq2 analysis
The DESeq2 card lets the user choose between two high-level presets.
Run full differential analysis
This is the recommended preset for datasets with biological replication.
Use this when you want the resolver workflow to generate:
- formal statistical testing
- likelihood ratio tests
- abundance summaries and plots
- core microbiome summaries
Descriptive analysis only
Use this when the dataset has no replication or is otherwise unsuitable for inferential DESeq2 testing.
This preset is designed for:
- descriptive summaries
- plots
- exploratory abundance inspection
When selected, the interface automatically turns on descriptive_only and disables the global likelihood ratio test.
Preset behavior
The JavaScript presets in the HTML show that:
- descriptive sets deseqDescriptiveOnly = true and deseqRunLrt = false
- full sets deseqDescriptiveOnly = false and deseqRunLrt = true
Both presets keep abundance plots and the core microbiome module enabled by default.
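The preset behavior can be summarized as a small lookup table. This is a Python restatement for illustration only, not the page's JavaScript: the deseqDescriptiveOnly and deseqRunLrt keys come from the description above, the shared keys borrow the names of the --run_abundance_plots and --run_core flags, and resolve_preset is a hypothetical helper.

```python
# Settings that differ between the two presets, as described on this page.
DESEQ_PRESETS = {
    "full": {"deseqDescriptiveOnly": False, "deseqRunLrt": True},
    "descriptive": {"deseqDescriptiveOnly": True, "deseqRunLrt": False},
}

# Both presets leave these enabled; key names are borrowed from the CLI flags.
SHARED_DEFAULTS = {"run_abundance_plots": True, "run_core": True}


def resolve_preset(name: str) -> dict:
    """Merge the shared defaults with the preset-specific switches."""
    return {**SHARED_DEFAULTS, **DESEQ_PRESETS[name]}


full = resolve_preset("full")
descriptive = resolve_preset("descriptive")
print(full)
print(descriptive)
```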
4. DESeq2 analysis settings
This section controls the actual DESeq2 model and filtering choices.
Feature level (--feature_level)
Options:
- otu
- genus
Use:
- otu for feature-level testing
- genus when you want taxonomically collapsed testing
Grouping / treatment column (--treat_col)
Metadata column defining the groups to compare.
Examples:
- Treatment
- Variety
- Host
- Genotype
- Site
This is one of the most important DESeq2 settings because it determines the design grouping.
Reference level (--ref)
Optional baseline group used for pairwise contrasts.
Use this when one level should serve as the control or baseline.
Levels (--levels)
Comma-separated explicit ordering of group levels.
This is helpful when:
- you want to control factor ordering
- the first value should become the reference level
- output plots or contrasts should follow a particular order
Alpha / FDR (--alpha)
False discovery rate threshold for significance.
The default shown in the interface is 0.05.
Minimum total counts (--min_total)
Features below this total abundance threshold are excluded before DESeq2 testing.
The interface default is 50.
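A minimal sketch of this pre-filter, assuming counts are held as per-feature lists of sample counts and that a feature whose total equals the threshold is kept; filter_min_total is an illustrative name, not the resolver's code.

```python
from typing import Dict, List


def filter_min_total(counts: Dict[str, List[int]], min_total: int = 50) -> Dict[str, List[int]]:
    """Keep features whose summed count across samples reaches min_total."""
    return {
        feature: values
        for feature, values in counts.items()
        if sum(values) >= min_total
    }


counts = {
    "OTU1": [30, 25, 10],   # total 65, kept
    "OTU2": [5, 0, 12],     # total 17, dropped
    "ASV_24": [20, 20, 10], # total 50, kept (threshold is inclusive here)
}
kept = filter_min_total(counts)
print(sorted(kept))
```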
Size factors (--sf)
Options:
- poscounts
- ratio
This controls library-size normalization.
Test (--test)
Currently shown as:
Wald
This is the DESeq2 test type for contrasts in the current interface.
5. DESeq2 analysis toggles
Run likelihood ratio test (global test) (--run_lrt)
Tests whether abundance differs across groups overall, rather than only in a single pairwise comparison.
Default: enabled in the full preset.
Descriptive-only mode (--descriptive_only)
Skips inferential DESeq2 testing and generates only descriptive outputs.
Use this when you have no meaningful replication.
Include unassigned, uncultured, and unclassified taxa (--include-unassigned)
Keeps unresolved taxa in summaries and outputs.
Useful when:
- unresolved taxa are biologically meaningful
- you want complete rather than filtered summaries
Print parsed parameters at runtime (--print_params)
Helpful for:
- debugging
- reproducibility
- checking that the form resolved to the intended CLI settings
Run core microbiome module (--run_core)
Enables core microbiome identification and related visual outputs.
Default: enabled.
6. Core microbiome settings
These control how the core microbiome is defined.
Core level (--core_level)
Options:
- genus
- otu
Core prevalence (--core_prev)
Minimum fraction of samples in which a taxon must appear to be considered core.
Default shown: 0.8
Core minimum relative abundance (--core_min_rel)
Minimum relative abundance required for a taxon to count as present.
Default shown: 0.001
Presence count (--core_presence_count)
Minimum number of samples where a taxon must be detected.
Default shown: 1
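One plausible way the three thresholds above could combine is sketched below: a taxon counts as present in a sample when its relative abundance reaches core_min_rel, and it is core when it is present in at least core_presence_count samples and in at least a core_prev fraction of all samples. How the resolver actually combines these settings is not documented here, so this is an assumption, and core_taxa is an illustrative function.

```python
from typing import Dict, List


def core_taxa(
    counts: Dict[str, List[int]],
    core_prev: float = 0.8,
    core_min_rel: float = 0.001,
    core_presence_count: int = 1,
) -> List[str]:
    """Return taxa that meet the prevalence and presence thresholds."""
    n_samples = len(next(iter(counts.values())))
    # Per-sample library totals, used to convert counts to relative abundance.
    totals = [sum(values[i] for values in counts.values()) for i in range(n_samples)]
    core = []
    for taxon, values in counts.items():
        present = sum(
            1
            for i, count in enumerate(values)
            if totals[i] > 0 and count / totals[i] >= core_min_rel
        )
        if present >= core_presence_count and present / n_samples >= core_prev:
            core.append(taxon)
    return core


counts = {
    "Enterobacter": [500, 400, 300, 600, 450],  # present in 5 of 5 samples
    "Neobacillus": [0, 0, 20, 0, 0],            # present in 1 of 5 samples
    "Bhargavaea": [100, 0, 50, 80, 60],         # present in 4 of 5 samples
}
core = core_taxa(counts)
print(core)
```

With the defaults shown in the interface, Enterobacter and Bhargavaea meet the 0.8 prevalence threshold and Neobacillus does not.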
Max taxa in core heatmap (--core_max_taxa_heatmap)
Upper bound on taxa displayed in the heatmap.
Default shown: 40
Core heatmap palette (--core_heatmap_palette)
Options include:
- Viridis
- YlGnBu
- Blues
- Plasma
- Magma
- Cividis
7. DESeq2 abundance plots and figure outputs
This section controls abundance-plot generation and figure formatting.
Generate abundance plots (--run_abundance_plots)
Creates abundance figure outputs.
Pool all samples into one abundance panel (--abund_single_group)
Useful for pooled summaries across all samples.
Reverse core heatmap palette (--core_heatmap_reverse_palette)
Reverses the selected color palette for core heatmaps.
Abundance mode (--abund_mode)
Options:
- both
- relative
- absolute
Abundance taxonomic level (--abund_level)
Options:
- genus
- otu
Grouping column 1 (--abund_group1_col)
First grouping variable for abundance plots.
Grouping column 2 (--abund_group2_col)
Second grouping variable for abundance plots.
Single-group label (--abund_single_group_label)
Label assigned when all samples are pooled.
Default shown: All_Species
Top taxa to display (--abund_top_n)
How many taxa to show explicitly before collapsing the rest.
Handling of non-top taxa (--abund_other_mode)
Options:
- include
- exclude_keep_scale
- exclude_renormalize
Abundance palette (--abund_palette)
Palette choices include:
- Tableau
- Polychrome
- Okabe-Ito
- Dark 3
- Set 2
- Viridis
Other color (--abund_other_color)
Color used for the "Other" category.
Default shown: grey80
Taxa color key filename (--abund_color_key_out)
Output filename for the abundance color key.
Reverse abundance palette (--abund_reverse_palette)
Reverses the abundance palette.
Include unassigned taxa in abundance plots (--abund_include_unassigned)
Includes unresolved taxa in the plotted abundances.
Write figure legends / README summaries
Generates legend sidecars or README-style figure summaries when supported.
8. Annotation
The Annotation field is a free-text field.
This can be used as a run note or descriptive label to help identify the run later.
9. Taxonomy strategy
This is the main user-facing taxonomy resolution choice.
Best supported taxonomy (default)
This is the default and recommended option for most metabarcoding workflows.
Description in the UI:
- more conservative
- prioritizes agreement across methods
- avoids inferring species-level identity from single-marker data
- recommended default for metabarcoding data
Enhanced resolution (putative species)
This preset is intended to retain more species-level labels when the data support them, but it still treats species labels as putative rather than definitive identifications.
Description in the UI:
- allows species-level labels when consistent with phylogenetic placement and reference databases
- species assignments remain putative and may not reflect definitive species identity
- best suited for full-length markers
- not recommended as the default for short-read metabarcoding data
Preset behavior in the HTML
The preset logic shown in the page JavaScript indicates that:
Best supported taxonomy
- speciesMode = balanced
- includeConfidence = true
- padConfidenceFilteredMissingRanks = true
- missingRankLabel = Unclassified
- outputTaxonomySource = best_taxonomy_confidence_filtered
- tbasMinLwr = 0.8
- ncbiEvalue = 1e-20
- canonicalPolicy = majority
- canonical priorities: SILVA,UNITE,T-BAS,RDP,NCBI
Enhanced resolution (putative species)
- maps to the more permissive species-retention logic
- keeps confidence filtering enabled
- still allows fallback to genus when species support is insufficient
- is intended for exploratory or full-length-marker use, not as the default for short-read metabarcoding
The interface intentionally keeps these top-level choices simple and reserves more source-specific behavior for Advanced / expert settings.
10. Advanced / expert settings
This section exposes detailed control over taxonomy-resolution logic and downstream export behavior.
Canonical name policy (--canonical-policy)
Options:
- source_priority
- majority
- weighted
This determines how a single canonical label is selected when equivalent or competing names are present.
source_priority
Use a preferred source order.
majority
Choose the label supported by the most sources.
weighted
Use available confidence values to influence selection.
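The difference between the three policies can be illustrated with a toy selector. This is a simplified sketch under stated assumptions, not the resolver's implementation: candidates are reduced to one label per source, weights stand in for whatever confidence values are available, ties are broken by input order, and pick_canonical is a hypothetical function.

```python
from collections import Counter
from typing import Dict, List, Optional


def pick_canonical(
    candidates: Dict[str, str],
    policy: str = "majority",
    priority: Optional[List[str]] = None,
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Choose one label from {source: label} according to the named policy."""
    if policy == "source_priority":
        # First source in the priority list that offers a label wins.
        for source in priority or []:
            if source in candidates:
                return candidates[source]
        raise ValueError("no candidate from a prioritized source")
    votes: Counter = Counter()
    if policy == "majority":
        votes.update(candidates.values())
    elif policy == "weighted":
        for source, label in candidates.items():
            votes[label] += (weights or {}).get(source, 0.0)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return votes.most_common(1)[0][0]


candidates = {"SILVA": "Lelliottia", "RDP": "Enterobacter", "NCBI": "Enterobacter", "T-BAS": "Enterobacter"}
by_majority = pick_canonical(candidates, "majority")
by_priority = pick_canonical(candidates, "source_priority", priority=["SILVA", "UNITE", "T-BAS", "RDP", "NCBI"])
by_weight = pick_canonical(candidates, "weighted", weights={"SILVA": 0.99, "RDP": 0.3, "NCBI": 0.3, "T-BAS": 0.3})
print(by_majority, by_priority, by_weight)
```

The same four candidates yield Enterobacter under majority but Lelliottia under a SILVA-first priority order or when SILVA carries most of the weight, which is why the policy choice matters for conflicted features.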
Canonical priorities
These are only relevant when using source-priority behavior.
Canonical group priority
Comma-separated source order used for winner selection.
Canonical display priority
Optional source order used for display preference.
Include confidence columns (--include-confidence)
Adds columns such as EPA LWR or related confidence values to exported tables.
This is important because many users need to know not only the assigned taxonomy but also how well supported it was.
NCBI e-value cutoff (--ncbi-evalue-cutoff)
BLAST hits worse than this threshold are ignored for NCBI taxonomy purposes.
The preset default shown in the JS is 1e-20.
Species mode (--species-mode)
Options:
- conservative
- balanced
- aggressive
This is the low-level engine setting behind the user-facing taxonomy strategy cards.
conservative
Requires stricter support before retaining species.
balanced
Middle-ground behavior.
aggressive
Allows more species retention when genus support and conflict logic permit it.
In the interface, the Enhanced resolution (putative species) card maps to aggressive.
Output taxonomy source (--output-taxonomy-source)
Standard options:
- best_taxonomy
- best_taxonomy_confidence_filtered
These determine which taxonomy string downstream exports should use globally.
This matters for:
- QIIME
- phyloseq
- PopPhy-CNN
- bundled export outputs
New advanced functionality: single-source comparison and export
The updated resolver can also generate single-source comparison columns and optionally use one of those fields for all downstream outputs.
This functionality is intended for expert users who want to compare:
- consensus taxonomy
- T-BAS only
- NCBI only
- RDP only
- UNITE only
- SILVA only
without running separate jobs for each source.
Enable single-source comparison columns (--write-sole-source-taxonomy)
When enabled, the main consensus table is extended with additional fields such as:
- tbas_only_best_taxonomy
- tbas_only_best_taxonomy_confidence_filtered
- ncbi_only_best_taxonomy
- ncbi_only_best_taxonomy_confidence_filtered
and analogous fields for RDP, UNITE, and SILVA.
These columns let users compare what the taxonomy would look like if winner selection were restricted to a single source.
Sources to compare (--sole-source-list)
Comma-separated list of sources to compute side-by-side comparison columns.
Typical default:
T-BAS,NCBI,RDP,UNITE,SILVA
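The relationship between the source list and the comparison columns can be sketched as follows. The naming rule used here (lowercase the source name, drop hyphens, append the documented suffixes) is inferred from the column names listed on this page; the function itself is hypothetical.

```python
def sole_source_columns(source_list):
    """Derive the expected comparison column names from a comma-separated source list."""
    columns = []
    for source in source_list.split(","):
        prefix = source.strip().lower().replace("-", "")
        if not prefix:
            continue  # tolerate stray commas
        columns.append(f"{prefix}_only_best_taxonomy")
        columns.append(f"{prefix}_only_best_taxonomy_confidence_filtered")
    return columns


columns = sole_source_columns("T-BAS,NCBI,RDP,UNITE,SILVA")
```

With the typical default list this yields ten column names, two per source.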
Source-specific downstream output
When single-source comparison columns have been generated, --output-taxonomy-source can also be pointed to one of the source-specific fields, for example:
- tbas_only_best_taxonomy_confidence_filtered
- ncbi_only_best_taxonomy_confidence_filtered
- rdp_only_best_taxonomy_confidence_filtered
- unite_only_best_taxonomy_confidence_filtered
- silva_only_best_taxonomy_confidence_filtered
This means the resolver can now do two distinct things:
- compare source-specific taxonomy side by side inside the consensus table
- drive all downstream exports from a chosen single source when explicitly requested
Recommended UI behavior
This functionality belongs in Advanced / expert settings, not as a top-level taxonomy card. The best-supported taxonomy should remain the default visible workflow, while single-source comparison and export remain optional expert tools.
Prefer curated sources for canonical display (--prefer-curated-sources)
When enabled, curated sources such as SILVA or UNITE are preferred for display if they are available.
Pad confidence-filtered missing ranks (--pad-confidence-filtered-missing-ranks)
Fills missing internal ranks using a placeholder so downstream tools preserve fixed rank positions.
This is especially useful for tools that expect a stable number of ranks.
Missing-rank label (--missing-rank-label)
Placeholder used when missing-rank padding is enabled.
The preset default shown in the JS is Unclassified.
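A minimal sketch of rank padding, assuming a lineage is held as a fixed-length list with empty strings for unassigned ranks. The function name is hypothetical, and the decision to pad only ranks above the deepest assigned rank (leaving trailing ranks empty) is this example's reading of "internal ranks", not confirmed resolver behavior.

```python
def pad_internal_ranks(ranks, label="Unclassified"):
    """Fill empty ranks that sit above the deepest assigned rank."""
    assigned = [i for i, name in enumerate(ranks) if name]
    if not assigned:
        return list(ranks)  # nothing assigned, nothing to pad
    deepest = assigned[-1]
    return [
        name if name or i > deepest else label
        for i, name in enumerate(ranks)
    ]


# kingdom, phylum, class, order, family, genus, species
lineage = ["Fungi", "Ascomycota", "", "Hypocreales", "", "Fusarium", ""]
padded = pad_internal_ranks(lineage)
```

The class and family slots receive the placeholder, while the unassigned species slot stays empty, so every lineage keeps seven positions.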
Database info JSON (--db-info)
If provided, database/tool version information is copied into outputs.
Useful for:
- provenance
- reproducibility
- reporting
T-BAS minimum LWR (--tbas-min-lwr)
When set, T-BAS taxonomy is only trusted for best-taxonomy winner selection if support is at or above this threshold.
The preset default in the JS is 0.8.
Low-LWR behavior (--tbas-low-lwr-mode)
The current behavior is fixed to exclude.
If T-BAS placement support is below the threshold, T-BAS is excluded from winner selection and the other sources are used instead.
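The threshold and the exclude behavior can be pictured with the sketch below. The function name and the representation of sources as a list of names are assumptions for illustration only.

```python
def eligible_sources(candidates, tbas_lwr, tbas_min_lwr=0.8):
    """Drop T-BAS from winner selection when its LWR is below the threshold."""
    if tbas_lwr is not None and tbas_lwr < tbas_min_lwr:
        return [source for source in candidates if source != "T-BAS"]
    return list(candidates)


sources = ["T-BAS", "SILVA", "NCBI"]
low_support = eligible_sources(sources, tbas_lwr=0.55)
at_threshold = eligible_sources(sources, tbas_lwr=0.80)
```

A placement with LWR 0.55 removes T-BAS from the pool, while a placement exactly at 0.8 keeps it, matching the "at or above" wording.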
Source-only (--source-only)
Restrict best-taxonomy winner selection to only one source, such as:
- SILVA
- RDP
- UNITE
- NCBI
- T-BAS
Prefer source (--prefer-source)
Bias winner selection toward a chosen source when it is available.
This is weaker than source-only and still allows other sources to participate.
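To make the difference between the two options concrete, here is a toy model. It assumes each source has a single numeric support score and that preference is applied as a fixed bonus; both the scoring scheme and the bonus value are invented for this example and do not describe the resolver's internal logic.

```python
def pick_winner(scores, source_only=None, prefer_source=None, bonus=0.1):
    """Pick the winning source from a {source: score} mapping."""
    pool = dict(scores)
    if source_only is not None:
        # Hard restriction: every other source is removed from the pool.
        pool = {s: v for s, v in pool.items() if s == source_only}
    if prefer_source in pool:
        # Soft bias: the preferred source gets a head start but can still lose.
        pool[prefer_source] += bonus
    return max(pool, key=pool.get) if pool else None


scores = {"SILVA": 0.90, "NCBI": 0.95, "T-BAS": 0.70}
default_winner = pick_winner(scores)
preferred_wins = pick_winner(scores, prefer_source="SILVA")
preferred_loses = pick_winner(scores, prefer_source="T-BAS")
restricted = pick_winner(scores, source_only="T-BAS")
```

In this toy model, preferring SILVA is enough to overtake NCBI, preferring T-BAS is not, and restricting to T-BAS always returns T-BAS when it is available.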
11. PopPhy-CNN labeling options
These settings create metadata labels suitable for PopPhy-CNN workflows.
Encoding mode
Options:
- Binary
- Auto-encode
- Explicit mapping
Binary
Use zero/one coding.
Auto-encode
Map unique values deterministically to integers 0..K-1.
Explicit mapping
Provide your own VALUE=LABEL pairs.
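The auto-encode mode can be sketched as below. Sorting the unique values is one way to make the mapping deterministic; whether the resolver orders values this way is an assumption, and the function name is hypothetical.

```python
def auto_encode(values):
    """Map each unique value to an integer in 0..K-1, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


mapping = auto_encode(["Treated", "Control", "High", "Control"])
```

Because the order depends only on the set of values, the same metadata column always produces the same codes regardless of row order.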
PopPhy-CNN map column (--popphy-map-col)
Metadata column whose values will be converted into labels.
PopPhy-CNN label column name (--popphy-label-col)
Name of the encoded label column written to metadata.
Default: popphy_label
Labels file format / numeric labels (--popphy-labels-numeric)
Forces labels file output to use numeric codes.
PopPhy-CNN other-mode (--popphy-other-mode)
Controls behavior for unmapped values:
- exclude
- keep_na
- error
Zero value(s) (--popphy-zero)
Comma-separated values to code as 0.
One value(s) (--popphy-one)
Comma-separated values to code as 1.
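A minimal sketch of binary coding that combines the zero values, one values, and other-mode settings. The semantics of exclude, keep_na, and error shown here are inferred from the option names, and the function names and metadata layout are hypothetical.

```python
def parse_list(text):
    """Split a comma-separated option string into a set of trimmed values."""
    return {item.strip() for item in text.split(",") if item.strip()}


def binary_encode(values, zero, one, other_mode="exclude"):
    """Code each sample's value as 0 or 1; handle unmapped values per other_mode."""
    zero_set, one_set = parse_list(zero), parse_list(one)
    labels = {}
    for sample, value in values.items():
        if value in zero_set:
            labels[sample] = 0
        elif value in one_set:
            labels[sample] = 1
        elif other_mode == "keep_na":
            labels[sample] = None  # keep the sample with a missing label
        elif other_mode == "error":
            raise ValueError(f"unmapped value for {sample}: {value!r}")
        # other_mode == "exclude": the sample is left out of the result
    return labels


metadata = {"S1": "Control", "S2": "Treated", "S3": "Unknown"}
excluded = binary_encode(metadata, zero="Control,Healthy", one="Treated")
kept_na = binary_encode(metadata, zero="Control", one="Treated", other_mode="keep_na")
```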
Explicit mapping (--popphy-map)
Use VALUE=LABEL entries for custom multi-class mapping.
Examples:
- Control=0
- Treated=1
- High=2
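Parsing VALUE=LABEL entries could look like the sketch below. It takes the entries as a list of strings because the delimiter between entries on the command line is not stated here; the function name and the requirement that labels be integers are assumptions.

```python
def parse_explicit_map(entries):
    """Turn VALUE=LABEL strings into a {value: integer label} mapping."""
    mapping = {}
    for entry in entries:
        value, sep, label = entry.partition("=")
        if not sep or not value.strip() or not label.strip():
            raise ValueError(f"expected VALUE=LABEL, got {entry!r}")
        mapping[value.strip()] = int(label)
    return mapping


explicit = parse_explicit_map(["Control=0", "Treated=1", "High=2"])
```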
12. Faith's PD mode and inputs
The page also exposes settings for Faith's phylogenetic diversity workflows.
Mode
Options:
- rarefied
- observed
- overlay
This determines whether Faith's PD is computed after rarefaction (rarefied), directly on the observed data (observed), or in a comparative overlay mode (overlay).
Rarefaction depth (--depth)
If blank, the minimum remaining read depth may be used.
Minimum reads (--min_reads)
Samples below this threshold are excluded before analysis.
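The interaction between the minimum-reads filter and a blank rarefaction depth can be sketched as follows. The function name and the per-sample depth dictionary are hypothetical; the fallback to the smallest remaining depth follows the description above, which says it "may be used".

```python
def prepare_rarefaction(sample_depths, min_reads, depth=None):
    """Drop shallow samples, then default the depth to the smallest remaining one."""
    kept = {s: n for s, n in sample_depths.items() if n >= min_reads}
    if depth is None and kept:
        depth = min(kept.values())
    return kept, depth


depths = {"S1": 12000, "S2": 800, "S3": 5400}
kept_samples, chosen_depth = prepare_rarefaction(depths, min_reads=1000)
```

Here S2 is removed before analysis, and the blank depth resolves to 5400, the shallowest sample that remains.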
Alpha
Used for BH-adjusted lettering/group significance in associated summaries or plots.
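For readers unfamiliar with the BH adjustment, the sketch below shows the standard Benjamini-Hochberg procedure applied to a list of p-values and then compared against alpha. It is a generic illustration; which tests feed into the adjustment and how lettering is derived from it are not specified here, and the alpha value in the example is arbitrary.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


alpha = 0.05  # illustrative value only
adjusted = bh_adjust([0.01, 0.04, 0.03])
significant = [p <= alpha for p in adjusted]
```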
Block column (--block)
Optional metadata column used for paired or blocked designs.
Grouping / treatment column (--treatment)
Grouping variable for Faith's PD comparisons.
13. Species resolution: what users need to understand
Because this is a taxonomy resolver, not merely a label extractor, users often ask why a feature does not resolve to species even with full-length markers.
This section is included here because it directly affects how the resolver's options should be interpreted.
Why full-length does not guarantee species
Even with full-length ITS or full-length 16S:
- multiple species may share identical marker sequences
- recently diverged species may not yet be separable at a single locus
- databases may contain incomplete, ambiguous, or inconsistent species annotations
- marker resolution varies across taxonomic groups
Reasonable expectations by marker
| Marker | Typical expectation |
|---|---|
| Full-length ITS | genus generally reliable; species often possible but not guaranteed |
| ITS1 or ITS2 alone | genus often possible; species generally limited or unreliable |
| Full-length 16S | genus generally reliable; species sometimes possible |
| 16S V3/V4 | family-to-genus more typical; species uncommon |
Why the resolver may stop at genus
The resolver is intentionally designed to avoid false precision.
Species assignment is retained only when:
- genus support is sufficiently stable
- higher-rank conflict is absent
- placement is supported by the tree
- only one compatible species is present
Otherwise the resolver falls back to genus.
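The four conditions above can be read as a single conjunction, sketched below. The evidence fields and function name are hypothetical stand-ins for whatever the resolver tracks internally; the sketch only shows that all conditions must hold for a species call to be retained.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureEvidence:
    genus: str
    genus_stable: bool
    higher_rank_conflict: bool
    tree_supported: bool
    compatible_species: list = field(default_factory=list)


def resolve_rank(evidence):
    """Retain species only when every condition holds; otherwise fall back to genus."""
    keep_species = (
        evidence.genus_stable
        and not evidence.higher_rank_conflict
        and evidence.tree_supported
        and len(evidence.compatible_species) == 1
    )
    if keep_species:
        return "species", evidence.compatible_species[0]
    return "genus", evidence.genus


clear_case = FeatureEvidence("Fusarium", True, False, True, ["Fusarium oxysporum"])
ambiguous = FeatureEvidence(
    "Fusarium", True, False, True, ["Fusarium oxysporum", "Fusarium solani"]
)
clear_call = resolve_rank(clear_case)
ambiguous_call = resolve_rank(ambiguous)
```

The second feature has two compatible species, so it is reported at genus even though every other condition is met.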
Single-marker matching is not definitive
A direct match to a species name from ITS or 16S alone does not guarantee that the sequence truly belongs to that species.
Species boundaries often rely on:
- multiple loci
- morphology
- ecology
- host association
- reproductive biology
Single-locus vs multi-locus concept
If a single marker cannot separate species, then taxonomy should not pretend otherwise. This is why the resolver prioritizes biologically justified assignment over overcalling species.
14. Recommended usage patterns
Use Higher species resolution when:
- you are working with full-length ITS or full-length 16S
- you want to retain species when justified
- you accept genus fallback when the data do not support species
Use Best supported taxonomy when:
- you are working with short-read markers
- you need more conservative calls
- you are generating publication-ready summaries where overcalling species would be risky
- you want the default downstream outputs to reflect consensus, confidence-filtered taxonomy
Upload counts and metadata when:
- you want DESeq2
- you want abundance plots
- you want PopPhy-CNN labeling
- you want phyloseq / BIOM outputs
- you want group-aware interpretation
Upload a consolidated Newick tree when:
- results come from multiple T-BAS runs
- subsets were placed separately and then merged
- the final combined tree is not available in a single run folder
Use single-source comparison and export when:
- you want to benchmark consensus vs T-BAS-only vs NCBI-only behavior
- you want to understand why consensus assignment differs from a single source
- you want every downstream file to be generated from one source only for comparison or validation
Recommended practice:
- keep Best supported taxonomy (confidence filtered) as the default downstream output for routine runs
- enable single-source comparison only when you need it
- reserve T-BAS-only or NCBI-only downstream output for explicit side-by-side comparisons
15. Design principles reflected in the interface
The options exposed in the resolver UI reflect several design principles:
- phylogeny constrains lineage
- databases provide taxonomic resolution
- species calls should be conservative
- all conflicts and overrides should remain transparent
- outputs should be ready for downstream analysis
This is why the interface includes both high-level strategy cards and low-level expert options.
16. Summary
The T-BAS Taxonomy Resolver interface supports:
- one or more T-BAS accessions
- taxonomy consolidation across multiple sources
- user-selectable taxonomy stringency
- confidence-aware and conflict-aware taxonomy assignment
- downstream exports linked to counts, metadata, and optionally trees
- DESeq2, abundance, PopPhy-CNN, and Faith's PD workflows
- expert control over canonical naming, source preference, confidence filtering, and rank padding
- optional single-source comparison columns and source-specific downstream export
In short, the resolver is designed to help users move from placement runs to analysis-ready, biologically defensible taxonomy outputs.