T-BAS Taxonomy Resolver

The T-BAS Taxonomy Resolver generates a single, consolidated taxonomy from one or more T-BAS placement runs by reconciling phylogenetic placement with multiple reference taxonomies and packaging the results for downstream analysis.

This page documents the full functionality exposed in the resolver interface, including:

  • accession-based input
  • taxonomy strategy presets
  • downstream input files and exports
  • DESeq2 analysis options
  • abundance and core microbiome outputs
  • advanced taxonomy-resolution controls
  • PopPhy-CNN labeling options
  • Faith's phylogenetic diversity options
  • species-resolution limits and interpretation

What the resolver does

At a high level, the resolver:

  1. accepts one or more T-BAS accession IDs
  2. retrieves taxonomy and placement information linked to those accessions
  3. reconciles phylogenetic placement with one or more reference taxonomies
  4. applies confidence- and conflict-aware rules to produce a single best taxonomy per feature
  5. optionally prepares outputs for downstream tools such as QIIME / phyloseq / BIOM / iTOL / DESeq2 / PopPhy-CNN / Faith's PD

This makes the resolver more than a simple extractor. It is a taxonomy consolidation and downstream export tool.


Interpreting resolver output columns

The resolver output is intentionally more detailed than a simple taxonomy table. It is designed to show not only what taxonomy was assigned, but also why that assignment was made, which sources agreed or disagreed, and whether the tree forced the resolver to be more conservative.

This section explains the main output columns in practical terms and gives examples of how to read them.

Why these columns matter

A user might expect the resolver to do something simple such as:

  • read a T-BAS placement
  • look up a database taxonomy
  • return a single name

But the resolver is doing something more sophisticated. It is trying to answer questions such as:

  • Did T-BAS, SILVA, RDP, and NCBI all agree?
  • Did they only agree to genus, but not species?
  • Was the database genus even present in the reference tree?
  • Did the tree suggest that the database label was too specific or phylogenetically inconsistent?
  • Should species be retained, or should the assignment stop at genus?

The output columns are there to make those decisions visible and auditable.


A worked example

Consider a row like this:

FeatureID: OTU1
best_taxonomy: Bacteria; Bacillota; Bacilli; Bacillales; Caryophanaceae; Bhargavaea; cecembensis
best_taxonomy_confidence_filtered: Bacteria; Bacillota; Bacilli; Bacillales; Bacillaceae; Neobacillus; Unclassified
deepest_confident_rank: GENUS
tbas_proxy_lineage: TRUE
tbas_proxy_rank: FAMILY
tree_contains_tbas_genus: TRUE
tree_contains_database_genus: FALSE
tbas_forced_placement: TRUE
sources_used: T-BAS,SILVA,RDP,NCBI
conflict_flag: SILVA
conflict_rank: CLASS
agreement_sources: NCBI,RDP,T-BAS
taxonomy_change_type: tree_forced_adjustment
change_reason_detail: database_genus_absent_from_tree
taxonomy_change_summary: FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus
confidence: 0.884274023

This should be interpreted as follows:

  • The resolver initially had a more detailed lineage in best_taxonomy.
  • After applying tree-aware checks and confidence filtering, the final recommended lineage changed to Neobacillus and the species label was removed.
  • tree_contains_database_genus = FALSE means the database genus supporting the original label was not represented in the reference tree.
  • tbas_forced_placement = TRUE means the tree prevented the resolver from simply keeping the database label.
  • deepest_confident_rank = GENUS means genus was considered defensible, but species was not.
  • taxonomy_change_type = tree_forced_adjustment means the final taxonomy differs from the original because the tree overruled the database-supported label.
  • change_reason_detail = database_genus_absent_from_tree gives the specific reason for that adjustment.

This is exactly the kind of situation where the output columns are essential. Without them, the user would only see that the taxonomy changed and would not know why.


Core taxonomy fields

FeatureID

The feature identifier from the input counts table or pretty report.

Examples:

  • OTU1
  • ASV_24

This is the key used to link taxonomy to counts, metadata, QIIME outputs, phyloseq tables, and downstream analyses.


best_taxonomy

This is the full selected taxonomy before conservative filtering is applied.

It reflects the best-supported lineage that the resolver could assemble, using the selected policies and source logic.

This field is useful when:

  • you want to inspect the most detailed interpretation
  • you want to see what the resolver would keep before species or lower-rank confidence filtering is applied
  • you want to compare detailed assignments against the more conservative final result

Example

best_taxonomy:
Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Lelliottia; amnigena

This means the resolver found enough information to support a detailed lineage down to species before filtering.


best_taxonomy_confidence_filtered

This is the final recommended taxonomy and is usually the most important field for downstream use.

It is built by taking best_taxonomy and then applying:

  • confidence filtering
  • tree-aware conflict handling
  • species-mode rules
  • synonym-aware reconciliation
  • rank padding, if enabled

This is typically the field used for:

  • QIIME
  • phyloseq
  • PopPhy-CNN
  • bundled exports
  • publication-facing summaries

Example

best_taxonomy_confidence_filtered:
Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Enterobacter; Unclassified

This means:

  • the resolver accepts the lineage to genus
  • species is not considered reliable enough to keep
  • the final export should stop at genus

deepest_confident_rank

This field tells you the lowest rank that the resolver considers reliable after filtering.

It is extremely useful because it summarizes, in one word, where the taxonomy should stop.

Possible values include:

  • PHYLUM
  • CLASS
  • ORDER
  • FAMILY
  • GENUS
  • SPECIES

Example 1

deepest_confident_rank: GENUS

Interpretation:

  • taxonomy is trusted down to genus
  • species was dropped or not defensible

Example 2

deepest_confident_rank: FAMILY

Interpretation:

  • genus is not considered safe
  • the assignment should only be interpreted at family level
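
As a rough illustration of how a filtered lineage string relates to the rank where it stops, the sketch below walks a semicolon-separated lineage and reports the lowest rank that still carries a real name. It assumes a seven-rank order starting at DOMAIN and the default Unclassified missing-rank label; the function name is hypothetical, and the resolver's own deepest_confident_rank column remains the authoritative value.

```python
# Assumed seven-rank order; the resolver's internal rank list is not shown on this page.
RANKS = ["DOMAIN", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"]


def deepest_named_rank(lineage, missing_label="Unclassified"):
    """Return the lowest rank whose name is not the missing-rank label."""
    names = [part.strip() for part in lineage.split(";")]
    deepest = None
    for rank, name in zip(RANKS, names):
        if not name or name == missing_label:
            break  # everything below this point is padding
        deepest = rank
    return deepest


filtered = (
    "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; "
    "Enterobacteriaceae; Enterobacter; Unclassified"
)
print(deepest_named_rank(filtered))  # GENUS
```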

confidence

This is the numeric support value associated with the selected assignment, often based on the EPA likelihood weight ratio (LWR) or a related confidence metric.

A higher value usually indicates stronger placement support.

Example

confidence: 0.995802794

This would usually be interpreted as very strong support.

Example

confidence: 0.51

This would be much weaker and may be more likely to trigger conservative fallback behavior.

Important:

  • a high confidence value does not automatically guarantee species-level support
  • the species-mode rules and tree checks can still remove species if the lineage is not considered defensible

Tree-aware interpretation columns

These columns explain how the resolver used the phylogenetic tree to constrain or reinterpret taxonomy.

tbas_proxy_lineage

This indicates whether the T-BAS lineage should be interpreted as a proxy lineage rather than a definitive direct match.

A proxy lineage means:

  • the placement indicates the nearest lineage available in the tree
  • but the exact taxon of interest may not actually be represented in the tree

This is common when:

  • the reference tree does not contain every species
  • placements are being made against a broader lineage framework

Example

tbas_proxy_lineage: TRUE

Interpretation:

  • the T-BAS lineage is being used as the nearest supported phylogenetic lineage
  • it should not automatically be read as exact species identity

tbas_proxy_rank

This tells you the deepest rank at which the T-BAS lineage is acting as a reasonable proxy.

Example

tbas_proxy_rank: FAMILY

Interpretation:

  • the resolver considers the lineage informative as a proxy only down to family level
  • deeper ranks such as genus or species are less trustworthy as direct identity labels

Example

tbas_proxy_rank: GENUS

Interpretation:

  • the placement likely supports genus-level interpretation
  • but not necessarily species

tree_contains_tbas_genus

This tells you whether the genus implied by the T-BAS lineage is actually present in the reference tree.

Example

tree_contains_tbas_genus: TRUE

Interpretation:

  • the T-BAS genus is represented in the tree
  • the placement has a phylogenetic anchor at genus level

Example

tree_contains_tbas_genus: FALSE

Interpretation:

  • the genus implied by T-BAS is not literally represented in the tree
  • the assignment may only be interpretable at a higher rank or as a proxy lineage

tree_contains_database_genus

This tells you whether the genus coming from the database-supported taxonomy is present in the reference tree.

This is a critical tree-vs-database consistency check.

Example

tree_contains_database_genus: TRUE

Interpretation:

  • the database genus is represented in the tree
  • there is no immediate tree-representation conflict at genus level

Example

tree_contains_database_genus: FALSE

Interpretation:

  • the database genus is not represented in the tree
  • the resolver may consider the database assignment too specific or not phylogenetically supported by the current reference tree

tbas_forced_placement

This indicates that the final filtered taxonomy was affected by a tree-forced placement decision.

When this is TRUE, it usually means:

  • a database label was available
  • but the tree did not support simply keeping it
  • so the resolver adjusted the result to maintain phylogenetic consistency

Example

tbas_forced_placement: TRUE

Interpretation:

  • the tree overruled a more naive database-based interpretation
  • the final taxonomy reflects phylogenetic constraint, not just label agreement

Conflict and agreement columns

These columns show whether sources agreed, which ones disagreed, and where the disagreement started.

sources_used

This is the list of taxonomy sources that contributed non-empty information for the feature.

Examples:

  • T-BAS,SILVA,RDP,NCBI
  • T-BAS,UNITE,NCBI

This tells you which sources actually participated in the decision.

Example

sources_used: T-BAS,SILVA,RDP,NCBI

Interpretation:

  • all four of those sources had usable taxonomy for this feature
  • consensus and conflict logic considered all of them

agreement_sources

These are the sources that agree with the final retained taxonomy at the decisive conflict point.

This is one of the most important transparency fields because it tells you which sources support the final answer.

Example

agreement_sources: NCBI,RDP,T-BAS

Interpretation:

  • the final lineage is supported by those three sources
  • another source, such as SILVA, disagreed in a meaningful way

conflict_flag

This field identifies which source or sources are in conflict.

It does not list agreeing sources. It is designed to flag the disagreeing source(s).

Example

conflict_flag: SILVA

Interpretation:

  • SILVA disagreed with the supported consensus

Example

conflict_flag: none

Interpretation:

  • all relevant sources agreed after synonym handling

conflict_rank

This tells you the first rank at which a meaningful disagreement begins.

Example

conflict_rank: GENUS

Interpretation:

  • sources agree above genus
  • disagreement begins at genus

Example

conflict_rank: CLASS

Interpretation:

  • disagreement begins much higher
  • this is a more substantial conflict

This is useful because it tells you whether the conflict is:

  • minor and late in the lineage
  • or broad and deep in the taxonomy
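
The idea of a "first rank of disagreement" can be sketched by comparing two lineage strings rank by rank. This is a simplified illustration only: it assumes a seven-rank order, compares exact names, and ignores the synonym handling the resolver applies before it flags a conflict. The function name is hypothetical.

```python
# Assumed seven-rank order; not taken from the resolver source.
RANKS = ["DOMAIN", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"]


def first_conflict_rank(lineage_a, lineage_b):
    """Return the first rank at which two lineages differ, or None if they match."""
    a = [part.strip() for part in lineage_a.split(";")]
    b = [part.strip() for part in lineage_b.split(";")]
    for rank, name_a, name_b in zip(RANKS, a, b):
        if name_a != name_b:
            return rank
    return None


prefix = "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; "
print(first_conflict_rank(prefix + "Lelliottia; amnigena",
                          prefix + "Enterobacter; Unclassified"))  # GENUS
```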

Change-tracking columns

These explain why the final confidence-filtered taxonomy differs from the more detailed raw choice.

taxonomy_change_type

This is a high-level label describing the reason the final taxonomy changed.

Common values include:

  • none
  • synonym_replacement
  • consensus_override
  • tree_forced_adjustment

Example

taxonomy_change_type: none

Interpretation:

  • the filtered taxonomy is effectively the same as the raw selected taxonomy

Example

taxonomy_change_type: consensus_override

Interpretation:

  • the final taxonomy changed because a better-supported cross-source consensus was chosen

Example

taxonomy_change_type: tree_forced_adjustment

Interpretation:

  • the tree forced the resolver to step away from the more naive or database-driven label

change_reason_detail

This is a more specific explanation nested inside taxonomy_change_type.

The values in this field are terse machine-style labels, so each one is explained below.

Examples include:

  • database_genus_absent_from_tree
  • multi_source_consensus_override
  • consensus_override_with_genus_reclassification

Example

change_reason_detail: database_genus_absent_from_tree

Interpretation:

  • the database genus was not represented in the reference tree
  • therefore the resolver could not simply keep that database genus as the final answer
  • a tree-aware correction was applied

Example

change_reason_detail: multi_source_consensus_override

Interpretation:

  • several sources supported a different lineage than the raw detailed choice
  • the final taxonomy followed the stronger consensus

Example

change_reason_detail: consensus_override_with_genus_reclassification

Interpretation:

  • the consensus changed the genus, and the visible change was best described as a genus-level reclassification

taxonomy_change_summary

This is a compact, human-readable summary of the visible taxonomic changes.

Example

taxonomy_change_summary: GENUS:Lelliottia->Enterobacter

Interpretation:

  • the final confidence-filtered lineage changed the genus from Lelliottia to Enterobacter

Example

taxonomy_change_summary: FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus

Interpretation:

  • both family and genus changed
  • this was not a tiny species-level tweak but a more substantial lineage correction

This field is especially useful when scanning large result tables by eye.
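
For programmatic filtering, the summary string can be split into structured records. The sketch below assumes the RANK:old->new format with semicolon separators seen in the examples above; the function name is hypothetical and not part of the resolver.

```python
def parse_change_summary(summary):
    """Split 'RANK:old->new;RANK:old->new' into (rank, old, new) tuples."""
    changes = []
    for item in summary.split(";"):
        item = item.strip()
        if not item:
            continue
        rank, _, names = item.partition(":")
        old, _, new = names.partition("->")
        changes.append((rank.strip(), old.strip(), new.strip()))
    return changes


summary = "FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus"
for rank, old, new in parse_change_summary(summary):
    print(f"{rank}: {old} -> {new}")
```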


synonym_applied

This indicates whether an explicit synonym or reclassification rule contributed to the final result.

Example

synonym_applied: TRUE

Interpretation:

  • part of the final taxonomy depended on a synonym or updated taxonomic equivalence
  • for example, an old and new lineage name may have been reconciled

Example

synonym_applied: FALSE

Interpretation:

  • the final result did not require synonym handling in a meaningful way

Special example: database_genus_absent_from_tree

This is one of the most important terms to document because users often do not immediately understand it.

What it means

The database-supported genus was not represented as a tip or lineage in the current reference tree.

That does not automatically mean the database genus is wrong in an absolute sense. It means:

  • the current tree cannot directly support that genus-level label
  • the resolver must decide whether to:
    • follow the tree
    • follow the database
    • or back off to a more conservative rank

Why this matters

The resolver is designed to be phylogeny-aware, not just database-aware.

So if the database says:

Genus = Bhargavaea

but the tree does not contain Bhargavaea at all, then the resolver cannot simply claim that the phylogenetic placement supports that genus.

Instead, it may:

  • keep a broader lineage that the tree does support
  • switch to a neighboring supported lineage
  • or fall back to a more conservative rank, such as genus or family, through confidence filtering

Example interpretation

Suppose you see:

tree_contains_database_genus: FALSE
tbas_forced_placement: TRUE
taxonomy_change_type: tree_forced_adjustment
change_reason_detail: database_genus_absent_from_tree

This means:

  1. a database-based genus was available
  2. that genus was not represented in the tree
  3. the tree forced a more conservative or alternative lineage
  4. the final answer was adjusted to stay phylogenetically defensible
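
To find every feature that followed this pathway in a large result table, the flag combination above can be used as a filter. The sketch below assumes a tab-separated export with the column names documented on this page and TRUE/FALSE written as text; the two sample rows are illustrative only, and the function name is hypothetical.

```python
import csv
import io

# Illustrative rows only; a real export contains many more columns.
SAMPLE = (
    "FeatureID\ttree_contains_database_genus\ttbas_forced_placement\t"
    "taxonomy_change_type\tchange_reason_detail\n"
    "OTU1\tFALSE\tTRUE\ttree_forced_adjustment\tdatabase_genus_absent_from_tree\n"
    "ASV_24\tTRUE\tFALSE\tnone\t\n"
)


def tree_forced_features(handle):
    """Return FeatureIDs adjusted because the database genus is absent from the tree."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["FeatureID"]
        for row in reader
        if row["tbas_forced_placement"] == "TRUE"
        and row["change_reason_detail"] == "database_genus_absent_from_tree"
    ]


flagged = tree_forced_features(io.StringIO(SAMPLE))
print(flagged)  # ['OTU1']
```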

This is exactly the kind of case where the resolver is doing something more careful than a database lookup.


Single-source comparison columns

When enabled, the resolver can add source-specific comparison columns.

These are designed for method comparison, not for ordinary default use.

Compact source-specific fields

In the cleaner version of the exporter, the preferred comparison columns are:

  • tbas_only_best_taxonomy_confidence_filtered
  • ncbi_only_best_taxonomy_confidence_filtered
  • rdp_only_best_taxonomy_confidence_filtered
  • unite_only_best_taxonomy_confidence_filtered
  • silva_only_best_taxonomy_confidence_filtered

These show what the final confidence-filtered taxonomy would look like if only that source were used.

Example

If you see:

best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
tbas_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
ncbi_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified

Interpretation:

  • the consensus and the single-source methods agree
  • the assignment is stable across methods

If instead you see:

best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
tbas_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Lelliottia; Unclassified
ncbi_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified

Interpretation:

  • T-BAS and NCBI disagree at genus level
  • the consensus logic resolved that disagreement in favor of Enterobacter

This kind of comparison is extremely useful when auditing why a consensus taxonomy was chosen.
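
An audit of this kind can be automated by comparing the genus in each single-source column with the genus in the consensus column. The sketch below assumes full seven-rank lineage strings with genus in the sixth position; the helper names are hypothetical.

```python
SUFFIX = "_only_best_taxonomy_confidence_filtered"


def genus_of(lineage):
    """Return the sixth semicolon-separated field (assumed to be genus), or None."""
    parts = [part.strip() for part in lineage.split(";")]
    return parts[5] if len(parts) > 5 else None


def disagreeing_sources(row, consensus_col="best_taxonomy_confidence_filtered"):
    """List source prefixes whose single-source genus differs from the consensus genus."""
    consensus = genus_of(row[consensus_col])
    return sorted(
        col[: -len(SUFFIX)]
        for col, value in row.items()
        if col.endswith(SUFFIX) and genus_of(value) != consensus
    )


head = "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; "
row = {
    "best_taxonomy_confidence_filtered": head + "Enterobacter; Unclassified",
    "tbas_only_best_taxonomy_confidence_filtered": head + "Lelliottia; Unclassified",
    "ncbi_only_best_taxonomy_confidence_filtered": head + "Enterobacter; Unclassified",
}
print(disagreeing_sources(row))  # ['tbas']
```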


Why the documentation must explain examples, not just field names

A simple glossary is not enough for this tool because many of the fields are not ordinary taxonomy labels. They are decision-trace fields.

For example:

  • database_genus_absent_from_tree is not just a label
  • it represents a whole logic pathway:
    • database genus present
    • tree genus missing
    • tree-aware override
    • conservative fallback

The documentation therefore needs to explain:

  1. what the term literally means
  2. why the resolver uses it
  3. what kind of biological or analytical situation produces it
  4. how a user should interpret it

That is why these worked examples are so important.


Supported reference taxonomies

The resolver can incorporate the following sources when available:

  • SILVA for 16S/18S
  • UNITE for ITS
  • RDP for 16S
  • NCBI RefSeq
  • T-BAS phylogenetic placement

The final taxonomy can reflect agreement across these sources, optionally weighted or prioritized according to the advanced settings described below.


Interface overview

The page is organized into major sections:

  1. Accession input
  2. Downstream analysis inputs and exports
  3. DESeq2 analysis
  4. Annotation
  5. Taxonomy strategy
  6. Advanced / expert settings
  7. Faith's PD mode and inputs
  8. Faith's PD analysis settings

The documentation link shown near the page header should point here so users can understand what each option does before running the tool.


Resolver workflow

The normal workflow is:

  • submit one or more T-BAS accessions
  • choose a taxonomy strategy
  • optionally upload counts, metadata, or a consolidated Newick tree
  • optionally enable downstream analyses and exports
  • run the resolver
  • review the consolidated taxonomy and downstream outputs

1. Accession input

Purpose

This is the main required input area for the resolver.

How to use it

Enter one T-BAS accession per line.

Example:

4D2ZPWQH
X8K91M2A

Why multiple accessions are important

The resolver is designed to work with one or more accessions, not just a single run. This is especially important when:

  • a large study is analyzed in subsets
  • multiple T-BAS placement runs were carried out separately on the same reference tree
  • placements from several runs need to be consolidated into one taxonomy table

This enables scalable analysis of very large datasets while preserving a common phylogenetic framework.

Validation

The interface validates accession formatting and shows an error box if one or more entries do not appear to be valid.
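
A pre-check of the accession list before submission might look like the sketch below. The actual validation rule used by the interface is not documented here; the pattern of eight uppercase letters or digits is an assumption inferred only from the two example accessions above, and the function name is hypothetical.

```python
import re

# Assumption: accessions look like the examples (8 uppercase alphanumerics).
ACCESSION_RE = re.compile(r"[A-Z0-9]{8}")


def split_accessions(text):
    """Split one-per-line input into (valid, invalid) lists, skipping blank lines."""
    valid, invalid = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue
        if ACCESSION_RE.fullmatch(entry):
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid


valid, invalid = split_accessions("4D2ZPWQH\nX8K91M2A\n\nnot an id\n")
print(valid, invalid)
```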


2. Downstream analysis inputs and exports

This section is labeled:

Downstream analysis inputs and exports

It explains what additional files are needed to produce analysis-ready outputs.

Why this section matters

The resolver can produce a taxonomy table without counts or metadata, but most meaningful downstream analyses require more than taxonomy alone.

In practice:

  • counts are needed for abundance-based analyses
  • metadata are needed for grouping, labeling, DESeq2, PopPhy-CNN, and many figure outputs
  • a consolidated Newick tree may be needed when placements from multiple runs are merged

Counts file

Upload a counts table (.csv, .tsv, or .txt) if the T-BAS run did not already include one.

Used for:

  • phyloseq exports
  • BIOM exports
  • iTOL count-linked outputs
  • DESeq2
  • abundance plots
  • Faith's PD

Without counts, the taxonomy can still be resolved, but abundance-aware downstream analyses are not possible.

Metadata file

Upload sample metadata if it was not included in the T-BAS run.

Used for:

  • sample grouping
  • DESeq2 design variables
  • PopPhy-CNN label creation
  • figure grouping
  • paired or blocked analyses
  • Faith's PD grouping/blocking

Without metadata, many downstream analyses can still run in limited form, but group-aware interpretation is reduced or impossible.

Consolidated Newick tree (optional)

Upload a Newick tree when results are being resolved across:

  • multiple T-BAS runs
  • merged placement subsets
  • a consolidated tree not stored in any single run folder

This is especially useful when you have placed subsets of a very large dataset separately and then combined them into a grand phylogeny. In that case, the resolver may need access to a Newick tree that does not exist in the original T-BAS run directories.

Export types referenced in this section

The interface specifically highlights:

  • Phyloseq / BIOM
  • PopPhy-CNN
  • iTOL

These are downstream products that may depend on counts, metadata, tree context, or all three.


3. DESeq2 analysis

The DESeq2 card lets the user choose between two high-level presets.

Run full differential analysis

This is the recommended preset for datasets with biological replication.

Use this when you want the resolver workflow to generate:

  • formal statistical testing
  • likelihood ratio tests
  • abundance summaries and plots
  • core microbiome summaries

Descriptive analysis only

Use this when the dataset has no replication or is otherwise unsuitable for inferential DESeq2 testing.

This preset is designed for:

  • descriptive summaries
  • plots
  • exploratory abundance inspection

When selected, the interface automatically turns on descriptive_only and disables the global likelihood ratio test.

Preset behavior

The JavaScript presets in the HTML show that:

  • descriptive sets deseqDescriptiveOnly = true and deseqRunLrt = false
  • full sets deseqDescriptiveOnly = false and deseqRunLrt = true

Both presets keep abundance plots and the core microbiome module enabled by default.
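
The preset behavior can be summarized as a small lookup table. The sketch below mirrors the two settings named above and translates them into the command-line flags documented on this page. Treating each flag as a simple on/off switch is an assumption, as are the dictionary and function names.

```python
# Values taken from the preset description above; the structure itself is illustrative.
DESEQ_PRESETS = {
    "full": {"deseqDescriptiveOnly": False, "deseqRunLrt": True},
    "descriptive": {"deseqDescriptiveOnly": True, "deseqRunLrt": False},
}


def preset_to_flags(name):
    """Translate a preset name into the list of switches it would enable."""
    preset = DESEQ_PRESETS[name]
    flags = []
    if preset["deseqDescriptiveOnly"]:
        flags.append("--descriptive_only")
    if preset["deseqRunLrt"]:
        flags.append("--run_lrt")
    # Both presets keep abundance plots and the core microbiome module enabled.
    flags += ["--run_abundance_plots", "--run_core"]
    return flags


print(preset_to_flags("descriptive"))
```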


4. DESeq2 analysis settings

This section controls the actual DESeq2 model and filtering choices.

Feature level (--feature_level)

Options:

  • otu
  • genus

Use:

  • otu for feature-level testing
  • genus when you want taxonomically collapsed testing

Grouping / treatment column (--treat_col)

Metadata column defining the groups to compare.

Examples:

  • Treatment
  • Variety
  • Host
  • Genotype
  • Site

This is one of the most important DESeq2 settings because it determines the design grouping.

Reference level (--ref)

Optional baseline group used for pairwise contrasts.

Use this when one level should serve as the control or baseline.

Levels (--levels)

Comma-separated explicit ordering of group levels.

This is helpful when:

  • you want to control factor ordering
  • the first value should become the reference level
  • output plots or contrasts should follow a particular order

Alpha / FDR (--alpha)

False discovery rate threshold for significance.

The default shown in the interface is 0.05.

Minimum total counts (--min_total)

Features below this total abundance threshold are excluded before DESeq2 testing.

The interface default is 50.
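
The effect of this threshold can be sketched as a simple pre-filter on the counts table. The example assumes counts are held as a mapping from feature ID to per-sample counts and that a feature whose total equals the threshold is kept; the function name and the sample numbers are illustrative.

```python
def filter_min_total(counts, min_total=50):
    """Keep only features whose summed count across samples reaches min_total."""
    return {
        feature: per_sample
        for feature, per_sample in counts.items()
        if sum(per_sample) >= min_total
    }


counts = {"OTU1": [30, 25, 10], "ASV_24": [5, 3, 1]}
print(sorted(filter_min_total(counts)))  # ['OTU1']
```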

Size factors (--sf)

Options:

  • poscounts
  • ratio

This controls library-size normalization.

Test (--test)

Currently shown as:

  • Wald

This is the DESeq2 test type for contrasts in the current interface.


5. DESeq2 analysis toggles

Run likelihood ratio test (global test) (--run_lrt)

Tests whether abundance differs across groups overall, rather than only in a single pairwise comparison.

Default: enabled in the full preset.

Descriptive-only mode (--descriptive_only)

Skips inferential DESeq2 testing and generates only descriptive outputs.

Use this when you have no meaningful replication.

Include unassigned, uncultured, and unclassified taxa (--include-unassigned)

Keeps unresolved taxa in summaries and outputs.

Useful when:

  • unresolved taxa are biologically meaningful
  • you want complete rather than filtered summaries

Helpful for:

  • debugging
  • reproducibility
  • checking that the form resolved to the intended CLI settings

Run core microbiome module (--run_core)

Enables core microbiome identification and related visual outputs.

Default: enabled.


6. Core microbiome settings

These control how the core microbiome is defined.

Core level (--core_level)

Options:

  • genus
  • otu

Core prevalence (--core_prev)

Minimum fraction of samples in which a taxon must appear to be considered core.

Default shown: 0.8

Core minimum relative abundance (--core_min_rel)

Minimum relative abundance required for a taxon to count as present.

Default shown: 0.001

Presence count (--core_presence_count)

Minimum number of samples where a taxon must be detected.

Default shown: 1
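
One plausible reading of how prevalence, minimum relative abundance, and presence count combine is sketched below. A taxon counts as present in a sample when its relative abundance reaches core_min_rel, and it is core when it is present in at least core_presence_count samples and in at least core_prev of all samples. This is an interpretation of the settings above, not the module's actual code, and the counts are made up for illustration.

```python
def core_taxa(counts, core_prev=0.8, core_min_rel=0.001, core_presence_count=1):
    """Return taxa meeting the prevalence and presence thresholds.

    counts maps taxon -> list of raw counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    core = []
    for taxon, row in counts.items():
        present = sum(
            1 for count, total in zip(row, totals)
            if total > 0 and count / total >= core_min_rel
        )
        if present >= core_presence_count and present / n_samples >= core_prev:
            core.append(taxon)
    return core


counts = {
    "Enterobacter": [500, 300, 0, 200, 400],
    "Neobacillus": [500, 700, 1000, 800, 0],
    "Lelliottia": [0, 0, 0, 0, 600],
}
print(core_taxa(counts))  # ['Enterobacter', 'Neobacillus']
```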

Max taxa in core heatmap (--core_max_taxa_heatmap)

Upper bound on taxa displayed in the heatmap.

Default shown: 40

Core heatmap palette (--core_heatmap_palette)

Options include:

  • Viridis
  • YlGnBu
  • Blues
  • Plasma
  • Magma
  • Cividis

7. DESeq2 abundance plots and figure outputs

This section controls abundance-plot generation and figure formatting.

Generate abundance plots (--run_abundance_plots)

Creates abundance figure outputs.

Pool all samples into one abundance panel (--abund_single_group)

Useful for pooled summaries across all samples.

Reverse core heatmap palette (--core_heatmap_reverse_palette)

Reverses the selected color palette for core heatmaps.

Abundance mode (--abund_mode)

Options:

  • both
  • relative
  • absolute

Abundance taxonomic level (--abund_level)

Options:

  • genus
  • otu

Grouping column 1 (--abund_group1_col)

First grouping variable for abundance plots.

Grouping column 2 (--abund_group2_col)

Second grouping variable for abundance plots.

Single-group label (--abund_single_group_label)

Label assigned when all samples are pooled.

Default shown: All_Species

Top taxa to display (--abund_top_n)

How many taxa to show explicitly before collapsing the rest.

Handling of non-top taxa (--abund_other_mode)

Options:

  • include
  • exclude_keep_scale
  • exclude_renormalize
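
The three modes can be illustrated on a single sample's relative abundances. The behavior shown is inferred from the option names: include adds an "Other" bar, exclude_keep_scale drops the non-top taxa but leaves the remaining values unchanged, and exclude_renormalize drops them and rescales the remainder to sum to 1. The function name is hypothetical, and the real plotting code may differ.

```python
def apply_other_mode(rel_abund, top_n, other_mode="include"):
    """Reduce a taxon -> relative abundance mapping to the top_n taxa."""
    ranked = sorted(rel_abund.items(), key=lambda item: item[1], reverse=True)
    top = dict(ranked[:top_n])
    rest = sum(value for _, value in ranked[top_n:])
    if other_mode == "include":
        top["Other"] = rest  # collapse everything else into one category
    elif other_mode == "exclude_renormalize":
        total = sum(top.values())
        top = {taxon: value / total for taxon, value in top.items()}
    elif other_mode != "exclude_keep_scale":
        raise ValueError(f"unknown other_mode: {other_mode}")
    return top


sample = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
for mode in ("include", "exclude_keep_scale", "exclude_renormalize"):
    print(mode, apply_other_mode(sample, top_n=2, other_mode=mode))
```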

Abundance palette (--abund_palette)

Palette choices include:

  • Tableau
  • Polychrome
  • Okabe-Ito
  • Dark 3
  • Set 2
  • Viridis

Other color (--abund_other_color)

Color used for the "Other" category.

Default shown: grey80

Taxa color key filename (--abund_color_key_out)

Output filename for the abundance color key.

Reverse abundance palette (--abund_reverse_palette)

Reverses the abundance palette.

Include unassigned taxa in abundance plots (--abund_include_unassigned)

Includes unresolved taxa in the plotted abundances.

Write figure legends / README summaries

Generates legend sidecars or README-style figure summaries when supported.


8. Annotation

The Annotation field is a free-text field.

This can be used as a run note or descriptive label to help identify the run later.


9. Taxonomy strategy

This is the main user-facing taxonomy resolution choice.

Best supported taxonomy (default)

This is the default and recommended option for most metabarcoding workflows.

Description in the UI:

  • more conservative
  • prioritizes agreement across methods
  • avoids inferring species-level identity from single-marker data
  • recommended default for metabarcoding data

Enhanced resolution (putative species)

This preset is intended to retain more species-level labels when the data support them, but it still treats species labels as putative rather than definitive identifications.

Description in the UI:

  • allows species-level labels when consistent with phylogenetic placement and reference databases
  • species assignments remain putative and may not reflect definitive species identity
  • best suited for full-length markers
  • not recommended as the default for short-read metabarcoding data

Preset behavior in the HTML

The preset logic shown in the page JavaScript indicates that:

Best supported taxonomy

  • speciesMode = balanced
  • includeConfidence = true
  • padConfidenceFilteredMissingRanks = true
  • missingRankLabel = Unclassified
  • outputTaxonomySource = best_taxonomy_confidence_filtered
  • tbasMinLwr = 0.8
  • ncbiEvalue = 1e-20
  • canonicalPolicy = majority
  • canonical priorities: SILVA,UNITE,T-BAS,RDP,NCBI

Enhanced resolution (putative species)

  • maps to the more permissive species-retention logic
  • keeps confidence filtering enabled
  • still allows fallback to genus when species support is insufficient
  • is intended for exploratory or full-length-marker use, not as the default for short-read metabarcoding

The interface intentionally keeps these top-level choices simple and reserves more source-specific behavior for Advanced / expert settings.


10. Advanced / expert settings

This section exposes detailed control over taxonomy-resolution logic and downstream export behavior.

Canonical name policy (--canonical-policy)

Options:

  • source_priority
  • majority
  • weighted

This determines how a single canonical label is selected when equivalent or competing names are present.

source_priority

Use a preferred source order.

majority

Choose the label supported by the most sources.

weighted

Use available confidence values to influence selection.

Canonical priorities

These are only relevant when using source-priority behavior.

Canonical group priority

Comma-separated source order used for winner selection.

Canonical display priority

Optional source order used for display preference.
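
The difference between the three policies can be sketched with a toy selector. This is a simplified illustration, not the resolver's implementation: the function name, the use of the priority list to break majority ties, and the summing of per-source confidences for the weighted policy are all assumptions, and the confidence values are made up.

```python
from collections import Counter, defaultdict


def pick_canonical(labels, policy="majority", priority=(), confidences=None):
    """Choose one label from a source -> label mapping under the given policy."""
    if policy == "source_priority":
        for source in priority:
            if source in labels:
                return labels[source]
        raise ValueError("no prioritized source supplied a label")
    if policy == "majority":
        counts = Counter(labels.values())
        best = max(counts.values())
        tied = {label for label, n in counts.items() if n == best}
        for source in priority:  # assumed tie-break: first prioritized source wins
            if labels.get(source) in tied:
                return labels[source]
        return sorted(tied)[0]
    if policy == "weighted":
        scores = defaultdict(float)
        for source, label in labels.items():
            scores[label] += (confidences or {}).get(source, 0.0)
        return max(sorted(scores), key=scores.get)
    raise ValueError(f"unknown policy: {policy}")


labels = {"T-BAS": "Enterobacter", "SILVA": "Lelliottia",
          "RDP": "Enterobacter", "NCBI": "Enterobacter"}
order = ("SILVA", "UNITE", "T-BAS", "RDP", "NCBI")
conf = {"SILVA": 0.99, "T-BAS": 0.4, "RDP": 0.3, "NCBI": 0.2}  # illustrative values

print(pick_canonical(labels, "majority", order))
print(pick_canonical(labels, "source_priority", order))
print(pick_canonical(labels, "weighted", order, conf))
```

With these inputs, majority follows the three agreeing sources, source_priority follows the first listed source that has a label, and weighted follows whichever label accumulates the larger confidence total.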

Include confidence columns (--include-confidence)

Adds columns such as EPA LWR or related confidence values to exported tables.

This is important because many users need to know not only the assigned taxonomy but also how well supported it was.

NCBI e-value cutoff (--ncbi-evalue-cutoff)

BLAST hits worse than this threshold are ignored for NCBI taxonomy purposes.

The preset default shown in the JS is 1e-20.
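
The cutoff acts as a filter on BLAST hits before any NCBI taxonomy is considered. In the sketch below, the hit dictionaries, their field names, and the function name are hypothetical. It assumes that smaller e-values are better and that a hit exactly at the cutoff is kept.

```python
def filter_blast_hits(hits, evalue_cutoff=1e-20):
    """Keep hits whose e-value is at or below the cutoff (smaller is better)."""
    return [hit for hit in hits if hit["evalue"] <= evalue_cutoff]


hits = [
    {"subject": "hit_A", "evalue": 1e-50},  # illustrative values
    {"subject": "hit_B", "evalue": 1e-5},
]
print([hit["subject"] for hit in filter_blast_hits(hits)])  # ['hit_A']
```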

Species mode (--species-mode)

Options:

  • conservative
  • balanced
  • aggressive

This is the low-level engine setting behind the user-facing taxonomy strategy cards.

conservative

Requires stricter support before retaining species.

balanced

Middle-ground behavior.

aggressive

Allows more species retention when genus support and conflict logic permit it.

In the interface, the Enhanced resolution (putative species) preset maps to aggressive.

Output taxonomy source (--output-taxonomy-source)

Standard options:

  • best_taxonomy
  • best_taxonomy_confidence_filtered

These determine which taxonomy string downstream exports should use globally.

This matters for:

  • QIIME
  • phyloseq
  • PopPhy-CNN
  • bundled export outputs

New advanced functionality: single-source comparison and export

The updated resolver can also generate single-source comparison columns and optionally use one of those fields for all downstream outputs.

This functionality is intended for expert users who want to compare:

  • consensus taxonomy
  • T-BAS only
  • NCBI only
  • RDP only
  • UNITE only
  • SILVA only

without running separate jobs for each source.

Enable single-source comparison columns (--write-sole-source-taxonomy)

When enabled, the main consensus table is extended with additional fields such as:

  • tbas_only_best_taxonomy
  • tbas_only_best_taxonomy_confidence_filtered
  • ncbi_only_best_taxonomy
  • ncbi_only_best_taxonomy_confidence_filtered

and analogous fields for RDP, UNITE, and SILVA.

These columns let users compare what the taxonomy would look like if winner selection were restricted to a single source.
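
The naming pattern of these columns can be sketched as below. The prefix rule (lowercase the source name and drop the hyphen, so that T-BAS becomes tbas) is inferred from the field names listed in this section; the helper itself is hypothetical.

```python
def sole_source_columns(source_list):
    """Build comparison column names from a comma-separated source list."""
    columns = []
    for source in source_list.split(","):
        # "T-BAS" -> "tbas": lowercase and drop the hyphen (inferred rule).
        prefix = source.strip().lower().replace("-", "")
        columns.append(f"{prefix}_only_best_taxonomy")
        columns.append(f"{prefix}_only_best_taxonomy_confidence_filtered")
    return columns

for name in sole_source_columns("T-BAS,NCBI,RDP,UNITE,SILVA"):
    print(name)
```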

Sources to compare (--sole-source-list)

Comma-separated list of sources for which side-by-side comparison columns are computed.

Typical default:

T-BAS,NCBI,RDP,UNITE,SILVA

Source-specific downstream output

When single-source comparison columns have been generated, --output-taxonomy-source can also be pointed to one of the source-specific fields, for example:

  • tbas_only_best_taxonomy_confidence_filtered
  • ncbi_only_best_taxonomy_confidence_filtered
  • rdp_only_best_taxonomy_confidence_filtered
  • unite_only_best_taxonomy_confidence_filtered
  • silva_only_best_taxonomy_confidence_filtered

This means the resolver can now do two distinct things:

  1. compare source-specific taxonomy side by side inside the consensus table
  2. drive all downstream exports from a chosen single source when explicitly requested

This functionality belongs in Advanced / expert settings, not as a top-level taxonomy card. The best-supported taxonomy should remain the default visible workflow, while single-source comparison and export remain optional expert tools.

Prefer curated sources for canonical display (--prefer-curated-sources)

When enabled, curated sources such as SILVA or UNITE are preferred for display if they are available.

Pad confidence-filtered missing ranks (--pad-confidence-filtered-missing-ranks)

Fills missing internal ranks using a placeholder so downstream tools preserve fixed rank positions.

This is especially useful for tools that expect a stable number of ranks.

Missing-rank label (--missing-rank-label)

Placeholder used when missing-rank padding is enabled.

The interface preset default is Unclassified.
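
A minimal sketch of rank padding, assuming a lineage is held as a list ordered from highest to lowest rank with empty strings for gaps. Reading "internal" as "above the deepest assigned rank" is an assumption, as is the helper name; the example lineage has gaps inserted purely for illustration.

```python
def pad_missing_ranks(ranks, label="Unclassified"):
    """Fill empty ranks above the deepest assigned rank with a placeholder.

    Trailing empty ranks are left untouched (assumption about "internal").
    """
    assigned = [i for i, name in enumerate(ranks) if name]
    if not assigned:
        return list(ranks)
    deepest = assigned[-1]
    return [
        (name or label) if i <= deepest else name
        for i, name in enumerate(ranks)
    ]

lineage = ["Fungi", "Ascomycota", "", "Hypocreales", "", "Fusarium", ""]
print(";".join(pad_missing_ranks(lineage)))
```

The padded lineage keeps seven positions, so a tool that splits on the separator still finds the genus in the sixth slot.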

Database info JSON (--db-info)

If provided, database/tool version information is copied into outputs.

Useful for:

  • provenance
  • reproducibility
  • reporting

T-BAS minimum LWR (--tbas-min-lwr)

When set, T-BAS taxonomy is trusted for best-taxonomy winner selection only if placement support (LWR) is at or above this threshold.

The interface preset default is 0.8.

Low-LWR behavior (--tbas-low-lwr-mode)

The current behavior is fixed to exclude.

When T-BAS placement support falls below the threshold, T-BAS is excluded from winner selection and the remaining sources are used instead.
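
The interaction between the LWR threshold and the exclude behavior can be sketched as follows. The function, the candidate dictionary, and the choice to treat a missing LWR as low support are assumptions made for the example.

```python
def sources_for_winner_selection(candidates, tbas_lwr, tbas_min_lwr=0.8):
    """Drop T-BAS from the candidate pool when its LWR is below the threshold.

    candidates: dict of source -> label. A missing LWR (None) is treated as
    low support here, which is an assumption.
    """
    if tbas_lwr is not None and tbas_lwr >= tbas_min_lwr:
        return dict(candidates)
    return {src: label for src, label in candidates.items() if src != "T-BAS"}

candidates = {"T-BAS": "Fusarium", "NCBI": "Fusarium", "UNITE": "Gibberella"}
print(sorted(sources_for_winner_selection(candidates, tbas_lwr=0.95)))
print(sorted(sources_for_winner_selection(candidates, tbas_lwr=0.42)))
```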

Source-only (--source-only)

Restrict best-taxonomy winner selection to only one source, such as:

  • SILVA
  • RDP
  • UNITE
  • NCBI
  • T-BAS

Prefer source (--prefer-source)

Bias winner selection toward a chosen source when it is available.

This is weaker than source-only and still allows other sources to participate.


11. PopPhy-CNN labeling options

These settings create metadata labels suitable for PopPhy-CNN workflows.

Encoding mode

Options:

  • Binary
  • Auto-encode
  • Explicit mapping

Binary

Use zero/one coding.

Auto-encode

Map unique values deterministically to integers 0..K-1.
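
One way to make such a mapping deterministic is to sort the unique values before numbering them, as in the sketch below. Sorted order is an assumption; the resolver may use a different but equally reproducible ordering.

```python
def auto_encode(values):
    """Map unique values to 0..K-1 in sorted order (hypothetical helper)."""
    classes = sorted(set(values))
    return {value: code for code, value in enumerate(classes)}

print(auto_encode(["Treated", "Control", "High", "Control"]))
```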

Explicit mapping

Provide your own VALUE=LABEL pairs.

PopPhy-CNN map column (--popphy-map-col)

Metadata column whose values will be converted into labels.

PopPhy-CNN label column name (--popphy-label-col)

Name of the encoded label column written to metadata.

Default: popphy_label

Labels file format / numeric labels (--popphy-labels-numeric)

Forces labels file output to use numeric codes.

PopPhy-CNN other-mode (--popphy-other-mode)

Controls behavior for unmapped values:

  • exclude
  • keep_na
  • error

Zero value(s) (--popphy-zero)

Comma-separated values to code as 0.

One value(s) (--popphy-one)

Comma-separated values to code as 1.
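
Binary coding together with the other-mode setting can be sketched as below. The helper is hypothetical, and the exact meaning given to each mode (exclude drops the sample, keep_na keeps it with a missing label, error stops) is an assumption based on the option names.

```python
def binary_labels(values, zero, one, other_mode="exclude"):
    """Code metadata values as 0/1.

    values: dict of sample -> metadata value.
    zero, one: comma-separated strings, as passed to --popphy-zero / --popphy-one.
    """
    if other_mode not in ("exclude", "keep_na", "error"):
        raise ValueError(f"unknown other-mode: {other_mode}")
    mapping = {v.strip(): 0 for v in zero.split(",")}
    mapping.update({v.strip(): 1 for v in one.split(",")})
    labels = {}
    for sample, value in values.items():
        if value in mapping:
            labels[sample] = mapping[value]
        elif other_mode == "keep_na":
            labels[sample] = None  # kept, but without a label
        elif other_mode == "error":
            raise ValueError(f"unmapped value for {sample}: {value!r}")
        # "exclude": the sample is left out of the result
    return labels

meta = {"s1": "Control", "s2": "Drought", "s3": "Heat", "s4": "Unknown"}
print(binary_labels(meta, zero="Control", one="Drought,Heat"))
```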

Explicit mapping (--popphy-map)

Use VALUE=LABEL entries for custom multi-class mapping.

Examples:

  • Control=0
  • Treated=1
  • High=2
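
Parsing entries of this form can be sketched as follows. The helper takes entries that have already been split into individual VALUE=LABEL strings, since how the interface separates multiple entries is not stated here; integer labels are assumed from the examples above.

```python
def parse_popphy_map(entries):
    """Parse VALUE=LABEL entries into a dict of value -> integer label."""
    mapping = {}
    for entry in entries:
        value, sep, label = entry.partition("=")
        if not sep or not value.strip() or not label.strip():
            raise ValueError(f"expected VALUE=LABEL, got {entry!r}")
        mapping[value.strip()] = int(label)
    return mapping

print(parse_popphy_map(["Control=0", "Treated=1", "High=2"]))
```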

12. Faith's PD mode and inputs

The page also exposes settings for Faith's phylogenetic diversity workflows.

Mode

Options:

  • rarefied
  • observed
  • overlay

This determines whether Faith's PD is computed after rarefaction (rarefied), directly on the observed counts (observed), or in a comparative overlay mode (overlay).

Rarefaction depth (--depth)

If left blank, the minimum read depth among the samples remaining after filtering may be used.

Minimum reads (--min_reads)

Samples below this threshold are excluded before analysis.
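
The interplay between the minimum-reads filter and a blank rarefaction depth can be sketched as below. The helper name and the per-sample totals are hypothetical, and the fallback to the smallest remaining total follows the description above rather than confirmed resolver behavior.

```python
def choose_depth(sample_totals, min_reads=0, depth=None):
    """Drop samples below min_reads, then pick the rarefaction depth.

    If depth is None (left blank), fall back to the smallest remaining total.
    """
    kept = {s: n for s, n in sample_totals.items() if n >= min_reads}
    if not kept:
        raise ValueError("no samples remain after the minimum-reads filter")
    chosen = depth if depth is not None else min(kept.values())
    return kept, chosen

totals = {"A": 12000, "B": 800, "C": 5400}
kept, chosen = choose_depth(totals, min_reads=1000)
print(sorted(kept), chosen)
```

Filtering first matters: without the minimum-reads filter, sample B would pull the fallback depth down to 800 reads for every sample.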

Alpha

Significance threshold applied to BH-adjusted p-values when assigning group-significance letters in the associated summaries or plots.
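
BH here is taken to mean the Benjamini-Hochberg false discovery rate adjustment. The sketch below shows only that adjustment step and the comparison against alpha; how the resolver obtains the raw p-values and converts them into letters is not covered, and the p-values used are made up.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

alpha = 0.05
raw = [0.01, 0.04, 0.03, 0.20]  # made-up pairwise p-values
adjusted = bh_adjust(raw)
significant = [p <= alpha for p in adjusted]
print(adjusted, significant)
```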

Block column (--block)

Optional metadata column used for paired or blocked designs.

Grouping / treatment column (--treatment)

Grouping variable for Faith's PD comparisons.


13. Species resolution: what users need to understand

Because this is a taxonomy resolver, not merely a label extractor, users often ask why a feature does not resolve to species even with full-length markers.

This section is included here because it directly affects how the resolver's options should be interpreted.

Why full-length does not guarantee species

Even with full-length ITS or full-length 16S:

  • multiple species may share identical marker sequences
  • recently diverged species may not yet be separable at a single locus
  • databases may contain incomplete, ambiguous, or inconsistent species annotations
  • marker resolution varies across taxonomic groups

Reasonable expectations by marker

Marker              | Typical expectation
Full-length ITS     | genus generally reliable; species often possible but not guaranteed
ITS1 or ITS2 alone  | genus often possible; species generally limited or unreliable
Full-length 16S     | genus generally reliable; species sometimes possible
16S V3/V4           | family-to-genus more typical; species uncommon

Why the resolver may stop at genus

The resolver is intentionally designed to avoid false precision.

Species assignment is retained only when:

  • genus support is sufficiently stable
  • higher-rank conflict is absent
  • placement is supported by the tree
  • only one compatible species is present

Otherwise the resolver falls back to genus.
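
The retention rule above can be sketched as a single conjunction of the four conditions. The Evidence record and its field names are hypothetical; how the resolver actually measures stability, conflict, and tree support is not shown here.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    genus: str
    genus_stable: bool            # genus support is sufficiently stable
    higher_rank_conflict: bool    # sources conflict at higher ranks
    tree_supported: bool          # placement is supported by the tree
    compatible_species: list = field(default_factory=list)

def resolve_lowest_rank(ev):
    """Return the species only when all four conditions hold, else the genus."""
    keep_species = (
        ev.genus_stable
        and not ev.higher_rank_conflict
        and ev.tree_supported
        and len(ev.compatible_species) == 1
    )
    return ev.compatible_species[0] if keep_species else ev.genus

clear = Evidence("Fusarium", True, False, True, ["Fusarium oxysporum"])
ambiguous = Evidence("Fusarium", True, False, True,
                     ["Fusarium oxysporum", "Fusarium solani"])
print(resolve_lowest_rank(clear))
print(resolve_lowest_rank(ambiguous))
```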

Single-marker matching is not definitive

A direct match to a species name from ITS or 16S alone does not guarantee that the sequence truly belongs to that species.

Species boundaries often rely on:

  • multiple loci
  • morphology
  • ecology
  • host association
  • reproductive biology

Single-locus vs multi-locus resolution

If a single marker cannot separate species, then taxonomy should not pretend otherwise. This is why the resolver prioritizes biologically justified assignment over overcalling species.


14. Practical recommendations

Use Higher species resolution when:

  • you are working with full-length ITS or full-length 16S
  • you want to retain species when justified
  • you accept genus fallback when the data do not support species

Use Best supported taxonomy when:

  • you are working with short-read markers
  • you need more conservative calls
  • you are generating publication-ready summaries where overcalling species would be risky
  • you want the default downstream outputs to reflect consensus, confidence-filtered taxonomy

Upload counts and metadata when:

  • you want DESeq2
  • you want abundance plots
  • you want PopPhy-CNN labeling
  • you want phyloseq / BIOM outputs
  • you want group-aware interpretation

Upload a consolidated Newick tree when:

  • results come from multiple T-BAS runs
  • subsets were placed separately and then merged
  • the final combined tree is not available in a single run folder

Use single-source comparison and export when:

  • you want to benchmark consensus vs T-BAS-only vs NCBI-only behavior
  • you want to understand why consensus assignment differs from a single source
  • you want every downstream file to be generated from one source only for comparison or validation

Recommended practice:

  • keep Best supported taxonomy (confidence filtered) as the default downstream output for routine runs
  • enable single-source comparison only when you need it
  • switch downstream output to a single source (for example, T-BAS only or NCBI only) solely for explicit side-by-side comparisons

15. Design principles reflected in the interface

The options exposed in the resolver UI reflect several design principles:

  • phylogeny constrains lineage
  • databases provide taxonomic resolution
  • species calls should be conservative
  • all conflicts and overrides should remain transparent
  • outputs should be ready for downstream analysis

This is why the interface includes both high-level strategy cards and low-level expert options.


16. Summary

The T-BAS Taxonomy Resolver interface supports:

  • one or more T-BAS accessions
  • taxonomy consolidation across multiple sources
  • user-selectable taxonomy stringency
  • confidence-aware and conflict-aware taxonomy assignment
  • downstream exports linked to counts, metadata, and optionally trees
  • DESeq2, abundance, PopPhy-CNN, and Faith's PD workflows
  • expert control over canonical naming, source preference, confidence filtering, and rank padding
  • optional single-source comparison columns and source-specific downstream export

In short, the resolver is designed to help users move from placement runs to analysis-ready, biologically defensible taxonomy outputs.