T-BAS Taxonomy Resolver

The T-BAS Taxonomy Resolver generates a single, consolidated taxonomy from one or more T-BAS placement runs by reconciling phylogenetic placement with multiple reference taxonomies and packaging the results for downstream analysis.

This page documents the full functionality exposed in the resolver interface, including:

accession-based input
taxonomy strategy presets
downstream input files and exports
DESeq2 analysis options
abundance and core microbiome outputs
advanced taxonomy-resolution controls
PopPhy-CNN labeling options
Faith's phylogenetic diversity options
species-resolution limits and interpretation

What the resolver does

At a high level, the resolver:

accepts one or more T-BAS accession IDs
retrieves taxonomy and placement information linked to those accessions
reconciles phylogenetic placement with one or more reference taxonomies
applies confidence- and conflict-aware rules to produce a single best taxonomy per feature
optionally prepares outputs for downstream tools such as QIIME / phyloseq / BIOM / iTOL / DESeq2 / PopPhy-CNN / Faith's PD

This makes the resolver more than a simple extractor. It is a taxonomy consolidation and downstream export tool.

Interpreting resolver output columns

The resolver output is intentionally more detailed than a simple taxonomy table. It is designed to show not only what taxonomy was assigned, but also why that assignment was made, which sources agreed or disagreed, and whether the tree forced the resolver to be more conservative.

This section explains the main output columns in practical terms and gives examples of how to read them.

Why these columns matter

A user might expect the resolver to do something simple such as:

read a T-BAS placement
look up a database taxonomy
return a single name

But the resolver is doing something more sophisticated. It is trying to answer questions such as:

Did T-BAS, SILVA, RDP, and NCBI all agree?
Did they only agree to genus, but not species?
Was the database genus even present in the reference tree?
Did the tree suggest that the database label was too specific or phylogenetically inconsistent?
Should species be retained, or should the assignment stop at genus?

The output columns are there to make those decisions visible and auditable.

A worked example

Consider a row like this:

FeatureID: OTU1
best_taxonomy: Bacteria; Bacillota; Bacilli; Bacillales; Caryophanaceae; Bhargavaea; cecembensis
best_taxonomy_confidence_filtered: Bacteria; Bacillota; Bacilli; Bacillales; Bacillaceae; Neobacillus; Unclassified
deepest_confident_rank: GENUS
tbas_proxy_lineage: TRUE
tbas_proxy_rank: FAMILY
tree_contains_tbas_genus: TRUE
tree_contains_database_genus: FALSE
tbas_forced_placement: TRUE
sources_used: T-BAS,SILVA,RDP,NCBI
conflict_flag: SILVA
conflict_rank: CLASS
agreement_sources: NCBI,RDP,T-BAS
taxonomy_change_type: tree_forced_adjustment
change_reason_detail: database_genus_absent_from_tree
taxonomy_change_summary: FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus
confidence: 0.884274023

This should be interpreted as follows:

best_taxonomy holds the more detailed initial lineage, which ends in Bhargavaea cecembensis.
After applying tree-aware checks and confidence filtering, the final recommended lineage changed at family and genus (Caryophanaceae to Bacillaceae, Bhargavaea to Neobacillus) and the species label was removed.
tree_contains_database_genus = FALSE means the database genus supporting the original label was not represented in the reference tree.
tbas_forced_placement = TRUE means the tree prevented the resolver from simply keeping the database label.
deepest_confident_rank = GENUS means genus was considered defensible, but species was not.
taxonomy_change_type = tree_forced_adjustment means the final taxonomy differs from the original because the tree overruled the database-supported label.
change_reason_detail = database_genus_absent_from_tree gives the specific reason for that adjustment.

This is exactly the kind of situation where the output columns are essential. Without them, the user would only see that the taxonomy changed and would not know why.
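
The relationship between the two lineage strings and taxonomy_change_summary in this example can be sketched in a few lines of Python. This is an illustration only, not the resolver's code: the function names are hypothetical, the rank name DOMAIN for the first field is an assumption, and skipping ranks that were replaced by the missing-rank label is inferred from the example row, where the removed species is not listed in the summary.

```python
# Rank names for a seven-field lineage; "DOMAIN" for the first field is an assumption.
RANKS = ["DOMAIN", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"]


def split_lineage(lineage):
    """Split a ';'-separated lineage string into a list of trimmed names."""
    return [part.strip() for part in lineage.split(";")]


def summarize_changes(raw, filtered, missing_label="Unclassified"):
    """Return 'RANK:old->new' entries for ranks renamed between two lineages.

    Ranks replaced by the missing-rank label are skipped (assumption based on
    the worked example, where the dropped species does not appear).
    """
    changes = []
    for rank, old, new in zip(RANKS, split_lineage(raw), split_lineage(filtered)):
        if old != new and new != missing_label:
            changes.append(f"{rank}:{old}->{new}")
    return ";".join(changes)


raw = "Bacteria; Bacillota; Bacilli; Bacillales; Caryophanaceae; Bhargavaea; cecembensis"
filtered = "Bacteria; Bacillota; Bacilli; Bacillales; Bacillaceae; Neobacillus; Unclassified"
summary = summarize_changes(raw, filtered)
print(summary)  # FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus
```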

Core taxonomy fields

`FeatureID`

The feature identifier from the input counts table or pretty report.

Examples:

OTU1
ASV_24

This is the key used to link taxonomy to counts, metadata, QIIME outputs, phyloseq tables, and downstream analyses.

`best_taxonomy`

This is the full selected taxonomy before conservative filtering is applied.

It reflects the best-supported lineage that the resolver could assemble, using the selected policies and source logic.

This field is useful when:

you want to inspect the most detailed interpretation
you want to see what the resolver would keep before species or lower-rank confidence filtering is applied
you want to compare detailed assignments against the more conservative final result

Example

best_taxonomy:
Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Lelliottia; amnigena

This means the resolver found enough information to support a detailed lineage down to species before filtering.

`best_taxonomy_confidence_filtered`

This is the final recommended taxonomy and is usually the most important field for downstream use.

It is built by taking best_taxonomy and then applying:

confidence filtering
tree-aware conflict handling
species-mode rules
synonym-aware reconciliation
rank padding, if enabled

This is typically the field used for:

QIIME
phyloseq
PopPhy-CNN
bundled exports
publication-facing summaries

Example

best_taxonomy_confidence_filtered:
Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Enterobacter; Unclassified

This means:

the resolver accepts the lineage to genus
species is not considered reliable enough to keep
the final export should stop at genus

`deepest_confident_rank`

This field tells you the deepest (most specific) rank that the resolver considers reliable after filtering.

It is extremely useful because it summarizes, in one word, where the taxonomy should stop.

Possible values include:

PHYLUM
CLASS
ORDER
FAMILY
GENUS
SPECIES

Example 1

deepest_confident_rank: GENUS

Interpretation:

taxonomy is trusted down to genus
species was dropped or not defensible

Example 2

deepest_confident_rank: FAMILY

Interpretation:

genus is not considered safe
the assignment should only be interpreted at family level
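
As a rough illustration of what "stopping" at a rank means for an exported lineage string, the sketch below truncates a lineage at a given rank and fills the remaining positions with the missing-rank label. It assumes a seven-field lineage whose first field is a domain-level name, and the function name is hypothetical. The resolver itself may also rename ranks during filtering, which this sketch does not attempt.

```python
# Assumed rank order for a seven-field lineage; "DOMAIN" is an assumption.
RANKS = ["DOMAIN", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"]


def truncate_to_rank(lineage, deepest_rank, missing_label="Unclassified"):
    """Keep names down to deepest_rank and pad the deeper ranks with a placeholder."""
    names = [part.strip() for part in lineage.split(";")]
    kept = names[: RANKS.index(deepest_rank) + 1]
    padding = [missing_label] * (len(RANKS) - len(kept))
    return "; ".join(kept + padding)


lineage = (
    "Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; "
    "Enterobacteriaceae; Lelliottia; amnigena"
)
print(truncate_to_rank(lineage, "GENUS"))
print(truncate_to_rank(lineage, "FAMILY"))
```

With GENUS the species position becomes Unclassified; with FAMILY both the genus and species positions do.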

`confidence`

This is the numeric support value associated with the selected assignment, often based on the likelihood weight ratio (LWR) from the Evolutionary Placement Algorithm (EPA) or a related confidence metric.

A higher value usually indicates stronger placement support.

Example

confidence: 0.995802794

This would usually be interpreted as very strong support.

Example

confidence: 0.51

This is much weaker support and is more likely to trigger conservative fallback behavior.

Important:

a high confidence value does not automatically guarantee species-level support
the species-mode rules and tree checks can still remove species if the lineage is not considered defensible

Tree-aware interpretation columns

These columns explain how the resolver used the phylogenetic tree to constrain or reinterpret taxonomy.

`tbas_proxy_lineage`

This indicates whether the T-BAS lineage should be interpreted as a proxy lineage rather than a definitive direct match.

A proxy lineage means:

the placement indicates the nearest lineage available in the tree
but the exact taxon of interest may not actually be represented in the tree

This is common when:

the reference tree does not contain every species
placements are being made against a broader lineage framework

Example

tbas_proxy_lineage: TRUE

Interpretation:

the T-BAS lineage is being used as the nearest supported phylogenetic lineage
it should not automatically be read as exact species identity

`tbas_proxy_rank`

This tells you the deepest rank at which the T-BAS lineage is acting as a reasonable proxy.

Example

tbas_proxy_rank: FAMILY

Interpretation:

the resolver considers the lineage informative only down to family in a proxy sense
deeper ranks such as genus or species are less trustworthy as direct identity labels

Example

tbas_proxy_rank: GENUS

Interpretation:

the placement likely supports genus-level interpretation
but not necessarily species

`tree_contains_tbas_genus`

This tells you whether the genus implied by the T-BAS lineage is actually present in the reference tree.

Example

tree_contains_tbas_genus: TRUE

Interpretation:

the T-BAS genus is represented in the tree
the placement has a phylogenetic anchor at genus level

Example

tree_contains_tbas_genus: FALSE

Interpretation:

the genus implied by T-BAS is not literally represented in the tree
the assignment may only be interpretable at a higher rank or as a proxy lineage

`tree_contains_database_genus`

This tells you whether the genus coming from the database-supported taxonomy is present in the reference tree.

This is a critical tree-vs-database consistency check.

Example

tree_contains_database_genus: TRUE

Interpretation:

the database genus is represented in the tree
there is no immediate tree-representation conflict at genus level

Example

tree_contains_database_genus: FALSE

Interpretation:

the database genus is not represented in the tree
the resolver may consider the database assignment too specific or not phylogenetically supported by the current reference tree

`tbas_forced_placement`

This indicates that the final filtered taxonomy was affected by a tree-forced placement decision.

When this is TRUE, it usually means:

a database label was available
but the tree did not support simply keeping it
so the resolver adjusted the result to maintain phylogenetic consistency

Example

tbas_forced_placement: TRUE

Interpretation:

the tree overruled a more naive database-based interpretation
the final taxonomy reflects phylogenetic constraint, not just label agreement

Conflict and agreement columns

These columns show whether sources agreed, which ones disagreed, and where the disagreement started.

`sources_used`

This is the list of taxonomy sources that contributed non-empty information for the feature.

Examples:

T-BAS,SILVA,RDP,NCBI
T-BAS,UNITE,NCBI

This tells you which sources actually participated in the decision.

Example

sources_used: T-BAS,SILVA,RDP,NCBI

Interpretation:

all four of those sources had usable taxonomy for this feature
consensus and conflict logic considered all of them

`agreement_sources`

These are the sources that agree with the final retained taxonomy at the decisive conflict point.

This is one of the most important transparency fields because it tells you which sources support the final answer.

Example

agreement_sources: NCBI,RDP,T-BAS

Interpretation:

the final lineage is supported by those three sources
another source, such as SILVA, disagreed in a meaningful way

`conflict_flag`

This field identifies which source or sources are in conflict.

It does not list agreeing sources. It is designed to flag the disagreeing source(s).

Example

conflict_flag: SILVA

Interpretation:

SILVA disagreed with the supported consensus

Example

conflict_flag: none

Interpretation:

all relevant sources agreed after synonym handling

`conflict_rank`

This tells you the first rank at which a meaningful disagreement begins.

Example

conflict_rank: GENUS

Interpretation:

sources agree above genus
disagreement begins at genus

Example

conflict_rank: CLASS

Interpretation:

disagreement begins much higher
this is a more substantial conflict

This is useful because it tells you whether the conflict is:

minor and late in the lineage
or broad and deep in the taxonomy
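
When scanning a large table, it can help to turn conflict_flag and conflict_rank into a coarse severity label. The sketch below is one way to do that. The function name and the cut-off (conflicts that start at GENUS or SPECIES count as "late", anything higher as "deep") are assumptions made for illustration, not resolver behavior.

```python
# Ranks listed from shallowest to deepest, as in deepest_confident_rank.
RANKS = ["PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"]


def conflict_severity(conflict_flag, conflict_rank):
    """Label a row 'none', 'late', or 'deep' from its conflict columns.

    The GENUS cut-off between 'late' and 'deep' is a hypothetical choice.
    """
    if conflict_flag.strip().lower() in ("", "none"):
        return "none"
    position = RANKS.index(conflict_rank.strip().upper())
    return "late" if position >= RANKS.index("GENUS") else "deep"


print(conflict_severity("SILVA", "CLASS"))  # deep
print(conflict_severity("SILVA", "GENUS"))  # late
print(conflict_severity("none", ""))        # none
```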

Change-tracking columns

These explain why the final confidence-filtered taxonomy differs from the more detailed raw choice.

`taxonomy_change_type`

This is a high-level label describing the reason the final taxonomy changed.

Common values include:

none
synonym_replacement
consensus_override
tree_forced_adjustment

Example

taxonomy_change_type: none

Interpretation:

the filtered taxonomy is effectively the same as the raw selected taxonomy

Example

taxonomy_change_type: consensus_override

Interpretation:

the final taxonomy changed because a better-supported cross-source consensus was chosen

Example

taxonomy_change_type: tree_forced_adjustment

Interpretation:

the tree forced the resolver to step away from the more naive or database-driven label

`change_reason_detail`

This is a more specific explanation nested inside taxonomy_change_type.

Values in this field are terse, machine-style phrases whose meaning is often not obvious without documentation.

Examples include:

database_genus_absent_from_tree
multi_source_consensus_override
consensus_override_with_genus_reclassification

Example

change_reason_detail: database_genus_absent_from_tree

Interpretation:

the database genus was not represented in the reference tree
therefore the resolver could not simply keep that database genus as the final answer
a tree-aware correction was applied

Example

change_reason_detail: multi_source_consensus_override

Interpretation:

several sources supported a different lineage than the raw detailed choice
the final taxonomy followed the stronger consensus

Example

change_reason_detail: consensus_override_with_genus_reclassification

Interpretation:

the consensus changed the genus, and the visible change was best described as a genus-level reclassification

`taxonomy_change_summary`

This is a compact, human-readable summary of the visible taxonomic changes.

Example

taxonomy_change_summary: GENUS:Lelliottia->Enterobacter

Interpretation:

the final confidence-filtered lineage changed the genus from Lelliottia to Enterobacter

Example

taxonomy_change_summary: FAMILY:Caryophanaceae->Bacillaceae;GENUS:Bhargavaea->Neobacillus

Interpretation:

both family and genus changed
this was not a tiny species-level tweak but a more substantial lineage correction

This field is especially useful when scanning large result tables by eye.

`synonym_applied`

This indicates whether an explicit synonym or reclassification rule contributed to the final result.

Example

synonym_applied: TRUE

Interpretation:

part of the final taxonomy depended on a synonym or updated taxonomic equivalence
for example, an old and new lineage name may have been reconciled

Example

synonym_applied: FALSE

Interpretation:

the final result did not require synonym handling in a meaningful way

Special example: `database_genus_absent_from_tree`

This is one of the most important terms to document because users often do not immediately understand it.

What it means

The database-supported genus was not represented as a tip or lineage in the current reference tree.

That does not automatically mean the database genus is wrong in an absolute sense. It means:

the current tree cannot directly support that genus-level label
the resolver must decide whether to:
- follow the tree
- follow the database
- or back off to a more conservative rank

Why this matters

The resolver is designed to be phylogeny-aware, not just database-aware.

So if the database says:

Genus = Bhargavaea

but the tree does not contain Bhargavaea at all, then the resolver cannot simply claim that the phylogenetic placement supports that genus.

Instead, it may:

keep a broader lineage that the tree does support
switch to a neighboring supported lineage
or fall back to a genus- or family-level assignment through confidence filtering

Example interpretation

Suppose you see:

tree_contains_database_genus: FALSE
tbas_forced_placement: TRUE
taxonomy_change_type: tree_forced_adjustment
change_reason_detail: database_genus_absent_from_tree

This means:

a database-based genus was available
that genus was not represented in the tree
the tree forced a more conservative or alternative lineage
the final answer was adjusted to stay phylogenetically defensible

This is exactly the kind of case where the resolver is doing something more careful than a database lookup.
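
To pull such rows out of an exported table, the four-column pattern above can be checked programmatically. The sketch below assumes each row has been read into a dictionary keyed by column name, with TRUE/FALSE stored either as text or as booleans. The helper names are hypothetical.

```python
def _flag(row, key):
    """Normalize a TRUE/FALSE cell (string or bool) to an upper-case string."""
    return str(row.get(key, "")).strip().upper()


def database_genus_was_overruled(row):
    """True when a row shows the full 'database genus absent from tree' pattern."""
    return (
        _flag(row, "tree_contains_database_genus") == "FALSE"
        and _flag(row, "tbas_forced_placement") == "TRUE"
        and row.get("taxonomy_change_type") == "tree_forced_adjustment"
        and row.get("change_reason_detail") == "database_genus_absent_from_tree"
    )


row = {
    "tree_contains_database_genus": "FALSE",
    "tbas_forced_placement": "TRUE",
    "taxonomy_change_type": "tree_forced_adjustment",
    "change_reason_detail": "database_genus_absent_from_tree",
}
print(database_genus_was_overruled(row))  # True
```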

Single-source comparison columns

When enabled, the resolver can add source-specific comparison columns.

These are designed for method comparison, not for ordinary default use.

Compact source-specific fields

In the cleaner version of the exporter, the preferred comparison columns are:

tbas_only_best_taxonomy_confidence_filtered
ncbi_only_best_taxonomy_confidence_filtered
rdp_only_best_taxonomy_confidence_filtered
unite_only_best_taxonomy_confidence_filtered
silva_only_best_taxonomy_confidence_filtered

These show what the final confidence-filtered taxonomy would look like if only that source were used.

Example

If you see:

best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
tbas_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
ncbi_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified

Interpretation:

the consensus and the single-source methods agree
the assignment is stable across methods

If instead you see:

best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified
tbas_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Lelliottia; Unclassified
ncbi_only_best_taxonomy_confidence_filtered: Bacteria; ... ; Enterobacter; Unclassified

Interpretation:

T-BAS and NCBI disagree at genus level
the consensus logic resolved that disagreement in favor of Enterobacter

This kind of comparison is extremely useful when auditing why a consensus taxonomy was chosen.
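
A simple audit along these lines is to list, for each feature, the single-source columns whose lineage differs from the consensus. The sketch below assumes a row is available as a dictionary keyed by column name. The function name is hypothetical, and the lineage strings are copied as written in the example above, including the elided middle ranks.

```python
SUFFIX = "_only_best_taxonomy_confidence_filtered"


def disagreeing_sources(row, consensus_col="best_taxonomy_confidence_filtered"):
    """Return the source prefixes whose single-source lineage differs from the consensus."""
    consensus = row[consensus_col]
    differing = []
    for column, value in row.items():
        # Empty cells are skipped: the source had nothing to compare.
        if column.endswith(SUFFIX) and value and value != consensus:
            differing.append(column[: -len(SUFFIX)])
    return sorted(differing)


row = {
    "best_taxonomy_confidence_filtered": "Bacteria; ... ; Enterobacter; Unclassified",
    "tbas_only_best_taxonomy_confidence_filtered": "Bacteria; ... ; Lelliottia; Unclassified",
    "ncbi_only_best_taxonomy_confidence_filtered": "Bacteria; ... ; Enterobacter; Unclassified",
}
print(disagreeing_sources(row))  # ['tbas']
```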

Why the documentation must explain examples, not just field names

A simple glossary is not enough for this tool because many of the fields are not ordinary taxonomy labels. They are decision-trace fields.

For example:

database_genus_absent_from_tree is not just a label
it represents a whole logic pathway:
- database genus present
- tree genus missing
- tree-aware override
- conservative fallback

The documentation therefore needs to explain:

what the term literally means
why the resolver uses it
what kind of biological or analytical situation produces it
how a user should interpret it

That is why these worked examples are so important.

Supported reference taxonomies

The resolver can incorporate the following sources when available:

SILVA for 16S/18S
UNITE for ITS
RDP for 16S
NCBI RefSeq
T-BAS phylogenetic placement

The final taxonomy can reflect agreement across these sources, optionally weighted or prioritized according to the advanced settings described below.

Interface overview

The page is organized into major sections:

Accession input
Downstream analysis inputs and exports
DESeq2 analysis
Annotation
Taxonomy strategy
Advanced / expert settings
Faith's PD mode and inputs
Faith's PD analysis settings

The documentation link shown near the page header should point here so users can understand what each option does before running the tool.

Resolver workflow

The normal workflow is:

submit one or more T-BAS accessions
choose a taxonomy strategy
optionally upload counts, metadata, or a consolidated Newick tree
optionally enable downstream analyses and exports
run the resolver
review the consolidated taxonomy and downstream outputs

1. Accession input

Purpose

This is the main required input area for the resolver.

How to use it

Enter one T-BAS accession per line.

Example:

4D2ZPWQH
X8K91M2A

Why multiple accessions are important

The resolver is designed to work with one or more accessions, not just a single run. This is especially important when:

a large study is analyzed in subsets
multiple T-BAS placement runs were carried out separately on the same reference tree
placements from several runs need to be consolidated into one taxonomy table

This enables scalable analysis of very large datasets while preserving a common phylogenetic framework.

Validation

The interface validates accession formatting and shows an error box if one or more entries do not appear to be valid.
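
The exact validation rule is not stated here. As a sketch only, the code below splits the text area into one entry per line and checks each entry against a hypothetical pattern of eight upper-case letters or digits, inferred from the two example accessions above. The real interface may accept other formats.

```python
import re

# Hypothetical pattern inferred from the examples; not the documented rule.
ACCESSION_RE = re.compile(r"[A-Z0-9]{8}")


def parse_accessions(text):
    """Split a text block into (valid, invalid) accession lists, one entry per line."""
    valid, invalid = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if ACCESSION_RE.fullmatch(entry):
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid


valid, invalid = parse_accessions("4D2ZPWQH\nX8K91M2A\n\nnot an id\n")
print(valid)    # ['4D2ZPWQH', 'X8K91M2A']
print(invalid)  # ['not an id']
```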

2. Downstream analysis inputs and exports

In the interface, this section is labeled Downstream analysis inputs and exports. It explains which additional files are needed to produce analysis-ready outputs.

Why this section matters

The resolver can produce a taxonomy table without counts or metadata, but most meaningful downstream analyses require more than taxonomy alone.

In practice:

counts are needed for abundance-based analyses
metadata are needed for grouping, labeling, DESeq2, PopPhy-CNN, and many figure outputs
a consolidated Newick tree may be needed when placements from multiple runs are merged

Counts file

Upload a counts table (.csv, .tsv, or .txt) if the T-BAS run did not already include one.

Used for:

phyloseq exports
BIOM exports
iTOL count-linked outputs
DESeq2
abundance plots
Faith's PD

Without counts, the taxonomy can still be resolved, but abundance-aware downstream analyses are not possible.

Metadata file

Upload sample metadata if it was not included in the T-BAS run.

Used for:

sample grouping
DESeq2 design variables
PopPhy-CNN label creation
figure grouping
paired or blocked analyses
Faith's PD grouping/blocking

Without metadata, many downstream analyses can still run in limited form, but group-aware interpretation is reduced or impossible.

Consolidated Newick tree (optional)

Upload a Newick tree when results are being resolved across:

multiple T-BAS runs
merged placement subsets
a consolidated tree not stored in any single run folder

This is especially useful when you have placed subsets of a very large dataset separately and then combined them into a grand phylogeny. In that case, the resolver may need access to a Newick tree that does not exist in the original T-BAS run directories.

Export types referenced in this section

The interface specifically highlights:

Phyloseq / BIOM
PopPhy-CNN
iTOL

These are downstream products that may depend on counts, metadata, tree context, or all three.

3. DESeq2 analysis

The DESeq2 card lets the user choose between two high-level presets.

Run full differential analysis

This is the recommended preset for datasets with biological replication.

Use this when you want the resolver workflow to generate:

formal statistical testing
likelihood ratio tests
abundance summaries and plots
core microbiome summaries

Descriptive analysis only

Use this when the dataset has no replication or is otherwise unsuitable for inferential DESeq2 testing.

This preset is designed for:

descriptive summaries
plots
exploratory abundance inspection

When selected, the interface automatically turns on descriptive_only and disables the global likelihood ratio test.

Preset behavior

The JavaScript presets in the HTML show that:

descriptive sets deseqDescriptiveOnly = true and deseqRunLrt = false
full sets deseqDescriptiveOnly = false and deseqRunLrt = true

Both presets keep abundance plots and the core microbiome module enabled by default.
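
For readers who prefer to see the preset logic as data, the same behavior can be written as a small lookup table. The sketch is in Python rather than the page's JavaScript. deseqDescriptiveOnly and deseqRunLrt are the names given above, while runAbundancePlots and runCore are hypothetical names standing in for the two modules that stay enabled.

```python
# Values that differ between the two presets, as described above.
DESEQ_PRESETS = {
    "full": {"deseqDescriptiveOnly": False, "deseqRunLrt": True},
    "descriptive": {"deseqDescriptiveOnly": True, "deseqRunLrt": False},
}

# Settings both presets leave enabled; these key names are hypothetical.
SHARED_DEFAULTS = {"runAbundancePlots": True, "runCore": True}


def apply_preset(name):
    """Return the combined settings for a preset name ('full' or 'descriptive')."""
    settings = dict(SHARED_DEFAULTS)
    settings.update(DESEQ_PRESETS[name])
    return settings


print(apply_preset("descriptive"))
```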

4. DESeq2 analysis settings

This section controls the actual DESeq2 model and filtering choices.

Feature level (`--feature_level`)

Options:

otu
genus

Use:

otu for feature-level testing
genus when you want taxonomically collapsed testing

Grouping / treatment column (`--treat_col`)

Metadata column defining the groups to compare.

Examples:

Treatment
Variety
Host
Genotype
Site

This is one of the most important DESeq2 settings because it determines the design grouping.

Reference level (`--ref`)

Optional baseline group used for pairwise contrasts.

Use this when one level should serve as the control or baseline.

Levels (`--levels`)

Comma-separated explicit ordering of group levels.

This is helpful when:

you want to control factor ordering
the first value should become the reference level
output plots or contrasts should follow a particular order

Alpha / FDR (`--alpha`)

False discovery rate threshold for significance.

The default shown in the interface is 0.05.

Minimum total counts (`--min_total`)

Features below this total abundance threshold are excluded before DESeq2 testing.

The interface default is 50.

Size factors (`--sf`)

Options:

poscounts
ratio

This controls library-size normalization.

Test (`--test`)

Currently shown as:

Wald

This is the DESeq2 test type for contrasts in the current interface.
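
The settings in this section correspond to command-line flags. The sketch below shows one plausible way a filled-in form could be turned into an argument list, skipping optional fields that were left blank. The function name is hypothetical and the way the real interface assembles its command is not documented here, so treat this as an illustration of the mapping only.

```python
def build_deseq_args(settings):
    """Turn {flag: value} form settings into a flat argument list.

    Blank or missing optional values (for example --ref) are omitted.
    """
    args = []
    for flag, value in settings.items():
        if value is None or value == "":
            continue
        args.extend([flag, str(value)])
    return args


form = {
    "--feature_level": "genus",
    "--treat_col": "Treatment",
    "--ref": "",        # optional, left blank
    "--levels": "",     # optional, left blank
    "--alpha": 0.05,    # interface default
    "--min_total": 50,  # interface default
    "--sf": "poscounts",
    "--test": "Wald",
}
print(build_deseq_args(form))
```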

5. DESeq2 analysis toggles

Run likelihood ratio test (global test) (`--run_lrt`)

Tests whether abundance differs across groups overall, rather than only in a single pairwise comparison.

Default: enabled in the full preset.

Descriptive-only mode (`--descriptive_only`)

Skips inferential DESeq2 testing and generates only descriptive outputs.

Use this when you have no meaningful replication.

Include unassigned, uncultured, and unclassified taxa (`--include-unassigned`)

Keeps unresolved taxa in summaries and outputs.

Useful when:

unresolved taxa are biologically meaningful
you want complete rather than filtered summaries

Print parsed parameters at runtime (`--print_params`)

Helpful for:

debugging
reproducibility
checking that the form resolved to the intended CLI settings

Run core microbiome module (`--run_core`)

Enables core microbiome identification and related visual outputs.

Default: enabled.

6. Core microbiome settings

These control how the core microbiome is defined.

Core level (`--core_level`)

Options:

genus
otu

Core prevalence (`--core_prev`)

Minimum fraction of samples in which a taxon must appear to be considered core.

Default shown: 0.8

Core minimum relative abundance (`--core_min_rel`)

Minimum relative abundance required for a taxon to count as present.

Default shown: 0.001

Presence count (`--core_presence_count`)

Minimum number of samples where a taxon must be detected.

Default shown: 1
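
How these three thresholds combine is not spelled out above. One plausible reading, sketched below, is that a taxon counts as present in a sample when its relative abundance reaches core_min_rel, and is called core when it is present in at least core_presence_count samples and in at least a core_prev fraction of all samples. The function name and this combination rule are assumptions, and the counts are made up for illustration.

```python
def core_taxa(counts, core_prev=0.8, core_min_rel=0.001, core_presence_count=1):
    """Return sorted core taxa from counts given as {sample: {taxon: count}}."""
    n_samples = len(counts)
    present_in = {}
    for taxa in counts.values():
        total = sum(taxa.values())
        if total == 0:
            continue  # an empty sample contributes no presences
        for taxon, count in taxa.items():
            if count > 0 and count / total >= core_min_rel:
                present_in[taxon] = present_in.get(taxon, 0) + 1
    return sorted(
        taxon
        for taxon, n in present_in.items()
        if n >= core_presence_count and n / n_samples >= core_prev
    )


# Illustrative counts only.
counts = {
    "S1": {"A": 900, "B": 100, "C": 0},
    "S2": {"A": 800, "B": 200, "C": 0},
    "S3": {"A": 700, "B": 300, "C": 0},
    "S4": {"A": 99999, "B": 0, "C": 1},
    "S5": {"A": 1000, "B": 0, "C": 0},
}
print(core_taxa(counts))                 # ['A']
print(core_taxa(counts, core_prev=0.5))  # ['A', 'B']
```

Taxon C is never counted as present because its single read in S4 falls below the relative-abundance threshold.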

Max taxa in core heatmap (`--core_max_taxa_heatmap`)

Upper bound on taxa displayed in the heatmap.

Default shown: 40

Core heatmap palette (`--core_heatmap_palette`)

Options include:

Viridis
YlGnBu
Blues
Plasma
Magma
Cividis

7. DESeq2 abundance plots and figure outputs

This section controls abundance-plot generation and figure formatting.

Generate abundance plots (`--run_abundance_plots`)

Creates abundance figure outputs.

Pool all samples into one abundance panel (`--abund_single_group`)

Useful for pooled summaries across all samples.

Reverse core heatmap palette (`--core_heatmap_reverse_palette`)

Reverses the selected color palette for core heatmaps.

Abundance mode (`--abund_mode`)

Options:

both
relative
absolute

Abundance taxonomic level (`--abund_level`)

Options:

genus
otu

Grouping column 1 (`--abund_group1_col`)

First grouping variable for abundance plots.

Grouping column 2 (`--abund_group2_col`)

Second grouping variable for abundance plots.

Single-group label (`--abund_single_group_label`)

Label assigned when all samples are pooled.

Default shown: All_Species

Top taxa to display (`--abund_top_n`)

How many taxa to show explicitly before collapsing the rest.

Handling of non-top taxa (`--abund_other_mode`)

Options:

include
exclude_keep_scale
exclude_renormalize
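
The option names suggest the following behavior, sketched here under stated assumptions: include collapses non-top taxa into an "Other" category, exclude_keep_scale drops them while leaving the top taxa at their original proportions, and exclude_renormalize drops them and rescales the top taxa to sum to 1. The function name and these semantics are inferred from the names, not confirmed by the interface text.

```python
def apply_other_mode(rel_abund, top_n, other_mode="include"):
    """Reduce {taxon: relative abundance} to the top_n taxa under a given mode."""
    ranked = sorted(rel_abund.items(), key=lambda item: item[1], reverse=True)
    top = dict(ranked[:top_n])
    rest = sum(value for _, value in ranked[top_n:])
    if other_mode == "include":
        if rest > 0:
            top["Other"] = rest  # collapse everything else into one category
        return top
    if other_mode == "exclude_keep_scale":
        return top  # bars no longer sum to 1
    if other_mode == "exclude_renormalize":
        total = sum(top.values())
        return {taxon: value / total for taxon, value in top.items()}
    raise ValueError(f"unknown mode: {other_mode}")


# Illustrative relative abundances only.
sample = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
print(apply_other_mode(sample, 2, "include"))
print(apply_other_mode(sample, 2, "exclude_keep_scale"))
print(apply_other_mode(sample, 2, "exclude_renormalize"))
```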

Abundance palette (`--abund_palette`)

Palette choices include:

Tableau
Polychrome
Okabe-Ito
Dark 3
Set 2
Viridis

Other color (`--abund_other_color`)

Color used for the "Other" category.

Default shown: grey80

Taxa color key filename (`--abund_color_key_out`)

Output filename for the abundance color key.

Reverse abundance palette (`--abund_reverse_palette`)

Reverses the abundance palette.

Include unassigned taxa in abundance plots (`--abund_include_unassigned`)

Includes unresolved taxa in the plotted abundances.

Write figure legends / README summaries

Generates legend sidecars or README-style figure summaries when supported.

8. Annotation

The Annotation field is a free-text field.

This can be used as a run note or descriptive label to help identify the run later.

9. Taxonomy strategy

This is the main user-facing taxonomy resolution choice.

Best supported taxonomy (default)

This is the default and recommended option for most metabarcoding workflows.

Description in the UI:

more conservative
prioritizes agreement across methods
avoids inferring species-level identity from single-marker data
recommended default for metabarcoding data

Enhanced resolution (putative species)

This preset is intended to retain more species-level labels when the data support them, but it still treats species labels as putative rather than definitive identifications.

Description in the UI:

allows species-level labels when consistent with phylogenetic placement and reference databases
species assignments remain putative and may not reflect definitive species identity
best suited for full-length markers
not recommended as the default for short-read metabarcoding data

Preset behavior in the HTML

The preset logic shown in the page JavaScript indicates that:

Best supported taxonomy

speciesMode = balanced
includeConfidence = true
padConfidenceFilteredMissingRanks = true
missingRankLabel = Unclassified
outputTaxonomySource = best_taxonomy_confidence_filtered
tbasMinLwr = 0.8
ncbiEvalue = 1e-20
canonicalPolicy = majority
canonical priorities: SILVA,UNITE,T-BAS,RDP,NCBI

Enhanced resolution (putative species)

maps to the more permissive species-retention logic
keeps confidence filtering enabled
still allows fallback to genus when species support is insufficient
is intended for exploratory or full-length-marker use, not as the default for short-read metabarcoding

The interface intentionally keeps these top-level choices simple and reserves more source-specific behavior for Advanced / expert settings.

10. Advanced / expert settings

This section exposes detailed control over taxonomy-resolution logic and downstream export behavior.

Canonical name policy (`--canonical-policy`)

Options:

source_priority
majority
weighted

This determines how a single canonical label is selected when equivalent or competing names are present.

source_priority

Use a preferred source order.

majority

Choose the label supported by the most sources.

weighted

Use available confidence values to influence selection.
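
The difference between the three policies is easiest to see on a small example. The sketch below is an illustration, not the resolver's implementation: the function name, the per-source weights, and the tie-breaking by source priority are assumptions, while the default priority order is the one listed for the Best supported taxonomy preset.

```python
from collections import Counter, defaultdict

DEFAULT_PRIORITY = ["SILVA", "UNITE", "T-BAS", "RDP", "NCBI"]


def pick_canonical(labels, policy="majority", priority=None, weights=None):
    """Pick one label from {source: label} under a canonical-name policy."""
    priority = priority or DEFAULT_PRIORITY

    def rank(source):
        return priority.index(source) if source in priority else len(priority)

    if policy == "source_priority":
        return labels[min(labels, key=rank)]
    if policy == "majority":
        scores = Counter(labels.values())
    elif policy == "weighted":
        scores = defaultdict(float)
        for source, label in labels.items():
            scores[label] += (weights or {}).get(source, 1.0)
    else:
        raise ValueError(f"unknown policy: {policy}")
    best = max(scores.values())
    tied = {label for label, score in scores.items() if score == best}
    # Ties are broken by source priority (an assumption of this sketch).
    for source in sorted(labels, key=rank):
        if labels[source] in tied:
            return labels[source]


labels = {"SILVA": "Lelliottia", "T-BAS": "Enterobacter", "RDP": "Enterobacter", "NCBI": "Enterobacter"}
print(pick_canonical(labels, "source_priority"))  # Lelliottia
print(pick_canonical(labels, "majority"))         # Enterobacter
# Hypothetical weights in which one source is far better supported than the others.
print(pick_canonical(labels, "weighted", weights={"SILVA": 0.99, "T-BAS": 0.2, "RDP": 0.2, "NCBI": 0.2}))  # Lelliottia
```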

Canonical priorities

These are only relevant when using source-priority behavior.

Canonical group priority

Comma-separated source order used for winner selection.

Canonical display priority

Optional source order used for display preference.

Include confidence columns (`--include-confidence`)

Adds columns such as EPA LWR or related confidence values to exported tables.

This is important because many users need to know not only the assigned taxonomy but also how well supported it was.

NCBI e-value cutoff (`--ncbi-evalue-cutoff`)

BLAST hits worse than this threshold are ignored for NCBI taxonomy purposes.

The preset default shown in the JS is 1e-20.

Species mode (`--species-mode`)

Options:

conservative
balanced
aggressive

This is the low-level engine setting behind the user-facing taxonomy strategy cards.

conservative

Requires stricter support before retaining species.

balanced

Middle-ground behavior.

aggressive

Allows more species retention when genus support and conflict logic permit it.

In the interface, the Enhanced resolution (putative species) card maps to aggressive.

Output taxonomy source (`--output-taxonomy-source`)

Standard options:

best_taxonomy
best_taxonomy_confidence_filtered

These determine which taxonomy string downstream exports should use globally.

This matters for:

QIIME
phyloseq
PopPhy-CNN
bundled export outputs

New advanced functionality: single-source comparison and export

The updated resolver can also generate single-source comparison columns and optionally use one of those fields for all downstream outputs.

This functionality is intended for expert users who want to compare:

consensus taxonomy
T-BAS only
NCBI only
RDP only
UNITE only
SILVA only

without running separate jobs for each source.

Enable single-source comparison columns (`--write-sole-source-taxonomy`)

When enabled, the main consensus table is extended with additional fields such as:

tbas_only_best_taxonomy
tbas_only_best_taxonomy_confidence_filtered
ncbi_only_best_taxonomy
ncbi_only_best_taxonomy_confidence_filtered

and analogous fields for RDP, UNITE, and SILVA.

These columns let users compare what the taxonomy would look like if winner selection were restricted to a single source.

Sources to compare (`--sole-source-list`)

Comma-separated list of sources for which side-by-side comparison columns are computed.

Typical default:

T-BAS,NCBI,RDP,UNITE,SILVA

Source-specific downstream output

When single-source comparison columns have been generated, --output-taxonomy-source can also be pointed to one of the source-specific fields, for example:

tbas_only_best_taxonomy_confidence_filtered
ncbi_only_best_taxonomy_confidence_filtered
rdp_only_best_taxonomy_confidence_filtered
unite_only_best_taxonomy_confidence_filtered
silva_only_best_taxonomy_confidence_filtered

This means the resolver can now do two distinct things:

compare source-specific taxonomy side by side inside the consensus table
drive all downstream exports from a chosen single source when explicitly requested
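The column names listed above follow a visible pattern: the source name lowercased with hyphens removed, followed by `_only_best_taxonomy` and its confidence-filtered variant. The helper below is a hypothetical sketch that derives those names from a source list; it assumes the pattern holds for every source and is not taken from the resolver's code.

```python
def sole_source_columns(source_list: str) -> dict:
    """Map each source in a comma-separated list to its two comparison columns."""
    columns = {}
    for raw in source_list.split(","):
        source = raw.strip()
        if not source:
            continue
        # "T-BAS" becomes "tbas", "NCBI" becomes "ncbi", and so on.
        prefix = source.lower().replace("-", "")
        columns[source] = [
            f"{prefix}_only_best_taxonomy",
            f"{prefix}_only_best_taxonomy_confidence_filtered",
        ]
    return columns


columns = sole_source_columns("T-BAS,NCBI,RDP,UNITE,SILVA")
for source, names in columns.items():
    print(source, names)
```

Any of the confidence-filtered names produced this way matches the form of the values shown for source-specific downstream output.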

Recommended UI behavior

This functionality belongs in Advanced / expert settings, not as a top-level taxonomy card. The best-supported taxonomy should remain the default visible workflow, while single-source comparison and export remain optional expert tools.

Prefer curated sources for canonical display (`--prefer-curated-sources`)

When enabled, curated sources such as SILVA or UNITE are preferred for display if they are available.

Pad confidence-filtered missing ranks (`--pad-confidence-filtered-missing-ranks`)

Fills missing internal ranks using a placeholder so downstream tools preserve fixed rank positions.

This is especially useful for tools that expect a stable number of ranks.

Missing-rank label (`--missing-rank-label`)

Placeholder used when missing-rank padding is enabled.

The preset default shown in the JS is Unclassified.
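A minimal sketch of rank padding follows. It assumes a lineage is held as a fixed-length list with empty strings for missing ranks, and it interprets "internal" as gaps above the deepest assigned rank; both are assumptions, and the resolver may represent and pad lineages differently.

```python
def pad_internal_ranks(ranks, label="Unclassified"):
    """Fill empty ranks that sit above the deepest assigned rank.

    Trailing empty ranks (below the deepest assignment) are left as-is,
    so the lineage is not made to look more resolved than it is.
    """
    assigned = [i for i, name in enumerate(ranks) if name]
    if not assigned:
        return list(ranks)
    deepest = assigned[-1]
    return [name if (name or i > deepest) else label for i, name in enumerate(ranks)]


# Hypothetical lineage: kingdom, phylum, class, order, family, genus, species.
lineage = ["Fungi", "Ascomycota", "", "Hypocreales", "", "Fusarium", ""]
padded = pad_internal_ranks(lineage)
print(";".join(padded))
```

The list keeps its original length, which is what lets downstream tools rely on each rank staying in the same position.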

Database info JSON (`--db-info`)

If provided, database/tool version information is copied into outputs.

Useful for:

provenance
reproducibility
reporting

T-BAS minimum LWR (`--tbas-min-lwr`)

When set, T-BAS taxonomy is only trusted for best-taxonomy winner selection if support is at or above this threshold.

The preset default in the JS is 0.8.

Low-LWR behavior (`--tbas-low-lwr-mode`)

Current fixed behavior is exclude.

If T-BAS placement support is below the threshold, T-BAS can be excluded from winner selection and other sources are used instead.
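The threshold and the exclude behavior can be sketched together as a gate on the list of candidate sources. The function name and arguments are hypothetical; only the "at or above the threshold" rule and the 0.8 preset come from the text above.

```python
def eligible_sources(sources, tbas_lwr, tbas_min_lwr=0.8):
    """Drop T-BAS from winner selection when its placement support is too low.

    If no threshold is set (None), or no LWR is available for the feature,
    the source list is returned unchanged.
    """
    if tbas_min_lwr is None or tbas_lwr is None:
        return list(sources)
    if tbas_lwr < tbas_min_lwr:
        return [s for s in sources if s != "T-BAS"]
    return list(sources)


sources = ["T-BAS", "SILVA", "NCBI"]
print(eligible_sources(sources, tbas_lwr=0.65))  # T-BAS is below 0.8 and is dropped
print(eligible_sources(sources, tbas_lwr=0.80))  # at the threshold, T-BAS is kept
```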

Source-only (`--source-only`)

Restrict best-taxonomy winner selection to only one source, such as:

SILVA
RDP
UNITE
NCBI
T-BAS

Prefer source (`--prefer-source`)

Bias winner selection toward a chosen source when it is available.

This is weaker than source-only and still allows other sources to participate.
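The difference between the two options can be illustrated with a small ordering function. This is a sketch under assumptions: the "bias" of `--prefer-source` is modeled here as simply moving that source to the front of the candidate order, which may not match how the resolver weights it.

```python
def order_candidates(available, priority, source_only=None, prefer_source=None):
    """Return the sources eligible for winner selection, in the order tried."""
    ordered = [s for s in priority if s in available]
    if source_only is not None:
        # Hard restriction: every other source is removed.
        return [s for s in ordered if s == source_only]
    if prefer_source in ordered:
        # Soft preference: the source moves first, the rest still participate.
        ordered.remove(prefer_source)
        ordered.insert(0, prefer_source)
    return ordered


priority = ["T-BAS", "SILVA", "RDP", "NCBI"]
available = {"T-BAS", "RDP", "NCBI"}
print(order_candidates(available, priority, source_only="RDP"))
print(order_candidates(available, priority, prefer_source="NCBI"))
```

With source-only, a feature that lacks the chosen source has no candidates at all; with prefer-source, the remaining sources are still available as fallbacks.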

11. PopPhy-CNN labeling options

These settings create metadata labels suitable for PopPhy-CNN workflows.

Encoding mode

Options:

Binary
Auto-encode
Explicit mapping

Binary

Use zero/one coding.

Auto-encode

Map unique values deterministically to integers 0..K-1.

Explicit mapping

Provide your own VALUE=LABEL pairs.
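One way to obtain a deterministic 0..K-1 mapping is to sort the unique values before numbering them, as sketched below. Sorting is an assumption made for this example; the resolver may use a different but equally deterministic order.

```python
def auto_encode(values):
    """Map each unique value to an integer 0..K-1 in sorted order.

    Sorting makes the mapping independent of the order in which samples
    appear in the metadata file.
    """
    return {value: code for code, value in enumerate(sorted(set(values)))}


# Hypothetical metadata column values.
column = ["Treated", "Control", "High", "Control"]
mapping = auto_encode(column)
labels = [mapping[value] for value in column]
print(mapping)
print(labels)
```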

PopPhy-CNN map column (`--popphy-map-col`)

Metadata column whose values will be converted into labels.

PopPhy-CNN label column name (`--popphy-label-col`)

Name of the encoded label column written to metadata.

Default: popphy_label

Labels file format / numeric labels (`--popphy-labels-numeric`)

Forces labels file output to use numeric codes.

PopPhy-CNN other-mode (`--popphy-other-mode`)

Controls behavior for unmapped values:

exclude
keep_na
error

Zero value(s) (`--popphy-zero`)

Comma-separated values to code as 0.

One value(s) (`--popphy-one`)

Comma-separated values to code as 1.

Explicit mapping (`--popphy-map`)

Use VALUE=LABEL entries for custom multi-class mapping.

Examples:

Control=0
Treated=1
High=2
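The sketch below shows how VALUE=LABEL entries, the zero/one value lists, and the three other-mode behaviors could fit together. All function names are hypothetical, labels are assumed to be integers, and the handling of each mode is an interpretation of the option names rather than the resolver's code.

```python
def parse_mapping(entries):
    """Parse VALUE=LABEL strings into a dict of value -> integer label."""
    mapping = {}
    for entry in entries:
        value, sep, label = entry.partition("=")
        if not sep:
            raise ValueError(f"expected VALUE=LABEL, got {entry!r}")
        mapping[value.strip()] = int(label.strip())
    return mapping


def binary_mapping(zero, one):
    """Build a 0/1 mapping from two comma-separated value lists."""
    mapping = {v.strip(): 0 for v in zero.split(",") if v.strip()}
    mapping.update({v.strip(): 1 for v in one.split(",") if v.strip()})
    return mapping


def encode(values, mapping, other_mode="exclude"):
    """Return (value, label) pairs, treating unmapped values per other_mode."""
    if other_mode not in ("exclude", "keep_na", "error"):
        raise ValueError(f"unknown other_mode: {other_mode!r}")
    pairs = []
    for value in values:
        if value in mapping:
            pairs.append((value, mapping[value]))
        elif other_mode == "keep_na":
            pairs.append((value, None))  # keep the row with a missing label
        elif other_mode == "error":
            raise ValueError(f"unmapped value: {value!r}")
        # other_mode == "exclude": the row is dropped
    return pairs


mapping = parse_mapping(["Control=0", "Treated=1", "High=2"])
print(encode(["Control", "Low", "High"], mapping, other_mode="exclude"))
print(encode(["Control", "Low", "High"], mapping, other_mode="keep_na"))
print(binary_mapping("Control,Baseline", "Treated"))
```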

12. Faith's PD mode and inputs

The page also exposes settings for Faith's phylogenetic diversity workflows.

Mode

Options:

rarefied
observed
overlay

This determines whether Faith's PD is computed after rarefaction (rarefied), directly on the observed counts (observed), or in a comparative overlay mode (overlay).

Rarefaction depth (`--depth`)

If blank, the minimum remaining read depth may be used.

Minimum reads (`--min_reads`)

Samples below this threshold are excluded before analysis.
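The interaction between the minimum-reads filter and a blank rarefaction depth can be sketched as follows. The function and the sample totals are hypothetical; the sketch assumes that a blank depth falls back to the smallest read total among the samples that survive the filter.

```python
def select_samples(read_totals, min_reads, depth=None):
    """Drop samples below min_reads, then settle the rarefaction depth.

    read_totals maps sample ID -> total reads. When depth is None (blank),
    the minimum read total among the remaining samples is used.
    """
    kept = {sample: n for sample, n in read_totals.items() if n >= min_reads}
    if depth is None and kept:
        depth = min(kept.values())
    return kept, depth


totals = {"S1": 12000, "S2": 800, "S3": 5400}
kept, depth = select_samples(totals, min_reads=1000)
print(sorted(kept), depth)
```

Here S2 is removed before the depth is chosen, so the fallback depth is set by S3 rather than by the low-coverage sample.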

Alpha

Used for BH-adjusted lettering/group significance in associated summaries or plots.
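For readers unfamiliar with the adjustment, here is a minimal sketch of the standard Benjamini-Hochberg (BH) procedure and of comparing the adjusted values against alpha. The p-values are invented for illustration, and how the resolver turns the result into group letters is not shown.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


alpha = 0.05
pvalues = [0.01, 0.04, 0.03, 0.2]  # hypothetical pairwise comparisons
adjusted = bh_adjust(pvalues)
significant = [p <= alpha for p in adjusted]
print(adjusted)
print(significant)
```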

Block column (`--block`)

Optional metadata column used for paired or blocked designs.

Grouping / treatment column (`--treatment`)

Grouping variable for Faith's PD comparisons.

13. Species resolution: what users need to understand

Because this is a taxonomy resolver, not merely a label extractor, users often ask why a feature does not resolve to species even with full-length markers.

This section is included here because it directly affects how the resolver's options should be interpreted.

Why full-length does not guarantee species

Even with full-length ITS or full-length 16S:

multiple species may share identical marker sequences
recently diverged species may not yet be separable at a single locus
databases may contain incomplete, ambiguous, or inconsistent species annotations
marker resolution varies across taxonomic groups

Reasonable expectations by marker

Marker	Typical expectation
Full-length ITS	genus generally reliable; species often possible but not guaranteed
ITS1 or ITS2 alone	genus often possible; species generally limited or unreliable
Full-length 16S	genus generally reliable; species sometimes possible
16S V3/V4	family-to-genus more typical; species uncommon

Why the resolver may stop at genus

The resolver is intentionally designed to avoid false precision.

Species assignment is retained only when:

genus support is sufficiently stable
higher-rank conflict is absent
placement is supported by the tree
only one compatible species is present

Otherwise the resolver falls back to genus.
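The four conditions above amount to a single conjunction, which can be sketched as follows. The `SpeciesEvidence` fields are hypothetical names for the listed conditions; how the resolver actually measures each one is not covered here.

```python
from dataclasses import dataclass


@dataclass
class SpeciesEvidence:
    genus_support_stable: bool
    higher_rank_conflict: bool
    placement_supported_by_tree: bool
    compatible_species: int  # number of species compatible with the evidence


def final_rank(evidence: SpeciesEvidence) -> str:
    """Return 'species' only when every condition holds, otherwise 'genus'."""
    keep_species = (
        evidence.genus_support_stable
        and not evidence.higher_rank_conflict
        and evidence.placement_supported_by_tree
        and evidence.compatible_species == 1
    )
    return "species" if keep_species else "genus"


clean = SpeciesEvidence(True, False, True, 1)
ambiguous = SpeciesEvidence(True, False, True, 3)
print(final_rank(clean), final_rank(ambiguous))
```

A single failing condition, such as several compatible species, is enough to trigger the genus fallback.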

Single-marker matching is not definitive

A direct match to a species name from ITS or 16S alone does not guarantee that the sequence truly belongs to that species.

Species boundaries often rely on:

multiple loci
morphology
ecology
host association
reproductive biology

Single-locus vs multi-locus concept

Figure: Single-locus vs multi-locus resolution

If a single marker cannot separate species, then taxonomy should not pretend otherwise. This is why the resolver prioritizes biologically justified assignment over overcalling species.

14. Recommended usage patterns

Use Higher species resolution when:

you are working with full-length ITS or full-length 16S
you want to retain species when justified
you accept genus fallback when the data do not support species

Use Best supported taxonomy when:

you are working with short-read markers
you need more conservative calls
you are generating publication-ready summaries where overcalling species would be risky
you want the default downstream outputs to reflect consensus, confidence-filtered taxonomy

Upload counts and metadata when:

you want DESeq2
you want abundance plots
you want PopPhy-CNN labeling
you want phyloseq / BIOM outputs
you want group-aware interpretation

Upload a consolidated Newick tree when:

results come from multiple T-BAS runs
subsets were placed separately and then merged
the final combined tree is not available in a single run folder

Use single-source comparison and export when:

you want to benchmark consensus vs T-BAS-only vs NCBI-only behavior
you want to understand why consensus assignment differs from a single source
you want every downstream file to be generated from one source only for comparison or validation

Recommended practice:

keep Best supported taxonomy (confidence filtered) as the default downstream output for routine runs
enable single-source comparison only when you need it
switch downstream output to T-BAS only or NCBI only solely for explicit side-by-side comparisons

15. Design principles reflected in the interface

The options exposed in the resolver UI reflect several design principles:

phylogeny constrains lineage
databases provide taxonomic resolution
species calls should be conservative
all conflicts and overrides should remain transparent
outputs should be ready for downstream analysis

This is why the interface includes both high-level strategy cards and low-level expert options.

16. Summary

The T-BAS Taxonomy Resolver interface supports:

one or more T-BAS accessions
taxonomy consolidation across multiple sources
user-selectable taxonomy stringency
confidence-aware and conflict-aware taxonomy assignment
downstream exports linked to counts, metadata, and optionally trees
DESeq2, abundance, PopPhy-CNN, and Faith's PD workflows
expert control over canonical naming, source preference, confidence filtering, and rank padding
optional single-source comparison columns and source-specific downstream export

In short, the resolver is designed to help users move from placement runs to analysis-ready, biologically defensible taxonomy outputs.
