Trees

Loading a tree from a file and visualizing it with ascii_art()

Writing a tree to a file

Getting the individual nodes of a tree by name

Getting the name of a node (or a tree)

The object type of a tree and its nodes is the same

Working with the nodes of a tree

Get all the nodes, tips and edges

Only the terminal nodes (tips)

For internal nodes (edges), we can use Newick format to simplify the output

Getting the path between two tips or edges (connecting edges)

Getting the distance between two nodes

Getting the last common ancestor (LCA) for two nodes

Getting all the ancestors for a node

Getting all the children for a node

Getting all the distances for a tree

We also show how to select a subset of distances involving just one species.

Getting the two nodes that are farthest apart

Get the nodes within a given distance

Rerooting trees

At a named node

At the midpoint

Near a given tip

Tree representations

Newick format

XML format

Tree traversal

Here is the example tree for reference:

Preorder

Postorder
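The two orders differ only in when a node is reported relative to its descendants. The following is a minimal pure-Python sketch of that difference. It uses a hypothetical nested-tuple tree of the form `(name, children)` rather than the cogent3 tree class, and the node names are invented for illustration.

```python
# Hypothetical tree: each node is (name, [children]); this is not a cogent3 object.
tree = ("root", [("edge.0", [("a", []), ("b", [])]), ("c", [])])


def preorder(node):
    """Yield a node's name before the names of its descendants."""
    name, children = node
    yield name
    for child in children:
        yield from preorder(child)


def postorder(node):
    """Yield a node's name after the names of its descendants."""
    name, children = node
    for child in children:
        yield from postorder(child)
    yield name


print(list(preorder(tree)))   # ['root', 'edge.0', 'a', 'b', 'c']
print(list(postorder(tree)))  # ['a', 'b', 'edge.0', 'c', 'root']
```

Postorder is the natural choice when a value at a node depends on values already computed for its children.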

Selecting subtrees

One way to do it

Tree manipulation methods

Pruning the tree

Remove internal nodes that have only one child. New connections and branch lengths (if the tree is a PhyloNode) are created to reflect the change.
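To illustrate the idea, here is a small pure-Python sketch that collapses single-child nodes and adds the removed branch length to the surviving child. The dict-based node layout and the `make_node` and `prune` helpers are assumptions made for this example; they are not the cogent3 implementation, which may treat the root or missing lengths differently.

```python
def make_node(name, length=0.0, children=()):
    """Build a hypothetical dict-based node (not a cogent3 PhyloNode)."""
    return {"name": name, "length": length, "children": list(children)}


def prune(node):
    """Return the tree with every single-child node collapsed into its child."""
    node["children"] = [prune(child) for child in node["children"]]
    if len(node["children"]) == 1:
        child = node["children"][0]
        # The child inherits the length of the branch that is removed.
        child["length"] += node["length"]
        return child
    return node


# "x" has only one child, so it is removed and "a" absorbs its length.
tree = make_node(
    "root",
    children=[
        make_node("x", 1.0, [make_node("a", 2.0)]),
        make_node("b", 3.0),
    ],
)
pruned = prune(tree)
print([(c["name"], c["length"]) for c in pruned["children"]])
# [('a', 3.0), ('b', 3.0)]
```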

Create a full unrooted copy of the tree

Transform tree into a bifurcating tree

Add internal nodes so that every node has 2 or fewer children.
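One way to picture this transformation is sketched below in plain Python. Trees are hypothetical nested tuples of tip names, and the `bifurcating` helper here is written for illustration only; how cogent3 chooses which children to group is not stated in this section, so the grouping order is an assumption.

```python
def bifurcating(tree):
    """Return a copy of a nested-tuple tree in which no node has more than 2 children."""
    if isinstance(tree, str):  # a tip
        return tree
    children = [bifurcating(child) for child in tree]
    while len(children) > 2:
        # Assumption: group the last two children under a new internal node.
        children = children[:-2] + [(children[-2], children[-1])]
    return tuple(children)


def max_children(tree):
    """Largest number of children found on any node."""
    if isinstance(tree, str):
        return 0
    return max([len(tree)] + [max_children(child) for child in tree])


star = ("a", "b", "c", "d")
resolved = bifurcating(star)
print(resolved)  # ('a', ('b', ('c', 'd')))
```

In a tree with branch lengths, the newly added internal branches would typically be given zero length so that tip-to-tip distances are unchanged.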

Transform tree into a balanced tree

Using a balanced tree can substantially improve the performance of likelihood calculations. Note that the resulting tree has a different orientation, so clades or stems for model parameterisation should be specified using the “outgroup_name” argument.

Test two trees for same topology

Branch lengths don’t matter.
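The sketch below shows one way such a comparison can ignore branch lengths and child order. It treats both trees as rooted and uses hypothetical `(name, length, children)` tuples; it is an illustration of the idea, not the cogent3 method, which may handle rooting differently.

```python
def topology(node):
    """Reduce a tree to a nested frozenset of tip names, discarding lengths."""
    name, _length, children = node
    if not children:
        return name
    return frozenset(topology(child) for child in children)


def same_topology(t1, t2):
    """True when two rooted trees group the same tips in the same way."""
    return topology(t1) == topology(t2)


t1 = ("root", 0.0, [("x", 1.0, [("a", 1.0, []), ("b", 2.0, [])]), ("c", 3.0, [])])
# Same grouping, different lengths, internal names and child order.
t2 = ("root", 0.0, [("c", 0.1, []), ("y", 9.0, [("b", 0.5, []), ("a", 0.2, [])])])
# Different grouping: "a" now pairs with "c".
t3 = ("root", 0.0, [("x", 1.0, [("a", 1.0, []), ("c", 2.0, [])]), ("b", 3.0, [])])

print(same_topology(t1, t2))  # True
print(same_topology(t1, t3))  # False
```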

Measure topological distances between two trees

A number of topological tree distance metrics are available. They include:

  • The Robinson-Foulds Distance for rooted trees.

  • The Matching Cluster Distance for rooted trees.

  • The Robinson-Foulds Distance for unrooted trees.

  • The Lin-Rajan-Moret Distance for unrooted trees.

There are several variations of the Robinson-Foulds metric in the literature. The definition used by cogent3 is the cardinality of the symmetric difference of the sets of clades/splits in the two rooted/unrooted trees. Other definitions sometimes divide this by two, or normalise it to the unit interval.

The Robinson-Foulds distance is quick to compute, but is known to saturate quickly. Moving a single leaf in a tree can maximise this metric.
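The definition and the saturation behaviour described above can be checked with a short pure-Python sketch for rooted trees. Trees are hypothetical nested tuples of tip names, and `clades` and `rf_distance` are helper names invented for this example rather than cogent3 functions.

```python
def clades(tree):
    """Return the non-root clades of a rooted tree as frozensets of tip names."""
    found = set()

    def tips(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    root = tips(tree)
    found.discard(root)  # shared by every tree on the same tips
    return found


def rf_distance(t1, t2):
    """Cardinality of the symmetric difference of the two clade sets."""
    return len(clades(t1) ^ clades(t2))


caterpillar = ((((("a", "b"), "c"), "d"), "e"), "f")
# Same tree with the single leaf "a" moved from the innermost pair to the root.
moved = ((((("b", "c"), "d"), "e"), "f"), "a")

print(rf_distance(caterpillar, caterpillar))  # 0
print(rf_distance(caterpillar, moved))        # 8
```

Each of these six-tip trees has four non-root clades, so 8 is the largest value the undivided metric can take here: moving one leaf changed every clade.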

The Matching Cluster and Lin-Rajan-Moret distances are two matching-based distances that are more statistically robust. Unlike the Robinson-Foulds distance, which counts how many of the splits/clades are not exactly the same, the matching-based distances measure the degree to which the splits/clades differ. The matching-based distances solve a min-weight matching problem, which may take longer to compute for large trees.
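The following sketch illustrates the matching idea for rooted trees. It assumes a common formulation of the Matching Cluster distance: the clusters of the two trees are paired one-to-one (padding the shorter list with empty clusters), each pair costs the size of its symmetric difference, and the distance is the cheapest total. The brute-force search over permutations is only practical for tiny trees; a real implementation would use a min-weight matching solver. All helper names are hypothetical and this is not the cogent3 code.

```python
from itertools import permutations


def clusters(tree):
    """Return the non-root clusters of a nested-tuple tree as frozensets."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    found.discard(tips(tree))
    return found


def rf_distance(t1, t2):
    """Robinson-Foulds: count clusters that are not exactly shared."""
    return len(clusters(t1) ^ clusters(t2))


def matching_cluster_distance(t1, t2):
    """Cheapest one-to-one pairing of clusters, costed by symmetric difference."""
    c1, c2 = list(clusters(t1)), list(clusters(t2))
    size = max(len(c1), len(c2))
    c1 += [frozenset()] * (size - len(c1))
    c2 += [frozenset()] * (size - len(c2))
    return min(
        sum(len(a ^ b) for a, b in zip(c1, perm)) for perm in permutations(c2)
    )


ref = (((("a", "b"), "c"), "d"), "e")
near = (((("a", "e"), "b"), "c"), "d")  # one leaf moved
far = (((("d", "e"), "c"), "b"), "a")   # tip order reversed

print(rf_distance(ref, near), rf_distance(ref, far))  # 6 6
print(matching_cluster_distance(ref, near))           # 6
print(matching_cluster_distance(ref, far))            # 10
```

Both comparisons share no clusters, so Robinson-Foulds reports the same value for each, while the matching-based score still separates the mild rearrangement from the severe one.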

Calculate each node’s maximum distance to a tip

Sets each node’s “TipDistance” attribute to the distance from that node to its most distant tip.
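As a rough illustration of the calculation, the sketch below computes this value bottom-up on a hypothetical dict-based tree. The `tip_distance` key and the helper names are invented for this example, and measuring distance as the sum of branch lengths is an assumption about what "distance" means here.

```python
def make_node(name, length=0.0, children=()):
    """Build a hypothetical dict-based node (not a cogent3 object)."""
    return {"name": name, "length": length, "children": list(children)}


def set_tip_distances(node):
    """Store on each node the branch-length distance to its most distant tip."""
    for child in node["children"]:
        set_tip_distances(child)
    node["tip_distance"] = max(
        (child["length"] + child["tip_distance"] for child in node["children"]),
        default=0.0,  # a tip is at distance 0 from itself
    )


tree = make_node(
    "root",
    children=[
        make_node("x", 1.0, [make_node("a", 2.0), make_node("b", 0.5)]),
        make_node("c", 2.5),
    ],
)
set_tip_distances(tree)
print(tree["tip_distance"])                 # 3.0
print(tree["children"][0]["tip_distance"])  # 2.0
```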

Scale branch lengths in place to integers for ascii output

Get tip-to-tip distances

Get a distance matrix between all pairs of tips and a list of the tip nodes.
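A minimal sketch of what such a matrix contains is given below. It records the branches on each root-to-tip path and sums the lengths of the branches that two paths do not share. The `(name, length, children)` tuples and the helper names are assumptions for this example, node names are assumed to be unique, and tips are returned as names rather than node objects.

```python
def tip_paths(node, path=()):
    """Yield (tip name, {node name: branch length}) for each root-to-tip path."""
    name, length, children = node
    path = path + ((name, length),)
    if not children:
        yield name, dict(path)
        return
    for child in children:
        yield from tip_paths(child, path)


def tip_to_tip_distances(tree):
    """Return (matrix, tip names); matrix[i][j] is the path length between tips."""
    paths = dict(tip_paths(tree))
    names = sorted(paths)
    matrix = [
        [
            # Branches on exactly one of the two paths form the connecting path.
            sum(length for _edge, length in paths[a].items() ^ paths[b].items())
            for b in names
        ]
        for a in names
    ]
    return matrix, names


tree = ("root", 0.0, [("x", 1.0, [("a", 2.0, []), ("b", 0.5, [])]), ("c", 2.5, [])])
matrix, names = tip_to_tip_distances(tree)
print(names)   # ['a', 'b', 'c']
print(matrix)  # [[0, 2.5, 5.5], [2.5, 0, 4.0], [5.5, 4.0, 0]]
```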

Compare two trees using tip-to-tip distance matrices

The score ranges from 0 (minimum distance) to 1 (maximum distance). By default Pearson’s correlation is used, so a score of 0 corresponds to a perfect positive correlation (1) and a score of 1 to a perfect negative correlation (-1).

Note: names that don’t match between the two trees are stripped out automatically.
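The relationship between the correlation and the score can be sketched as follows. The linear mapping `(1 - r) / 2` is an assumption chosen because it reproduces the two endpoints stated above; `pearson` and `distance_from_r` are hypothetical helper names, and the input lists stand in for the flattened tip-to-tip distance matrices of two trees restricted to their shared names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def distance_from_r(xs, ys):
    """Map a correlation in [-1, 1] onto a score in [0, 1] (assumed mapping)."""
    return (1 - pearson(xs, ys)) / 2


dists = [1.0, 2.0, 3.0]
print(distance_from_r(dists, [2.0, 4.0, 6.0]))  # 0.0, perfectly correlated
print(distance_from_r(dists, [6.0, 4.0, 2.0]))  # 1.0, perfectly anti-correlated
```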