treeflow package

TreeFlow: automatic differentiation and probabilistic modelling with phylogenetic trees

treeflow.parse_newick(newick_file: str, remove_zero_edges: bool = True, epsilon: float = 1e-06, subnewick_format=0) NumpyRootedTree

Read a rooted TreeFlow Numpy tree from a Newick file

Parameters:
  • newick_file (str) – File to read tree from

  • remove_zero_edges (bool) – Whether to expand zero-length edges (default: True)

  • epsilon (float) – Value to expand zero-length edges to (default: 1e-6)

  • subnewick_format (int) – Format passed to ete3.Tree (default: 0) (see documentation at http://etetoolkit.org/docs/latest/reference/reference_tree.html)

Returns:

Parsed TreeFlow tree composed of Numpy arrays

Return type:

NumpyRootedTree
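
The effect of remove_zero_edges and epsilon can be illustrated with a minimal sketch. It assumes that "expanding" a zero-length edge means replacing its length with epsilon. The helper expand_zero_edges is hypothetical and is not part of TreeFlow; how the library applies the adjustment internally (for example, on node heights rather than branch lengths) is not described in this reference.

```python
from typing import List


def expand_zero_edges(branch_lengths: List[float], epsilon: float = 1e-6) -> List[float]:
    """Hypothetical helper: replace zero-length branches with epsilon."""
    # Non-zero branch lengths are returned unchanged.
    return [epsilon if length == 0.0 else length for length in branch_lengths]


lengths = [0.1, 0.0, 0.25, 0.0]
expanded = expand_zero_edges(lengths)
print(expanded)
```

Keeping every branch strictly positive avoids degenerate branches in later computations, which is presumably why the option defaults to True.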

treeflow.convert_tree_to_tensor(numpy_tree: NumpyRootedTree, height_dtype: DType = tf.float64) TensorflowRootedTree

Convert a TreeFlow tree composed of NumPy arrays to one composed of TensorFlow Tensors

Parameters:
  • numpy_tree (NumpyRootedTree) – Tree to convert

  • height_dtype (tf.DType) – TensorFlow datatype to use for tree times (defaults to TreeFlow default)

Returns:

Tree with arrays converted to Tensors

Return type:

TensorflowRootedTree

treeflow.float_constant(x: float | ndarray | Iterable[float])

Converts a floating point value or array to a constant Tensor with TreeFlow’s default data type

Parameters:

x – Value that can be converted to a Tensor

Returns:

Value converted to a constant tensor

Return type:

tf.Tensor

class treeflow.Alignment(fasta_file: str | bytes | PathLike | None = None, sequence_mapping: Mapping[str, Collection[str]] | None = None, format: AlignmentFormat = AlignmentFormat.FASTA, data_type: AlignmentType = AlignmentType.NUCLEOTIDE)

Bases: object

Class to represent a multiple sequence alignment

Either fasta_file or sequence_mapping must be provided.

Parameters:
  • fasta_file (Optional[PathLikeType]) – Filename of the FASTA file that the alignment is read from (optional). The filename is passed to open, so it can be a string, path or buffer (default: None)

  • sequence_mapping (Optional[Mapping[str, Collection[str]]]) – Mapping from names to sequences (optional) (default: None)

get_encoded_sequence_array(taxon_names: Iterable[str]) ndarray

Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering. Currently only supports nucleotide sequences, uses ACGT ordering.

Parameters:

taxon_names (Iterable[str]) – Order of taxa to use in the encoded array

Returns:

One-hot encoded sequence NumPy array with shape [(n_sequences, 4)]

Return type:

np.ndarray
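
One-hot encoding with ACGT ordering can be sketched in plain Python as follows. Nested lists stand in for the NumPy array so that the example has no dependencies, and they are indexed as [taxon][site][state]; that layout is an assumption of the sketch. Encoding gaps and unknown characters as all ones is a common convention for missing data in phylogenetic likelihoods, assumed here and not confirmed by this reference. The function names are hypothetical.

```python
from typing import Iterable, List, Mapping

NUCLEOTIDES = ("A", "C", "G", "T")  # ACGT ordering, as stated in the docs


def one_hot_site(character: str) -> List[float]:
    """Encode one character as a length-4 vector in ACGT order."""
    char = character.upper()
    if char in NUCLEOTIDES:
        return [1.0 if char == nucleotide else 0.0 for nucleotide in NUCLEOTIDES]
    # Assumption: gaps and unknown characters are encoded as all ones.
    return [1.0] * len(NUCLEOTIDES)


def encode_alignment(
    sequence_mapping: Mapping[str, str], taxon_names: Iterable[str]
) -> List[List[List[float]]]:
    """Encode sequences in the order given by taxon_names."""
    return [
        [one_hot_site(char) for char in sequence_mapping[name]]
        for name in taxon_names
    ]


sequences = {"human": "ACGT", "chimp": "AC-T"}
encoded = encode_alignment(sequences, ["chimp", "human"])
print(encoded[1][0])  # first site of "human"
```

The taxon_names argument fixes the row order, so the encoded rows can be lined up with the leaf ordering of a tree.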

get_codon_partitioned_sequence_array(taxon_names: Iterable[str]) ndarray

Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering, and partitioned into codon positions. Currently only supports nucleotide sequences, uses ACGT ordering. The codon positions are the first axis of the array. If the number of sites is not a multiple of 3, the sequences are padded with gaps.

Parameters:

taxon_names (Iterable[str]) – Order of taxa to use in the encoded array

Returns:

One-hot encoded codon-partitioned sequence NumPy array with shape [(3, n_codons, 4)]

Return type:

np.ndarray
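
The codon partitioning and gap padding described above can be sketched on a single sequence string. The helper partition_codon_positions is hypothetical and only illustrates the idea; it does not one-hot encode the result, and using "-" as the gap character is an assumption.

```python
from typing import List


def partition_codon_positions(sequence: str, gap: str = "-") -> List[str]:
    """Split a sequence into its first, second and third codon positions."""
    remainder = len(sequence) % 3
    if remainder:
        # Pad with gaps so that the length is a multiple of 3.
        sequence = sequence + gap * (3 - remainder)
    # Element k holds every third character starting at offset k.
    return [sequence[position::3] for position in range(3)]


partitions = partition_codon_positions("ATGGCCA")
print(partitions)
```

Each of the three strings has length n_codons, which mirrors the codon position being the first axis of the documented array.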

get_encoded_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor

Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering. Currently only supports nucleotide sequences, uses ACGT ordering.

Parameters:
  • taxon_names (Iterable[str]) – Order of taxa to use in the encoded array

  • dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

One-hot encoded sequence TensorFlow tensor with shape [(n_sequences, 4)] and data type dtype

Return type:

tf.Tensor

get_codon_partitioned_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor

Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering, and partitioned into codon positions. Currently only supports nucleotide sequences, uses ACGT ordering. The codon positions are the first axis of the Tensor. If the number of sites is not a multiple of 3, the sequences are padded with gaps.

Parameters:
  • taxon_names (Iterable[str]) – Order of taxa to use in the encoded array

  • dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

One-hot encoded codon-partitioned sequence TensorFlow tensor with shape [(3, n_codons, 4)] and data type dtype

Return type:

tf.Tensor

get_compressed_alignment() WeightedAlignment

Compress an alignment by keeping each unique site pattern (the mapping from taxa to characters at a site) once and weighting it by the number of times it occurs.

Returns:

The compressed alignment

Return type:

WeightedAlignment
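
Site pattern compression can be sketched in plain Python as below. The function compress_alignment is hypothetical and is not TreeFlow's implementation; it returns a mapping from taxon names to pattern strings together with a list of weights, which is assumed here to resemble the pattern_mapping and weights arguments of WeightedAlignment.

```python
from collections import Counter
from typing import Dict, List, Mapping, Tuple


def compress_alignment(
    sequence_mapping: Mapping[str, str]
) -> Tuple[Dict[str, str], List[float]]:
    """Collapse identical alignment columns into weighted site patterns."""
    taxa = list(sequence_mapping)
    # Each column is a tuple holding one character per taxon.
    columns = zip(*(sequence_mapping[taxon] for taxon in taxa))
    # Counter keeps patterns in first-seen order (Python 3.7+).
    counts = Counter(columns)
    patterns = list(counts)
    pattern_mapping = {
        taxon: "".join(pattern[index] for pattern in patterns)
        for index, taxon in enumerate(taxa)
    }
    weights = [float(counts[pattern]) for pattern in patterns]
    return pattern_mapping, weights


sequences = {"a": "AACGA", "b": "AACTA", "c": "AAGGA"}
pattern_mapping, weights = compress_alignment(sequences)
print(pattern_mapping, weights)
```

The weights sum to the original number of sites, so a per-site quantity evaluated once per pattern and multiplied by its weight gives the same total as evaluating it at every site.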

property taxon_count

The number of taxa included in the alignment

property pattern_count

The number of sites in the alignment

class treeflow.WeightedAlignment(pattern_mapping: Mapping[str, Collection[str]], weights: Iterable[float], data_type: AlignmentType = AlignmentType.NUCLEOTIDE)

Bases: Alignment

Class to represent a multiple sequence alignment with numeric weights associated with the sites

Parameters:
  • pattern_mapping (Mapping[str, Collection[str]]) – Mapping from names to sequences

  • weights (Iterable[float]) – Weights associated with positions in the sequences

get_codon_partitioned_sequence_array(taxon_names: Iterable[str]) ndarray

Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering, and partitioned into codon positions. Currently only supports nucleotide sequences, uses ACGT ordering. The codon positions are the first axis of the array. If the number of sites is not a multiple of 3, the sequences are padded with gaps.

Parameters:

taxon_names (Iterable[str]) – Order of taxa to use in the encoded array

Returns:

One-hot encoded codon-partitioned sequence NumPy array with shape [(3, n_codons, 4)]

Return type:

np.ndarray

get_codon_partitioned_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor

Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering, and partitioned into codon positions. Currently only supports nucleotide sequences, uses ACGT ordering. The codon positions are the first axis of the Tensor. If the number of sites is not a multiple of 3, the sequences are padded with gaps.

Parameters:
  • taxon_names (Iterable[str]) – Order of taxa to use in the encoded array

  • dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

One-hot encoded codon-partitioned sequence TensorFlow tensor with shape [(3, n_codons, 4)] and data type dtype

Return type:

tf.Tensor

get_compressed_alignment() WeightedAlignment

Compress an alignment by keeping each unique site pattern (the mapping from taxa to characters at a site) once and weighting it by the number of times it occurs.

Returns:

The compressed alignment

Return type:

WeightedAlignment

get_encoded_sequence_array(taxon_names: Iterable[str]) ndarray

Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering. Currently only supports nucleotide sequences, uses ACGT ordering.

Parameters:

taxon_names (Iterable[str]) – Order of taxa to use in the encoded array

Returns:

One-hot encoded sequence NumPy array with shape [(n_sequences, 4)]

Return type:

np.ndarray

get_encoded_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor

Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering. Currently only supports nucleotide sequences, uses ACGT ordering.

Parameters:
  • taxon_names (Iterable[str]) – Order of taxa to use in the encoded array

  • dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

One-hot encoded sequence TensorFlow tensor with shape [(n_sequences, 4)] and data type dtype

Return type:

tf.Tensor

get_weights_array() ndarray

Get the site weights as a NumPy array

Returns:

Site weights array

Return type:

np.ndarray

property pattern_count

The number of sites in the alignment

property taxon_count

The number of taxa included in the alignment

get_weights_tensor(dtype=tf.float64) Tensor

Get the site weights as a TensorFlow Tensor

Parameters:

dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

Site weights constant Tensor

Return type:

tf.Tensor

class treeflow.AlignmentType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enum to represent the character data type of a sequence alignment

NUCLEOTIDE = 'nucleotide'
PROTEIN = 'protein'

class treeflow.AlignmentFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enum to represent the file format of a multiple sequence alignment

FASTA = 'fasta'
NEXUS = 'nexus'
NEXML = 'nexml'
PHYLIP = 'phylip'

class treeflow.PhyloModel(model_dict: Dict[str, str | Dict[str, object]])

Bases: object

Class to represent the configuration of a basic phylogenetic model

classmethod check_model_dict(model_dict: Dict[str, str | Dict[str, object]])

all_params() Dict[str, object]

free_params() Dict[str, Dict[str, object]]

relaxed_clock() bool

treeflow.phylo_model_to_joint_distribution(model: PhyloModel, initial_tree: TensorflowRootedTree, initial_alignment: Alignment, pattern_counts: Tensor | None = None) JointDistributionCoroutine

treeflow.write_tensor_trees(topology_file: str, branch_lengths: Tensor, output_file: str, branch_metadata: Mapping[str, Tensor] | None = None)

Write a collection of Tensor tree branch lengths, and possibly branch metadata, to a Nexus file

Parameters:
  • topology_file (str) – Newick file to read tree topology from

  • branch_lengths (Tensor) – Tensor of branch lengths with dimensions (num_samples, num_branches)

  • output_file (str) – File to write trees to in Nexus format

  • branch_metadata (Mapping[str, Tensor] (optional)) – Mapping from keys to Tensors with dimensions (num_samples, num_branches) containing branch metadata

Subpackages