treeflow package

TreeFlow: automatic differentiation and probabilistic modelling with phylogenetic trees

treeflow.parse_newick(newick_file: str, remove_zero_edges: bool = True, epsilon: float = 1e-06, subnewick_format=0) → NumpyRootedTree

Read a rooted TreeFlow NumPy tree from a Newick file

Parameters:

newick_file (str) – File to read tree from
remove_zero_edges (bool) – Whether to expand zero-length edges (default: True)
epsilon (float) – Value to expand zero-length edges to (default: 1e-6)
subnewick_format (int) – Format passed to ete3.Tree (default: 0) (see documentation at http://etetoolkit.org/docs/latest/reference/reference_tree.html)

Returns:

Parsed TreeFlow tree composed of NumPy arrays

Return type:

NumpyRootedTree
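
The effect of remove_zero_edges and epsilon can be pictured with a minimal sketch. It assumes that "expanding" a zero-length edge simply means replacing its length with epsilon; the helper name expand_zero_edges is hypothetical and is not part of TreeFlow, which may also adjust node heights when it does this.

```python
from typing import List


def expand_zero_edges(branch_lengths: List[float], epsilon: float = 1e-6) -> List[float]:
    """Replace zero-length branches with a small positive value.

    Hypothetical helper for illustration only; not TreeFlow's implementation.
    """
    return [epsilon if length == 0.0 else length for length in branch_lengths]


lengths = [0.1, 0.0, 0.25, 0.0]
expanded = expand_zero_edges(lengths)
print(expanded)
```

Keeping every branch strictly positive avoids degenerate values in computations that take logarithms or ratios of branch lengths.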

treeflow.convert_tree_to_tensor(numpy_tree: NumpyRootedTree, height_dtype: DType = tf.float64) → TensorflowRootedTree

Convert a TreeFlow tree composed of NumPy arrays to one composed of TensorFlow Tensors

Parameters:

numpy_tree (NumpyRootedTree) – Tree to convert
height_dtype (tf.DType) – TensorFlow datatype to use for tree times (defaults to TreeFlow default)

Returns:

Tree with arrays converted to Tensors

Return type:

TensorflowRootedTree

treeflow.float_constant(x: float | ndarray | Iterable[float])

Converts a floating point value or array to a constant Tensor with TreeFlow’s default data type

Parameters: x – Value that can be converted to a Tensor
Returns: Value converted to a constant tensor
Return type: tf.Tensor

class treeflow.Alignment(fasta_file: str | bytes | PathLike | None = None, sequence_mapping: Mapping[str, Collection[str]] | None = None, format: AlignmentFormat = AlignmentFormat.FASTA, data_type: AlignmentType = AlignmentType.NUCLEOTIDE)

Bases: object

Class to represent a multiple sequence alignment

Either fasta_file or sequence_mapping must be provided.

Parameters:

fasta_file (Optional[PathLikeType]) – Path of the FASTA file that the alignment is read from (optional). The value is passed to open, so it can be a string, bytes or path-like object (default: None)
sequence_mapping (Optional[Mapping[str, Collection[str]]]) – Mapping from names to sequences (optional) (default: None)

get_encoded_sequence_array(taxon_names: Iterable[str]) → ndarray

Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering. Currently only supports nucleotide sequences, uses ACGT ordering.

Parameters: taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
Returns: One-hot encoded sequence NumPy array with shape [(n_sequences, 4)]
Return type: np.ndarray
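
One-hot encoding with ACGT ordering can be sketched in plain Python as below. The names one_hot_site and encode_alignment are hypothetical, the nesting (taxon, site, state) is a choice of this sketch, and treating any non-ACGT character as missing data (all ones) is an assumption rather than documented TreeFlow behaviour.

```python
from typing import Dict, Iterable, List

NUCLEOTIDES = "ACGT"  # column order of the one-hot vector


def one_hot_site(char: str) -> List[float]:
    """Encode a single character as a length-4 vector in ACGT order."""
    char = char.upper()
    if char in NUCLEOTIDES:
        return [1.0 if char == base else 0.0 for base in NUCLEOTIDES]
    # Assumption: gaps and ambiguity codes are treated as missing data (all ones).
    return [1.0] * len(NUCLEOTIDES)


def encode_alignment(
    sequences: Dict[str, str], taxon_names: Iterable[str]
) -> List[List[List[float]]]:
    """Encode sequences in the order given by taxon_names."""
    return [[one_hot_site(char) for char in sequences[name]] for name in taxon_names]


sequences = {"human": "ACGT", "chimp": "AC-T"}
encoded = encode_alignment(sequences, ["chimp", "human"])
print(encoded[1][2])  # third site of "human", the character G
```

The taxon_names argument controls row order, which matters when the rows must line up with the leaf ordering of a tree.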

get_codon_partitioned_sequence_array(taxon_names: Iterable[str]) → ndarray

Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering, and partitioned into codon positions. Currently only supports nucleotide sequences, uses ACGT ordering. The codon positions are the first axis of the array. If the number of sites is not a multiple of 3, the sequences are padded with gaps.

Parameters: taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
Returns: One-hot encoded codon-partitioned sequence NumPy array with shape [(3, n_codons, 4)]
Return type: np.ndarray
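
The codon partitioning and gap padding described above can be illustrated on a single raw sequence. This is a minimal sketch under the assumption that padding is appended at the end of the sequence; partition_codon_positions is a hypothetical name, and the sketch works on characters rather than one-hot vectors.

```python
from typing import List


def partition_codon_positions(sequence: str, gap: str = "-") -> List[str]:
    """Split a sequence into its first, second and third codon positions.

    The sequence is padded with gap characters to a multiple of 3 first.
    """
    remainder = len(sequence) % 3
    if remainder:
        sequence = sequence + gap * (3 - remainder)
    # Position p collects every third character starting at index p.
    return [sequence[position::3] for position in range(3)]


parts = partition_codon_positions("ATGGCCA")
print(parts)  # ['AGA', 'TC-', 'GC-']
```

Each of the three strings has length n_codons, which corresponds to the leading axis of size 3 in the documented shape.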

get_encoded_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) → Tensor

Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering. Currently only supports nucleotide sequences, uses ACGT ordering.

Parameters:

taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

One-hot encoded sequence TensorFlow tensor with shape [(n_sequences, 4)] and data type dtype

Return type:

tf.Tensor

get_codon_partitioned_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) → Tensor

Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering, and partitioned into codon positions. Currently only supports nucleotide sequences, uses ACGT ordering. The codon positions are the first axis of the Tensor. If the number of sites is not a multiple of 3, the sequences are padded with gaps.

Parameters:

taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

One-hot encoded codon-partitioned sequence TensorFlow tensor with shape [(3, n_codons, 4)] and data type dtype

Return type:

tf.Tensor

get_compressed_alignment() → WeightedAlignment

Compress an alignment by keeping only the unique site patterns (the mappings from taxa to characters at a site) and weighting each pattern by the number of times it occurs.

Returns: The compressed alignment
Return type: WeightedAlignment
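
Site-pattern compression can be sketched with the standard library. The function name compress_alignment is hypothetical, and keeping patterns in order of first occurrence is a choice of this sketch; TreeFlow's own ordering is not stated here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compress_alignment(sequences: Dict[str, str]) -> Tuple[Dict[str, str], List[int]]:
    """Collapse identical alignment columns and count how often each occurs."""
    names = list(sequences)
    # A site pattern is one column of the alignment: one character per taxon.
    columns = list(zip(*(sequences[name] for name in names)))
    counts = Counter(columns)  # keys keep first-occurrence order
    patterns = list(counts)
    compressed = {
        name: "".join(pattern[index] for pattern in patterns)
        for index, name in enumerate(names)
    }
    weights = [counts[pattern] for pattern in patterns]
    return compressed, weights


sequences = {"a": "AACGA", "b": "AACTA", "c": "AAGGA"}
compressed, weights = compress_alignment(sequences)
print(compressed)  # {'a': 'ACG', 'b': 'ACT', 'c': 'AGG'}
print(weights)     # [3, 1, 1]
```

The weights sum to the original number of sites, so no information about column frequencies is lost.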

property taxon_count: The number of taxa included in the alignment

property pattern_count: The number of sites in the alignment

class treeflow.WeightedAlignment(pattern_mapping: Mapping[str, Collection[str]], weights: Iterable[float], data_type: AlignmentType = AlignmentType.NUCLEOTIDE)

Bases: Alignment

Class to represent a multiple sequence alignment with numeric weights associated with the sites

Parameters:

pattern_mapping (Mapping[str, Collection[str]]) – Mapping from names to sequences of site patterns
weights (Iterable[float]) – Weights associated with positions in the sequences
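
A common use of such site weights is to scale per-pattern log-likelihoods so that a compressed alignment yields the same total as the full one. The sketch below shows only that arithmetic; the probabilities are made-up illustrative numbers, weighted_log_likelihood is a hypothetical name, and how TreeFlow consumes the weights internally is not described here.

```python
import math
from typing import Sequence


def weighted_log_likelihood(
    pattern_log_likelihoods: Sequence[float], weights: Sequence[float]
) -> float:
    """Sum per-pattern log-likelihoods, each scaled by its site weight."""
    return sum(w * ll for w, ll in zip(weights, pattern_log_likelihoods))


# Four sites, three of which share the same pattern (illustrative values).
site_log_likelihoods = [math.log(0.5), math.log(0.5), math.log(0.25), math.log(0.5)]
pattern_log_likelihoods = [math.log(0.5), math.log(0.25)]
weights = [3.0, 1.0]

full_total = sum(site_log_likelihoods)
compressed_total = weighted_log_likelihood(pattern_log_likelihoods, weights)
print(math.isclose(full_total, compressed_total))  # True
```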

get_codon_partitioned_sequence_array(taxon_names: Iterable[str]) → ndarray

Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering, and partitioned into codon positions. Currently only supports nucleotide sequences, uses ACGT ordering. The codon positions are the first axis of the array. If the number of sites is not a multiple of 3, the sequences are padded with gaps.

Parameters: taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
Returns: One-hot encoded codon-partitioned sequence NumPy array with shape [(3, n_codons, 4)]
Return type: np.ndarray

get_codon_partitioned_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) → Tensor

Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering, and partitioned into codon positions. Currently only supports nucleotide sequences, uses ACGT ordering. The codon positions are the first axis of the Tensor. If the number of sites is not a multiple of 3, the sequences are padded with gaps.

Parameters:

taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

One-hot encoded codon-partitioned sequence TensorFlow tensor with shape [(3, n_codons, 4)] and data type dtype

Return type:

tf.Tensor

get_compressed_alignment() → WeightedAlignment

Compress an alignment by keeping only the unique site patterns (the mappings from taxa to characters at a site) and weighting each pattern by the number of times it occurs.

Returns: The compressed alignment
Return type: WeightedAlignment

get_encoded_sequence_array(taxon_names: Iterable[str]) → ndarray

Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering. Currently only supports nucleotide sequences, uses ACGT ordering.

Parameters: taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
Returns: One-hot encoded sequence NumPy array with shape [(n_sequences, 4)]
Return type: np.ndarray

get_encoded_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) → Tensor

Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering. Currently only supports nucleotide sequences, uses ACGT ordering.

Parameters:

taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)

Returns:

One-hot encoded sequence TensorFlow tensor with shape [(n_sequences, 4)] and data type dtype

Return type:

tf.Tensor

get_weights_array() → ndarray

Get the site weights as a NumPy array

Returns: Site weights array
Return type: np.ndarray

property pattern_count: The number of sites in the alignment

property taxon_count: The number of taxa included in the alignment

get_weights_tensor(dtype=tf.float64) → Tensor

Get the site weights as a TensorFlow Tensor

Parameters: dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)
Returns: Site weights constant Tensor
Return type: tf.Tensor

class treeflow.AlignmentType(value)

Bases: Enum

Enum to represent the character data type of a sequence alignment

NUCLEOTIDE = 'nucleotide'

PROTEIN = 'protein'

class treeflow.AlignmentFormat(value)

Bases: Enum

Enum to represent the file format of a multiple sequence alignment

FASTA = 'fasta'

NEXUS = 'nexus'

NEXML = 'nexml'

PHYLIP = 'phylip'

class treeflow.PhyloModel(model_dict: Dict[str, str | Dict[str, object]])

Bases: object

Class to represent the configuration of a basic phylogenetic model

classmethod check_model_dict(model_dict: Dict[str, str | Dict[str, object]])

all_params() → Dict[str, object]

free_params() → Dict[str, Dict[str, object]]

relaxed_clock() → bool

treeflow.phylo_model_to_joint_distribution(model: PhyloModel, initial_tree: TensorflowRootedTree, initial_alignment: Alignment, pattern_counts: Tensor | None = None) → JointDistributionCoroutine

treeflow.write_tensor_trees(topology_file: str, branch_lengths: Tensor, output_file: str, branch_metadata: Mapping[str, Tensor] | None = None)

Write a collection of Tensor tree branch lengths, and possibly branch metadata, to a Nexus file

Parameters:

topology_file (str) – Newick file to read tree topology from
branch_lengths (Tensor) – Tensor of branch lengths with dimensions (num_samples, num_branches)
output_file (str) – File to write trees to in Nexus format
branch_metadata (Mapping[str, Tensor] (optional)) – Mapping from keys to Tensors with dimensions (num_samples, num_branches) containing branch metadata

Subpackages