treeflow package
TreeFlow: automatic differentiation and probabilistic modelling with phylogenetic trees
- treeflow.parse_newick(newick_file: str, remove_zero_edges: bool = True, epsilon: float = 1e-06, subnewick_format=0) NumpyRootedTree
Read a rooted TreeFlow Numpy tree from a Newick file
- Parameters:
newick_file (str) – File to read tree from
remove_zero_edges (bool) – Whether to expand zero-length edges (default: True)
epsilon (float) – Value to expand zero-length edges to (default: 1e-6)
subnewick_format (int) – Format passed to ete3.Tree (default: 0) (see documentation at http://etetoolkit.org/docs/latest/reference/reference_tree.html)
- Returns:
Parsed TreeFlow tree composed of NumPy arrays
- Return type:
NumpyRootedTree
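The remove_zero_edges and epsilon parameters can be pictured with a small sketch. This is not TreeFlow's implementation: the helper name expand_zero_edges is hypothetical, and it assumes the expansion acts on a flat list of branch lengths, whereas the library may work on node heights instead.

```python
def expand_zero_edges(branch_lengths, epsilon=1e-6):
    """Return a copy of branch_lengths with non-positive lengths raised to epsilon.

    Hypothetical helper for illustration only; it does not call TreeFlow.
    """
    return [length if length > 0.0 else epsilon for length in branch_lengths]


# Made-up branch lengths, two of which are zero
lengths = [0.1, 0.0, 0.25, 0.0]
expanded = expand_zero_edges(lengths)
print(expanded)
```

After the call every branch in the sketch has a strictly positive length, which is the property the epsilon parameter appears to be there to guarantee.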
- treeflow.convert_tree_to_tensor(numpy_tree: NumpyRootedTree, height_dtype: DType = tf.float64) TensorflowRootedTree
Convert a TreeFlow tree composed of NumPy arrays to one composed of TensorFlow Tensors
- Parameters:
numpy_tree (NumpyRootedTree) – Tree to convert
height_dtype (tf.DType) – TensorFlow datatype to use for tree times (defaults to TreeFlow default)
- Returns:
Tree with arrays converted to Tensors
- Return type:
TensorflowRootedTree
- treeflow.float_constant(x: float | ndarray | Iterable[float])
Converts a floating point value or array to a constant Tensor with TreeFlow’s default data type
- Parameters:
x – Value that can be converted to a Tensor
- Returns:
Value converted to a constant tensor
- Return type:
tf.Tensor
- class treeflow.Alignment(fasta_file: str | bytes | PathLike | None = None, sequence_mapping: Mapping[str, Collection[str]] | None = None, format: AlignmentFormat = AlignmentFormat.FASTA, data_type: AlignmentType = AlignmentType.NUCLEOTIDE)
Bases: object
Class to represent a multiple sequence alignment
Either fasta_file or sequence_mapping must be provided.
- Parameters:
fasta_file (Optional[PathLikeType]) – Filename of FASTA file that alignment is read from (optional). The filename is passed to open (so it can be a string, path or buffer) (default: None)
sequence_mapping (Optional[Mapping[str, Collection[str]]]) – Mapping from names to sequences (optional) (default: None)
- get_encoded_sequence_array(taxon_names: Iterable[str]) ndarray
Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering. Currently supports only nucleotide sequences and uses ACGT ordering.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
- Returns:
One-hot encoded sequence NumPy array with shape [(n_sequences, 4)]
- Return type:
np.ndarray
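As a rough illustration of one-hot encoding with ACGT ordering, the sketch below encodes sequences in a caller-supplied taxon order using plain Python lists. The function names are hypothetical, the nested layout [taxon][site][character] is an assumption, and so is the treatment of gaps and ambiguous characters as all ones; none of this is taken from TreeFlow's source.

```python
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Encode a nucleotide string as a list of 4-element rows in ACGT order."""
    rows = []
    for character in sequence.upper():
        if character in NUCLEOTIDES:
            rows.append([1.0 if character == base else 0.0 for base in NUCLEOTIDES])
        else:
            # Assumption: gaps and ambiguous characters are encoded as all ones
            rows.append([1.0] * len(NUCLEOTIDES))
    return rows


def encode_alignment(sequence_mapping, taxon_names):
    """Encode every sequence, ordered by taxon_names rather than by mapping order."""
    return [one_hot_encode(sequence_mapping[name]) for name in taxon_names]


# Made-up two-taxon alignment
sequences = {"human": "ACGT", "chimp": "AC-T"}
encoded = encode_alignment(sequences, ["chimp", "human"])
print(encoded[1][0])  # row for the first site of "human"
```

Passing the taxon order explicitly keeps the rows of the encoded data aligned with whatever leaf ordering the tree uses.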
- get_codon_partitioned_sequence_array(taxon_names: Iterable[str]) ndarray
Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering, partitioned into codon positions. Currently supports only nucleotide sequences and uses ACGT ordering. The codon positions are the first axis of the array. If the number of sites is not a multiple of 3, the sequences are padded with gaps.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
- Returns:
One-hot encoded codon-partitioned sequence NumPy array with shape [(3, n_codons, 4)]
- Return type:
np.ndarray
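The codon partitioning and gap padding described above can be sketched on a raw sequence string before any encoding. The helper name partition_by_codon_position is hypothetical and "-" is assumed to be the gap character; this shows the idea rather than the library's code.

```python
def partition_by_codon_position(sequence, gap="-"):
    """Split a sequence into three strings, one per codon position.

    The sequence is padded with gaps when its length is not a multiple of 3.
    """
    remainder = len(sequence) % 3
    if remainder:
        sequence = sequence + gap * (3 - remainder)
    # Every third character, starting at offsets 0, 1 and 2
    return [sequence[position::3] for position in range(3)]


parts = partition_by_codon_position("ACGTA")
print(parts)
```

Each of the three strings has length n_codons, so one-hot encoding them would give the leading axis of size 3 that the method documents.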
- get_encoded_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor
Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering. Currently supports only nucleotide sequences and uses ACGT ordering.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)
- Returns:
One-hot encoded sequence TensorFlow tensor with shape [(n_sequences, 4)] and data type dtype
- Return type:
tf.Tensor
- get_codon_partitioned_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor
Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering, partitioned into codon positions. Currently supports only nucleotide sequences and uses ACGT ordering. The codon positions are the first axis of the Tensor. If the number of sites is not a multiple of 3, the sequences are padded with gaps.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)
- Returns:
One-hot encoded codon-partitioned sequence TensorFlow tensor with shape [(3, n_codons, 4)] and data type dtype
- Return type:
tf.Tensor
- get_compressed_alignment() WeightedAlignment
Compress an alignment by keeping one copy of each unique site pattern (mapping from taxa to characters) and weighting it by the number of times it occurs.
- Returns:
The compressed alignment
- Return type:
WeightedAlignment
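Site-pattern compression of this kind can be sketched with the standard library. The function compress_alignment below is hypothetical and only mirrors the description above: it collects alignment columns, counts repeats, and returns a pattern mapping plus weights shaped like the arguments of WeightedAlignment. Whether TreeFlow orders patterns the same way is an assumption.

```python
from collections import Counter


def compress_alignment(sequence_mapping):
    """Collapse identical alignment columns into unique patterns with counts."""
    taxon_names = list(sequence_mapping)
    # Each column is a tuple holding one character per taxon
    columns = zip(*(sequence_mapping[name] for name in taxon_names))
    pattern_counts = Counter(columns)  # keeps first-seen order of patterns
    patterns = list(pattern_counts)
    pattern_mapping = {
        name: "".join(pattern[index] for pattern in patterns)
        for index, name in enumerate(taxon_names)
    }
    weights = [float(pattern_counts[pattern]) for pattern in patterns]
    return pattern_mapping, weights


# Made-up alignment: columns 1, 2 and 4 are identical
sequences = {"a": "AACA", "b": "AAGA", "c": "TTGT"}
pattern_mapping, weights = compress_alignment(sequences)
print(pattern_mapping, weights)
```

The weights sum to the original number of sites, so no information needed for a site-independent likelihood is lost.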
- property taxon_count
The number of taxa included in the alignment
- property pattern_count
The number of sites in the alignment
- class treeflow.WeightedAlignment(pattern_mapping: Mapping[str, Collection[str]], weights: Iterable[float], data_type: AlignmentType = AlignmentType.NUCLEOTIDE)
Bases: Alignment
Class to represent a multiple sequence alignment with numeric weights associated with the sites
- Parameters:
pattern_mapping (Mapping[str, Collection[str]]) – Mapping from names to sequences
weights (Iterable[float]) – Weights associated with positions in the sequences
- get_codon_partitioned_sequence_array(taxon_names: Iterable[str]) ndarray
Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering, partitioned into codon positions. Currently supports only nucleotide sequences and uses ACGT ordering. The codon positions are the first axis of the array. If the number of sites is not a multiple of 3, the sequences are padded with gaps.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
- Returns:
One-hot encoded codon-partitioned sequence NumPy array with shape [(3, n_codons, 4)]
- Return type:
np.ndarray
- get_codon_partitioned_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor
Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering, partitioned into codon positions. Currently supports only nucleotide sequences and uses ACGT ordering. The codon positions are the first axis of the Tensor. If the number of sites is not a multiple of 3, the sequences are padded with gaps.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)
- Returns:
One-hot encoded codon-partitioned sequence TensorFlow tensor with shape [(3, n_codons, 4)] and data type dtype
- Return type:
tf.Tensor
- get_compressed_alignment() WeightedAlignment
Compress an alignment by keeping one copy of each unique site pattern (mapping from taxa to characters) and weighting it by the number of times it occurs.
- Returns:
The compressed alignment
- Return type:
WeightedAlignment
- get_encoded_sequence_array(taxon_names: Iterable[str]) ndarray
Build a one-hot encoded NumPy array for the alignment according to the provided taxon ordering. Currently supports only nucleotide sequences and uses ACGT ordering.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
- Returns:
One-hot encoded sequence NumPy array with shape [(n_sequences, 4)]
- Return type:
np.ndarray
- get_encoded_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor
Build a one-hot encoded TensorFlow Tensor constant for the alignment according to the provided taxon ordering. Currently supports only nucleotide sequences and uses ACGT ordering.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)
- Returns:
One-hot encoded sequence TensorFlow tensor with shape [(n_sequences, 4)] and data type dtype
- Return type:
tf.Tensor
- get_weights_array() ndarray
Get the site weights as a NumPy array
- Returns:
Site weights array
- Return type:
np.ndarray
- property pattern_count
The number of sites in the alignment
- property taxon_count
The number of taxa included in the alignment
- get_weights_tensor(dtype=tf.float64) Tensor
Get the site weights as a TensorFlow Tensor
- Parameters:
dtype (tf.DType) – TensorFlow data type for the returned array (defaults to package default)
- Returns:
Site weights constant Tensor
- Return type:
tf.Tensor
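Site weights are typically used to replace a sum over all sites with a weighted sum over unique patterns. The sketch below checks that equivalence with made-up per-site probabilities; the function name weighted_log_likelihood is hypothetical, and how TreeFlow itself consumes the weights is not shown here.

```python
import math


def weighted_log_likelihood(pattern_log_likelihoods, weights):
    """Sum per-pattern log-likelihoods, each scaled by its site count."""
    return sum(
        weight * log_likelihood
        for weight, log_likelihood in zip(weights, pattern_log_likelihoods)
    )


# Made-up values: four sites, three of which share one pattern
site_log_likelihoods = [math.log(0.5), math.log(0.5), math.log(0.25), math.log(0.5)]
pattern_log_likelihoods = [math.log(0.5), math.log(0.25)]
weights = [3.0, 1.0]

total_by_site = sum(site_log_likelihoods)
total_by_pattern = weighted_log_likelihood(pattern_log_likelihoods, weights)
print(total_by_site, total_by_pattern)
```

Both totals agree up to floating-point rounding, while the pattern version evaluates the likelihood for two columns instead of four.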
- class treeflow.AlignmentType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Enum to represent the character data type of a sequence alignment
- NUCLEOTIDE = 'nucleotide'
- PROTEIN = 'protein'
- class treeflow.AlignmentFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Enum to represent the file format of a multiple sequence alignment
- FASTA = 'fasta'
- NEXUS = 'nexus'
- NEXML = 'nexml'
- PHYLIP = 'phylip'
- class treeflow.PhyloModel(model_dict: Dict[str, str | Dict[str, object]])
Bases: object
Class to represent the configuration of a basic phylogenetic model
- classmethod check_model_dict(model_dict: Dict[str, str | Dict[str, object]])
- all_params() Dict[str, object]
- free_params() Dict[str, Dict[str, object]]
- relaxed_clock() bool
- treeflow.phylo_model_to_joint_distribution(model: PhyloModel, initial_tree: TensorflowRootedTree, initial_alignment: Alignment, pattern_counts: Tensor | None = None) JointDistributionCoroutine
- treeflow.write_tensor_trees(topology_file: str, branch_lengths: Tensor, output_file: str, branch_metadata: Mapping[str, Tensor] | None = None)
Write a collection of Tensor tree branch lengths, and possibly branch metadata, to a Nexus file
- Parameters:
topology_file (str) – Newick file to read tree topology from
branch_lengths (Tensor) – Tensor of branch lengths with dimensions (num_samples, num_branches)
output_file (str) – File to write trees to in Nexus format
branch_metadata (Mapping[str, Tensor] (optional)) – Mapping from keys to Tensors with dimensions (num_samples, num_branches) containing branch metadata