treeflow.evolution.seqio module
- class treeflow.evolution.seqio.AlignmentFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Enum representing the file format of a multiple sequence alignment.
- FASTA = 'fasta'
- NEXUS = 'nexus'
- NEXML = 'nexml'
- PHYLIP = 'phylip'
- class treeflow.evolution.seqio.AlignmentType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Enum representing the character data type of a sequence alignment.
- NUCLEOTIDE = 'nucleotide'
- PROTEIN = 'protein'
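Both enums are keyed by lower-case string values, so a member can be looked up from a string by calling the enum. The sketch below uses stand-in classes that copy the members listed above so that it runs without treeflow installed. The helper `parse_format` is hypothetical and not part of treeflow.

```python
from enum import Enum


# Stand-ins that mirror the documented members. In real code, import
# AlignmentFormat and AlignmentType from treeflow.evolution.seqio instead.
class AlignmentFormat(Enum):
    FASTA = "fasta"
    NEXUS = "nexus"
    NEXML = "nexml"
    PHYLIP = "phylip"


class AlignmentType(Enum):
    NUCLEOTIDE = "nucleotide"
    PROTEIN = "protein"


def parse_format(name: str) -> AlignmentFormat:
    """Hypothetical helper: look up a format from a string such as 'Nexus'."""
    # Calling an Enum looks members up by value; the values are lower-case,
    # and an unknown value raises ValueError.
    return AlignmentFormat(name.lower())


fmt = parse_format("Nexus")
print(fmt, fmt.value)  # AlignmentFormat.NEXUS nexus
```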
- class treeflow.evolution.seqio.Alignment(fasta_file: str | bytes | PathLike | None = None, sequence_mapping: Mapping[str, Collection[str]] | None = None, format: AlignmentFormat = AlignmentFormat.FASTA, data_type: AlignmentType = AlignmentType.NUCLEOTIDE)
Bases: object
Class to represent a multiple sequence alignment.
Either fasta_file or sequence_mapping must be provided.
- Parameters:
fasta_file (Optional[PathLikeType]) – Filename of the alignment file to read (optional). The filename is passed to open, so it can be a string, path or buffer (default: None)
sequence_mapping (Optional[Mapping[str, Collection[str]]]) – Mapping from taxon names to sequences (optional) (default: None)
format (AlignmentFormat) – File format of fasta_file (default: AlignmentFormat.FASTA)
data_type (AlignmentType) – Character data type of the alignment (default: AlignmentType.NUCLEOTIDE)
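A sequence_mapping maps each taxon name to its aligned sequence, and in an alignment every sequence has the same length. The sketch below shows such a mapping together with a hypothetical `check_alignment_inputs` function that enforces the documented "either fasta_file or sequence_mapping" rule. The equal-length check is an assumption of this sketch, and how treeflow behaves when both arguments are given is not covered here. With treeflow installed, the same mapping would be passed as `Alignment(sequence_mapping=sequence_mapping)`.

```python
from typing import Collection, Mapping, Optional

# Example mapping from taxon names to aligned sequences ("-" is a gap).
sequence_mapping = {
    "taxon_a": "ACGTAC",
    "taxon_b": "ACGAAC",
    "taxon_c": "TCG-AC",
}


def check_alignment_inputs(
    fasta_file: Optional[str] = None,
    sequence_mapping: Optional[Mapping[str, Collection[str]]] = None,
) -> None:
    """Hypothetical validation mirroring the documented constructor rule."""
    if fasta_file is None and sequence_mapping is None:
        raise ValueError("Either fasta_file or sequence_mapping must be provided")
    if sequence_mapping is not None:
        # Aligned sequences must all have the same number of sites.
        lengths = {len(seq) for seq in sequence_mapping.values()}
        if len(lengths) > 1:
            raise ValueError(f"Sequences have differing lengths: {sorted(lengths)}")
```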
- get_encoded_sequence_array(taxon_names: Iterable[str]) ndarray
Build a one-hot encoded NumPy array for the alignment, with taxa in the provided order. Currently only nucleotide sequences are supported, encoded in ACGT order.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
- Returns:
One-hot encoded sequence NumPy array with shape [(n_sequences, 4)]
- Return type:
np.ndarray
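One-hot encoding in ACGT order turns each nucleotide into a length-4 indicator vector. The pure-Python sketch below illustrates the idea and is not treeflow's implementation. The function name `encode_sequences`, the nested-list layout indexed as [taxon][site][nucleotide], and the choice to encode gaps and other non-ACGT characters as all-ones (fully uncertain) vectors are all assumptions. The axis order of the array treeflow actually returns may differ.

```python
from typing import Iterable, List, Mapping

# Indicator vectors in ACGT order.
ENCODING = {
    "A": [1.0, 0.0, 0.0, 0.0],
    "C": [0.0, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0],
    "T": [0.0, 0.0, 0.0, 1.0],
}
# Assumption: gaps and ambiguous characters are treated as "any nucleotide".
UNKNOWN = [1.0, 1.0, 1.0, 1.0]


def encode_sequences(
    sequence_mapping: Mapping[str, str], taxon_names: Iterable[str]
) -> List[List[List[float]]]:
    """Hypothetical helper returning nested lists indexed [taxon][site][nucleotide].

    Taxa appear in the order given by taxon_names; characters are upper-cased
    before lookup, so lower-case input is accepted.
    """
    return [
        [list(ENCODING.get(char.upper(), UNKNOWN)) for char in sequence_mapping[name]]
        for name in taxon_names
    ]
```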
- get_codon_partitioned_sequence_array(taxon_names: Iterable[str]) ndarray
Build a one-hot encoded NumPy array for the alignment, with taxa in the provided order and sites partitioned by codon position. Currently only nucleotide sequences are supported, encoded in ACGT order. The codon positions are the first axis of the array. If the number of sites is not a multiple of 3, the sequences are padded with gaps.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
- Returns:
One-hot encoded codon-partitioned sequence NumPy array with shape [(3, n_codons, 4)]
- Return type:
np.ndarray
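Partitioning by codon position assigns site i (0-based) to position i mod 3, after padding each sequence with gaps to a multiple of 3 sites. The sketch below shows only that splitting step on plain strings. The function name `partition_by_codon_position` is hypothetical, and padding on the right-hand end is an assumption, since the text above says only that sequences are padded with gaps. Each of the three resulting sub-alignments has n_codons sites and could then be one-hot encoded.

```python
from typing import Dict, Iterable, List, Mapping


def partition_by_codon_position(
    sequence_mapping: Mapping[str, str], taxon_names: Iterable[str], gap: str = "-"
) -> List[Dict[str, str]]:
    """Hypothetical helper splitting each sequence by codon position.

    Returns a list of length 3 (the codon-position axis). Entry k maps each
    taxon to the characters at sites k, k + 3, k + 6, ...
    """
    partitions: List[Dict[str, str]] = [{}, {}, {}]
    for name in taxon_names:
        seq = sequence_mapping[name]
        # Assumption: pad on the right so the length is a multiple of 3.
        padded = seq + gap * (-len(seq) % 3)
        for position in range(3):
            partitions[position][name] = padded[position::3]
    return partitions
```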
- get_encoded_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor
Build a one-hot encoded TensorFlow Tensor constant for the alignment, with taxa in the provided order. Currently only nucleotide sequences are supported, encoded in ACGT order.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded tensor
dtype (tf.DType) – TensorFlow data type for the returned tensor (defaults to package default)
- Returns:
One-hot encoded sequence TensorFlow tensor with shape [(n_sequences, 4)] and data type dtype
- Return type:
tf.Tensor
- get_codon_partitioned_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor
Build a one-hot encoded TensorFlow Tensor constant for the alignment, with taxa in the provided order and sites partitioned by codon position. Currently only nucleotide sequences are supported, encoded in ACGT order. The codon positions are the first axis of the Tensor. If the number of sites is not a multiple of 3, the sequences are padded with gaps.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded tensor
dtype (tf.DType) – TensorFlow data type for the returned tensor (defaults to package default)
- Returns:
One-hot encoded codon-partitioned sequence TensorFlow tensor with shape [(3, n_codons, 4)] and data type dtype
- Return type:
tf.Tensor
- get_compressed_alignment() WeightedAlignment
Compress an alignment by keeping only the unique site patterns (the distinct columns of characters across taxa) and weighting each pattern by the number of sites at which it occurs.
- Returns:
The compressed alignment
- Return type:
WeightedAlignment
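Pattern compression treats each site as a column of characters across taxa, keeps one copy of each distinct column and records how many times it occurred. The pure-Python sketch below illustrates that idea and is not treeflow's implementation. The function name `compress_alignment` and the choice to order patterns by first appearance are assumptions. The returned mapping and weights correspond to the pattern_mapping and weights arguments of WeightedAlignment.

```python
from collections import Counter
from typing import Dict, List, Mapping, Tuple


def compress_alignment(
    sequence_mapping: Mapping[str, str],
) -> Tuple[Dict[str, str], List[float]]:
    """Hypothetical helper collapsing identical site patterns.

    Returns a mapping with one site per unique pattern (in order of first
    appearance) and a weight per pattern giving its number of occurrences.
    Assumes all sequences have the same length.
    """
    names = list(sequence_mapping)
    # zip(*rows) turns per-taxon rows into per-site columns.
    columns = list(zip(*(sequence_mapping[name] for name in names)))
    counts = Counter(columns)
    unique_columns = list(dict.fromkeys(columns))  # keeps first-seen order
    pattern_mapping = {
        name: "".join(column[i] for column in unique_columns)
        for i, name in enumerate(names)
    }
    weights = [float(counts[column]) for column in unique_columns]
    return pattern_mapping, weights
```

For example, the alignment {"a": "AAC", "b": "GGT"} has the column (A, G) twice and (C, T) once, so it compresses to two patterns with weights 2 and 1.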
- property taxon_count
The number of taxa included in the alignment
- property pattern_count
The number of sites in the alignment
- class treeflow.evolution.seqio.WeightedAlignment(pattern_mapping: Mapping[str, Collection[str]], weights: Iterable[float], data_type: AlignmentType = AlignmentType.NUCLEOTIDE)
Bases: Alignment
Class to represent a multiple sequence alignment with numeric weights associated with the sites.
- Parameters:
pattern_mapping (Mapping[str, Collection[str]]) – Mapping from taxon names to sequences, with one position per site pattern
weights (Iterable[float]) – Weights associated with positions in the sequences
data_type (AlignmentType) – Character data type of the alignment (default: AlignmentType.NUCLEOTIDE)
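When the weights are occurrence counts from compression, a per-site quantity summed over the weighted patterns, each term multiplied by its weight, equals the same quantity summed over every site of the uncompressed alignment. A typical use is a per-site log-likelihood, but how treeflow consumes the weights is an assumption here. The function `weighted_site_sum` and the toy score `distinct_characters` are hypothetical and used only to check the equivalence.

```python
from typing import Callable, Mapping, Sequence


def weighted_site_sum(
    pattern_mapping: Mapping[str, str],
    weights: Sequence[float],
    site_score: Callable[[tuple], float],
) -> float:
    """Sum site_score over the site patterns, multiplying each term by its weight."""
    names = list(pattern_mapping)
    columns = zip(*(pattern_mapping[name] for name in names))
    return sum(w * site_score(column) for w, column in zip(weights, columns))


def distinct_characters(column: tuple) -> float:
    """Toy per-site score: the number of distinct characters in the column."""
    return float(len(set(column)))


# Uncompressed alignment and its compressed form ((A, A) occurs twice).
full = {"a": "AACA", "b": "AGCA"}
compressed = {"a": "AAC", "b": "AGC"}
weights = [2.0, 1.0, 1.0]
```

With unit weights on the full alignment and the counts on the compressed one, both sums give the same total.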
- get_codon_partitioned_sequence_array(taxon_names: Iterable[str]) ndarray
Build a one-hot encoded NumPy array for the alignment, with taxa in the provided order and sites partitioned by codon position. Currently only nucleotide sequences are supported, encoded in ACGT order. The codon positions are the first axis of the array. If the number of sites is not a multiple of 3, the sequences are padded with gaps.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
- Returns:
One-hot encoded codon-partitioned sequence NumPy array with shape [(3, n_codons, 4)]
- Return type:
np.ndarray
- get_codon_partitioned_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor
Build a one-hot encoded TensorFlow Tensor constant for the alignment, with taxa in the provided order and sites partitioned by codon position. Currently only nucleotide sequences are supported, encoded in ACGT order. The codon positions are the first axis of the Tensor. If the number of sites is not a multiple of 3, the sequences are padded with gaps.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded tensor
dtype (tf.DType) – TensorFlow data type for the returned tensor (defaults to package default)
- Returns:
One-hot encoded codon-partitioned sequence TensorFlow tensor with shape [(3, n_codons, 4)] and data type dtype
- Return type:
tf.Tensor
- get_compressed_alignment() WeightedAlignment
Compress an alignment by keeping only the unique site patterns (the distinct columns of characters across taxa) and weighting each pattern by the number of sites at which it occurs.
- Returns:
The compressed alignment
- Return type:
WeightedAlignment
- get_encoded_sequence_array(taxon_names: Iterable[str]) ndarray
Build a one-hot encoded NumPy array for the alignment, with taxa in the provided order. Currently only nucleotide sequences are supported, encoded in ACGT order.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded array
- Returns:
One-hot encoded sequence NumPy array with shape [(n_sequences, 4)]
- Return type:
np.ndarray
- get_encoded_sequence_tensor(taxon_names: Iterable[str], dtype: DType = tf.float64) Tensor
Build a one-hot encoded TensorFlow Tensor constant for the alignment, with taxa in the provided order. Currently only nucleotide sequences are supported, encoded in ACGT order.
- Parameters:
taxon_names (Iterable[str]) – Order of taxa to use in the encoded tensor
dtype (tf.DType) – TensorFlow data type for the returned tensor (defaults to package default)
- Returns:
One-hot encoded sequence TensorFlow tensor with shape [(n_sequences, 4)] and data type dtype
- Return type:
tf.Tensor
- get_weights_array() ndarray
Get the site weights as a NumPy array
- Returns:
Site weights array
- Return type:
np.ndarray
- property pattern_count
The number of sites in the alignment
- property taxon_count
The number of taxa included in the alignment
- get_weights_tensor(dtype=tf.float64) Tensor
Get the site weights as a TensorFlow Tensor
- Parameters:
dtype (tf.DType) – TensorFlow data type for the returned tensor (defaults to package default)
- Returns:
Site weights constant Tensor
- Return type:
tf.Tensor