gpmap¶
The Pandas DataFrame for genotype-phenotype (GP) map data.

The GenotypePhenotypeMap is a core object for a suite of packages written in the Harms Lab. It organizes and standardizes genotype-phenotype map data.
Basic Example¶
# Import the GenotypePhenotypeMap
from gpmap import GenotypePhenotypeMap
# The data
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.5, 0.2, 0.8]
stdeviations = [0.05, 0.05, 0.05, 0.05]
# Initialize a GenotypePhenotypeMap object
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes,
                           stdeviations=stdeviations)
# Show the DataFrame
gpm.data

Documentation¶
Quick start¶
GPMap is a small Python package that wraps the pandas DataFrame to handle genotype-phenotype map data. The package includes utilities to read/write data to/from disk, enumerate large sequence/genotype spaces efficiently, and compute various statistics from an arbitrary genotype-phenotype map.
GenotypePhenotypeMap object¶
The main object in gpmap is the GenotypePhenotypeMap object. The object stores data as a Pandas DataFrame, which can be accessed through the .data attribute. Your object will look something like this:
from gpmap import GenotypePhenotypeMap
# Data
wildtype = "AAA"
genotypes = ["AAA", "AAT", "ATA", "TAA", "ATT", "TAT", "TTA", "TTT"]
phenotypes = [0.1, 0.2, 0.2, 0.6, 0.4, 0.6, 1.0, 1.1]
stdeviations = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
mutations = {
    0: ["A", "T"],
    1: ["A", "T"],
    2: ["A", "T"]
}
# Initialize the object
gpm = GenotypePhenotypeMap(
    wildtype,
    genotypes,
    phenotypes,
    mutations=mutations,
    stdeviations=stdeviations
)
# Check out the data.
gpm.data

The underlying DataFrame will have at least 5 columns: genotypes, phenotypes, stdeviations, n_replicates, and binary. The binary column is computed by the GenotypePhenotypeMap object.
mutations dictionary¶
The mutations dictionary tells GPMap what mutations, indels, etc. should be incorporated in the map. It is the most important data you pass to GPMap.
It is a regular Python dictionary and looks something like:
wildtype = "AA"
mutations = {
    0: ["A", "B"],
    1: ["A", "B"]
}
Each key is the position of a site, and each value lists the states possible at that site. In this example, the sequences have two sites and each site is either “A” or “B”.
Non-changing sites
If a site doesn’t mutate, give it a value of None.
wildtype = "AAA"
mutations = {
    0: ["A", "T"],
    1: ["A", "T"],
    2: None  # This site does not change
}
Here, site 2 does not change. All sequences will only have an “A” at that site.
Indels
You can incorporate indels using the gap character:
wildtype = "AAAA"
mutations = {
    0: ["A", "T"],
    1: ["A", "T"],
    2: None,
    3: ["A", "-"]  # Sometimes, this site doesn't exist.
}
Here, site 3 will toggle between an “A” and a missing residue “-” (deletion).
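To make the role of the mutations dictionary concrete, the sketch below enumerates every genotype it allows, treating a value of None as "keep the wildtype letter". The helper name enumerate_genotypes is hypothetical and this is not gpmap's own implementation; it only illustrates the convention described above.

```python
from itertools import product


def enumerate_genotypes(wildtype, mutations):
    """Hypothetical helper: list every genotype allowed by a mutations dict."""
    site_states = []
    for site, wt_letter in enumerate(wildtype):
        states = mutations.get(site)
        # A value of None marks a site that never changes.
        site_states.append([wt_letter] if states is None else list(states))
    # Cartesian product over the per-site alphabets.
    return ["".join(combo) for combo in product(*site_states)]


wildtype = "AAAA"
mutations = {0: ["A", "T"], 1: ["A", "T"], 2: None, 3: ["A", "-"]}
genotypes = enumerate_genotypes(wildtype, mutations)
print(len(genotypes))  # 8 genotypes: 2 * 2 * 1 * 2
print(genotypes[:3])   # ['AAAA', 'AAA-', 'ATAA']
```

Site 2 is fixed at “A” in every genotype, and site 3 appears either as “A” or as the gap character.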
Port to NetworkX¶
In many cases, you might be interested in porting a GenotypePhenotypeMap to NetworkX. NetworkX provides powerful functions for analyzing and plotting complex graphs. We have written a separate package, named gpgraph, to easily port a GenotypePhenotypeMap to NetworkX.
Helpful functions¶
GPMap comes with many helpful functions for enumerating genotype-phenotype maps. This page provides a simple list of those functions.
- Get all genotypes from mutations
- Get a list of missing genotypes from a list of genotypes
- Get mutations from a list of genotypes
- Get binary representation of genotypes
- Get mutation encoding table
Get mutations from a list of genotypes¶
from gpmap.utils import genotypes_to_mutations
wildtype = "AAA"
genotypes = [
    "AAA",
    "AAB",
    "ABA",
    "BAA",
    "ABB",
    "BAB",
    "BBA",
    "BBB"
]
mutations = genotypes_to_mutations(genotypes)
Get mutation encoding table¶
from gpmap.utils import get_encoding_table
wildtype = "AA"
mutations = {
    0: ["A", "B"],
    1: ["A", "B"]
}
get_encoding_table(wildtype, mutations)
| | binary_index_start | binary_index_stop | binary_repr | genotype_index | mutation_index | mutation_letter | wildtype_letter |
---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 0 | NaN | A | A |
1 | 0 | 1 | 1 | 0 | 1 | B | A |
2 | 1 | 2 | 0 | 1 | NaN | A | A |
3 | 1 | 2 | 1 | 1 | 2 | B | A |
Get all genotypes from mutations¶
from gpmap.utils import mutations_to_genotypes
mutations = {0: ['A', 'B'], 1: ['A', 'B'], 2: ['A', 'B']}
mutations_to_genotypes(mutations)
# ['AAA', 'AAB', 'ABA', 'ABB', 'BAA', 'BAB', 'BBA', 'BBB']
Get binary representation of genotypes¶
from gpmap.utils import genotypes_to_binary, get_encoding_table
wildtype = 'AAA'
genotypes = [
    "AAA",
    "AAB",
    "ABA",
    "BAA",
    "ABB",
    "BAB",
    "BBA",
    "BBB"
]
mutations = {0: ['A', 'B'], 1: ['A', 'B'], 2: ['A', 'B']}
table = get_encoding_table(wildtype, mutations)
binary = genotypes_to_binary(genotypes, table)
# ['000', '001', '010', '100', '011', '101', '110', '111']
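The binary strings above can be reproduced with a small standalone sketch. It assumes one bit per non-wildtype letter at each site (so a two-letter site uses a single bit); the function name encode_binary is hypothetical, and gpmap's actual encoding is driven by the encoding table rather than by this code.

```python
def encode_binary(wildtype, mutations, genotypes):
    """Hypothetical sketch: one bit per non-wildtype letter at each site."""
    binary = []
    for genotype in genotypes:
        bits = []
        for site, letter in enumerate(genotype):
            alphabet = mutations[site]
            if alphabet is None:
                continue  # Non-changing sites contribute no bits.
            mutant_letters = [a for a in alphabet if a != wildtype[site]]
            # The wildtype letter encodes as all zeros for this site.
            bits.extend("1" if letter == m else "0" for m in mutant_letters)
        binary.append("".join(bits))
    return binary


wildtype = "AAA"
mutations = {0: ["A", "B"], 1: ["A", "B"], 2: ["A", "B"]}
genotypes = ["AAA", "AAB", "ABA", "BAA", "ABB", "BAB", "BBA", "BBB"]
binary = encode_binary(wildtype, mutations, genotypes)
print(binary)
# ['000', '001', '010', '100', '011', '101', '110', '111']
```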
Get a list of missing genotypes from a list of genotypes¶
from gpmap.utils import get_missing_genotypes
genotypes = ["AAA","BBB"]
get_missing_genotypes(genotypes)
# ['BBA', 'BAB', 'ABB', 'ABA', 'AAB', 'BAA']
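One way to think about missing genotypes: infer the letters seen at each site, enumerate every combination, and drop the genotypes already present. The sketch below does exactly that in plain Python. The name missing_genotypes is hypothetical and the ordering of the result may differ from what gpmap returns, so the output is sorted before printing.

```python
from itertools import product


def missing_genotypes(genotypes):
    """Hypothetical sketch: genotypes implied by the observed letters but absent."""
    length = len(genotypes[0])
    # Letters observed at each site across all genotypes.
    site_states = [sorted({g[i] for g in genotypes}) for i in range(length)]
    observed = set(genotypes)
    complete = ("".join(combo) for combo in product(*site_states))
    return [g for g in complete if g not in observed]


missing = missing_genotypes(["AAA", "BBB"])
print(sorted(missing))
# ['AAB', 'ABA', 'ABB', 'BAA', 'BAB', 'BBA']
```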
Simulating genotype-phenotype maps¶
The GPMap package comes with a suite of objects to simulate genotype-phenotype maps following models in the literature. They are found in the gpmap.simulate module.
All Simulation objects inherit the GenotypePhenotypeMap object as their base class. Thus, anything you can do with a GenotypePhenotypeMap, you can do with the simulation objects.
NK landscape¶
Construct a genotype-phenotype map using Kauffman’s NK Model. [1] The NK fitness landscape is created using a table with binary, length-K sub-sequences mapped to random values. All genotypes are binary with length N. The fitness of a genotype is constructed by summing the values of all sub-sequences that make up the genotype, using a sliding window across the full genotype.
For example, imagine an NK simulation with \(N=5\) and \(K=2\). To construct the fitness for the 01011 genotype, select the following sub-sequences from an NK table: “01”, “10”, “01”, “11”, “10”. Sum their values together.
# import the NKSimulation class
from gpmap.simulate import NKSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = NKSimulation.from_length(6, K=3)
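The sliding-window construction can be sketched without gpmap. The code below assumes the window wraps around the end of the genotype, which is inferred from the worked example listing five sub-sequences for \(N=5\); the function names and the uniform random table are illustrative assumptions, not the NKSimulation internals.

```python
import random
from itertools import product


def build_nk_table(K, rng):
    """Map every binary sub-sequence of length K to a random value in [0, 1)."""
    return {"".join(bits): rng.random() for bits in product("01", repeat=K)}


def nk_subsequences(genotype, K):
    """Length-K windows starting at every site, wrapping around the end."""
    n = len(genotype)
    return ["".join(genotype[(i + j) % n] for j in range(K)) for i in range(n)]


def nk_fitness(genotype, table, K):
    """Sum the table values of all sub-sequences in the genotype."""
    return sum(table[sub] for sub in nk_subsequences(genotype, K))


rng = random.Random(0)  # Seeded so the sketch is reproducible.
table = build_nk_table(2, rng)
print(nk_subsequences("01011", 2))  # ['01', '10', '01', '11', '10']
print(nk_fitness("01011", table, 2))
```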
House of Cards landscape¶
Construct a ‘House of Cards’ fitness landscape. This is a limit of the NK model where \(K=N\). It represents a fitness landscape with maximum roughness.
# import the HouseOfCardsSimulation class
from gpmap.simulate import HouseOfCardsSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = HouseOfCardsSimulation.from_length(6)
Mount Fuji landscape¶
Construct a genotype-phenotype map from a Mount Fuji model. [2]
A Mount Fuji model sets a “global” fitness peak (max) on a single genotype in the space. The fitness goes down as a function of Hamming distance away from this genotype, called a “fitness field”. The strength (or scale) of this field is linear and depends on the parameter field_strength.
Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype:
\[f(g) = \nu (g) + c \cdot d(g_0, g)\]
where \(\nu\) is the roughness parameter, \(c\) is the field strength, and \(d\) is the Hamming distance between genotype \(g\) and the reference genotype \(g_0\).
# import the MountFujiSimulation class
from gpmap.simulate import MountFujiSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = MountFujiSimulation.from_length(
    6,
    roughness_width=0.5,
    roughness_dist='normal'
)
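As a rough illustration of the model, the sketch below computes a field term proportional to the Hamming distance from a reference genotype and adds a normally distributed roughness term. The function mount_fuji and its arguments are hypothetical stand-ins, not the MountFujiSimulation code; a negative field strength is used here so that the reference genotype is the peak.

```python
import random
from itertools import product


def hamming(s1, s2):
    """Number of sites at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s1, s2))


def mount_fuji(reference, genotypes, field_strength, roughness_width, rng):
    """Hypothetical sketch: roughness plus a linear field in Hamming distance."""
    phenotypes = []
    for g in genotypes:
        # Roughness is drawn per genotype; no roughness if the width is None or 0.
        nu = rng.gauss(0.0, roughness_width) if roughness_width else 0.0
        phenotypes.append(nu + field_strength * hamming(reference, g))
    return phenotypes


genotypes = ["".join(p) for p in product("01", repeat=3)]
smooth = mount_fuji("000", genotypes, -1.0, None, random.Random(0))
rough = mount_fuji("000", genotypes, -1.0, 0.5, random.Random(0))
print(dict(zip(genotypes, smooth)))
```

Without roughness the landscape has a single peak at the reference; increasing the roughness width makes additional local peaks more likely.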
References¶
[1] Kauffman, Stuart A., and Edward D. Weinberger. “The NK model of rugged fitness landscapes and its application to maturation of the immune response.” Journal of Theoretical Biology 141.2 (1989): 211-245.
[2] Szendro, Ivan G., et al. “Quantitative analyses of empirical fitness landscapes.” Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.
Reading/Writing¶
The GenotypePhenotypeMap object is a Pandas DataFrame at its core. Most tabular formats (e.g. Excel files, csv, tsv, …) can be read/written.
Excel Spreadsheets¶
Excel files are supported through the read_excel method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.
Example: Excel spreadsheet file (“data.xlsx”)
| | genotypes | phenotypes | stdeviations | n_replicates |
---|---|---|---|---|
0 | PTEE | 0.243937 | 0.013269 | 1 |
1 | PTEY | 0.657831 | 0.055803 | 1 |
2 | PTFE | 0.104741 | 0.013471 | 1 |
3 | PTFY | 0.683304 | 0.081887 | 1 |
4 | PIEE | 0.774680 | 0.069631 | 1 |
5 | PIEY | 0.975995 | 0.059985 | 1 |
6 | PIFE | 0.500215 | 0.098893 | 1 |
7 | PIFY | 0.501697 | 0.025082 | 1 |
8 | RTEE | 0.233230 | 0.052265 | 1 |
9 | RTEY | 0.057961 | 0.036845 | 1 |
10 | RTFE | 0.365238 | 0.050948 | 1 |
11 | RTFY | 0.891505 | 0.033239 | 1 |
12 | RIEE | 0.156193 | 0.085638 | 1 |
13 | RIEY | 0.837269 | 0.070373 | 1 |
14 | RIFE | 0.599639 | 0.050125 | 1 |
15 | RIFY | 0.277137 | 0.072571 | 1 |
Read the spreadsheet directly into the GenotypePhenotypeMap.
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap.read_excel("data.xlsx", wildtype="PTEE")
CSV File¶
CSV files are supported through the read_csv method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.
Example: CSV File
| | genotypes | phenotypes | stdeviations | n_replicates |
---|---|---|---|---|
0 | PTEE | 0.243937 | 0.013269 | 1 |
1 | PTEY | 0.657831 | 0.055803 | 1 |
2 | PTFE | 0.104741 | 0.013471 | 1 |
3 | PTFY | 0.683304 | 0.081887 | 1 |
4 | PIEE | 0.774680 | 0.069631 | 1 |
5 | PIEY | 0.975995 | 0.059985 | 1 |
6 | PIFE | 0.500215 | 0.098893 | 1 |
7 | PIFY | 0.501697 | 0.025082 | 1 |
8 | RTEE | 0.233230 | 0.052265 | 1 |
9 | RTEY | 0.057961 | 0.036845 | 1 |
10 | RTFE | 0.365238 | 0.050948 | 1 |
11 | RTFY | 0.891505 | 0.033239 | 1 |
12 | RIEE | 0.156193 | 0.085638 | 1 |
13 | RIEY | 0.837269 | 0.070373 | 1 |
14 | RIFE | 0.599639 | 0.050125 | 1 |
15 | RIFY | 0.277137 | 0.072571 | 1 |
Read the csv directly into the GenotypePhenotypeMap.
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap.read_csv("data.csv", wildtype="PTEE")
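The column rules for tabular files (genotypes and phenotypes required, stdeviations and n_replicates optional, everything else ignored) can be checked before loading. The following is a standalone sketch using only the standard library; select_columns is a hypothetical helper and does not reflect how gpmap parses files internally.

```python
import csv
import io

REQUIRED = {"genotypes", "phenotypes"}
OPTIONAL = {"stdeviations", "n_replicates"}


def select_columns(csv_text):
    """Hypothetical pre-check: keep recognized columns, fail on missing ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = REQUIRED - set(header)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    keep = [c for c in header if c in REQUIRED | OPTIONAL]
    # Any other column (for example free-text notes) is dropped.
    return [{c: row[c] for c in keep} for row in reader]


csv_text = "genotypes,phenotypes,notes\nPTEE,0.243937,first\nPTEY,0.657831,second\n"
rows = select_columns(csv_text)
print(rows[0])  # {'genotypes': 'PTEE', 'phenotypes': '0.243937'}
```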
JSON Format¶
The only keys recognized by the json reader are:
- genotypes
- phenotypes
- stdeviations
- mutations
- n_replicates
All other keys are ignored. You can keep other metadata stored in the JSON, but it won’t be attached to the GenotypePhenotypeMap object.
{
    "genotypes" : [
        "000",
        "001",
        "010",
        "011",
        "100",
        "101",
        "110",
        "111"
    ],
    "phenotypes" : [
        0.62344582,
        0.87943151,
        -0.11075798,
        -0.59754471,
        1.4314798,
        1.12551439,
        1.04859722,
        -0.27145593
    ],
    "stdeviations" : [
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01
    ],
    "mutations" : {
        "0" : ["0", "1"],
        "1" : ["0", "1"],
        "2" : ["0", "1"]
    },
    "n_replicates" : 12,
    "title" : "my data",
    "description" : "a really hard experiment"
}
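A quick way to see which parts of a JSON document would be used is to split it into recognized keys and extra metadata. The sketch below relies only on the standard json module; split_metadata is a hypothetical helper, not part of gpmap. Because JSON object keys are always strings, it converts the site numbers in mutations back to integers.

```python
import json

RECOGNIZED_KEYS = {"genotypes", "phenotypes", "stdeviations", "mutations", "n_replicates"}


def split_metadata(json_str):
    """Hypothetical helper: separate recognized keys from extra metadata."""
    raw = json.loads(json_str)
    recognized = {k: v for k, v in raw.items() if k in RECOGNIZED_KEYS}
    extra = {k: v for k, v in raw.items() if k not in RECOGNIZED_KEYS}
    # JSON keys are strings, so restore integer site numbers.
    if "mutations" in recognized:
        recognized["mutations"] = {
            int(site): alphabet for site, alphabet in recognized["mutations"].items()
        }
    return recognized, extra


doc = '{"genotypes": ["0", "1"], "phenotypes": [0.1, 0.2], "mutations": {"0": ["0", "1"]}, "title": "my data"}'
recognized, extra = split_metadata(doc)
print(sorted(recognized))  # ['genotypes', 'mutations', 'phenotypes']
print(extra)               # {'title': 'my data'}
```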
API Documentation¶
The GenotypePhenotypeMap is the main entry point to the gpmap package. Load in your data using the read methods attached to this object. The following subpackages include various objects to analyze this object.
Subpackages¶
gpmap.errors module¶
class gpmap.errors.BaseErrorMap(Map)¶
Bases: object
Object to attach to seqspace objects for managing errors, standard deviations, and their log transforms.
If a lower bound is given, use it instead of -variances.
- lower¶ Get lower error bound.
- upper¶ Get upper error bound.
- wrapper(bound, **kwargs)¶ Wrapper function that changes variances to whatever bound is desired.
class gpmap.errors.StandardDeviationMap(Map)¶
Bases: gpmap.errors.BaseErrorMap
- wrapper(bounds, **kwargs)¶ Wrapper function to convert variances if necessary.
class gpmap.errors.StandardErrorMap(Map)¶
Bases: gpmap.errors.BaseErrorMap
- wrapper(bounds)¶ Wrapper function to convert variances if necessary.
gpmap.errors.lower_transform(mean, bound, logbase)¶
Log transformation scaling.
Examples
Untransformed data looks like this:
\[Y_{upper} = Y_{mean} + \text{bound}, \qquad Y_{lower} = Y_{mean} - \text{bound}\]
We want the bounds in log space, i.e.
\[\log(Y_{upper}) - \log(Y_{mean}) = \log(1 + \text{bound} / Y_{mean})\]
\[\log(Y_{lower}) - \log(Y_{mean}) = \log(1 - \text{bound} / Y_{mean})\]
gpmap.errors.upper_transform(mean, bound, logbase)¶
Log transformation scaling.
Examples
Untransformed data looks like this:
\[Y_{upper} = Y_{mean} + \text{bound}, \qquad Y_{lower} = Y_{mean} - \text{bound}\]
We want the bounds in log space, i.e.
\[\log(Y_{upper}) - \log(Y_{mean}) = \log(1 + \text{bound} / Y_{mean})\]
\[\log(Y_{lower}) - \log(Y_{mean}) = \log(1 - \text{bound} / Y_{mean})\]
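The log-transform of an error bound follows from subtracting the log of the mean from the log of the shifted value. The sketch below writes that out with the standard math module. The names upper_log_bound and lower_log_bound are hypothetical, and whether gpmap returns signed values or magnitudes is not stated here, so the signed form is an assumption.

```python
import math


def upper_log_bound(mean, bound, logbase=math.log10):
    """log(mean + bound) - log(mean), written as log(1 + bound / mean)."""
    return logbase(1 + bound / mean)


def lower_log_bound(mean, bound, logbase=math.log10):
    """log(mean - bound) - log(mean), written as log(1 - bound / mean).

    Only defined when bound < mean; the result is negative.
    """
    return logbase(1 - bound / mean)


mean, bound = 10.0, 5.0
upper = upper_log_bound(mean, bound)
lower = lower_log_bound(mean, bound)
print(upper, lower)
```

Note that the two bounds are asymmetric in log space even though they are symmetric around the mean in linear space.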
gpmap.stats module¶
gpmap.stats.c4_correction(n_samples)¶
Return the correction scalar for calculating standard deviation from a normal distribution.
gpmap.stats.corrected_std(var, n_samples=2)¶
Calculate the unbiased standard deviation from a biased standard deviation.
gpmap.stats.corrected_sterror(var, n_samples=2)¶
Calculate an unbiased standard error from a biased standard deviation.
gpmap.stats.coverage(gpm)¶
gpmap.stats.unbiased_std(x, axis=None)¶
A correction to numpy’s standard deviation calculation. Calculate the unbiased estimate of the standard deviation, which includes a correction factor for sample sizes < 100.
gpmap.stats.unbiased_sterror(x, axis=None)¶
Unbiased error.
gpmap.stats.unbiased_var(x, axis=None)¶
Enforces that the unbiased estimate of the variance is calculated.
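For readers unfamiliar with the small-sample correction, the textbook \(c_4\) factor for normally distributed samples is \(c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)\), and dividing the sample standard deviation by it removes the bias. The sketch below implements that standard formula; whether gpmap.stats uses exactly this expression is an assumption, and the function names here are illustrative.

```python
import math
import statistics


def c4(n):
    """Textbook c4 correction factor for a sample of size n >= 2."""
    # lgamma avoids overflow of the gamma function for large n.
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def unbiased_std_sketch(samples):
    """Sample standard deviation (n - 1 denominator) divided by c4(n)."""
    return statistics.stdev(samples) / c4(len(samples))


print(c4(2))    # about 0.798
print(c4(100))  # close to 1, so the correction matters mostly for small n
print(unbiased_std_sketch([1.0, 3.0]))
```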
gpmap.utils module¶
Utility functions for managing genotype-phenotype map data and conversions.
Glossary:¶
- mutations : dict – keys are site numbers in the genotypes. Values are the alphabet of mutations at each site.
- encoding : dict – keys are site numbers in the genotype. Values are dictionaries mapping each mutation to its binary representation.
gpmap.utils.farthest_genotype(reference, genotypes)¶
Find the genotype in the system that differs at the most sites.
gpmap.utils.find_differences(s1, s2)¶
Return the indices of differences between two sequences.
gpmap.utils.genotypes_to_binary(genotypes, encoding_table)¶
Using an encoding table (see the get_encoding_table function), build a set of binary genotypes.
Parameters: - genotypes – List of the genotypes to encode.
- encoding_table – DataFrame that encodes the binary representation of each mutation in the list of genotypes (see get_encoding_table).
gpmap.utils.genotypes_to_mutations(genotypes)¶
Create a mutations dictionary from a list of genotypes.
gpmap.utils.get_base(logbase)¶
Get base from logbase.
Parameters: - logbase (callable) – logarithm function.
Returns: base – base of the logarithm.
Return type: float
gpmap.utils.get_encoding_table(wildtype, mutations, site_labels=None)¶
This function constructs a lookup table (pandas.DataFrame) for mutations in a given mutations dictionary. This table encodes mutations with a binary representation.
gpmap.utils.get_missing_genotypes(genotypes, mutations=None)¶
Get a list of genotypes not found in the given genotypes list.
Parameters: - genotypes (list) – List of genotypes.
- mutations (dict (optional)) – Mutation dictionary.
Returns: missing_genotypes – List of genotypes not found in genotypes list.
Return type: list
gpmap.utils.hamming_distance(s1, s2)¶
Return the Hamming distance between equal-length sequences.
gpmap.utils.ipywidgets_missing(function)¶
Wrapper that checks that IPython widgets are installed before trying to render them.
gpmap.utils.length_to_mutations(length, alphabet=['0', '1'])¶
Build a mutations dictionary for a given alphabet.
Parameters: - length (int) – length of the genotypes.
- alphabet (list) – List of mutations at each site.
gpmap.utils.list_binary(length)¶
List all binary strings with given length.
gpmap.utils.mutations_to_encoding(wildtype, mutations)¶
Encoding map for genotype-to-binary.
Parameters: - wildtype (str) – Wildtype sequence.
- mutations (dict) – Mapping of each site’s mutation alphabet. {site-number: [alphabet]}
Returns: encode – Encoding dictionary that maps site number to mutation-binary map.
Return type: OrderedDict of OrderedDicts
Examples
{ <site-number> : {<mutation>: <binary>} }
gpmap.utils.mutations_to_genotypes(mutations, wildtype=None)¶
Use a mutations dictionary to construct an array of genotypes composed of those mutations.
Parameters: - mutations (dict) – A mapping dict with site numbers as keys and lists of mutations as values.
- wildtype (str) – wildtype genotype (as string).
Returns: genotypes – list of genotypes comprised of mutations in given dictionary.
Return type: list
gpmap.utils.sample_phenotypes(phenotypes, errors, n=1)¶
Generate n phenotypes from normal distributions.
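Several of the sequence-comparison utilities above are simple enough to illustrate in a few lines. The functions below are standalone re-implementations with a _sketch suffix to make clear they are not the library source; behavior on ties or unequal lengths in gpmap itself may differ.

```python
def hamming_distance_sketch(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


def find_differences_sketch(s1, s2):
    """Indices of the positions at which two sequences differ."""
    return [i for i, (a, b) in enumerate(zip(s1, s2)) if a != b]


def farthest_genotype_sketch(reference, genotypes):
    """Genotype that differs from the reference at the most sites."""
    # On ties, max() returns the first genotype with the largest distance.
    return max(genotypes, key=lambda g: hamming_distance_sketch(reference, g))


genotypes = ["AAA", "AAB", "ABB", "BBB"]
print(hamming_distance_sketch("AAA", "ABB"))       # 2
print(find_differences_sketch("AAA", "ABB"))       # [1, 2]
print(farthest_genotype_sketch("AAA", genotypes))  # BBB
```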
gpmap.simulate¶
gpmap.simulate.base module¶
class gpmap.simulate.base.BaseSimulation(wildtype, mutations, *args, **kwargs)¶
Bases: gpmap.gpm.GenotypePhenotypeMap
Build a simulated GenotypePhenotypeMap. Generates random phenotypes.
- build()¶
- classmethod from_length(length, alphabet_size=2, *args, **kwargs)¶ Create a simulated genotype-phenotype map from a given genotype length.
  Parameters: - length (int) – length of genotypes.
  - alphabet_size (int (optional)) – alphabet size.
  Returns: self
  Return type: GenotypePhenotypeSimulation
- set_stdeviations(sigma)¶ Add standard deviations to the simulated phenotypes, which can then be used for sampling error in the genotype-phenotype map.
  Parameters: sigma (float or array-like) – Adds standard deviations to the phenotypes. If float, all phenotypes are given the same stdeviations. Else, the array must be the same length as phenotypes and will be assigned to each phenotype.
gpmap.simulate.base.random_mutation_set(length, alphabet_size=2, type='AA')¶
Generate a random mutations dictionary for simulations.
Parameters: - length (int) – length of genotypes.
- alphabet_size (int or list) – alphabet size at each site. If a list is given, site i will have size alphabet_size[i].
- type ('AA' or 'DNA') – Use amino acid alphabet or DNA alphabet.
gpmap.simulate.fuji module¶
class gpmap.simulate.fuji.MountFujiSimulation(wildtype, mutations, field_strength=1, roughness_width=None, roughness_dist='normal', *args, **kwargs)¶
Bases: gpmap.simulate.base.BaseSimulation
Constructs a genotype-phenotype map from a Mount Fuji model. [1]
A Mount Fuji model sets a “global” fitness peak (max) on a single genotype in the space. The fitness goes down as a function of Hamming distance away from this genotype, called a “fitness field”. The strength (or scale) of this field is linear and depends on the parameter field_strength. Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype:
\[f(g) = \nu (g) + c \cdot d(g_0, g)\]
where \(\nu\) is the roughness parameter, \(c\) is the field strength, and \(d\) is the Hamming distance between genotype \(g\) and the reference genotype \(g_0\).
Parameters: - wildtype (str) – reference genotype.
- mutations (dict) – mutations alphabet for each site.
- field_strength (float) – field strength.
- roughness_width (float) – Width of roughness distribution.
- roughness_dist (str, 'normal') – Distribution used to create noise around phenotypes.
References
[1] Szendro, Ivan G., et al. “Quantitative analyses of empirical fitness landscapes.” Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.
- build()¶ Construct phenotypes using a rough Mount Fuji model.
- field_strength¶
- classmethod from_length(length, field_strength=1, roughness_width=None, roughness_dist='normal', *args, **kwargs)¶ Constructs a genotype-phenotype map from a Mount Fuji model. [1]
  A Mount Fuji model sets a “global” fitness peak (max) on a single genotype in the space. The fitness goes down as a function of Hamming distance away from this genotype, called a “fitness field”. The strength (or scale) of this field is linear and depends on the parameter field_strength. Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype:
  \[f(g) = \nu (g) + c \cdot d(g_0, g)\]
  where \(\nu\) is the roughness parameter, \(c\) is the field strength, and \(d\) is the Hamming distance between genotype \(g\) and the reference genotype \(g_0\).
  Parameters: - length (int) – length of the genotypes.
  - field_strength (float) – field strength.
  - roughness_width (float) – Width of roughness distribution.
  - roughness_dist (str, 'normal') – Distribution used to create noise around phenotypes.
- hamming¶ Hamming distance from reference.
- roughess_dist¶ Roughness distribution.
- roughness¶ Array of roughness values for all genotypes.
- roughness_dist¶ Roughness distribution.
- roughness_width¶
- scale¶ Mt. Fuji phenotypes without noise.
gpmap.simulate.hoc module¶
class gpmap.simulate.hoc.HouseOfCardsSimulation(wildtype, mutations, k_range=(0, 1), *args, **kwargs)¶
Bases: gpmap.simulate.nk.NKSimulation
Construct a ‘House of Cards’ fitness landscape.
gpmap.simulate.nk module¶
class gpmap.simulate.nk.NKSimulation(wildtype, mutations, K, k_range=(0, 1), *args, **kwargs)¶
Bases: gpmap.simulate.base.BaseSimulation
Generate genotype-phenotype map from NK fitness model. Creates a table with binary sub-sequences that determine the order of epistasis in the model.
The NK fitness landscape is created using a table with binary, length-K sub-sequences mapped to random values. All genotypes are binary with length N. The fitness of a genotype is constructed by summing the values of all sub-sequences that make up the genotype, using a sliding window across the full genotype.
For example, imagine an NK simulation with N=5 and K=2. To construct the fitness for the 01011 genotype, select the following sub-sequences from an NK table: “01”, “10”, “01”, “11”, “10”. Sum their values together.
- nk_table¶ Table with binary sub-sequences as keys, which are used to construct phenotypes following an NK routine.
  Type: dict
- keys¶ Array of keys in NK table.
  Type: array
- values¶ Array of values in the NK table.
  Type: array
- build()¶ Build phenotypes from NK table.
- keys NK table keys.
- nk_table NK table mapping binary sequence to value.
- set_order(K)¶ Set the order (K) of the NK model.
- set_random_values(k_range=(0, 1))¶ Set the values of the NK table by drawing from a uniform distribution between the given k_range.
- set_table_values(values)¶ Set the values of the NK table from a list/array of values.
- values NK table values.
Module contents¶
GenotypePhenotypeMap¶
class gpmap.gpm.GenotypePhenotypeMap(wildtype, genotypes, phenotypes=None, stdeviations=None, mutations=None, site_labels=None, n_replicates=1, **kwargs)¶
Bases: object
Object for containing genotype-phenotype map data.
Parameters: - wildtype (string) – wildtype sequence.
- genotypes (array-like) – list of all genotypes.
- phenotypes (array-like) – List of phenotypes in the same order as genotypes. If None, all genotypes are assigned a phenotype = np.nan.
- mutations (dict) – Dictionary that maps each site index to its possible substitution alphabet.
- site_labels (array-like) – list of labels to apply to sites. If this is not specified, the first site is assigned a label 0, the next 1, etc. If specified, sites are assigned labels in the order given. For example, if the genotypes specify mutations at positions 12 and 75, this would be a list [12,75].
- n_replicates (int) – number of replicate measurements comprising the mean phenotypes.
- include_binary (bool (default=True)) – Construct a binary representation of the space.
- data¶ The core data object. Columns are ‘genotypes’, ‘phenotypes’, ‘n_replicates’, ‘stdeviations’, and (optionally) ‘binary’.
  Type: pandas.DataFrame
- complete_data¶ A dataframe mapping the complete set of genotypes possible, given the mutations dictionary. Contains all columns in data. Any missing data is reported as NaN.
  Type: pandas.DataFrame (optional, created by BinaryMap)
- missing_data¶ A dataframe containing the set of missing genotypes; complete_data - data. Two columns: ‘genotypes’ and ‘binary’.
  Type: pandas.DataFrame (optional, created by BinaryMap)
- binary¶ Object that gives you (the user) access to the binary representation of the map.
  Type: BinaryMap
- encoding_table¶ Pandas DataFrame showing how mutations map to binary representation.
- add_binary()¶ Build a binary representation of the set of genotypes. Add as a column to the main DataFrame.
- add_n_mutations()¶ Build a column with the number of mutations in each genotype. Add as a column to the main DataFrame.
- binary Binary representation of genotypes.
- classmethod from_dict(metadata)¶
- classmethod from_json(json_str)¶ Load a genotype-phenotype map directly from a json. The JSON metadata must include the following attributes
  Note: Keyword arguments override input that is loaded from the JSON file.
- genotypes¶ Get the genotypes of the system.
- get_all_possible_genotypes()¶ Get the complete set of genotypes possible. There is no particular order to the genotypes. Consider sorting.
- get_missing_genotypes()¶ Get all genotypes missing from the complete genotype-phenotype map.
- index¶ Return numpy array of genotype positions.
- length¶ Get length of the genotypes.
- map(attr1, attr2)¶ Dictionary that maps attr1 to attr2.
- mutant¶ Get the farthest mutant in the genotype-phenotype map.
- mutations¶ Get the mutations dictionary of the genotype-phenotype map.
- n¶ Get number of genotypes, i.e. size of the genotype-phenotype map.
- n_replicates¶ Return the number of replicate measurements made of the phenotype.
- phenotypes¶ Get the phenotypes of the system.
- classmethod read_csv(fname, wildtype, **kwargs)¶
- classmethod read_dataframe(dataframe, wildtype, **kwargs)¶ Construct a GenotypePhenotypeMap from a dataframe.
- classmethod read_excel(fname, wildtype, **kwargs)¶
- classmethod read_json(filename, **kwargs)¶ Load a genotype-phenotype map directly from a json file. The JSON metadata must include the following attributes
  Note: Keyword arguments override input that is loaded from the JSON file.
- classmethod read_pickle(filename, **kwargs)¶ Read GenotypePhenotypeMap from pickle.
- stdeviations¶ Get stdeviations.
- to_csv(filename=None, **kwargs)¶ Write genotype-phenotype map to csv spreadsheet. Keyword arguments are passed directly to the Pandas dataframe to_csv method.
  Parameters: filename (str) – Name of file to write out.
- to_dict(complete=False)¶ Write genotype-phenotype map to dict.
- to_excel(filename=None, **kwargs)¶ Write genotype-phenotype map to excel spreadsheet. Keyword arguments are passed directly to the Pandas dataframe to_excel method.
  Parameters: filename (str) – Name of file to write out.
- to_json(filename=None, complete=False)¶ Write genotype-phenotype map to json file. If no filename is given returns
- to_pickle(filename, **kwargs)¶ Write GenotypePhenotypeMap object to a pickle file.
- wildtype¶ Get reference genotypes for interactions.