gpmap

The Pandas DataFrame for genotype-phenotype (GP) map data.


The GenotypePhenotypeMap is a core object for a suite of packages written in the Harms Lab. It organizes and standardizes genotype-phenotype map data.

Basic Example

# Import the GenotypePhenotypeMap
from gpmap import GenotypePhenotypeMap

# The data
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.5, 0.2, 0.8]
stdeviations = [0.05, 0.05, 0.05, 0.05]

# Initialize a GenotypePhenotypeMap object
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes,
                           stdeviations=stdeviations)

# Show the DataFrame
gpm.data

Documentation

Quick start

GPMap is a small Python package that builds on the pandas DataFrame to handle genotype-phenotype map data. The package includes utilities to read and write data to and from disk, enumerate large sequence/genotype spaces efficiently, and compute various statistics from an arbitrary genotype-phenotype map.

GenotypePhenotypeMap object

The main object in gpmap is the GenotypePhenotypeMap object. The object stores data as a Pandas DataFrame, which can be accessed through the .data attribute. Your object will look something like this:

from gpmap import GenotypePhenotypeMap

# Data
wildtype = "AAA"
genotypes = ["AAA", "AAT", "ATA", "TAA", "ATT", "TAT", "TTA", "TTT"]
phenotypes = [0.1, 0.2, 0.2, 0.6, 0.4, 0.6, 1.0, 1.1]
stdeviations = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
mutations = {
  0: ["A", "T"],
  1: ["A", "T"],
  2: ["A", "T"]
}

# Initialize the object
gpm = GenotypePhenotypeMap(
    wildtype,
    genotypes,
    phenotypes,
    mutations=mutations,
    stdeviations=stdeviations
)

# Check out the data.
gpm.data

The underlying DataFrame will have at least 5 columns: genotypes, phenotypes, stdeviations, n_replicates, and binary. The binary column is computed by the GenotypePhenotypeMap object.

mutations dictionary

The mutations dictionary tells GPMap what mutations, indels, etc. should be incorporated in the map. It is the most important data you pass to GPMap.

It is a regular Python dictionary and looks something like:

wildtype = "AA"
mutations = {
    0: ["A", "B"],
    1: ["A", "B"]
}

Each key is the position of a site, and each value lists the states possible at that site. In this example, the sequences have two sites and each site is either “A” or “B”.

Non-changing sites

If a site doesn’t mutate, give it a value of None.

wildtype = "AAA"
mutations = {
    0: ["A", "T"],
    1: ["A", "T"],
    2: None           # This site does not change
}

Here, site 2 does not change. All sequences will only have an “A” at that site.

Indels

You can incorporate indels using the gap character:

wildtype = "AAAA"
mutations = {
    0: ["A", "T"],
    1: ["A", "T"],
    2: None,
    3: ["A", "-"]      # Sometimes, this site doesn't exist.
}

Here, site 3 will toggle between an “A” and a missing residue “-” (deletion).
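
To make the dictionary's meaning concrete, here is a minimal standalone sketch in plain Python of how such a dictionary could be expanded into sequences. It is not gpmap's implementation: the helper name `expand_mutations` is hypothetical, and the sketch assumes that a `None` site keeps its wildtype letter and that "-" is treated as just another state.

```python
from itertools import product

def expand_mutations(wildtype, mutations):
    """Enumerate every sequence allowed by a mutations dictionary.

    Hypothetical helper for illustration only; gpmap ships its own utilities.
    """
    per_site = []
    for site, letter in enumerate(wildtype):
        states = mutations.get(site)
        # A value of None means the site never changes: keep the wildtype letter.
        per_site.append([letter] if states is None else list(states))
    return ["".join(combo) for combo in product(*per_site)]

wildtype = "AAAA"
mutations = {0: ["A", "T"], 1: ["A", "T"], 2: None, 3: ["A", "-"]}
sequences = expand_mutations(wildtype, mutations)
print(len(sequences))   # 2 * 2 * 1 * 2 = 8 sequences
print(sequences[:3])
```

Every sequence keeps an "A" at site 2, and half of them carry the gap character at site 3.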

Port to NetworkX

In many cases, you might be interested in porting a GenotypePhenotypeMap to NetworkX. NetworkX provides powerful functions for analyzing and plotting complex graphs. We have written a separate package, named gpgraph, to easily port GenotypePhenotypeMap to NetworkX.


Helpful functions

GPMap comes with many helpful functions for enumerating genotype-phenotype maps. This page provides a simple list of those functions.

Get mutations from a list of genotypes

from gpmap.utils import genotypes_to_mutations

wildtype = "AAA"
genotypes = [
    "AAA",
    "AAB",
    "ABA",
    "BAA",
    "ABB",
    "BAB",
    "BBA",
    "BBB"
]

mutations = genotypes_to_mutations(genotypes)

Get mutation encoding table

from gpmap.utils import get_encoding_table

wildtype = "AA"
mutations = {
    0: ["A", "B"],
    1: ["A", "B"]
}
get_encoding_table(wildtype, mutations)
   binary_index_start  binary_index_stop  binary_repr  genotype_index  mutation_index  mutation_letter  wildtype_letter
0                   0                  1            0               0             NaN                A                A
1                   0                  1            1               0               1                B                A
2                   1                  2            0               1             NaN                A                A
3                   1                  2            1               1               2                B                A

Get all genotypes from mutations

from gpmap.utils import mutations_to_genotypes

mutations = {0: ['A', 'B'], 1: ['A', 'B'], 2: ['A', 'B']}

mutations_to_genotypes(mutations)
# ['AAA', 'AAB', 'ABA', 'ABB', 'BAA', 'BAB', 'BBA', 'BBB']

Get binary representation of genotypes

from gpmap.utils import genotypes_to_binary, get_encoding_table

wildtype = 'AAA'

genotypes = [
    "AAA",
    "AAB",
    "ABA",
    "BAA",
    "ABB",
    "BAB",
    "BBA",
    "BBB"
]

mutations = {0: ['A', 'B'], 1: ['A', 'B'], 2: ['A', 'B']}
table = get_encoding_table(wildtype, mutations)
binary = genotypes_to_binary(genotypes, table)
# ['000', '001', '010', '100', '011', '101', '110', '111']
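
The encoding itself can be sketched without gpmap. The helper below, `encode_binary`, is hypothetical and rests on an assumption: each site contributes one bit per non-wildtype state, and the wildtype letter is written as all zeros. For two-state sites this reproduces the output shown above; whether gpmap handles larger alphabets the same way is not confirmed here. Sites set to `None` are not handled.

```python
def encode_binary(genotype, wildtype, mutations):
    """Encode one genotype as a bit string (hypothetical helper)."""
    bits = []
    for site, letter in enumerate(genotype):
        # Non-wildtype states at this site, in the order given.
        alternatives = [s for s in mutations[site] if s != wildtype[site]]
        # One bit per alternative state; the wildtype letter is all zeros.
        bits.extend("1" if letter == alt else "0" for alt in alternatives)
    return "".join(bits)

wildtype = "AAA"
mutations = {0: ["A", "B"], 1: ["A", "B"], 2: ["A", "B"]}
genotypes = ["AAA", "AAB", "ABA", "BAA", "ABB", "BAB", "BBA", "BBB"]
binary = [encode_binary(g, wildtype, mutations) for g in genotypes]
print(binary)
# ['000', '001', '010', '100', '011', '101', '110', '111']
```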

Get a list of missing genotypes from a list of genotypes

from gpmap.utils import get_missing_genotypes

genotypes = ["AAA","BBB"]

get_missing_genotypes(genotypes)
# ['BBA', 'BAB', 'ABB', 'ABA', 'AAB', 'BAA']
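
One way to think about this call: infer the alphabet at each site from the genotypes that were observed, enumerate every combination, and keep the ones that were not observed. The sketch below follows that reasoning with a hypothetical helper, `missing_genotypes`; it is not gpmap's code, and the order of its output differs from the list shown above.

```python
from itertools import product

def missing_genotypes(genotypes):
    """Return genotypes implied by the observed letters but not observed."""
    length = len(genotypes[0])
    # Letters seen at each site, in order of first appearance.
    alphabets = []
    for site in range(length):
        seen = []
        for g in genotypes:
            if g[site] not in seen:
                seen.append(g[site])
        alphabets.append(seen)
    observed = set(genotypes)
    complete = ("".join(combo) for combo in product(*alphabets))
    return [g for g in complete if g not in observed]

print(sorted(missing_genotypes(["AAA", "BBB"])))
# ['AAB', 'ABA', 'ABB', 'BAA', 'BAB', 'BBA']
```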

Simulating genotype-phenotype maps

The GPMap package comes with a suite of objects to simulate genotype-phenotype maps following models in the literature. They are found in the gpmap.simulate module.

All Simulation objects inherit the GenotypePhenotypeMap object as their base class. Thus, anything you can do with a GenotypePhenotypeMap, you can do with the simulation objects.

NK landscape

Construct a genotype-phenotype map using Kauffman’s NK Model. [1] The NK fitness landscape is created using a table with binary, length-K, sub-sequences mapped to random values. All genotypes are binary with length N. The fitness of a genotype is constructed by summing the values of all sub-sequences that make up the genotype using a sliding window across the full genotype.

For example, imagine an NK simulation with \(N=5\) and \(K=2\). To construct the fitness for the 01011 genotype, select the following sub-sequences from an NK table: “01”, “10”, “01”, “11”, “10”. Sum their values together.

# import the NKSimulation class
from gpmap.simulate import NKSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = NKSimulation.from_length(6, K=3)
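
The sliding-window sum described above can be sketched in a few lines of plain Python. This is an illustration, not the NKSimulation implementation: the function names are hypothetical, the wrap-around at the end of the genotype is inferred from the final “10” window in the 01011 example, and the table values are drawn uniformly between 0 and 1.

```python
import random
from itertools import product

def windows(genotype, K):
    """List the length-K sub-sequences used for a genotype, wrapping at the end."""
    doubled = genotype + genotype  # lets windows run past the last site
    return [doubled[i:i + K] for i in range(len(genotype))]

def nk_fitness(genotype, K, table):
    """Sum the table values of every length-K window."""
    return sum(table[w] for w in windows(genotype, K))

K = 2
rng = random.Random(0)
# NK table: every binary length-K sub-sequence mapped to a random value.
table = {"".join(bits): rng.random() for bits in product("01", repeat=K)}

print(windows("01011", K))          # ['01', '10', '01', '11', '10']
print(nk_fitness("01011", K, table))
```

The sketch requires K to be no larger than the genotype length.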

House of Cards landscape

Construct a ‘House of Cards’ fitness landscape. This is a limit of the NK model where \(K=N\). It represents a fitness landscape with maximum roughness.

# import the HouseOfCardsSimulation class
from gpmap.simulate import HouseOfCardsSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = HouseOfCardsSimulation.from_length(6)

Mount Fuji landscape

Construct a genotype-phenotype map from a Mount Fuji model. [2]

A Mount Fuji model sets a “global” fitness peak (max) on a single genotype in the space. The fitness goes down as a function of Hamming distance away from this genotype, called a “fitness field”. The strength (or scale) of this field is linear and depends on the parameter field_strength.

Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype.

\[f(g) = \nu (g) + c \cdot d(g_0, g)\]

where \(\nu\) is the roughness parameter, \(c\) is the field strength, and \(d\) is the hamming distance between genotype \(g\) and the reference genotype.

# import the MountFujiSimulation class
from gpmap.simulate import MountFujiSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = MountFujiSimulation.from_length(
    6,
    roughness_width=0.5,
    roughness_dist='normal'
)
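
The formula for \(f(g)\) can also be evaluated by hand. The sketch below is a standalone illustration with hypothetical function names, not the MountFujiSimulation code. It assumes the roughness \(\nu(g)\) is drawn from a normal distribution centered on zero, and it passes a negative field strength so that, with the formula as written, fitness falls with distance from the reference genotype.

```python
import random

def hamming(s1, s2):
    """Number of sites at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s1, s2))

def mount_fuji(genotypes, reference, field_strength, roughness_width=None, seed=0):
    """Evaluate f(g) = nu(g) + c * d(g0, g) for each genotype."""
    rng = random.Random(seed)
    phenotypes = []
    for g in genotypes:
        # nu(g): random roughness, here drawn from a normal distribution.
        nu = rng.gauss(0.0, roughness_width) if roughness_width else 0.0
        phenotypes.append(nu + field_strength * hamming(reference, g))
    return phenotypes

genotypes = ["00", "01", "10", "11"]
# A negative field strength makes fitness fall with distance from the peak.
smooth = mount_fuji(genotypes, "00", field_strength=-1.0)
rough = mount_fuji(genotypes, "00", field_strength=-1.0, roughness_width=0.5)
print(smooth)   # [0.0, -1.0, -1.0, -2.0]
print(rough)
```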

References

[1] Kauffman, Stuart A., and Edward D. Weinberger. “The NK model of rugged fitness landscapes and its application to maturation of the immune response.” Journal of theoretical biology 141.2 (1989): 211-245.
[2] Szendro, Ivan G., et al. “Quantitative analyses of empirical fitness landscapes.” Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.

Reading/Writing

The GenotypePhenotypeMap object is a Pandas DataFrame at its core. Most tabular formats (e.g. Excel files, csv, tsv, …) can be read/written.

Excel Spreadsheets

Excel files are supported through the read_excel method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.

Example: Excel spreadsheet file (“data.xlsx”)

genotypes phenotypes stdeviations n_replicates
0 PTEE 0.243937 0.013269 1
1 PTEY 0.657831 0.055803 1
2 PTFE 0.104741 0.013471 1
3 PTFY 0.683304 0.081887 1
4 PIEE 0.774680 0.069631 1
5 PIEY 0.975995 0.059985 1
6 PIFE 0.500215 0.098893 1
7 PIFY 0.501697 0.025082 1
8 RTEE 0.233230 0.052265 1
9 RTEY 0.057961 0.036845 1
10 RTFE 0.365238 0.050948 1
11 RTFY 0.891505 0.033239 1
12 RIEE 0.156193 0.085638 1
13 RIEY 0.837269 0.070373 1
14 RIFE 0.599639 0.050125 1
15 RIFY 0.277137 0.072571 1

Read the spreadsheet directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_excel("data.xlsx", wildtype="PTEE")

CSV File

CSV files are supported through the read_csv method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.

Example: CSV File

genotypes phenotypes stdeviations n_replicates
0 PTEE 0.243937 0.013269 1
1 PTEY 0.657831 0.055803 1
2 PTFE 0.104741 0.013471 1
3 PTFY 0.683304 0.081887 1
4 PIEE 0.774680 0.069631 1
5 PIEY 0.975995 0.059985 1
6 PIFE 0.500215 0.098893 1
7 PIFY 0.501697 0.025082 1
8 RTEE 0.233230 0.052265 1
9 RTEY 0.057961 0.036845 1
10 RTFE 0.365238 0.050948 1
11 RTFY 0.891505 0.033239 1
12 RIEE 0.156193 0.085638 1
13 RIEY 0.837269 0.070373 1
14 RIFE 0.599639 0.050125 1
15 RIFY 0.277137 0.072571 1

Read the csv directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_csv("data.csv", wildtype="PTEE")
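
The column rule (two required columns, two optional ones, everything else ignored) can be illustrated with the standard library alone. The sketch below is not how gpmap reads files; `load_columns` and the `notes` column are made up for the example, and values are left as strings.

```python
import csv
import io

REQUIRED = ("genotypes", "phenotypes")
OPTIONAL = ("stdeviations", "n_replicates")

def load_columns(handle):
    """Read a CSV and keep only the required and optional columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in reader.fieldnames]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    keep = [c for c in REQUIRED + OPTIONAL if c in reader.fieldnames]
    return [{c: row[c] for c in keep} for row in reader]

text = (
    "genotypes,phenotypes,stdeviations,notes\n"
    "PTEE,0.243937,0.013269,a\n"
    "PTEY,0.657831,0.055803,b\n"
)
rows = load_columns(io.StringIO(text))
print(rows[0])
# {'genotypes': 'PTEE', 'phenotypes': '0.243937', 'stdeviations': '0.013269'}
```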

JSON Format

The only keys recognized by the json reader are:

  1. genotypes
  2. phenotypes
  3. stdeviations
  4. mutations
  5. n_replicates

All other keys are ignored by the reader. You can keep other metadata stored in the JSON, but it won’t be attached to the object that is loaded.

{
    "genotypes" : [
        "000",
        "001",
        "010",
        "011",
        "100",
        "101",
        "110",
        "111"
    ],
    "phenotypes" : [
        0.62344582,
        0.87943151,
        -0.11075798,
        -0.59754471,
        1.4314798,
        1.12551439,
        1.04859722,
        -0.27145593
    ],
    "stdeviations" : [
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01
    ],
    "mutations" : {
        "0" : ["0", "1"],
        "1" : ["0", "1"],
        "2" : ["0", "1"]
    },
    "n_replicates" : 12,
    "title" : "my data",
    "description" : "a really hard experiment"
}
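
As a rough illustration of the key handling described above, the following sketch separates recognized keys from extra metadata using only the standard json module. The helper `split_metadata` is hypothetical and is not part of gpmap. It also converts the site numbers in the mutations dictionary back to integers, since JSON object keys are always strings; whether gpmap performs that conversion itself is an assumption.

```python
import json

RECOGNIZED = {"genotypes", "phenotypes", "stdeviations", "mutations", "n_replicates"}

def split_metadata(json_str):
    """Separate the recognized keys from everything else in a JSON string."""
    metadata = json.loads(json_str)
    recognized = {k: v for k, v in metadata.items() if k in RECOGNIZED}
    extra = {k: v for k, v in metadata.items() if k not in RECOGNIZED}
    # JSON object keys are always strings, so site numbers need converting.
    if "mutations" in recognized:
        recognized["mutations"] = {
            int(k): v for k, v in recognized["mutations"].items()
        }
    return recognized, extra

json_str = (
    '{"genotypes": ["0", "1"], "phenotypes": [0.6, 0.9], '
    '"mutations": {"0": ["0", "1"]}, "title": "my data"}'
)
recognized, extra = split_metadata(json_str)
print(sorted(recognized))   # ['genotypes', 'mutations', 'phenotypes']
print(extra)                # {'title': 'my data'}
```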

API Documentation

The GenotypePhenotypeMap is the main entry point to the gpmap package. Load in your data using the read methods attached to this object. The following subpackages include various objects to analyze this object.

Subpackages

gpmap.errors module
class gpmap.errors.BaseErrorMap(Map)

Bases: object

Object to attach to seqspace objects for managing errors, standard deviations, and their log transforms.

If a lower bound is given, use it instead of -variances.

lower

Get lower error bound.

upper

Get upper error bound.

wrapper(bound, **kwargs)

Wrapper function that changes variances to whatever bound desired.

class gpmap.errors.StandardDeviationMap(Map)

Bases: gpmap.errors.BaseErrorMap

wrapper(bounds, **kwargs)

Wrapper function to convert Variances if necessary

class gpmap.errors.StandardErrorMap(Map)

Bases: gpmap.errors.BaseErrorMap

wrapper(bounds)

Wrapper function to convert Variances if necessary

gpmap.errors.lower_transform(mean, bound, logbase)

Log transformation scaling.

Examples

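Untransformed, the error bounds are:

\[Y_{\mathrm{upper}} = Y_{\mathrm{mean}} + \mathrm{bound}, \qquad Y_{\mathrm{lower}} = Y_{\mathrm{mean}} - \mathrm{bound}\]

On a log scale, the bounds are measured relative to the log of the mean:

\[\log(Y_{\mathrm{upper}}) - \log(Y_{\mathrm{mean}}) = \log(1 + \mathrm{bound} / Y_{\mathrm{mean}})\]

\[\log(Y_{\mathrm{lower}}) - \log(Y_{\mathrm{mean}}) = \log(1 - \mathrm{bound} / Y_{\mathrm{mean}})\]
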
gpmap.errors.upper_transform(mean, bound, logbase)

Log transformation scaling.

Examples

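Untransformed, the error bounds are:

\[Y_{\mathrm{upper}} = Y_{\mathrm{mean}} + \mathrm{bound}, \qquad Y_{\mathrm{lower}} = Y_{\mathrm{mean}} - \mathrm{bound}\]

On a log scale, the bounds are measured relative to the log of the mean:

\[\log(Y_{\mathrm{upper}}) - \log(Y_{\mathrm{mean}}) = \log(1 + \mathrm{bound} / Y_{\mathrm{mean}})\]

\[\log(Y_{\mathrm{lower}}) - \log(Y_{\mathrm{mean}}) = \log(1 - \mathrm{bound} / Y_{\mathrm{mean}})\]
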
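
A small numeric check of these log-scale bounds, written as a standalone sketch. The function `log_bounds` is hypothetical and only mirrors the arithmetic in the docstrings above; taking the absolute value of the lower bound is a choice made here so that both results read as error-bar lengths, and the real functions may return signed values. The bound must be smaller than the mean.

```python
import math

def log_bounds(mean, bound, logbase=math.log10):
    """Return (upper, lower) error-bar lengths on a log scale."""
    upper = logbase(1 + bound / mean)          # log(Yupper) - log(Ymean)
    lower = abs(logbase(1 - bound / mean))     # |log(Ylower) - log(Ymean)|
    return upper, lower

upper, lower = log_bounds(mean=10.0, bound=1.0)
print(upper, lower)
```

A symmetric error bar on the linear scale becomes asymmetric after the transform: the lower bar is longer than the upper one.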
gpmap.stats module
gpmap.stats.c4_correction(n_samples)

Return the correction scalar for calculating standard deviation from a normal distribution.

gpmap.stats.corrected_std(var, n_samples=2)

Calculate the unbiased standard deviation from a biased standard deviation.

gpmap.stats.corrected_sterror(var, n_samples=2)

Calculate an unbiased standard error from a BIASED standard deviation.

gpmap.stats.coverage(gpm)
gpmap.stats.unbiased_std(x, axis=None)

A correction to numpy’s standard deviation calculation. Calculate the unbiased estimation of standard deviation, which includes a correction factor for sample sizes < 100.

gpmap.stats.unbiased_sterror(x, axis=None)

Unbiased error.

gpmap.stats.unbiased_var(x, axis=None)

This enforces that the unbiased estimate for variance is calculated.
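
For readers unfamiliar with the correction, the standard \(c_4\) factor for normally distributed samples is \(c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)\), and dividing the sample standard deviation by it removes the bias. The sketch below implements that textbook formula with hypothetical function names; it is assumed, not confirmed, that gpmap.stats computes the same quantity.

```python
import math

def c4(n_samples):
    """Bias-correction factor for the sample standard deviation."""
    n = n_samples
    # Work with log-gamma to avoid overflow for large n.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def unbiased_std_sketch(values):
    """Sample standard deviation (n - 1 denominator) divided by c4(n)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return s / c4(n)

print(round(c4(2), 4))     # 0.7979
print(round(c4(100), 4))
```

The factor approaches 1 as the sample size grows, which is why the correction matters mostly for small numbers of replicates.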

gpmap.utils module

Utility functions for managing genotype-phenotype map data and conversions.

Glossary:
mutations : dict
keys are site numbers in the genotypes. Values are the alphabet of mutations at each site.
encoding : dict
keys are site numbers in genotype. Values are dictionaries mapping each mutation to its binary representation.
gpmap.utils.farthest_genotype(reference, genotypes)

Find the genotype in the system that differs at the most sites.

gpmap.utils.find_differences(s1, s2)

Return the index of differences between two sequences.

gpmap.utils.genotypes_to_binary(genotypes, encoding_table)

Using an encoding table (see get_encoding_table function), build a set of binary genotypes.

Parameters:
  • genotypes – List of the genotypes to encode.
  • encoding_table – DataFrame that encodes the binary representation of each mutation in the list of genotypes. (See the get_encoding_table).
gpmap.utils.genotypes_to_mutations(genotypes)

Create mutations dictionary from a list of genotypes.

gpmap.utils.get_base(logbase)

Get base from logbase.

Parameters:logbase (callable) – logarithm function

Returns:base – returns base of logarithm.
Return type:float
gpmap.utils.get_encoding_table(wildtype, mutations, site_labels=None)

This function constructs a lookup table (pandas.DataFrame) for mutations in a given mutations dictionary. This table encodes mutations with a binary representation.

gpmap.utils.get_missing_genotypes(genotypes, mutations=None)

Get a list of genotypes not found in the given genotypes list.

Parameters:
  • genotypes (list) – List of genotypes.
  • mutations (dict (optional)) – Mutation dictionary
Returns:

missing_genotypes – List of genotypes not found in genotypes list.

Return type:

list

gpmap.utils.hamming_distance(s1, s2)

Return the Hamming distance between equal-length sequences
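
The three sequence-comparison utilities above are closely related, which a short standalone sketch makes visible. The functions below carry a `_sketch` suffix to mark them as illustrations rather than gpmap's own code; in particular, how gpmap breaks ties between equally distant genotypes is not specified here.

```python
def find_differences_sketch(s1, s2):
    """Indices at which two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(s1, s2)) if a != b]

def hamming_distance_sketch(s1, s2):
    """Number of differing sites between equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return len(find_differences_sketch(s1, s2))

def farthest_genotype_sketch(reference, genotypes):
    """Genotype that differs from the reference at the most sites."""
    return max(genotypes, key=lambda g: hamming_distance_sketch(reference, g))

print(find_differences_sketch("AAA", "ABB"))                          # [1, 2]
print(hamming_distance_sketch("AAA", "ABB"))                          # 2
print(farthest_genotype_sketch("AAA", ["AAA", "AAB", "BBB", "ABA"]))  # BBB
```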

gpmap.utils.ipywidgets_missing(function)

Wrapper checks that ipython widgets are installed before trying to render them.

gpmap.utils.length_to_mutations(length, alphabet=['0', '1'])

Build a mutations dictionary for a given alphabet

Parameters:
  • length (int) – length of the genotypes
  • alphabet (list) – List of mutations at each site.
gpmap.utils.list_binary(length)

List all binary strings with given length.

gpmap.utils.mutations_to_encoding(wildtype, mutations)

Encoding map for genotype-to-binary

Parameters:
  • wildtype (str) – Wildtype sequence.
  • mutations (dict) – Mapping of each site’s mutation alphabet. {site-number: [alphabet]}
Returns:

encode – Encoding dictionary that maps site number to mutation-binary map

Return type:

OrderedDict of OrderedDicts

Examples

{ <site-number> : {<mutation>: <binary>} }

gpmap.utils.mutations_to_genotypes(mutations, wildtype=None)

Use a mutations dictionary to construct an array of genotypes composed of those mutations.

Parameters:
  • mutations (dict) – A mapping dict with site numbers as keys and lists of mutations as values.
  • wildtype (str) – wildtype genotype (as string).
Returns:

genotypes – list of genotypes comprised of mutations in given dictionary.

Return type:

list

gpmap.utils.sample_phenotypes(phenotypes, errors, n=1)

Generate n phenotypes from normal distributions.
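
A minimal sketch of this kind of sampling, using only the standard library. It assumes that `errors` holds one standard deviation per phenotype and that each of the n draws returns one value per phenotype; the actual return shape of gpmap's function is not documented above, and the name `sample_phenotypes_sketch` and its `seed` argument are hypothetical.

```python
import random

def sample_phenotypes_sketch(phenotypes, errors, n=1, seed=None):
    """Draw n samples per phenotype from Normal(phenotype, error)."""
    rng = random.Random(seed)
    return [
        [rng.gauss(mean, sd) for mean, sd in zip(phenotypes, errors)]
        for _ in range(n)
    ]

samples = sample_phenotypes_sketch([0.1, 0.5], [0.05, 0.05], n=3, seed=1)
print(len(samples), len(samples[0]))   # 3 2
```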

gpmap.simulate
gpmap.simulate.base module
class gpmap.simulate.base.BaseSimulation(wildtype, mutations, *args, **kwargs)

Bases: gpmap.gpm.GenotypePhenotypeMap

Build a simulated GenotypePhenotypeMap. Generates random phenotypes.

build()
classmethod from_length(length, alphabet_size=2, *args, **kwargs)

Create a simulated genotype-phenotype map from a given genotype length.

Parameters:
  • length (int) – length of genotypes
  • alphabet_size (int (optional)) – alphabet size
Returns:

self

Return type:

GenotypePhenotypeSimulation

set_stdeviations(sigma)

Add standard deviations to the simulated phenotypes, which can then be used for sampling error in the genotype-phenotype map.

Parameters:sigma (float or array-like) – Adds standard deviations to the phenotypes. If float, all phenotypes are given the same stdeviations. Else, array must be same length as phenotypes and will be assigned to each phenotype.
gpmap.simulate.base.random_mutation_set(length, alphabet_size=2, type='AA')

Generate a random mutations dictionary for simulations.

Parameters:
  • length (int) – length of genotypes
  • alphabet_size (int or list) – alphabet size at each site. If a list is given, site i will have size alphabet_size[i].
  • type ('AA' or 'DNA') – Use amino acid alphabet or DNA alphabet
gpmap.simulate.fuji module
class gpmap.simulate.fuji.MountFujiSimulation(wildtype, mutations, field_strength=1, roughness_width=None, roughness_dist='normal', *args, **kwargs)

Bases: gpmap.simulate.base.BaseSimulation

Constructs a genotype-phenotype map from a Mount Fuji model. [1]

A Mount Fuji model sets a “global” fitness peak (max) on a single genotype in the space. The fitness goes down as a function of Hamming distance away from this genotype, called a “fitness field”. The strength (or scale) of this field is linear and depends on the parameter field_strength. Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype.

\[f(g) = \nu (g) + c \cdot d(g_0, g)\]

where \(\nu\) is the roughness parameter, \(c\) is the field strength, and \(d\) is the Hamming distance between genotype \(g\) and the reference genotype.

Parameters:
  • wildtype (str) – reference genotype on which to put the fitness peak
  • mutations (dict) – mutations alphabet for each site
  • field_strength (float) – field strength
  • roughness_width (float) – Width of roughness distribution
  • roughness_dist (str, 'normal') – Distribution used to create noise around phenotypes.

References

[1] Szendro, Ivan G., et al. “Quantitative analyses of empirical fitness landscapes.” Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.
build()

Construct phenotypes using a rough Mount Fuji model.

field_strength
classmethod from_length(length, field_strength=1, roughness_width=None, roughness_dist='normal', *args, **kwargs)

Constructs a genotype-phenotype map from a Mount Fuji model. [1]

A Mount Fuji model sets a “global” fitness peak (max) on a single genotype in the space. The fitness goes down as a function of Hamming distance away from this genotype, called a “fitness field”. The strength (or scale) of this field is linear and depends on the parameter field_strength. Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype.

\[f(g) = \nu (g) + c \cdot d(g_0, g)\]

where \(\nu\) is the roughness parameter, \(c\) is the field strength, and \(d\) is the Hamming distance between genotype \(g\) and the reference genotype.

Parameters:
  • length (int) – length of the genotypes.
  • field_strength (float) – field strength
  • roughness_width (float) – Width of roughness distribution
  • roughness_dist (str, 'normal') – Distribution used to create noise around phenotypes.
hamming

Hamming distance from reference

roughess_dist

Roughness distribution.

roughness

Array of roughness values for all genotypes

roughness_dist

Roughness distribution.

roughness_width
scale

Mt. Fuji phenotypes without noise.

gpmap.simulate.hoc module
class gpmap.simulate.hoc.HouseOfCardsSimulation(wildtype, mutations, k_range=(0, 1), *args, **kwargs)

Bases: gpmap.simulate.nk.NKSimulation

Construct a ‘House of Cards’ fitness landscape.

gpmap.simulate.nk module
class gpmap.simulate.nk.NKSimulation(wildtype, mutations, K, k_range=(0, 1), *args, **kwargs)

Bases: gpmap.simulate.base.BaseSimulation

Generate genotype-phenotype map from NK fitness model. Creates a table with binary sub-sequences that determine the order of epistasis in the model.

The NK fitness landscape is created using a table with binary, length-K, sub-sequences mapped to random values. All genotypes are binary with length N. The fitness of a genotype is constructed by summing the values of all sub-sequences that make up the genotype using a sliding window across the full genotype.

For example, imagine an NK simulation with N=5 and K=2. To construct the fitness for the 01011 genotype, select the following sub-sequences from an NK table: “01”, “10”, “01”, “11”, “10”. Sum their values together.

nk_table

table with binary sub-sequences as keys which are used to construct phenotypes following an NK routine

Type:dict
keys

array of keys in NK table.

Type:array
values

array of values in the NK table.

Type:array
build()

Build phenotypes from NK table.

keys

NK table keys.

nk_table

NK table mapping binary sequence to value.

set_order(K)

Set the order (K) of the NK model.

set_random_values(k_range=(0, 1))

Set the values of the NK table by drawing from a uniform distribution between the given k_range.

set_table_values(values)

Set the values of the NK table from a list/array of values.

values

NK table values

Module contents

GenotypePhenotypeMap

class gpmap.gpm.GenotypePhenotypeMap(wildtype, genotypes, phenotypes=None, stdeviations=None, mutations=None, site_labels=None, n_replicates=1, **kwargs)

Bases: object

Object for containing genotype-phenotype map data.

Parameters:
  • wildtype (string) – wildtype sequence.
  • genotypes (array-like) – list of all genotypes
  • phenotypes (array-like) – List of phenotypes in the same order as genotypes. If None, all genotypes are assigned a phenotype = np.nan.
  • mutations (dict) – Dictionary that maps each site index to its possible substitution alphabet.
  • site_labels (array-like) – list of labels to apply to sites. If this is not specified, the first site is assigned a label 0, the next 1, etc. If specified, sites are assigned labels in the order given. For example, if the genotypes specify mutations at positions 12 and 75, this would be a list [12,75].
  • n_replicates (int) – number of replicate measurements comprising the mean phenotypes
  • include_binary (bool (default=True)) – Construct a binary representation of the space.
data

The core data object. Columns are ‘genotypes’, ‘phenotypes’, ‘n_replicates’, ‘stdeviations’, and (optionally) ‘binary’.

Type:pandas.DataFrame
complete_data

A dataframe mapping the complete set of genotypes possible, given the mutations dictionary. Contains all columns in data. Any missing data is reported as NaN.

Type:pandas.DataFrame (optional, created by BinaryMap)
missing_data

A dataframe containing the set of missing genotypes; complete_data - data. Two columns: ‘genotypes’ and ‘binary’.

Type:pandas.DataFrame (optional, created by BinaryMap)
binary

object that gives you (the user) access to the binary representation of the map.

Type:BinaryMap
encoding_table

Pandas DataFrame showing how mutations map to binary representation.

add_binary()

Build a binary representation of set of genotypes.

Add as a column to the main DataFrame.

add_n_mutations()

Build a column with the number of mutations in each genotype.

Add as a column to the main DataFrame.

binary

Binary representation of genotypes.

classmethod from_dict(metadata)
classmethod from_json(json_str)

Load a genotype-phenotype map directly from a json. The JSON metadata must include the following attributes

Note

Keyword arguments override input that is loaded from the JSON file.

genotypes

Get the genotypes of the system.

get_all_possible_genotypes()

Get the complete set of genotypes possible. There is no particular order to the genotypes. Consider sorting.

get_missing_genotypes()

Get all genotypes missing from the complete genotype-phenotype map.

index

Return numpy array of genotypes position.

length

Get length of the genotypes.

map(attr1, attr2)

Dictionary that maps attr1 to attr2.

mutant

Get the farthest mutant in genotype-phenotype map.

mutations

Get the mutations dictionary, which maps each site to its alphabet of possible states.

n

Get number of genotypes, i.e. size of the genotype-phenotype map.

n_replicates

Return the number of replicate measurements made of the phenotype

phenotypes

Get the phenotypes of the system.

classmethod read_csv(fname, wildtype, **kwargs)
classmethod read_dataframe(dataframe, wildtype, **kwargs)

Construct a GenotypePhenotypeMap from a dataframe.

classmethod read_excel(fname, wildtype, **kwargs)
classmethod read_json(filename, **kwargs)

Load a genotype-phenotype map directly from a json file. The JSON metadata must include the following attributes

Note

Keyword arguments override input that is loaded from the JSON file.

classmethod read_pickle(filename, **kwargs)

Read GenotypePhenotypeMap from pickle

stdeviations

Get stdeviations

to_csv(filename=None, **kwargs)

Write genotype-phenotype map to csv spreadsheet.

Keyword arguments are passed directly to Pandas dataframe to_csv method.

Parameters:filename (str) – Name of file to write out.
to_dict(complete=False)

Write genotype-phenotype map to dict.

to_excel(filename=None, **kwargs)

Write genotype-phenotype map to excel spreadsheet.

Keyword arguments are passed directly to Pandas dataframe to_excel method.

Parameters:filename (str) – Name of file to write out.
to_json(filename=None, complete=False)

Write genotype-phenotype map to json file. If no filename is given returns

to_pickle(filename, **kwargs)

Write GenotypePhenotypeMap object to a pickle file.

wildtype

Get reference genotypes for interactions.
