Reading/Writing

The GenotypePhenotypeMap object is a Pandas DataFrame at its core, so most tabular formats (e.g. Excel files, CSV, TSV) can be read and written.

Excel Spreadsheets

Excel files are supported through the read_excel method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.
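The column rules above can be checked before a file is handed to a reader. The following is a minimal sketch using only the column names listed in the text; the helper `check_columns` is hypothetical and is not part of gpmap.

```python
# Column names come from the text above; the helper itself is hypothetical.
REQUIRED = ("genotypes", "phenotypes")
OPTIONAL = ("stdeviations", "n_replicates")


def check_columns(header):
    """Split a header row into (missing required, ignored extra) column names."""
    present = [name.strip() for name in header]
    # Required columns that the header does not contain.
    missing = [name for name in REQUIRED if name not in present]
    # Named columns that are neither required nor optional (blank names are skipped).
    ignored = [name for name in present if name and name not in REQUIRED + OPTIONAL]
    return missing, ignored


header = ["genotypes", "phenotypes", "stdeviations", "n_replicates", "notes"]
missing, ignored = check_columns(header)
print(missing)  # []
print(ignored)  # ['notes']
```

An empty `missing` list means the table has everything the reader needs; anything in `ignored` would simply be dropped.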

Example: Excel spreadsheet file (“data.xlsx”)

genotypes phenotypes stdeviations n_replicates
0 PTEE 0.243937 0.013269 1
1 PTEY 0.657831 0.055803 1
2 PTFE 0.104741 0.013471 1
3 PTFY 0.683304 0.081887 1
4 PIEE 0.774680 0.069631 1
5 PIEY 0.975995 0.059985 1
6 PIFE 0.500215 0.098893 1
7 PIFY 0.501697 0.025082 1
8 RTEE 0.233230 0.052265 1
9 RTEY 0.057961 0.036845 1
10 RTFE 0.365238 0.050948 1
11 RTFY 0.891505 0.033239 1
12 RIEE 0.156193 0.085638 1
13 RIEY 0.837269 0.070373 1
14 RIFE 0.599639 0.050125 1
15 RIFY 0.277137 0.072571 1

Read the spreadsheet directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_excel(wildtype="PTEE", filename="data.xlsx")

CSV File

CSV files are supported through the read_csv method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.

Example: CSV File

genotypes phenotypes stdeviations n_replicates
0 PTEE 0.243937 0.013269 1
1 PTEY 0.657831 0.055803 1
2 PTFE 0.104741 0.013471 1
3 PTFY 0.683304 0.081887 1
4 PIEE 0.774680 0.069631 1
5 PIEY 0.975995 0.059985 1
6 PIFE 0.500215 0.098893 1
7 PIFY 0.501697 0.025082 1
8 RTEE 0.233230 0.052265 1
9 RTEY 0.057961 0.036845 1
10 RTFE 0.365238 0.050948 1
11 RTFY 0.891505 0.033239 1
12 RIEE 0.156193 0.085638 1
13 RIEY 0.837269 0.070373 1
14 RIFE 0.599639 0.050125 1
15 RIFY 0.277137 0.072571 1

Read the CSV file directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_csv(wildtype="PTEE", filename="data.csv")

JSON Format

The only keys recognized by the json reader are:

  1. genotypes
  2. phenotypes
  3. stdeviations
  4. mutations
  5. n_replicates

All other keys are ignored in the epistasis models. You can keep other metadata stored in the JSON, but it won’t be appended to the epistasis model object.

{
    "genotypes" : [
        "000",
        "001",
        "010",
        "011",
        "100",
        "101",
        "110",
        "111"
    ],
    "phenotypes" : [
        0.62344582,
        0.87943151,
        -0.11075798,
        -0.59754471,
        1.4314798,
        1.12551439,
        1.04859722,
        -0.27145593
    ],
    "stdeviations" : [
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01
    ],
    "mutations" : {
        "0" : ["0", "1"],
        "1" : ["0", "1"],
        "2" : ["0", "1"]
    },
    "n_replicates" : 12,
    "title" : "my data",
    "description" : "a really hard experiment"
}
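The key filtering described in this section can be sketched with Python's standard json module. This is an illustration under stated assumptions, not the library's reader: the helper `load_recognized` is hypothetical, the two-site data is made up for the example, and converting the site keys of "mutations" to integers is an assumption (JSON object keys are always strings).

```python
import json

# Keys listed in this section as recognized by the JSON reader.
RECOGNIZED = ("genotypes", "phenotypes", "stdeviations", "mutations", "n_replicates")


def load_recognized(text):
    """Parse JSON text and keep only the recognized keys (hypothetical helper)."""
    raw = json.loads(text)
    data = {key: raw[key] for key in RECOGNIZED if key in raw}
    # JSON object keys are strings; turning site indices into ints is an assumption.
    if "mutations" in data:
        data["mutations"] = {
            int(site): alphabet for site, alphabet in data["mutations"].items()
        }
    return data


# Made-up two-site example; "title" is extra metadata and gets dropped.
example = """
{
    "genotypes": ["00", "01", "10", "11"],
    "phenotypes": [0.1, 0.2, 0.3, 0.4],
    "mutations": {"0": ["0", "1"], "1": ["0", "1"]},
    "title": "my data"
}
"""

data = load_recognized(example)
print(sorted(data))       # ['genotypes', 'mutations', 'phenotypes']
print(data["mutations"])  # {0: ['0', '1'], 1: ['0', '1']}
```

Note that strict JSON requires double-quoted strings and keys and forbids trailing commas, so `json.loads` raises an error on input that breaks those rules.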