Reading/Writing
The GenotypePhenotypeMap object is a Pandas DataFrame at its core, so most tabular formats (e.g. Excel, CSV, and TSV files) can be read and written.
Excel Spreadsheets
Excel files are supported through the read_excel method. This method requires genotypes and phenotypes columns and can include n_replicates and stdeviations as optional columns. All other columns are ignored.
Example: Excel spreadsheet file (“data.xlsx”)
|  | genotypes | phenotypes | stdeviations | n_replicates |
|---|---|---|---|---|
| 0 | PTEE | 0.243937 | 0.013269 | 1 |
| 1 | PTEY | 0.657831 | 0.055803 | 1 |
| 2 | PTFE | 0.104741 | 0.013471 | 1 |
| 3 | PTFY | 0.683304 | 0.081887 | 1 |
| 4 | PIEE | 0.774680 | 0.069631 | 1 |
| 5 | PIEY | 0.975995 | 0.059985 | 1 |
| 6 | PIFE | 0.500215 | 0.098893 | 1 |
| 7 | PIFY | 0.501697 | 0.025082 | 1 |
| 8 | RTEE | 0.233230 | 0.052265 | 1 |
| 9 | RTEY | 0.057961 | 0.036845 | 1 |
| 10 | RTFE | 0.365238 | 0.050948 | 1 |
| 11 | RTFY | 0.891505 | 0.033239 | 1 |
| 12 | RIEE | 0.156193 | 0.085638 | 1 |
| 13 | RIEY | 0.837269 | 0.070373 | 1 |
| 14 | RIFE | 0.599639 | 0.050125 | 1 |
| 15 | RIFY | 0.277137 | 0.072571 | 1 |
Read the spreadsheet directly into the GenotypePhenotypeMap.
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap.read_excel(wildtype="PTEE", filename="data.xlsx")
CSV File
CSV files are supported through the read_csv method. This method requires genotypes and phenotypes columns and can include n_replicates and stdeviations as optional columns. All other columns are ignored.
Example: CSV File
|  | genotypes | phenotypes | stdeviations | n_replicates |
|---|---|---|---|---|
| 0 | PTEE | 0.243937 | 0.013269 | 1 |
| 1 | PTEY | 0.657831 | 0.055803 | 1 |
| 2 | PTFE | 0.104741 | 0.013471 | 1 |
| 3 | PTFY | 0.683304 | 0.081887 | 1 |
| 4 | PIEE | 0.774680 | 0.069631 | 1 |
| 5 | PIEY | 0.975995 | 0.059985 | 1 |
| 6 | PIFE | 0.500215 | 0.098893 | 1 |
| 7 | PIFY | 0.501697 | 0.025082 | 1 |
| 8 | RTEE | 0.233230 | 0.052265 | 1 |
| 9 | RTEY | 0.057961 | 0.036845 | 1 |
| 10 | RTFE | 0.365238 | 0.050948 | 1 |
| 11 | RTFY | 0.891505 | 0.033239 | 1 |
| 12 | RIEE | 0.156193 | 0.085638 | 1 |
| 13 | RIEY | 0.837269 | 0.070373 | 1 |
| 14 | RIFE | 0.599639 | 0.050125 | 1 |
| 15 | RIFY | 0.277137 | 0.072571 | 1 |
Read the csv directly into the GenotypePhenotypeMap.
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap.read_csv(wildtype="PTEE", filename="data.csv")
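The column rules for CSV input can be illustrated without gpmap itself. The following is a minimal sketch that uses only Python's standard csv module; the helper name load_columns, the inline sample, and its extra "notes" column are hypothetical and are not part of the gpmap API. It only shows the idea of requiring genotypes and phenotypes, keeping the optional columns when present, and dropping everything else. It does not reproduce how gpmap reads files internally.

```python
import csv
import io

REQUIRED = ("genotypes", "phenotypes")
OPTIONAL = ("stdeviations", "n_replicates")


def load_columns(handle):
    """Hypothetical helper: keep only the recognized columns of a CSV file."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Columns outside REQUIRED + OPTIONAL (e.g. "notes") are ignored.
    keep = [name for name in REQUIRED + OPTIONAL if name in header]
    return [{name: row[name] for name in keep} for row in reader]


# Inline sample standing in for "data.csv"; the "notes" column is extra.
sample = io.StringIO(
    "genotypes,phenotypes,stdeviations,notes\n"
    "PTEE,0.243937,0.013269,wildtype\n"
    "PTEY,0.657831,0.055803,single mutant\n"
)
rows = load_columns(sample)
```

Values come back as strings here because the csv module does no type conversion; a DataFrame-based reader would normally parse the numeric columns.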
JSON Format
The JSON reader recognizes only the following keys:
- genotypes
- phenotypes
- stdeviations
- mutations
- n_replicates
All other keys are ignored. You can store additional metadata in the JSON file, but it will not be attached to the object that is created from it.
{
  "genotypes" : [
    "000",
    "001",
    "010",
    "011",
    "100",
    "101",
    "110",
    "111"
  ],
  "phenotypes" : [
    0.62344582,
    0.87943151,
    -0.11075798,
    -0.59754471,
    1.4314798,
    1.12551439,
    1.04859722,
    -0.27145593
  ],
  "stdeviations" : [
    0.01,
    0.01,
    0.01,
    0.01,
    0.01,
    0.01,
    0.01,
    0.01
  ],
  "mutations" : {
    "0" : ["0", "1"],
    "1" : ["0", "1"],
    "2" : ["0", "1"]
  },
  "n_replicates" : 12,
  "title" : "my data",
  "description" : "a really hard experiment"
}
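To make the key filtering concrete, here is a minimal sketch using Python's standard json module. The helper split_keys is hypothetical and is not gpmap's reader; it only separates the recognized keys listed above from extra metadata such as "title". The sample reuses the first two genotypes and phenotypes from the example. Note that JSON object keys must be strings, so the position indices under "mutations" are written as "0", "1", and "2".

```python
import json

RECOGNIZED = {"genotypes", "phenotypes", "stdeviations", "mutations", "n_replicates"}


def split_keys(text):
    """Hypothetical helper: separate recognized keys from extra metadata."""
    data = json.loads(text)
    used = {key: value for key, value in data.items() if key in RECOGNIZED}
    ignored = sorted(set(data) - RECOGNIZED)
    return used, ignored


# Shortened version of the example above, serialized to a JSON string.
raw = json.dumps({
    "genotypes": ["000", "001"],
    "phenotypes": [0.62344582, 0.87943151],
    "mutations": {"0": ["0", "1"], "1": ["0", "1"], "2": ["0", "1"]},
    "title": "my data",
    "description": "a really hard experiment",
})
used, ignored = split_keys(raw)
```

In this sketch, used holds the genotypes, phenotypes, and mutations entries, while ignored lists the metadata keys that a reader following the rule above would skip.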