Reading/Writing
===============

At its core, the ``GenotypePhenotypeMap`` object is a pandas DataFrame, so most
tabular formats (e.g. Excel files, CSV, TSV) can be read and written.

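
Because the data live in a DataFrame, ordinary pandas readers and writers
illustrate the idea. The minimal sketch below is plain pandas, not the gpmap
API: it round-trips the first rows of the example data used in this section
through comma- and tab-separated text held in memory.

.. code-block:: python

    import io

    import pandas as pd

    # First rows of the example data used in this section.
    df = pd.DataFrame({
        "genotypes": ["PTEE", "PTEY", "PTFE", "PTFY"],
        "phenotypes": [0.243937, 0.657831, 0.104741, 0.683304],
        "stdeviations": [0.013269, 0.055803, 0.013471, 0.081887],
        "n_replicates": [1, 1, 1, 1],
    })

    tables = {}
    for name, sep in [("csv", ","), ("tsv", "\t")]:
        buffer = io.StringIO()
        df.to_csv(buffer, sep=sep, index=False)  # write without the index column
        buffer.seek(0)  # rewind so the reader starts at the top
        tables[name] = pd.read_csv(buffer, sep=sep)

    print(tables["tsv"])
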

Excel Spreadsheets
------------------

Excel files are supported through the ``read_excel`` method. This method requires
``genotypes`` and ``phenotypes`` columns and can include ``n_replicates`` and
``stdeviations`` as optional columns. All other columns are ignored.

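
The column rule above can be sketched in plain pandas. ``select_gpm_columns``
is a hypothetical helper written for illustration, not how gpmap implements
``read_excel``; it assumes the behavior described above: fail if a required
column is missing, keep the optional columns when present, and drop the rest.

.. code-block:: python

    import pandas as pd

    REQUIRED = ["genotypes", "phenotypes"]
    OPTIONAL = ["stdeviations", "n_replicates"]


    def select_gpm_columns(df: pd.DataFrame) -> pd.DataFrame:
        """Hypothetical helper: keep only required and optional columns."""
        missing = [col for col in REQUIRED if col not in df.columns]
        if missing:
            raise ValueError(f"missing required column(s): {missing}")
        keep = REQUIRED + [col for col in OPTIONAL if col in df.columns]
        return df[keep]


    sheet = pd.DataFrame({
        "genotypes": ["PTEE", "PTEY"],
        "phenotypes": [0.243937, 0.657831],
        "stdeviations": [0.013269, 0.055803],
        "notes": ["plate 1", "plate 2"],  # extra column, dropped
    })
    print(select_gpm_columns(sheet).columns.tolist())
    # ['genotypes', 'phenotypes', 'stdeviations']
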
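
**Example**: Excel spreadsheet file ("data.xlsx")

+----+-----------+------------+--------------+--------------+
|    | genotypes | phenotypes | stdeviations | n_replicates |
+====+===========+============+==============+==============+
| 0  | PTEE      | 0.243937   | 0.013269     | 1            |
+----+-----------+------------+--------------+--------------+
| 1  | PTEY      | 0.657831   | 0.055803     | 1            |
+----+-----------+------------+--------------+--------------+
| 2  | PTFE      | 0.104741   | 0.013471     | 1            |
+----+-----------+------------+--------------+--------------+
| 3  | PTFY      | 0.683304   | 0.081887     | 1            |
+----+-----------+------------+--------------+--------------+
| 4  | PIEE      | 0.774680   | 0.069631     | 1            |
+----+-----------+------------+--------------+--------------+
| 5  | PIEY      | 0.975995   | 0.059985     | 1            |
+----+-----------+------------+--------------+--------------+
| 6  | PIFE      | 0.500215   | 0.098893     | 1            |
+----+-----------+------------+--------------+--------------+
| 7  | PIFY      | 0.501697   | 0.025082     | 1            |
+----+-----------+------------+--------------+--------------+
| 8  | RTEE      | 0.233230   | 0.052265     | 1            |
+----+-----------+------------+--------------+--------------+
| 9  | RTEY      | 0.057961   | 0.036845     | 1            |
+----+-----------+------------+--------------+--------------+
| 10 | RTFE      | 0.365238   | 0.050948     | 1            |
+----+-----------+------------+--------------+--------------+
| 11 | RTFY      | 0.891505   | 0.033239     | 1            |
+----+-----------+------------+--------------+--------------+
| 12 | RIEE      | 0.156193   | 0.085638     | 1            |
+----+-----------+------------+--------------+--------------+
| 13 | RIEY      | 0.837269   | 0.070373     | 1            |
+----+-----------+------------+--------------+--------------+
| 14 | RIFE      | 0.599639   | 0.050125     | 1            |
+----+-----------+------------+--------------+--------------+
| 15 | RIFY      | 0.277137   | 0.072571     | 1            |
+----+-----------+------------+--------------+--------------+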

Read the spreadsheet directly into a ``GenotypePhenotypeMap``.

.. code-block:: python

    from gpmap import GenotypePhenotypeMap

    gpm = GenotypePhenotypeMap.read_excel(wildtype="PTEE", filename="data.xlsx")


CSV File
--------

CSV files are supported through the ``read_csv`` method. This method requires
``genotypes`` and ``phenotypes`` columns and can include ``n_replicates`` and
``stdeviations`` as optional columns. All other columns are ignored.
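
**Example**: CSV file ("data.csv")

+----+-----------+------------+--------------+--------------+
|    | genotypes | phenotypes | stdeviations | n_replicates |
+====+===========+============+==============+==============+
| 0  | PTEE      | 0.243937   | 0.013269     | 1            |
+----+-----------+------------+--------------+--------------+
| 1  | PTEY      | 0.657831   | 0.055803     | 1            |
+----+-----------+------------+--------------+--------------+
| 2  | PTFE      | 0.104741   | 0.013471     | 1            |
+----+-----------+------------+--------------+--------------+
| 3  | PTFY      | 0.683304   | 0.081887     | 1            |
+----+-----------+------------+--------------+--------------+
| 4  | PIEE      | 0.774680   | 0.069631     | 1            |
+----+-----------+------------+--------------+--------------+
| 5  | PIEY      | 0.975995   | 0.059985     | 1            |
+----+-----------+------------+--------------+--------------+
| 6  | PIFE      | 0.500215   | 0.098893     | 1            |
+----+-----------+------------+--------------+--------------+
| 7  | PIFY      | 0.501697   | 0.025082     | 1            |
+----+-----------+------------+--------------+--------------+
| 8  | RTEE      | 0.233230   | 0.052265     | 1            |
+----+-----------+------------+--------------+--------------+
| 9  | RTEY      | 0.057961   | 0.036845     | 1            |
+----+-----------+------------+--------------+--------------+
| 10 | RTFE      | 0.365238   | 0.050948     | 1            |
+----+-----------+------------+--------------+--------------+
| 11 | RTFY      | 0.891505   | 0.033239     | 1            |
+----+-----------+------------+--------------+--------------+
| 12 | RIEE      | 0.156193   | 0.085638     | 1            |
+----+-----------+------------+--------------+--------------+
| 13 | RIEY      | 0.837269   | 0.070373     | 1            |
+----+-----------+------------+--------------+--------------+
| 14 | RIFE      | 0.599639   | 0.050125     | 1            |
+----+-----------+------------+--------------+--------------+
| 15 | RIFY      | 0.277137   | 0.072571     | 1            |
+----+-----------+------------+--------------+--------------+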

Read the CSV file directly into a ``GenotypePhenotypeMap``.

.. code-block:: python

    from gpmap import GenotypePhenotypeMap

    gpm = GenotypePhenotypeMap.read_csv(wildtype="PTEE", filename="data.csv")

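
Both example tables start with an unlabeled 0, 1, 2, ... column. This is what
pandas produces by default, because ``DataFrame.to_csv`` writes the index unless
``index=False`` is passed. The plain pandas sketch below (not gpmap) shows how
that column looks on disk and how ``index_col=0`` reads it back as row labels
instead of as an extra column named ``Unnamed: 0``.

.. code-block:: python

    import io

    import pandas as pd

    df = pd.DataFrame({
        "genotypes": ["PTEE", "PTEY"],
        "phenotypes": [0.243937, 0.657831],
    })

    text = df.to_csv()  # index is written by default
    print(text.splitlines()[0])  # ,genotypes,phenotypes

    raw = pd.read_csv(io.StringIO(text))                 # index becomes a column
    clean = pd.read_csv(io.StringIO(text), index_col=0)  # index becomes row labels

    print(raw.columns.tolist())    # ['Unnamed: 0', 'genotypes', 'phenotypes']
    print(clean.columns.tolist())  # ['genotypes', 'phenotypes']
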

JSON Format
-----------

The only keys recognized by the JSON reader are:

1. ``genotypes``
2. ``phenotypes``
3. ``stdeviations``
4. ``mutations``
5. ``n_replicates``

All other keys are ignored. You can keep other metadata stored in the JSON
file, but it won't be attached to the ``GenotypePhenotypeMap`` object.

.. code-block:: json

    {
        "genotypes": [
            "000", "001", "010", "011",
            "100", "101", "110", "111"
        ],
        "phenotypes": [
            0.62344582, 0.87943151, -0.11075798, -0.59754471,
            1.4314798, 1.12551439, 1.04859722, -0.27145593
        ],
        "stdeviations": [
            0.01, 0.01, 0.01, 0.01,
            0.01, 0.01, 0.01, 0.01
        ],
        "mutations": {
            "0": ["0", "1"],
            "1": ["0", "1"],
            "2": ["0", "1"]
        },
        "n_replicates": 12,
        "title": "my data",
        "description": "a really hard experiment"
    }
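
The key handling described above can be sketched with only the Python standard
library; this is not the gpmap reader. It parses a trimmed version of the
example, keeps the recognized keys, and rebuilds the genotype list from
``mutations``. Reading ``mutations`` as a map from site index to the letters
allowed at that site, with every genotype being one combination of those
letters, is an assumption drawn from the example. ``RECOGNIZED_KEYS`` and
``load_recognized`` are hypothetical names.

.. code-block:: python

    import itertools
    import json

    # Keys the JSON reader is documented to use; everything else is dropped.
    RECOGNIZED_KEYS = ("genotypes", "phenotypes", "stdeviations", "mutations", "n_replicates")


    def load_recognized(text: str) -> dict:
        """Hypothetical helper: parse JSON text, keep only recognized keys."""
        data = json.loads(text)
        return {key: data[key] for key in RECOGNIZED_KEYS if key in data}


    example = """
    {
        "genotypes": ["000", "001", "010", "011", "100", "101", "110", "111"],
        "phenotypes": [0.62344582, 0.87943151, -0.11075798, -0.59754471,
                       1.4314798, 1.12551439, 1.04859722, -0.27145593],
        "mutations": {"0": ["0", "1"], "1": ["0", "1"], "2": ["0", "1"]},
        "n_replicates": 12,
        "title": "my data"
    }
    """

    data = load_recognized(example)
    print(sorted(data))  # ['genotypes', 'mutations', 'n_replicates', 'phenotypes']

    # JSON object keys are always strings, so convert site labels back to ints.
    mutations = {int(site): letters for site, letters in data["mutations"].items()}

    # Assumption: each genotype is one combination of the letters allowed per site.
    sites = sorted(mutations)
    expected = ["".join(combo) for combo in itertools.product(*(mutations[s] for s in sites))]
    print(expected == data["genotypes"])  # True
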