Reading/Writing

At its core, the GenotypePhenotypeMap object is a pandas DataFrame, so most tabular formats (e.g. Excel files, csv, tsv, …) can be read and written.

Excel Spreadsheets

Excel files are supported through the read_excel method. The spreadsheet must contain genotypes and phenotypes columns and may also contain the optional n_replicates and stdeviations columns. All other columns are ignored.

Example: Excel spreadsheet file (“data.xlsx”)

	genotypes	phenotypes	stdeviations	n_replicates
0	PTEE	0.243937	0.013269	1
1	PTEY	0.657831	0.055803	1
2	PTFE	0.104741	0.013471	1
3	PTFY	0.683304	0.081887	1
4	PIEE	0.774680	0.069631	1
5	PIEY	0.975995	0.059985	1
6	PIFE	0.500215	0.098893	1
7	PIFY	0.501697	0.025082	1
8	RTEE	0.233230	0.052265	1
9	RTEY	0.057961	0.036845	1
10	RTFE	0.365238	0.050948	1
11	RTFY	0.891505	0.033239	1
12	RIEE	0.156193	0.085638	1
13	RIEY	0.837269	0.070373	1
14	RIFE	0.599639	0.050125	1
15	RIFY	0.277137	0.072571	1

Read the spreadsheet directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_excel(wildtype="PTEE", filename="data.xlsx")

CSV File

CSV files are supported through the read_csv method. The file must contain genotypes and phenotypes columns and may also contain the optional n_replicates and stdeviations columns. All other columns are ignored.

Example: CSV File

	genotypes	phenotypes	stdeviations	n_replicates
0	PTEE	0.243937	0.013269	1
1	PTEY	0.657831	0.055803	1
2	PTFE	0.104741	0.013471	1
3	PTFY	0.683304	0.081887	1
4	PIEE	0.774680	0.069631	1
5	PIEY	0.975995	0.059985	1
6	PIFE	0.500215	0.098893	1
7	PIFY	0.501697	0.025082	1
8	RTEE	0.233230	0.052265	1
9	RTEY	0.057961	0.036845	1
10	RTFE	0.365238	0.050948	1
11	RTFY	0.891505	0.033239	1
12	RIEE	0.156193	0.085638	1
13	RIEY	0.837269	0.070373	1
14	RIFE	0.599639	0.050125	1
15	RIFY	0.277137	0.072571	1

Read the csv directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_csv(wildtype="PTEE", filename="data.csv")
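
The column requirements above can be checked before a file is read. The following is a minimal sketch that uses only Python's standard csv module; it does not call gpmap. The helper names write_table and missing_required_columns, and the temporary file location, are hypothetical and chosen for illustration. The two rows are taken from the example table above.

```python
import csv
import os
import tempfile

# Column names taken from the description above.
REQUIRED = ("genotypes", "phenotypes")
OPTIONAL = ("stdeviations", "n_replicates")

# Two rows copied from the example table.
rows = [
    {"genotypes": "PTEE", "phenotypes": 0.243937,
     "stdeviations": 0.013269, "n_replicates": 1},
    {"genotypes": "PTEY", "phenotypes": 0.657831,
     "stdeviations": 0.055803, "n_replicates": 1},
]


def write_table(path, rows):
    """Write rows to a comma-separated file with a header line (hypothetical helper)."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=REQUIRED + OPTIONAL)
        writer.writeheader()
        writer.writerows(rows)


def missing_required_columns(path):
    """Return the required column names absent from the file header (hypothetical helper)."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle))
    return [name for name in REQUIRED if name not in header]


csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
write_table(csv_path, rows)
missing = missing_required_columns(csv_path)
print(missing)
```

An empty list means both required columns are present, so the file should be suitable for read_csv as shown above.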

JSON Format

The only keys recognized by the json reader are:

genotypes

phenotypes

stdeviations

mutations

n_replicates

All other keys are ignored by the epistasis models. You can keep other metadata in the JSON file, but it won’t be attached to the epistasis model object.

{
    "genotypes" : [
        "000",
        "001",
        "010",
        "011",
        "100",
        "101",
        "110",
        "111"
    ],
    "phenotypes" : [
        0.62344582,
        0.87943151,
        -0.11075798,
        -0.59754471,
        1.4314798,
        1.12551439,
        1.04859722,
        -0.27145593
    ],
    "stdeviations" : [
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01
    ],
    "mutations" : {
        "0" : ["0", "1"],
        "1" : ["0", "1"],
        "2" : ["0", "1"]
    },
    "n_replicates" : 12,
    "title" : "my data",
    "description" : "a really hard experiment"
}
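
As a rough illustration of the key handling described above, the sketch below uses Python's standard json module to serialize the example data and then separate the recognized keys from the extra metadata. It does not call gpmap; the helper split_recognized is hypothetical. Note that JSON object keys must be strings, so the integer site indices under mutations are written as "0", "1" and "2".

```python
import json

# Keys listed above as recognized by the json reader.
RECOGNIZED_KEYS = {"genotypes", "phenotypes", "stdeviations",
                   "mutations", "n_replicates"}


def split_recognized(text):
    """Split a JSON document into recognized keys and extra metadata (hypothetical helper)."""
    data = json.loads(text)
    recognized = {k: v for k, v in data.items() if k in RECOGNIZED_KEYS}
    extra = {k: v for k, v in data.items() if k not in RECOGNIZED_KEYS}
    return recognized, extra


# Values copied from the example above.
payload = {
    "genotypes": ["000", "001", "010", "011", "100", "101", "110", "111"],
    "phenotypes": [0.62344582, 0.87943151, -0.11075798, -0.59754471,
                   1.4314798, 1.12551439, 1.04859722, -0.27145593],
    "stdeviations": [0.01] * 8,
    # Integer keys are converted to strings by json.dumps.
    "mutations": {0: ["0", "1"], 1: ["0", "1"], 2: ["0", "1"]},
    "n_replicates": 12,
    "title": "my data",
    "description": "a really hard experiment",
}

text = json.dumps(payload, indent=4)
recognized, extra = split_recognized(text)
print(sorted(extra))
```

Writing the file with json.dumps rather than by hand avoids the usual pitfalls of hand-written JSON, such as single-quoted strings, unquoted keys and trailing commas.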
