Saturday, December 11, 2021

An easy way to convert a list to a long table

Say you have a list of vectors with different lengths, e.g.

> head(genesets_list)

$KEGG_GLYCOLYSIS_GLUCONEOGENESIS

 [1] "ACSS2"   "GCK"     "PGK2"    "PGK1"    "PDHB"    "PDHA1"   "PDHA2"   "PGM2"    "TPI1"    "ACSS1"   "FBP1"    "ADH1B"   "HK2"     "ADH1C"   "HK1"     "HK3"     "ADH4"    "PGAM2"   "ADH5"    "PGAM1"   "ADH1A"   "ALDOC" "ALDH7A1" "LDHAL6B" "PKLR"    "LDHAL6A" "ENO1"    "PKM"     "PFKP"    "BPGM"    "PCK2"    "PCK1"    "ALDH1B1" "ALDH2"   "ALDH3A1" "AKR1A1"  "FBP2"    "PFKM"    "PFKL"    "LDHC"    "GAPDH"   "ENO3"    "ENO2"    "PGAM4" "ADH7"    "ADH6"    "LDHB"    "ALDH1A3" "ALDH3B1" "ALDH3B2" "ALDH9A1" "ALDH3A2" "GALM"    "ALDOA"   "DLD"     "DLAT"    "ALDOB"   "G6PC2"   "LDHA"    "G6PC"    "PGM1"    "GPI"    

$KEGG_CITRATE_CYCLE_TCA_CYCLE

 [1] "IDH3B"    "DLST"     "PCK2"     "CS"       "PDHB"     "PCK1"     "PDHA1"    "PDHA2"    "SUCLG2P2" "FH"       "SDHD"     "OGDH"     "SDHB"     "IDH3A"    "SDHC"     "IDH2"     "IDH1"     "ACO1"     "ACLY"     "MDH2" "DLD"      "MDH1"     "DLAT"     "OGDHL"    "PC"       "SDHA"     "SUCLG1"   "SUCLA2"   "SUCLG2"   "IDH3G"    "ACO2"    

$KEGG_PENTOSE_PHOSPHATE_PATHWAY

 [1] "RPE"     "RPIA"    "PGM2"    "PGLS"    "PRPS2"   "FBP2"    "PFKM"    "PFKL"    "TALDO1"  "TKT"     "FBP1"    "TKTL2"   "PGD"     "RBKS"    "ALDOA"   "ALDOC"   "ALDOB"   "H6PD"    "RPEL1"   "PRPS1L1" "PRPS1"   "DERA"  "G6PD"    "PGM1"    "TKTL1"   "PFKP"    "GPI"    

We want to convert it to a long table with two columns (e.g. pathway ID as the first column and gene name as the second column). There are various solutions (e.g. https://stackoverflow.com/questions/4227223/convert-a-list-to-a-data-frame), but none of them really worked for my needs.

Martin Stingl posted a relevant solution using map_dfr: https://rstats-tips.net/2021/02/07/converting-lists-of-lists-of-lists-to-data-frames-or-tibbles/, but it didn't solve the row name problem.

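Here is my one-liner solution (it needs the purrr and tibble packages, and map_dfr relies on dplyr being installed):

library(purrr); library(tibble)

# as_tibble() puts each vector into a column called "value"; $value pulls it out so the column is named geneSymbol
genesets_df = data.frame(pathwayID=rep(names(genesets_list), lengths(genesets_list)), geneSymbol=map_dfr(genesets_list, as_tibble)$value)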
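The pathwayID column in the one-liner above relies on rep() together with lengths(): each list name is repeated once for every element of its vector. A minimal sketch of just that step, on a made-up toy list (toy_list and its set and gene names are hypothetical, not real gene sets):

```r
# Toy input: a named list of character vectors with different lengths
# (hypothetical names, for illustration only).
toy_list <- list(
  SET_A = c("G1", "G2", "G3"),
  SET_B = c("G4", "G5")
)

# lengths() returns the number of elements in each vector of the list.
n_per_set <- lengths(toy_list)

# rep() with a vector of counts repeats each name that many times,
# so the result lines up with the flattened gene vector.
ids <- rep(names(toy_list), n_per_set)

print(ids)
# [1] "SET_A" "SET_A" "SET_A" "SET_B" "SET_B"
```

Because the repeat counts come from the list itself, the ID vector always has the same length as the flattened gene vector, whatever the sizes of the individual sets.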

Update:

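An even simpler solution, using base R only:

df = data.frame(ID=rep(names(genesets_list), lengths(genesets_list)), geneName=unlist(genesets_list, use.names=FALSE))

Note that names(unlist(genesets_list)) cannot be used directly for the ID column, because unlist() appends a running index to each list name (e.g. KEGG_PENTOSE_PHOSPHATE_PATHWAY1, KEGG_PENTOSE_PHOSPHATE_PATHWAY2, ...).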
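As a possible alternative that is not part of the original post, base R's stack() also turns a named list of vectors into a two-column data frame. It names the columns values and ind and stores ind as a factor, so a little renaming is needed. A minimal sketch on a made-up toy list (toy_sets and the column names chosen here are assumptions for illustration):

```r
# Toy input (hypothetical set and gene names).
toy_sets <- list(
  SET_A = c("G1", "G2", "G3"),
  SET_B = c("G4", "G5")
)

# stack() returns a data frame with columns "values" and "ind";
# reorder them so the ID column comes first.
long_df <- utils::stack(toy_sets)[, c("ind", "values")]

# Rename the columns and turn the factor of list names into plain character.
names(long_df) <- c("ID", "geneName")
long_df$ID <- as.character(long_df$ID)

print(long_df)
```

The result has one row per gene, with the set name in ID and default integer row names.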
