Say you have a named list of vectors with different lengths, e.g.
> head(genesets_list)
$KEGG_GLYCOLYSIS_GLUCONEOGENESIS
[1] "ACSS2" "GCK" "PGK2" "PGK1" "PDHB" "PDHA1" "PDHA2" "PGM2" "TPI1" "ACSS1" "FBP1" "ADH1B" "HK2" "ADH1C" "HK1" "HK3" "ADH4" "PGAM2" "ADH5" "PGAM1" "ADH1A" "ALDOC" "ALDH7A1" "LDHAL6B" "PKLR" "LDHAL6A" "ENO1" "PKM" "PFKP" "BPGM" "PCK2" "PCK1" "ALDH1B1" "ALDH2" "ALDH3A1" "AKR1A1" "FBP2" "PFKM" "PFKL" "LDHC" "GAPDH" "ENO3" "ENO2" "PGAM4" "ADH7" "ADH6" "LDHB" "ALDH1A3" "ALDH3B1" "ALDH3B2" "ALDH9A1" "ALDH3A2" "GALM" "ALDOA" "DLD" "DLAT" "ALDOB" "G6PC2" "LDHA" "G6PC" "PGM1" "GPI"
$KEGG_CITRATE_CYCLE_TCA_CYCLE
[1] "IDH3B" "DLST" "PCK2" "CS" "PDHB" "PCK1" "PDHA1" "PDHA2" "SUCLG2P2" "FH" "SDHD" "OGDH" "SDHB" "IDH3A" "SDHC" "IDH2" "IDH1" "ACO1" "ACLY" "MDH2" "DLD" "MDH1" "DLAT" "OGDHL" "PC" "SDHA" "SUCLG1" "SUCLA2" "SUCLG2" "IDH3G" "ACO2"
$KEGG_PENTOSE_PHOSPHATE_PATHWAY
[1] "RPE" "RPIA" "PGM2" "PGLS" "PRPS2" "FBP2" "PFKM" "PFKL" "TALDO1" "TKT" "FBP1" "TKTL2" "PGD" "RBKS" "ALDOA" "ALDOC" "ALDOB" "H6PD" "RPEL1" "PRPS1L1" "PRPS1" "DERA" "G6PD" "PGM1" "TKTL1" "PFKP" "GPI"
We want to convert it to a long table with two columns (e.g. pathway ID as the first column and gene name as the second). There are various solutions (e.g. https://stackoverflow.com/questions/4227223/convert-a-list-to-a-data-frame), but none of them really fits my needs.
Martin Stingl posted a related solution using map_dfr: https://rstats-tips.net/2021/02/07/converting-lists-of-lists-of-lists-to-data-frames-or-tibbles/, but it doesn't solve the row name problem.
Here is my one-liner solution (it needs purrr and tibble loaded, e.g. via library(tidyverse)):
genesets_df = data.frame(pathwayID=rep(names(genesets_list), lengths(genesets_list)), geneSymbol=genesets_list %>% map_dfr(as_tibble))
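To see where the pathwayID column comes from, here is a minimal sketch on a made-up two-pathway list (toy_list and its pathway names are hypothetical stand-ins for genesets_list). lengths() gives the number of genes per pathway, and rep() repeats each pathway name that many times, so the result lines up row for row with the flattened genes.

```r
# Toy stand-in for genesets_list (hypothetical, much shorter than the real one)
toy_list <- list(
  PATHWAY_A = c("GCK", "PGK1", "TPI1"),
  PATHWAY_B = c("CS", "FH")
)

# Number of genes in each pathway
n_genes <- lengths(toy_list)

# Repeat each pathway name once per gene it contains
pathway_id <- rep(names(toy_list), n_genes)
print(pathway_id)
```

Here PATHWAY_A appears three times and PATHWAY_B twice, matching the order in which the genes come out when the list is flattened.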
Update:
An even simpler solution:
df = data.frame(ID=rep(names(genesets_list), lengths(genesets_list)), geneName=unlist(genesets_list, use.names=FALSE))
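One thing to watch when flattening with unlist(): by default it builds element names by appending a running index to each list name, so names(unlist(x)) is not a clean pathway ID. The sketch below shows this on a made-up list (toy_list is hypothetical) and, as one possible alternative rather than the method used above, base R's stack(), which returns the two-column long format directly.

```r
# Toy stand-in for genesets_list (hypothetical)
toy_list <- list(
  PATHWAY_A = c("GCK", "PGK1", "TPI1"),
  PATHWAY_B = c("CS", "FH")
)

# unlist() appends an index to each list name by default
flat_names <- names(unlist(toy_list))
print(flat_names)

# stack() returns columns `values` and `ind`; reorder and rename them
long_df <- stack(toy_list)[, c("ind", "values")]
names(long_df) <- c("ID", "geneName")

# stack() stores the list names as a factor; convert if plain strings are preferred
long_df$ID <- as.character(long_df$ID)
print(long_df)
```

Stripping the trailing digits from flat_names with a regular expression is risky, since a pathway name may itself end in a digit, so repeating the names with rep() or using stack() tends to be safer.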