Tuesday, July 15, 2014

reshape2: convert table from wide to long format

I found this elegant note about reshape2 on Sean Anderson's blog:
http://seananderson.ca/2013/10/19/reshape.html

Basically, reshape2 is based around two key functions, melt and cast:
melt takes wide-format data and melts it into long-format data.
cast takes long-format data and casts it into wide-format data.

For example, this is wide format:

> head(fpkm)
ID                  FPKM.SRR1069188 FPKM.SRR1070986 FPKM.SRR1071289
ENSG00000240361.1      1.00000000        1.000000        1.000000
ENSG00000186092.4      1.00000000        1.000000        1.000000
ENSG00000237613.2      1.00000000        1.000000        1.000000
ENSG00000239906.1      0.05888838        5.139312        5.055983
ENSG00000241860.1      1.20237363        1.160175        1.085992
ENSG00000222623.1      1.00000000        1.000000        1.000000

Use melt to convert it to long format:

> require('reshape2')
> head(melt(fpkm))
No id variables; using all as measure variables
         variable      value
1 FPKM.SRR1069188 1.00000000
2 FPKM.SRR1069188 1.00000000
3 FPKM.SRR1069188 1.00000000
4 FPKM.SRR1069188 0.05888838
5 FPKM.SRR1069188 1.20237363
6 FPKM.SRR1069188 1.00000000
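
The "No id variables" message means melt found no character or factor column to treat as an identifier, so the gene IDs are missing from the result. One situation where this happens is when the IDs live in the row names rather than in a column; whether that was the case for fpkm here is an assumption. A minimal sketch on a made-up data frame (the name `toy` and its values are hypothetical) that copies the row names into a column before melting:

```r
library(reshape2)

# Hypothetical toy data: gene IDs stored as row names, values invented
toy <- data.frame(
  FPKM.S1 = c(1.0, 0.5),
  FPKM.S2 = c(1.5, 0.7),
  row.names = c("geneA", "geneB")
)

# melt does not carry row names along, so copy them into a real column first
toy$ID <- rownames(toy)

# Declare ID as the identifier; the two FPKM columns become measure variables
toy_long <- melt(toy, id.vars = "ID")
print(toy_long)
```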

Or you can set the names of the two new columns with variable.name and value.name:

> head(melt(fpkm, variable.name = "Sample",value.name ="FPKM"))
No id variables; using all as measure variables
           Sample       FPKM
1 FPKM.SRR1069188 1.00000000
2 FPKM.SRR1069188 1.00000000
3 FPKM.SRR1069188 1.00000000
4 FPKM.SRR1069188 0.05888838
5 FPKM.SRR1069188 1.20237363
6 FPKM.SRR1069188 1.00000000

If you want, you can also keep some of the columns as identifiers in the long format. For example, to keep the gene ID:

> head(melt(fpkm, id.vars = "ID", variable.name = "Sample", value.name = "FPKM"))
                 ID          Sample       FPKM
1 ENSG00000240361.1 FPKM.SRR1069188 1.00000000
2 ENSG00000186092.4 FPKM.SRR1069188 1.00000000
3 ENSG00000237613.2 FPKM.SRR1069188 1.00000000
4 ENSG00000239906.1 FPKM.SRR1069188 0.05888838
5 ENSG00000241860.1 FPKM.SRR1069188 1.20237363
6 ENSG00000222623.1 FPKM.SRR1069188 1.00000000
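
To try this without the original data, here is a minimal sketch on a small made-up data frame (the name `toy` and all values are hypothetical, not the real fpkm table). It passes the ID column through `id.vars`, the full name of melt's argument for identifier columns.

```r
library(reshape2)

# Hypothetical toy data: 3 genes x 2 samples, values invented for illustration
toy <- data.frame(
  ID = c("geneA", "geneB", "geneC"),
  FPKM.S1 = c(1.0, 0.5, 2.0),
  FPKM.S2 = c(1.5, 0.7, 2.5),
  stringsAsFactors = FALSE
)

# Keep ID as an identifier column; every other column is stacked
# into one (Sample, FPKM) pair per gene and sample
toy_long <- melt(toy, id.vars = "ID",
                 variable.name = "Sample", value.name = "FPKM")
print(toy_long)
```

Each gene now appears once per sample, so the long table has (number of genes) x (number of samples) rows.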

I will do the long-->wide example when I have a good case to show... :)

Update: see this post for the long-->wide conversion.
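
For reference, the reverse step can be sketched with dcast, the reshape2 cast function that returns a data frame. This is only an illustration on made-up data (the name `toy_long` and its values are hypothetical), not the example from the linked post.

```r
library(reshape2)

# Hypothetical long-format data: one row per gene and sample
toy_long <- data.frame(
  ID = rep(c("geneA", "geneB"), times = 2),
  Sample = rep(c("S1", "S2"), each = 2),
  FPKM = c(1.0, 0.5, 1.5, 0.7),
  stringsAsFactors = FALSE
)

# Formula reads rows ~ columns; value.var names the column that fills the cells
toy_wide <- dcast(toy_long, ID ~ Sample, value.var = "FPKM")
print(toy_wide)
```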
