Thursday, October 07, 2021

A bug related to R factor

Note a bug in my code today. Sometimes you need to put a certain level (e.g. healthy control) in the first position for your covariance. 

Here is my old code:

dds[[variable]]=factor(dds[[variable]])
levels(dds[[variable]])= union(variable_REF, levels(dds[[variable]])))

Note that this can cause problem. For example, you have two levels: HC and AD in your diagnosis. By default, setting the covariance to factor() will make the levels sorting in alphabetic order (e.g. AD then HC). In the 2nd line, if you force to change the levels to c("HC", "AD"), you are actually doing something like c(AD = "HC", HC = "AD"). This is not what you want. 

What you wanted is actually this:

dds[[variable]]=factor(dds[[variable]], levels= union(variable_REF, unique(dds[[variable]]))) 

* Too bad that blogger does not adapt to markdown language yet. 

No comments:

Post a Comment