Wednesday, June 24, 2015

median filter in AWK

Here is what median filter does from wikipedia:
To demonstrate, using a window size of three with one entry immediately preceding and following each entry, a median filter will be applied to the following simple 1D signal:
x = [2 80 6 3]
So, the median filtered output signal y will be:
y[1] = Median[2 2 80] = 2
y[2] = Median[2 80 6] = Median[2 6 80] = 6
y[3] = Median[80 6 3] = Median[3 6 80] = 6
y[4] = Median[6 3 3] = Median[3 3 6] = 3
i.e. y = [2 6 6 3].
Note that, in the example above, because there is no entry preceding the first value, the first value is repeated, as with the last value, to obtain enough entries to fill the window. This is one way of handling missing window entries at the boundaries of the signal.
Here is my awk code to implement this:
#!/bin/awk -f
# awk script to filter noise by sliding a window and taking mean or median per window
# Authos: Xianjun Dong
# Date: 2015-06-23
# Usage: _filter.awk values.txt
# bigWigSummary input.bigwig chr10 101130293 101131543 104 | _filter.awk -vW=5 -vtype=median
BEGIN{
  if(W=="") W=5; 
  if(type=="") type="median";
  half=(W-1)/2;

{
  for(i=1;i<=NF;i++) {
    array[half+1]=$i
    for(k=1;k<=half;k++){
      array[half+1-k]=(i<=k)?$1:$(i-k);
      array[half+1+k]=((i+k)>NF)?$NF:$(i+k);
    }
    if(type=="median") {asort(array); x=array[half+1];}
    if(type=="mean") {x=0; for(j=1;j<=W;j++) x+=array[j]; x=x/W;}
    printf("%s\t", x);
  }
  print "";
}

No comments:

Post a Comment