My learning notes for R, Unix, Perl, statistics, tools/resources, biology etc. everything about Bioinformatics
Tuesday, September 23, 2008
Wednesday, September 10, 2008
AWK learning note
1. AWK first updates its built-in variables every time it reads a line:
- When AWK reads a record (line) from the data file, it stores it in the built-in variable $0.
- AWK then immediately re-parses $0 into fields and stores them in $1, $2, ...
For example, after AWK reads the first record of the data file emp.dat,
"A125 Jenny 100 210", the program sees:
$0 is "A125 Jenny 100 210"
$1 is "A125"    $2 is "Jenny"
$3 is 100       $4 is 210
NF is 4         $NF is 210
NR is 1         FILENAME is "emp.dat"
where
NF: number of fields in the current $0
NR: number of records read so far
FILENAME: name of the file currently being processed
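The bookkeeping above can be checked with a one-liner. This is only a minimal sketch: the record is fed through a pipe instead of emp.dat, so FILENAME is left out, and the function name show_fields is just a label made up for this example.

```shell
# Print the built-in variables AWK has set after reading each record.
show_fields() {
    awk '{ printf "NR=%d NF=%d first=%s last=%s\n", NR, NF, $1, $NF }'
}

printf 'A125 Jenny 100 210\n' | show_fields
# prints: NR=1 NF=4 first=A125 last=210
```

Note that $NF is the value of the last field, while NF is the number of fields.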
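2. 'PATTERN {ACTION}' on the command line, or -f script.awk
The following two commands are equivalent, provided you have saved the script { print $2, $3 * $4 } into a file named pay1.awk:
$ awk -f pay1.awk emp.dat
$ awk ' { print $2, $3 * $4 } ' emp.dat
You can also give the -f option more than once, so that the main AWK program can use functions defined in other files that contain only AWK functions.
The syntax is:
awk -f main_program_file -f function_file data_file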
3. BEGIN/END and array in AWK
for example, we have a data file like:
Mary O.S. Arch. Discrete
Steve D.S. Algorithm Arch.
Wang Discrete Graphics O.S.
Lisa Graphics A.I.
Lily Discrete Algorithm
and the following script counts how many students take each course:
---------------------------------------
{ for( i=2; i<= NF; i++) Number[$i]++ }
END{
for(course in Number)
printf("%-10s %d\n", course, Number[course] )
}
---------------------------------------
comment:
a. NF is the number of fields in the current line (4 for the first three lines here, 3 for the last two). It is not the line number; that is NR.
b. END is an AWK reserved word and a kind of pattern, like BEGIN. The difference is that the END action runs after all lines have been processed, while the BEGIN action runs before any input is read. Both run only once.
c. $i is the i-th field of the current line. This differs from Perl, where $i is a variable named i; in AWK, variable names cannot begin with $.
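A runnable sketch of the counting idea above, with the data supplied through a here-document instead of a file. AWK visits array keys in no particular order, so the output is piped through sort here; the sort step and the function name count_courses are additions for this example, not part of the original script.

```shell
# Count how many students take each course (fields 2..NF of every line).
count_courses() {
    awk '
        { for (i = 2; i <= NF; i++) Number[$i]++ }   # $i is the i-th field
        END {
            for (course in Number)
                printf "%-10s %d\n", course, Number[course]
        }
    ' | sort
}

count_courses <<'EOF'
Mary O.S. Arch. Discrete
Steve D.S. Algorithm Arch.
Wang Discrete Graphics O.S.
Lisa Graphics A.I.
Lily Discrete Algorithm
EOF
```

With this data, Discrete is counted 3 times and A.I. once.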
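4. Shell commands inside AWK
An AWK program can read the output of a shell command through a pipe, for example:
---------------------------------------
BEGIN {
while ( ("who" | getline) > 0 ) n++
print n
}
---------------------------------------
Here who is a system command run by the shell, and getline is the AWK input command. The script counts the lines printed by who, i.e. the number of login sessions. Testing for > 0 stops the loop if getline fails, since it returns -1 on error.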
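A variant of the same pattern, as a sketch: the command is passed in as an AWK variable so that the output of any command can be counted, and close() releases the pipe afterwards. The function name count_lines_of and the variable names cmd and line are made up for this example.

```shell
# Count the lines a shell command prints, using AWK's "command | getline".
count_lines_of() {
    awk -v cmd="$1" 'BEGIN {
        n = 0
        while ((cmd | getline line) > 0)   # 1 = got a line, 0 = end, -1 = error
            n++
        close(cmd)                         # release the pipe when done
        print n
    }'
}

count_lines_of "echo one; echo two; echo three"   # prints 3
# count_lines_of "who" would print the number of login sessions
```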
5. File names in a script must be quoted with "",
for example,
---------------------------------------
BEGIN {
print " ID Number Arrival Time" > "today_rpt1"
print "===========================" > "today_rpt1"
}
{ printf(" %s %s\n", $1,$2 ) > "today_rpt1" }
---------------------------------------
Save it as reformat1.awk and run:
$ awk -f reformat1.awk arr.dat
Note:
a. If today_rpt1 is not quoted, AWK takes it as a variable name. An uninitialized variable has the value 0 or the null string, so the output does not go to the file you meant.
b. The redirection operator is '>', not '>>', even though each later print adds to the end of the file. With '>', AWK creates (or empties) the file the first time it is used and then keeps it open, so later output is added after the earlier output, as with '>>'. The only difference with '>>' is that, if the file already exists, its old content is kept when it is first opened. This is a little different from the shell, where every '>' starts the file from scratch.
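The difference from the shell can be seen in a small experiment. This is a sketch that writes to a temporary file created with mktemp; the function name write_report and the AWK variable out are invented for the example.

```shell
# Inside one AWK run, ">" empties the file only when it is first opened;
# every later print to the same name lands after the previous one.
write_report() {
    awk -v out="$1" 'BEGIN { print "header" > out }
                     { print $1 > out }'
}

report=$(mktemp)
echo "old content" > "$report"     # discarded by the first ">" inside awk
printf 'A125 x\nB007 y\n' | write_report "$report"
cat "$report"                      # three lines: header, A125, B007
rm -f "$report"
```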
6. Input and output commands in AWK
AWK input command: getline
AWK output command: print, printf
7. Three ways to run awk
a. $ awk '{print}' file1.txt file2.txt
b. $ awk -f myscript.awk file1.txt file2.txt
First save {print} into a file (myscript.awk).
c. $ myshell file1.txt file2.txt
Save awk '{print}' $* into a shell script named myshell. Here $* stands for all the arguments given after the script name. You can also use $1 for the first argument and $2 for the second.
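One caveat about form c, sketched below: an unquoted $* splits arguments that contain spaces, so "$@" is usually the safer spelling inside the wrapper. The function print_all stands in for the myshell script, and the file names are made up for the demonstration.

```shell
# Stand-in for the myshell script: pass every argument on to awk unchanged.
print_all() {
    awk '{ print }' "$@"     # "$@" keeps each argument as one word
}

dir=$(mktemp -d)
echo "first"  > "$dir/file one.txt"    # this name contains a space
echo "second" > "$dir/file2.txt"
print_all "$dir/file one.txt" "$dir/file2.txt"
rm -rf "$dir"
```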
8. FS (Field Separator) and RS (Record Separator)
By default, fields are separated by runs of blanks (spaces or tabs) and records by a newline '\n'. Both can be changed. In the script below, RS = "" makes blank lines separate the records, and FS = "\n" makes every line of a record one field (報告人 means "presenter", and 一. 二. 三. are the Chinese numerals 1, 2, 3):
--------------------------------------- make_report.awk -------------------------
BEGIN {
FS = "\n"
RS = ""
split( "一. 二. 三. 四. 五. 六. 七. 八. 九.", C_Number, " " )
}
{
printf("\n%s 報告人 : %s \n",C_Number[NR],$1)
for( i=2; i<= NF; i++)
printf(" %d. %s\n", i-1, $i)
}
--------------------------------------- week.rpt ------------------------------
張長弓
GNUPLOT 入門

吳國強
Latex 簡介
VAST-2 使用手冊
mathematica 入門

李小華
AWK Tutorial Guide Regular Expression
--------------------------------------- Output ------------------------
[xianjund@douglasgran data]$ awk -f make_report week.rpt
一. 報告人 : 張長弓
1. GNUPLOT 入門
二. 報告人 : 吳國強
1. Latex 簡介
2. VAST-2 使用手冊
3. mathematica 入門
三. 報告人 : 李小華
1. AWK Tutorial Guide Regular Expression
---------------------------------------
9. ARGC and ARGV[]
They work like argc and argv in C, but
a. ARGC and ARGV do not include the options -v and -f or their arguments, nor the program text given on the command line. For example, in
$ awk -vx=36 -f program1 data1 data2
or
$ awk '{ print $1 ,$2 }' data1 data2
the values are
ARGC=3
ARGV[0]="awk"
ARGV[1]="data1"
ARGV[2]="data2"
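The ARGC/ARGV rule above can be checked directly. This is a small sketch: the function name show_args is invented, data1 and data2 do not need to exist because a program with only a BEGIN block never opens its input files, and ARGV[0] is left out of the loop on the assumption that its exact value may depend on the awk implementation.

```shell
# Show what AWK itself sees as its arguments; -v x=36 is not counted.
show_args() {
    awk -v x=36 'BEGIN {
        print "ARGC=" ARGC
        for (i = 1; i < ARGC; i++)
            print "ARGV[" i "]=" ARGV[i]
    }' "$@"
}

show_args data1 data2
# ARGC=3
# ARGV[1]=data1
# ARGV[2]=data2
```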
Tuesday, September 09, 2008
png to ico
How to convert PNG to ICO icon file?
1. download the png2ico source code
2. install
unpack the tarball with tar, cd into the source directory, then run make
3. prepare a resized PNG (convert is the ImageMagick command), for example
convert image.png -resize 32x32 image.icon.png
4. png2ico
png2ico favicon.ico image.icon.png
You can also make an icon that contains images at multiple resolutions:
png2ico favicon.ico image.16x16.icon.png image.32x32.icon.png
reverse lines of file
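Use rev filename. It reverses the characters of every line; the order of the lines stays the same (to reverse the line order, use tac). For example,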
[xianjund@douglasgran hypotest]$ head test11.data
ENSG00000007372 ENSG00000109911 ENSG00000121690
ENSG00000007372 ENSG00000149100 ENSG00000170959
ENSG00000043355 ENSG00000175198 ENSG00000102452
ENSG00000172845 ENSG00000115840 ENSG00000091428
[xianjund@douglasgran hypotest]$ rev test11.data
09612100000GSNE 11990100000GSNE 27370000000GSNE
95907100000GSNE 00194100000GSNE 27370000000GSNE
25420100000GSNE 89157100000GSNE 55334000000GSNE
82419000000GSNE 04851100000GSNE 54827100000GSNE
It's cool, hmm?
to show multi line around grep result
grep can print several lines before or after each matching line. Here is a simple tip I discovered: -A N prints N lines after the match, and -B N prints N lines before it.
For example, test.data is like this:
ENSG00000007372 ENSG00000109911 ENSG00000121690
ENSG00000007372 ENSG00000149100 ENSG00000170959
ENSG00000043355 ENSG00000175198 ENSG00000102452
ENSG00000172845 ENSG00000115840 ENSG00000091428
ENSG00000172845 ENSG00000138430 ENSG00000128708
ENSG00000103449 ENSG00000103494 ENSG00000121274
ENSG00000104313 ENSG00000182674 ENSG00000140396
ENSG00000117707 ENSG00000136643 ENSG00000143499
ENSG00000121297 ENSG00000105176 ENSG00000178904
>grep ENSG00000138430 test.data
ENSG00000172845 ENSG00000138430 ENSG00000128708
while,
>grep ENSG00000138430 test.data -B1 -A3
ENSG00000172845 ENSG00000115840 ENSG00000091428
ENSG00000172845 ENSG00000138430 ENSG00000128708
ENSG00000103449 ENSG00000103494 ENSG00000121274
ENSG00000104313 ENSG00000182674 ENSG00000140396
ENSG00000117707 ENSG00000136643 ENSG00000143499
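The same behaviour on a tiny input, as a sketch. The function name with_context and the sample lines are made up; -A and -B are used as described above, and -C N is a shorthand for the same number of lines on both sides.

```shell
# Print the matching line plus 1 line before it and 2 lines after it.
with_context() {
    grep -B1 -A2 "$1"
}

printf 'line1\nline2\nTARGET\nline4\nline5\nline6\n' | with_context TARGET
# line2
# TARGET
# line4
# line5
```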
grep based on multiple words
For example, to find the lines that contain "you" or "me" in a file,
grep "you\|me" filename
In a basic regular expression you need to escape the OR operator ( | ) with a backslash ( \ ); otherwise grep treats | as a literal character to search for instead of a regular expression operator. Alternatively, use
grep -E "you|me" filename
or
egrep "you|me" filename
But to get the lines that contain both "you" and "me", you can use
grep "you" filename | grep "me"
or,
egrep "you.*me" filename
(the second form only matches when "you" comes before "me"). Both will also match lines like "your lovely meebo", which is not always what we want. So, to match the exact words, use
egrep "\<you\>.*\<me\>" filename
For more info about egrep, use "man egrep"
The caret ^ and the dollar sign $ are metacharacters that respectively match the empty string at the beginning and end of a line. The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it is not at the edge of a word.
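As a sketch of the whole-word idea, the two-step pipeline can be combined with grep's -w option, which only accepts matches that form whole words. Unlike a single "you.*me" pattern, this finds the two words in either order. The function name both_words and the sample lines are made up for this example.

```shell
# Keep the lines that contain both whole words, in either order.
both_words() {
    grep -w "$1" | grep -w "$2"
}

printf 'you and me\nyour lovely meebo\nme before you\n' | both_words you me
# you and me
# me before you
```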
replace word in a file
To replace a word in a file, use
perl -pi -e 's/abc/def/;' xyz
or
sed -e 's/abc/def/;' xyz > xyz_new
The perl command edits xyz in place, while this sed command writes the result to a new file. To make sed edit the file in place as well, add the -i option; the file is then overwritten with the result:
sed -e 's/abc/def/;' -i xyz
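One detail worth checking before running these on real files: s/abc/def/ changes only the first match on each line, and the g flag is needed to change all of them. A minimal sketch, with function names made up for this example:

```shell
# s/abc/def/ changes only the first match per line; the g flag changes all.
replace_first() { sed -e 's/abc/def/'; }
replace_all()   { sed -e 's/abc/def/g'; }

echo "abc abc" | replace_first   # def abc
echo "abc abc" | replace_all     # def def
```

The same applies to the perl one-liner, where the flag is written the same way (s/abc/def/g).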
Labels: linux, perl_command, sed, unix
How to redirect output to a file as well as display it out?
Credit of the following content to: http://linux.byexamples.com/archives/349/how-to-redirect-output-to-a-file-as-well-as-display-it-out/
To redirect standard output to a file, use '>':
echo "hello world" > test.txt
But what if I want to display it as well as store it in a file? Answer: tee
echo "hello world" | tee test.txt
Okay, that seems very easy. How about appending?
To append the standard output to a file, do this:
echo "hello world" >> test.txt
Append to a file and display it as well?
echo "hello world" | tee -a test.txt
Okay, how about dealing with standard output (stdout) and standard error (stderr)?
There are two different output streams: stdout and stderr. Normal output usually goes to stdout, and error messages go to stderr. Let's make a simple Python script that prints one line to stdout and one line to stderr.
#!/usr/bin/env python
import sys
sys.stdout.write("I am stdout\n")
sys.stderr.write("I am stderr\n")
OK, let's save the Python script as sout.py and try to redirect the output to a file.
$ ./sout.py > test.txt
I am stderr
Standard output is redirected to test.txt, but stderr is still printed.
What if I want stderr to be redirected and stdout displayed?
./sout.py 2> test.txt
I want both stored in the file.
./sout.py > test.txt 2>&1
Finally, I want both displayed and redirected to a file:
./sout.py 2>&1 | tee test.txt
Interesting, isn't it?
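The same experiments can be repeated without Python. In this sketch a shell function named sout (made up here as a stand-in for sout.py) writes one line to each stream, and a temporary file from mktemp replaces test.txt.

```shell
# A shell stand-in for sout.py: one line to stdout, one line to stderr.
sout() {
    echo "I am stdout"
    echo "I am stderr" >&2
}

log=$(mktemp)
sout > "$log" 2>&1        # both streams go into the file, nothing is shown
cat "$log"
sout 2>&1 | tee "$log"    # both streams are shown and saved
rm -f "$log"
```

The order matters in the first form: "> file" has to come before "2>&1", because 2>&1 sends stderr to wherever stdout points at that moment.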
Opening
I hope I can keep sharing what I learn with everyone, in HTML, Perl, Unix/Linux and Bioinformatics :)
// below is what appeared in the About page, which I think is better to move here now.
Francis Bacon once said, "reading maketh a full man, conference a ready man, and writing an exact man", and I would say "sharing makes a happy man".
I have been writing blogs since I was an undergraduate student. At that time, I treated it mostly as a diary of my personal life, sharing news and fun stuff with friends. I wrote my first technical blog post when I started my PhD in 2005. I was facing a totally new environment, both geographically and scientifically, and I had to learn Perl and Linux from scratch (before that I mainly worked in Windows and programmed in C and Visual C++). There was so much to learn; I enjoyed it most of the time but was also quite frustrated sometimes. Writing notes is the best way to learn new things efficiently. I began by using Google Notebook to take notes (by the way, I loved Google Notebook because of a very friendly Firefox extension that let me select text and right-click to send it to my notebook right from the page I was browsing). Sadly, Google Notebook was shut down recently. But anyway, I got into the good habit of taking notes while learning something new.
After that, I switched to Blogger. Hopefully it won't shut down like Notebook. But who knows? Judging from how the internet is developing, messages are becoming shorter and shorter and spreading faster and faster. I guess not many people write and read blogs these days. Google may close Blogger some day. Nevertheless, I will keep writing, daily if possible. I think it can help not only me but also those who read it.
So do you, I wish!
If you also like writing and sharing, please show your support and/or join me in writing. We can exchange links, or you can contribute to this blog if this field interests you too. ☺
2008.09.09
