One Tip Per Day: September 2008

Tuesday, September 23, 2008

Illustration of linux command

Wednesday, September 10, 2008

AWK learning note

1. AWK first updates its internal variables when it reads a line:

When AWK reads a record (line) from the data file, it stores the record in the built-in variable $0.
AWK then immediately re-splits $0 into fields and stores them in $1, $2, and so on.


For example, after AWK reads the first record of the data file emp.dat,
"A125 Jenny 100 210", the program sees:
$0 is "A125 Jenny 100 210"
$1 is "A125" $2 is "Jenny"
$3 is 100 $4 is 210
NF is 4 $NF is 210
NR is 1 FILENAME is "emp.dat"
where NF: number of fields in the current $0
NR: number of records read so far
FILENAME: name of the data file currently being processed
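A minimal sketch of the variables above, assuming a one-record emp.dat created on the spot in a scratch directory (the record comes from the example; the scratch directory and the `fields` variable are only for illustration):

```shell
# Build a one-record emp.dat in a scratch directory (illustrative only).
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'A125 Jenny 100 210\n' > emp.dat

# Print the variables awk sets after reading the record.
fields=$(awk '{ print $1, $2, NF, $NF, NR, FILENAME }' emp.dat)
echo "$fields"
```

With this single record, the line printed should be `A125 Jenny 4 210 1 emp.dat`.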

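2. 'PATTERN {ACTION}' or -f script.awk
The following two commands are equivalent:

$awk -f pay1.awk emp.dat
$awk ' { print $2, $3 * $4 } ' emp.dat

provided you have saved the script { print $2, $3 * $4 } in a file named pay1.awk.
You can also give "-f" more than once, so that the main AWK program can call functions
defined in another file that contains only AWK functions.
The syntax is:
awk -f main-program-file -f function-file data-file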
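As a rough illustration of the two-file form, here is a sketch in which main.awk, funcs.awk and the pay() helper are all made-up names; only the emp.dat record comes from the note above:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'A125 Jenny 100 210\n' > emp.dat

# funcs.awk holds only a function definition (pay is a hypothetical helper).
cat > funcs.awk <<'EOF'
function pay(rate, hours) { return rate * hours }
EOF

# main.awk holds the pattern-action rule and calls the helper.
cat > main.awk <<'EOF'
{ print $2, pay($3, $4) }
EOF

# awk reads both -f files as one program.
pay_line=$(awk -f main.awk -f funcs.awk emp.dat)
echo "$pay_line"
```

For the sample record this should print `Jenny 21000`.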

3. BEGIN/END and arrays in AWK
For example, we have a data file like:

Mary O.S. Arch. Discrete
Steve D.S. Algorithm Arch.
Wang Discrete Graphics O.S.
Lisa Graphics A.I.
Lily Discrete Algorithm

---------------------------------------

{ for (i = 2; i <= NF; i++) Number[$i]++ }
END {
for (course in Number)
printf("%-10s %d\n", course, Number[course])
}

---------------------------------------

Comments:
a. NF is the number of fields in the current line (4 for the first three lines here), not the line number; the line number is NR.
b. END is an AWK reserved word and, like BEGIN, a special kind of pattern. The BEGIN action runs once, before any input is read; the END action runs once, after all lines have been processed.
c. $i is the i-th field of the current line. This differs from Perl, where $i is an ordinary variable; in AWK a variable name cannot begin with $.
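The course-counting idea can be tried end to end with the sketch below. It assumes the sample data has five records (Lisa and Lily on separate lines) and uses reg.dat as a made-up file name. The order in which `for (course in Number)` visits the array is unspecified, so the output is piped through sort.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
# reg.dat is a hypothetical name for the sample data file.
cat > reg.dat <<'EOF'
Mary O.S. Arch. Discrete
Steve D.S. Algorithm Arch.
Wang Discrete Graphics O.S.
Lisa Graphics A.I.
Lily Discrete Algorithm
EOF

# Count how many students take each course (fields 2..NF of every line).
counts=$(awk '
    { for (i = 2; i <= NF; i++) Number[$i]++ }
    END { for (course in Number) printf("%-10s %d\n", course, Number[course]) }
' reg.dat | sort)
echo "$counts"
```

Seven distinct courses are listed; Discrete, for example, is counted three times.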

4. Shell command and awk command
for example:

---------------------------------------

BEGIN {
while ( "who" | getline ) n++
print n
}

---------------------------------------
where who is a system command run by the shell and getline is the awk input command; each call reads one line of the command's output, so n ends up as the number of lines who printed.
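A small variation on the same pattern, as a sketch: three echo commands stand in for who so that the count is predictable, and the loop tests `> 0` because getline returns -1 on error, which a bare `while (cmd | getline)` would treat as true. The names cmd, line and n_lines are only for this illustration.

```shell
# Count the lines produced by a shell command from inside awk.
n_lines=$(awk 'BEGIN {
    cmd = "echo one; echo two; echo three"   # stand-in for "who"
    while ((cmd | getline line) > 0) n++     # stop on end of output or error
    close(cmd)
    print n
}')
echo "$n_lines"
```

Here the command prints three lines, so the result is 3.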

5. A file name used in the script must be quoted with "",
for example:

---------------------------------------

BEGIN {
print " ID Number Arrival Time" > "today_rpt1"
print "===========================" > "today_rpt1"
}

{ printf(" %s %s\n", $1,$2 ) > "today_rpt1" }

---------------------------------------

$awk -f reformat1.awk arr.dat

Note:
a. If today_rpt1 is not quoted with "", AWK treats it as a variable, whose default value is 0 or the null string.
b. Use '>' here, not '>>', even though several lines are written to the same file. In AWK, '>' creates (or truncates) the file only the first time it is opened; later writes to the same file append to it, just like '>>'. The only difference is that '>>' keeps the existing contents when the file is first opened. This is a little different from the Unix shell, where every '>' truncates the file.
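The difference between the two operators can be checked with a quick sketch; the file names report_trunc.txt and report_append.txt are invented for the test, and both files start with one old line:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'old line\n' > report_trunc.txt
printf 'old line\n' > report_append.txt

# ">" truncates on the first open, then keeps appending while the file stays open.
awk 'BEGIN { print "first" > "report_trunc.txt"; print "second" > "report_trunc.txt" }'

# ">>" keeps the existing contents and appends to them.
awk 'BEGIN { print "first" >> "report_append.txt"; print "second" >> "report_append.txt" }'

trunc_lines=$(wc -l < report_trunc.txt | tr -d ' ')
append_lines=$(wc -l < report_append.txt | tr -d ' ')
echo "$trunc_lines $append_lines"
```

The first file ends up with two lines (the old line is gone), the second with three.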

6. Input and output command in Awk
AWK input command: getline
AWK output command: print, printf

7. Three ways to run awk
a. $awk '{print}' file1.txt file2.txt
b. $awk -f myscript.awk file1.txt file2.txt
(save {print} in a file named myscript.awk first)
c. $myshell file1.txt file2.txt
(save awk '{print}' $* in a shell script named myshell. Here $* means all the parameters after the script name; you can also use $1 for the first parameter and $2 for the second.)
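A sketch of the third way, assuming two throwaway input files. It uses "$@" instead of $* so that file names containing spaces are passed through intact, and runs the wrapper with sh so it does not need to be made executable first.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'one\n' > file1.txt
printf 'two\n' > file2.txt

# myshell: a tiny wrapper; "$@" hands every argument on to awk unchanged.
cat > myshell <<'EOF'
#!/bin/sh
awk '{ print }' "$@"
EOF

# Alternatively: chmod +x myshell, then call ./myshell directly.
wrapped=$(sh ./myshell file1.txt file2.txt)
echo "$wrapped"
```

The wrapper prints the contents of both files in order, exactly as form (a) would.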

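8. FS (Field Separator) and RS (Record Separator)
By default, FS is a single space, which makes AWK split fields on any run of whitespace (spaces, tabs), and RS is the newline '\n'. Both can be changed. In the example below, RS = "" makes blank lines separate the records, and FS = "\n" makes each line of a record one field: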

--------------------------------------- make_report.awk -------------------------

BEGIN {
FS = "\n"
RS = ""
split( "I. II. III. IV. V. VI. VII. VIII. IX.", C_Number, " " )
}
{
printf("\n%s Presenter : %s \n",C_Number[NR],$1)
for( i=2; i<= NF; i++)
printf(" %d. %s\n", i-1, $i)
}

--------------------------------------- week.rpt ------------------------------

張長弓
Introduction to GNUPLOT

吳國強
Introduction to Latex
VAST-2 User Manual
Introduction to mathematica

李小華
AWK Tutorial Guide Regular Expression
--------------------------------------- Output ------------------------
[xianjund@douglasgran data]$ awk -f make_report week.rpt

I. Presenter : 張長弓
1. Introduction to GNUPLOT

II. Presenter : 吳國強
1. Introduction to Latex
2. VAST-2 User Manual
3. Introduction to mathematica

III. Presenter : 李小華
1. AWK Tutorial Guide Regular Expression
---------------------------------------
9. ARGC and ARGV[]

These work as in C, except that they do not include awk's own options (-v, -f) and their arguments, nor the program text given on the command line. For example, in
$awk -vx=36 -f program1 data1 data2
or
$awk '{ print $1 ,$2 }' data1 data2

ARGC=3
ARGV[0]= "awk"
ARGV[1]="data1"
ARGV[2]="data2"
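To see this for yourself, a sketch like the following prints the argument vector from a BEGIN block. It skips ARGV[0] because its exact value depends on which awk implementation is installed; data1 and data2 do not need to exist, since a BEGIN-only program never opens them.

```shell
# -v x=36 is an awk option, so it does not show up in ARGV.
args=$(awk -v x=36 'BEGIN {
    for (i = 1; i < ARGC; i++) printf "%d=%s ", i, ARGV[i]
    print "ARGC=" ARGC
}' data1 data2)
echo "$args"
```

This should print `1=data1 2=data2 ARGC=3`.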

Tuesday, September 09, 2008

png to ico

How to convert PNG to ICO icon file?

1. download the png2ico source code

http://www.winterdrache.de/freeware/png2ico/

2. install

Unpack the archive with tar, cd into the source directory, then run make.

3. prepare resized png, for example

convert image.png -resize 32x32 image.icon.png

4. png2ico

png2ico favicon.ico image.icon.png

You can also make an icon that contains images at multiple resolutions:

png2ico favicon.ico image.16x16.icon.png image.32x32.icon.png

reverse lines of file

Use rev filename to reverse the characters of each line (to reverse the order of the lines instead, use tac). For example,

[xianjund@douglasgran hypotest]$ head test11.data

ENSG00000007372 ENSG00000109911 ENSG00000121690

ENSG00000007372 ENSG00000149100 ENSG00000170959

ENSG00000043355 ENSG00000175198 ENSG00000102452

ENSG00000172845 ENSG00000115840 ENSG00000091428

[xianjund@douglasgran hypotest]$ rev test11.data

09612100000GSNE 11990100000GSNE 27370000000GSNE

95907100000GSNE 00194100000GSNE 27370000000GSNE

25420100000GSNE 89157100000GSNE 55334000000GSNE

82419000000GSNE 04851100000GSNE 54827100000GSNE

It's cool, hmm?
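To make the distinction concrete, here is a sketch on a two-line sample (the sample text is invented). It uses rev for per-line character reversal and a small awk program for line-order reversal, as a portable stand-in for GNU tac.

```shell
sample=$(printf 'abc 123\ndef 456')

# rev reverses the characters within each line.
reversed_chars=$(printf '%s\n' "$sample" | rev)

# Reversing the order of lines: buffer them, then print from last to first.
reversed_lines=$(printf '%s\n' "$sample" |
    awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }')

echo "$reversed_chars"
echo "$reversed_lines"
```

The first result is `321 cba` followed by `654 fed`; the second is the original two lines in swapped order.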

to show multi line around grep result

grep can print several lines before or after each matching line: -A N prints N lines after the match, and -B N prints N lines before it.

For example, test.data is like this:

ENSG00000007372 ENSG00000109911 ENSG00000121690

ENSG00000007372 ENSG00000149100 ENSG00000170959

ENSG00000043355 ENSG00000175198 ENSG00000102452

ENSG00000172845 ENSG00000115840 ENSG00000091428

ENSG00000172845 ENSG00000138430 ENSG00000128708

ENSG00000103449 ENSG00000103494 ENSG00000121274

ENSG00000104313 ENSG00000182674 ENSG00000140396

ENSG00000117707 ENSG00000136643 ENSG00000143499

ENSG00000121297 ENSG00000105176 ENSG00000178904

>grep ENSG00000138430 test.data

ENSG00000172845 ENSG00000138430 ENSG00000128708

while,

>grep ENSG00000138430 test.data -B1 -A3

ENSG00000172845 ENSG00000115840 ENSG00000091428

ENSG00000172845 ENSG00000138430 ENSG00000128708

ENSG00000103449 ENSG00000103494 ENSG00000121274

ENSG00000104313 ENSG00000182674 ENSG00000140396

ENSG00000117707 ENSG00000136643 ENSG00000143499
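The same options on a smaller, made-up file (ctx.txt and its contents are assumptions for this sketch) make the window easy to count:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'line1\nline2\ntarget\nline4\nline5\nline6\nline7\n' > ctx.txt

# One line before the match, the match itself, and three lines after it.
context=$(grep -B1 -A3 target ctx.txt)
echo "$context"
```

Five lines come back: line2, target, line4, line5 and line6.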

grep based on multiple words

For example, to find lines that contain "you" or "me" in a file:

grep "you\|me" filename

You need to escape the OR symbol ( | ) with a backslash ( \ ); otherwise grep treats it as a literal character to search for instead of a regular-expression operator. Alternatively, use

grep -E "you|me" filename

egrep "you|me" filename

But, to get line with both "you" and "me", you can use

grep "you" filename | grep "me"

or,

egrep "you.*me" filename

but this also matches lines like "your lovely meebo", which is sometimes not what we want (and it only matches when "you" comes before "me"). So, to match the exact words, use

egrep "\<you\>.*\<me\>" filename

For more info about egrep, use "man egrep"

The caret ^ and the dollar sign $ are metacharacters that respectively match the empty string at the beginning and end of a line. The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it is not at the edge of a word.
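The variants can be compared side by side with the sketch below, assuming GNU grep (the \< and \> anchors and the -w option are extensions, not POSIX) and an invented three-line words.txt:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
cat > words.txt <<'EOF'
your lovely meebo
you and me
me, then you
EOF

# Either substring: all three lines match.
either=$(grep -c -E 'you|me' words.txt)

# Whole words, "you" before "me": only the second line matches.
ordered=$(grep -E '\<you\>.*\<me\>' words.txt)

# Whole words in any order: chain two greps with -w.
both=$(grep -w 'you' words.txt | grep -w 'me')

echo "$either"; echo "$ordered"; echo "$both"
```

Chaining two greps is the simplest way to require both words regardless of order, since a single "you.*me" pattern fixes the order.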

replace word in a file

To replace a word in a file, use

perl -pi -e 's/abc/def/;' xyz

sed -e 's/abc/def/;' xyz > xyz_new

To make the replacement in place with sed, add the -i option; the file itself is then modified by the command.

sed -e 's/abc/def/;' -i xyz
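One detail worth checking is that s/abc/def/ only replaces the first occurrence on each line; adding the g flag replaces all of them. The sketch below shows both on a made-up two-line file named xyz, then edits the file in place with -i.bak, a form that should work with both GNU and BSD sed and leaves a backup copy in xyz.bak.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'abc abc\nxyz abc\n' > xyz

# Without g: only the first match on each line is replaced.
first_only=$(sed -e 's/abc/def/' xyz)

# With g: every match on each line is replaced.
all_hits=$(sed -e 's/abc/def/g' xyz)

# In-place edit, keeping the original as xyz.bak.
sed -i.bak -e 's/abc/def/g' xyz

echo "$first_only"
echo "$all_hits"
```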

How to redirect output to a file as well as display it out?

Credit of the following content to: http://linux.byexamples.com/archives/349/how-to-redirect-output-to-a-file-as-well-as-display-it-out/

To redirect standard output to a file is easy, you just need to use the redirection symbol, for example:

echo "hello world" > test.txt

But what if I want to display it out as well as store into a file? Answer: tee

echo "hello world" | tee test.txt

Okay it seems very easy, how about append?

To append the standard output to a file, you do this:

echo "hello world" >> test.txt

Append to the file and display it as well?

echo "hello world" | tee -a test.txt

Okay, how about dealing with standard output(stdout) and standard error(stderr)?
There are two different output streams: stdout and stderr. Normal output usually goes to stdout, and error messages go to stderr. Let's make a simple Python script that prints one line to stdout and one line to stderr.

#!/usr/bin/env python

import sys

sys.stdout.write("I am stdout\n")

sys.stderr.write("I am stderr\n")

OK, let's save the Python script as sout.py and try to redirect the output to a file.

$ ./sout.py > test.txt
I am stderr

Standard output is redirected to test.txt, but stderr is still printed to the terminal.

What if I want stderr to be redirected and stdout displayed?

./sout.py 2> test.txt

I want both stored in the file.

./sout.py > test.txt 2>&1

Finally, I want both displayed and saved to a file:

./sout.py 2>&1 | tee test.txt

Interesting isn’t it?
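The redirections above can be tried without Python. In this sketch a shell function named emit (a made-up stand-in for sout.py) writes one line to each stream, and the output file names are likewise invented:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir"

# emit: hypothetical stand-in for sout.py, one line per stream.
emit() {
    echo "I am stdout"
    echo "I am stderr" >&2
}

emit > out_only.txt 2> err_only.txt      # each stream to its own file
emit > both.txt 2>&1                     # both streams into one file
shown=$(emit 2>&1 | tee both_tee.txt)    # both displayed and saved
echo "$shown"
```

The order matters in `> both.txt 2>&1`: stdout is pointed at the file first, then stderr is made a copy of it.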

Opening

I hope I can keep sharing what I learn with everyone, in HTML, Perl, Unix/Linux and Bioinformatics :)

// below is what appeared in the About page, which I think is better to move here now.

Francis Bacon once said, "reading maketh a full man, conference a ready man, and writing an exact man", and I would say "sharing makes a happy man".

I have been writing blogs since I was an undergraduate student. At that time, I took it mostly as a diary of my personal life, sharing news and fun stuff with friends. I wrote my first technical blog when I started my PhD in 2005. I was facing a totally new environment, both geographically and scientifically. I had to learn Perl and Linux from scratch (before that I mainly worked in Windows and programmed in C and Visual C++). There was so much to learn; I enjoyed it most of the time but was also quite frustrated sometimes. Writing notes is the best way to learn new stuff efficiently. I began by using Google Notebook to make notes (by the way, I loved Google Notebook because there was a very friendly Firefox extension that let me select text and right-click to send it to my notebook right from the page I was browsing). Sadly, Google Notebook was shut down recently. But anyway, I got into the good habit of making notes while learning something new. After that, I switched to Blogger. Hopefully it won't shut down like Notebook. But who knows? Judging from the trend of internet development, messages become shorter and shorter and spread faster and faster. I guess not so many people write and read blogs these days. Google may close Blogger some day. Nevertheless, I will keep trying to write daily (ideally). I think it can help not only me, but also those who read it.

So do you, I wish!

If you also like writing and sharing, please show your support and/or join me in the writing. We can exchange links, or you can contribute to this blog if it is also your field of interest.☺

2008.09.09
