One Tip Per Day: Basic knowledge for a bioinformatician

Very often, especially when I am interviewing for a job or talking with someone as knowledgeable as Xiaopeng, I feel that my body of knowledge is full of "holes". How awkward! I guess I am not the only one who feels this way.

As a senior-in-age-but-not-senior-in-knowledge bioinformatician, I would seriously recommend that anyone who wants to work in this field acquire basic knowledge of the following subjects (the ones I can think of):
1. probability and statistics (not everyone knows the difference between them)
2. machine learning (the four-element circle: data + algorithm + model + criteria)
3. program design (knowing how to write a script does not mean you know how to program; a good programmer should learn how to write code in a reusable, inheritable manner)
4. algorithms and data structures (many people know some algorithms, but truly understanding them is not an easy task. Bin indexing is a good example: it borrows the idea of a search tree to store and query genomic coordinates very quickly.)
5. knowing how to appreciate a scientific work (a paper can be good in terms of (1) data sources, (2) method, and/or (3) idea. It is also important to tell good papers from junk papers; I feel it is very important to sharpen one's sense of 'smell' for a paper.)
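
To make the bin indexing example in item 4 concrete, here is a minimal sketch in Python. It assumes that "bin index" refers to the hierarchical binning scheme used by the UCSC Genome Browser and described in the SAM format specification; the post does not say which implementation it has in mind. In that scheme each bin splits into eight children (so it is an 8-way tree rather than a strictly binary one), with bin sizes from 512 Mb down to 16 kb. The function names reg2bin and reg2bins follow the SAM specification, while the LEVELS table is just a name chosen for this sketch.

```python
# (shift, offset) per level, finest first: 16 kb, 128 kb, 1 Mb, 8 Mb, 64 Mb.
# The offset is the number of the first bin on that level; bin 0 is the
# 512 Mb root. Values follow the scheme in the SAM specification.
LEVELS = [(14, 4681), (17, 585), (20, 73), (23, 9), (26, 1)]


def reg2bin(beg, end):
    """Return the smallest bin that fully contains the 0-based,
    half-open interval [beg, end)."""
    end -= 1  # work with the last covered position
    for shift, offset in LEVELS:
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # only the root bin contains the interval


def reg2bins(beg, end):
    """Return every bin that may hold a feature overlapping [beg, end)."""
    end -= 1
    bins = [0]  # the root bin always overlaps
    for shift, offset in reversed(LEVELS):  # coarse to fine
        first = offset + (beg >> shift)
        last = offset + (end >> shift)
        bins.extend(range(first, last + 1))
    return bins


if __name__ == "__main__":
    # A feature is stored under reg2bin(); a query only needs to look at
    # the handful of bins returned by reg2bins() instead of every feature.
    print(reg2bin(100000, 100500))
    print(reg2bins(100000, 100500))
```

The speed comes from the fact that a query touches only a few bins per level, so most stored features are never examined.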

I found this nice reading list from Hendrik's page (http://www.liacs.nl/~hoogeboo/mcb/nature_primer.html)

How to apply de Bruijn graphs to genome assembly
(Phillip E C Compeau, Pavel A Pevzner & Glenn Tesler)
November 2011, Vol 29, No 11; pp 987 - 991
doi: 10.1038/nbt.2023 (?)
Analyzing 'omics data using hierarchical models
(Hongkai Ji & X Shirley Liu)
April 2010, Vol 28, No 4; pp 337 - 340
doi: 10.1038/nbt.1619 (?)
What is flux balance analysis?
(Jeffrey D Orth, Ines Thiele & Bernhard Ø Palsson)
March 2010, Vol 28, No 3; pp 245 - 248
doi: 10.1038/nbt.1614 (?)
How does multiple testing correction work?
(William S Noble)
December 2009, Vol 27, No 12; pp 1135 - 1137
doi: 10.1038/nbt1209-1135 (?)
How to visually interpret biological data using networks
(Daniele Merico, David Gfeller & Gary D Bader)
October 2009, Vol 27, No 10; pp 921 - 924
doi: 10.1038/nbt.1567 (?)
How to map billions of short reads onto genomes
(Cole Trapnell & Steven L Salzberg)
May 2009, Vol 27, No 5; pp 455 - 457
doi: 10.1038/nbt0509-455 (?)
SNP imputation in association studies
(Eran Halperin & Dietrich A Stephan)
April 2009, Vol 27, No 4; pp 349 - 351
doi: 10.1038/nbt0409-349 (?)
Maximizing power in association studies
(Eran Halperin & Dietrich A Stephan)
March 2009, Vol 27, No 3; pp 255 - 256
doi: 10.1038/nbt0309-255 (?)
Understanding genome browsing
(Melissa S Cline & W James Kent)
February 2009, Vol 27, No 2; pp 153 - 155
doi: 10.1038/nbt0209-153 (?)
What are decision trees?
(Carl Kingsford & Steven L Salzberg)
September 2008, Volume 26, No 9; pp 1011 - 1013
doi: 10.1038/nbt0908-1011 (?)
What is the expectation maximization algorithm?
(Chuong B Do & Serafim Batzoglou)
August 2008, Volume 26, No 8; pp 897 - 899
doi: 10.1038/nbt1406 (?)
What is principal component analysis?
(Markus Ringnér)
March 2008, Volume 26, No 3; pp 303 - 304
doi: 10.1038/nbt0308-303 (?)
What are artificial neural networks?
(Anders Krogh)
February 2008, Volume 26, No 2; pp 195 - 197
doi: 10.1038/nbt1386 (?)

How does eukaryotic gene prediction work?
(Michael R Brent)
August 2007, Volume 25, No 8; pp 883 - 885
doi: 10.1038/nbt0807-883 (?)
How do shotgun proteomics algorithms identify proteins?
(Edward M Marcotte)
July 2007, Volume 25, No 7; pp 755 - 757
doi: 10.1038/nbt0707-755 (?)
What is a support vector machine?
(William S Noble)
December 2006, Volume 24, No 12; pp 1565 - 1567
doi: 10.1038/nbt1206-1565 (?)
How does DNA sequence motif discovery work?
(Patrik D'haeseleer)
August 2006, Volume 24, No 8; pp 959 - 961
doi: 10.1038/nbt0806-959 (?)
What are DNA sequence motifs?
(Patrik D'haeseleer)
April 2006, Volume 24, No 4; pp 423 - 425
doi: 10.1038/nbt0406-423 (?)
Inference in Bayesian networks
(Chris J Needham, James R Bradford, Andrew J Bulpitt & David R Westhead)
January 2006, Volume 24, No 1; pp 51 - 53
doi: 10.1038/nbt0106-51 (?)
How does gene expression clustering work?
(Patrik D'haeseleer)
December 2005, Volume 23, No 12; pp 1499 - 1501
doi: 10.1038/nbt1205-1499 (?)
How do RNA folding algorithms work?
(Sean R Eddy)
November 2004, Volume 22, No 11; pp 1457 - 1458
doi: 10.1038/nbt1104-1457 (?)
What is a hidden Markov model?
(Sean R Eddy)
October 2004, Volume 22, No 10; pp 1315 - 1316
doi: 10.1038/nbt1004-1315 (?)
What is Bayesian statistics?
(Sean R Eddy)
September 2004, Volume 22, No 9; pp 1177 - 1178
doi: 10.1038/nbt0904-1177 (?)
Where did the BLOSUM62 alignment score matrix come from?
(Sean R Eddy)
August 2004, Volume 22, No 8; pp 1035 - 1036
doi: 10.1038/nbt0804-1035 (?)
What is dynamic programming?
(Sean R Eddy)
July 2004, Volume 22, No 7; pp 909 - 910
doi: 10.1038/nbt0704-909 (?)

Getting Started in ...

Getting Started in Gene Orthology and Functional Analysis
(Fang G, Bhardwaj N, Robilotto R, Gerstein MB)
PLoS Comput Biol (2010) 6(3): e1000703;
doi: 10.1371/journal.pcbi.1000703 (?)
Getting Started in Structural Phylogenomics
(Sjölander K)
PLoS Comput Biol (2010) 6(1): e1000621;
doi: 10.1371/journal.pcbi.1000621 (?)
Getting Started in Gene Expression Microarray Analysis
(Slonim DK, Yanai I)
PLoS Comput Biol (2009) 5(10): e1000543;
doi: 10.1371/journal.pcbi.1000543 (?)
Getting Started in Text Mining: Part Two.
(Rzhetsky A, Seringhaus M, Gerstein MB)
PLoS Comput Biol (2009) 5(7): e1000411;
doi: 10.1371/journal.pcbi.1000411 (?)
Getting Started in Computational Mass Spectrometry-Based Proteomics.
(Vitek O)
PLoS Comput Biol (2009) 5(5): e1000366;
doi: 10.1371/journal.pcbi.1000366 (?)

Getting Started in Computational Immunology.
(Kleinstein SH)
PLoS Comput Biol (2008) 4(8): e1000128;
doi: 10.1371/journal.pcbi.1000128 (?)
Getting Started in Biological Pathway Construction and Analysis.
(Viswanathan GA, Seto J, Patil S, Nudelman G, Sealfon SC)
PLoS Comput Biol (2008) 4(2): e16;
doi: 10.1371/journal.pcbi.0040016 (?)
Getting Started in Text Mining
(Cohen KB, Hunter L)
PLoS Comput Biol (2008) 4(1): e20;
doi: 10.1371/journal.pcbi.0040020 (?)
Getting Started in Probabilistic Graphical Models.
(Airoldi EM)
PLoS Comput Biol (2007) 3(12): e252;
doi: 10.1371/journal.pcbi.0030252 (?)
Getting Started in Tiling Microarray Analysis
(Liu XS)
PLoS Comput Biol (2007) 3(10): e183;
doi: 10.1371/journal.pcbi.0030183 (?)

Ten Simple Rules

The Ten Simple Rules series of editorials also has its own page at the PLoS journal site, so a link is now all you need to read 'Ten Simple Rules for Getting Published', '...for a Good Poster Presentation', and so on.
On the Process of Becoming a Great Scientist
(Giddings MC)
PLoS Comput Biol (2008) 4(2): e33;
doi: 10.1371/journal.pcbi.0040033 (?)