43 Bioinformatics “Facts”
Origins of names
GCG, the old bioinformatics package, was so named after the authors kept high-fiving each other, shouting “good code guys!”. (GCG is a software package for the analysis of gene and protein sequences.)
Bowtie is so named because “it is almost impossible to tie”, referring to code that avoids a “race condition” when using multiple processors.
TopHat is so named because it was the first spliced RNA-Seq aligner, and when it worked first time, the authors shouted “Top that!”.
Velvet is so named because @dzerbino wore velvet gloves when coding it (via @pathogenomenick)
The Tuxedo suite is so named because only the ‘privileged’ know how to use it! #bioinformaticsfun (via @harshinamdar)
@BenLangmead wrote Bowtie while wearing a tuxedo but he did all the testing in zip-up onesie batman pajamas (via @coletrapnell)
Heng Li writes all his code in x86 assembly language, and uses a C decompiler before releasing it. @lh3lh3 (via @torstenseemann)
The SRA (short read archive) is the best known of the archives, and not many people know or use the MRA (medium read archive), the KLRA (kinda long read archive) and the LRA (long read archive). (SRA actually stands for Sequence Read Archive.)
EBI (FBI) actually stands for “European bureau of investigation”. It’s a front for the EU secret service, collecting genomic info (via @klmr)
Illumina is short for Illuminati, the shadowy organisation that controls sequencing worldwide. (via @neilfws)
The HMMer package was so named when someone asked how it worked, and the developers said “Hmmmm… errr…”. (via @mgollery)
Hidden Markov Models are like the recipe for Kentucky Fried Chicken. There are only three people in the world who understand small parts of how HMMs work, and only when they get together do they know the full picture.
Casual banter
BLAST is so fast, the authors had to deliberately slow down the code so it doesn’t overheat the servers.
The HGAP assembler is actually an elaborate front-end hiding three thousand slave laborers all running GAP4 (via @IanGoodhead)
The @PacBio machines are so large because inside is an Illumina machine + a bioinformatician running assemblies (via @gedankenstuecke)
NCBI’s bacterial annotation takes 6 weeks because it’s done manually by work experience students pasting ORFs into web BLAST (via @torstenseemann)
The p in p-value actually stands for p-otentially interesting! (via @jessenleon)
The e in e-value stands for excellent, as in “that’s an excellent BLAST hit”
The EBI is an elaborate front-end to NCBI services. (Nowadays EBI keeps getting better, and China also has more and better data platforms of its own.)
Europe PubMed Central has only ever been accessed by people accidentally clicking on links. 100% of visitors immediately bounce to pubmed.com.
Some numbers
The number of replicates needed for your RNA-seq experiment equals the impact factor of the journal you want to publish in (via @torstenseemann)
99.5% of people who cite Altschul et al have never read the paper. (This is the paper that published BLAST.)
Over 1 billion people have searched the NCBI protein database for their own name.
The word ELVIS appears 35 times in human peps (GRCh38). ELVISLIVES appears 0 times. The king (Elvis) has left the genome #slowday (via @rdemes)
A single anonymous donor, RP11, accounts for 72 percent of the human reference genome (via CanGenom)
There are now more journals than papers.
It has been calculated that there are twice as many data formats as there are Bioinformaticians (via @mgollery)
FASTA 80 character line wrapping was invented to standardise data sharing using MS Word (via @IanGoodhead)
Nine out of ten Bioinformaticians prefer Excel (via @CIgenomics)
About sequencing factories
BGI exclusively publish in Nature journals because their papers are first rejected by GigaScience.
BGI actually only have one HiSeq, but it is made to look like hundreds by a set-up of mirrors, like that bit in Enter the Dragon (via @froggleston) (Nowadays we all use BGI-series instruments.)
If you stand in front of a mirror and say HiSeq 3 times, an Illumina staff member will show up holding the HiSeq X Ten system (via @nazeetafatima)
Illumina reads are short because, before the development of BaseSpace, they were delivered via Twitter (via @RoyChaudhuri)
Base qualities are called Phred scores in honour of Fred Sanger who developed DNA sequencing. #101bioinfofunfacts (via @torstenseemann)
About bioinformaticians
CriMap was called CriMap because users do an awful lot of crying before they get a half decent map. (via @dj_de_koning)
If you amass the de-bugging tears of a bioinformatician it is enough to fill an Olympic size swimming pool annually (via @paulhoskisson)
The majority of bioinformaticians can’t pronounce de Bruijn properly
The consumption rate of coffee (+ beer) among Bioinformaticians from around the world is increasing every year. TRUE FACT! (via @NazeefaFatima)
In a recent public survey of the 100 most desirable jobs, bioinformatician was a close second to astronaut (via @dynomics)
Pet Bioinformaticians are paid with cuddling (via @riccombeni)
Spike-ins are like gold (via @nomad421)
It’s easy! You only have to download this database in which all the genes have only one ID and you can retrieve the IDs in the most important databases (via @jorjial)
If you’ve never shown the NIH sequencing costs plot in a talk/lecture you’re not a real bioinformatician (via @AliciaOshlack)

https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data
http://www.opiniomics.org/101-bioinformatics-facts/