43个生物信息学“事实”
名称来历
GCG
, the old bioinformatics package, was named after the authors kept high-fiving each other, shouting “g
oodc
odeg
uys!”. (GCG is a software package for the analyses of gene and protein sequences.)Bowtie
is named so because “it is almost impossible to tie”, referring to code to avoid a “race condition” when using multiple processors.TopHat
is named do because it was the first spliced RNA-Seq aligner, and when it worked first time, the authors shouted `Top that!``.Velvet
is so named because @dzerbino worevelvet gloves
(天鹅绒手套) when coding it (via @pathogenomenick)Tuxedo suit is so named that only ‘privileged’ (特权阶层) know how to use it ! #bioinformaticsfun (via @harshinamdar)
@BenLangmead wrote Bowtie while wearing a tuxedo but he did all the testing in zip-up onesie batman pajamas (via @coletrapnell)
Heng Li
writes all his code in x86 assembly language, and uses a C decompiler before releasing it. @lh3lh3 (via @torstenseemann)The
SRA
(short read archive) is the best known of the archives, and not many people know or use theMRA
(medium read archive), theKLRA
(kinda long read archive) and theLRA
(long read archive). (SRA: sequence read archive)EBI
(FBI) actually stands for “European bureau of investigation”. It’s a front of the EU secret service, collecting genomic info (via @klmr)Illumina
is short forIlluminati
(光明会), the shadowy organisation that controls sequencing worldwide. (via @neilfws)The
HMMer
package was so named when someone asked how it worked, and the developers saidHmmmm… errr…
. (via @mgollery)Hidden Markov Models
are like the recipe for Kentucky Fried Chicken. There are onlythree
people in the world who understand small parts of how HMMs work, and only when they get together do they know thefull
picture.
随性调侃
BLAST is so fast, the authors had to deliberately slow down the code so it doesn’t overheat the servers.
The
HGAP
assembler is actually an elaborate front-end hidingthree thousand slave laborers
all runningGAP4
(via @IanGoodhead)The @PacBio machines are so large because inside’s an
Illumina machine
+a bioinformatician
running assemblies (via @gedankenstuecke)NCBI’s bacterial annotation takes 6 weeks because it’s done manually by work experience students
pasting ORFs
into web BLAST (via @torstenseemann)The
p
inp-value
actually stands for p-otentially interesting! (via @jessenleon)The
e
in e-value stands forexcellent
, as in “that’s an excellent BLAST hit”The
EBI
is an elaborate front-end to NCBI services. (现在EBI也做的越来越好,国内也有了更多越来越好的数据平台)Europe PubMed Central has only ever been accessed by people accidentally clicking on links. 100% of visitors immediately bounce to
pubmed.com
.
一些数字
The
number of replicates
needed for your RNA-seq experiment equals theimpact factor
of the journal you want to publish in (via @torstenseemann)99.5% of people who cite
Altschul et al
have never read the paper. (发表了BLAST的那篇文章)Over 1 billion people have searched the NCBI
protein
database for their ownname
.The word
ELVIS
appears 35 times in human peps (GRCh38).ELVISLIVES
appears 0 times. The king (猫王Elvis) has left the genome #slowday (via @rdemes) -A single anonymous donor,
RP11,
accounts for72
percent of the human reference genome (via CanGenom)There are now more journals than papers.
It has been calculated that there are twice as many
data formats
as there are Bioinformaticians (via @mgollery)FASTA
80
character line wrapping was invented to standardise data sharing using MS Word (via @IanGoodhead)Nine out of ten Bioinformaticians prefer
Excel
(via @CIgenomics)
关于测序工厂
BGI exclusively publish in
Nature
journals because their papers are first rejected byGigascience
.BGI actually only have
one
HiSeq but made to look like hundreds by a set up of mirrors, like that bit inEnter the Dragon
(via @froggleston) (现在我们都用BGI
系列了)If you stand in front of a mirror and say
HiSeq
3 times, Illumina staff member will show up holding theHiSeq X Ten
system (via @nazeetafatima)Illumina reads are
short
as before the development of Basespace they were delivered viaTwitter
(via @RoyChaudhuri)Base qualities are called
Phred
scores in honour ofFred Sanger
who developed DNA sequencing. #101bioinfofunfacts (via @tostenseemann)
关于生信工作者
CriMap
was called CriMap because users do an awful lot of crying before they get a half decent map. (via @dj_de_koning)If you amass the
de-bugging tears
of a bioinformatician it is enough to fillan Olympic size swimming pool
annually (via @paulhoskisson)The majority of bioinformaticians can’t pronounce
de Bruijn
properlyThe consumption rate of coffee (+ beer ) among Bioinformaticians from around the world is increasing every year. TRUE FACT! (via @NazeefaFatima)
In a recent public survey of the
100
most desirable jobs,bioinformatician
was a close second toastronaut
(via @dynomics)Pet Bioinformaticians are paid with
cuddling
(via @riccombeni)Spike-ins are like gold (via @nomad421)
It’s easy! You only have to download this database in which all the genes have only one ID and you can retrieve the IDs in the most important databases (via @jorjial)
If you’ve never shown the
NIH sequencing costs plot
in talk/lecture you’re not a real bioinformatician (via @AliciaOshlack)
https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data
http://www.opiniomics.org/101-bioinformatics-facts/
往期精品(点击图片直达文字对应教程)
后台回复“生信宝典福利第一波”或点击阅读原文获取教程合集
(请备注姓名-学校/企业-职务等)