Can Python draw beautiful complex heatmaps too?

生信宝典


2023-01-05 15:23

WeChat official account: 「Computational Epigenetics」

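Covering bioinformatics and computational epigenetics. For questions or suggestions, please leave a message on the official account.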

Researchers who often plot in R are probably very familiar with ComplexHeatmap (https://jokergoo.github.io/ComplexHeatmap-reference/book/). The heatmaps it draws are both professional and attractive.

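Unfortunately, Python has long lacked a package that can draw equally good-looking complex heatmaps. We also need heatmaps when doing machine learning or handling big data in Python, and switching back and forth between Python and R is tedious and inefficient.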

Today we introduce a package that produces ComplexHeatmap-style figures in Python: PyComplexHeatmap (https://github.com/DingWB/PyComplexHeatmap). Let's go straight to the code and figures below (tutorial adapted from: https://github.com/DingWB/PyComplexHeatmap/blob/main/examples.ipynb):

1. Import the packages
      
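import os, sys
import numpy as np
import pandas as pd
import PyComplexHeatmap
from PyComplexHeatmap import *
# Jupyter magic: show figures inline (remove this line in a plain .py script)
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 120    # on-screen resolution
plt.rcParams['savefig.dpi'] = 300   # resolution of saved files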
2. Quick start
      
# Generate an example dataset (10 samples)
df = pd.DataFrame(['AAAA1'] * 5 + ['BBBBB2'] * 5, columns=['AB'])
df['CD'] = ['C'] * 3 + ['D'] * 3 + ['G'] * 4
df['EF'] = ['E'] * 6 + ['F'] * 2 + ['H'] * 2
df['F'] = np.random.normal(0, 1, 10)
df.index = ['sample' + str(i) for i in range(1, df.shape[0] + 1)]
df_box = pd.DataFrame(np.random.randn(10, 4), columns=['Gene' + str(i) for i in range(1, 5)])
df_box.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['TMB1', 'TMB2'])
df_bar.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_scatter = pd.DataFrame(np.random.uniform(0, 10, 10), columns=['Scatter'])
df_scatter.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
# Heatmap matrix: 50 features x 10 samples, with one missing value
df_heatmap = pd.DataFrame(np.random.randn(50, 10), columns=['sample' + str(i) for i in range(1, 11)])
df_heatmap.index = ["Fea" + str(i) for i in range(1, df_heatmap.shape[0] + 1)]
df_heatmap.iloc[1, 2] = np.nan

plt.figure(figsize=(6, 12))
# axis=1: these are column annotations, drawn on top of the heatmap
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB, add_text=True), axis=1,
                           CD=anno_simple(df.CD, colors={'C': 'red', 'D': 'yellow', 'G': 'green'}, add_text=True),
                           Exp=anno_boxplot(df_box, cmap='turbo'),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar),
                           )
cm = ClusterMapPlotter(data=df_heatmap, top_annotation=row_ha, col_split=2, row_split=3, col_split_gap=0.5,
                       row_split_gap=1, col_dendrogram=False, plot=True,
                       tree_kws={'col_cmap': 'Set1', 'row_cmap': 'Dark2'})
plt.savefig("example1_heatmap.pdf", bbox_inches='tight')
plt.show()
[Figure: output of the code above]

3. Plotting row/column annotations

3.1 Plotting only the row/column annotations

      
plt.figure(figsize=(6, 4))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB, add_text=True, legend=True), axis=1,
                           CD=anno_simple(df.CD, colors={'C': 'red', 'D': 'gray', 'G': 'yellow'},
                                          add_text=True, legend=True),
                           Exp=anno_boxplot(df_box, cmap='turbo', legend=True),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar, legend=True),
                           plot=True, legend=True, legend_gap=5
                           )
plt.savefig("col_annotation.pdf", bbox_inches='tight')
plt.show()
[Figure: output of the code above]

anno_label:

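anno_label adds row/column annotation information (such as a sample's sex, group or subtype) as a separate row of text, like the slanted AAAA1 and BBBBB2 in the figure above. The merge parameter controls whether adjacent cells that carry the same label are merged into a single label. If merge is not True, a label is added for every column, which can look crowded.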
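The merge behavior can be pictured as collapsing runs of identical neighboring labels. The sketch below is a minimal illustration under that assumption; merge_adjacent_labels is a hypothetical helper, not PyComplexHeatmap's own code. It returns each run's start, end and center position, which is where a single merged text label could be drawn.

```python
from itertools import groupby


def merge_adjacent_labels(labels):
    """Collapse runs of identical neighboring labels.

    Returns a list of (label, start, end, center) tuples: start and end are
    the first and last positions of the run, center is where one merged
    text label would be drawn.
    """
    runs = []
    pos = 0
    for label, group in groupby(labels):
        n = sum(1 for _ in group)
        start, end = pos, pos + n - 1
        runs.append((label, start, end, (start + end) / 2))
        pos += n
    return runs


labels = ['AAAA1'] * 5 + ['BBBBB2'] * 5  # the AB column of the example df
for run in merge_adjacent_labels(labels):
    print(run)
# ('AAAA1', 0, 4, 2.0)
# ('BBBBB2', 5, 9, 7.0)
```

Identical labels that are not adjacent (e.g. ['a', 'b', 'a']) stay as separate runs, so merged labels only look tidy when samples sharing a label sit next to each other.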

anno_simple:

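anno_simple adds a simple annotation, such as the AB and CD color bars in the figure above. Its cmap parameter can be categorical (e.g. Set1, Dark2, tab10) or continuous (e.g. jet, turbo, parula). The add_text parameter controls whether text is written on the cells (e.g. the letters C, D and G on the CD row and the labels on the AB row above). If the text color and font size are not specified, they are chosen automatically: on a dark background the text is drawn in a light color, otherwise in a dark color (for example, the G on the CD row was automatically set to black). The text color can also be set with text_kws={'color': your_color}, for example: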

      
plt.figure(figsize=(5, 4))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB, add_text=True, legend=True, text_kws={'color': 'gold'}), axis=1,
                           CD=anno_simple(df.CD, add_text=True, legend=True, text_kws={'color': 'purple'}),
                           Exp=anno_boxplot(df_box, cmap='turbo', legend=True),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar, legend=True),
                           plot=True, legend=True, legend_gap=5)
plt.show()
[Figure: output of the code above]
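The automatic text-color choice described above can be approximated by measuring how bright the background is. As an assumption (the package's exact rule is not given here), this sketch uses the WCAG relative-luminance formula with a hypothetical 0.5 cutoff; pick_text_color and its helpers are illustrative names only.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' into an (r, g, b) tuple of floats in [0, 1]."""
    color = color.lstrip('#')
    return tuple(int(color[i:i + 2], 16) / 255 for i in (0, 2, 4))


def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color (0 = black, 1 = white)."""
    def linearize(c):
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def pick_text_color(background, threshold=0.5):
    """Dark text on light backgrounds, light text on dark ones.

    threshold is a hypothetical cutoff chosen for this sketch.
    """
    return 'black' if relative_luminance(hex_to_rgb(background)) > threshold else 'white'


# Hex versions of the colors used for the CD row in section 3.1
palette = {'C': '#ff0000', 'D': '#808080', 'G': '#ffff00'}  # red, gray, yellow
for label, bg in palette.items():
    print(label, pick_text_color(bg))
# C white
# D white
# G black
```

With this rule, G on a yellow background gets black text, consistent with the behavior described for the CD row.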

Adding all kinds of annotations quickly from a single pandas DataFrame

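When a DataFrame is passed via df, every column in it is drawn as a separate anno_simple annotation. For example, the DataFrame df below has four columns, AB, CD, EF and F, and all four are automatically drawn as column annotations. Beyond these simple color bars, numeric data can also be attached as column (e.g. sample) annotations with anno_boxplot or anno_scatterplot, for instance a score for each tumor sample or box plots of the expression of some genes.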

      
plt.figure(figsize=(3, 3))
# Every column of df (AB, CD, EF, F) becomes its own anno_simple annotation
row_ha = HeatmapAnnotation(df=df, plot=True, legend=True)
plt.show()
[Figure: output of the code above]
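One way to picture how a whole DataFrame becomes a set of anno_simple tracks is to inspect each column and pick either a categorical or a continuous color scale, matching the two kinds of cmap described for anno_simple. This is a hypothetical sketch, not PyComplexHeatmap's internal logic: plan_simple_annotations is an illustrative name, the palette is the first five ColorBrewer Set1 colors standing in for a categorical colormap, and the F values are made-up placeholders. The real package works on pandas objects and matplotlib colormaps.

```python
from numbers import Real

# First five ColorBrewer Set1 colors, standing in for a categorical colormap
CATEGORICAL_PALETTE = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00']


def plan_simple_annotations(columns):
    """Decide how each column would be colored as an anno_simple-style track.

    columns: dict mapping column name -> list of values (one per sample).
    Purely numeric columns get a continuous scale (min/max recorded);
    anything else gets one palette color per category, in order of first appearance.
    """
    plan = {}
    for name, values in columns.items():
        numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
        if numeric and values:
            plan[name] = {'kind': 'continuous', 'vmin': min(values), 'vmax': max(values)}
        else:
            categories = list(dict.fromkeys(values))  # unique values, order preserved
            colors = {c: CATEGORICAL_PALETTE[i % len(CATEGORICAL_PALETTE)]
                      for i, c in enumerate(categories)}
            plan[name] = {'kind': 'categorical', 'colors': colors}
    return plan


# Same layout as the example df; the F values are made-up placeholders
columns = {
    'AB': ['AAAA1'] * 5 + ['BBBBB2'] * 5,
    'CD': ['C'] * 3 + ['D'] * 3 + ['G'] * 4,
    'EF': ['E'] * 6 + ['F'] * 2 + ['H'] * 2,
    'F': [0.3, -1.2, 0.8, 1.5, -0.4, 0.0, 2.1, -0.7, 0.9, -1.9],
}
for name, spec in plan_simple_annotations(columns).items():
    print(name, spec)
```

Categories are colored in order of first appearance so that the same input always yields the same colors; passing an explicit colors dict (as in section 3.1) would override such a default.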

3.2 Separating the plot from its legend

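Sometimes we want only the plot without a legend, or want to draw the legend on its own. PyComplexHeatmap supports this: set plot_legend=False, then create a new figure and call row_ha.plot_legends() to draw the legends separately.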

plt.figure(figsize=(6, 4))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB, add_text=True, legend=True), axis=1,
                           CD=anno_simple(df.CD, add_text=True, legend=True),
                           Exp=anno_boxplot(df_box, cmap='turbo', legend=True),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar, legend=True),
                           plot=True, legend=True, plot_legend=False,  # draw the annotations, skip the legends
                           legend_gap=5
                           )
plt.savefig("col_annotation.pdf", bbox_inches='tight')
plt.show()

# Draw the legends on their own in a new figure
plt.figure()
row_ha.plot_legends()
plt.savefig("legend.pdf", bbox_inches='tight')
plt.show()
[Figure: the annotations drawn without legends]

No ax was provided, using plt.gca()

[Figure: the legends drawn separately]

4. Drawing a clustered heatmap with row/column annotations

Here we use the example dataset provided in the PyComplexHeatmap repository:

      
      !wget https://github.com/DingWB/PyComplexHeatmap/raw/main/data/influence_of_snp_on_beta.pickle
      
import pickle

# The pickle stores a tuple of six objects
with open("influence_of_snp_on_beta.pickle", 'rb') as f:
    data = pickle.load(f)
beta, snp, df_row, df_col, col_colors_dict, row_colors_dict = data
      
# beta is a DNA methylation beta-value matrix; df_row and df_col are the row and column annotations;
# col_colors_dict and row_colors_dict hold the colors for those annotations
print(beta.iloc[:, list(range(5))].head(5))
print(df_row.head(5))
print(df_col.head(5))
# Randomly keep 2,000 probes (rows) to speed up plotting,
# then subset snp and df_row to the same probes in the same order
beta = beta.sample(2000)
snp = snp.loc[beta.index.tolist()]
df_row = df_row.loc[beta.index.tolist()]
                 204875570030_R01C02  204875570030_R04C01  \
cg30848532_TC21             0.525089             0.419515
cg30147375_BC21             0.803776             0.585928
cg46239718_BC21             0.443958             0.517514
cg36100119_BC21             0.351977             0.528846
cg42738582_BC21             0.783958             0.724901

                 204875570030_R05C01  204875570030_R06C01  204875570035_R05C02
cg30848532_TC21             0.483276             0.460750             0.390317
cg30147375_BC21             0.510269             0.831463             0.550146
cg46239718_BC21             0.535909             0.450167             0.564107
cg36100119_BC21             0.524896             0.374422             0.551200
cg42738582_BC21             0.802178             0.848621             0.850481

                   chr  Target  CpG  ExtensionBase  ProbeDesign  CON  mapFlag  \
cg30848532_TC21  chr12       1    1              0           II    C       16
cg30147375_BC21  chr11       0    0              0           II    C        0
cg46239718_BC21   chr8       1    1              0           II    C        0
cg36100119_BC21  chr19       1    1              0           II    C       16
cg42738582_BC21   chr5       0    0              0           II    C       16

                                        Group  \
cg30848532_TC21      Suboptimal hybridization
cg30147375_BC21                     No Effect
cg46239718_BC21  Artificial low meth. reading
cg36100119_BC21      Suboptimal hybridization
cg42738582_BC21      Suboptimal hybridization

                                                  Type
cg30848532_TC21  1-1-0-CG-GG-II-C-16-GA-chr12-79760438
cg30147375_BC21  0-0-0-ca-ac-II-C-0-AG-chr11-109557651
cg46239718_BC21   1-1-0-cg-gt-II-C-0-GA-chr8-117860829
cg36100119_BC21   1-1-0-CG-GG-II-C-16-GA-chr19-5877949
cg42738582_BC21  0-0-0-AA-AA-II-C-16-AG-chr5-122031379

                       Strain              Tissue     Sex
204875570030_R01C02  MOLF_EiJ  Frontal Lobe Brain  Female
204875570030_R04C01  CAST_EiJ  Frontal Lobe Brain    Male
204875570030_R05C01  CAST_EiJ  Frontal Lobe Brain  Female
204875570030_R06C01  MOLF_EiJ  Frontal Lobe Brain    Male
204875570035_R05C02  CAST_EiJ               Liver    Male
      
# Row annotations (axis=0), drawn on the left of the heatmap
row_ha = HeatmapAnnotation(Target=anno_simple(df_row.Target, colors=row_colors_dict['Target'], rasterized=True),
                           Group=anno_simple(df_row.Group, colors=row_colors_dict['Group'], rasterized=True),
                           axis=0)
# Column annotations (axis=1), drawn on top of the heatmap
col_ha = HeatmapAnnotation(label=anno_label(df_col.Strain, merge=True, rotation=15),
                           Strain=anno_simple(df_col.Strain, add_text=True),
                           Tissue=df_col.Tissue, Sex=df_col.Sex,
                           axis=1)  # alternatively: df=df_col.loc[:, ['Strain', 'Tissue', 'Sex']]
plt.figure(figsize=(6, 10))
cm = ClusterMapPlotter(data=beta, top_annotation=col_ha, left_annotation=row_ha,
                       show_rownames=False, show_colnames=False,
                       row_dendrogram=False, col_dendrogram=False,
                       row_split=df_row.loc[:, ['Target', 'Group']],
                       col_split=df_col['Strain'], cmap='parula',
                       rasterized=True, row_split_gap=1, legend=True,
                       tree_kws={'col_cmap': 'Set1'})
plt.savefig("clustermap.pdf", bbox_inches='tight')
plt.show()
[Figure: output of the code above]

Key features:

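With row_split and col_split, users can divide the rows and columns into separate blocks according to their labels. row_split and col_split can be a number (how many subgroups to split into), or a pandas DataFrame or Series giving the category of each row or column (e.g. of each sample).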
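To illustrate splitting by labels (the Series or DataFrame case), the sketch below groups names into blocks that share a label, using the sample and probe IDs printed earlier. split_by_labels is a hypothetical helper, not part of PyComplexHeatmap. It does not cover the integer case, which requires cutting a clustering into subgroups, and the real package additionally clusters within each block and may order the blocks differently.

```python
def split_by_labels(names, labels):
    """Group row or column names into blocks that share a label.

    names:  row/column names, e.g. sample IDs or probe IDs.
    labels: one label per name (like a Series), or one tuple per name
            when splitting by several columns at once (like a DataFrame).
    Blocks appear in order of first occurrence; names keep their original
    order inside each block.
    """
    if len(names) != len(labels):
        raise ValueError('names and labels must have the same length')
    blocks = {}
    for name, label in zip(names, labels):
        blocks.setdefault(label, []).append(name)
    return blocks


# Column split by Strain, using the five samples printed above
samples = ['204875570030_R01C02', '204875570030_R04C01', '204875570030_R05C01',
           '204875570030_R06C01', '204875570035_R05C02']
strains = ['MOLF_EiJ', 'CAST_EiJ', 'CAST_EiJ', 'MOLF_EiJ', 'CAST_EiJ']
print(split_by_labels(samples, strains))

# Row split by two columns (Target, Group), using the five probes printed above
probes = ['cg30848532_TC21', 'cg30147375_BC21', 'cg46239718_BC21',
          'cg36100119_BC21', 'cg42738582_BC21']
target_group = [(1, 'Suboptimal hybridization'), (0, 'No Effect'),
                (1, 'Artificial low meth. reading'), (1, 'Suboptimal hybridization'),
                (0, 'Suboptimal hybridization')]
for key, members in split_by_labels(probes, target_group).items():
    print(key, members)
```

Splitting by two columns simply uses the pair of labels as the key, so each block corresponds to one (Target, Group) combination.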

5. Combining multiple heatmaps (cluster maps) horizontally or vertically
      
row_ha = HeatmapAnnotation(Target=anno_simple(df_row.Target, colors=row_colors_dict['Target'], rasterized=True),
                           Group=anno_simple(df_row.Group, colors=row_colors_dict['Group'], rasterized=True),
                           axis=0)
col_ha = HeatmapAnnotation(label=anno_label(df_col.Strain, merge=True, rotation=15),
                           Strain=anno_simple(df_col.Strain, add_text=True),
                           Tissue=df_col.Tissue, Sex=df_col.Sex,
                           axis=1)  # alternatively: df=df_col.loc[:, ['Strain', 'Tissue', 'Sex']]

# plot=False: don't draw yet; composite() draws both cluster maps together
cm1 = ClusterMapPlotter(data=beta, top_annotation=col_ha, left_annotation=row_ha,
                        show_rownames=False, show_colnames=False,
                        row_dendrogram=False, col_dendrogram=False,
                        row_split=df_row.loc[:, ['Target', 'Group']],
                        col_split=df_col['Strain'], cmap='parula',
                        rasterized=True, row_split_gap=1, legend=True,
                        plot=False, label='beta',
                        tree_kws={'col_cmap': 'Set1'})

cm2 = ClusterMapPlotter(data=snp, top_annotation=col_ha, left_annotation=row_ha,
                        show_rownames=False, show_colnames=False,
                        row_dendrogram=False, col_dendrogram=False,
                        col_cluster_method='ward', row_cluster_method='ward',
                        col_cluster_metric='jaccard', row_cluster_metric='jaccard',
                        row_split=df_row.loc[:, ['Target', 'Group']],
                        col_split=df_col['Strain'],
                        rasterized=True, row_split_gap=1, legend=True,
                        plot=False, cmap='Greys', label='SNP',
                        tree_kws={'col_cmap': 'Set1'})

cmlist = [cm1, cm2]

plt.figure(figsize=(10, 12))
composite(cmlist=cmlist, main=1, legendpad=0, legend_y=0.8)
plt.savefig("beta_snp.pdf", bbox_inches='tight')
plt.show()
[Figure: output of the code above]
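The composite call above places cm1 and cm2 side by side, and heatmaps joined horizontally must share a single row order. As an assumption (the text above does not define it), this sketch treats main as the index of the heatmap whose row order the others follow. hconcat is a hypothetical helper working on plain dicts with made-up toy values, not PyComplexHeatmap's implementation.

```python
def hconcat(matrices, main=0):
    """Join matrices side by side using the row order of matrices[main].

    matrices: list of dicts mapping row name -> list of cell values; dict
              order stands in for the (clustered) row order of each heatmap.
    Returns (row_order, combined_rows).
    """
    order = list(matrices[main])
    for i, m in enumerate(matrices):
        if set(m) != set(order):
            raise ValueError(f'matrix {i} does not have the same rows as the main matrix')
    # Read every matrix in the main matrix's row order and append its cells
    combined = [[v for m in matrices for v in m[row]] for row in order]
    return order, combined


# Hypothetical toy data: two small "heatmaps" over the same three probes
beta_demo = {'cg1': [0.52, 0.42], 'cg2': [0.80, 0.59], 'cg3': [0.44, 0.52]}
snp_demo = {'cg2': [1, 0], 'cg3': [0, 0], 'cg1': [0, 1]}
rows, combined = hconcat([beta_demo, snp_demo], main=1)
print(rows)      # ['cg2', 'cg3', 'cg1']
print(combined)  # [[0.8, 0.59, 1, 0], [0.44, 0.52, 0, 0], [0.52, 0.42, 0, 1]]
```

Requiring identical row sets up front means a mismatch between the matrices fails loudly instead of silently producing misaligned rows.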

We hope this article helps! Search for and follow the Computational Epigenetics official account; we regularly share articles on bioinformatics and computational epigenetics.
