Python Practical Skills 04: Batch-Adding Text Watermarks to PDF Files

❝
The complete example code and files for this article are available in my GitHub repository: https://github.com/CNFeffery/PythonPracticalSkills
❞

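This is the 4th installment of my series "Python Practical Skills". The series draws on what I have learned from using Python in my day-to-day work, and each installment presents one simple trick that takes about 3 minutes to learn.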

In this 4th installment, we will learn how to batch-add text watermarks to PDF files.

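Sometimes we need to add a text watermark to one or more PDF files, especially the kind that is tiled across every page at regular intervals. With reportlab and pikepdf, two practical libraries for working with PDF files, batch watermarking becomes straightforward.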

After installing both libraries with pip install reportlab pikepdf, we can build the feature step by step:

Generating the text watermark PDF file

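To add a watermark to a target PDF file, we first need a standalone text watermark file in PDF format. I wrote an easy-to-use function with reportlab to generate it. You can study the steps through the comments, or simply call the function directly: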

from typing import Union, Tuple
from reportlab.lib import units
from reportlab.pdfgen import canvas
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

# Register the font; this font file was copied from the Windows fonts directory
pdfmetrics.registerFont(TTFont('msyh', r'./msyh.ttc'))

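def create_watermark(content: str,
                     filename: str,
                     width: Union[int, float],
                     height: Union[int, float],
                     font: str,
                     fontsize: int,
                     angle: Union[int, float] = 45,
                     text_stroke_color_rgb: Tuple[float, float, float] = (0, 0, 0),
                     text_fill_color_rgb: Tuple[float, float, float] = (0, 0, 0),
                     text_fill_alpha: Union[int, float] = 1) -> None:
    '''
    Generate a watermark PDF file containing the text in content
    content: watermark text
    filename: name of the exported watermark file (without the .pdf extension)
    width: canvas width, in mm
    height: canvas height, in mm
    font: name under which the font was registered
    fontsize: font size
    angle: rotation angle, in degrees
    text_stroke_color_rgb: RGB color of the text outline, each component in the range 0 to 1
    text_fill_color_rgb: RGB color of the text fill, each component in the range 0 to 1
    text_fill_alpha: opacity of the text fill (0 is fully transparent, 1 is fully opaque)
    '''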

    # Create the PDF file with the given name and page size (millimetres converted to points)
    c = canvas.Canvas(f"{filename}.pdf", pagesize = (width*units.mm, height*units.mm))

    # Shift the canvas origin slightly so that the text is not cut off
    c.translate(0.1*width*units.mm, 0.1*height*units.mm)

    # Set the rotation angle
    c.rotate(angle)

    # Set the font and font size
    c.setFont(font, fontsize)

    # Set the text outline color
    c.setStrokeColorRGB(*text_stroke_color_rgb)

    # Set the text fill color
    c.setFillColorRGB(*text_fill_color_rgb)

    # Set the opacity of the text fill
    c.setFillAlpha(text_fill_alpha)

    # Draw the text
    c.drawString(0, 0, content)

    # Save the watermark PDF file
    c.save()
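
The width and height arguments are given in millimetres, while PDF coordinates are measured in points (1 pt = 1/72 inch). As a minimal sketch of the conversion behind units.mm, here it is in plain Python so that it runs without reportlab. The helper name mm_to_pt is my own and is not part of reportlab:

```python
# Illustrative helper (hypothetical name): convert millimetres to PDF points.
# 1 inch = 25.4 mm = 72 pt, which is the same factor reportlab exposes as units.mm.
MM_TO_PT = 72.0 / 25.4


def mm_to_pt(value_mm: float) -> float:
    """Return the length in PDF points for a length given in millimetres."""
    return value_mm * MM_TO_PT


# The 200 mm x 200 mm canvas used in this article, expressed in points
canvas_side_pt = mm_to_pt(200)
print(round(canvas_side_pt, 2))  # 566.93
```

Keeping this factor in mind helps when comparing the watermark canvas with the page size of the target PDF, which is also expressed in points.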

Now let's use this function to generate a watermark file:

# Create an example text watermark PDF file
create_watermark(content='公众号【Python大数据分析】作者：费弗里', 
                 filename='水印示例', 
                 width=200,
                 height=200, 
                 font='msyh', 
                 fontsize=35,
                 text_fill_alpha=0.3)

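The result looks good. In practice, adjust the parameters yourself until both the text size and the canvas dimensions of the exported watermark suit your needs.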
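
When tuning width, height, fontsize and angle, it can help to estimate how much room the rotated text needs. The sketch below is plain geometry and not something from reportlab: the function name rotated_text_extent is hypothetical, and it assumes you already know the unrotated text width in points (reportlab's pdfmetrics.stringWidth can measure it for a registered font) and that you treat the font size as a rough text height.

```python
import math
from typing import Tuple


def rotated_text_extent(text_width: float,
                        text_height: float,
                        angle: float) -> Tuple[float, float]:
    """Return (width, height) of the axis-aligned box that encloses a
    text_width x text_height rectangle rotated by angle degrees (0 to 90).
    Input and output share the same unit, e.g. PDF points."""
    if not 0 <= angle <= 90:
        raise ValueError("this sketch only handles angles between 0 and 90 degrees")
    theta = math.radians(angle)
    width = text_width * math.cos(theta) + text_height * math.sin(theta)
    height = text_width * math.sin(theta) + text_height * math.cos(theta)
    return width, height


# Example with made-up numbers: a 400 pt wide line of 35 pt text at 45 degrees
needed_width, needed_height = rotated_text_extent(400, 35, 45)
print(needed_width, needed_height)
```

If the estimated box, plus the 10% offset applied by c.translate, is larger than the canvas, the text will likely be clipped, so either enlarge width and height or reduce fontsize.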

Overlaying the watermark file onto the target PDF files in batch

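With the text watermark file generated, we can now insert it into the target PDF file. The relevant features of pikepdf make this easy. I wrote a simple function for it, and you only need to pass in a few required parameters when calling it: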

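from typing import List, Optional
from pikepdf import Pdf, Page, Rectangle

def add_watermark(target_pdf_path: str,
                  watermark_pdf_path: str,
                  nrow: int,
                  ncol: int,
                  skip_pages: Optional[List[int]] = None) -> None:
    '''
    Add a tiled watermark to the target PDF file
    target_pdf_path: path and file name of the target PDF file
    watermark_pdf_path: path and file name of the watermark PDF file
    nrow: number of rows in the watermark tiling
    ncol: number of columns in the watermark tiling
    skip_pages: indices (starting from 0) of pages to leave without a watermark
    '''

    # Avoid a mutable default argument; None means no page is skipped
    skip_pages = skip_pages or []

    # Open the PDF file that needs a watermark
    target_pdf = Pdf.open(target_pdf_path)

    # Open the watermark PDF file and take its first page as the watermark
    watermark_pdf = Pdf.open(watermark_pdf_path)
    watermark_page = watermark_pdf.pages[0]

    # Iterate over every page of the target PDF (except those listed in skip_pages)
    for idx, target_page in enumerate(target_pdf.pages):

        if idx not in skip_pages:
            for x in range(ncol):
                for y in range(nrow):
                    # Overlay the watermark onto one grid cell of the target page
                    # (this assumes the lower-left corner of the trim box is at (0, 0))
                    target_page.add_overlay(watermark_page, Rectangle(target_page.trimbox[2] * x / ncol,
                                                                      target_page.trimbox[3] * y / nrow,
                                                                      target_page.trimbox[2] * (x + 1) / ncol,
                                                                      target_page.trimbox[3] * (y + 1) / nrow))

    # Save the watermarked result as a new PDF (the path is expected to end in ".pdf")
    target_pdf.save(target_pdf_path[:-4]+'_已添加水印.pdf')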
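
To make the tiling arithmetic easier to follow, here is a small sketch that computes the same grid of rectangles in plain Python, without pikepdf. The function name tile_rectangles and the 600 x 900 page size are made up for illustration, and the sketch assumes the page box starts at (0, 0), as the function above does.

```python
from typing import List, Tuple

# (lower-left x, lower-left y, upper-right x, upper-right y)
Box = Tuple[float, float, float, float]


def tile_rectangles(page_width: float,
                    page_height: float,
                    nrow: int,
                    ncol: int) -> List[Box]:
    """Split a page into nrow x ncol equal cells and return one box per cell,
    in the same order as the nested loops of the watermark function
    (column by column, bottom to top)."""
    if nrow < 1 or ncol < 1:
        raise ValueError("nrow and ncol must both be at least 1")
    cell_width = page_width / ncol
    cell_height = page_height / nrow
    boxes = []
    for x in range(ncol):
        for y in range(nrow):
            boxes.append((x * cell_width,
                          y * cell_height,
                          (x + 1) * cell_width,
                          (y + 1) * cell_height))
    return boxes


# A hypothetical 600 x 900 point page tiled with 3 rows and 2 columns
boxes = tile_rectangles(600, 900, nrow=3, ncol=2)
for box in boxes:
    print(box)
```

Each box corresponds to one Rectangle passed to add_overlay, so a page receives nrow * ncol copies of the watermark.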

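Now let's call this function directly on the example file 【吴恩达】机器学习训练秘籍-中文版.pdf, adding our example watermark to every page except the cover, tiled at a density of 3 rows by 2 columns: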

add_watermark(target_pdf_path='./【吴恩达】机器学习训练秘籍-中文版.pdf',
              watermark_pdf_path='./水印示例.pdf',
              nrow=3,
              ncol=2,
              skip_pages=[0])
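
The call above processes a single file. To cover every PDF in a folder, one option is a thin wrapper like the sketch below. The name batch_watermark, its apply parameter and the rule of skipping files whose names already end in _已添加水印 are my own assumptions for illustration and are not part of pikepdf; apply stands in for whatever function watermarks one file.

```python
from pathlib import Path
from typing import Callable, List


def batch_watermark(directory: str,
                    watermark_pdf_path: str,
                    apply: Callable[[str, str], None],
                    done_suffix: str = '_已添加水印') -> List[str]:
    """Call apply(target_path, watermark_path) for each PDF in directory.

    The watermark file itself and files whose stem already ends with
    done_suffix are skipped. Returns the names of the processed files."""
    watermark = Path(watermark_pdf_path).resolve()
    processed = []
    for pdf in sorted(Path(directory).glob('*.pdf')):
        if pdf.resolve() == watermark or pdf.stem.endswith(done_suffix):
            continue
        apply(str(pdf), str(watermark))
        processed.append(pdf.name)
    return processed


# Possible usage with the function from this article (left commented out
# because add_watermark is not defined inside this snippet):
# batch_watermark('.', './水印示例.pdf',
#                 lambda target, mark: add_watermark(target, mark, nrow=3, ncol=2, skip_pages=[0]))
```

Passing the per-file function in as a parameter keeps the loop independent of the tiling settings, so different folders can use different nrow, ncol or skip_pages values.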

The result is great. Feel free to experiment on your own and see what else you discover~

That's all for this installment. See you next time~👋
