CodeGeeX2更强大的多语言代码生成模型-技术圈

CodeGeeX2 是多语言代码生成模型 CodeGeeX (KDD’23) 的第二代模型。不同于一代 CodeGeeX（完全在国产华为昇腾芯片平台训练），CodeGeeX2 是基于 ChatGLM2 架构注入代码实现，得益于 ChatGLM2 的更优性能，CodeGeeX2 在多项指标上取得性能提升（+107% > CodeGeeX；仅60亿参数即超过150亿参数的 StarCoder-15B 近10%），更多特性包括：

更强大的代码能力：基于 ChatGLM2-6B 基座语言模型，CodeGeeX2-6B 进一步经过了 600B 代码数据预训练，相比一代模型，在代码能力上全面提升，HumanEval-X 评测集的六种编程语言均大幅提升 (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%)，在Python上达到 35.9% 的 Pass@1 一次通过率，超越规模更大的 StarCoder-15B。
更优秀的模型特性：继承 ChatGLM2-6B 模型特性，CodeGeeX2-6B 更好支持中英文输入，支持最大 8192 序列长度，推理速度较一代 CodeGeeX-13B 大幅提升，量化后仅需6GB显存即可运行，支持轻量级本地化部署。
更全面的AI编程助手：CodeGeeX插件（VS Code, Jetbrains）后端升级，支持超过100种编程语言，新增上下文补全、跨文件补全等实用功能。结合 Ask CodeGeeX 交互式AI编程助手，支持中英文对话解决各种编程问题，包括且不限于代码解释、代码翻译、代码纠错、文档生成等，帮助程序员更高效开发。
更开放的协议：CodeGeeX2-6B 权重对学术研究完全开放，填写问卷申请商业使用。

AI 编程助手

开发了支持 VS Code、 IntelliJ IDEA、PyCharm、GoLand、WebStorm、Android Studio 等IDE的 CodeGeeX 插件。在插件中，可以更直接地体验到 CodeGeeX2 模型在代码生成与补全、添加注释、代码翻译及技术问答方面的能力为开发效率带来的提升。详情见CodeGeeX主页。

快速开始

使用transformers快速调用CodeGeeX2-6B：

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()

# remember adding a language tag for better performance
prompt = "# language: python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=256, top_k=1)
response = tokenizer.decode(outputs[0])

>>> print(response)
# language: python
# write a bubble sort function


def bubble_sort(list):
    for i in range(len(list) - 1):
        for j in range(len(list) - 1):
            if list[j] > list[j + 1]:
                list[j], list[j + 1] = list[j + 1], list[j]
    return list


print(bubble_sort([5, 2, 4, 6, 1, 3]))

代码能力评测

CodeGeeX2 作为一个多语言代码生成基座模型，代码能力较上一代大幅提升，以下是在 HumanEval，HumanEval-X, DS1000 基准上的评测结果（评价指标 Pass@k 定义与论文中一致）：

HumanEval (Pass@1,10,100)

Model	Pass@1	Pass@10	Pass@100
CodeGen-16B-multi	19.2	34.6	55.2
CodeGeeX-13B	22.9	39.6	60.9
Codex-12B	28.8	46.8	72.3
CodeT5Plus-16B-mono	30.9	51.6	76.7
Code-Cushman-001	33.5	54.3	77.4
LLaMA-65B	23.7	-	79.3
LLaMA2-70B	29.9	-	-
CodeGen2.5-7B-mono	33.4	58.4	82.7
StarCoder-15B	33.2	61.0	84.7
CodeGeeX2-6B	35.9	62.6	88.3

Pass@1 使用 n=20, t=0.2, top_p=0.95；Pass@10,Pass@100 使用 n=200, t=0.8, top_p=0.95。

HumanEval-X (Pass@1)

Model	Python	C++	Java	JavaScript	Go	Rust	Overall
CodeGen-16B-multi	19.2	18.1	15.0	18.4	13.0	1.8	14.2
CodeGeeX-13B	22.9	17.1	20.0	17.6	14.4	4.3	16.0
Replit-code-v1-3B	22.0	20.1	20.1	20.1	12.2	8.6	17.2
CodeGen2.5-7B-multi	30.6	24.3	29.0	27.5	18.9	20.1	25.1
StarCoder-15B	35.5	28.2	31.5	33.2	21.3	17.8	27.9
CodeGeeX2-6B	35.9	29.3	30.8	32.2	22.5	18.1	28.1

Pass@1 使用 n=20, t=0.2, top_p=0.95。

以上结果可使用脚本scripts/run_humanevalx.sh复现。环境配置和说明参见评测环境。

DS1000 (Pass@1)

Model	Matplotlib	Numpy	Pandas	Pytorch	SciPy	Scikit-learn	TensorFlow	Overall
# Samples	155	220	291	68	106	115	45	1000
CodeGen-16B-Mono	31.7	10.9	3.4	7.0	9.0	10.8	15.2	11.7
code-cushman-001	40.7	21.8	7.9	12.4	11.3	18.0	12.2	18.1
Codex-001	41.8	26.6	9.4	9.7	15.0	18.5	17.2	20.2
CodeGeeX2-6B	40.5	25.5	14.5	17.3	19.3	24.0	23.0	23.1
StarCoder-15B	51.7	29.7	11.4	21.4	20.2	29.5	24.5	26.0
Codex-002	57.0	43.1	26.5	41.8	31.8	44.8	39.3	39.2

Pass@1 使用 n=40, t=0.2, top_p=0.5。

以上结果可使用DS1000评测代码复现。

量化推理性能

CodeGeeX2 与上一代相比，对部署更加友好。得益于使用 Multi-Query Attention 和 Flash Attention，推理速度更快，且量化后仅需6GB显存即可运行：

量化

Model	FP16/BF16	INT8	INT4
CodeGeeX-13B	26.9 GB	14.7 GB	-
CodeGeeX2-6B	13.1 GB	8.2 GB	5.5 GB

基于 PyTorch 2.0 测试，利用torch.nn.functional.scaled_dot_product_attention实现高效的 Attention 计算。

推理

Model	推理速度 (字符/秒)
CodeGeeX-13B	32
CodeGeeX2-6B	94

batch_size=1, max_length=2048，均使用加速框架，测试硬件为GeForce RTX-3090。