Corpus Data Processing with Python (Part 3)


Column "Playing with Corpus Data in Python" (《Python玩转语料库数据》) · Part 3

By 段洵



Let's learn how to process corpus data with Python!

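1. Conditional Statements

Before executing a statement, we may need to test a condition and let the result decide whether the statement runs. This is what the if statement is for.

The basic syntax of an if statement is:

if <condition>:
    <statement>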

When processing corpus data with Python, the most commonly used comparison operators are "<", ">", "<=", ">=", "==", and "!=".
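
As a quick illustration of what each operator evaluates to, here is a minimal sketch. The word 'corpus' is a made-up value chosen only for the demonstration; every comparison returns either True or False, and that result is what an if statement tests.

```python
word = 'corpus'      # hypothetical example word
n = len(word)        # 'corpus' has 6 characters

print(n < 6)     # False
print(n > 6)     # False
print(n <= 6)    # True
print(n >= 6)    # True
print(n == 6)    # True
print(n != 6)    # False
```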

Examples:

str1 =  'Life is short, we use Python.'

if len(str1) > 10:
    print('The string has more than 10 characters.')   # Print the sentence

str2 = 'Python'
if str2.startswith('p'):    # False: startswith() is case-sensitive and 'Python' begins with 'P'
    print(str2)             # Not executed

str1 = 'Life is short, we use Python.'

if len(str1) > 30:
    print('The string has more than 30 characters.')
else:
    print('The string has 30 characters or fewer.')  # This branch runs: the string has 29 characters

str2 = 'Python'
if str2.startswith('p'):
    print('Yeah!')
else:
    print('Oh, no!')          # Printed, because 'Python' starts with an uppercase 'P'

str1 = 'Python_N'

if str1.endswith('V'):
    print('This is a verb.')          # Pass
elif str1.endswith('N'):
    print('This is a noun.')          # Print 'This is a noun.'
elif str1.endswith('A'):
    print('This is an adjective.')    # Pass
elif str1.endswith('R'):
    print('This is an adverb.')       # Pass
else:
    print('This is a function word')  # Pass
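
The if...elif...else chain above can be wrapped in a function so that it is applied to many tokens. The following is only a sketch: the function name describe_tag, the 'word_TAG' token format with an underscore, and the token list are assumptions made for illustration, not part of the original example.

```python
def describe_tag(token):
    """Return the part of speech encoded in a 'word_TAG' token (assumed format)."""
    if token.endswith('_V'):
        return 'verb'
    elif token.endswith('_N'):
        return 'noun'
    elif token.endswith('_A'):
        return 'adjective'
    elif token.endswith('_R'):
        return 'adverb'
    else:
        return 'function word'   # fallback for every other tag


tokens = ['Python_N', 'use_V', 'short_A', 'the_D']   # hypothetical tagged tokens
for token in tokens:
    print(token, '->', describe_tag(token))
```

Only the first condition that is true is executed; the remaining branches are skipped, which is why the order of the elif tests can matter when conditions overlap.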

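2. The while Loop

When programming, we often need to execute a statement repeatedly, which calls for a loop. If the statement should keep running for as long as some condition holds, use a while loop.

The basic syntax of a while loop is:

while <condition>:
    <statement>

Example: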

i = 1

while i <= 10:
    print(i)
    i += 1
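
A while loop is handy when the number of repetitions is not known in advance. As a hedged sketch, the loop below collects the position of every occurrence of a search word in a string; the sample text and the variable names are invented for this illustration. It relies on str.find(), which returns -1 when nothing is found.

```python
text = 'Life is short, we use Python. Python is fun.'   # made-up sample text
target = 'Python'

positions = []                     # start index of every match
start = text.find(target)          # find() returns -1 when there is no match
while start != -1:
    positions.append(start)
    start = text.find(target, start + 1)   # keep searching after this match

print(positions)
```

The loop ends as soon as find() reports no further match, so the condition must eventually become false; forgetting to update start would produce an infinite loop.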

3. The for...in Loop

If we need to execute a statement once for every element of a sequence, use a for...in loop. Its basic syntax is:

for i in <sequence>:
    <statement>

Examples:

word = 'Python'

for letter in word:
    print(letter)

word = 'Python'

for letter in word:
    print(letter.upper())  # Convert the letter to uppercase

prefix = "A"
start = 2011001
end = 2011101    # range() stops before this value, so the last ID printed is A2011100

for i in range(start, end):
    print(prefix + str(i))
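
A for...in loop combined with an if test is already enough to count word frequencies, a basic corpus task. This is a minimal sketch under simple assumptions: the sentence is made up, words are split on whitespace only, and a dictionary (not introduced in this article) stores the counts.

```python
sentence = 'the cat sat on the mat and the dog sat too'   # made-up sentence

freq = {}                          # maps each word to its count
for word in sentence.split():      # split() breaks the string on whitespace
    if word in freq:
        freq[word] += 1            # seen before: increase the count
    else:
        freq[word] = 1             # first occurrence

for word, count in freq.items():
    print(word, count)
```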

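4. Reading and Writing a Single Text File

Most corpus data is stored as text files, so the first step in corpus processing is reading a text. This is done with the open() function. The basic syntax for reading a text with open() is: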

file_handle = open("file_name","r")

file_handle.close()

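open() takes several parameters. The first is the path and name of the target file, which may be given as an absolute or a relative path. The second, "r", opens the file for reading. close() closes the file handle.

Examples: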

file_in = open("../texts/ge.txt", "r")

for line in file_in:
    print(line)              # Print the file line by line

file_in.close()

file_in = open("../texts/ge.txt", "r")

for line in file_in.readlines():       # readlines() reads the whole file into a list of lines
    print(line)
file_in.close()
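
An alternative way to read a file, not used in the examples above, is the with statement, which closes the file automatically even if an error occurs. The sketch below is self-contained: it first writes a small temporary file (the file name sample.txt and its contents are invented) and then reads it back. Passing encoding='utf-8' is an assumption about the file's encoding; use whatever encoding your corpus files actually have.

```python
import os
import tempfile

# Create a small throwaway file so that the sketch runs anywhere.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')       # hypothetical file name
with open(path, 'w', encoding='utf-8') as f:
    f.write('First line\nSecond line\n')

lines = []
with open(path, 'r', encoding='utf-8') as file_in:   # closed automatically on exit
    for line in file_in:
        lines.append(line.rstrip('\n'))   # each line keeps its newline; strip it

print(lines)
```

Stripping the trailing newline also avoids the blank line that print(line) would otherwise show after every line of the file.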

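After processing a text or other corpus material, we may want to save the result as a text file. Writing and saving a text also uses the open() function.

Example: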

file_in = open("../texts/ge.txt", "r")
file_out = open("../ge_lower.txt", "a") # "a" keeps the existing content and appends new content to the end; "w" deletes the existing content and creates a new file.

for line in file_in.readlines():
    line_new = line.lower()         # Convert to lowercase
    file_out.write(line_new)        # Write to the output file

file_in.close()
file_out.close()
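
To make the difference between "a" and "w" concrete, here is a small sketch that writes to a temporary file in both modes and reads the result back. The file name out.txt and the written strings are made up for the demonstration.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, 'out.txt')      # hypothetical output file

with open(out_path, 'w', encoding='utf-8') as f:  # "w" creates the file
    f.write('first run\n')
with open(out_path, 'a', encoding='utf-8') as f:  # "a" appends to the end
    f.write('second run\n')
with open(out_path, 'r', encoding='utf-8') as f:
    after_append = f.read()

with open(out_path, 'w', encoding='utf-8') as f:  # "w" discards the old content
    f.write('third run\n')
with open(out_path, 'r', encoding='utf-8') as f:
    after_write = f.read()

print(repr(after_append))   # 'first run\nsecond run\n'
print(repr(after_write))    # 'third run\n'
```

Because "a" never clears the file, running an append-mode script twice duplicates the output; "w" is usually the safer choice when the result should be regenerated from scratch.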

Recommended reading: Corpus Data Processing with Python (Part 1)

Corpus Data Processing with Python (Part 2)
