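I. Module Overview

1. What the module does

Official description: the collections module implements specialized container datatypes that provide alternatives to Python's general-purpose built-in containers dict, list, set, and tuple.

In plain terms: Python ships with built-in data types and methods, and the collections module builds on them with additional specialized, high-performance data types. For example, the most_common() method of the Counter class is the usual tool for word-frequency counts. Likewise, before Python 3.7 a plain dict did not guarantee any key order, whereas an OrderedDict always remembers insertion order (and still offers order-aware extras such as move_to_end()). These classes are very useful. Once you are comfortable with the module, it can simplify your Python code considerably and make it both more elegant and more efficient, so it is well worth learning on the way to becoming an advanced Python programmer.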

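2. Reference material

The official documentation for this module is detailed and well worth reading:

Chinese documentation: https://docs.python.org/zh-cn/3/library/collections.html#module-collections

English documentation: https://docs.python.org/3/library/collections.html#module-collections

3. Classes in the module

collections.__all__ lists every public name the module exports. There are nine in total (strictly speaking, namedtuple is a factory function rather than a class):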

>>> import collections
>>> print(collections.__all__)
['deque', 'defaultdict', 'namedtuple', 'UserDict', 'UserList', 'UserString', 'Counter', 'OrderedDict', 'ChainMap']

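The table below summarizes what each of them provides:

namedtuple()	Factory function that creates tuple subclasses whose elements can be accessed by name
deque	List-like container with fast appends and pops at both ends
ChainMap	Dict-like class that combines multiple mappings into a single view
Counter	dict subclass for counting hashable objects
OrderedDict	dict subclass that remembers the order in which entries were added (an ordered dictionary)
defaultdict	dict subclass that calls a factory function to supply a default value for missing keys
UserDict	Wrapper around dictionary objects that makes dict subclassing easier
UserList	Wrapper around list objects that makes list subclassing easier
UserString	Wrapper around string objects that makes string subclassing easier (the Chinese edition of the docs mistranslates this entry)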
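
To make the table a little more concrete, here is a minimal sketch that touches four of these containers. The names Point, recent, groups, defaults, overrides, and settings are made up for illustration and do not come from the original article.

```python
from collections import ChainMap, defaultdict, deque, namedtuple

# namedtuple: a tuple subclass whose fields can be read by name
Point = namedtuple("Point", ["x", "y"])   # hypothetical record type
p = Point(x=1, y=2)

# deque: fast appends and pops at both ends; maxlen discards the oldest items
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)          # ends up holding the last three values

# defaultdict: a missing key is created by calling the factory (here, list)
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# ChainMap: looks a key up in several mappings, the first match wins
defaults = {"color": "red", "size": "M"}
overrides = {"color": "blue"}
settings = ChainMap(overrides, defaults)

print(p.x + p.y)                             # 3
print(list(recent))                          # [2, 3, 4]
print(dict(groups))                          # {'a': ['apple', 'avocado'], 'b': ['banana']}
print(settings["color"], settings["size"])   # blue M
```

Counter, the subject of the rest of this article, is covered in detail in the next section.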

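II. Counter

Counter is a tool for quick and convenient tallies. It is a dict subclass for counting hashable objects: a collection in which elements are stored as dictionary keys and their counts are stored as dictionary values. Counts may be any integer value, including zero and negative numbers. The Counter class is similar to bags or multisets in other languages. Put simply, it counts things, and a few examples make this clear.

Find the 10 most common words: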

>>> from collections import Counter
>>> import re
>>> text = 'remove an existing key one level down remove an existing key one level down'
>>> words = re.findall(r'\w+', text)
>>> Counter(words).most_common(10)
[('remove', 2), ('an', 2), ('existing', 2), ('key', 2), ('one', 2), ('level', 2), ('down', 2)]

>>> # Count the words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
...
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

>>> # The loop above is a bit tedious; passing the list straight to Counter is simpler
>>> L = ['red', 'blue', 'red', 'green', 'blue', 'blue']
>>> Counter(L)
Counter({'blue': 3, 'red': 2, 'green': 1})

Elements are counted from an iterable or initialized from another mapping (or counter):

>>> from collections import Counter
>>> # Count the characters of a string
>>> Counter('gallahad')
Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})
>>> # Initialize from a dictionary
>>> Counter({'red': 4, 'blue': 2})
Counter({'red': 4, 'blue': 2})
>>> # Initialize from keyword arguments
>>> Counter(cats=4, dogs=8)
Counter({'dogs': 8, 'cats': 4})
>>> # Count the items of a list
>>> Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
Counter({'blue': 3, 'red': 2, 'green': 1})
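
One behaviour worth knowing alongside these constructors is how a Counter treats keys it has never seen. The sketch below uses a made-up shopping list and shows that a missing key reads as 0 rather than raising KeyError, and that setting a count to 0 is not the same as deleting the key.

```python
from collections import Counter

c = Counter(["eggs", "ham", "eggs"])   # made-up sample data

# A missing key reads as 0 instead of raising KeyError
missing = c["bacon"]

# Reading a missing key does not add it to the counter
has_bacon = "bacon" in c

# Setting a count to 0 keeps the key; del removes it entirely
c["ham"] = 0
still_has_ham = "ham" in c
del c["ham"]
has_ham_after_del = "ham" in c

print(missing, has_bacon, still_has_ham, has_ham_after_del)
# 0 False True False
```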

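In addition to the usual dictionary methods, Counter objects provide three more methods (Python 3.10 also added total()):

1. elements()

Description: returns an iterator over the elements, repeating each one as many times as its count. Elements are returned in the order they were first encountered. If an element's count is less than one, elements() ignores it.

Syntax: elements()

Parameters: none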

>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> list(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']
>>> sorted(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']
>>> c = Counter(a=4, b=2, c=0, d=5)
>>> list(c.elements())
['a', 'a', 'a', 'a', 'b', 'b', 'd', 'd', 'd', 'd', 'd']
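
As a small illustration of why expanding counts back into repeated elements can be useful, the sketch below stores a prime factorization as a Counter and multiplies the expanded factors back together. The variable names are illustrative, and math.prod assumes Python 3.8 or later.

```python
from collections import Counter
import math

# Prime factorization of 1836 stored as {prime: exponent}
factors = Counter({2: 2, 3: 3, 17: 1})

# elements() repeats each prime as many times as its exponent
expanded = sorted(factors.elements())
product = math.prod(expanded)   # math.prod requires Python 3.8+

print(expanded)   # [2, 2, 3, 3, 3, 17]
print(product)    # 1836
```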

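2. most_common()

Returns a list of the n most common elements and their counts, ordered from most common to least common. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered by first occurrence.

It is often used to find the most frequent words in a text.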

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
>>> Counter('abracadabra').most_common(5)
[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
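
For the word-frequency use case mentioned above, a minimal sketch might look like the following. The helper name top_words and the sample sentence are assumptions for illustration. Lowercasing the text and splitting on \w+ is just one simple way to tokenize and will not suit every language or text.

```python
import re
from collections import Counter

def top_words(text, n=3):
    """Return the n most frequent lowercase words in text (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)

sample = "the cat and the hat and the bat"   # made-up sample text
top_two = top_words(sample, 2)

# A reversed slice of the full ranking gives the n least common entries
n = 2
least_two = Counter(sample.split()).most_common()[:-n-1:-1]

print(top_two)     # [('the', 3), ('and', 2)]
print(least_two)   # [('bat', 1), ('hat', 1)]
```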

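3. subtract()

Subtracts elements from an iterable or from another mapping (or counter). It works like dict.update(), but subtracts counts instead of replacing them. Both inputs and outputs may be zero or negative.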

>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c.subtract(d)
>>> c
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

>>> # Subtract one each of a, b, c and d
>>> str0 = Counter('aabbccdde')
>>> str0
Counter({'a': 2, 'b': 2, 'c': 2, 'd': 2, 'e': 1})
>>> str0.subtract('abcd')
>>> str0
Counter({'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1})
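
Because subtract() keeps zero and negative counts, it can double as a simple shortfall check. The sketch below uses a made-up inventory and order. The names stock, order, and shortages are illustrative only.

```python
from collections import Counter

stock = Counter(apple=5, pear=2)           # hypothetical inventory
order = Counter(apple=3, pear=4, kiwi=1)   # hypothetical order

stock.subtract(order)   # modifies stock in place; counts may drop below zero

# A negative count shows how many units are missing
shortages = {item: -count for item, count in stock.items() if count < 0}

print(dict(stock))   # {'apple': 2, 'pear': -2, 'kiwi': -1}
print(shortages)     # {'pear': 2, 'kiwi': 1}
```

Note that subtract() returns None and changes the counter in place, unlike the - operator, which builds a new Counter and drops non-positive counts.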

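4. Dictionary methods

The usual dictionary methods are available on Counter objects, except for two that work differently.

fromkeys(iterable)

This class method is not implemented for Counter objects.

update([iterable-or-mapping])

Counts the elements of an iterable, or adds the counts from another mapping (or counter). It works like dict.update(), but adds counts instead of replacing them. Also, the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs.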

Common patterns for working with Counter objects:

sum(c.values())                 # total of all counts
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
c.items()                       # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
c.most_common()[:-n-1:-1]       # n least common elements
+c                              # remove zero and negative counts
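
The two methods that differ from a plain dict can be compared side by side. This is a minimal sketch with made-up values. It relies on Counter.fromkeys() raising NotImplementedError, which is how CPython signals that the method is not implemented.

```python
from collections import Counter

plain = {"a": 1}
plain.update({"a": 5})      # dict.update replaces the stored value

counts = Counter(a=1)
counts.update({"a": 5})     # Counter.update adds to the existing count
counts.update("aab")        # an iterable of elements is counted one by one

try:
    Counter.fromkeys("abc")
    fromkeys_works = True
except NotImplementedError:
    fromkeys_works = False  # use Counter(iterable) instead

print(plain)            # {'a': 5}
print(dict(counts))     # {'a': 8, 'b': 1}
print(fromkeys_works)   # False
```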

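5. Mathematical operations

This is a very powerful feature. Several mathematical operations are provided for combining Counter objects to produce multisets (counters whose counts are all greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Each operation accepts inputs with signed counts, but the output excludes any result whose count is zero or less.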

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d                       # intersection:  min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})

Unary plus and minus are shorthand for adding an empty counter or subtracting from an empty counter, respectively.

>>> c = Counter(a=2, b=-4)
>>> +c
Counter({'a': 2})
>>> -c
Counter({'b': 4})
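
Since subtraction drops every non-positive count, it offers a compact multiset containment test. The following is a sketch under that assumption. The function name can_spell and the sample strings are made up for illustration.

```python
from collections import Counter

def can_spell(word, letters):
    """Return True if word can be built from the available letters (hypothetical helper)."""
    need = Counter(word)
    have = Counter(letters)
    # need - have keeps only the letters that are still missing;
    # an empty Counter is falsy, so "nothing missing" means success
    return not (need - have)

print(can_spell("apple", "pleapx"))   # True
print(can_spell("apple", "aple"))     # False, only one 'p' is available
```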

As an exercise, here is a simple weighted text-similarity function:

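from collections import Counter

def str_sim(str_0, str_1, topn):
    topn = int(topn)
    # Keep only the topn most frequent characters of each string
    collect0 = Counter(dict(Counter(str_0).most_common(topn)))
    collect1 = Counter(dict(Counter(str_1).most_common(topn)))
    jiao = collect0 & collect1   # intersection: minimum of each count
    bing = collect0 | collect1   # union: maximum of each count
    # Similarity = shared counts divided by combined counts
    sim = float(sum(jiao.values())) / float(sum(bing.values()))
    return sim

str_0 = '定位手机定位汽车定位GPS定位人定位位置查询'
str_1 = '导航定位手机定位汽车定位GPS定位人定位位置查询'

print(str_sim(str_0, str_1, 5))   # 0.75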

