html5lib HTML parsing library
html5lib is a Python library for parsing HTML documents. It supports HTML5 and aims to be as compatible as possible with desktop web browsers.
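
As a minimal sketch, assuming html5lib 1.x is installed (e.g. `pip install html5lib`), the snippet below parses a fragment that has an unclosed `<p>` and shows browser-style recovery. The parser supplies the missing `html`, `head` and `body` elements and closes the first paragraph when the second one starts. The default `etree` tree builder returns an `xml.etree.ElementTree` element. Passing `namespaceHTMLElements=False` keeps tag names free of the XHTML namespace, so plain tag names work in `find`/`iter`.

```python
import html5lib

# Malformed markup: no <html>/<head>/<body>, and the first <p> is never closed.
markup = "<p>Hello<p>World"

# Parse with the default "etree" tree builder; the result is the <html> element.
# namespaceHTMLElements=False gives tags like "p" instead of "{http://www.w3.org/1999/xhtml}p".
root = html5lib.parse(markup, namespaceHTMLElements=False)

print(root.tag)                          # html
print([child.tag for child in root])     # ['head', 'body']
print([p.text for p in root.iter("p")])  # ['Hello', 'World']
```

Both paragraphs end up as siblings inside `body`, which matches how a browser would build the tree.
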
Main features:
- Parses valid and invalid HTML documents to a tree
- Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup and custom simpletree output formats
- DOM to SAX converter
- Reports parse errors
- Character encoding detection
- XML mode for working with ill-formed XML, e.g. feeds
- Filtering and serializing of trees
- HTML+CSS sanitizer
- Many unit tests
- Faster than before :)
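
Two of the listed features, parse-error reporting and alternative output formats, can be sketched together. This assumes html5lib 1.x. In that version, `HTMLParser` collects recoverable errors in its `errors` list as `(position, error_code, data)` tuples, and `getTreeBuilder("dom")` builds an `xml.dom.minidom` document. The error codes are html5lib's own identifiers and may differ between versions.

```python
import html5lib

markup = "<p>Hello<p>World"

# Build a minidom tree instead of the default ElementTree.
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"),
                             namespaceHTMLElements=False)
document = parser.parse(markup)

# With the default strict=False, bad markup does not raise;
# each problem is recorded on parser.errors instead.
for position, code, data in parser.errors:
    print(position, code, data)

# Standard minidom API: gather the text of each recovered <p> element.
paragraphs = document.getElementsByTagName("p")
texts = ["".join(n.data for n in p.childNodes if n.nodeType == n.TEXT_NODE)
         for p in paragraphs]
print(texts)  # ['Hello', 'World']
```

Here the missing `<!DOCTYPE html>` is reported as an error, but parsing still continues and produces the same two-paragraph structure a browser would.
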
 
