htmlcxx: An HTML and CSS Parser for C++

Co-authored · 2023-09-21 23:58

htmlcxx is an HTML parser and a CSS1 parser for C++. Its parsing policies attempt to mimic the behavior of Mozilla Firefox, so you should expect parse trees similar to those Firefox creates. However, it does not insert elements that are missing from your HTML, so serializing the DOM tree yields exactly the same output as the original HTML document. Another key feature is an STL-like tree navigation API, provided by the tree.hh template library.

Example code:

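#include <iostream>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)
#include <htmlcxx/html/ParserDom.h>

using namespace std;
using namespace htmlcxx;
...

//Parse some html code
string html = "<html><body>hey</body></html>";
HTML::ParserDom parser;
tree<HTML::Node> dom = parser.parseTree(html);

//Print whole DOM tree
cout << dom << endl;

//Dump all links in the tree
tree<HTML::Node>::iterator it = dom.begin();
tree<HTML::Node>::iterator end = dom.end();
for (; it != end; ++it)
{
    //tagName() keeps the case used in the document, so compare case-insensitively
    if (strcasecmp(it->tagName().c_str(), "A") == 0)
    {
        //Attributes are only available after parseAttributes()
        it->parseAttributes();
        //attribute() returns a (found, value) pair
        cout << it->attribute("href").second << endl;
    }
}

//Dump all text of the document
it = dom.begin();
end = dom.end();
for (; it != end; ++it)
{
    //Skip tags and comments; what remains are text nodes
    if ((!it->isTag()) && (!it->isComment()))
    {
        cout << it->text();
    }
}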
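
The link-dumping loop above matches anchor elements by tag name. Assuming tag names are stored with the same letter case as in the source document (so `<a>` and `<A>` produce different strings), a case-insensitive comparison is the safer test. The sketch below uses only the standard library; `tagEquals` is a hypothetical helper introduced here for illustration and is not part of htmlcxx.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of htmlcxx): returns true when two tag
// names are equal ignoring ASCII letter case, e.g. "a" and "A".
inline bool tagEquals(const std::string& lhs, const std::string& rhs)
{
    if (lhs.size() != rhs.size())
        return false;
    // Compare character by character after lowercasing each one.
    // The unsigned char parameters keep std::tolower well defined.
    return std::equal(lhs.begin(), lhs.end(), rhs.begin(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}
```

In the loop, the condition would then read `if (tagEquals(it->tagName(), "A"))`, which also avoids relying on a platform-specific comparison function.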