XPaF: an open-source parsing framework
XPath-based Parsing Framework (XPaF) is a simple, convenient open-source parsing framework for extracting relations (subject-predicate-object triples) from HTML and XML documents.
Code example:
<table>
  <tr>
    <td class="name">Aaron</td>
    <td class="occ">Engineer</td>
  </tr>
  <tr>
    <td class="name">Jennifer</td>
    <td class="occ">Archeologist</td>
  </tr>
</table>
parser_name: "my_parser"
relation_tmpls {
  subject: "//td[@class='name']"
  predicate: "occupation"
  object: "//td[@class='occ']"
  subject_cardinality: MANY
  object_cardinality: MANY
}
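To make the idea concrete, here is a minimal Python sketch of what a relation template like the one above describes. It does not use XPaF or its API. It relies only on the standard library's `xml.etree.ElementTree`, and the helper name `extract_triples` is hypothetical. It also assumes that MANY/MANY cardinality pairs subjects and objects one-to-one in document order. That pairing rule is an assumption here, not something the text above states.

```python
import xml.etree.ElementTree as ET

# The sample table from above; it is well-formed, so an XML parser can read it.
HTML = """<table>
  <tr> <td class="name">Aaron</td> <td class="occ">Engineer</td> </tr>
  <tr> <td class="name">Jennifer</td> <td class="occ">Archeologist</td> </tr>
</table>"""


def extract_triples(doc, subject_xpath, predicate, object_xpath):
    """Hypothetical helper: return (subject, predicate, object) tuples.

    Assumption: subjects and objects are paired one-to-one in document
    order, so both XPath queries must match the same number of nodes.
    """
    root = ET.fromstring(doc)
    subjects = [el.text for el in root.findall(subject_xpath)]
    objects = [el.text for el in root.findall(object_xpath)]
    if len(subjects) != len(objects):
        raise ValueError("subject and object counts differ; cannot pair them")
    return [(s, predicate, o) for s, o in zip(subjects, objects)]


# ElementTree supports only a subset of XPath and rejects absolute paths on
# an element, so the template's "//td[...]" is written as ".//td[...]" here.
triples = extract_triples(
    HTML,
    ".//td[@class='name']",
    "occupation",
    ".//td[@class='occ']",
)

for triple in triples:
    print(triple)
```

Under these assumptions the sketch yields two triples: Aaron with occupation Engineer, and Jennifer with occupation Archeologist.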