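XPaF: an open-source parsing framework
The XPath-based Parsing Framework (XPaF) is a simple, convenient open-source parsing framework that makes it easy to extract relations (subject-predicate-object triples) from HTML and XML documents.
Code example (an input document followed by a parser definition):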
<table>
  <tr>
    <td class="name">Aaron</td>
    <td class="occ">Engineer</td>
  </tr>
  <tr>
    <td class="name">Jennifer</td>
    <td class="occ">Archeologist</td>
  </tr>
</table>
parser_name: "my_parser"
relation_tmpls {
  subject: "//td[@class='name']"
  predicate: "occupation"
  object: "//td[@class='occ']"
  subject_cardinality: MANY
  object_cardinality: MANY
}
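
To make the effect of the parser definition concrete, the following is a minimal Python sketch of the extraction it describes. It does not use XPaF itself: the function name `extract_triples` is hypothetical, the standard-library `xml.etree.ElementTree` module stands in for XPaF's XPath engine, and the sketch assumes that when both cardinalities are MANY, the i-th subject is paired with the i-th object.

```python
import xml.etree.ElementTree as ET

# The sample document from the example above (well-formed, so an XML parser can read it).
DOC = """<table>
  <tr><td class="name">Aaron</td><td class="occ">Engineer</td></tr>
  <tr><td class="name">Jennifer</td><td class="occ">Archeologist</td></tr>
</table>"""


def extract_triples(doc, subject_xpath, predicate, object_xpath):
    """Hypothetical helper: return (subject, predicate, object) triples.

    Assumption: with MANY subjects and MANY objects, matches are paired
    positionally, so both XPath expressions must match the same number of nodes.
    """
    root = ET.fromstring(doc)
    subjects = [node.text for node in root.findall(subject_xpath)]
    objects = [node.text for node in root.findall(object_xpath)]
    if len(subjects) != len(objects):
        raise ValueError("subject and object counts differ; cannot pair them")
    return [(s, predicate, o) for s, o in zip(subjects, objects)]


# ElementTree rejects absolute paths on an element, so "//td" is written as ".//td".
triples = extract_triples(
    DOC,
    ".//td[@class='name']",
    "occupation",
    ".//td[@class='occ']",
)

for triple in triples:
    print(triple)
```

Under these assumptions the sketch yields one triple per table row, with `occupation` as the predicate linking each name to the occupation in the same row.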
