
Geng Qian, Jin Jian: Information Processing & Management - Webpage retrieval based on query by example for think tank construction

2021-11-24


Think tanks have proven helpful for decision-making in various communities. However, collecting information manually for think tank construction costs too much time and labor and is inevitably subjective. A possible solution is to retrieve webpages of renowned experts and institutes that are similar to a given example, a task denoted as query by webpage (QBW). Taking users' searching behaviors into account, a novel QBW model based on the visual and textual features of webpages is proposed. Specifically, a visual feature extraction module based on pre-trained neural networks and a heuristic pooling scheme is proposed. It addresses two weaknesses of existing extractors: they fail to capture the high-level features of webpage snapshots, and they are sensitive to the noise introduced by images. Moreover, a textual feature extraction module is proposed to represent textual content at both the term and topic granularities, whereas most existing extractors focus only on the term granularity. In addition, a series of similarity metrics is proposed, including a textual similarity metric based on feature bootstrapping to improve the model's robustness and an adaptive weighting scheme to balance the effects of different types of features. The proposed QBW model is evaluated on expert and institute introduction retrieval tasks in academic and medical scenarios, where it improves the average MAP (mean average precision) by 10% compared to existing baselines. Practically, this study offers useful insights for applications that involve webpage retrieval beyond think tank construction.
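To make the retrieval setup in the abstract more concrete, the following is a minimal sketch of ranking candidate pages by a weighted combination of visual and textual similarity, followed by the standard MAP computation used for evaluation. It is not the authors' implementation: the feature vectors, the function names, and the fixed weight `alpha` are hypothetical stand-ins, and the paper's adaptive weighting scheme and feature bootstrapping are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_score(query, page, alpha=0.5):
    """Weighted sum of visual and textual similarity.

    alpha is a fixed, hypothetical weight; the paper describes an adaptive
    scheme whose details are not given in the abstract.
    """
    visual = cosine(query["visual"], page["visual"])
    textual = cosine(query["text"], page["text"])
    return alpha * visual + (1 - alpha) * textual


def rank_pages(query, pages, alpha=0.5):
    """Return page names ordered from most to least similar to the query page."""
    return sorted(
        pages,
        key=lambda name: combined_score(query, pages[name], alpha),
        reverse=True,
    )


def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position  # precision at this relevant hit
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data: two-dimensional vectors standing in for real extracted features.
query = {"visual": [1.0, 0.0], "text": [1.0, 0.0]}
pages = {
    "p1": {"visual": [1.0, 0.0], "text": [1.0, 0.0]},
    "p2": {"visual": [0.0, 1.0], "text": [1.0, 0.0]},
    "p3": {"visual": [0.0, 1.0], "text": [0.0, 1.0]},
}
ranking = rank_pages(query, pages)
```

In this toy setup a page that matches the example on both feature types ranks above one that matches on text alone, and MAP summarizes how early the relevant pages appear across a set of queries.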

Geng Qian, Chuai Ziang, Jin Jian. Webpage retrieval based on query by example for think tank construction[J]. Information Processing & Management, 2021, 10.