[Paper Reading, CIKM 2014] Extending Faceted Search to the General Web


Contents

    • Foreword
    • Motivation
    • Method
      • Query facet generation:
      • Facet feedback
    • Evaluation

Foreword

  • This paper was published at CIKM 2014, so these notes focus only on its insights
  • I read this paper last month and am sharing my notes today
  • Many papers have not been written up here yet. More papers can be found at: ShiyuNee/Awesome-Conversation-Clarifying-Questions-for-Information-Retrieval: Papers about Conversation and Clarifying Questions (github.com)

Motivation

Extend faceted search to the open-domain web setting, which the paper calls Faceted Web Search

Method

There are two major components in an FWS (Faceted Web Search) system:

  • query facet generation
  • facet feedback

Query facet generation:

Facet generation is typically performed in advance for an entire corpus, an approach that is hard to extend to the general web

  • So facets are instead generated per query (query facet generation)

  • Extracting candidates: same as in earlier work (Extracting Query Facets from Search Results)

  • Refining candidates: re-cluster the candidate facets or their facet terms into higher-quality query facets.

    • topic modeling(unsupervised):
      • assumption: candidate facets are generated by a mixture of hidden topics (the query facets). After training, each topic is returned as a query facet, represented by its top terms.
      • both pLSA and LDA are applied
      • uses only term co-occurrence information
    • QDMiner / QDM(unsupervised):
      • applies a variation of the Quality Threshold clustering algorithm to cluster the candidate facets, with a bias towards important ones. It then ranks and selects the facet clusters, and the terms within those clusters, using TF-IDF-like scores.
      • considers more information than just term co-occurrence, but it is not easy to add new features
    • QF-I and QF-J (supervised): based on a graphical model
      • learns how likely it is that a term in the candidate facets should be selected for the query facets, and how likely two terms are to be grouped into the same query facet, using a rich set of features.
      • based on these likelihood scores, QF-I selects the terms and then clusters the selected terms into query facets, while QF-J repeats the procedure in an attempt to perform joint inference.
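
To make the "refining candidates" step more concrete, here is a minimal sketch of weight-biased clustering in the spirit of the Quality Threshold idea mentioned for QDMiner. It is not the paper's or QDMiner's actual algorithm: the Jaccard distance, the `max_diameter` threshold, the per-candidate `weights`, and the frequency-based term ranking are all simplifying assumptions made for illustration.

```python
from collections import Counter


def jaccard_distance(a, b):
    """1 - Jaccard similarity of two non-empty term lists."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)


def cluster_candidates(candidates, weights, max_diameter=0.7):
    """Greedy clustering of candidate facets (hypothetical simplification).

    The heaviest unassigned candidate seeds a cluster; another candidate
    joins only if it is within max_diameter of every current member.
    Returns a list of clusters, each a list of candidate indices.
    """
    unassigned = sorted(range(len(candidates)), key=lambda i: -weights[i])
    clusters = []
    while unassigned:
        members = [unassigned.pop(0)]
        for i in list(unassigned):  # iterate over a copy while removing
            if all(jaccard_distance(candidates[i], candidates[m]) <= max_diameter
                   for m in members):
                members.append(i)
                unassigned.remove(i)
        clusters.append(members)
    return clusters


def facet_terms(candidates, members):
    """Rank the terms of one cluster by how many member lists contain them."""
    counts = Counter(t for m in members for t in set(candidates[m]))
    return [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy candidate facets (made-up data for a flight-related query).
candidates = [
    ["delta", "united", "southwest"],
    ["delta", "united", "jetblue"],
    ["first class", "business", "economy"],
    ["economy", "business"],
]
weights = [3, 2, 2, 1]
clusters = cluster_candidates(candidates, weights)
facets = [facet_terms(candidates, c) for c in clusters]
```

With this toy input the two airline lists end up in one cluster and the two travel-class lists in another; a supervised method such as QF-I would replace the hand-set threshold with learned likelihood scores.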

Facet feedback

Re-rank the search results.

  • Boolean Filtering Model:

    • the filtering condition can be AND, OR, A+O, etc.
      • $S(D,Q)$ is the score returned by the original retrieval model

    [formula image not preserved in the source]

  • Soft Ranking Model(better):

    • expand the original query with feedback terms, using a linear combination as follows:

      [formula image not preserved in the source]

      • $S(D,Q)$ is the score from the original retrieval model

      • $S_E(D,F^u)$ is the expansion part, which captures the relevance between document $D$ and the feedback facets $F^u$, using expansion model $E$.

        • the original retrieval model is used to score documents with the feedback terms as the query.

          [formula image not preserved in the source]

          or

          [formula image not preserved in the source]

    • the original retrieval model:

      • incorporates word unigrams, adjacent word bigrams, and adjacent word proximity.

      [formula image not preserved in the source]
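
Since the formula images are missing, here is a rough sketch of how the two feedback models could look in code. Several things are assumptions rather than the paper's definitions: documents that fail the Boolean condition are pushed to the bottom with a score of negative infinity, "A+O" is read as AND across facets and OR within a facet, the linear combination uses a single weight `lam`, and the expansion scores are simply passed in as a hypothetical precomputed dictionary.

```python
import math


def boolean_filter(scores, doc_terms, feedback, mode="AND"):
    """Keep S(D,Q) for documents satisfying the condition, else -inf.

    scores:    {doc_id: S(D,Q)} from the original retrieval model
    doc_terms: {doc_id: set of terms in the document}
    feedback:  list of facets, each a list of user-selected terms
    """
    def satisfies(terms):
        hits = [[t in terms for t in facet] for facet in feedback]
        if mode == "AND":   # every selected term must occur
            return all(all(h) for h in hits)
        if mode == "OR":    # any selected term may occur
            return any(any(h) for h in hits)
        if mode == "A+O":   # assumed: AND across facets, OR within a facet
            return all(any(h) for h in hits)
        raise ValueError(f"unknown mode: {mode}")

    return {d: (s if satisfies(doc_terms[d]) else -math.inf)
            for d, s in scores.items()}


def soft_rank(scores, expansion_scores, lam=0.5):
    """Assumed form: lam * S(D,Q) + (1 - lam) * S_E(D,F^u)."""
    return {d: lam * s + (1.0 - lam) * expansion_scores.get(d, 0.0)
            for d, s in scores.items()}


def ranking(scored):
    """Document ids sorted by descending score."""
    return sorted(scored, key=lambda d: -scored[d])


# Toy data (made up for illustration).
scores = {"d1": 2.0, "d2": 1.5, "d3": 1.0}
doc_terms = {
    "d1": {"delta", "economy"},
    "d2": {"united"},
    "d3": {"delta", "united", "economy", "business"},
}
feedback = [["delta", "united"], ["economy", "business"]]

filtered = boolean_filter(scores, doc_terms, feedback, mode="A+O")
reranked = ranking(soft_rank(scores, {"d3": 4.0}, lam=0.5))
```

The sketch also hints at why the soft model tends to be safer: the Boolean filter drops `d2` outright under "A+O", while the soft model only demotes it.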

Evaluation

Intrinsic evaluation does not necessarily reflect the utility of the generated facets in assisting search

  • some annotator-selected facets may be of little value for the search task, and some good facet terms may be missed by annotators.

We propose an extrinsic evaluation method that directly measures the utility based on an FWS task

Intrinsic Evaluation:

“Gold standard” query facets are constructed by human annotators and used as the ground truth to be compared with facets generated by different systems.

  • Conventional clustering metrics: Purity and Normalized Mutual Information (NMI)
  • Newly designed metrics for facet generation: $wPRE_{\alpha,\beta}$, some variations of $nDCG$

The facet annotation is usually done by first pooling facets generated by the different systems. Then annotators are asked to group or re-group terms in the pool into preferred query facets, and to give ratings for each of them regarding how useful or important the facet is.
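
As a small illustration of the conventional clustering metrics, the sketch below computes purity and NMI for a system grouping against gold-standard facets. It assumes both groupings cover exactly the same terms (real facet evaluation also has to handle missing and extra terms), and it normalizes mutual information by the average of the two entropies, which is one of several common conventions.

```python
import math
from collections import Counter


def purity(system, gold):
    """system, gold: {term: cluster label} over the same set of terms."""
    by_cluster = {}
    for term, cluster in system.items():
        by_cluster.setdefault(cluster, Counter())[gold[term]] += 1
    # Each system cluster is credited with its most frequent gold facet.
    return sum(max(c.values()) for c in by_cluster.values()) / len(system)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(system, gold):
    terms = list(system)
    n = len(terms)
    sys_labels = [system[t] for t in terms]
    gold_labels = [gold[t] for t in terms]
    joint = Counter(zip(sys_labels, gold_labels))
    sys_counts, gold_counts = Counter(sys_labels), Counter(gold_labels)
    mi = sum((nij / n) * math.log(n * nij / (sys_counts[a] * gold_counts[b]))
             for (a, b), nij in joint.items())
    denom = (entropy(sys_labels) + entropy(gold_labels)) / 2
    # Convention chosen here: two single-cluster groupings count as identical.
    return mi / denom if denom > 0 else 1.0


# Toy example: the system wrongly puts "business" with the airlines.
gold = {"delta": "airline", "united": "airline",
        "economy": "class", "business": "class"}
system = {"delta": 0, "united": 0, "economy": 1, "business": 0}
p = purity(system, gold)  # 3 of 4 terms fall in their cluster's majority facet
score = nmi(system, gold)
```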

Extrinsic Evaluation:

The paper proposes an extrinsic evaluation method that evaluates a system based on an interactive search task that incorporates FWS

  • simulate the user feedback process based on an interaction model, using oracle feedback terms and feedback terms collected from annotators.
  • Both oracle feedback and annotator feedback first collect all feedback terms that a user might select. The simulation then uses the user model to determine which subset of the oracle or annotator feedback terms a user selects, and how much time is spent giving that feedback.
  • Finally, the systems are evaluated by their re-ranking performance together with the estimated time cost.

Oracle and Annotator Feedback:

  • Oracle feedback: presents an ideal case of facet feedback, in which only effective terms are selected
    • is cheap to obtain for any facet system
    • but may be quite different from what actual users would select in a real interaction.
      • So feedback terms are also collected from annotators
  • Annotator feedback: feedback terms selected by human annotators

User Model:

Describes how a user selects feedback terms from facets, based on which we can estimate the time cost for the user.
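
To show how a user model can turn feedback terms into a time cost, here is a hypothetical simulation. The scanning behaviour (facets read top to bottom, irrelevant facets skipped after one glance) and the constants `t_facet` and `t_term` are illustrative assumptions, not the parameters estimated in the paper.

```python
def simulate_feedback(facets, useful_terms, t_facet=1.0, t_term=0.5):
    """Return [(elapsed_time, selected_terms_so_far), ...], one entry per selection.

    facets:       ranked list of facets shown to the user, each a list of terms
    useful_terms: oracle or annotator feedback terms the user would select
    t_facet:      assumed time to read a facet and judge its relevance
    t_term:       assumed time to scan one term inside a relevant facet
    """
    elapsed = 0.0
    selected = []
    trace = []
    for facet in facets:
        elapsed += t_facet
        if not any(term in useful_terms for term in facet):
            continue  # facet judged irrelevant, its terms are not scanned
        for term in facet:
            elapsed += t_term
            if term in useful_terms:
                selected.append(term)
                trace.append((elapsed, list(selected)))
    return trace


# Toy facets (made up); the user would pick "united" and "business".
facets = [
    ["delta", "united", "jetblue"],
    ["red", "blue"],
    ["economy", "business"],
]
trace = simulate_feedback(facets, {"united", "business"})
```

Each entry in `trace` is a point where the results could be re-ranked with the terms selected so far, so re-ranking quality can be reported as a function of time spent.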

