http://wiki.math.uwaterloo.ca/statwiki/api.php?action=feedcontributions&user=Aashkan&feedformat=atomstatwiki - User contributions [US]2024-03-29T11:57:41ZUser contributionsMediaWiki 1.41.0http://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14615a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-10T22:39:21Z<p>Aashkan: </p>
<hr />
<div>===Background: Click Models===<br />
One of the most common click models in Web search, known as the ''position model'', is based on position bias in the displayed ranked results. Under this model, the probability of a click is assumed to decrease toward lower ranks on the result page because users pay less visual attention to them. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click at a given position in terms of the relevance of the documents ranked above it; consequently, the cascade model has shown state-of-the-art performance compared with the position model. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches (with no click) or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. Perceived relevance is the relevance of a document as judged by the user from the way it is presented on the result page, before clicking on it. Actual relevance is the relevance of the document as judged by the user after clicking on it and viewing its content.<br />
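To make the contrast concrete, the sketch below computes per-position click probabilities under simple textbook forms of the two models. It assumes the common formulation of the position model as <math>\ P(C_i=1) = b_i a_u</math>, where <math>\ b_i</math> is a position-dependent examination probability, and a cascade model in which the user clicks document <math>\ i</math> with probability <math>\ r_i</math> and stops at the first click. The function names and all numeric values are hypothetical.<br />

```python
def position_model_click_probs(attractiveness, examination):
    """Position model: P(C_i = 1) = b_i * a_u, where b_i depends only on the rank i."""
    return [b * a for a, b in zip(attractiveness, examination)]


def cascade_model_click_probs(relevance):
    """Cascade model: scan top-down, click doc i with probability r_i, stop at the first click."""
    probs, p_reach = [], 1.0  # p_reach = P(user reaches position i)
    for r in relevance:
        probs.append(p_reach * r)
        p_reach *= 1.0 - r  # continue only if position i was not clicked
    return probs


print(position_model_click_probs([0.5, 0.5, 0.5], [1.0, 0.6, 0.3]))  # [0.5, 0.3, 0.15]
print(cascade_model_click_probs([0.5, 0.5, 0.5]))  # [0.5, 0.25, 0.125]
```

In the cascade output the probabilities sum to at most 1, reflecting the one-click-per-search assumption, and the click probability at each position depends on the relevance of the documents above it; in the position model it depends only on the document and its rank.<br />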
<br />
===The Proposed Model===<br />
The paper proposes a Dynamic Bayesian Network (DBN) model of the user's browsing and click behaviour, with the ultimate goal of inferring the relevance of the documents. The model addresses the limitations of the models above through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* After clicking a document, the user may go on to examine the next document if she/he is not satisfied with it (based on its actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document; accordingly, the model distinguishes between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents in the result list of a given query are represented as a sequence in the DBN, with one time slice per position. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the document level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For each position <math>\ i </math>, an observed binary variable <math>\ C_i</math> indicates whether there was a click at that position. In addition, three hidden binary variables are defined for each position <math>\ i</math> to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The parameters <math>\ a_u</math> and <math>\ s_u</math> characterize the relevance of document <math>\ u</math>, the document shown at position <math>\ i</math>: <math>\ a_u</math> represents its perceived relevance, and <math>\ s_u</math> represents the ratio between its actual relevance (denoted by <math>\ r_u</math>) and its perceived relevance. The objective is to estimate the actual relevance of each document <math>\ u</math>:<br />
<br />
<center><math>\ r_u = P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
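The middle step of this identity relies on two of the rules listed below: a user can only be satisfied by a document she/he clicked (<math>\ C_i = 0 \Rightarrow S_i = 0</math>), and, once a document is examined, a click happens exactly when the user is attracted (<math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math>). Written out step by step:<br />
<center><math>\ P(S_i=1 | E_i=1) = P(S_i=1, C_i=1 | E_i=1) = P(S_i=1 | C_i=1, E_i=1)\, P(C_i=1 | E_i=1) = P(S_i=1 | C_i=1)\, P(A_i=1) = s_u a_u</math></center><br />
The condition <math>\ E_i=1</math> can be dropped in the third step because <math>\ C_i = 1</math> already implies <math>\ E_i = 1</math>, and <math>\ P(A_i=1)</math> does not depend on examination. For example, with hypothetical values <math>\ a_u = 0.4</math> and <math>\ s_u = 0.5</math>, the document attracts 40% of the users who examine it, half of those who click on it are satisfied, and <math>\ r_u = 0.2</math>.<br />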
<br />
<br />
The remaining assumptions about the user's click and browsing behaviour are encoded in the DBN as follows:<br />
<br />
* The user always examines the first result (i.e., the document at position 1); <br />
: <math>\ E_1 = 1</math><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he does not examine any subsequent position; <br />
: <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math><br />
<br />
* There is a click if and only if the user examined the document and was attracted by it; <br />
: <math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math><br />
<br />
* The probability of being attracted depends only on the document; <br />
: <math>\ P(A_i=1) = a_u</math><br />
<br />
* The user scans the result list linearly from top to bottom until she/he decides to stop. After clicking on a document and visiting it, the user is satisfied by it with probability <math>\ s_u</math>, which depends only on the document; <br />
: <math>\ P(S_i = 1 | C_i = 1) = s_u</math><br />
<br />
* If the user does not click on the document, she/he is not satisfied by it; <br />
: <math>\ C_i = 0 \Rightarrow S_i = 0</math><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search; <br />
: <math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math><br />
<br />
* If the user is not satisfied by the current result, she/he examines the next document with probability <math>\ \gamma </math> (or abandons the search with probability <math>\ 1 - \gamma </math>); <br />
: <math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math><br />
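As a sketch of how these rules generate clicks, the code below samples sessions from the DBN and also computes the exact click probability at each position using <math>\ P(C_i=1) = P(E_i=1)\, a_u</math> and <math>\ P(E_{i+1}=1) = P(E_i=1)\, \gamma\, (1 - a_u s_u)</math>, both of which follow from the rules above. The function names, the dictionary-based parameters, and the example values are hypothetical.<br />

```python
import random


def simulate_session(docs, a, s, gamma, rng):
    """Sample the clicks C_1..C_n of one session, given the ranked docs."""
    clicks, examining = [], True  # E_1 = 1
    for u in docs:
        if not examining:  # E_i = 0 => C_i = 0, and E_{i+1} = 0
            clicks.append(0)
            continue
        clicked = rng.random() < a[u]  # with E_i = 1, C_i = 1 iff A_i = 1
        clicks.append(int(clicked))
        satisfied = clicked and rng.random() < s[u]  # P(S_i = 1 | C_i = 1) = s_u
        # S_i = 1 => stop; otherwise continue with probability gamma
        examining = not satisfied and rng.random() < gamma
    return clicks


def click_probabilities(docs, a, s, gamma):
    """Exact P(C_i = 1) for each position of the ranked list."""
    probs, p_exam = [], 1.0  # p_exam = P(E_i = 1), starting from E_1 = 1
    for u in docs:
        probs.append(p_exam * a[u])
        p_exam *= gamma * (1.0 - a[u] * s[u])  # P(S_i = 0 | E_i = 1) = 1 - r_u
    return probs


docs = ["d1", "d2", "d3"]
a = {"d1": 0.6, "d2": 0.4, "d3": 0.3}
s = {"d1": 0.5, "d2": 0.7, "d3": 0.6}
rng = random.Random(0)
sessions = [simulate_session(docs, a, s, 0.9, rng) for _ in range(20000)]
empirical = [sum(c[i] for c in sessions) / len(sessions) for i in range(3)]
print(empirical)
print(click_probabilities(docs, a, s, 0.9))
```

With <math>\ \gamma = 1</math> and <math>\ s_u = 1</math> for every document, the user always stops at the first click, so <code>click_probabilities</code> reduces to the cascade model's formula with <math>\ r_u = a_u</math>.<br />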
<br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: given the current values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed for each session.<br />
* M-step: given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated to maximize the expected log-likelihood.<br />
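A minimal sketch of this EM procedure, assuming all sessions come from a single query and are given as hypothetical (docs, clicks) pairs in rank order, with every parameter initialized to 0.5 and a fixed number of iterations. Rather than a general-purpose DBN inference routine, the E-step here exploits the model's structure: given the clicks, every position up to the last click is examined, so the only hidden choices are whether the last click was satisfying and where examination stopped afterwards, and these few configurations are enumerated exactly. The names <code>e_step</code> and <code>train_dbn</code> are illustrative; this is not the paper's own implementation.<br />

```python
from collections import Counter, defaultdict


def e_step(docs, clicks, a, s, gamma):
    """Exact posteriors for one session; docs are in rank order, clicks are 0/1."""
    n = len(docs)
    last = max((i + 1 for i, c in enumerate(clicks) if c), default=0)  # 0 = no click
    b = max(last, 1)  # positions 1..b are certainly examined (E_1 = 1)
    s_last = s[docs[last - 1]] if last else 0.0  # weight of "satisfied at last click"
    # w[k - b]: not satisfied at b, positions b+1..k examined without clicks, then stop
    w, prod = [], 1.0 - s_last
    for k in range(b, n + 1):
        if k > b:
            prod *= gamma * (1.0 - a[docs[k - 1]])
        w.append(prod * ((1.0 - gamma) if k < n else 1.0))
    z = s_last + sum(w)
    # tail[j] = P(E_{b+j} = 1 and S_b = 0 | clicks); the final entry is 0
    tail = [sum(w[j:]) / z for j in range(len(w) + 1)]
    p_exam = [1.0 if i <= b else tail[i - b] for i in range(1, n + 1)]
    # A_i = 1 for clicks; an unclicked A_i can be 1 only if E_i = 0 (prior a_u)
    p_attr = [1.0 if c else a[u] * (1.0 - e) for u, c, e in zip(docs, clicks, p_exam)]
    p_sat = [s_last / z if i == last else 0.0 for i in range(1, n + 1)]
    g_num = g_den = 0.0  # expected counts of E_{i+1} = 1 given E_i = 1, S_i = 0
    for i in range(1, n):
        if i < b:
            g_num, g_den = g_num + 1.0, g_den + 1.0
        else:
            g_num, g_den = g_num + tail[i + 1 - b], g_den + tail[i - b]
    return p_attr, p_sat, g_num, g_den


def train_dbn(sessions, n_iter=50, gamma=0.5):
    """EM for the DBN click model; returns per-document a_u, s_u and gamma."""
    data = Counter((tuple(d), tuple(c)) for d, c in sessions)  # merge identical sessions
    vocab = {u for docs, _ in data for u in docs}
    a, s = dict.fromkeys(vocab, 0.5), dict.fromkeys(vocab, 0.5)
    for _ in range(n_iter):
        a_num, a_den = defaultdict(float), defaultdict(float)
        s_num, s_den = defaultdict(float), defaultdict(float)
        g_num = g_den = 0.0
        for (docs, clicks), cnt in data.items():  # E-step, accumulating expected counts
            p_attr, p_sat, gn, gd = e_step(docs, clicks, a, s, gamma)
            for u, c, pa, ps in zip(docs, clicks, p_attr, p_sat):
                a_num[u] += cnt * pa
                a_den[u] += cnt
                if c:  # s_u = P(S_i = 1 | C_i = 1) is informed only by clicks
                    s_num[u] += cnt * ps
                    s_den[u] += cnt
            g_num += cnt * gn
            g_den += cnt * gd
        # M-step: normalized expected counts (unchanged if there is no evidence)
        a = {u: a_num[u] / a_den[u] for u in vocab}
        s = {u: s_num[u] / s_den[u] if s_den[u] else s[u] for u in vocab}
        gamma = g_num / g_den if g_den else gamma
    return a, s, gamma
```

The estimated relevance used for ranking is then <math>\ r_u = a_u s_u</math>, e.g. <code>{u: a[u] * s[u] for u in a}</code>. A document that is only ever clicked at the last position of the list leaves <math>\ s_u</math> at its initial value in this sketch, since nothing after that click is observable.<br />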
<br />
<br />
===Evaluation===<br />
The paper reports three types of experiments that validate the DBN and compare it with existing click models. First, the click model is evaluated on how well it predicts the click-through rate at position 1. Second, the predicted relevance is used as a feature in a ranking function. Third, the predicted relevance is used as supplementary information for training a ranking function. <br />
<br />
Experiments on the logs of a commercial search engine indicate that the DBN explains the observed clicks accurately. They also show that a ranking function learned from the predicted relevance performs nearly as well as one trained on a large amount of editorial data, and that combining both types of information can yield an even more accurate ranking function.</div>
<hr />
<div>===Motivation: Click Models===<br />
One of the most common click models in Web search, known as the ''position model'', is based on the position bias on the displayed ranked results. Under this model, it is assumed that the chance of click decreases towards the lower ranks on result pages due to the reduced visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The benefit of the cascade model over the position model is its ability to explain click with respect to the relevance of the previous documents; therefore, the later model has shown state-of-the-art performance over the former one. However, the cascade model makes a strong assumption that there is only one click per search; hence, it can not explain the abandoned search or search with multiple clicks. Moreover, none of these models distinguish the perceived relevance and the actual relevance. The perceived relevance is the relevance of a document judged by the user based on their examination of the document as it is shown on a result page. The actual relevance is the relevance of the document judged by the user once she/he clicks on it and sees its content.<br />
<br />
===The Proposed Model===<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The documents ranked on a result list of a given query are presented through a sequence in DBN. The variables inside the box are defined at the session level, while those out of the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1); <br />
: <math>\ E_1 = 1</math><br />
<br />
* If the user does not examine the position <math>\ i </math> she/he will not examine the subsequent positions; <br />
: <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math><br />
<br />
* There is a click if and only if the user examined the document and was attracted by it; <br />
: <math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math><br />
<br />
* The probability of being attracted depends only on the document; <br />
: <math>\ P(A_i=1) = a_u</math><br />
<br />
* The user scans the result list linearly from top to bottom until she/he decides to stop. After clicking on and visiting a document, the user is satisfied by it with a certain probability; <br />
: <math>\ P(S_i = 1 | C_i = 1) = s_u</math><br />
<br />
* If the user does not click on a document, she/he is not satisfied by it; <br />
: <math>\ C_i = 0 \Rightarrow S_i = 0</math><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search; <br />
: <math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math><br />
<br />
* If the user is not satisfied by the current result, she/he examines the next document with probability <math>\ \gamma </math> (or abandons the search with probability <math>\ 1 - \gamma </math>); <br />
: <math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math><br />
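The assumptions above fully specify a generative process, which the following minimal Python sketch makes concrete. The function names <code>simulate_session</code> and <code>click_marginals</code> are hypothetical, positions are 0-based, and <code>a[i]</code>, <code>s[i]</code> stand for the <math>\ a_u</math> and <math>\ s_u</math> of the document shown at position <math>\ i</math>. The second function follows from the same assumptions: the user reaches position <math>\ i+1</math> only after examining position <math>\ i</math>, not being satisfied there (probability <math>\ 1 - a_u s_u</math>), and choosing to continue, so <math>\ P(E_{i+1}=1) = \gamma\, P(E_i=1)(1 - a_u s_u)</math>.<br />

```python
import random
from typing import List, Sequence, Tuple


def simulate_session(a: Sequence[float], s: Sequence[float], gamma: float,
                     rng: random.Random) -> List[int]:
    """Sample the click vector C of one session from the DBN assumptions.

    a[i] and s[i] are the a_u and s_u of the document at (0-based) position i.
    """
    clicks = [0] * len(a)
    for i in range(len(a)):           # entering iteration i means E_i = 1 (E_1 = 1 always)
        if rng.random() < a[i]:       # A_i = 1 with probability a_u
            clicks[i] = 1             # examined and attracted <=> click
            if rng.random() < s[i]:   # S_i = 1 with probability s_u given a click
                break                 # satisfied: stop, so E_{i+1} = 0
        if rng.random() >= gamma:     # not satisfied: continue with probability gamma
            break                     # abandon the search
    return clicks


def click_marginals(a: Sequence[float], s: Sequence[float],
                    gamma: float) -> Tuple[List[float], List[float]]:
    """Exact P(E_i = 1) and P(C_i = 1) per position implied by the same assumptions."""
    p_exam: List[float] = []
    p_click: List[float] = []
    e = 1.0                                # P(E_1 = 1) = 1
    for a_i, s_i in zip(a, s):
        p_exam.append(e)
        p_click.append(e * a_i)            # P(C_i = 1) = P(E_i = 1) * a_u
        e = gamma * e * (1.0 - a_i * s_i)  # continue only if not satisfied (prob. 1 - r_u)
    return p_exam, p_click
```

With <code>a = [0.6, 0.4, 0.3]</code>, <code>s = [0.5, 0.7, 0.2]</code> and <code>gamma = 0.9</code>, <code>click_marginals</code> gives click probabilities of 0.6, 0.252 and about 0.122 for the three positions, and the click rates of many sessions drawn with <code>simulate_session</code> should approach these values.<br />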
<br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-Step: Given the current values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed for each session.<br />
* M-Step: Given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are re-estimated.<br />
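A minimal Python sketch of this EM loop follows; it rests on assumptions of our own and is not the authors' implementation. Because every position up to the last click must have been examined and every earlier clicked document must have been unsatisfying, only the tail after the last click is uncertain, so the E-step posteriors can be computed exactly with a short backward recursion, where <code>z[j]</code> is the probability of seeing no clicks from position <code>j</code> onward given that position <code>j</code> was examined. The helper names <code>e_step</code> and <code>fit_dbn</code>, the initial values of 0.5, and the clamping constant <code>eps</code> are hypothetical choices made for illustration.<br />

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# One session: (documents in ranked order, 0/1 click per position).
Session = Tuple[Sequence[Hashable], Sequence[int]]


def _clamp(x: float, eps: float) -> float:
    """Keep probabilities away from 0 and 1 to avoid divisions by zero."""
    return min(max(x, eps), 1.0 - eps)


def e_step(docs, clicks, a, s, gamma):
    """Exact posteriors P(E_i=1 | C), P(A_i=1 | C), P(S_i=1 | C) for one session."""
    n = len(docs)
    last = max((i for i, c in enumerate(clicks) if c), default=-1)
    # z[j] = P(no clicks at positions j..n-1 | E_j = 1), with z[n] = 1.
    z = [1.0] * (n + 1)
    for j in range(n - 1, last, -1):
        z[j] = (1.0 - a[docs[j]]) * ((1.0 - gamma) + gamma * z[j + 1])
    p_e = [1.0 if i <= last else 0.0 for i in range(n)]  # examined up to the last click
    p_s = [0.0] * n                                      # earlier clicks had S_i = 0
    if last >= 0:
        s_l = s[docs[last]]
        # Either satisfied at the last click, or not satisfied and no later clicks.
        tail = s_l + (1.0 - s_l) * ((1.0 - gamma) + gamma * z[last + 1])
        p_s[last] = s_l / tail
        p_next = (1.0 - s_l) * gamma * z[last + 1] / tail
    else:
        p_next = 1.0                                     # E_1 = 1 by assumption
    for j in range(last + 1, n):
        p_e[j] = p_next
        # P(E_{j+1}=1 | E_j=1, no clicks from position j on)
        p_next *= gamma * z[j + 1] / ((1.0 - gamma) + gamma * z[j + 1])
    # Clicked: A_i = 1. Not clicked: attracted only if the position was never examined.
    p_a = [1.0 if c else a[d] * (1.0 - pe) for d, c, pe in zip(docs, clicks, p_e)]
    return p_e, p_a, p_s


def fit_dbn(sessions: List[Session], gamma: float = 0.9, n_iter: int = 50,
            eps: float = 1e-6, learn_gamma: bool = True):
    """Fit a_u, s_u (per document) and gamma by EM; returns (a, s, gamma)."""
    all_docs = {d for docs, _ in sessions for d in docs}
    a: Dict[Hashable, float] = {d: 0.5 for d in all_docs}
    s: Dict[Hashable, float] = {d: 0.5 for d in all_docs}
    for _ in range(n_iter):
        a_num, a_den = defaultdict(float), defaultdict(float)
        s_num, s_den = defaultdict(float), defaultdict(float)
        g_num = g_den = 0.0
        for docs, clicks in sessions:                    # E-step: expected counts
            p_e, p_a, p_s = e_step(docs, clicks, a, s, gamma)
            for i, (d, c) in enumerate(zip(docs, clicks)):
                a_num[d] += p_a[i]
                a_den[d] += 1.0
                if c:
                    s_num[d] += p_s[i]
                    s_den[d] += 1.0
                if i + 1 < len(docs):
                    g_den += p_e[i] - p_s[i]             # P(E_i = 1, S_i = 0)
                    g_num += p_e[i + 1]                  # ... and E_{i+1} = 1
        # M-step: ratios of expected counts
        a = {d: _clamp(a_num[d] / a_den[d], eps) for d in all_docs}
        s = {d: _clamp(s_num[d] / s_den[d], eps) if s_den[d] > 0 else s[d]
             for d in all_docs}
        if learn_gamma and g_den > 0:
            gamma = _clamp(g_num / g_den, eps)
    return a, s, gamma
```

Here <code>sessions</code> holds <code>(documents, clicks)</code> pairs observed for one query; after fitting, the relevance estimate of a document <code>u</code> is <code>a[u] * s[u]</code>, i.e. <math>\ r_u = s_u a_u</math>. Documents that are never clicked keep their initial <math>\ s_u</math>, since no observation informs it.<br />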
<br />
<br />
===Evaluation===<br />
Three types of experiments are conducted in the paper to validate DBN and to compare it with existing models. First, the click model is evaluated in terms of the predicted click-through rate at position 1. Second, the predicted relevance is used as a feature in a ranking function. Finally, the predicted relevance is used as supplementary information for training a ranking function. <br />
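For the first experiment, the model's prediction takes a simple form under the assumptions listed earlier (a short derivation added here for illustration): since the first result is always examined, the predicted click-through rate of a document <math>\ u</math> displayed at position 1 reduces to its attractiveness,<br />
<br />
<center><math>\ P(C_1=1) = P(E_1=1)\,P(A_1=1) = a_u</math></center><br />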
<br />
Experiments on the logs of a commercial search engine indicate that DBN can accurately explain the observed clicks. The authors show that a ranking function learned from the predicted relevance performs nearly as well as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14489a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-10T00:23:06Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the chance of a click is assumed to decrease toward lower ranks on the result page because of the reduced visual attention the user pays to them. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it, and it has shown state-of-the-art performance compared with the position model. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes perceived relevance from actual relevance. The perceived relevance is the relevance of a document as judged by the user from its presentation on the result page. The actual relevance is the relevance of the document as judged by the user once she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. Accordingly, the proposed model distinguishes perceived relevance from actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents in the result list of a given query are represented as a sequence of positions in the DBN, as depicted in the figure below. The variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. In addition, three hidden binary variables are defined for each position <math>\ i</math> to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The per-document parameters <math>\ a_u</math> and <math>\ s_u</math>, where <math>\ u</math> indexes the document, capture its relevance: <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
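To make the distinction between perceived and actual relevance concrete, the short Python sketch below computes <math>\ r_u = s_u a_u</math> for two hypothetical documents. The ids and parameter values are made up for illustration and are not taken from the paper. The example shows that ranking by attractiveness <math>\ a_u</math> and ranking by actual relevance <math>\ r_u</math> can disagree:

```python
from typing import Dict, Tuple

# Hypothetical per-document parameters (a_u, s_u); values are made up for illustration.
params: Dict[str, Tuple[float, float]] = {
    "doc_x": (0.8, 0.2),  # attractive snippet, disappointing page
    "doc_y": (0.4, 0.9),  # plain snippet, satisfying page
}


def actual_relevance(a_u: float, s_u: float) -> float:
    """r_u = P(S=1 | E=1) = P(S=1 | C=1) * P(C=1 | E=1) = s_u * a_u."""
    if not (0.0 <= a_u <= 1.0 and 0.0 <= s_u <= 1.0):
        raise ValueError("a_u and s_u must be probabilities")
    return s_u * a_u


# Rank by perceived relevance a_u versus by actual relevance r_u.
by_perceived = sorted(params, key=lambda u: params[u][0], reverse=True)
by_actual = sorted(params, key=lambda u: actual_relevance(*params[u]), reverse=True)
print(by_perceived)  # ['doc_x', 'doc_y']
print(by_actual)     # ['doc_y', 'doc_x']
```

Here doc_x has <math>\ r_u = 0.16</math> and doc_y has <math>\ r_u = 0.36</math>. The document that attracts more clicks is therefore not the one that satisfies users more often.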
<br />
<br />
The remaining assumptions about the user's click and browsing behaviour are encoded in the DBN as follows:<br />
<br />
* The user always examines the first result (i.e. the document at position 1); <br />
: <math>\ E_1 = 1</math><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position; <br />
: <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math><br />
<br />
* There is a click if and only if the user examined the document and was attracted by it; <br />
: <math>\ (A_i = 1 \wedge E_i = 1) \Leftrightarrow C_i = 1</math><br />
<br />
* The probability of being attracted depends only on the document; <br />
: <math>\ P(A_i=1) = a_u</math><br />
<br />
* The user scans the result list linearly from top to bottom until she/he decides to stop. After clicking on and visiting a document, the user is satisfied by it with a probability that depends only on the document; <br />
: <math>\ P(S_i = 1 | C_i = 1) = s_u</math><br />
<br />
* A user who does not click on a document cannot be satisfied by it; <br />
: <math>\ C_i = 0 \Rightarrow S_i = 0</math><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search; <br />
: <math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math><br />
<br />
* If the user is not satisfied by the current result, she/he examines the next document with probability <math>\ \gamma </math> (or abandons the search with probability <math>\ 1 - \gamma </math>); <br />
: <math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math><br />
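The generative process encoded by these assumptions can be sketched as a small sampler. This is an illustrative Python sketch, not the authors' code. The function name simulate_session, the document ids and the parameter values are hypothetical. It draws one click vector for a ranked list by walking down the positions while <math>\ E_i = 1</math>:

```python
import random
from typing import Dict, List, Sequence, Tuple


def simulate_session(
    docs: Sequence[str],
    params: Dict[str, Tuple[float, float]],
    gamma: float,
    rng: random.Random,
) -> List[int]:
    """Sample the click vector (C_1, ..., C_n) for one ranked list.

    params maps a (hypothetical) document id u to (a_u, s_u). The loop body
    runs only while the user is still examining (E_i = 1), starting from E_1 = 1.
    """
    clicks = [0] * len(docs)
    for i, u in enumerate(docs):
        a_u, s_u = params[u]
        if rng.random() < a_u:            # A_i = 1 and E_i = 1, so C_i = 1
            clicks[i] = 1
            if rng.random() < s_u:        # S_i = 1 with probability s_u given a click
                break                     # satisfied: E_{i+1} = 0
        # Here S_i = 0 (no click, or an unsatisfying click).
        if rng.random() >= gamma:         # abandon with probability 1 - gamma
            break                         # E_{i+1} = 0
    return clicks


rng = random.Random(42)
toy_params = {"d1": (0.6, 0.3), "d2": (0.5, 0.7), "d3": (0.3, 0.5)}
for _ in range(3):
    print(simulate_session(["d1", "d2", "d3"], toy_params, gamma=0.9, rng=rng))
```

Sessions sampled this way can contain zero, one, or several clicks. The cascade model cannot produce that mix.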
<br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-Step: Given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed for each session.<br />
* M-Step: Given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated to maximize the expected complete-data log-likelihood.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and compare it with existing models. First, the authors evaluate the click model by how accurately it predicts the click-through rate at position 1. Second, they use the predicted relevance as a feature in a ranking function. Finally, they use the predicted relevance as supplementary information for training a ranking function. <br />
<br />
Experiments on the click logs of a commercial search engine indicate that the DBN accurately explains the observed clicks. The results further show that a ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14312a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:49:03Z<p>Aashkan: </p>
Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14311a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:48:28Z<p>Aashkan: </p>
Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14310a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:47:58Z<p>Aashkan: </p>
Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14309a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:47:17Z<p>Aashkan: </p>
Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14308a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:46:01Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on position bias in the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because users pay less visual attention to them. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops, either because their information need is satisfied or because their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click at a given position in terms of the relevance of the documents ranked above it; accordingly, the cascade model has shown state-of-the-art performance, outperforming the position model. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is the relevance judged by the user from its presentation on the result page (e.g. its title and snippet). The actual relevance is the relevance judged by the user after she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. Accordingly, the proposed model attempts to distinguish between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents returned for a given query are represented as a sequence of positions in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
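As a small numerical illustration of this decomposition, the Python sketch below computes <math>\ r_u = a_u s_u</math> for two hypothetical documents with made-up parameter values and sorts them by it. A document whose result-page entry is very attractive but rarely satisfying can end up ranked below a less attractive but more satisfying one.<br />
```python
def actual_relevance(a_u, s_u):
    """r_u = P(S=1 | E=1) = P(S=1 | C=1) * P(C=1 | E=1) = s_u * a_u."""
    return a_u * s_u


# Hypothetical (a_u, s_u) estimates, for illustration only.
params = {"doc_a": (0.8, 0.3), "doc_b": (0.5, 0.9)}
ranking = sorted(params, key=lambda u: actual_relevance(*params[u]), reverse=True)
print(ranking)  # ['doc_b', 'doc_a']  (r_u of about 0.45 vs. 0.24)
```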
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1); <br />
: <math>\ E_1 = 1</math><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position; <br />
: <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it; <br />
: <math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math><br />
<br />
* The probability of being attracted depends only on the document; <br />
: <math>\ P(A_i=1) = a_u</math><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document; <br />
: <math>\ P(S_i = 1 | C_i = 1) = s_u</math><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it; <br />
: <math>\ C_i = 0 \Rightarrow S_i = 0</math><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search; <br />
: <math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math><br />
<br />
* If the user is not satisfied by the current result, she/he stops the search (i.e. abandons it) with probability <math>\ 1 - \gamma </math> or examines the next document with probability <math>\ \gamma </math>; <br />
: <math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math><br />
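Taken together, these assumptions define a generative process for a single session. The Python sketch below is one way to write it down, assuming the <math>\ a_u</math> and <math>\ s_u</math> values of the displayed documents are given as plain lists in rank order; the function names are hypothetical. <code>simulate_session</code> samples clicks by following the rules above, and <code>click_probabilities</code> computes the marginal click probability at each position in closed form, using the fact (which follows from the rules) that <math>\ P(E_{i+1}=1 | E_i=1) = \gamma (1 - a_u s_u)</math>.<br />
```python
import random


def simulate_session(a, s, gamma, rng):
    """Sample the click vector of one session; a[i], s[i] belong to the document at rank i."""
    clicks = [0] * len(a)
    for i in range(len(a)):
        # Reaching this iteration means E_i = 1 (the first result is always examined).
        if rng.random() < a[i]:      # A_i = 1, hence C_i = 1
            clicks[i] = 1
            if rng.random() < s[i]:  # S_i = 1: satisfied, stop the search
                break
        if rng.random() >= gamma:    # unsatisfied: continue only with probability gamma
            break
    return clicks


def click_probabilities(a, s, gamma):
    """Exact marginal P(C_i = 1) at every rank under the same assumptions."""
    probs, p_exam = [], 1.0          # P(E_1 = 1) = 1
    for a_i, s_i in zip(a, s):
        probs.append(p_exam * a_i)   # click iff examined and attracted
        p_exam *= gamma * (1.0 - a_i * s_i)  # P(E_{i+1}=1 | E_i=1) = gamma * (1 - r_u)
    return probs


if __name__ == "__main__":
    a, s, gamma = [0.6, 0.4, 0.3], [0.5, 0.3, 0.2], 0.8  # hypothetical parameters
    rng = random.Random(0)
    n = 20000
    totals = [0] * len(a)
    for _ in range(n):
        for i, c in enumerate(simulate_session(a, s, gamma, rng)):
            totals[i] += c
    print([t / n for t in totals])           # empirical click rates
    print(click_probabilities(a, s, gamma))  # exact marginals
```
Note that the click probability at position 1 is simply <math>\ a_u</math> of the top document, while lower positions are discounted both by abandonment (<math>\ \gamma</math>) and by satisfaction with documents above them.<br />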
<br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-Step: Given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-Step: Given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
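Under the assumptions above, the E-step can be done exactly without a generic inference routine: every position up to the last click was certainly examined, and every click before the last one was certainly unsatisfying, so the only uncertainty is whether the last click satisfied the user and where examination stopped afterwards. The Python sketch below exploits this by enumerating the possible stopping positions. It is one possible implementation written for this page, not the authors' code. Assumptions: sessions are given as (document ids in rank order, 0/1 clicks) pairs; the names <code>em_dbn</code>, <code>_clip</code> and <code>_stop</code>, the initial values and the fixed iteration count are hypothetical choices; and probabilities are clipped away from 0 and 1 for numerical safety.<br />
```python
from collections import defaultdict

EPS = 1e-6


def _clip(p):
    # Keep probabilities strictly inside (0, 1) so no normaliser becomes zero.
    return min(max(p, EPS), 1.0 - EPS)


def _stop(k, n, gamma):
    # P(examination ends at index k | k examined, user unsatisfied);
    # at the last displayed position there is nothing further to examine.
    return 1.0 - gamma if k < n - 1 else 1.0


def em_dbn(sessions, n_iter=50, gamma=0.9, init=0.5):
    """sessions: list of (docs, clicks); docs are ids in rank order, clicks are 0/1."""
    gamma, init = _clip(gamma), _clip(init)
    a = defaultdict(lambda: init)  # a_u: attractiveness (perceived relevance)
    s = defaultdict(lambda: init)  # s_u: P(S = 1 | C = 1)
    for _ in range(n_iter):
        a_num, a_den = defaultdict(float), defaultdict(float)
        s_num, s_den = defaultdict(float), defaultdict(float)
        g_num = g_den = 0.0
        for docs, clicks in sessions:
            n = len(docs)
            if n == 0:
                continue
            last = max((i for i, c in enumerate(clicks) if c), default=-1)
            # E-step: enumerate (m, weight, satisfied_at_last_click) where m >= last
            # is the last examined index; factors shared by all cases cancel.
            cases = []
            if last >= 0:
                cases.append((last, s[docs[last]], True))
                first, reach = last, 1.0 - s[docs[last]]
            else:
                first, reach = 0, 1.0 - a[docs[0]]
            cases.append((first, reach * _stop(first, n, gamma), False))
            for k in range(first + 1, n):
                reach *= gamma * (1.0 - a[docs[k]])  # continued, examined, not attracted
                cases.append((k, reach * _stop(k, n, gamma), False))
            z = sum(w for _, w, _ in cases)
            p_sat = sum(w for _, w, sat in cases if sat) / z
            exam = [sum(w for m, w, _ in cases if m >= i) / z for i in range(n)]
            # Accumulate expected sufficient statistics for the M-step.
            for i, (d, c) in enumerate(zip(docs, clicks)):
                a_den[d] += 1.0
                # A_i = 1 if clicked, 0 if examined without a click, and
                # Bernoulli(a_u) a priori if the position was never examined.
                a_num[d] += 1.0 if c else (1.0 - exam[i]) * a[d]
                sat_i = p_sat if i == last else 0.0
                if c:
                    s_den[d] += 1.0
                    s_num[d] += sat_i
                if i < n - 1:
                    g_den += exam[i] - sat_i  # P(E_i = 1, S_i = 0)
                    g_num += exam[i + 1]      # P(E_i = 1, S_i = 0, E_{i+1} = 1)
        # M-step: expected counts become the new parameter values.
        for d in a_den:
            a[d] = _clip(a_num[d] / a_den[d])
        for d in s_den:
            s[d] = _clip(s_num[d] / s_den[d])
        if g_den > 0:
            gamma = _clip(g_num / g_den)
    return dict(a), dict(s), gamma
```
<code>em_dbn(sessions)</code> returns dictionaries of estimated <math>\ a_u</math> and <math>\ s_u</math> together with the estimated <math>\ \gamma</math>; multiplying the two per-document estimates gives the predicted relevance <math>\ r_u</math>. Since <math>\ s_u</math> is only estimated from sessions in which the document was clicked, documents that were never clicked are absent from the returned <math>\ s_u</math> dictionary.<br />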
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated on the click-through rate it predicts at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information for training a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. They also show that a ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14307a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:45:38Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because the user pays less visual attention to them. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click on a document in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is only one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance is the relevance of a document as judged by the user from the way the document is shown on the result page. The actual relevance is the relevance of the document as judged by the user once she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. Accordingly, the proposed model attempts to distinguish between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents returned for a given query are represented as a sequence of positions in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1); <br />
: <math>\ E_1 = 1</math><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position; <br />
: <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it; <br />
: <math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math><br />
<br />
* The probability of being attracted depends only on the document; <br />
: <math>\ P(A_i=1) = a_u</math><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document; <br />
: <math>\ P(S_i = 1 | C_i = 1) = s_u</math><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it; <br />
: <math>\ C_i = 0 \Rightarrow S_i = 0</math><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search; <br />
: <math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math><br />
<br />
* If the user is not satisfied by the current result, she/he stops the search (i.e. abandons it) with probability <math>\ 1 - \gamma </math> or examines the next document with probability <math>\ \gamma </math>; <math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math><br />
<br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-Step: Given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-Step: Given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated on the click-through rate it predicts at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information for training a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. They also show that a ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14306a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:45:02Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because the user pays less visual attention to them. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click on a document in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is only one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance is the relevance of a document as judged by the user from the way the document is shown on the result page. The actual relevance is the relevance of the document as judged by the user once she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. Accordingly, the proposed model attempts to distinguish between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents returned for a given query are represented as a sequence of positions in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1); <br />
: <math>\ E_1 = 1</math><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position; <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it; <math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math><br />
<br />
* The probability of being attracted depends only on the document; <math>\ P(A_i=1) = a_u</math><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document; <math>\ P(S_i = 1 | C_i = 1) = s_u</math><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it; <math>\ C_i = 0 \Rightarrow S_i = 0</math><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search; <math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math><br />
<br />
* If the user is not satisfied by the current result, she/he stops the search (i.e. abandons it) with probability <math>\ 1 - \gamma </math> or examines the next document with probability <math>\ \gamma </math>; <math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math><br />
<br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-Step: Given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-Step: Given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated on the click-through rate it predicts at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information for training a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. They also show that a ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14305a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:44:36Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because the user pays less visual attention to them. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click on a document in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is only one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance is the relevance of a document as judged by the user from the way the document is shown on the result page. The actual relevance is the relevance of the document as judged by the user once she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. Accordingly, the proposed model attempts to distinguish between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents returned for a given query are represented as a sequence of positions in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1); <br />
: <math>\ E_1 = 1</math><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position; <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it; <math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math><br />
<br />
* The probability of being attracted depends only on the document; <math>\ P(A_i=1) = a_u</math><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document; <math>\ P(S_i = 1 | C_i = 1) = s_u</math><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it; <math>\ C_i = 0 \Rightarrow S_i = 0</math><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search; <math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math><br />
<br />
* If the user is not satisfied by the current result, she/he stops the search (i.e. abandons it) with probability <math>\ 1 - \gamma </math> or examines the next document with probability <math>\ \gamma </math>; <math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math><br />
<br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-Step: Given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-Step: Given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated on the click-through rate it predicts at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information for training a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. They also show that a ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14303a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:42:11Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because the user pays less visual attention to them. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click on a document in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is only one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance is the relevance of a document as judged by the user from the way the document is shown on the result page. The actual relevance is the relevance of the document as judged by the user once she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. Accordingly, the proposed model attempts to distinguish between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents returned for a given query are represented as a sequence of positions in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1); <math>\ E_1 = 1</math><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position; <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it; <math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math><br />
<br />
* The probability of being attracted depends only on the document; <math>\ P(A_i=1) = a_u</math><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document; <math>\ P(S_i = 1 | C_i = 1) = s_u</math><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it; <math>\ C_i = 0 \Rightarrow S_i = 0</math><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search; <math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math><br />
<br />
* If the user is not satisfied by the current result, she/he stops the search (i.e. abandons it) with probability <math>\ 1 - \gamma </math> or examines the next document with probability <math>\ \gamma </math>; <math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math><br />
<br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-Step: Given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-Step: Given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated on the click-through rate it predicts at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information for training a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. They also show that a ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14302a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:41:42Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias on the displayed ranked results. Under this model, it is assumed that the chance of click decreases towards the lower ranks on result pages due to the reduced visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The benefit of the cascade model over the position model is its ability to explain click with respect to the relevance of the previous documents; therefore, the later model has shown state-of-the-art performance over the former one. However, the cascade model makes a strong assumption that there is only one click per search; hence, it can not explain the abandoned search or search with multiple clicks. Moreover, none of these models distinguish the perceived relevance and the actual relevance. The perceived relevance is the relevance of a document judged by the user based on their examination of the document as it is shown on a result page. The actual relevance is the relevance of the document judged by the user once she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The documents ranked on a result list of a given query are presented through a sequence in DBN. The variables inside the box are defined at the session level, while those out of the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1).<br />
<center><math>\ E_1 = 1</math></center><br />
<br />
* If the user does not examine the position <math>\ i </math> she/he will not examine the subsequent positions.<br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it.<br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<br />
* The probability of being attracted depends only on the document.<br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document.<br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<br />
* No click from the user indicates no user's satisfaction on the document.<br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search.<br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* If the user is not satisfied by the current result, she/he stops the search (a.k.a. search abandonment) with the probability <math>\ 1 - \gamma </math> or examines the next document with the probability <math>\ \gamma </math>.<br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
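<br />
The generative process defined by these assumptions can be sketched as a sampler for the click vector of a single result page. This is a minimal illustration, not the authors' code. The function name <code>simulate_session</code> and its argument layout are assumptions made here: per-position lists of <math>\ a_u</math> and <math>\ s_u</math>, plus a continuation probability <math>\ \gamma</math>.<br />
<br />
```python
import random
from typing import List, Sequence


# Hypothetical helper illustrating the DBN assumptions; not code from the paper.
def simulate_session(a: Sequence[float], s: Sequence[float], gamma: float,
                     rng: random.Random) -> List[int]:
    """Sample C_1..C_n for one result page.

    a[i] and s[i] are the attractiveness and satisfaction parameters of the
    document at position i + 1; gamma is the continuation probability.
    """
    clicks = [0] * len(a)
    examined = True                       # E_1 = 1
    for i in range(len(a)):
        if not examined:                  # E_i = 0 => E_{i+1} = 0, so no further clicks
            break
        if rng.random() < a[i]:           # A_i = 1 with probability a_u; click iff E_i and A_i
            clicks[i] = 1
            if rng.random() < s[i]:       # P(S_i = 1 | C_i = 1) = s_u
                break                     # S_i = 1 => E_{i+1} = 0
        # Here S_i = 0 (no click, or an unsatisfying click):
        examined = rng.random() < gamma   # P(E_{i+1} = 1 | E_i = 1, S_i = 0) = gamma
    return clicks
```
<br />
For example, <code>simulate_session([0.9, 0.5, 0.3], [0.3, 0.6, 0.8], 0.9, random.Random(0))</code> draws one session over three hypothetical documents. Setting <math>\ a_u = 1</math>, <math>\ s_u = 0</math>, and <math>\ \gamma = 1</math> produces a click at every position, which is the multi-click behaviour a one-click cascade model cannot generate.<br />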
<br />
<br />
The model is trained with the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-step: given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
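The EM procedure can be sketched as follows. This is a minimal illustration under several assumptions, not the implementation from the paper:<br />
* Sessions are hypothetical (document ids, click vector) pairs for a single query.<br />
* Parameters are keyed by document only.<br />
* Parameters are clipped away from 0 and 1.<br />
The E-step relies on two consequences of the assumptions above: every position up to the last click is examined, and every click before the last one is unsatisfied. As a result, the only uncertainty lies after the last click. The paper's own inference may be organised differently, for example as a general forward-backward pass.<br />
<br />
```python
from collections import defaultdict
from math import log
from typing import Dict, List, Sequence, Tuple

# Minimal EM sketch with hypothetical names; not the paper's implementation.
Session = Tuple[Sequence[str], Sequence[int]]  # (doc ids by rank, 0/1 click vector)
EPS = 1e-6  # keeps every parameter strictly inside (0, 1)


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def e_step(session: Session, a: Dict[str, float], s: Dict[str, float], gamma: float):
    """Return (P(A_j=1), {clicked j: P(S_j=1)}, gamma numerator, gamma denominator, log-lik)."""
    docs, clicks = session
    n = len(docs)
    av = [a[d] for d in docs]
    last = max((j for j in range(n) if clicks[j]), default=-1)  # last clicked position
    beta = [1.0] * (n + 1)  # beta[j] = P(no click at positions j..n-1 | E_j = 1)
    for j in range(n - 1, last, -1):
        beta[j] = (1 - av[j]) * ((1 - gamma) + gamma * beta[j + 1])
    post_a = [float(c) for c in clicks]  # up to the last click, A_j = C_j
    post_s: Dict[int, float] = {}
    num = den = loglik = 0.0
    for j in range(last):  # before the last click: E_j = 1, S_j = 0, E_{j+1} = 1
        if clicks[j]:
            loglik += log(av[j] * (1 - s[docs[j]]))
            post_s[j] = 0.0
        else:
            loglik += log(1 - av[j])
        loglik += log(gamma)
        num += 1.0
        den += 1.0
    e = [0.0] * n  # e[j] = P(E_j = 1 | clicks) for positions after the last click
    if last >= 0:
        s_last = s[docs[last]]
        z = s_last + (1 - s_last) * ((1 - gamma) + gamma * beta[last + 1])
        loglik += log(av[last] * z)
        post_s[last] = s_last / z
        if last + 1 < n:
            e[last + 1] = (1 - s_last) * gamma * beta[last + 1] / z
            num += e[last + 1]
            den += 1 - post_s[last]
    else:
        loglik += log(beta[0])
        e[0] = 1.0
    for j in range(last + 1, n):
        if j > last + 1:
            e[j] = e[j - 1] * gamma * beta[j] / ((1 - gamma) + gamma * beta[j])
            num += e[j]
            den += e[j - 1]
        post_a[j] = (1 - e[j]) * av[j]  # examined without a click => A_j = 0
    return post_a, post_s, num, den, loglik


def fit_dbn(sessions: List[Session], iters: int = 50, gamma: float = 0.5):
    """Run EM; returns (a, s, gamma, log-likelihood recorded before each M-step)."""
    doc_ids = {d for docs, _ in sessions for d in docs}
    a = {d: 0.5 for d in doc_ids}
    s = {d: 0.5 for d in doc_ids}
    history: List[float] = []
    for _ in range(iters):
        a_sum, a_cnt = defaultdict(float), defaultdict(int)
        s_sum, s_cnt = defaultdict(float), defaultdict(int)
        num = den = total = 0.0
        for docs, clicks in sessions:
            post_a, post_s, g_num, g_den, ll = e_step((docs, clicks), a, s, gamma)
            for j, d in enumerate(docs):
                a_sum[d] += post_a[j]
                a_cnt[d] += 1
            for j, p in post_s.items():
                s_sum[docs[j]] += p
                s_cnt[docs[j]] += 1
            num, den, total = num + g_num, den + g_den, total + ll
        history.append(total)
        # M-step: a and s are clipped posterior means; gamma is a ratio of expected counts.
        a = {d: _clip(a_sum[d] / a_cnt[d]) for d in doc_ids}
        s = {d: _clip(s_sum[d] / s_cnt[d]) if s_cnt[d] else s[d] for d in doc_ids}
        if den > 0:
            gamma = _clip(num / den)
    return a, s, gamma, history
```
<br />
Calling <code>fit_dbn(sessions)</code> returns the estimated <code>a</code>, <code>s</code>, and <code>gamma</code>, together with the log-likelihood at each iteration. As with any EM procedure, the log-likelihood should never decrease from one iteration to the next, which makes a useful sanity check. Under this sketch, the estimated actual relevance of a document <code>d</code> is <code>a[d] * s[d]</code>.<br />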
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated in terms of the predicted click-through rate at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. A ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14291a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:17:35Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias on the displayed ranked results. Under this model, it is assumed that the chance of click decreases towards the lower ranks on result pages due to the reduced visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click at a given position in terms of the relevance of the documents ranked above it; as a result, the latter has been shown to outperform the former. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is the relevance judged by the user from the way the document is presented on the result page; the actual relevance is the relevance judged by the user after she/he clicks on it and sees its content.<br />
<br />
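The contrast between the two background models can be made concrete as click-probability formulas. In the sketch below, the function names and any examination or relevance values a caller supplies are hypothetical. The cascade version follows the single-click form described above: the user clicks the first result judged relevant and then stops.<br />
<br />
```python
from typing import List, Sequence


# Hypothetical helpers for illustration; not code from the paper.
def position_model_ctr(relevance: Sequence[float], examine: Sequence[float]) -> List[float]:
    """Position model: P(C_i = 1) = P(position i is examined) * relevance of its document."""
    return [p * r for p, r in zip(examine, relevance)]


def cascade_model_ctr(relevance: Sequence[float]) -> List[float]:
    """Cascade model: P(C_i = 1) = r_i * prod_{j < i} (1 - r_j), one click at most."""
    ctr: List[float] = []
    p_reach = 1.0  # probability that the user reaches position i without clicking
    for r in relevance:
        ctr.append(p_reach * r)
        p_reach *= 1.0 - r
    return ctr
```
<br />
With relevance 0.5 at each of two positions, the cascade model gives click probabilities 0.5 and 0.25. The second document is clicked only if the first was not, so the probabilities never sum to more than 1, which reflects the one-click-per-search assumption.<br />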
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document; accordingly, the proposed model distinguishes between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The parameters <math>\ a_u</math> and <math>\ s_u</math> describe the relevance of document <math>\ u</math>. <math>\ a_u</math> is its perceived relevance (attractiveness), and <math>\ s_u</math> is the ratio of its actual relevance, denoted by <math>\ r_u</math>, to its perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1).<br />
<center><math>\ E_1 = 1</math></center><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position.<br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it.<br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<br />
* The probability of being attracted depends only on the document.<br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document.<br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it.<br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search.<br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* If the user is not satisfied by the current result, there is a probability <math>\ 1 - \gamma </math> that the user stops the search (a.k.a. search abandonment), and a probability <math>\ \gamma </math> that the user examines the next document.<br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The model is trained with the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-step: given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated in terms of the predicted click-through rate at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. A ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14290a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:17:08Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias on the displayed ranked results. Under this model, it is assumed that the chance of click decreases towards the lower ranks on result pages due to the reduced visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click at a given position in terms of the relevance of the documents ranked above it; as a result, the latter has been shown to outperform the former. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is the relevance judged by the user from the way the document is presented on the result page; the actual relevance is the relevance judged by the user after she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document; accordingly, the proposed model distinguishes between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The parameters <math>\ a_u</math> and <math>\ s_u</math> describe the relevance of document <math>\ u</math>. <math>\ a_u</math> is its perceived relevance (attractiveness), and <math>\ s_u</math> is the ratio of its actual relevance, denoted by <math>\ r_u</math>, to its perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1).<br />
<center><math>\ E_1 = 1</math></center><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position.<br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it.<br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<br />
* The probability of being attracted depends only on the document.<br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document.<br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it.<br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search.<br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* If the user is not satisfied by the current result, there is a probability <math>\ 1 - \gamma </math> that the user stops the search (a.k.a. search abandonment), and a probability <math>\ \gamma </math> that the user examines the next document.<br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The model is trained with the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-step: given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated in terms of the predicted click-through rate at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. A ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14289a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:15:52Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias on the displayed ranked results. Under this model, it is assumed that the chance of click decreases towards the lower ranks on result pages due to the reduced visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click at a given position in terms of the relevance of the documents ranked above it; as a result, the latter has been shown to outperform the former. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is the relevance judged by the user from the way the document is presented on the result page; the actual relevance is the relevance judged by the user after she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document; accordingly, the proposed model distinguishes between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The parameters <math>\ a_u</math> and <math>\ s_u</math> describe the relevance of document <math>\ u</math>. <math>\ a_u</math> is its perceived relevance (attractiveness), and <math>\ s_u</math> is the ratio of its actual relevance, denoted by <math>\ r_u</math>, to its perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1).<br />
<center><math>\ E_1 = 1</math></center><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position.<br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it.<br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<br />
* The probability of being attracted depends only on the document.<br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document.<br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it.<br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search.<br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* If the user is not satisfied by the current result, there is a probability <math>\ 1 - \gamma </math> that the user stops the search (a.k.a. search abandonment), and a probability <math>\ \gamma </math> that the user examines the next document.<br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The model is trained with the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-step: given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated in terms of the predicted click-through rate at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. A ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14287a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:14:49Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias on the displayed ranked results. Under this model, it is assumed that the chance of click decreases towards the lower ranks on result pages due to the reduced visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click at a given position in terms of the relevance of the documents ranked above it; as a result, the latter has been shown to outperform the former. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is the relevance judged by the user from the way the document is presented on the result page; the actual relevance is the relevance judged by the user after she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document; accordingly, the proposed model distinguishes between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The parameters <math>\ a_u</math> and <math>\ s_u</math> describe the relevance of document <math>\ u</math>. <math>\ a_u</math> is its perceived relevance (attractiveness), and <math>\ s_u</math> is the ratio of its actual relevance, denoted by <math>\ r_u</math>, to its perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
<br />
The rest of the assumptions about the user click and browsing behaviour are modeled in DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1).<br />
<center><math>\ E_1 = 1</math></center><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position.<br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* There is a click if and only if the user looked at the document and was attracted by it.<br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<br />
* The probability of being attracted depends only on the document.<br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<br />
* The user scans the results list linearly from top to bottom until she/he decides to stop. Once the user clicks and visits the document, there is a certain probability that she/he will be satisfied by the document.<br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<br />
* If the user does not click on the document, she/he cannot be satisfied by it.<br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<br />
* Once the user is satisfied by the visited document, she/he stops the search.<br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* If the user is not satisfied by the current result, there is a probability <math>\ 1 - \gamma </math> that the user stops the search (a.k.a. search abandonment), and a probability <math>\ \gamma </math> that the user examines the next document.<br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The model is trained with the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-step: given these posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the click model is evaluated in terms of the predicted click-through rate at position 1. Then the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. A ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14286a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-09T00:14:29Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias on the displayed ranked results. Under this model, it is assumed that the chance of click decreases towards the lower ranks on result pages due to the reduced visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click at a given position in terms of the relevance of the documents ranked above it; as a result, the latter has been shown to outperform the former. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is the relevance judged by the user from the way the document is presented on the result page; the actual relevance is the relevance judged by the user after she/he clicks on it and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document; accordingly, the proposed model distinguishes between perceived relevance and actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>\ i </math>, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The parameters <math>\ a_u</math> and <math>\ s_u</math>, where <math>\ u</math> denotes the document shown at position <math>\ i</math>, capture the relevance of that document: <math>\ a_u</math> is its perceived relevance (attractiveness), and <math>\ s_u</math> is the ratio of its actual relevance, denoted <math>\ r_u</math>, to its perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
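The middle equality follows from conditioning on the click, using two assumptions stated below. Given <math>\ E_i = 1</math>, a click occurs exactly when <math>\ A_i = 1</math>, so <math>\ P(C_i=1 \mid E_i=1) = a_u</math>. A document that is not clicked cannot satisfy the user. Because <math>\ C_i = 1</math> already implies <math>\ E_i = 1</math>, the full expansion is:<br />
<center><math>\ P(S_i=1 \mid E_i=1) = P(S_i=1 \mid C_i=1)\,P(C_i=1 \mid E_i=1) + P(S_i=1 \mid C_i=0, E_i=1)\,P(C_i=0 \mid E_i=1)</math></center><br />
<center><math>\ = s_u a_u + 0 \cdot (1 - a_u) = s_u a_u</math></center><br />
For example, a hypothetical document with <math>\ a_u = 0.5</math> and <math>\ s_u = 0.6</math> has <math>\ r_u = 0.3</math>. Half of the users who examine it click on it, and 60% of those who click are satisfied.<br />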
<br />
<br />
The remaining assumptions about the user's click and browsing behaviour are encoded in the DBN as follows:<br />
<br />
* The user always examines the first result (i.e. document at position 1).<br />
<center><math>\ E_1 = 1</math></center><br />
<br />
* If the user does not examine position <math>\ i </math>, she/he will not examine any subsequent position.<br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* There is a click if and only if the user examined the document and was attracted by it.<br />
<center><math>\ (A_i = 1 \wedge E_i = 1) \Leftrightarrow C_i = 1</math></center><br />
<br />
* The probability of being attracted depends only on the document.<br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<br />
* The user scans the result list linearly from top to bottom until she/he decides to stop. After clicking on and visiting a document, the user is satisfied by it with a probability that depends only on the document.<br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<br />
* A document that is not clicked cannot satisfy the user.<br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<br />
* Once the user is satisfied by a visited document, she/he stops the search.<br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
* If the user is not satisfied by the current result, she/he stops the search (search abandonment) with probability <math>\ 1 - \gamma </math> and examines the next document with probability <math>\ \gamma </math>.<br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
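Together, the assumptions above define a generative process. The Python sketch below is an illustration, not code from the paper. It samples the clicks of one session. It also computes the exact per-position click probabilities by propagating <math>\ P(E_i = 1)</math> down the list. An examined document satisfies the user with probability <math>\ a_u s_u = r_u</math>, so <math>\ P(E_{i+1}=1) = \gamma (1 - r_u) P(E_i=1)</math>. The function names, the dict-based parameter layout, and the example values are hypothetical.<br />

```python
import random


def simulate_session(docs, a, s, gamma, rng):
    """Sample the clicks of one session from the DBN generative process.

    docs: ranked document ids; a, s: per-document attractiveness a_u and
    satisfaction s_u (hypothetical dicts); gamma: P(E_{i+1}=1 | E_i=1, S_i=0);
    rng: a random.Random instance.
    """
    clicks = [0] * len(docs)
    for i, d in enumerate(docs):          # E_1 = 1; the body runs only while E_i = 1
        if rng.random() < a[d]:           # A_i = 1 with probability a_u, hence C_i = 1
            clicks[i] = 1
            if rng.random() < s[d]:       # S_i = 1 with probability s_u given C_i = 1
                break                     # S_i = 1 => E_{i+1} = 0
        # Here S_i = 0 (either no click, or a click that did not satisfy).
        if rng.random() >= gamma:         # abandon with probability 1 - gamma
            break
    return clicks


def click_probabilities(docs, a, s, gamma):
    """Exact P(C_i = 1) for each position, without sampling."""
    p_exam, probs = 1.0, []               # P(E_1 = 1) = 1
    for d in docs:
        probs.append(p_exam * a[d])       # P(C_i = 1) = P(E_i = 1) * a_u
        p_exam *= gamma * (1.0 - a[d] * s[d])  # continue only if not satisfied: 1 - r_u
    return probs
```

Comparing simulated click frequencies with `click_probabilities` gives a quick sanity check that the sampler encodes the assumptions as intended.<br />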
<br />
The model is trained with the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: given the current estimates of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of the hidden variables <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed for each session.<br />
* M-step: given these posterior probabilities, <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are re-estimated.<br />
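The summary does not detail the E-step. The following Python sketch is an illustrative derivation, not the authors' implementation. Because <math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math>, the examined positions always form a prefix <math>\ 1, \dots, k</math>. The E-step can therefore enumerate the last examined position <math>\ k</math> instead of running a general forward-backward pass. The M-step then sets each parameter to a ratio of expected counts. The sketch assumes that all sessions belong to a single query, so parameters are keyed by document id. The function names, the initial value 0.5, and the iteration count are hypothetical choices.<br />

```python
from collections import defaultdict


def session_posteriors(docs, clicks, a, s, gamma):
    """E-step for one session: posteriors of the hidden DBN variables.

    Returns P(A_i=1 | clicks), P(S_i=1 | clicks), the expected number of
    continuations E_i=1 -> E_{i+1}=1, and the expected number of
    opportunities to continue (E_i=1, S_i=0, i < n).
    """
    n = len(docs)
    last = max((i + 1 for i, c in enumerate(clicks) if c), default=0)
    joint = {}  # k -> P(clicks, positions 1..k examined, k+1..n not examined)
    for k in range(max(last, 1), n + 1):
        p = 1.0
        for i in range(k):                              # 0-based index of an examined position
            au = a[docs[i]]
            p *= au if clicks[i] else 1.0 - au          # examined: C_i = 1 iff A_i = 1
            if i < k - 1:                               # user moved on to the next position
                p *= (1.0 - s[docs[i]] if clicks[i] else 1.0) * gamma
        if k < n:                                       # user stopped after position k
            sk = s[docs[k - 1]]
            p *= (sk + (1.0 - sk) * (1.0 - gamma)) if clicks[k - 1] else (1.0 - gamma)
        joint[k] = p
    z = sum(joint.values())
    w = {k: p / z for k, p in joint.items()}           # posterior over k

    p_exam = [sum(wk for k, wk in w.items() if k > i) for i in range(n)]
    # Clicked => attracted; examined but unclicked => 0; unexamined => prior a_u.
    p_attr = [1.0 if clicks[i] else (1.0 - p_exam[i]) * a[docs[i]] for i in range(n)]
    p_sat = [0.0] * n
    cont = opp = 0.0
    for k, wk in w.items():
        if wk == 0.0:
            continue
        cont += wk * (k - 1)      # continuations 1->2, ..., (k-1)->k
        opp += wk * (k - 1)       # each came from E_i = 1, S_i = 0
        p_s = 0.0                 # P(S_k = 1 | clicks, k); only the last examined can satisfy
        if clicks[k - 1]:
            sk = s[docs[k - 1]]
            p_s = sk if k == n else sk / (sk + (1.0 - sk) * (1.0 - gamma))
            p_sat[k - 1] += wk * p_s
        if k < n:
            opp += wk * (1.0 - p_s)  # unsatisfied at k, then abandoned
    return p_attr, p_sat, cont, opp


def fit_dbn(sessions, n_iter=50, gamma=0.5):
    """EM for (a_u, s_u, gamma) from sessions given as (docs, clicks) pairs."""
    all_docs = {d for docs, _ in sessions for d in docs}
    a = {d: 0.5 for d in all_docs}
    s = {d: 0.5 for d in all_docs}
    for _ in range(n_iter):
        a_num, a_den = defaultdict(float), defaultdict(float)
        s_num, s_den = defaultdict(float), defaultdict(float)
        cont = opp = 0.0
        for docs, clicks in sessions:                   # E-step with the current parameters
            p_attr, p_sat, c, o = session_posteriors(docs, clicks, a, s, gamma)
            cont, opp = cont + c, opp + o
            for i, d in enumerate(docs):
                a_num[d] += p_attr[i]
                a_den[d] += 1.0
                if clicks[i]:
                    s_num[d] += p_sat[i]
                    s_den[d] += 1.0
        a = {d: a_num[d] / a_den[d] for d in all_docs}  # M-step: ratios of expected counts
        s = {d: (s_num[d] / s_den[d] if s_den[d] else s[d]) for d in all_docs}
        gamma = cont / opp if opp else gamma
    return a, s, gamma
```

A document that is never clicked carries no information about <math>\ s_u</math>, so the sketch keeps its initial value. Enumerating <math>\ k</math> costs <math>\ O(n^2)</math> per session, which is negligible for a result page of about ten documents.<br />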
<br />
<br />
The paper reports three types of experiments that validate the DBN and compare it with existing models. First, the click model is evaluated on how well it predicts the click-through rate at position 1. Second, the predicted relevance is used as a feature in a ranking function. Finally, the predicted relevance is used as supplementary information, alongside editorial judgments, to train a ranking function. <br />
<br />
The empirical results from the experiments on the logs of a commercial search engine indicate that DBN can accurately explain the observed clicks. They show that the function learned with the predicted relevance is not far from being as good as a function trained with a large amount of editorial data. They further show that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14278a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:59:43Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on position bias in the displayed ranked results: it assumes that the probability of a click decreases at lower ranks on the result page because users pay less visual attention to them. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans search results from top to bottom and eventually stops, either because their information need is satisfied or because their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is that it explains a click at a given position in terms of the relevance of the documents ranked above it; as a result, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is exactly one click per search, so it cannot explain abandoned searches (no clicks) or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. Perceived relevance is the relevance of a document as judged by the user from the way it is presented on the result page; actual relevance is the relevance the user assigns to the document after clicking on it and reading its content.<br />
<br />
This paper proposes a Dynamic Bayesian Network (DBN) model to study users' browsing and click behaviour and, ultimately, to infer the relevance of documents. The model addresses the shortcomings of the models above through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents in the result list of a given query are represented as a sequence in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level and are indexed by the document (URL) <math>\ u</math> displayed at each position. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position i, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
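A small worked example with hypothetical values shows how perceived and actual relevance can diverge, here for a document with an attractive presentation but disappointing content:

```latex
a_u = 0.8,\quad s_u = 0.25
\;\Longrightarrow\;
r_u = P(S_i = 1 \mid E_i = 1) = s_u\, a_u = 0.25 \times 0.8 = 0.2
```

Such a document would be clicked in 80% of the sessions in which it is examined, yet only one in four of those clicks would leave the user satisfied.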
<br />
The remaining assumptions about the user's click and browsing behaviour can be expressed through the following equations of the DBN model:<br />
<br />
* The user always examines the first result (i.e. the document at position 1).<br />
<center><math>\ E_1 = 1</math></center><br />
<br />
* If the user does not examine position <math>\ i</math>, she/he will not examine any of the subsequent positions.<br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<br />
<br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
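Because these equations fully specify a generative process, they can be checked by simulation. The Python sketch below samples click sessions from the DBN and compares the empirical click rate at each position with a closed form obtained by chaining the equations: <math>\ P(E_{i+1}=1 \mid E_i=1) = \gamma\, a_u (1 - s_u) + \gamma (1 - a_u) = \gamma (1 - r_{u_i})</math>, so <math>\ P(C_i=1) = a_{u_i} \prod_{j<i} \gamma (1 - r_{u_j})</math>. The function names and parameter values are hypothetical illustrations, not taken from the paper.

```python
import random
from math import prod


def simulate_session(a, s, gamma, rng):
    """Sample one session of clicks from the DBN click model.

    a[i], s[i]: a_u and s_u of the document at (0-based) position i.
    Returns a list of 0/1 click indicators C_i.
    """
    clicks = []
    examining = True  # E_1 = 1: the first result is always examined
    for a_i, s_i in zip(a, s):
        if not examining:  # E_i = 0 => C_i = 0 and E_{i+1} = 0
            clicks.append(0)
            continue
        attracted = rng.random() < a_i  # P(A_i = 1) = a_u
        clicks.append(int(attracted))   # C_i = 1 iff A_i = 1 and E_i = 1
        # P(S_i = 1 | C_i = 1) = s_u; C_i = 0 => S_i = 0
        satisfied = attracted and rng.random() < s_i
        # S_i = 1 stops the scan; otherwise continue with probability gamma.
        examining = (not satisfied) and rng.random() < gamma
    return clicks


def exact_click_rates(a, s, gamma):
    """P(C_i = 1) = a_{u_i} * prod_{j<i} gamma * (1 - a_{u_j} * s_{u_j})."""
    return [a[i] * prod(gamma * (1.0 - a[j] * s[j]) for j in range(i))
            for i in range(len(a))]


if __name__ == "__main__":
    a = [0.6, 0.5, 0.4, 0.3, 0.2]  # hypothetical perceived relevance a_u
    s = [0.7, 0.5, 0.5, 0.4, 0.3]  # hypothetical satisfaction ratios s_u
    gamma = 0.9
    rng = random.Random(0)
    n = 100_000
    totals = [0] * len(a)
    for _ in range(n):
        for i, c in enumerate(simulate_session(a, s, gamma, rng)):
            totals[i] += c
    exact = exact_click_rates(a, s, gamma)
    for i, (t, e) in enumerate(zip(totals, exact)):
        print(f"position {i + 1}: simulated {t / n:.3f}, exact {e:.3f}")
```

Setting <code>gamma = 1</code> in this sketch recovers cascade-style examination driven by <math>\ r_u</math>, while any click may still be followed by further clicks.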
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: Given <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-step: Given the posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
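EM increases the likelihood of the observed click sequences. As a minimal sketch (not the authors' implementation), the likelihood of a single session can be computed with a forward recursion over the two states of the hidden examination variable <math>\ E_i</math>, marginalizing <math>\ A_i</math> and <math>\ S_i</math> exactly as the model equations prescribe. The E-step posteriors would come from the matching forward-backward pass, which is omitted here; function names and parameter values are hypothetical.

```python
import math
from itertools import product


def session_likelihood(clicks, a, s, gamma):
    """P(C_1..C_n = clicks) under the DBN, via a forward pass over E_i.

    p_exam / p_not: joint probability of the clicks seen so far and
    E_i = 1 / E_i = 0 at the current position.
    """
    p_exam, p_not = 1.0, 0.0  # E_1 = 1
    for c, a_i, s_i in zip(clicks, a, s):
        if c:
            # A click needs E_i = 1 and A_i = 1 (probability a_u).
            clicked = p_exam * a_i
            # S_i = 0 (prob 1 - s_u), then keep examining with prob gamma;
            # otherwise the scan stops (S_i = 1, or S_i = 0 and quit).
            p_exam, p_not = (clicked * (1 - s_i) * gamma,
                             clicked * (1 - (1 - s_i) * gamma))
        else:
            # No click: E_i = 0, or E_i = 1 with A_i = 0 (hence S_i = 0).
            skipped = p_exam * (1 - a_i)
            p_exam, p_not = skipped * gamma, p_not + skipped * (1 - gamma)
    return p_exam + p_not


def log_likelihood(sessions, a, s, gamma):
    """Sum of log P(clicks) over sessions that share one ranked list."""
    return sum(math.log(session_likelihood(c, a, s, gamma)) for c in sessions)


if __name__ == "__main__":
    a, s, gamma = [0.6, 0.5, 0.4], [0.7, 0.5, 0.5], 0.9  # hypothetical
    # Sanity check: the probabilities of all 2^n click patterns sum to 1.
    print(sum(session_likelihood(list(c), a, s, gamma)
              for c in product([0, 1], repeat=len(a))))
    print(log_likelihood([[1, 0, 0], [0, 1, 1], [0, 0, 0]], a, s, gamma))
```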
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the authors evaluate the click model in terms of the predicted click-through rate at position 1. Then they use the predicted relevance as a feature in a ranking function. In the last set of experiments, they use the predicted relevance as supplementary information to train a ranking function. <br />
<br />
The empirical results from the experiments on the logs of a commercial search engine indicate that DBN can accurately explain the observed clicks. They show that the function learned with the predicted relevance is not far from being as good as a function trained with a large amount of editorial data. They further show that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14277a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:56:49Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias in the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page, because those ranks receive less visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The benefit of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page. The actual relevance is judged by the user once she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents in the result list of a given query are represented as a sequence in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level and are indexed by the document (URL) <math>\ u</math> displayed at each position. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position i, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour can be expressed through the following equations of the DBN model:<br />
<br />
<center><math>\ E_1 = 1</math></center><br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: Given <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-step: Given the posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the authors evaluate the click model in terms of the predicted click-through rate at position 1. Then they use the predicted relevance as a feature in a ranking function. In the last set of experiments, they use the predicted relevance as supplementary information to train a ranking function. <br />
<br />
The empirical results from the experiments on the logs of a commercial search engine indicate that DBN can accurately explain the observed clicks. They show that the function learned with the predicted relevance is not far from being as good as a function trained with a large amount of editorial data. They further show that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14274a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:56:17Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias in the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page, because those ranks receive less visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The benefit of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page. The actual relevance is judged by the user once she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents in the result list of a given query are represented as a sequence in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level and are indexed by the document (URL) <math>\ u</math> displayed at each position. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position i, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour can be expressed through the following equations of the DBN model:<br />
<br />
<center><math>\ E_1 = 1</math></center><br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
The model is trained using the Expectation-Maximization (EM) algorithm:<br />
<br />
* E-step: Given <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math>, the posterior probabilities of <math>\ A_i </math>, <math>\ E_i </math>, and <math>\ S_i </math> are computed.<br />
* M-step: Given the posterior probabilities, the values of <math>\ a_u </math>, <math>\ s_u </math>, and <math>\ \gamma </math> are updated.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the authors evaluate the click model in terms of the predicted click-through rate at position 1. Then they use the predicted relevance as a feature in a ranking function. In the last set of experiments, they use the predicted relevance as supplementary information to train a ranking function. <br />
<br />
The empirical results from the experiments on the logs of a commercial search engine indicate that DBN can accurately explain the observed clicks. They show that the function learned with the predicted relevance is not far from being as good as a function trained with a large amount of editorial data. They further show that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14270a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:49:43Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias in the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page, because those ranks receive less visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The benefit of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page. The actual relevance is judged by the user once she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents in the result list of a given query are represented as a sequence in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level and are indexed by the document (URL) <math>\ u</math> displayed at each position. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position i, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>\ i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>\ i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>\ i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>\ i</math> (i.e. actual relevance).<br />
<br />
The variables <math>\ a_u</math> and <math>\ s_u</math> are related to the relevance of the document. <math>\ a_u</math> represents the perceived relevance, and <math>\ s_u</math> represents the ratio between the actual relevance (denoted by <math>\ r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour can be expressed through the following equations of the DBN model:<br />
<br />
<center><math>\ E_1 = 1</math></center><br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The Expectation-Maximization (EM) algorithm is used to find the maximum likelihood estimates of the perceived relevance and actual relevance parameters. The forward-backward algorithm is used to compute the posterior probabilities of the hidden variables.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the authors evaluate the click model in terms of the predicted click-through rate at position 1. Then they use the predicted relevance as a feature in a ranking function. In the last set of experiments, they use the predicted relevance as supplementary information to train a ranking function. <br />
<br />
The empirical results from the experiments on the logs of a commercial search engine indicate that DBN can accurately explain the observed clicks. They show that the function learned with the predicted relevance is not far from being as good as a function trained with a large amount of editorial data. They further show that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14269a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:48:53Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias in the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page, because those ranks receive less visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The benefit of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page. The actual relevance is judged by the user once she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents in the result list of a given query are represented as a sequence in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level and are indexed by the document (URL) <math>\ u</math> displayed at each position. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position i, there is an observed variable <math>\ C_i</math> indicating whether there was a click at position <math>\ i</math>. There are three hidden binary variables defined for each position <math>i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>i</math> (i.e. actual relevance).<br />
<br />
The variables <math>a_u</math> and <math>s_u</math> are related to the relevance of the document. <math>a_u</math> represents the perceived relevance, and <math>s_u</math> represents the ratio between the actual relevance (denoted by <math>r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour can be expressed through the following equations of the DBN model:<br />
<br />
<center><math>\ E_1 = 1</math></center><br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The Expectation-Maximization (EM) algorithm is used to find the maximum likelihood estimates of the perceived relevance and actual relevance parameters. The forward-backward algorithm is used to compute the posterior probabilities of the hidden variables.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and to compare it with existing models. First, the authors evaluate the click model in terms of the predicted click-through rate at position 1. Then they use the predicted relevance as a feature in a ranking function. In the last set of experiments, they use the predicted relevance as supplementary information to train a ranking function. <br />
<br />
The empirical results from the experiments on the logs of a commercial search engine indicate that DBN can accurately explain the observed clicks. They show that the function learned with the predicted relevance is not far from being as good as a function trained with a large amount of editorial data. They further show that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14268a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:48:20Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias in the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page, because those ranks receive less visual attention from the user. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops because either their information need is satisfied or their patience is exhausted. <br />
<br />
The benefit of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; accordingly, the cascade model has been shown to outperform the position model. However, the cascade model makes the strong assumption that there is exactly one click per search; hence, it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page. The actual relevance is judged by the user once she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
The ranked documents in the result list of a given query are represented as a sequence in the DBN. The variables inside the box are defined at the session level, while those outside the box are defined at the query level and are indexed by the document (URL) <math>u</math> displayed at each position. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position i, there is an observed variable <math>C_i</math> indicating whether there was a click at position <math>i</math>. There are three hidden binary variables defined for each position <math>i</math> in order to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>\ E_i</math>: whether the user examined the document at position <math>i</math>.<br />
* <math>\ A_i</math>: whether the user was attracted by the document at position <math>i</math> (i.e. perceived relevance).<br />
* <math>\ S_i</math>: whether the user was satisfied by the document at position <math>i</math> (i.e. actual relevance).<br />
<br />
The variables <math>a_u</math> and <math>s_u</math> are related to the relevance of the document. <math>a_u</math> represents the perceived relevance, and <math>s_u</math> represents the ratio between the actual relevance (denoted by <math>r_u</math>) and the perceived relevance:<br />
<br />
<center><math>\ r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour can be expressed through the following equations of the DBN model:<br />
<br />
<center><math>\ E_1 = 1</math></center><br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>\ P(A_i=1) = a_u</math></center><br />
<center><math>\ P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>\ C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>\ S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The Expectation-Maximization (EM) algorithm is used to find the maximum likelihood estimates of the perceived relevance and actual relevance parameters. The forward-backward algorithm is used to compute the posterior probabilities of the hidden variables.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and compare it with existing models. First, the click model is evaluated in terms of its predicted click-through rate at position 1. Next, the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. The authors show that a ranking function learned with the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14267a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:47:53Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because users pay less visual attention to those positions. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops, either because their information need is satisfied or because their patience is exhausted. <br />
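The two earlier models can be sketched in the forms in which they are commonly written. The exact parameterization is an assumption of this illustration: a per-rank examination probability for the position model, and a single click per search for the cascade model. The function names are hypothetical.

```python
from typing import List


def position_model_ctr(attractiveness: List[float], exam_prob: List[float]) -> List[float]:
    """Position model: P(C_i = 1) = P(rank i is examined) * attractiveness of its document."""
    return [a * b for a, b in zip(attractiveness, exam_prob)]


def cascade_model_ctr(relevance: List[float]) -> List[float]:
    """Cascade model: scan top-down and stop at the first click, so
    P(C_i = 1) = r_i * prod_{j < i} (1 - r_j)."""
    probs = []
    no_click_so_far = 1.0
    for r in relevance:
        probs.append(no_click_so_far * r)
        no_click_so_far *= 1.0 - r
    return probs
```

For example, with two documents of relevance 0.5, <code>cascade_model_ctr([0.5, 0.5])</code> gives <code>[0.5, 0.25]</code>. The second document is reached only when the first is not clicked.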
<br />
The advantage of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; as a result, the cascade model has shown state-of-the-art performance, outperforming the position model. However, the cascade model makes the strong assumption that there is exactly one click per search, so it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page, before clicking on it. The actual relevance is judged by the user after she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents ranked in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level; here the subscript <math>u</math> denotes the document (URL) shown for the query, not the user who entered it. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>i</math>, an observed binary variable <math>C_i</math> indicates whether there was a click at that position. Three hidden binary variables are defined for each position <math>i</math> to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>E_i</math>: whether the user examined the document at position <math>i</math>.<br />
* <math>A_i</math>: whether the user was attracted by the document at position <math>i</math> (i.e. perceived relevance).<br />
* <math>S_i</math>: whether the user was satisfied by the document at position <math>i</math> (i.e. actual relevance).<br />
<br />
The variables <math>a_u</math> and <math>s_u</math> are related to the relevance of the document. <math>a_u</math> represents the perceived relevance, and <math>s_u</math> represents the ratio between the actual relevance (denoted by <math>r_u</math>) and the perceived relevance:<br />
<br />
<center><math> r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour are expressed by the following equations of the DBN model:<br />
<br />
<center><math>\ E_1 = 1</math></center><br />
<center><math>\ E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>\ A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>P(A_i=1) = a_u</math></center><br />
<center><math>P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The Expectation-Maximization (EM) algorithm is used to find maximum likelihood estimates of the perceived relevance and actual relevance parameters. The forward-backward algorithm is used to compute the posterior probabilities of the hidden variables.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and compare it with existing models. First, the click model is evaluated in terms of its predicted click-through rate at position 1. Next, the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. The authors show that a ranking function learned with the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14265a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:47:01Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because users pay less visual attention to those positions. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops, either because their information need is satisfied or because their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; as a result, the cascade model has shown state-of-the-art performance, outperforming the position model. However, the cascade model makes the strong assumption that there is exactly one click per search, so it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page, before clicking on it. The actual relevance is judged by the user after she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents ranked in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level; here the subscript <math>u</math> denotes the document (URL) shown for the query, not the user who entered it. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>i</math>, an observed binary variable <math>C_i</math> indicates whether there was a click at that position. Three hidden binary variables are defined for each position <math>i</math> to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>E_i</math>: whether the user examined the document at position <math>i</math>.<br />
* <math>A_i</math>: whether the user was attracted by the document at position <math>i</math> (i.e. perceived relevance).<br />
* <math>S_i</math>: whether the user was satisfied by the document at position <math>i</math> (i.e. actual relevance).<br />
<br />
The variables <math>a_u</math> and <math>s_u</math> are related to the relevance of the document. <math>a_u</math> represents the perceived relevance, and <math>s_u</math> represents the ratio between the actual relevance (denoted by <math>r_u</math>) and the perceived relevance:<br />
<br />
<center><math> r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour are expressed by the following equations of the DBN model:<br />
<br />
<center><math>E_1 = 1</math></center><br />
<center><math>E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>P(A_i=1) = a_u</math></center><br />
<center><math>P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
<br />
The Expectation-Maximization (EM) algorithm is used to find maximum likelihood estimates of the perceived relevance and actual relevance parameters. The forward-backward algorithm is used to compute the posterior probabilities of the hidden variables.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and compare it with existing models. First, the click model is evaluated in terms of its predicted click-through rate at position 1. Next, the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. The authors show that a ranking function learned with the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14264a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:45:30Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because users pay less visual attention to those positions. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops, either because their information need is satisfied or because their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; as a result, the cascade model has shown state-of-the-art performance, outperforming the position model. However, the cascade model makes the strong assumption that there is exactly one click per search, so it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page, before clicking on it. The actual relevance is judged by the user after she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents ranked in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level; here the subscript <math>u</math> denotes the document (URL) shown for the query, not the user who entered it. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>i</math>, an observed binary variable <math>C_i</math> indicates whether there was a click at that position. Three hidden binary variables are defined for each position <math>i</math> to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>E_i</math>: whether the user examined the document at position <math>i</math>.<br />
* <math>A_i</math>: whether the user was attracted by the document at position <math>i</math> (i.e. perceived relevance).<br />
* <math>S_i</math>: whether the user was satisfied by the document at position <math>i</math> (i.e. actual relevance).<br />
<br />
The variables <math>a_u</math> and <math>s_u</math> are related to the relevance of the document. <math>a_u</math> represents the perceived relevance, and <math>s_u</math> represents the ratio between the actual relevance (denoted by <math>r_u</math>) and the perceived relevance:<br />
<br />
<center><math> r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour are expressed by the following equations of the DBN model:<br />
<br />
<center><math>E_1 = 1</math></center><br />
<center><math>E_i = 0 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<center><math>P(A_i=1) = a_u</math></center><br />
<center><math>P(S_i = 1 | C_i = 1) = s_u</math></center><br />
<center><math>C_i = 0 \Rightarrow S_i = 0</math></center><br />
<center><math>S_i = 1 \Rightarrow E_{i+1} = 0</math></center><br />
<center><math>P(E_{i+1}=1 | E_i = 1, S_i = 0) = \gamma </math></center><br />
<br />
The Expectation-Maximization (EM) algorithm is used to find maximum likelihood estimates of the perceived relevance and actual relevance parameters. The forward-backward algorithm is used to compute the posterior probabilities of the hidden variables.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and compare it with existing models. First, the click model is evaluated in terms of its predicted click-through rate at position 1. Next, the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. The authors show that a ranking function learned with the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14261a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:42:12Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because users pay less visual attention to those positions. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops, either because their information need is satisfied or because their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; as a result, the cascade model has shown state-of-the-art performance, outperforming the position model. However, the cascade model makes the strong assumption that there is exactly one click per search, so it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page, before clicking on it. The actual relevance is judged by the user after she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents ranked in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level; here the subscript <math>u</math> denotes the document (URL) shown for the query, not the user who entered it. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>i</math>, an observed binary variable <math>C_i</math> indicates whether there was a click at that position. Three hidden binary variables are defined for each position <math>i</math> to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>E_i</math>: whether the user examined the document at position <math>i</math>.<br />
* <math>A_i</math>: whether the user was attracted by the document at position <math>i</math> (i.e. perceived relevance).<br />
* <math>S_i</math>: whether the user was satisfied by the document at position <math>i</math> (i.e. actual relevance).<br />
<br />
The variables <math>a_u</math> and <math>s_u</math> are related to the relevance of the document. <math>a_u</math> represents the perceived relevance, and <math>s_u</math> represents the ratio between the actual relevance (denoted by <math>r_u</math>) and the perceived relevance:<br />
<br />
<center><math> r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour are expressed by the following equations of the DBN model:<br />
<br />
<center><math>E_1 = 1</math></center><br />
<center><math>E_i=0 \Rightarrow E_{i+1}=0</math></center><br />
<center><math>A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<br />
The Expectation-Maximization (EM) algorithm is used to find maximum likelihood estimates of the perceived relevance and actual relevance parameters. The forward-backward algorithm is used to compute the posterior probabilities of the hidden variables.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and compare it with existing models. First, the click model is evaluated in terms of its predicted click-through rate at position 1. Next, the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. The authors show that a ranking function learned with the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14259a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:41:26Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because users pay less visual attention to those positions. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops, either because their information need is satisfied or because their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; as a result, the cascade model has shown state-of-the-art performance, outperforming the position model. However, the cascade model makes the strong assumption that there is exactly one click per search, so it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page, before clicking on it. The actual relevance is judged by the user after she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents ranked in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level; here the subscript <math>u</math> denotes the document (URL) shown for the query, not the user who entered it. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>i</math>, an observed binary variable <math>C_i</math> indicates whether there was a click at that position. Three hidden binary variables are defined for each position <math>i</math> to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>E_i</math>: whether the user examined the document at position <math>i</math>.<br />
* <math>A_i</math>: whether the user was attracted by the document at position <math>i</math> (i.e. perceived relevance).<br />
* <math>S_i</math>: whether the user was satisfied by the document at position <math>i</math> (i.e. actual relevance).<br />
<br />
The variables <math>a_u</math> and <math>s_u</math> are related to the relevance of the document. <math>a_u</math> represents the perceived relevance, and <math>s_u</math> represents the ratio between the actual relevance (denoted by <math>r_u</math>) and the perceived relevance:<br />
<br />
<center><math> r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
<br />
The remaining assumptions about the user's click and browsing behaviour are expressed by the following equations of the DBN model:<br />
<br />
<center><math>E_1=1</math></center><br />
<center><math>E_i=0 \Rightarrow E_{i+1}=0</math></center><br />
<center><math>A_i = 1, E_i = 1 \Leftrightarrow C_i = 1</math></center><br />
<br />
The Expectation-Maximization (EM) algorithm is used to find maximum likelihood estimates of the perceived relevance and actual relevance parameters. The forward-backward algorithm is used to compute the posterior probabilities of the hidden variables.<br />
<br />
Three types of experiments are conducted in the paper to validate the DBN and compare it with existing models. First, the click model is evaluated in terms of its predicted click-through rate at position 1. Next, the predicted relevance is used as a feature in a ranking function. In the last set of experiments, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
The empirical results from experiments on the logs of a commercial search engine indicate that the DBN can accurately explain the observed clicks. The authors show that a ranking function learned with the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14258a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:40:33Z<p>Aashkan: </p>
<hr />
<div>One of the most common click models in Web search, known as the ''position model'', is based on the position bias of the displayed ranked results. Under this model, the probability of a click is assumed to decrease towards the lower ranks of the result page because users pay less visual attention to those positions. A more recent click model, referred to as the ''cascade model'' of user behaviour, assumes that the user scans the search results from top to bottom and eventually stops, either because their information need is satisfied or because their patience is exhausted. <br />
<br />
The advantage of the cascade model over the position model is its ability to explain a click in terms of the relevance of the documents ranked above it; as a result, the cascade model has shown state-of-the-art performance, outperforming the position model. However, the cascade model makes the strong assumption that there is exactly one click per search, so it cannot explain abandoned searches or searches with multiple clicks. Moreover, neither model distinguishes between perceived relevance and actual relevance. The perceived relevance of a document is judged by the user from the way the document is presented on the result page, before clicking on it. The actual relevance is judged by the user after she/he clicks on the document and sees its content.<br />
<br />
A Dynamic Bayesian Network (DBN) model is proposed in this paper in order to study the user's browsing and click behaviour, and eventually to infer the relevance of the documents. The proposed model addresses the issues with the above models through the following assumptions about the user's click and browsing behaviour: <br />
<br />
* The user makes a linear traversal through the results and decides whether to click based on the perceived relevance of the document. <br />
* The user chooses to examine the next document if she/he is unsatisfied with the clicked document (based on the actual relevance).<br />
* A click does not necessarily mean that the user is satisfied with the clicked document. With respect to this, the proposed model attempts to distinguish the perceived relevance and the actual relevance.<br />
* There is no limit on the number of clicks that a user can make during a search.<br />
<br />
In the DBN, the documents ranked in the result list of a given query are represented as a sequence of positions. In the figure below, the variables inside the box are defined at the session level, while those outside the box are defined at the query level; here the subscript <math>u</math> denotes the document (URL) shown for the query, not the user who entered it. <br />
<br />
<center> [[File:Dbn.png]] </center><br />
<br />
For a given position <math>i</math>, an observed binary variable <math>C_i</math> indicates whether there was a click at that position. Three hidden binary variables are defined for each position <math>i</math> to model examination, perceived relevance, and actual relevance:<br />
<br />
* <math>E_i</math>: whether the user examined the document at position <math>i</math>.<br />
* <math>A_i</math>: whether the user was attracted by the document at position <math>i</math> (i.e. perceived relevance).<br />
* <math>S_i</math>: whether the user was satisfied by the document at position <math>i</math> (i.e. actual relevance).<br />
<br />
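The parameters <math>a_u</math> and <math>s_u</math> capture the relevance of the document: <math>a_u</math> is its perceived relevance, and <math>s_u</math> is the ratio between its actual relevance (denoted by <math>r_u</math>) and its perceived relevance. Because a click occurs exactly when the user examines the document and is attracted by it, and the user can only be satisfied after clicking, the actual relevance factorises as:<br />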
<br />
<center><math> r_u := P(S_i=1 | E_i=1) = P(S_i=1|C_i=1) P(C_i=1 | E_i=1) = s_u a_u</math></center><br />
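<br />
The factorisation can be checked by brute force. The following Python sketch (the function name and the values of <math>a_u</math> and <math>s_u</math> are hypothetical) enumerates the hidden variables <math>A_i</math> and <math>S_i</math> of an examined position, using only the facts that an examined document is clicked exactly when it is attractive and that satisfaction requires a click. Exact rational arithmetic makes the agreement with <math>s_u a_u</math> exact rather than approximate.<br />
<br />
```python
from fractions import Fraction
from itertools import product


def relevance_by_enumeration(a: Fraction, s: Fraction) -> Fraction:
    """P(S_i = 1 | E_i = 1), summing over the hidden A_i and S_i of an examined position."""
    total = Fraction(0)
    for attracted, satisfied in product((0, 1), repeat=2):
        clicked = attracted  # an examined document is clicked iff it is attractive
        p_attr = a if attracted else 1 - a
        if clicked:
            p_sat = s if satisfied else 1 - s  # P(S_i | C_i = 1)
        else:
            p_sat = Fraction(1 - satisfied)  # no click: S_i = 0 with certainty
        if satisfied:
            total += p_attr * p_sat
    return total


a_u, s_u = Fraction(3, 5), Fraction(1, 2)  # hypothetical parameter values
print(relevance_by_enumeration(a_u, s_u), a_u * s_u)  # prints: 3/10 3/10
```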
<br />
The remaining assumptions about the user's click and browsing behaviour are expressed in the DBN by the following equations:<br />
<br />
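<center><math>E_1=1</math></center><br />
<center><math>A_i=1,\ E_i=1 \Leftrightarrow C_i=1</math></center><br />
<center><math>P(A_i=1)=a_u</math></center><br />
<center><math>P(S_i=1 \mid C_i=1)=s_u</math></center><br />
<center><math>C_i=0 \Rightarrow S_i=0</math></center><br />
<center><math>S_i=1 \Rightarrow E_{i+1}=0</math></center><br />
<center><math>E_i=0 \Rightarrow E_{i+1}=0</math></center><br />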
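<br />
Under these assumptions, the probability that each position is examined, and hence clicked, follows from a simple forward recursion. The Python sketch below assumes, as the bullet list states, that a user who is not satisfied always examines the next document; the function name and the list of <math>(a_u, s_u)</math> pairs used as input are hypothetical.<br />
<br />
```python
from typing import List, Tuple


def dbn_click_probs(params: List[Tuple[float, float]]) -> List[float]:
    """Marginal P(C_i = 1) at each position of a ranked list.

    params[i] = (a_u, s_u) for the document u shown at position i. Assumes that an
    unsatisfied user always examines the next result (no other reason to stop).
    """
    probs = []
    p_exam = 1.0  # P(E_1 = 1) = 1
    for a, s in params:
        probs.append(p_exam * a)  # a click needs examination and attraction
        p_exam *= 1.0 - a * s  # stop only if clicked and satisfied: P = a_u s_u = r_u
    return probs


print(dbn_click_probs([(0.5, 0.8), (0.4, 0.5), (0.3, 0.9)]))  # hypothetical parameters
```
<br />
Because a satisfied user stops, the examination probability shrinks by a factor <math>1 - r_u</math> at each position, so a highly relevant document near the top lowers the click rates of everything below it.<br />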
<br />
The Expectation-Maximization (EM) algorithm is used to find maximum likelihood estimates of the perceived relevance and actual relevance parameters (<math>a_u</math> and <math>s_u</math>). In the E-step, the forward-backward algorithm is used to compute the posterior probabilities of the remaining hidden variables.<br />
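<br />
A minimal EM sketch in Python, under a simplifying assumption stated here and in the code: a user who is not satisfied always examines the next result. In that special case every position up to the last click is known to be examined, and earlier clicks are known to be unsatisfying, so the posterior over the hidden variables reduces to one number per session, the probability <math>q</math> that the last click was satisfying; a full forward-backward pass is then unnecessary. The function name <code>fit_dbn_em</code>, the session format and the initial values of 0.5 are hypothetical; a general implementation would run forward-backward over <math>E_i</math>, <math>A_i</math> and <math>S_i</math>.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, List, Tuple

Session = Tuple[List[str], List[int]]  # (documents in rank order, 0/1 clicks)


def fit_dbn_em(sessions: List[Session], n_iter: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """EM estimates of a_u = P(A_i = 1) and s_u = P(S_i = 1 | C_i = 1).

    Assumes an unsatisfied user always examines the next result. Then every position
    up to the last click is examined, and the only unknown per session is whether the
    last click was satisfying (if so, nothing below it was examined).
    """
    urls = {u for docs, _ in sessions for u in docs}
    a = {u: 0.5 for u in urls}  # initial guesses
    s = {u: 0.5 for u in urls}
    for _ in range(n_iter):
        shown: Dict[str, float] = defaultdict(float)
        attracted: Dict[str, float] = defaultdict(float)  # expected count of A_i = 1
        clicked: Dict[str, float] = defaultdict(float)
        satisfied: Dict[str, float] = defaultdict(float)  # expected count of S_i = 1
        for docs, clicks in sessions:
            last = max((i for i, c in enumerate(clicks) if c), default=None)
            q = 0.0  # E-step: posterior P(S_last = 1 | clicks)
            if last is not None:
                s_l = s[docs[last]]
                p_no_click_below = 1.0
                for u in docs[last + 1:]:
                    p_no_click_below *= 1.0 - a[u]
                denom = s_l + (1.0 - s_l) * p_no_click_below
                q = s_l / denom if denom > 0 else 0.0
            for i, (u, c) in enumerate(zip(docs, clicks)):
                shown[u] += 1.0
                if c:
                    clicked[u] += 1.0
                    attracted[u] += 1.0  # a click means examined and attracted
                    if i == last:
                        satisfied[u] += q  # earlier clicks were unsatisfying
                elif last is not None and i > last:
                    # Not examined with probability q, in which case A_i keeps its prior.
                    attracted[u] += q * a[u]
                # other non-clicked positions were examined, so A_i = 0 there
        # M-step: normalise the expected counts.
        a = {u: attracted[u] / shown[u] for u in urls}
        # s_u is not identified without clicks; keep the previous value in that case.
        s = {u: satisfied[u] / clicked[u] if clicked[u] else s[u] for u in urls}
    return a, s
```
<br />
The fitted values can then be combined into the predicted relevance <math>r_u = a_u s_u</math> for each document.<br />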
<br />
The paper reports three types of experiments to validate the DBN and compare it with existing models. First, the click model is evaluated on the predicted click-through rate at position 1. Second, the predicted relevance is used as a feature in a ranking function. Finally, the predicted relevance is used as supplementary information to train a ranking function. <br />
<br />
Empirical results on the click logs of a commercial search engine indicate that the DBN accurately explains the observed clicks. The authors show that a ranking function learned from the predicted relevance is nearly as good as one trained on a large amount of editorial data, and that combining both types of information can lead to an even more accurate ranking function.</div>Aashkanhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=a_Dynamic_Bayesian_Network_Click_Model_for_Web_Search_Ranking&diff=14257a Dynamic Bayesian Network Click Model for Web Search Ranking2011-11-08T23:40:14Z<p>Aashkan: </p>