stat441w18/Image Question Answering using CNN with Dynamic Parameter Prediction - Revision history
Feed: http://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441w18/Image_Question_Answering_using_CNN_with_Dynamic_Parameter_Prediction&feed=atom&action=history (MediaWiki 1.41.0, retrieved 2024-03-28)

Revision as of 22:03, 19 March 2018 by Y53zou (→ Hashing): changed “HashNets” to “HashedNets”.
Line 182:
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In a comparison-based model, <math>&Omega;(log(n))</math> time is required to search a size-''n'' dictionary. Direct addressing reduces the running time to ''O(1)'' by storing ''k'' keys, ''0 &le; k < M'', in an array of size ''M''. Hence the search operation is equivalent to indexing, which is constant assuming it takes a constant (and most likely negligible) amount of time to access any element in the array. Hashing is an improvement on direct addressing in the case when Direct Addressing can be space-inefficient with ''k << M''. For keys in the universe ''U'', a hash function ''h'' mapps the keys to an element from the set {1...M} -- i.e. <math>h(k) &isin; {1...M}</math>, <math>&forall; k &isin; U</math>. This allows for space-efficient direct addressing.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In a comparison-based model, <math>&Omega;(log(n))</math> time is required to search a size-''n'' dictionary. Direct addressing reduces the running time to ''O(1)'' by storing ''k'' keys, ''0 &le; k < M'', in an array of size ''M''. Hence the search operation is equivalent to indexing, which is constant assuming it takes a constant (and most likely negligible) amount of time to access any element in the array. Hashing is an improvement on direct addressing in the case when Direct Addressing can be space-inefficient with ''k << M''. For keys in the universe ''U'', a hash function ''h'' mapps the keys to an element from the set {1...M} -- i.e. <math>h(k) &isin; {1...M}</math>, <math>&forall; k &isin; U</math>. This allows for space-efficient direct addressing.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The hashing “trick” used for dimensionality reduction in this paper is a novel network architecture called <del style="font-weight: bold; text-decoration: none;">“HashNets”</del>, which as previously stated was published in [https://arxiv.org/pdf/1504.04788.pdf this 2015 paper]. It uses a low-cost hash function to randomly group connection weights into slots in the hash table, and all connections within the same slot share a single parameter value. Then, during the training process, these parameters are tuned to adjust to the HashNets weight sharing architecture with standard backpropagation. The hashing procedure introduces no additional memory overhead, and this strategy has been shown to substantially reduce the storage requirements for neural nets while mostly preserving generalization performance.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The hashing “trick” used for dimensionality reduction in this paper is a novel network architecture called <ins style="font-weight: bold; text-decoration: none;">“HashedNets”</ins>, which as previously stated was published in [https://arxiv.org/pdf/1504.04788.pdf this 2015 paper]. It uses a low-cost hash function to randomly group connection weights into slots in the hash table, and all connections within the same slot share a single parameter value. Then, during the training process, these parameters are tuned to adjust to the HashNets weight sharing architecture with standard backpropagation. The hashing procedure introduces no additional memory overhead, and this strategy has been shown to substantially reduce the storage requirements for neural nets while mostly preserving generalization performance.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Weight sharing is implemented in the model by allowing a single parameter in the candidate weight vector ''p'' to be shared by multiple elements of <math>W_d(q)</math>. This can be done using a pre-defined hash function that converts the 2-dimensional location in <math>W_d(q)</math> to a 1-dimensional index in ''p''. As mentioned in Chen et al.’s paper, this dimensionality reduction technique maintains the accuracy of the network.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Weight sharing is implemented in the model by allowing a single parameter in the candidate weight vector ''p'' to be shared by multiple elements of <math>W_d(q)</math>. This can be done using a pre-defined hash function that converts the 2-dimensional location in <math>W_d(q)</math> to a 1-dimensional index in ''p''. As mentioned in Chen et al.’s paper, this dimensionality reduction technique maintains the accuracy of the network.</div></td></tr>
Revision as of 22:01, 19 March 2018 by Y53zou (→ Hashing): fixed a typo (“the number if inputs” → “the number of inputs”).
Line 175:
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Hashing ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Hashing ==</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Generating the weight matrix in the dynamic parameter layers can be a computationally challenging task, since it depends on the number of parameters. Between the [https://en.wikipedia.org/wiki/Gated_recurrent_unit GRU] and the fully-connected layer in the parameter prediction network, we need quadratically more parameters in order to increase the dimensionality of the prediction layer’s output. All this means that the weight matrix dimensions would blow up during the network training process. The most straightforward way to reduce the size of ''W'' is to reduce the number <del style="font-weight: bold; text-decoration: none;">if </del>inputs. However, the network could be overfitted if we were to keep the weight matrix small by limiting the number of training examples. Hence the authors of this paper are using a recent hashing trick introduced by [https://arxiv.org/pdf/1504.04788.pdf Chen et al.] that “folds” the neural network.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Generating the weight matrix in the dynamic parameter layers can be a computationally challenging task, since it depends on the number of parameters. Between the [https://en.wikipedia.org/wiki/Gated_recurrent_unit GRU] and the fully-connected layer in the parameter prediction network, we need quadratically more parameters in order to increase the dimensionality of the prediction layer’s output. All this means that the weight matrix dimensions would blow up during the network training process. The most straightforward way to reduce the size of ''W'' is to reduce the number <ins style="font-weight: bold; text-decoration: none;">of </ins>inputs. However, the network could be overfitted if we were to keep the weight matrix small by limiting the number of training examples. Hence the authors of this paper are using a recent hashing trick introduced by [https://arxiv.org/pdf/1504.04788.pdf Chen et al.] that “folds” the neural network.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
Revision as of 20:54, 15 March 2018 by Y53zou (→ Previous and Related Works): replaced “more recently” with “more complex” in the closing sentence.
Line 43:
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Context-free languages and regular languages, which are in the lowest two levels in the hierarchy, can be expressed and coded using [https://en.wikipedia.org/wiki/Pushdown_automaton pushdown automaton] and [https://en.wikipedia.org/wiki/Deterministic_finite_automaton deterministic finite automaton]. Visually, these two classes of languages can be expressed using [https://en.wikipedia.org/wiki/Parse_tree parse trees] and [https://en.wikipedia.org/wiki/State_diagram state diagrams]. This means that given grammar and state transition rules, we can express all strings generated by those rules if the language is context-free or regular.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Context-free languages and regular languages, which are in the lowest two levels in the hierarchy, can be expressed and coded using [https://en.wikipedia.org/wiki/Pushdown_automaton pushdown automaton] and [https://en.wikipedia.org/wiki/Deterministic_finite_automaton deterministic finite automaton]. Visually, these two classes of languages can be expressed using [https://en.wikipedia.org/wiki/Parse_tree parse trees] and [https://en.wikipedia.org/wiki/State_diagram state diagrams]. This means that given grammar and state transition rules, we can express all strings generated by those rules if the language is context-free or regular.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>In contrast, English is not a regular language, thus cannot be represented using finite state automaton. While some argue that English is a context-free language and that grammatically correct sentences can be generated using parse trees, those sentences can easily be nonsensical due to the other important concern that is context. For instance, “ship breathes wallet” is a grammatically correct sentences but it has no practical meaning. Hence with our current knowledge in formal language theory, English cannot be represented using a finite set of rules and grammars. Even with more <del style="font-weight: bold; text-decoration: none;">recently </del>machine learning methods like word-to-vec and bag-of-words, Natural Language Processing is still a challenging task.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In contrast, English is not a regular language, thus cannot be represented using finite state automaton. While some argue that English is a context-free language and that grammatically correct sentences can be generated using parse trees, those sentences can easily be nonsensical due to the other important concern that is context. For instance, “ship breathes wallet” is a grammatically correct sentences but it has no practical meaning. Hence with our current knowledge in formal language theory, English cannot be represented using a finite set of rules and grammars. Even with more <ins style="font-weight: bold; text-decoration: none;">complex </ins>machine learning methods like word-to-vec and bag-of-words, Natural Language Processing is still a challenging task.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Mathematical Background =</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Mathematical Background =</div></td></tr>
Revision as of 20:53, 15 March 2018 by Y53zou (→ Previous and Related Works): appended the closing sentence about machine learning methods (quoted, with its wording already corrected, in the entry above).
Revision as of 11:23, 15 March 2018 by Y53zou (→ Presented by): added the presenter's middle name.
Line 3:
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Presented by =</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Presented by =</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Rosie Zou, Xinkai Wei, Glen Chalatov, Ameer Dharamshi</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Rosie <ins style="font-weight: bold; text-decoration: none;">Yuyan </ins>Zou, Xinkai Wei, Glen Chalatov, Ameer Dharamshi</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Introduction =</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Introduction =</div></td></tr>
</table>Y53zouhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441w18/Image_Question_Answering_using_CNN_with_Dynamic_Parameter_Prediction&diff=34315&oldid=prevX46wei: Name2018-03-15T15:22:50Z<p>Name</p>
Line 3:
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Presented by =</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Presented by =</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Rosie Zou, <del style="font-weight: bold; text-decoration: none;">Kye </del>Wei, Glen Chalatov, Ameer Dharamshi</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Rosie Zou, <ins style="font-weight: bold; text-decoration: none;">Xinkai </ins>Wei, Glen Chalatov, Ameer Dharamshi</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Introduction =</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Introduction =</div></td></tr>
</table>X46weihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441w18/Image_Question_Answering_using_CNN_with_Dynamic_Parameter_Prediction&diff=34314&oldid=prevAdharams: /* Critique */2018-03-15T15:12:23Z<p><span dir="auto"><span class="autocomment">Critique</span></span></p>
Line 237:
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Critique=</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Critique=</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>One of the key criticisms of Image Q&A research in general is that the testing data sets are fairly small and may not be sufficient to have confidence in the model. In addition, the testing examples given are fairly <del style="font-weight: bold; text-decoration: none;">direct </del>questions. <del style="font-weight: bold; text-decoration: none;">We </del>do not have any sense of how the model scales to more complex questions that specify multiple conditions on requests.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>One of the key criticisms of Image Q&A research in general is that the testing data sets are fairly small and may not be sufficient to have confidence in the model. In addition, the testing examples given are fairly <ins style="font-weight: bold; text-decoration: none;">directed </ins>questions. <ins style="font-weight: bold; text-decoration: none;">While this model answers a wider variety and more complex questions than other models, we </ins>do not have any sense of how the model scales to <ins style="font-weight: bold; text-decoration: none;">even </ins>more complex questions that specify multiple conditions on requests<ins style="font-weight: bold; text-decoration: none;">. We do not know if/when the question complexity becomes too difficult for this model to answer</ins>.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The model itself takes a new approach to the Image Q&A problem that is far more intuitive than the traditional methods by incorporating the question in its parameters using the dynamic parameter prediction network. In addition to being more intuitive, it performs better than existing models. However, beyond the dynamic parameter prediction network, this model is not particularly original. The idea of removing the last layers of a VGG net and replacing them with layers suited to the specific task has been done many times over.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The model itself takes a new approach to the Image Q&A problem that is far more intuitive than the traditional methods by incorporating the question in its parameters using the dynamic parameter prediction network. In addition to being more intuitive, it performs better than existing models. However, beyond the dynamic parameter prediction network, this model is not particularly original. The idea of removing the last layers of a VGG net and replacing them with layers suited to the specific task has been done many times over.</div></td></tr>
</table>Adharamshttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441w18/Image_Question_Answering_using_CNN_with_Dynamic_Parameter_Prediction&diff=34313&oldid=prevY53zou: /* Hashing */2018-03-15T15:11:05Z<p><span dir="auto"><span class="autocomment">Hashing</span></span></p>
Line 186:
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Weight sharing is implemented in the model by allowing a single parameter in the candidate weight vector ''p'' to be shared by multiple elements of <math>W_d(q)</math>. This can be done using a pre-defined hash function that converts the 2-dimensional location in <math>W_d(q)</math> to a 1-dimensional index in ''p''. As mentioned in Chen et al.’s paper, this dimensionality reduction technique maintains the accuracy of the network.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Weight sharing is implemented in the model by allowing a single parameter in the candidate weight vector ''p'' to be shared by multiple elements of <math>W_d(q)</math>. This can be done using a pre-defined hash function that converts the 2-dimensional location in <math>W_d(q)</math> to a 1-dimensional index in ''p''. As mentioned in Chen et al.’s paper, this dimensionality reduction technique maintains the accuracy of the network.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Let <math>w^d_{mn}</math> be the element at position <math>(m,n)</math> in the weight matrix <math>W_d(q)</math>, which corresponds to the weight between the ''m''-th output neuron and the ''n''-th input neuron. The hashing function used in this paper is defined to be <math>&psi;(m,n) : (m,n) &rarr;</math> <math>\{1...K\}</math>, where <math>K = dim(p)</math>. More specifically: <math>w^d_{mn} = p_{&psi;(m,n)} &bull; &epsilon;(m,n)</math>, where <math>&epsilon;(m,n): N &times; N &rarr;<del style="font-weight: bold; text-decoration: none;"></math> </del>{+1, -1} is a second, independent hashing function used to remove the bias of the hashed inner product (more details on this bias term can be found in the original paper by Chen et al.).</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Let <math>w^d_{mn}</math> be the element at position <math>(m,n)</math> in the weight matrix <math>W_d(q)</math>, which corresponds to the weight between the ''m''-th output neuron and the ''n''-th input neuron. The hashing function used in this paper is defined to be <math>&psi;(m,n) : (m,n) &rarr;</math> <math>\{1...K\}</math>, where <math>K = dim(p)</math>. More specifically: <math>w^d_{mn} = p_{&psi;(m,n)} &bull; &epsilon;(m,n)</math>, where <math>&epsilon;(m,n): N &times; N &rarr; <ins style="font-weight: bold; text-decoration: none;">\</ins>{+1, -1<ins style="font-weight: bold; text-decoration: none;">\</ins>}<ins style="font-weight: bold; text-decoration: none;"></math> </ins>is a second, independent hashing function used to remove the bias of the hashed inner product (more details on this bias term can be found in the original paper by Chen et al.).</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The number of free parameters is reduced during this hashing process, which the authors believed to be reasonable since there are many redundant parameters in deep neural networks (as shown in [https://arxiv.org/pdf/1306.0543.pdf this 2013 NIPS paper]) and that it is possible to parameterize a network using a smaller set of candidate weights. Once these candidate weights have been computed, we simply map them to the dynamic parameter layer’s weight matrix positions by applying the hash function to the weight matrix positions.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The number of free parameters is reduced during this hashing process, which the authors believed to be reasonable since there are many redundant parameters in deep neural networks (as shown in [https://arxiv.org/pdf/1306.0543.pdf this 2013 NIPS paper]) and that it is possible to parameterize a network using a smaller set of candidate weights. Once these candidate weights have been computed, we simply map them to the dynamic parameter layer’s weight matrix positions by applying the hash function to the weight matrix positions.</div></td></tr>
Revision as of 11:10, 15 March 2018 by Y53zou (→ Hashing): moved the set {1...K} inside the math markup of the same paragraph.
Revision as of 11:10, 15 March 2018 by Y53zou (→ Hashing): removed stray italic markup from <math>\forall k \in U</math> in the direct-addressing paragraph.
Line 180:
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A quick background on hashing and direct addressing: </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A quick background on hashing and direct addressing: </div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>In a comparison-based model, <math>&Omega;(log(n))</math> time is required to search a size-''n'' dictionary. Direct addressing reduces the running time to ''O(1)'' by storing ''k'' keys, ''0 &le; k < M'', in an array of size ''M''. Hence the search operation is equivalent to indexing, which is constant assuming it takes a constant (and most likely negligible) amount of time to access any element in the array. Hashing is an improvement on direct addressing in the case when Direct Addressing can be space-inefficient with ''k << M''. For keys in the universe ''U'', a hash function ''h'' mapps the keys to an element from the set {1...M} -- i.e. <math>h(k) &isin; {1...M}</math>, <math>&forall; <del style="font-weight: bold; text-decoration: none;">''</del>k &isin; U</math>. This allows for space-efficient direct addressing.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In a comparison-based model, <math>&Omega;(log(n))</math> time is required to search a size-''n'' dictionary. Direct addressing reduces the running time to ''O(1)'' by storing ''k'' keys, ''0 &le; k < M'', in an array of size ''M''. Hence the search operation is equivalent to indexing, which is constant assuming it takes a constant (and most likely negligible) amount of time to access any element in the array. Hashing is an improvement on direct addressing in the case when Direct Addressing can be space-inefficient with ''k << M''. For keys in the universe ''U'', a hash function ''h'' mapps the keys to an element from the set {1...M} -- i.e. <math>h(k) &isin; {1...M}</math>, <math>&forall; k &isin; U</math>. This allows for space-efficient direct addressing.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The hashing “trick” used for dimensionality reduction in this paper is a novel network architecture called “HashNets”, which as previously stated was published in [https://arxiv.org/pdf/1504.04788.pdf this 2015 paper]. It uses a low-cost hash function to randomly group connection weights into slots in the hash table, and all connections within the same slot share a single parameter value. Then, during the training process, these parameters are tuned to adjust to the HashNets weight sharing architecture with standard backpropagation. The hashing procedure introduces no additional memory overhead, and this strategy has been shown to substantially reduce the storage requirements for neural nets while mostly preserving generalization performance.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The hashing “trick” used for dimensionality reduction in this paper is a novel network architecture called “HashNets”, which as previously stated was published in [https://arxiv.org/pdf/1504.04788.pdf this 2015 paper]. It uses a low-cost hash function to randomly group connection weights into slots in the hash table, and all connections within the same slot share a single parameter value. Then, during the training process, these parameters are tuned to adjust to the HashNets weight sharing architecture with standard backpropagation. The hashing procedure introduces no additional memory overhead, and this strategy has been shown to substantially reduce the storage requirements for neural nets while mostly preserving generalization performance.</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l186">Line 186:</td>
<td colspan="2" class="diff-lineno">Line 186:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Weight sharing is implemented in the model by allowing a single parameter in the candidate weight vector ''p'' to be shared by multiple elements of <math>W_d(q)</math>. This can be done using a pre-defined hash function that converts the 2-dimensional location in <math>W_d(q)</math> to a 1-dimensional index in ''p''. As mentioned in Chen et al.’s paper, this dimensionality reduction technique maintains the accuracy of the network.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Weight sharing is implemented in the model by allowing a single parameter in the candidate weight vector ''p'' to be shared by multiple elements of <math>W_d(q)</math>. This can be done using a pre-defined hash function that converts the 2-dimensional location in <math>W_d(q)</math> to a 1-dimensional index in ''p''. As mentioned in Chen et al.’s paper, this dimensionality reduction technique maintains the accuracy of the network.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Let <math>w^d_{mn}</math> be the element at position <math>(m,n)</math> in the weight matrix <math>W_d(q)</math>, which corresponds to the weight between the ''m''-th output neuron and the ''n''-th input neuron. The hashing function used in this paper is defined to be <math>&psi;(m,n) : (m,n) <del style="font-weight: bold; text-decoration: none;">-</del>> {1...K}<del style="font-weight: bold; text-decoration: none;"></math></del>, where <math>K = dim(p)</math>. More specifically: <math>w^d_{mn} = p_{&psi;(m,n)} &bull; &epsilon;(m,n)</math>, where <math>&epsilon;(m,n): N &times; N &rarr; {+1, -1}<del style="font-weight: bold; text-decoration: none;"></math> </del>is a second, independent hashing function used to remove the bias of the hashed inner product (more details on this bias term can be found in the original paper by Chen et al.).</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Let <math>w^d_{mn}</math> be the element at position <math>(m,n)</math> in the weight matrix <math>W_d(q)</math>, which corresponds to the weight between the ''m''-th output neuron and the ''n''-th input neuron. The hashing function used in this paper is defined to be <math>&psi;(m,n) : (m,n) <ins style="font-weight: bold; text-decoration: none;">&rarr;</math</ins>> {1...K}, where <math>K = dim(p)</math>. More specifically: <math>w^d_{mn} = p_{&psi;(m,n)} &bull; &epsilon;(m,n)</math>, where <math>&epsilon;(m,n): N &times; N &rarr;<ins style="font-weight: bold; text-decoration: none;"></math> </ins>{+1, -1} is a second, independent hashing function used to remove the bias of the hashed inner product (more details on this bias term can be found in the original paper by Chen et al.).</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The number of free parameters is reduced during this hashing process, which the authors believed to be reasonable since there are many redundant parameters in deep neural networks (as shown in [https://arxiv.org/pdf/1306.0543.pdf this 2013 NIPS paper]) and that it is possible to parameterize a network using a smaller set of candidate weights. Once these candidate weights have been computed, we simply map them to the dynamic parameter layer’s weight matrix positions by applying the hash function to the weight matrix positions.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The number of free parameters is reduced during this hashing process, which the authors believed to be reasonable since there are many redundant parameters in deep neural networks (as shown in [https://arxiv.org/pdf/1306.0543.pdf this 2013 NIPS paper]) and that it is possible to parameterize a network using a smaller set of candidate weights. Once these candidate weights have been computed, we simply map them to the dynamic parameter layer’s weight matrix positions by applying the hash function to the weight matrix positions.</div></td></tr>
</table>Y53zou
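To make the background on direct addressing and hashing concrete, here is a minimal sketch. Everything in it is illustrative rather than from the paper: the sizes are hypothetical, Python's built-in <code>hash</code> stands in for the hash function ''h'', the table size is named ''K'' to avoid reusing ''M'', and chaining resolves collisions.
<syntaxhighlight lang="python">
# Direct addressing: an array with one slot per possible key gives O(1)
# search by plain indexing, but wastes space when few keys are stored.
M = 1_000_000                         # size of the key universe (hypothetical)
direct_table = [None] * M

def direct_search(key):
    return direct_table[key]          # searching == array indexing, O(1)

# Hashing: h maps the large universe U into {0, ..., K-1} for a small K,
# keeping expected O(1) search while using only O(K) space.
K = 1024                              # hash-table size, K << M
buckets = [[] for _ in range(K)]      # chaining resolves collisions

def h(key):
    return hash(key) % K              # h(k) in {0, ..., K-1} for all k in U

def insert(key, value):
    buckets[h(key)].append((key, value))

def search(key):
    for k, v in buckets[h(key)]:
        if k == key:
            return v
    return None
</syntaxhighlight>
The trade-off is exactly the one described in the section: direct addressing pays ''O(M)'' space for ''O(1)'' lookups, while hashing keeps expected ''O(1)'' lookups with only ''O(K)'' space.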
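Similarly, a minimal sketch of the weight-sharing construction <math>w^d_{mn} = p_{&psi;(m,n)} &bull; &epsilon;(m,n)</math> described above. The helper names are hypothetical and a truncated MD5 digest stands in for the low-cost hash (Chen et al. use xxHash); indices are 0-based here rather than the 1-based {1...K} of the text.
<syntaxhighlight lang="python">
import hashlib
import numpy as np

# Sketch of HashedNets-style weight sharing (illustrative only).
K = 64                              # K = dim(p): number of real parameters
p = np.random.randn(K)              # candidate weight vector (trainable)

def _digest(m, n, salt):
    """Deterministic 64-bit hash of a matrix position (MD5 stand-in)."""
    d = hashlib.md5(f"{salt}:{m},{n}".encode()).digest()
    return int.from_bytes(d[:8], "little")

def psi(m, n):
    """psi: (m, n) -> {0, ..., K-1}, index into p."""
    return _digest(m, n, "psi") % K

def eps(m, n):
    """eps: (m, n) -> {+1, -1}, the sign hash that removes the bias
    of the hashed inner product."""
    return 1 if _digest(m, n, "eps") % 2 == 0 else -1

def virtual_weight(m, n):
    """w_mn = p[psi(m, n)] * eps(m, n); W_d(q) itself is never stored."""
    return p[psi(m, n)] * eps(m, n)

# Materialize an 8x16 block of the virtual matrix: 128 virtual weights
# backed by only K = 64 free parameters.
W = np.array([[virtual_weight(m, n) for n in range(16)] for m in range(8)])
</syntaxhighlight>
Because every entry of the virtual matrix that hashes to the same slot shares one component of ''p'', standard backpropagation accumulates all of their gradients into that single parameter; no memory beyond ''p'' itself is needed.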