Convolutional Neural Networks for Sentence Classification

This is a summary of the paper ''Convolutional Neural Networks for Sentence Classification'' by Yoon Kim <sup>[[#References|[1]]]</sup>.<br />
<br />
<br />
= Presented by = <br />
*M.Bayati<br />
*S.Malekmohammadi<br />
*V.Luong<br />
<br />
= Introduction =<br />
This paper studies sentence classification with convolutional neural networks. Each sentence is encoded by concatenating the word vectors of its words, and this representation is fed to a model consisting of a convolutional layer followed by a dense layer that performs the classification. The authors introduce several variants of this model, two of which learn task-specific word vectors. They observe that learning task-specific vectors (instead of using the pre-trained vectors without any change) offers further gains in performance. <br />
<br />
<br />
<br />
= The Models =<br />
<br />
Learning vector representations of words with neural models is one of the most important contributions within natural language processing. The vector representations are obtained by projecting the sparse one-hot representations of words onto a lower-dimensional space. The Continuous Bag of Words (CBOW) neural language model, trained by Mikolov et al. (2013), is one of the unsupervised algorithms that provides such low-dimensional vector representations, in which semantic features of words are encoded. Given the low-dimensional vector representations, one can feed them to different models for different tasks; for instance, they can be fed to CNNs for document or sentence classification. The vector representations used in this paper are obtained from the CBOW model (the pre-trained word2vec vectors).<br />
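<br />
Concretely, a sentence is turned into a matrix by stacking the vectors of its words. Below is a minimal NumPy sketch of this encoding step; the lookup table <code>word_vectors</code>, the 300-dimensional vector size, and the padding length are illustrative assumptions, not the authors' code.<br />
<br />
<pre>
import numpy as np

EMB_DIM = 300  # dimensionality of the pre-trained word vectors (assumed)

# Hypothetical lookup table: word -> pre-trained vector (e.g., loaded from word2vec).
word_vectors = {w: np.random.randn(EMB_DIM).astype(np.float32)
                for w in ["the", "movie", "was", "good"]}

def encode_sentence(tokens, max_len=20):
    """Stack word vectors into a (max_len, EMB_DIM) matrix, zero-padding short sentences.

    Words missing from the pre-trained vocabulary get a random vector, roughly
    mirroring how the paper handles words absent from word2vec."""
    rows = []
    for tok in tokens[:max_len]:
        vec = word_vectors.get(tok)
        if vec is None:
            vec = np.random.uniform(-0.25, 0.25, EMB_DIM).astype(np.float32)
        rows.append(vec)
    while len(rows) < max_len:
        rows.append(np.zeros(EMB_DIM, dtype=np.float32))
    return np.stack(rows)

x = encode_sentence("the movie was good".split())
print(x.shape)  # (20, 300)
</pre>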
<br />
The model the authors use consists of a convolutional layer followed by a dense layer. In the convolutional layer, there are <math>m</math> different filters (kernels), each producing one feature map. A max-over-time pooling operation then takes the maximum value of each feature map, and the resulting <math>m</math> features form the penultimate layer, which is passed to a fully connected softmax layer, as shown in the following:<br />
<br />
<br />
<br />
[[File:one.PNG|700px|thumb|center]]<br />
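<br />
A minimal PyTorch sketch of this architecture is shown below. The filter widths (3, 4, 5), 100 feature maps per width, and dropout rate of 0.5 follow the hyperparameters reported in the paper, but the code itself is only an illustration, not the authors' implementation.<br />
<br />
<pre>
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    """Convolution over the sentence matrix, max-over-time pooling,
    dropout, and a fully connected softmax layer."""

    def __init__(self, vocab_size, emb_dim=300, num_classes=2,
                 filter_widths=(3, 4, 5), n_filters=100, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # one 1-D convolution per filter width; each yields n_filters feature maps
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, w) for w in filter_widths])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(filter_widths), num_classes)

    def forward(self, token_ids):            # token_ids: (batch, sentence_len)
        x = self.embedding(token_ids)        # (batch, len, emb_dim)
        x = x.transpose(1, 2)                # Conv1d expects (batch, emb_dim, len)
        # max-over-time pooling keeps one value per feature map
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)  # the penultimate layer
        return self.fc(self.dropout(features))  # logits for the softmax

model = SentenceCNN(vocab_size=10000)
logits = model(torch.randint(0, 10000, (8, 20)))  # 8 sentences of length 20
print(logits.shape)  # torch.Size([8, 2])
</pre>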
<br />
<br />
The authors introduce several variants of the model, which are briefly explained in the following:<br />
<br />
They first consider a baseline model, called CNN-rand. In this variant, no pretrained vector representations are used: every word is assigned a randomly initialized vector, these vectors are fed to the model, and they are modified during training. <br />
<br />
In the second variant of the model, called CNN-static, they use the pretrained word2vec vectors and keep them static; i.e., during training, the vectors do not change and only the other parameters of the model (edge weights and kernels) are learned. They observed that this simple model achieves excellent results on multiple benchmarks. Note that these word vectors are pre-trained independently of the given classification task and its data set, yet the model achieves excellent results with them; in contrast, feeding another set of publicly available word vectors (trained by Collobert et al. (2011) on Wikipedia) to the same model does not perform as well as the word2vec vectors. Based on this observation, the authors state that the pre-trained word2vec vectors are good, fairly universal encoded representations of words that can be utilized for different classification tasks. <br />
<br />
The third variant is called CNN-non-static. This is the first variant in which the authors learn task-specific word vectors. The task-specific vectors are initialized with the pre-trained word2vec vectors and are then fine-tuned during training via backpropagation.<br />
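<br />
The difference between CNN-static and CNN-non-static amounts to whether gradients are allowed to update the embedding table. A hedged PyTorch sketch of the two settings, using a hypothetical matrix <code>pretrained</code> of word2vec vectors:<br />
<br />
<pre>
import torch
import torch.nn as nn

# Hypothetical matrix of pre-trained word2vec vectors, one row per vocabulary word.
pretrained = torch.randn(10000, 300)

# CNN-static: the embedding table stays fixed; only the convolution filters
# and the dense layer are learned.
static_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)

# CNN-non-static: same initialization, but the vectors are fine-tuned
# by backpropagation together with the rest of the model.
nonstatic_emb = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

print(static_emb.weight.requires_grad, nonstatic_emb.weight.requires_grad)  # False True
</pre>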
<br />
In the fourth variant (CNN-multichannel), the authors add a simple modification to the structure of the model so that it can use both the pre-trained and the task-specific vectors: another input channel is added to the model, as shown in the following:<br />
<br />
[[File:two.PNG|700px|thumb|center]]<br />
<br />
In this model, as shown, there are two sets (channels) of word vectors. The first channel holds the pre-trained word2vec vectors, which are static and do not change during training. The other channel is also initialized with the pretrained word2vec vectors, but it is fine-tuned during training via backpropagation. <br />
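<br />
A sketch of how the two channels could be combined, reusing the <code>SentenceCNN</code> class and imports from the earlier sketch: each filter is applied to both channels and the two responses are summed before pooling. This follows the paper's description of the multichannel architecture, but the code is an assumption-laden illustration rather than the authors' implementation.<br />
<br />
<pre>
class MultichannelCNN(SentenceCNN):
    """Two-channel variant: a frozen word2vec channel plus a fine-tuned copy."""

    def __init__(self, pretrained, **kwargs):
        super().__init__(vocab_size=pretrained.size(0),
                         emb_dim=pretrained.size(1), **kwargs)
        # channel 1: static pre-trained vectors; channel 2: fine-tuned copy
        self.static_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
        self.embedding = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

    def forward(self, token_ids):
        x_static = self.static_emb(token_ids).transpose(1, 2)
        x_tuned = self.embedding(token_ids).transpose(1, 2)
        # apply each filter to both channels and add the responses
        pooled = [F.relu(conv(x_static) + conv(x_tuned)).max(dim=2).values
                  for conv in self.convs]
        features = torch.cat(pooled, dim=1)
        return self.fc(self.dropout(features))

mc = MultichannelCNN(torch.randn(10000, 300))
print(mc(torch.randint(0, 10000, (4, 20))).shape)  # torch.Size([4, 2])
</pre>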
<br />
= Results = <br />
<br />
The authors evaluate the models on various benchmark data sets. The following figure shows a summary of the data sets used:<br />
<br />
[[File:data.PNG|700px|thumb|center]]<br />
<br />
The columns after the data set name show, respectively, the number of target classes, the average sentence length, the data set size, the vocabulary size, the number of vocabulary words present in the set of pre-trained word2vec vectors, and the test set size. <br />
<br />
The results obtained from the models introduced above and other methods in the literature are shown in the following figure:<br />
<br />
[[File:results.PNG|700px|thumb|center]]<br />
<br />
<br />
The results show that the baseline model with randomly initialized word vectors does not perform well on its own, although it is perhaps surprising that its accuracy is not that far from CNN-static, which uses the pre-trained word vectors. We can also observe the gain obtained from using a CNN: even the simple model with static word2vec vectors performs remarkably well, and its results are competitive with those of more sophisticated deep learning models. CNN-non-static, which tunes the word vectors for each task, gives a further improvement. Contrary to the expectation that CNN-multichannel should outperform CNN-non-static and CNN-static, its results are mixed. The authors suggest that regularizing the fine-tuning process of the task-specific vectors could improve the performance of CNN-multichannel.<br />
<br />
=Conclusion=<br />
<br />
In both CNN-non-static and CNN-multichannel, the authors observed that the models fine-tune the word vectors to make them more specific to the given task. For example, in word2vec the most similar word to “good” is “great”, whereas “nice” is arguably closer when the goal is expressing sentiment. This is reflected in the learned vectors: for the word vectors in CNN-non-static, and for those in the second channel of CNN-multichannel, the most similar word to “good” is “nice”. Fine-tuning therefore allows the model to learn more meaningful representations of words for the task at hand. This can be counted as the most important contribution of the paper, as it improves the performance of the model compared to using the pre-trained word2vec vectors statically, regardless of the given task. <br />
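<br />
Such "most similar word" statements are typically made by ranking the vocabulary by cosine similarity to a query word in the learned embedding matrix. A small sketch of that check, where <code>embeddings</code> and <code>vocab</code> are hypothetical outputs of a trained CNN-non-static model:<br />
<br />
<pre>
import numpy as np

def most_similar(word, embeddings, vocab, topn=3):
    """Rank vocabulary words by cosine similarity to `word`.

    `embeddings` is a (vocab_size, dim) array of learned word vectors and
    `vocab` maps each word to its row index."""
    inv_vocab = {i: w for w, i in vocab.items()}
    v = embeddings[vocab[word]]
    sims = embeddings @ v / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v) + 1e-8)
    ranked = np.argsort(-sims)
    return [(inv_vocab[i], float(sims[i])) for i in ranked if inv_vocab[i] != word][:topn]

# toy usage with random vectors; real vectors would come from the trained model
vocab = {"good": 0, "great": 1, "nice": 2, "bad": 3}
emb = np.random.randn(4, 300)
print(most_similar("good", emb, vocab))
</pre>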
<br />
=References=<br />
<br />
* <sup>[1]</sup> Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi:10.3115/v1/d14-1181.

stat441F18
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
<br />
= Record your contributions here [https://docs.google.com/spreadsheets/d/10CHiJpAylR6kB9QLqN7lZHN79D9YEEW6CDTH27eAhbQ/edit?usp=sharing]=<br />
<br />
Use the following notations:<br />
<br />
P: You have written a summary/critique on the paper.<br />
<br />
T: You had a technical contribution on a paper (excluding the paper that you present).<br />
<br />
E: You had an editorial contribution on a paper (excluding the paper that you present).<br />
<br />
<br />
<br />
<br />
=Paper presentation=<br />
{| class="wikitable" border="1" cellpadding="3"
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Memory-Based Parameter Adaptation || [https://arxiv.org/pdf/1802.10542.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Memory-Based_Parameter_Adaptation#Incremental_Learning Summary]<br />
|-<br />
|Nov 13 ||Sai Praneeth M, Xudong Peng, Alice Li, Shahrzad Hosseini Vajargah|| 2|| Going Deeper with Convolutions ||[https://arxiv.org/pdf/1409.4842.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary]<br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| Topic Compositional Neural Language Model|| [https://arxiv.org/pdf/1712.09783.pdf Paper] || <br />
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18/TCNLM Summary]<br />
|-<br />
|Nov 15 || Zhaoran Hou, Pei Wei Wang, Chi Zhang, Yiming Li, Daoyi Chen, Ying Chi|| 4|| Extreme Learning Machine for regression and Multi-class Classification|| [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6035797 Paper] || <br />
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/ Summary]<br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek, Brendan Ross, Jon Barenboim, Junqiao Lin, James Bootsma || 5|| A Neural Representation of Sketch Drawings || [https://arxiv.org/pdf/1704.03477.pdf Paper] || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Luong || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_Neural_Networks_for_Sentence_Classi%EF%AC%81cation Summary]<br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Robust Probabilistic Modeling with Bayesian Data Reweighting || [http://proceedings.mlr.press/v70/wang17g/wang17g.pdf Paper] || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su, Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Residual_Learning_for_Image_Recognition Summary]<br />
|-<br />
|Nov 27 || Mitchell Snaith || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| tba || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu, Aden Grant, Yu Hao Wang, Andrew McMurry, Baizhi Song, Yongqi Dong || 11|| Towards Deep Learning Models Resistant to Adversarial Attacks || [https://arxiv.org/pdf/1706.06083.pdf Paper] || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||<br />
|-<br />
|Nov 28 || Hudson Ash, Stephen Kingston, Richard Zhang, Alexandre Xiao, Ziqiu Zhu || 13 || Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness || [https://arxiv.org/pdf/1608.05842.pdf Paper] ||<br />
|-<br />
|Nov 21 || Frank Jiang, Yuan Zhang, Jerry Hu || 14 || Distributed Representations of Words and Phrases and their Compositionality || [https://arxiv.org/pdf/1310.4546.pdf Paper] ||<br />
|-<br />
|Nov 21 || Yu Xuan Lee, Tsen Yee Heng || 15 || Gradient Episodic Memory for Continual Learning || [http://papers.nips.cc/paper/7225-gradient-episodic-memory-for-continual-learning.pdf Paper] ||<br />
|-<br />
|Makeup || || || || ||
|}
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
<br />
= Record your contributions here [https://docs.google.com/spreadsheets/d/10CHiJpAylR6kB9QLqN7lZHN79D9YEEW6CDTH27eAhbQ/edit?usp=sharing]=<br />
<br />
Use the following notations:<br />
<br />
P: You have written a summary/critique on the paper.<br />
<br />
T: You had a technical contribution on a paper (excluding the paper that you present).<br />
<br />
E: You had an editorial contribution on a paper (excluding the paper that you present).<br />
<br />
<br />
<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Memory-Based Parameter Adaptation || [https://arxiv.org/pdf/1802.10542.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Memory-Based_Parameter_Adaptation#Incremental_Learning Summary] ||<br />
|-<br />
|Nov 13 ||Sai Praneeth M, Xudong Peng, Alice Li, Shahrzad Hosseini Vajargah|| 2|| Going Deeper with Convolutions ||[https://arxiv.org/pdf/1409.4842.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary]<br />
|-<br />
|NOv 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| Topic Compositional Neural Language Model|| [https://arxiv.org/pdf/1712.09783.pdf paper] || <br />
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18/TCNLM Summary]<br />
|-<br />
|Nov 15 || Zhaoran Hou, Pei Wei Wang, Chi Zhang, Yiming Li, Daoyi Chen, Ying Chi|| 4|| Extreme Learning Machine for regression and Multi-class Classification|| [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6035797 Paper] || <br />
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/ Summary]<br />
|-<br />
|NOv 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek, Brendan Ross, Jon Barenboim, Junqiao Lin, James Bootsma || 5|| A Neural Representation of Sketch Drawings || [https://arxiv.org/pdf/1704.03477.pdf Paper] || <br />
|-<br />
|Nov 20 || Maya(Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Loung || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_Neural_Networks_for_Sentence_Classi%EF%AC%81cation/ summary]|| <br />
|-<br />
|NOv 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Robust Probabilistic Modeling with Bayesian Data Reweighting || [http://proceedings.mlr.press/v70/wang17g/wang17g.pdf Paper] || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su, Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Residual_Learning_for_Image_Recognition Summary]<br />
|-<br />
|NOv 27 || Mitchell Snaith || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| tba || || <br />
|-<br />
|NOv 29 || Jameson Ngo, Amy Xu, Aden Grant, Yu Hao Wang, Andrew McMurry, Baizhi Song, Yongqi Dong || 11|| Towards Deep Learning Models Resistant to Adversarial Attacks || [https://arxiv.org/pdf/1706.06083.pdf Paper] || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||<br />
|-<br />
|Nov 28 || Hudson Ash, Stephen Kingston, Richard Zhang, Alexandre Xiao, Ziqiu Zhu || 13 || Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness || [https://arxiv.org/pdf/1608.05842.pdf Paper] ||<br />
|-<br />
|Nov 21 || Frank Jiang, Yuan Zhang, Jerry Hu || 14 || Distributed Representations of Words and Phrases and their Compositionality || [https://arxiv.org/pdf/1310.4546.pdf Paper] ||<br />
|-<br />
|Nov 21 || Yu Xuan Lee, Tsen Yee Heng || 15 || Gradient Episodic Memory for Continual Learning || [http://papers.nips.cc/paper/7225-gradient-episodic-memory-for-continual-learning.pdf Paper] ||<br />
|-<br />
|Makeup || || || || ||</div>Mbayatihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_Neural_Networks_for_Sentence_Classi%EF%AC%81cation&diff=38898Convolutional Neural Networks for Sentence Classification2018-11-14T00:53:04Z<p>Mbayati: </p>
<hr />
<div>This is a summary of the paper, Convolutional Neural Networks for Sentence Classification by Yoon Kim <sup>[[#References|[1]]]</sup>.<br />
<br />
<br />
= Presented by = <br />
*M.Bayati<br />
*S.MalekMohammadi<br />
*V.Luong<br />
<br />
= Introduction =<br />
<br />
In this paper, sentence classification using convolutional neural networks is studied. Each sentence is encoded by concatenation of the word vectors of its words and the encoded representation is fed to a model consisting of a convolutional layer followed by a dense layer for doing the classification task. Different variants of this model have been introduced by the authors, two of which try to learn task-specific word vectors for words. It is observed that learning task-specific vectors (instead of using the pre-trained vectors without any change) offers further gains in performance. <br />
<br />
<br />
= The Used Model and Results =<br />
<br />
Using neural models to learn vector representation for words is one of the most important contributions within natural language processing. The vector representations are obtained from projecting 1-hot representation of words (a sparse representation) onto a lower dimensional space. Continuous Bag of Words (COBW) neural language model, trained by Mikolov et al. (2013), is one of the unsupervised algorithms providing such a low dimensional vector representation in which semantic features of words are encoded. Having the low-dimensional vector representations, one can feed them to different models for doing different tasks. For instance, they can be fed to CNNs for document or sentence classification. The vector representations used in this paper are obtained from the CBOW.<br />
<br />
The model that the authors use constitutes of a convolutional layer followed by a dense layer. In the convolutional layer, there are $m$ different filters (kernels) each resulting in one feature map. The resulting $m$ feature maps form the penultimate layer which is passed to a fully connected softmax layer, as shown in the following:<br />
<br />
<br />
<br />
[[File:one.PNG|700px|thumb|center]]<br />
<br />
<br />
<br />
<br />
The authors introduce several variants of the model that are briefly explained in the following:<br />
<br />
They first consider a baseline model, which is called CNN-rand. In this variant, they do not use any pretrained vector representation for words. All words are assigned a random vector representation and the assigned vectors are fed to the model. The random vectors get modified during training. The authors observed that the variant does not perform well compared to the other variants. \newline<br />
<br />
In the second variant of the model, which is called CNN-static, they use the pretrained word2vec vectors. They keep the pretrained vectors static; i.e., during training, the vectors do not change and only the other parameters of the model (edge weights and kernels) get learned. They observed that this simple model achieves excellent results on multiple benchmarks. Note that the used word vectors are pre-trained (regardless of the given classification task and its data set) and the model achieves excellent results when using them, while when feeding another set of publicly available word vectors (trained by Collobert et al. (2011) on Wikipedia) to the same model, the performance of the model is not as good as when word2vec word vectors are used. Based on this observation, the authors stated that the pre-trained word2vec vectors are good encoded representation for words and they can be utilized for different classification tasks. \newline<br />
<br />
<br />
<br />
The third variant that the authors consider is called CNN-non-static. Here is the first time that the authors try to learn task-specific word vectors for words. In the model, they use the pre-trained word2vec vectors to initialize the task-specific vectors, but after getting initialized, the vectors get fine-tuned during training via backpropagation. They observed that learning task-specific vectors through fine-tuning results in further improvements compared the case when the pre-trained word2vec vectors are used without any change. \newline<br />
<br />
In the fourth variant introduced for the model (CNN-multichannel), the authors add a simple modification to the structure of the model to make it capable of using both the pre-trained and the task-specific vectors. What they have done, is adding another channel of inputs to the model structure as shown in the following:<br />
<br />
<br />
[[File:two.PNG|700px|thumb|center]]<br />
<br />
<br />
<br />
In this model, as shown, there are two sets (channels) of word vectors. The first one is the pre-trained word2vec vectors that are static and do not change during training. The other channel is initialized with the pretrained word2vec vectors, but it gets fine-tuned during training via backpropagation. They observed that this model, similar to CNN-non-static, results in further improvement of the model when doing classification tasks on different data sets. \newline <br />
<br />
<br />
= Conclusion = <br />
<br />
In both the CNN-non-static and the CNN-multichannel, they observed that the models are able to fine-tune the word vectors to make them more specific for each given task. For example, they observed that in word2vec, the most similar word to “good” is “great”, while “nice” seems closer to that as long as the goal is expressing sentiment. This can be observed in the learned vectors reflected by CNN-non-static and CNN-multichannel: for the word vectors in CNN-non-static and those in the second channel of CNN-multichannel, the most similar word to “good” is “nice”. So fine-tuning allows the model to learn more meaningful representation for words depending on the task in hand. This can be counted as the most important contribution of the paper, which adds improvement to the performance of the model compared to when it uses the pre-trained word2vec vectors in a static way and regardless of the given task. <br />
<br />
<br />
=References=<br />
* <sup>[1]</sup>Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). doi:10.3115/v1/d14-1181.</div>
<hr />
<div>'''F18-STAT841-Proposal'''<br />
<br />
'''Use this format (Don’t remove Project 0)'''<br />
<br />
'''Project # 0'''<br />
Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
'''Title:''' Making a String Telephone<br />
<br />
'''Description:''' We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 1'''<br />
Group members:<br />
<br />
Weng, Jiacheng<br />
<br />
Li, Keqi<br />
<br />
Qian, Yi<br />
<br />
Liu, Bomeng<br />
<br />
'''Title:''' RSNA Pneumonia Detection Challenge<br />
<br />
'''Description:''' <br />
<br />
Our team’s project is the RSNA Pneumonia Detection Challenge, a Kaggle competition. The primary goal of this project is to develop a machine learning tool to detect patients with pneumonia based on their chest radiographs (CXR). <br />
<br />
Pneumonia is an infection that inflames the air sacs in the lungs, with symptoms such as chest pain, cough, and fever [1]. Pneumonia can be very dangerous, especially to infants and the elderly; in 2015, 920,000 children under the age of 5 died from the disease [2]. Because of its fatality to children, diagnosing pneumonia is a high priority. A common diagnostic method is to obtain a patient's chest radiograph (CXR), a gray-scale x-ray image of the chest. A region infected by pneumonia usually shows as one or more areas of increased opacity on the CXR [3]. However, many other factors can also increase opacity on a CXR, which makes the diagnosis very challenging; it also requires highly skilled clinicians and a lot of CXR screening time. The Radiological Society of North America (RSNA®) sees an opportunity to use machine learning to accelerate the initial CXR screening process. <br />
<br />
For the scope of this project, our team plans to contribute to solving this problem by applying our machine learning knowledge in image processing and classification. Team members will apply techniques that include, but are not limited to, logistic regression, random forests, SVM, kNN, and CNNs in order to successfully detect CXRs with pneumonia.<br />
<br />
<br />
[1] (Accessed 2018, Oct. 4). Pneumonia [Online]. MAYO CLINIC. Available from: https://www.mayoclinic.org/diseases-conditions/pneumonia/symptoms-causes/syc-20354204<br />
[2] (Accessed 2018, Oct. 4). RSNA Pneumonia Detection Challenge [Online]. Kaggle. Available from: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge<br />
[3] Franquet T. Imaging of community-acquired pneumonia. J Thorac Imaging 2018 (epub ahead of print). PMID 30036297<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 2'''<br />
Group members:<br />
<br />
Hou, Zhaoran<br />
<br />
Zhang, Chi<br />
<br />
'''Title:''' <br />
<br />
'''Description:'''<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 3'''<br />
Group members:<br />
<br />
Hanzhen Yang<br />
<br />
Jing Pu Sun<br />
<br />
Ganyuan Xuan<br />
<br />
Yu Su<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:'''<br />
<br />
Our team chose the [https://www.kaggle.com/c/quickdraw-doodle-recognition Quick, Draw! Doodle Recognition Challenge] from the Kaggle Competition. The goal of the competition is to build an image recognition tool that can classify hand-drawn doodles into one of the 340 categories.<br />
<br />
The main challenge of the project is that the training set is very noisy. Hand-drawn artwork may deviate substantially from the actual object and almost certainly differs from person to person. Mislabeled images also present a problem, since they create outlier points when we train our models. <br />
<br />
We plan on learning more about some of the currently mature image recognition algorithms to inspire and develop our own model.<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 4'''<br />
Group members:<br />
<br />
Snaith, Mitchell<br />
<br />
'''Title:''' Reproducibility report: ''Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks''<br />
<br />
'''Description:''' <br />
<br />
The paper ''Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks'' [1] has been submitted to ICLR 2019. It aims to "fix" variational Bayes and turn it into a robust inference tool through two innovations. <br />
<br />
Goals are to: <br />
<br />
* reproduce the deterministic variational inference scheme as described in the paper without referencing the original author's code, providing a 3rd party implementation<br />
<br />
* reproduce experiment results with own implementation, using the same NN framework for reference implementations of compared methods described in the paper<br />
<br />
* reproduce experiment results with the author's own implementation<br />
<br />
* explore other possible applications of variational Bayes besides heteroscedastic regression<br />
<br />
[1] OpenReview location: https://openreview.net/forum?id=B1l08oAct7<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 5'''<br />
Group members:<br />
<br />
Rebecca, Chen<br />
<br />
Susan,<br />
<br />
Mike, Li<br />
<br />
Ted, Wang<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:''' <br />
<br />
Classification has become more and more eye-catching, especially with the rise of machine learning in recent years. Our team is particularly interested in machine learning algorithms that are optimized for a specific type of image classification. <br />
<br />
In this project, we will dig into the base classifiers we learned in class and combine them to find an optimal solution for a certain type of image dataset. Currently, we are looking into a dataset from Kaggle: the Quick, Draw! Doodle Recognition Challenge. The dataset in this competition contains 50M drawings across 340 categories and is a subset of the world's largest doodling dataset, which is continuously updated by real players of the drawing game. Anyone can contribute by playing it (quickdraw.withgoogle.com).<br />
<br />
As machine learning students, we are eager to help develop a better classification method. By “better”, we mean finding a balance between simplicity and accuracy. We will start with neural networks using different activation functions in each layer, and we will also combine base classifiers with bagging, random forests, and boosting for ensemble learning. We will also regularize our parameters to avoid overfitting on the training dataset. Finally, we will summarize the features of this type of image dataset, formulate our solutions, and standardize our steps for solving this kind of problem. <br />
<br />
Hopefully, we can not only finish our project successfully, but also make a small contribution to the machine learning research field.<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 6'''<br />
Group members:<br />
<br />
Ngo, Jameson<br />
<br />
Xu, Amy<br />
<br />
'''Title:''' Kaggle Challenge: [https://www.kaggle.com/c/quickdraw-doodle-recognition Quick, Draw! Doodle Recognition ]<br />
<br />
'''Description:''' <br />
<br />
We will participate in the Quick, Draw! Doodle Recognition competition featured on Kaggle. We will classify doodles based on the images drawn in the game.<br />
<br />
These images may be incomplete or mislabeled, so we will need to find a way to effectively handle these issues in order to classify the doodles correctly.<br />
<br />
--------------------------------------------------------------------<br />
'''Project # 7'''<br />
Group members:<br />
<br />
Qianying Zhao<br />
<br />
Hui Huang<br />
<br />
Meiyu Zhou<br />
<br />
Gezhou Zhang<br />
<br />
'''Title:''' Google Analytics Customer Revenue Prediction<br />
<br />
'''Description:''' <br />
Our group will participate in the featured Kaggle competition Google Analytics Customer Revenue Prediction. In this competition, we will analyze a customer dataset from a Google Merchandise Store selling swag to predict revenue per customer using RStudio. Our presentation report will cover not only the conclusions we reach by classifying and analyzing the provided data with appropriate models, but also how we perform in the contest.<br />
<br />
--------------------------------------------------------------------<br />
'''Project # 8'''<br />
Group members:<br />
<br />
Jiayue Zhang<br />
<br />
Lingyun Yi<br />
<br />
Rongrong Su<br />
<br />
Siao Chen<br />
<br />
<br />
'''Title:''' Kaggle--Two Sigma: Using News to Predict Stock Movements<br />
<br />
<br />
'''Description:''' <br />
Stock prices are affected by the news to some extent. What is the influence of news on stock prices, and what is the predictive power of the news? <br />
We plan to use the content of news articles to predict the movement of stock prices. We will mine the data to find the useful information hidden within it, and as a result we will predict how stock prices perform when the market reacts to news.<br />
<br />
<br />
--------------------------------------------------------------------<br />
'''Project # 9'''<br />
Group members:<br />
<br />
Hassan, Ahmad Nayar<br />
<br />
McLellan, Isaac<br />
<br />
Brewster, Kristi<br />
<br />
Melek, Marina Medhat Rassmi <br />
<br />
<br />
'''Title:''' Quick, Draw! Doodle Recognition<br />
<br />
'''Description:''' <br />
<br />
'''Background'''<br />
<br />
Google’s Quick, Draw! is an online game in which a user is prompted to draw an image depicting a certain category in under 20 seconds. As the drawing is being completed, the game uses a model that attempts to correctly identify the image being drawn. With the aim of improving the underlying pattern recognition model this game uses, Google is hosting a Kaggle competition asking the public to build a model that correctly identifies a given drawing. The model should classify the drawing into one of the 340 label categories within the Quick, Draw! game in 3 guesses or fewer.<br />
<br />
'''Proposed Approach'''<br />
<br />
Each image/doodle (input) is treated as a matrix of pixel values. To classify images, we essentially reshape each image's matrix of pixel values through convolution. This significantly reduces the dimensionality of the input, which in turn reduces the number of parameters of any proposed recognition model. Using filters, pooling layers, and further convolution, a final fully connected layer correlates images with categories, assigning probabilities (weights) and hence classifying the images. <br />
<br />
This approach to image classification is the convolutional neural network (CNN), and we propose using it to classify the doodles within the Quick, Draw! dataset.<br />
<br />
To control overfitting and underfitting of our proposed model and to minimize the error, we will try different architectures consisting of different types and dimensions of pooling layers and input filters.<br />
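As a rough illustration of that kind of architecture (in Python, one of the proposed tools, with Keras as an assumed framework), a minimal sketch might look like the following; the input size, filter counts, and depth are placeholder assumptions, not the project's final design:<br />
<pre>
# Minimal CNN sketch for doodle classification (placeholder sizes, not the project's final architecture).
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = 64        # assumed side length of the rasterized doodle
NUM_CLASSES = 340    # label categories in the Quick, Draw! dataset

model = tf.keras.Sequential([
    layers.Input(shape=(IMG_SIZE, IMG_SIZE, 1)),        # grayscale doodle as a matrix of pixel values
    layers.Conv2D(32, 3, activation="relu"),            # convolution with learned filters
    layers.MaxPooling2D(2),                             # pooling reduces dimensionality
    layers.Conv2D(64, 3, activation="relu"),            # further convolution
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(NUM_CLASSES, activation="softmax"),    # fully connected layer assigning class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
</pre>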
<br />
'''Challenges'''<br />
<br />
This project presents a number of interesting challenges:<br />
* The data given for training is noisy in that it contains drawings that are incomplete or simply poorly drawn. Dealing with this noise will be a significant part of our work. <br />
* There are 340 label categories within the Quick, Draw! dataset, this means that the model created must be able to classify drawings based on a large pool of information while making effective use of powerful computational resources.<br />
<br />
'''Tools & Resources'''<br />
<br />
* We will use Python & MATLAB.<br />
* We will use the Quick, Draw! Dataset available on the Kaggle competition website. <https://www.kaggle.com/c/quickdraw-doodle-recognition/data><br />
<br />
--------------------------------------------------------------------<br />
'''Project # 10'''<br />
Group members:<br />
<br />
Lam, Amanda<br />
<br />
Huang, Xiaoran<br />
<br />
Chu, Qi<br />
<br />
Sang, Di<br />
<br />
'''Title:''' Kaggle Competition: Human Protein Atlas Image Classification<br />
<br />
'''Description:'''<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 11'''<br />
Group members:<br />
<br />
Bobichon, Philomene<br />
<br />
Maheshwari, Aditya<br />
<br />
An, Zepeng<br />
<br />
Stranc, Colin<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:''' <br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 12'''<br />
Group members:<br />
<br />
Huo, Qingxi<br />
<br />
Yang, Yanmin<br />
<br />
Cai, Yuanjing<br />
<br />
Wang, Jiaqi<br />
<br />
'''Title:''' <br />
<br />
'''Description:''' <br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 13'''<br />
Group members:<br />
<br />
Ross, Brendan<br />
<br />
Barenboim, Jon<br />
<br />
Lin, Junqiao<br />
<br />
Bootsma, James<br />
<br />
'''Title:''' Expanding Neural Network<br />
<br />
'''Description:''' The goal of our project is to create an expanding neural network algorithm that starts by training a small neural network and then expands it into a larger one. We hypothesize that, with the proper expansion method, we could decrease training time and prevent overfitting. The method we wish to explore is to link input dimensions together based on covariance; then, when the small network reaches convergence, we create a larger network without the links between dimensions, using the smaller network's weights as starting values. <br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 14'''<br />
Group members:<br />
<br />
Schneider, Jason <br />
<br />
Walton, Jordyn <br />
<br />
Abbas, Zahraa<br />
<br />
Na, Andrew<br />
<br />
'''Title:''' Application of ML Classification to Cancer Identification<br />
<br />
'''Description:''' The application of machine learning to cancer classification based on gene expression is a topic of great interest to physicians and biostatisticians alike. We would like to work on this for our final project to encourage the application of proven ML techniques to improving the accuracy of cancer classification and diagnosis. In this project, we will use the dataset from Golub et al. [1], which contains gene-expression measurements from tumour biopsies, to train a model that classifies healthy individuals and individuals who have cancer.<br />
<br />
One challenge we may face pertains to the way that the data was collected. Some parts of the dataset have thousands of features (each of which represents a quantitative measure of the expression of a certain gene) but as few as twenty samples. We propose some ways to mitigate the impact of this, including the use of PCA, leave-one-out cross-validation, and regularization. <br />
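For instance, a minimal scikit-learn sketch of this idea (the data below are random placeholders standing in for the expression matrix, and the component count and regularization strength are purely illustrative) could combine PCA with an L2-regularized classifier and score it with leave-one-out cross-validation:<br />
<pre>
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder for the expression matrix: rows are samples, columns are genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(38, 7129))      # few samples, thousands of gene features
y = rng.integers(0, 2, size=38)      # illustrative binary labels

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),            # compress thousands of genes into a few components
    LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
)
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOOCV accuracy:", scores.mean())
</pre>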
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 15'''<br />
Group members:<br />
<br />
Praneeth, Sai<br />
<br />
Peng, Xudong <br />
<br />
Li, Alice<br />
<br />
Vajargah, Shahrzad<br />
<br />
'''Title:''' Google Analytics Customer Revenue Prediction [1] - A Kaggle Competition<br />
<br />
'''Description:''' Which airline cabin class is the most profitable? One might guess economy, but in reality it is the premium classes that show higher returns. According to research by Wendover Productions [2], despite having fewer than 50 seats and taking up more space than the economy cabin, premium classes end up driving more revenue than the other classes.<br />
<br />
In fact, just like airlines, many companies adopt a business model in which the vast majority of revenue is derived from a minority of customers. As a result, data-intensive promotional strategies are receiving increasing attention from marketing teams seeking to further improve company returns.<br />
<br />
In this Kaggle competition, we are challenged to analyze a Google Merchandise Store customer dataset to predict revenue per customer. We will implement a series of data analytics methods including pre-processing, data augmentation, and parameter tuning. Different classification algorithms will be compared and optimized in order to achieve the best results.<br />
<br />
'''Reference:'''<br />
<br />
[1] Kaggle. (2018, Sep 18). Google Analytics Customer Revenue Prediction. Retrieved from https://www.kaggle.com/c/ga-customer-revenue-prediction<br />
<br />
[2] Kottke, J. (2017, Mar 17). The economics of airline classes. Retrieved from https://kottke.org/17/03/the-economics-of-airline-classes<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 16'''<br />
Group members:<br />
<br />
Wang, Yu Hao<br />
<br />
Grant, Aden <br />
<br />
McMurray, Andrew<br />
<br />
Song, Baizhi<br />
<br />
'''Title:''' Google Analytics Customer Revenue Prediction - A Kaggle Competition<br />
<br />
'''Description:''' The 80/20 rule has proven true for many businesses: only a small percentage of customers produce most of the revenue. As such, marketing teams are challenged to make appropriate investments in promotional strategies.<br />
<br />
RStudio, the developer of free and open tools for R and enterprise-ready products for teams to scale and share work, has partnered with Google Cloud and Kaggle to demonstrate the business impact that thorough data analysis can have.<br />
<br />
In this competition, we are challenged to analyze a Google Merchandise Store (also known as GStore, where Google swag is sold) customer dataset to predict revenue per customer. Hopefully, the outcome will be more actionable operational changes and a better use of marketing budgets for those companies who choose to use data analysis on top of GA data.<br />
<br />
We will test a variety of classification algorithms to determine an appropriate model.<br />
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 17'''<br />
Group Members:<br />
<br />
Jiang, Ya Fan<br />
<br />
Zhang, Yuan<br />
<br />
Hu, Jerry Jie<br />
<br />
'''Title:''' Kaggle Competition: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:''' Construction of a classifier that can learn from noisy training data and generalize to a clean test set. The training data come from the Google game "Quick, Draw!".<br />
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 18'''<br />
Group Members:<br />
<br />
Zhang, Ben<br />
<br />
'''Title:''' Two Sigma: Using News to Predict Stock Movements<br />
<br />
'''Description:''' Use news analytics to predict stock price performance. This is subject to change.<br />
<br />
----------------------------------------------------------------------<br />
'''Project # 19'''<br />
Group Members:<br />
<br />
Yan Yu Chen<br />
<br />
Qisi Deng<br />
<br />
Hengxin Li<br />
<br />
Bochao Zhang<br />
<br />
Our team currently has two topics of interest at hand, and we have summarized the objective of each topic below. Please note that we will narrow down our choice after further discussions with the instructor.<br />
<br />
'''Description 1:''' With 14 percent of Americans claiming that social media is their dominant news source, fake news shared on Facebook and Twitter is invading the way people consume information. Concomitantly, the quality and nature of online news have been gradually diluted by fake news that is sometimes imperceptible. With the aim of creating a cleaner Internet surfing experience, we seek to develop a tool that performs fake news detection and classification. <br />
<br />
'''Description 2:''' Statistics Canada has recently reported an increasing trend in Toronto’s violent crime score. Though the Royal Canadian Mounted Police has put effort into tracking crimes, the ambiguous snapshots captured by outdated cameras often hamper investigations. Motivated by this circumstance, our second interest focuses on accurate identification of numerals and letters in variable-resolution images.<br />
<br />
----------------------------------------------------------------------<br />
'''Project # 20'''<br />
Group Members:<br />
<br />
Dong, Yongqi (Michael)<br />
<br />
Kingston, Stephen<br />
<br />
'''Title:''' Kaggle--Two Sigma: Using News to Predict Stock Movements <br />
<br />
'''Description:''' The movement in price of a tradeable security, or stock, on any given day is an aggregation of each individual market participant’s appraisal of the intrinsic value of the underlying company or assets. These values are primarily driven by investors’ expectations of the company’s ability to generate future free cash flow. A steady stream of information on the state of macro- and micro-economic variables which affect a company’s operations informs these market actors, primarily through news articles and alerts. We would like to take a universe of news headlines and parse the information into features that allow us to classify the direction and ‘intensity’ of a stock’s price move on any given day. Strategies may include various classification methods to determine the most effective solution.<br />
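As a heavily simplified sketch of the headline-to-features step (the headlines and direction labels below are made up, and a real pipeline would be built on the competition's own data loaders), a bag-of-words baseline might look like:<br />
<pre>
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for news headlines and next-day direction labels (1 = up, 0 = down).
headlines = [
    "Company beats earnings expectations and raises guidance",
    "Regulator opens investigation into accounting practices",
    "New product launch receives strong early reviews",
    "Chief executive resigns amid falling sales",
]
direction = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(headlines, direction)
print(model.predict(["Earnings beat expectations again"]))
</pre>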
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 21'''<br />
Group members:<br />
<br />
Xiao, Alex<br />
<br />
Zhang, Richard<br />
<br />
Ash, Hudson<br />
<br />
Zhu, Ziqiu<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge [Subject to Change]<br />
<br />
'''Description:''' <br />
<br />
"Quick, Draw! was released as an experimental game to educate the public in a playful way about how AI works. The game prompts users to draw an image depicting a certain category, such as ”banana,” “table,” etc. The game generated more than 1B drawings, of which a subset was publicly released as the basis for this competition’s training set. That subset contains 50M drawings encompassing 340 label categories."<br />
<br />
Our goal as students is to build a classification tool that classifies hand-drawn doodles into one of the 340 label categories.<br />
<br />
----------------------------------------------------------------------<br />
<br />
'''Project # 22'''<br />
Group Members:<br />
<br />
Lee, Yu Xuan<br />
<br />
Heng, Tsen Yee<br />
<br />
'''Title:''' Two Sigma: Using News to Predict Stock Movements<br />
<br />
'''Description:''' Use news analytics to predict stock price performance. This is subject to change.<br />
<br />
<br />
-------------------------------------------------------------------------<br />
<br />
'''Project # 23'''<br />
Group Members:<br />
<br />
Bayati, Mahdiyeh<br />
<br />
Malek Mohammadi, Saber<br />
<br />
Luong, Vincent<br />
<br />
<br />
'''Title:''' Human Protein Atlas Image Classification<br />
<br />
<br />
'''Description:''' The Human Protein Atlas is a Sweden-based initiative aimed at mapping all human proteins in cells, tissues and organs.</div>Mbayatihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_Neural_Networks_for_Sentence_Classi%EF%AC%81cation&diff=38864Convolutional Neural Networks for Sentence Classification2018-11-13T20:45:41Z<p>Mbayati: Created page with "\documentclass{article} \usepackage[utf8]{inputenc} \usepackage[letterpaper]{geometry} \usepackage{float} \usepackage{graphicx} \usepackage{mathtools} \usepackage{graphicx} \u..."</p>
<hr />
<div>\documentclass{article}<br />
\usepackage[utf8]{inputenc}<br />
\usepackage[letterpaper]{geometry}<br />
\usepackage{float}<br />
\usepackage{graphicx}<br />
\usepackage{mathtools}<br />
\usepackage{graphicx}<br />
\usepackage{mathrsfs}<br />
\usepackage{caption}<br />
\usepackage{dsfont}<br />
\usepackage{amsmath}<br />
\usepackage{subcaption}<br />
\usepackage{amssymb,amsmath}<br />
\usepackage{adjustbox}<br />
\usepackage{algorithm}<br />
\usepackage{algpseudocode}<br />
\usepackage{multirow}<br />
\usepackage{color}<br />
<br />
\usepackage{multicol}<br />
\usepackage{setspace}<br />
\newcommand\tab[1][1cm]{\hspace*{#1}}<br />
\DeclarePairedDelimiter{\floor}{\lfloor}{\rfloor}<br />
<br />
\title{Convolutional Neural Networks for Sentence Classification}<br />
\author{Saber Malekmohammadi, Maya Bayati, Vincent Luong}<br />
<br />
<br />
\begin{document}<br />
<br />
\maketitle<br />
<br />
\section{Introduction}\label{intro}<br />
In this paper, sentence classification using convolutional neural networks is studied. Each sentence is encoded by concatenating the word vectors of its words, and the encoded representation is fed to a model consisting of a convolutional layer followed by a dense layer for the classification task. Different variants of this model have been introduced by the authors, two of which try to learn task-specific word vectors. It is observed that learning task-specific vectors (instead of using the pre-trained vectors without any change) offers further gains in performance. <br />
<br />
<br />
<br />
\section{The Used Model and Results }<br />
Using neural models to learn vector representations for words is one of the most important contributions within natural language processing. The vector representations are obtained by projecting the one-hot representations of words (a sparse representation) onto a lower dimensional space. The Continuous Bag of Words (CBOW) neural language model, trained by Mikolov et al. (2013), is one of the unsupervised algorithms providing such a low dimensional vector representation in which semantic features of words are encoded. Having the low-dimensional vector representations, one can feed them to different models for different tasks. For instance, they can be fed to CNNs for document or sentence classification. The vector representations used in this paper are obtained from the CBOW model.<br />
<br />
The model that the authors use consists of a convolutional layer followed by a dense layer. In the convolutional layer, there are $m$ different filters (kernels), each resulting in one feature map. The resulting $m$ feature maps form the penultimate layer, which is passed to a fully connected softmax layer, as shown in the following:<br />
<br />
<br />
<br />
\begin{figure}[H]<br />
\begin{center}<br />
%<br />
\includegraphics[ height=7cm, width=15cm]{one.PNG}<br />
\caption{The first introduced model structure with a static channel\label{fig:static-channel}}<br />
\end{center}<br />
\end{figure}<br />
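As an illustrative implementation sketch (not the authors' own code; the embedding size, filter counts, and kernel widths are placeholders), the static-channel model can be written in a few lines of PyTorch. As in the paper, each feature map is reduced to a single value by max-over-time pooling before being passed to the softmax layer:<br />
\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_filters=100,
                 widths=(3, 4, 5), n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # load word2vec weights here
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, w) for w in widths])
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, tokens):                   # tokens: (batch, sentence_length)
        x = self.emb(tokens).transpose(1, 2)     # (batch, emb_dim, length)
        # one feature map per filter, each max-pooled over time to a single value
        feats = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))  # softmax is applied inside the loss
\end{verbatim}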
<br />
<br />
<br />
<br />
The authors introduce several variants of the model that are briefly explained in the following: \newline<br />
<br />
They first consider a baseline model, which is called CNN-rand. In this variant, they do not use any pretrained vector representation for words. All words are assigned a random vector representation and the assigned vectors are fed to the model; the random vectors get modified during training. The authors observed that this variant does not perform well compared to the other variants. \newline<br />
<br />
In the second variant of the model, which is called CNN-static, they use the pretrained word2vec vectors. They keep the pretrained vectors static; i.e., during training, the vectors do not change and only the other parameters of the model (edge weights and kernels) get learned. They observed that this simple model achieves excellent results on multiple benchmarks. Note that the word vectors are pre-trained regardless of the given classification task and its data set, and the model achieves excellent results when using them. In contrast, when another set of publicly available word vectors (trained by Collobert et al. (2011) on Wikipedia) is fed to the same model, the performance is not as good as with the word2vec vectors. Based on this observation, the authors stated that the pre-trained word2vec vectors are a good encoded representation of words and can be utilized for different classification tasks. \newline<br />
<br />
<br />
<br />
The third variant that the authors consider is called CNN-non-static. This is the first variant in which the authors try to learn task-specific word vectors. In this model, they use the pre-trained word2vec vectors for initialization, but after initialization the vectors get fine-tuned during training via backpropagation. They observed that learning task-specific vectors through fine-tuning results in further improvements compared to the case where the pre-trained word2vec vectors are used without any change. \newline<br />
<br />
In the fourth variant of the model (CNN-multichannel), the authors add a simple modification to the structure of the model to make it capable of using both the pre-trained and the task-specific vectors. What they have done is to add another channel of inputs to the model structure, as shown in the following:<br />
<br />
<br />
\begin{figure}[H]<br />
\begin{center}<br />
%<br />
\includegraphics[ height=7cm, width=15cm]{two.PNG}<br />
\caption{The second introduced model structure with static and non-static channels \label{fig:multichannel}}<br />
\end{center}<br />
\end{figure}<br />
<br />
<br />
<br />
In this model, as shown, there are two sets (channels) of word vectors. The first consists of the pre-trained word2vec vectors, which are static and do not change during training. The other channel is also initialized with the pretrained word2vec vectors, but it gets fine-tuned during training via backpropagation. They observed that this model, similar to CNN-non-static, results in further improvements when doing classification tasks on different data sets. \newline <br />
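A minimal sketch of how the two channels could be set up in PyTorch (this is illustrative rather than the authors' code, and the pretrained matrix below is a random placeholder standing in for the loaded word2vec vectors):<br />
\begin{verbatim}
import torch
import torch.nn as nn

w2v = torch.randn(10000, 300)  # placeholder for the (vocab_size, emb_dim) word2vec matrix

static_emb = nn.Embedding.from_pretrained(w2v, freeze=True)          # static channel
tuned_emb = nn.Embedding.from_pretrained(w2v.clone(), freeze=False)  # fine-tuned channel

tokens = torch.randint(0, 10000, (8, 20))  # (batch, sentence_length)
# stack the two channels; each filter is applied to both and the results are combined
channels = torch.stack([static_emb(tokens), tuned_emb(tokens)], dim=1)
print(channels.shape)  # torch.Size([8, 2, 20, 300])
\end{verbatim}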
<br />
<br />
\section{Conclusion}<br />
<br />
In both CNN-non-static and CNN-multichannel, they observed that the models are able to fine-tune the word vectors to make them more specific to each given task. For example, in word2vec the most similar word to “good” is “great”, while “nice” is closer in meaning when the goal is expressing sentiment. This can be observed in the vectors learned by CNN-non-static and CNN-multichannel: for the word vectors in CNN-non-static and those in the second channel of CNN-multichannel, the most similar word to “good” is “nice”. So fine-tuning allows the model to learn a more meaningful representation for words depending on the task at hand. This can be counted as the most important contribution of the paper, and it improves the performance of the model compared to using the pre-trained word2vec vectors in a static way, regardless of the given task. <br />
<br />
<br />
<br />
<br />
<br />
<br />
\end{document}</div>Mbayatihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=36693stat441F182018-10-10T19:41:02Z<p>Mbayati: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Will be added soon || || <br />
|-<br />
|Nov 13 || Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 2|| Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks || [http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf Paper] || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| Will be added soon || || <br />
|-<br />
|Nov 15 || Eric, Mike, Rebecca, Susan|| 4|| Will be added soon|| || <br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Luong || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Will be added soon || || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su|| 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| tba || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu || 11|| TBA || || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||</div>Mbayatihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=36692stat441F182018-10-10T19:38:45Z<p>Mbayati: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Will be added soon || || <br />
|-<br />
|Nov 13 || Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 2|| Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks || [http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf Paper] || <br />
|-<br />
|NOv 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| Will be added soon || || <br />
|-<br />
|Nov 15 || Eric, Mike, Rebcca, Susan|| 4|| Will be added soon|| || || <br />
|-<br />
|NOv 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya(Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] || <br />
|-<br />
|NOv 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Will be added soon || || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su|| 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|NOv 27 || Mitchell Snaith || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| tba || || <br />
|-<br />
|NOv 29 || Jameson Ngo, Amy Xu || 11|| TBA || || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||</div>Mbayatihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=36519stat441F182018-10-02T19:29:40Z<p>Mbayati: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|NOv 13 || || 1|| || || <br />
|-<br />
|Nov 13 || || 2|| || || <br />
|-<br />
|NOv 15 || || 3|| || || <br />
|-<br />
|Nov 15 || || 4|| || || <br />
|-<br />
|NOv 20 || || 5|| || || <br />
|-<br />
|Nov 20 || Maya(Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent || 6|| Will be added soon || || <br />
|-<br />
|NOv 22 || || 7|| || || <br />
|-<br />
|Nov 22 || || 8|| || || <br />
|-<br />
|NOv 27 || || 9|| || || <br />
|-<br />
|Nov 27 || || 10|| || || <br />
|-<br />
|NOv 29 || || 11|| || || <br />
|-<br />
|Nov 29 || || 12|| || ||</div>Mbayatihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=36518stat441F182018-10-02T19:28:24Z<p>Mbayati: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|NOv 13 || || 1|| || || <br />
|-<br />
|Nov 13 || || 2|| || || <br />
|-<br />
|NOv 15 || || 3|| || || <br />
|-<br />
|Nov 15 || || 4|| || || <br />
|-<br />
|NOv 20 || || 5|| || || <br />
|-<br />
|Nov 20 || Mahdiyeh Bayati, Saber Malekmohammadi, Vincent || 6|| Will be added soon || || <br />
|-<br />
|NOv 22 || || 7|| || || <br />
|-<br />
|Nov 22 || || 8|| || || <br />
|-<br />
|NOv 27 || || 9|| || || <br />
|-<br />
|Nov 27 || || 10|| || || <br />
|-<br />
|NOv 29 || || 11|| || || <br />
|-<br />
|Nov 29 || || 12|| || ||</div>Mbayatihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=36517stat441F182018-10-02T19:25:40Z<p>Mbayati: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|NOv 13 || || 1|| || || <br />
|-<br />
|Nov 13 || || 2|| || || <br />
|-<br />
|NOv 15 || || 3|| || || <br />
|-<br />
|Nov 15 || || 4|| || || <br />
|-<br />
|NOv 20 || || 5|| || || <br />
|-<br />
|Nov 20 || Maya Bayati || 6|| || || <br />
|-<br />
|NOv 22 || || 7|| || || <br />
|-<br />
|Nov 22 || || 8|| || || <br />
|-<br />
|NOv 27 || || 9|| || || <br />
|-<br />
|Nov 27 || || 10|| || || <br />
|-<br />
|NOv 29 || || 11|| || || <br />
|-<br />
|Nov 29 || || 12|| || ||</div>Mbayati