http://wiki.math.uwaterloo.ca/statwiki/api.php?action=feedcontributions&user=Y2748li&feedformat=atomstatwiki - User contributions [US]2023-12-07T22:53:11ZUser contributionsMediaWiki 1.28.3http://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38515stat841F18/2018-11-09T04:33:15Z<p>Y2748li: /* Conclusion */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
Back-propagation (BP) learning algorithms have several known issues:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network trained with BP may be over-trained and end up with worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
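<br />
Issue (1) can be seen on a toy problem. The following is a minimal sketch, not taken from the paper: it runs plain gradient descent on the one-dimensional function f(w) = w², and the function name <code>gradient_descent</code>, the step count and the three learning rates are illustrative assumptions.<br />

```python
def gradient_descent(eta, steps=50, w0=1.0):
    """Minimise f(w) = w**2 with a fixed learning rate eta; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - eta * grad   # plain gradient step: w <- (1 - 2*eta) * w
    return w


# Illustrative learning rates: too small, reasonable, too large.
for eta in (0.001, 0.4, 1.1):
    final_w = gradient_descent(eta)
    print(f"eta={eta}: |w| after 50 steps = {abs(final_w):.3e}")
```

Each step multiplies w by (1 - 2·eta), so a very small rate barely moves w towards the minimum at 0, while any rate above 1 makes the factor larger than 1 in magnitude and the iterates grow without bound.<br />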
<br />
Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further, and that a unified learning framework covering LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
As the training of SVMs involves a quadratic programming problem, SVM training algorithms are usually computationally intensive: their complexity is at least quadratic with respect to the number of training examples.<br />
<br />
Least square SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality optimization constraints instead of inequalities from the traditional SVM, which results in a direct least square solution by avoiding quadratic programming.<br />
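<br />
For reference, the difference in constraints can be written out as follows. This is a sketch in common textbook notation, which is an assumption here rather than the notation of the presented paper: <math>\phi</math> denotes the feature mapping, <math>t_i \in \{-1,+1\}</math> the class labels, <math>\xi_i</math> the slack (error) variables and <math>C</math> the trade-off parameter.<br />
<br />
Traditional SVM (inequality constraints): <math>\min_{\mathbf{w},b,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^N \xi_i \quad \text{subject to } t_i(\mathbf{w}\cdot\phi(\mathbf{x}_i)+b) \ge 1-\xi_i,\ \xi_i \ge 0</math><br />
<br />
LS-SVM (equality constraints): <math>\min_{\mathbf{w},b,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^2 + \frac{C}{2}\sum_{i=1}^N \xi_i^2 \quad \text{subject to } t_i(\mathbf{w}\cdot\phi(\mathbf{x}_i)+b) = 1-\xi_i</math><br />
<br />
Because the LS-SVM constraints are equalities and the penalty is quadratic, the optimality conditions form a system of linear equations, which is why no quadratic programming solver is needed.<br />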
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed in order for them to be applied to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used in the implementation of SVM in multiclass classification applications [8]. <br />
<br />
The extreme learning machine (ELM) was proposed for single-hidden-layer feedforward neural networks (SLFNs): it randomly chooses the input weights and analytically determines the output weights of the SLFN. According to its authors, in theory this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<br />
The extreme learning machine (ELM) is a feedforward neural network with a single hidden layer (or, in some extensions, multiple hidden layers) in which the parameters of the hidden nodes are assigned randomly and are not tuned; only the output weights are learned. ELMs build on the ideas of random projection and early perceptron models.<br />
<br />
For an ELM with a single hidden layer, suppose that the output function of the <math>i</math>-th hidden node is <math>h_i(\mathbf{x})=G(\mathbf{a}_i,b_i,\mathbf{x})</math>, where <math>\mathbf{a}_i</math> and <math>b_i</math> are the parameters of the <math>i</math>-th hidden node. The output function of the ELM for SLFNs with <math>L</math> hidden nodes is:<br />
<br />
<math>f_L({\bf x})=\sum_{i=1}^L{\boldsymbol \beta}_ih_i({\bf x})</math>, where <math>{\boldsymbol \beta}_i</math> is the output weight of the <math>i</math>-th hidden node.<br />
<br />
<math>\mathbf{h}(\mathbf{x})=[h_1(\mathbf{x}),\ldots,h_L(\mathbf{x})]</math> is the hidden layer output mapping of ELM. Given <math>N</math> training samples, the hidden layer output matrix <math>\mathbf{H}</math> of ELM is given as: <math>{\bf H}=\left[\begin{matrix}<br />
{\bf h}({\bf x}_1)\\<br />
\vdots\\<br />
{\bf h}({\bf x}_N)<br />
\end{matrix}\right]=\left[\begin{matrix}<br />
G({\bf a}_1, b_1, {\bf x}_1) &\cdots & G({\bf a}_L, b_L, {\bf x}_1)\\<br />
\vdots &\ddots&\vdots\\<br />
G({\bf a}_1, b_1, {\bf x}_N) &\cdots & G({\bf a}_L, b_L, {\bf x}_N)<br />
\end{matrix}\right]<br />
</math><br />
<br />
and <math>\mathbf{T}</math> is the training data target matrix: <math>{\bf T}=\left[\begin{matrix}<br />
{\bf t}_1\\<br />
\vdots\\<br />
{\bf t}_N<br />
\end{matrix}\right]<br />
</math><br />
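<br />
As an illustration of how the hidden layer output matrix is assembled, here is a minimal NumPy sketch. It assumes sigmoid additive nodes, G(a, b, x) = 1 / (1 + exp(-(a·x + b))), which is only one possible choice of <math>G</math>; the function name <code>hidden_layer_output</code>, the sizes and the random inputs are hypothetical.<br />

```python
import numpy as np


def hidden_layer_output(X, A, b):
    """Return the N x L matrix H with H[j, i] = G(a_i, b_i, x_j).

    Assumes sigmoid additive nodes: G(a, b, x) = 1 / (1 + exp(-(a . x + b))).
    """
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))


rng = np.random.default_rng(0)   # fixed seed only for reproducibility
N, d, L = 5, 3, 4                # samples, input dimension, hidden nodes (illustrative)
X = rng.normal(size=(N, d))      # row j is the sample x_j
A = rng.normal(size=(d, L))      # column i is a_i, drawn at random and never tuned
b = rng.normal(size=L)           # b_i, also drawn at random
H = hidden_layer_output(X, A, b)
print(H.shape)                   # one row per sample, one column per hidden node
```

Row j of <code>H</code> is the feature vector h(x_j), so the matrix has one row per training sample and one column per hidden node, matching the definition above.<br />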
<br />
Generally speaking, ELM is a kind of regularized neural network with non-tuned hidden layer mappings (formed by random hidden nodes, kernels, or other implementations). Its objective function is:<br />
<br />
<math><br />
\text{Minimize: } \|{\boldsymbol \beta}\|_p^{\sigma_1}+C\|{\bf H}{\boldsymbol \beta}-{\bf T}\|_q^{\sigma_2}<br />
</math><br />
<br />
where <math>\sigma_1>0, \sigma_2>0, p,q=0, \frac{1}{2}, 1, 2, \cdots, +\infty</math>. <br />
<br />
Different combinations of <math>\sigma_1</math>, <math>\sigma_2</math>, <math>p</math> and <math>q</math> can be used and result in different learning algorithms for regression, classification, sparse coding, compression, feature learning and clustering.<br />
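<br />
As a worked instance (standard ridge-regression algebra, given here as an illustration under the assumption that <math>\sigma_1=\sigma_2=2</math> and <math>p=q=2</math>), the objective becomes<br />
<br />
<math>\text{Minimize: } \|{\boldsymbol \beta}\|_2^{2}+C\|{\bf H}{\boldsymbol \beta}-{\bf T}\|_2^{2}</math><br />
<br />
Setting the gradient with respect to <math>{\boldsymbol \beta}</math> to zero gives <math>2{\boldsymbol \beta}+2C{\bf H}^T({\bf H}{\boldsymbol \beta}-{\bf T})=0</math>, so that<br />
<br />
<math>{\boldsymbol \beta}=\left(\frac{{\bf I}}{C}+{\bf H}^T{\bf H}\right)^{-1}{\bf H}^T{\bf T}</math><br />
<br />
The matrix <math>\frac{{\bf I}}{C}+{\bf H}^T{\bf H}</math> is positive definite for any <math>C>0</math>, so the inverse always exists, and the output weights are obtained in closed form without any iterative tuning.<br />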
<br />
As a special case, the simplest ELM training algorithm learns a model of the form (for single hidden layer sigmoid neural networks):<br />
<br />
:<math>\mathbf{\hat{Y}} = \mathbf{W}_2 \sigma(\mathbf{W}_1 x)</math><br />
<br />
where <math>\mathbf{W}_1</math> is the matrix of input-to-hidden-layer weights, <math>\sigma</math> is an activation function, and <math>\mathbf{W}_2</math> is the matrix of hidden-to-output-layer weights. The algorithm proceeds as follows:<br />
<br />
# Fill <math>\mathbf{W}_1</math> with random values (e.g., Gaussian random noise);<br />
# Estimate <math>\mathbf{W}_2</math> by a least-squares fit to a matrix of response variables <math>\mathbf{Y}</math>, computed using the Moore–Penrose pseudoinverse <math>\cdot^+</math>, given a design matrix <math>\mathbf{X}</math>:<br />
#:<math>\mathbf{W}_2 = \sigma(\mathbf{W}_1 \mathbf{X})^+ \mathbf{Y}</math><br />
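<br />
The two steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation. It assumes that samples are stored as rows, so the prediction is computed as sigmoid(X W1 + b) W2 (the transpose of the layout in the formula above), and it adds random hidden biases <code>b</code>; the helper names <code>elm_fit</code> and <code>elm_predict</code> and the toy sine data are hypothetical.<br />

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, Y, n_hidden, rng):
    """Basic ELM: random hidden layer, output weights from the pseudoinverse."""
    W1 = rng.normal(size=(X.shape[1], n_hidden))  # step 1: random input weights, never tuned
    b = rng.normal(size=n_hidden)                 # random hidden biases (an added assumption)
    H = sigmoid(X @ W1 + b)                       # hidden-layer outputs, one row per sample
    W2 = np.linalg.pinv(H) @ Y                    # step 2: least-squares fit via the pseudoinverse
    return W1, b, W2


def elm_predict(X, W1, b, W2):
    return sigmoid(X @ W1 + b) @ W2


rng = np.random.default_rng(0)                    # fixed seed only for reproducibility
X = rng.uniform(-3.0, 3.0, size=(200, 1))         # toy one-dimensional inputs
Y = np.sin(X)                                     # toy regression targets
W1, b, W2 = elm_fit(X, Y, n_hidden=30, rng=rng)
mse = float(np.mean((elm_predict(X, W1, b, W2) - Y) ** 2))
print(f"training MSE: {mse:.2e}")
```

Training involves no iterations: the only learned quantity is <code>W2</code>, and it comes from a single pseudoinverse computation on the hidden-layer output matrix.<br />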
<br />
<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the Letter data set. The training time spent by LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training samples increases. However, the training time spent by ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly as the number of training samples increases.<br />
<br />
== Conclusion ==<br />
<br />
<center><br />
[[File:dd.png|800px]]<br />
</center><br />
<br />
ELM is a learning mechanism for generalized SLFNs in which learning is done without iterative tuning. The essence of ELM is that the hidden layer of the generalized SLFN should not be tuned. This paper has shown that both LS-SVM and PSVM can be simplified by removing the bias term <math>b</math>, and that the resulting learning algorithms are unified with ELM. Instead of requiring different variants for different types of applications, ELM can be applied to regression and multiclass classification applications directly. <br />
<br />
ELM requires less human intervention than SVM and LS-SVM/PSVM. If the feature mappings <math>\mathbf{h}(\mathbf{x})</math> are known to users, only one parameter, <math>C</math>, needs to be specified in ELM. The generalization performance of ELM is not sensitive to the dimensionality <math>L</math> of the feature space (the number of hidden nodes) as long as <math>L</math> is set large enough (e.g., <math>L \ge 1000</math> for all the real-world cases tested in the paper's simulations). Unlike SVM, LS-SVM, and PSVM, which usually require two parameters <math>(C,\gamma)</math> to be specified by users, the single-parameter setting makes ELM easy and efficient to use. If the feature mappings are unknown to users, kernels can be applied in ELM as well, as in SVM, LS-SVM, and PSVM. Unlike LS-SVM and PSVM, ELM does not have constraints on the Lagrange multipliers <math>\alpha_i</math>. Since LS-SVM and ELM have the same optimization objective function and LS-SVM has additional optimization constraints on the Lagrange multipliers <math>\alpha_i</math>, LS-SVM tends in this sense to obtain a solution that is suboptimal compared with ELM.<br />
<br />
As verified by the simulation results, compared with SVM and LS-SVM, ELM achieves similar or better generalization performance for regression and binary classification cases, and much better generalization performance for multiclass classification cases. ELM has better scalability and learns much faster (up to thousands of times) than traditional SVM and LS-SVM.<br />
<br />
== Critiques ==<br />
<br />
An ELM is basically a 2-layer neural net in which the first layer is fixed and random, and the second layer is trained. There are a number of issues with this idea.<br />
<br />
Firstly, algorithms such as SVM and deep learning focus on fitting a complex function with fewer parameters, while ELM uses more parameters to fit a relatively simple function.<br />
<br />
Secondly, the name: an ELM is ''exactly'' what Minsky & Papert call a Gamba Perceptron (a Perceptron whose first layer is a bunch of linear threshold units). The original 1958 Rosenblatt perceptron was an ELM in that the first layer was randomly connected.<br />
<br />
Thirdly, the method: connecting the first layer randomly is just about the stupidest thing you could do. People have spent the almost 60 years since the Perceptron coming up with better schemes to non-linearly expand the dimension of an input vector so as to make the data more separable (many of which are documented in the 1974 edition of Duda & Hart).<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup>G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup>G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38512stat841F18/2018-11-09T04:28:58Z<p>Y2748li: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
Back-propagation (BP) learning algorithms have several known issues:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network trained with BP may be over-trained and end up with worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and a unified learning framework of LS-SVM, PSVM, and other regularization algorithms referred to extreme learning machine (ELM) can be built.<br />
<br />
== Previous Work ==<br />
<br />
As the training of SVMs involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high: it is at least quadratic with respect to the number of training examples.<br />
<br />
Least square SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality optimization constraints instead of the inequality constraints of the traditional SVM, which results in a direct least-squares solution and avoids quadratic programming.<br />
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used in the implementation of SVM in multiclass classification applications [8]. <br />
<br />
The extreme learning machine (ELM) was proposed for single hidden layer feedforward neural networks (SLFNs); it randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<br />
The extreme learning machine (ELM) is a learning scheme for feedforward networks with a single hidden layer or multiple hidden layers. The input weights of its hidden neurons are assigned randomly rather than learned. ELMs build on the ideas of random projection and early perceptron models.<br />
<br />
Given a single hidden layer of ELM, suppose that the output function of the <math>i</math>-th hidden node is <math>h_i(\mathbf{x})=G(\mathbf{a}_i,b_i,\mathbf{x})</math>, where <math>\mathbf{a}_i</math> and <math>b_i</math> are the parameters of the <math>i</math>-th hidden node. The output function of the ELM for SLFNs with <math>L</math> hidden nodes is:<br />
<br />
<math>f_L({\bf x})=\sum_{i=1}^L{\boldsymbol \beta}_ih_i({\bf x})</math>, where <math>{\boldsymbol \beta}_i</math> is the output weight of the <math>i</math>-th hidden node.<br />
<br />
<math>\mathbf{h}(\mathbf{x})=[h_1(\mathbf{x}),\ldots,h_L(\mathbf{x})]</math> is the hidden layer output mapping of ELM. Given <math>N</math> training samples, the hidden layer output matrix <math>\mathbf{H}</math> of ELM is given as: <math>{\bf H}=\left[\begin{matrix}<br />
{\bf h}({\bf x}_1)\\<br />
\vdots\\<br />
{\bf h}({\bf x}_N)<br />
\end{matrix}\right]=\left[\begin{matrix}<br />
G({\bf a}_1, b_1, {\bf x}_1) &\cdots & G({\bf a}_L, b_L, {\bf x}_1)\\<br />
\vdots &\vdots&\vdots\\<br />
G({\bf a}_1, b_1, {\bf x}_N) &\cdots & G({\bf a}_L, b_L, {\bf x}_N)<br />
\end{matrix}\right]<br />
</math><br />
<br />
and <math>\mathbf{T}</math> is the training data target matrix: <math>{\bf T}=\left[\begin{matrix}<br />
{\bf t}_1\\<br />
\vdots\\<br />
{\bf t}_N<br />
\end{matrix}\right]<br />
</math><br />
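<br />
A minimal sketch of how the hidden layer output matrix <math>\mathbf{H}</math> could be assembled. It assumes additive sigmoid nodes, <math>G(\mathbf{a},b,\mathbf{x})=1/(1+e^{-(\mathbf{a}\cdot\mathbf{x}+b)})</math>, with Gaussian-random parameters; the function names and the tiny data set are illustrative and do not come from the paper.<br />

```python
import math
import random


def sigmoid_node(a, b, x):
    """G(a, b, x) for an additive sigmoid node: 1 / (1 + exp(-(a . x + b)))."""
    z = sum(ai * xi for ai, xi in zip(a, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def hidden_matrix(samples, nodes):
    """Return H as a list of rows; row j is [G(a_1, b_1, x_j), ..., G(a_L, b_L, x_j)]."""
    return [[sigmoid_node(a, b, x) for (a, b) in nodes] for x in samples]


# Illustrative setup (assumed values): L = 3 hidden nodes, d = 2 input features.
rng = random.Random(0)
L, d = 3, 2
# Each node is a pair (a_i, b_i) drawn once at random and never trained.
nodes = [([rng.gauss(0, 1) for _ in range(d)], rng.gauss(0, 1)) for _ in range(L)]
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

H = hidden_matrix(samples, nodes)  # N x L, here 4 x 3
```

Each row of <code>H</code> corresponds to one training sample and each column to one hidden node, matching the layout of <math>\mathbf{H}</math> above.<br />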
<br />
Generally speaking, ELM is a kind of regularization neural network but with non-tuned hidden layer mappings (formed by either random hidden nodes, kernels or other implementations). Its objective function is:<br />
<br />
<math><br />
\text{Minimize: } \|{\boldsymbol \beta}\|_p^{\sigma_1}+C\|{\bf H}{\boldsymbol \beta}-{\bf T}\|_q^{\sigma_2}<br />
</math><br />
<br />
where <math>\sigma_1>0, \sigma_2>0, p,q=0, \frac{1}{2}, 1, 2, \cdots, +\infty</math>. <br />
<br />
Different combinations of <math>\sigma_1</math>, <math>\sigma_2</math>, <math>p</math> and <math>q</math> can be used and result in different learning algorithms for regression, classification, sparse coding, compression, feature learning and clustering.<br />
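<br />
As a worked special case of the objective above (the choice <math>p=q=2</math>, <math>\sigma_1=\sigma_2=2</math> is made here for illustration), the problem becomes a ridge regression in <math>{\boldsymbol \beta}</math>. Setting the gradient to zero gives a closed form:<br />
<br />
<math>2{\boldsymbol \beta}+2C{\bf H}^T({\bf H}{\boldsymbol \beta}-{\bf T})=0 \quad\Longrightarrow\quad {\boldsymbol \beta}=\left(\frac{{\bf I}}{C}+{\bf H}^T{\bf H}\right)^{-1}{\bf H}^T{\bf T}</math><br />
<br />
The matrix being inverted is positive definite for any <math>C>0</math>, so the solution is unique; as <math>C\to\infty</math> it tends to the pseudoinverse solution <math>{\bf H}^+{\bf T}</math>.<br />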
<br />
As a special case, the simplest ELM training algorithm learns a model of the form (for single hidden layer sigmoid neural networks):<br />
<br />
:<math>\mathbf{\hat{Y}} = \mathbf{W}_2 \sigma(\mathbf{W}_1 \mathbf{x})</math><br />
<br />
where {{math|'''W'''<sub>1</sub>}} is the matrix of input-to-hidden-layer weights, <math>\sigma</math> is an activation function, and {{math|'''W'''<sub>2</sub>}} is the matrix of hidden-to-output-layer weights. The algorithm proceeds as follows:<br />
<br />
# Fill {{math|'''W'''<sub>1</sub>}} with random values (e.g., [[Gaussian noise|Gaussian random noise]]);<br />
# estimate {{math|'''W'''<sub>2</sub>}} by [[least-squares fit]] to a matrix of response variables {{math|'''Y'''}}, computed using the [[Moore–Penrose pseudoinverse|pseudoinverse]] {{math|⋅<sup>+</sup>}}, given a [[design matrix]] {{math|'''X'''}}:<br />
#:<math>\mathbf{W}_2 = \sigma(\mathbf{W}_1 \mathbf{X})^+ \mathbf{Y}</math><br />
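<br />
The two steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: samples are stored as rows (so the hidden matrix is <math>N\times L</math>, as in <math>\mathbf{H}</math>), a random bias is added per hidden node as in <math>G(\mathbf{a}_i,b_i,\mathbf{x})</math>, and the names <code>elm_fit</code> and <code>elm_predict</code> are hypothetical.<br />

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, Y, n_hidden, rng):
    """Fit a basic ELM. X: (N, d) inputs, Y: (N, m) targets."""
    d = X.shape[1]
    # Step 1: random input weights and biases, fixed after this point.
    W1 = rng.standard_normal((d, n_hidden))
    b = rng.standard_normal(n_hidden)
    # Hidden layer output matrix, shape (N, n_hidden).
    H = sigmoid(X @ W1 + b)
    # Step 2: least-squares output weights via the Moore-Penrose pseudoinverse.
    W2 = np.linalg.pinv(H) @ Y
    return W1, b, W2


def elm_predict(X, W1, b, W2):
    return sigmoid(X @ W1 + b) @ W2


# Toy usage on XOR (assumed data, chosen only because it is not linearly separable).
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])
W1, b, W2 = elm_fit(X, Y, n_hidden=20, rng=rng)
pred = elm_predict(X, W1, b, W2)
```

With more hidden nodes than training samples, as in this toy case, the least-squares fit can reproduce the training targets almost exactly, which is one reason the regularization term weighted by <math>C</math> in the objective matters on real data.<br />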
<br />
<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the Letter data set. The training time spent by LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training data increases. However, the training time spent by ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly as the number of training data increases.<br />
<br />
== Conclusion ==<br />
<br />
<center><br />
[[File:dd.png|800px]]<br />
</center><br />
<br />
== Critiques ==<br />
<br />
An ELM is basically a 2-layer neural net in which the first layer is fixed and random, and the second layer is trained. There are a number of issues with this idea.<br />
<br />
Firstly, algorithms such as SVM and deep learning focus on fitting a complex function with fewer parameters, while ELM uses more parameters to fit a relatively simple function.<br />
<br />
Secondly, the name: an ELM is ''exactly'' what Minsky & Papert call a Gamba Perceptron (a Perceptron whose first layer is a bunch of linear threshold units). The original 1958 Rosenblatt perceptron was an ELM in that the first layer was randomly connected.<br />
<br />
Thirdly, the method: connecting the first layer randomly is just about the stupidest thing you could do. People have spent the almost 60 years since the Perceptron coming up with better schemes to non-linearly expand the dimension of an input vector so as to make the data more separable (many of which are documented in the 1974 edition of Duda & Hart).<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1–3, pp. 155–163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38511stat841F18/2018-11-09T04:26:03Z<p>Y2748li: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
Backpropagation (BP) learning algorithms have several known issues:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and then generalize worse. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework of LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
As the training of SVMs involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high: it is at least quadratic with respect to the number of training examples.<br />
<br />
Least square SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality optimization constraints instead of the inequality constraints of the traditional SVM, which results in a direct least-squares solution and avoids quadratic programming.<br />
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used in the implementation of SVM in multiclass classification applications [8]. <br />
<br />
The extreme learning machine (ELM) was proposed for single hidden layer feedforward neural networks (SLFNs); it randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<br />
The extreme learning machine (ELM) is a learning scheme for feedforward networks with a single hidden layer or multiple hidden layers. The input weights of its hidden neurons are assigned randomly rather than learned. ELMs build on the ideas of random projection and early perceptron models.<br />
<br />
Given a single hidden layer of ELM, suppose that the output function of the <math>i</math>-th hidden node is <math>h_i(\mathbf{x})=G(\mathbf{a}_i,b_i,\mathbf{x})</math>, where <math>\mathbf{a}_i</math> and <math>b_i</math> are the parameters of the <math>i</math>-th hidden node. The output function of the ELM for SLFNs with <math>L</math> hidden nodes is:<br />
<br />
<math>f_L({\bf x})=\sum_{i=1}^L{\boldsymbol \beta}_ih_i({\bf x})</math>, where <math>{\boldsymbol \beta}_i</math> is the output weight of the <math>i</math>-th hidden node.<br />
<br />
<math>\mathbf{h}(\mathbf{x})=[h_1(\mathbf{x}),\ldots,h_L(\mathbf{x})]</math> is the hidden layer output mapping of ELM. Given <math>N</math> training samples, the hidden layer output matrix <math>\mathbf{H}</math> of ELM is given as: <math>{\bf H}=\left[\begin{matrix}<br />
{\bf h}({\bf x}_1)\\<br />
\vdots\\<br />
{\bf h}({\bf x}_N)<br />
\end{matrix}\right]=\left[\begin{matrix}<br />
G({\bf a}_1, b_1, {\bf x}_1) &\cdots & G({\bf a}_L, b_L, {\bf x}_1)\\<br />
\vdots &\vdots&\vdots\\<br />
G({\bf a}_1, b_1, {\bf x}_N) &\cdots & G({\bf a}_L, b_L, {\bf x}_N)<br />
\end{matrix}\right]<br />
</math><br />
<br />
and <math>\mathbf{T}</math> is the training data target matrix: <math>{\bf T}=\left[\begin{matrix}<br />
{\bf t}_1\\<br />
\vdots\\<br />
{\bf t}_N<br />
\end{matrix}\right]<br />
</math><br />
<br />
Generally speaking, ELM is a kind of regularization neural network but with non-tuned hidden layer mappings (formed by either random hidden nodes, kernels or other implementations). Its objective function is:<br />
<br />
<math><br />
\text{Minimize: } \|{\boldsymbol \beta}\|_p^{\sigma_1}+C\|{\bf H}{\boldsymbol \beta}-{\bf T}\|_q^{\sigma_2}<br />
</math><br />
<br />
where <math>\sigma_1>0, \sigma_2>0, p,q=0, \frac{1}{2}, 1, 2, \cdots, +\infty</math>. <br />
<br />
Different combinations of <math>\sigma_1</math>, <math>\sigma_2</math>, <math>p</math> and <math>q</math> can be used and result in different learning algorithms for regression, classification, sparse coding, compression, feature learning and clustering.<br />
<br />
As a special case, the simplest ELM training algorithm learns a model of the form (for single hidden layer sigmoid neural networks):<br />
<br />
:<math>\mathbf{\hat{Y}} = \mathbf{W}_2 \sigma(\mathbf{W}_1 \mathbf{x})</math><br />
<br />
where {{math|'''W'''<sub>1</sub>}} is the matrix of input-to-hidden-layer weights, <math>\sigma</math> is an activation function, and {{math|'''W'''<sub>2</sub>}} is the matrix of hidden-to-output-layer weights. The algorithm proceeds as follows:<br />
<br />
# Fill {{math|'''W'''<sub>1</sub>}} with random values (e.g., [[Gaussian noise|Gaussian random noise]]);<br />
# estimate {{math|'''W'''<sub>2</sub>}} by [[least-squares fit]] to a matrix of response variables {{math|'''Y'''}}, computed using the [[Moore–Penrose pseudoinverse|pseudoinverse]] {{math|⋅<sup>+</sup>}}, given a [[design matrix]] {{math|'''X'''}}:<br />
#:<math>\mathbf{W}_2 = \sigma(\mathbf{W}_1 \mathbf{X})^+ \mathbf{Y}</math><br />
<br />
<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the Letter data set. The training time spent by LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training data increases. However, the training time spent by ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly as the number of training data increases.<br />
<br />
== Conclusion ==<br />
<br />
<center><br />
[[File:dd.png|800px]]<br />
</center><br />
<br />
== Critiques ==<br />
<br />
An ELM is basically a 2-layer neural net in which the first layer is fixed and random, and the second layer is trained. There are a number of issues with this idea.<br />
<br />
Firstly, algorithms such as SVM and deep learning focus on fitting a complex function with fewer parameters, while ELM uses more parameters to fit a relatively simple function.<br />
<br />
Secondly, the name: an ELM is ''exactly'' what Minsky & Papert call a Gamba Perceptron (a Perceptron whose first layer is a bunch of linear threshold units). The original 1958 Rosenblatt perceptron was an ELM in that the first layer was randomly connected.<br />
<br />
Thirdly, the method: connecting the first layer randomly is just about the stupidest thing you could do. People have spent the almost 60 years since the Perceptron coming up with better schemes to non-linearly expand the dimension of an input vector so as to make the data more separable (many of which are documented in the 1974 edition of Duda & Hart).<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1–3, pp. 155–163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38510stat841F18/2018-11-09T04:25:51Z<p>Y2748li: /* Algorithms */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
Backpropagation (BP) learning algorithms have several known issues:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and then generalize worse. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework of LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
As the training of SVMs involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high: it is at least quadratic with respect to the number of training examples.<br />
<br />
Least square SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality optimization constraints instead of the inequality constraints of the traditional SVM, which results in a direct least-squares solution and avoids quadratic programming.<br />
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used in the implementation of SVM in multiclass classification applications [8]. <br />
<br />
The extreme learning machine (ELM) was proposed for single hidden layer feedforward neural networks (SLFNs); it randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<br />
The extreme learning machine (ELM) is a learning scheme for feedforward networks with a single hidden layer or multiple hidden layers. The input weights of its hidden neurons are assigned randomly rather than learned. ELMs build on the ideas of random projection and early perceptron models.<br />
<br />
<br />
<br />
Given a single hidden layer of ELM, suppose that the output function of the <math>i</math>-th hidden node is <math>h_i(\mathbf{x})=G(\mathbf{a}_i,b_i,\mathbf{x})</math>, where <math>\mathbf{a}_i</math> and <math>b_i</math> are the parameters of the <math>i</math>-th hidden node. The output function of the ELM for SLFNs with <math>L</math> hidden nodes is:<br />
<br />
<math>f_L({\bf x})=\sum_{i=1}^L{\boldsymbol \beta}_ih_i({\bf x})</math>, where <math>{\boldsymbol \beta}_i</math> is the output weight of the <math>i</math>-th hidden node.<br />
<br />
<math>\mathbf{h}(\mathbf{x})=[h_1(\mathbf{x}),\ldots,h_L(\mathbf{x})]</math> is the hidden layer output mapping of ELM. Given <math>N</math> training samples, the hidden layer output matrix <math>\mathbf{H}</math> of ELM is given as: <math>{\bf H}=\left[\begin{matrix}<br />
{\bf h}({\bf x}_1)\\<br />
\vdots\\<br />
{\bf h}({\bf x}_N)<br />
\end{matrix}\right]=\left[\begin{matrix}<br />
G({\bf a}_1, b_1, {\bf x}_1) &\cdots & G({\bf a}_L, b_L, {\bf x}_1)\\<br />
\vdots &\vdots&\vdots\\<br />
G({\bf a}_1, b_1, {\bf x}_N) &\cdots & G({\bf a}_L, b_L, {\bf x}_N)<br />
\end{matrix}\right]<br />
</math><br />
<br />
and <math>\mathbf{T}</math> is the training data target matrix: <math>{\bf T}=\left[\begin{matrix}<br />
{\bf t}_1\\<br />
\vdots\\<br />
{\bf t}_N<br />
\end{matrix}\right]<br />
</math><br />
<br />
Generally speaking, ELM is a kind of regularization neural network but with non-tuned hidden layer mappings (formed by either random hidden nodes, kernels or other implementations). Its objective function is:<br />
<br />
<math><br />
\text{Minimize: } \|{\boldsymbol \beta}\|_p^{\sigma_1}+C\|{\bf H}{\boldsymbol \beta}-{\bf T}\|_q^{\sigma_2}<br />
</math><br />
<br />
where <math>\sigma_1>0, \sigma_2>0, p,q=0, \frac{1}{2}, 1, 2, \cdots, +\infty</math>. <br />
<br />
Different combinations of <math>\sigma_1</math>, <math>\sigma_2</math>, <math>p</math> and <math>q</math> can be used and result in different learning algorithms for regression, classification, sparse coding, compression, feature learning and clustering.<br />
<br />
As a special case, the simplest ELM training algorithm learns a model of the form (for single hidden layer sigmoid neural networks):<br />
<br />
:<math>\mathbf{\hat{Y}} = \mathbf{W}_2 \sigma(\mathbf{W}_1 \mathbf{x})</math><br />
<br />
where {{math|'''W'''<sub>1</sub>}} is the matrix of input-to-hidden-layer weights, <math>\sigma</math> is an activation function, and {{math|'''W'''<sub>2</sub>}} is the matrix of hidden-to-output-layer weights. The algorithm proceeds as follows:<br />
<br />
# Fill {{math|'''W'''<sub>1</sub>}} with random values (e.g., [[Gaussian noise|Gaussian random noise]]);<br />
# estimate {{math|'''W'''<sub>2</sub>}} by [[least-squares fit]] to a matrix of response variables {{math|'''Y'''}}, computed using the [[Moore–Penrose pseudoinverse|pseudoinverse]] {{math|⋅<sup>+</sup>}}, given a [[design matrix]] {{math|'''X'''}}:<br />
#:<math>\mathbf{W}_2 = \sigma(\mathbf{W}_1 \mathbf{X})^+ \mathbf{Y}</math><br />
<br />
<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the Letter data set. The training time spent by LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training data increases. However, the training time spent by ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly as the number of training data increases.<br />
<br />
== Conclusion ==<br />
<br />
<center><br />
[[File:dd.png|800px]]<br />
</center><br />
<br />
== Critiques ==<br />
<br />
An ELM is basically a 2-layer neural net in which the first layer is fixed and random, and the second layer is trained. There are a number of issues with this idea.<br />
<br />
Firstly, algorithms such as SVM and deep learning focus on fitting a complex function with fewer parameters, while ELM uses more parameters to fit a relatively simple function.<br />
<br />
Secondly, the name: an ELM is ''exactly'' what Minsky & Papert call a Gamba Perceptron (a Perceptron whose first layer is a bunch of linear threshold units). The original 1958 Rosenblatt perceptron was an ELM in that the first layer was randomly connected.<br />
<br />
Thirdly, the method: connecting the first layer randomly is just about the stupidest thing you could do. People have spent the almost 60 years since the Perceptron coming up with better schemes to non-linearly expand the dimension of an input vector so as to make the data more separable (many of which are documented in the 1974 edition of Duda & Hart).<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1–3, pp. 155–163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38509stat841F18/2018-11-09T04:23:53Z<p>Y2748li: /* Critiques */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
Backpropagation (BP) learning algorithms have several known issues:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and then generalize worse. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework of LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
As the training of SVMs involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high: it is at least quadratic with respect to the number of training examples.<br />
<br />
Least square SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality optimization constraints instead of the inequality constraints of the traditional SVM, which results in a direct least-squares solution and avoids quadratic programming.<br />
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used in the implementation of SVM in multiclass classification applications [8]. <br />
<br />
The extreme learning machine (ELM) was proposed for single hidden layer feedforward neural networks (SLFNs); it randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<br />
The extreme learning machine (ELM) is a learning scheme for feedforward networks with a single hidden layer or multiple hidden layers. The input weights of its hidden neurons are assigned randomly rather than learned. ELMs build on the ideas of random projection and early perceptron models.<br />
<br />
<br />
== Algorithms ==<br />
Given a single hidden layer of ELM, suppose that the output function of the <math>i</math>-th hidden node is <math>h_i(\mathbf{x})=G(\mathbf{a}_i,b_i,\mathbf{x})</math>, where <math>\mathbf{a}_i</math> and <math>b_i</math> are the parameters of the <math>i</math>-th hidden node. The output function of the ELM for SLFNs with <math>L</math> hidden nodes is:<br />
<br />
<math>f_L({\bf x})=\sum_{i=1}^L{\boldsymbol \beta}_ih_i({\bf x})</math>, where <math>{\boldsymbol \beta}_i</math> is the output weight of the <math>i</math>-th hidden node.<br />
<br />
<math>\mathbf{h}(\mathbf{x})=[h_1(\mathbf{x}),\ldots,h_L(\mathbf{x})]</math> is the hidden layer output mapping of ELM. Given <math>N</math> training samples, the hidden layer output matrix <math>\mathbf{H}</math> of ELM is given as: <math>{\bf H}=\left[\begin{matrix}<br />
{\bf h}({\bf x}_1)\\<br />
\vdots\\<br />
{\bf h}({\bf x}_N)<br />
\end{matrix}\right]=\left[\begin{matrix}<br />
G({\bf a}_1, b_1, {\bf x}_1) &\cdots & G({\bf a}_L, b_L, {\bf x}_1)\\<br />
\vdots &\ddots&\vdots\\<br />
G({\bf a}_1, b_1, {\bf x}_N) &\cdots & G({\bf a}_L, b_L, {\bf x}_N)<br />
\end{matrix}\right]<br />
</math><br />
<br />
and <math>\mathbf{T}</math> is the training data target matrix: <math>{\bf T}=\left[\begin{matrix}<br />
{\bf t}_1\\<br />
\vdots\\<br />
{\bf t}_N<br />
\end{matrix}\right]<br />
</math><br />
<br />
Generally speaking, ELM is a kind of regularized neural network with non-tuned hidden layer mappings (formed by random hidden nodes, kernels, or other implementations). Its objective function is:<br />
<br />
<math><br />
\text{Minimize: } \|{\boldsymbol \beta}\|_p^{\sigma_1}+C\|{\bf H}{\boldsymbol \beta}-{\bf T}\|_q^{\sigma_2}<br />
</math><br />
<br />
where <math>\sigma_1>0, \sigma_2>0, p,q=0, \frac{1}{2}, 1, 2, \cdots, +\infty</math>. <br />
<br />
Different combinations of <math>\sigma_1</math>, <math>\sigma_2</math>, <math>p</math> and <math>q</math> can be used and result in different learning algorithms for regression, classification, sparse coding, compression, feature learning and clustering.<br />
<br />
As a special case, the simplest ELM training algorithm learns a model of the following form (for single-hidden-layer sigmoid neural networks):<br />
<br />
:<math>\mathbf{\hat{Y}} = \mathbf{W}_2 \sigma(\mathbf{W}_1 x)</math><br />
<br />
where {{math|'''W'''<sub>1</sub>}} is the matrix of input-to-hidden-layer weights, <math>\sigma</math> is an activation function, and {{math|'''W'''<sub>2</sub>}} is the matrix of hidden-to-output-layer weights. The algorithm proceeds as follows:<br />
<br />
# Fill {{math|'''W'''<sub>1</sub>}} with random values (e.g., [[Gaussian noise|Gaussian random noise]]);<br />
# Estimate {{math|'''W'''<sub>2</sub>}} by a [[least-squares fit]] to a matrix of response variables {{math|'''Y'''}}, computed using the [[Moore–Penrose pseudoinverse|pseudoinverse]] {{math|⋅<sup>+</sup>}}, given a [[design matrix]] {{math|'''X'''}}:<br />
#:<math>\mathbf{W}_2 = \sigma(\mathbf{W}_1 \mathbf{X})^+ \mathbf{Y}</math><br />
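<br />
The two steps above can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation: the function names <code>elm_fit</code> and <code>elm_predict</code>, the random hidden biases, the sigmoid activation, and the toy regression data are all assumptions made for the example. Samples are stored as rows of <code>X</code>, so the hidden layer output is <code>sigmoid(X @ W1 + b)</code> and the output weights are obtained as <code>pinv(H) @ Y</code>.<br />

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, Y, n_hidden, rng):
    """Train a basic ELM: random hidden layer, least-squares output layer."""
    n_features = X.shape[1]
    # Step 1: fill the input weights (and, by assumption, biases) with
    # Gaussian noise. These parameters are never tuned afterwards.
    W1 = rng.standard_normal((n_features, n_hidden))
    b = rng.standard_normal(n_hidden)
    # Hidden layer output matrix, one row per training sample.
    H = sigmoid(X @ W1 + b)
    # Step 2: least-squares output weights via the Moore-Penrose pseudoinverse.
    W2 = np.linalg.pinv(H) @ Y
    return W1, b, W2

def elm_predict(X, W1, b, W2):
    return sigmoid(X @ W1 + b) @ W2

# Toy regression problem (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.sin(3.0 * X[:, :1]) + X[:, 1:] ** 2   # shape (200, 1)
W1, b, W2 = elm_fit(X, Y, n_hidden=50, rng=rng)
train_mse = float(np.mean((elm_predict(X, W1, b, W2) - Y) ** 2))
```

Because only the output weights are fitted, training reduces to a single linear least-squares solve; no iterative gradient descent is involved.<br />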
<br />
<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the letter data set. The training time of LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training samples grows, whereas the training time of ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly.<br />
<br />
== Conclusion ==<br />
<br />
<center><br />
[[File:dd.png|800px]]<br />
</center><br />
<br />
== Critiques ==<br />
<br />
An ELM is basically a 2-layer neural net in which the first layer is fixed and random, and the second layer is trained. There are a number of issues with this idea.<br />
<br />
Firstly, algorithms such as SVM and deep learning focus on fitting a complex function with fewer parameters, while ELM uses more parameters to fit a relatively simple function.<br />
<br />
Secondly, the name: an ELM is ''exactly'' what Minsky & Papert call a Gamba Perceptron (a Perceptron whose first layer is a bunch of linear threshold units). The original 1958 Rosenblatt perceptron was an ELM in that the first layer was randomly connected.<br />
<br />
Thirdly, the method: connecting the first layer randomly is just about the stupidest thing you could do. People have spent the almost 60 years since the Perceptron coming up with better schemes to non-linearly expand the dimension of an input vector so as to make the data more separable (many of which are documented in the 1974 edition of Duda & Hart).<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38508stat841F18/2018-11-09T04:19:42Z<p>Y2748li: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, owing to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been used extensively in classification applications.<br />
Least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that affects the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and consequently generalize worse. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework for LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
Because training an SVM involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high: at least quadratic in the number of training examples.<br />
<br />
Least-squares SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality constraints in place of the inequality constraints of the traditional SVM, which yields a direct least-squares solution and avoids quadratic programming.<br />
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used to implement SVM in multiclass classification applications [8].<br />
<br />
The extreme learning machine (ELM) was proposed for single-hidden-layer feedforward neural networks (SLFNs); it randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<br />
The extreme learning machine (ELM) is a feedforward neural network with a single hidden layer or multiple hidden layers. An ELM contains a number of hidden neurons whose input weights are assigned randomly and are not tuned. ELMs build on the ideas of random projection and early perceptron models.<br />
<br />
<br />
==Algorithms==<br />
Given a single hidden layer of ELM, suppose that the output function of the <math>i</math>-th hidden node is <math>h_i(\mathbf{x})=G(\mathbf{a}_i,b_i,\mathbf{x})</math>, where <math>\mathbf{a}_i</math> and <math>b_i</math> are the parameters of the <math>i</math>-th hidden node. The output function of the ELM for SLFNs with <math>L</math> hidden nodes is:<br />
<br />
<math>f_L({\bf x})=\sum_{i=1}^L{\boldsymbol \beta}_ih_i({\bf x})</math>, where <math>{\boldsymbol \beta}_i</math> is the output weight of the <math>i</math>-th hidden node.<br />
<br />
<math>\mathbf{h}(\mathbf{x})=[h_1(\mathbf{x}),\ldots,h_L(\mathbf{x})]</math> is the hidden layer output mapping of ELM. Given <math>N</math> training samples, the hidden layer output matrix <math>\mathbf{H}</math> of ELM is given as: <math>{\bf H}=\left[\begin{matrix}<br />
{\bf h}({\bf x}_1)\\<br />
\vdots\\<br />
{\bf h}({\bf x}_N)<br />
\end{matrix}\right]=\left[\begin{matrix}<br />
G({\bf a}_1, b_1, {\bf x}_1) &\cdots & G({\bf a}_L, b_L, {\bf x}_1)\\<br />
\vdots &\ddots&\vdots\\<br />
G({\bf a}_1, b_1, {\bf x}_N) &\cdots & G({\bf a}_L, b_L, {\bf x}_N)<br />
\end{matrix}\right]<br />
</math><br />
<br />
and <math>\mathbf{T}</math> is the training data target matrix: <math>{\bf T}=\left[\begin{matrix}<br />
{\bf t}_1\\<br />
\vdots\\<br />
{\bf t}_N<br />
\end{matrix}\right]<br />
</math><br />
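<br />
As a rough sketch of how <math>\mathbf{H}</math> and <math>\mathbf{T}</math> might be assembled in practice, the NumPy snippet below assumes sigmoid additive hidden nodes, <math>G(\mathbf{a}_i,b_i,\mathbf{x})=1/(1+\exp(-(\mathbf{a}_i\cdot\mathbf{x}+b_i)))</math>, and one-hot target rows for a classification task. The function names <code>hidden_output_matrix</code> and <code>target_matrix</code>, the one-hot encoding, and the random toy inputs are hypothetical choices made for illustration only.<br />

```python
import numpy as np

def hidden_output_matrix(X, A, b):
    """Return H with H[j, i] = G(a_i, b_i, x_j) for sigmoid additive nodes."""
    # X: (N, d) samples as rows; A: (L, d) with row i equal to a_i; b: (L,).
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

def target_matrix(labels, n_classes):
    """Return T with one row t_j per sample (one-hot rows are an assumption)."""
    T = np.zeros((len(labels), n_classes))
    T[np.arange(len(labels)), labels] = 1.0
    return T

rng = np.random.default_rng(2)
N, d, L = 5, 3, 4                      # samples, input dimension, hidden nodes
X = rng.standard_normal((N, d))
A = rng.standard_normal((L, d))        # random, untuned hidden node parameters
b = rng.standard_normal(L)
H = hidden_output_matrix(X, A, b)      # shape (N, L)
T = target_matrix(np.array([0, 2, 1, 1, 0]), n_classes=3)  # shape (N, 3)
```

Each row of <code>H</code> is <math>\mathbf{h}(\mathbf{x}_j)</math> for one training sample, and each column collects the responses of one hidden node across all samples.<br />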
<br />
Generally speaking, ELM is a kind of regularized neural network with non-tuned hidden layer mappings (formed by random hidden nodes, kernels, or other implementations). Its objective function is:<br />
<br />
<math><br />
\text{Minimize: } \|{\boldsymbol \beta}\|_p^{\sigma_1}+C\|{\bf H}{\boldsymbol \beta}-{\bf T}\|_q^{\sigma_2}<br />
</math><br />
<br />
where <math>\sigma_1>0, \sigma_2>0, p,q=0, \frac{1}{2}, 1, 2, \cdots, +\infty</math>. <br />
<br />
Different combinations of <math>\sigma_1</math>, <math>\sigma_2</math>, <math>p</math> and <math>q</math> can be used and result in different learning algorithms for regression, classification, sparse coding, compression, feature learning and clustering.<br />
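<br />
As one concrete instance of the objective above, taking <math>p=q=2</math> and <math>\sigma_1=\sigma_2=2</math> gives a ridge-regression problem. Setting the gradient of <math>\|{\boldsymbol \beta}\|_2^2+C\|{\bf H}{\boldsymbol \beta}-{\bf T}\|_2^2</math> to zero yields the linear system <math>({\bf I}/C+{\bf H}^T{\bf H}){\boldsymbol \beta}={\bf H}^T{\bf T}</math>. The sketch below solves that system with NumPy; the function name <code>elm_ridge_beta</code>, the value of <code>C</code>, and the random stand-ins for <math>\mathbf{H}</math> and <math>\mathbf{T}</math> are assumptions for illustration, not part of the paper.<br />

```python
import numpy as np

def elm_ridge_beta(H, T, C):
    """Minimize ||beta||_2^2 + C * ||H @ beta - T||_2^2 in closed form."""
    L = H.shape[1]
    # Zero-gradient condition: (I / C + H^T H) beta = H^T T
    A = np.eye(L) / C + H.T @ H
    return np.linalg.solve(A, H.T @ T)

def objective(H, T, beta, C):
    return float(np.sum(beta ** 2) + C * np.sum((H @ beta - T) ** 2))

rng = np.random.default_rng(1)
H = rng.standard_normal((30, 8))   # stand-in for a hidden layer output matrix
T = rng.standard_normal((30, 2))   # stand-in for a target matrix
C = 10.0
beta = elm_ridge_beta(H, T, C)
```

A larger <code>C</code> weights the training error more heavily relative to the norm of the output weights; other choices of <math>p</math>, <math>q</math>, <math>\sigma_1</math> and <math>\sigma_2</math> generally have no closed-form solution of this kind.<br />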
<br />
As a special case, the simplest ELM training algorithm learns a model of the following form (for single-hidden-layer sigmoid neural networks):<br />
<br />
:<math>\mathbf{\hat{Y}} = \mathbf{W}_2 \sigma(\mathbf{W}_1 x)</math><br />
<br />
where {{math|'''W'''<sub>1</sub>}} is the matrix of input-to-hidden-layer weights, <math>\sigma</math> is an activation function, and {{math|'''W'''<sub>2</sub>}} is the matrix of hidden-to-output-layer weights. The algorithm proceeds as follows:<br />
<br />
# Fill {{math|'''W'''<sub>1</sub>}} with random values (e.g., [[Gaussian noise|Gaussian random noise]]);<br />
# Estimate {{math|'''W'''<sub>2</sub>}} by a [[least-squares fit]] to a matrix of response variables {{math|'''Y'''}}, computed using the [[Moore–Penrose pseudoinverse|pseudoinverse]] {{math|⋅<sup>+</sup>}}, given a [[design matrix]] {{math|'''X'''}}:<br />
#:<math>\mathbf{W}_2 = \sigma(\mathbf{W}_1 \mathbf{X})^+ \mathbf{Y}</math><br />
<br />
<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the letter data set. The training time of LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training samples grows, whereas the training time of ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly.<br />
<br />
== Conclusion ==<br />
<br />
<center><br />
[[File:dd.png|800px]]<br />
</center><br />
<br />
== Critiques ==<br />
<br />
<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38507stat841F18/2018-11-09T04:12:07Z<p>Y2748li: /* Conclusion */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, owing to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been used extensively in classification applications.<br />
Least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that affects the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and consequently generalize worse. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework for LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
Because training an SVM involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high: at least quadratic in the number of training examples.<br />
<br />
Least-squares SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality constraints in place of the inequality constraints of the traditional SVM, which yields a direct least-squares solution and avoids quadratic programming.<br />
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used to implement SVM in multiclass classification applications [8].<br />
<br />
The extreme learning machine (ELM) was proposed for single-hidden-layer feedforward neural networks (SLFNs); it randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the letter data set. The training time of LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training samples grows, whereas the training time of ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly.<br />
<br />
== Conclusion ==<br />
<br />
<center><br />
[[File:dd.png|800px]]<br />
</center><br />
<br />
== Critiques ==<br />
<br />
<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:dd.png&diff=38506File:dd.png2018-11-09T04:11:29Z<p>Y2748li: </p>
<hr />
<div></div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38505stat841F18/2018-11-09T04:09:54Z<p>Y2748li: /* Performance Verification */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, owing to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been used extensively in classification applications.<br />
Least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that affects the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and consequently generalize worse. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework for LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
Because training an SVM involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high: at least quadratic in the number of training examples.<br />
<br />
Least-squares SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality constraints in place of the inequality constraints of the traditional SVM, which yields a direct least-squares solution and avoids quadratic programming.<br />
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used to implement SVM in multiclass classification applications [8].<br />
<br />
The extreme learning machine (ELM) was proposed for single-hidden-layer feedforward neural networks (SLFNs); it randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the letter data set. The training time of LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training samples grows, whereas the training time of ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly.<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38504stat841F18/2018-11-09T04:09:35Z<p>Y2748li: /* Performance Verification */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, owing to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been used extensively in classification applications.<br />
Least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that affects the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and consequently generalize worse. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework for LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
Because training an SVM involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high: at least quadratic in the number of training examples.<br />
<br />
Least-squares SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality constraints in place of the inequality constraints of the traditional SVM, which yields a direct least-squares solution and avoids quadratic programming.<br />
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used to implement SVM in multiclass classification applications [8].<br />
<br />
The extreme learning machine (ELM) was proposed for single-hidden-layer feedforward neural networks (SLFNs); it randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that the algorithm can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
Fig. 1.<br />
</center><br />
Fig. 1 shows the scalability of different classifiers on the letter data set. The training time of LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training samples grows, whereas the training time of ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly.<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38503stat841F18/2018-11-09T04:08:44Z<p>Y2748li: /* Performance Verification */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, owing to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been used extensively in classification applications.<br />
Least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate <math>\eta</math> is too small, the learning algorithm converges very slowly. However, when <math>\eta</math> is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that affects the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and consequently generalize worse. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
Due to the simplicity of their implementations, least-squares support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be applied directly to regression or multiclass classification, although variants of both have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework for LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built.<br />
<br />
== Previous Work ==<br />
<br />
As the training of SVMs involves a quadratic programming problem, the computational complexity of SVM training al- gorithms is usually intensive, which is at least quadratic with respect to the number of training examples<br />
<br />
Least square SVM (LS-SVM) [2] and proximal SVM (PSVM) [3] provide fast implementations of the traditional SVM. Both LS-SVM and PSVM use equality optimization constraints instead of inequalities from the traditional SVM, which results in a direct least square solution by avoiding quadratic programming.<br />
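<br />
To make the role of the equality constraints concrete, the standard LS-SVM classifier formulation can be written as follows. This is a sketch in the common textbook notation, which is an assumption here and may differ from the symbols used in the paper: γ is the regularization parameter, φ the feature map, K the kernel, and e_i the error variables.<br />

```latex
\min_{w,\,b,\,e}\ \frac{1}{2} w^{\top} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} \varphi(x_i) + b \right) = 1 - e_i, \qquad i = 1, \dots, N .

% Eliminating w and e from the optimality conditions leaves a linear system
% in the bias b and the Lagrange multipliers \alpha:
\begin{bmatrix} 0 & y^{\top} \\ y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{1}_N \end{bmatrix},
\qquad
\Omega_{ij} = y_i \, y_j \, K(x_i, x_j).
```

Because every constraint is an equality, training reduces to solving this single (N+1) × (N+1) linear system instead of running a quadratic programming solver.<br />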
<br />
SVM, LS-SVM, and PSVM were originally proposed for binary classification. Different methods have been proposed to apply them to multiclass classification problems. One-against-all (OAA) and one-against-one (OAO) methods are mainly used in the implementation of SVM in multiclass classification applications [8].<br />
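<br />
The cost of the two decompositions can be seen by enumerating the binary subproblems each one creates. The sketch below is illustrative only: the helper names <code>one_against_all</code> and <code>one_against_one</code> are hypothetical, and the 26-class example assumes a letter-recognition task with one class per letter.<br />

```python
from itertools import combinations
from string import ascii_uppercase


def one_against_all(classes):
    """One binary problem per class: that class versus all remaining classes."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def one_against_one(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


classes = list(ascii_uppercase)  # 26 hypothetical class labels
print(len(one_against_all(classes)), "OAA classifiers")
print(len(one_against_one(classes)), "OAO classifiers")
```

For m classes, OAA trains m binary classifiers while OAO trains m(m-1)/2, so the number of OAO classifiers grows quadratically with the number of classes.<br />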
<br />
Extreme learning machine (ELM) was proposed for single-hidden-layer feedforward neural networks (SLFNs): it randomly chooses the input weights and analytically determines the output weights of the SLFN. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. Experimental results on real-world benchmark function approximation and classification problems, including large complex applications, show that ELM can produce the best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.<br />
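<br />
The two steps described above (random hidden layer, analytic output weights) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses sigmoid hidden nodes, input weights drawn uniformly from [-1, 1], and the ridge-regularized least-squares solution β = (I/C + HᵀH)⁻¹HᵀT; the names <code>train_elm</code>, <code>predict_elm</code>, <code>hidden_layer</code> and <code>solve</code> are hypothetical.<br />

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):  # back substitution
        for j in range(m):
            s = M[i][n + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X


def hidden_layer(X, W, b):
    """Hidden-layer output matrix H: one row per sample, one column per node."""
    return [[sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bj)
             for w, bj in zip(W, b)] for x in X]


def train_elm(X, T, n_hidden=20, C=1e3, seed=0):
    """Draw random hidden parameters, then solve for the output weights."""
    rng = random.Random(seed)
    d, m = len(X[0]), len(T[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden_layer(X, W, b)  # W and b are never updated after this point
    # A = I/C + H^T H  and  HtT = H^T T
    A = [[sum(h[i] * h[j] for h in H) + (1.0 / C if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    HtT = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(m)]
           for i in range(n_hidden)]
    beta = solve(A, HtT)  # output weights, shape n_hidden x m
    return W, b, beta


def predict_elm(X, model):
    """Return the network outputs H * beta for each sample in X."""
    W, b, beta = model
    H = hidden_layer(X, W, b)
    return [[sum(h[i] * beta[i][k] for i in range(len(beta)))
             for k in range(len(beta[0]))] for h in H]
```

For multiclass classification, T would hold one target column per class and the predicted label would be the index of the largest output. No iterative tuning of the hidden layer takes place, which is the source of the speed advantage claimed above.<br />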
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== Performance Verification ==<br />
<br />
<center><br />
[[File:bb.png|800px]]<br />
</center><br />
<br />
<center><br />
[[File:cc.png|400px]]<br />
<br />
Fig. 1. Scalability of different classifiers: an example on the letter data set. The training time spent by LS-SVM and ELM (Gaussian kernel) increases sharply as the number of training data increases, whereas the training time spent by ELM with sigmoid additive nodes and multiquadric function nodes increases very slowly.<br />
</center><br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1–3, pp. 155–163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38502stat841F18/2018-11-09T04:07:07Z<p>Y2748li: /* Performance Verification */</p>
<hr />
<div></div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38498stat841F18/2018-11-09T04:03:25Z<p>Y2748li: </p>
<hr />
<div></div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38496stat841F18/2018-11-09T03:57:00Z<p>Y2748li: /* Previous Work */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classi- fication capability, support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal sup- port vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification appli- cations directly, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues on BP learning algorithms:<br />
<br />
(1) When the learning rate Z is too small, the learning algorithm converges very slowly. However, when Z is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable that the learning algorithm stops at a local minima if it is located far above a global minima.<br />
<br />
(3) Neural network may be over-trained by using BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1–3, pp. 155–163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38493stat841F18/2018-11-09T03:47:52Z<p>Y2748li: /* References */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate η is too small, the learning algorithm converges very slowly. However, when η is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, “Optimization method based extreme learning machine for classification,” Neurocomputing, vol. 74, no. 1–3, pp. 155–163, Dec. 2010.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38492stat841F18/2018-11-09T03:45:17Z<p>Y2748li: /* References */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate η is too small, the learning algorithm converges very slowly. However, when η is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38491stat841F18/2018-11-09T03:44:27Z<p>Y2748li: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate η is too small, the learning algorithm converges very slowly. However, when η is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|800px]]<br />
</center><br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38490stat841F18/2018-11-09T03:44:15Z<p>Y2748li: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate η is too small, the learning algorithm converges very slowly. However, when η is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png|200px]]<br />
</center><br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38489stat841F18/2018-11-09T03:42:58Z<p>Y2748li: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate η is too small, the learning algorithm converges very slowly. However, when η is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture ==<br />
<center><br />
[[File:aa.png]]<br />
</center><br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38488stat841F18/2018-11-09T03:41:44Z<p>Y2748li: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate η is too small, the learning algorithm converges very slowly. However, when η is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture ==<br />
[[File:aa.jpg]]<br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:aa.png&diff=38487File:aa.png2018-11-09T03:41:13Z<p>Y2748li: </p>
<hr />
<div></div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38486stat841F18/2018-11-09T03:39:39Z<p>Y2748li: /* References */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate η is too small, the learning algorithm converges very slowly. However, when η is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture == <br />
<br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==<br />
<br />
* <sup>[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1380068 [1]]</sup> G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: A new learning scheme of feedforward neural networks,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, vol. 2, pp. 985–990.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38485stat841F18/2018-11-09T03:29:22Z<p>Y2748li: /* Motivation */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation ==<br />
<br />
There are several issues with backpropagation (BP) learning algorithms:<br />
<br />
(1) When the learning rate η is too small, the learning algorithm converges very slowly. However, when η is too large, the algorithm becomes unstable and diverges.<br />
<br />
(2) Another peculiarity of the error surface that impacts the performance of the BP learning algorithm is the presence of local minima [6]. It is undesirable for the learning algorithm to stop at a local minimum that lies far above the global minimum.<br />
<br />
(3) A neural network may be over-trained by BP algorithms and obtain worse generalization performance. Thus, validation and suitable stopping methods are required in the cost function minimization procedure.<br />
<br />
(4) Gradient-based learning is very time-consuming in most applications.<br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture == <br />
<br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38484stat841F18/2018-11-09T03:27:03Z<p>Y2748li: </p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Motivation == <br />
<br />
== Previous Work == <br />
<br />
<br />
<br />
<br />
<br />
<br />
== Model Architecture == <br />
<br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38483stat841F18/2018-11-09T03:26:36Z<p>Y2748li: /* Introduction */</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction ==<br />
In the past two decades, due to their surprising classification capability, the support vector machine (SVM) [1] and its variants [2]–[4] have been extensively used in classification applications.<br />
Least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used directly in regression and multiclass classification applications, although variants of LS-SVM and PSVM have been proposed to handle such cases.<br />
<br />
== Previous Work == <br />
<br />
<br />
== Motivation == <br />
<br />
<br />
<br />
== Model Architecture == <br />
<br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=38482stat441F182018-11-09T03:07:44Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Memory-Based Parameter Adaptation || [https://arxiv.org/pdf/1802.10542.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/images/0/0f/MbPA_Summary.pdf Summary] ||<br />
|-<br />
|Nov 13 ||Sai Praneeth M, Xudong Peng, Alice Li, Shahrzad Hosseini Vajargah|| 2|| Going Deeper with Convolutions ||[https://arxiv.org/pdf/1409.4842.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary]<br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| Topic Compositional Neural Language Model|| [https://arxiv.org/pdf/1712.09783.pdf paper] || <br />
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18/TCNLM Summary]<br />
|-<br />
|Nov 15 || Zhaoran Hou, Pei Wei Wang, Chi Zhang, Yiming Li, Daoyi Chen, Ying Chi|| 4|| Extreme Learning Machine for Regression and Multiclass Classification|| [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6035797 Paper] || <br />
[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/ Summary]<br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek, Brendan Ross, Jon Barenboim, Junqiao Lin, James Bootsma || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Loung || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Robust Probabilistic Modeling with Bayesian Data Reweighting || [http://proceedings.mlr.press/v70/wang17g/wang17g.pdf Paper] || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su, Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| tba || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu, Aden Grant, Yu Hao Wang, Andrew McMurry, Baizhi Song, Yongqi Dong || 11|| Towards Deep Learning Models Resistant to Adversarial Attacks || [https://arxiv.org/pdf/1706.06083.pdf Paper] || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||<br />
|-<br />
|Makeup || Hudson Ash, Stephen Kingston, Richard Zhang, Alexandre Xiao, Ziqiu Zhu || || || ||<br />
|-<br />
|Makeup || Frank Jiang, Yuan Zhang, Jerry Hu || || || ||<br />
|-<br />
|Makeup || Yu Xuan Lee, Tsen Yee Heng || 15 || Gradient Episodic Memory for Continual Learning || [http://papers.nips.cc/paper/7225-gradient-episodic-memory-for-continual-learning.pdf Paper] ||<br />
|-<br />
|Makeup || || || || ||</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat841F18/&diff=38481stat841F18/2018-11-09T03:05:16Z<p>Y2748li: Created page with "== Presented by == Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang == Introduction == == Previous Work == == Motivation == == Model Architecture == == ILSVRC 20..."</p>
<hr />
<div>== Presented by == <br />
Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang<br />
<br />
== Introduction == <br />
<br />
== Previous Work == <br />
<br />
<br />
== Motivation == <br />
<br />
<br />
<br />
== Model Architecture == <br />
<br />
<br />
== ILSVRC 2014 Challenge Results ==<br />
<br />
<br />
<br />
== Conclusion ==<br />
<br />
<br />
== Critiques ==<br />
<br />
== References ==</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=37312stat441F182018-10-28T19:49:25Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Will be added soon || || <br />
|-<br />
|Nov 13 ||Sai Praneeth M, Xudong Peng, Alice Li, Shahrzad Hosseini Vajargah|| 2|| Going deeper with convolutions ||[https://arxiv.org/pdf/1409.4842.pdf paper] || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| The Evolution of Sentiment Analysis|| || <br />
|-<br />
|Nov 15 || Zhaoran Hou, Pei Wei Wang, Chi Zhang, Yiming Li, Daoyi Chen, Ying Chi|| 4|| Extreme Learning Machine for Regression and Multiclass Classification|| [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6035797] || ||<br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Loung || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Robust Probabilistic Modeling with Bayesian Data Reweighting || [http://proceedings.mlr.press/v70/wang17g/wang17g.pdf Paper] || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su, Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith, Alexandre Xiao, Hudson Ash, Richard Zhang, Stephen Kingston, Ziqiu Zhu || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| tba || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu, Aden Grant, Yu Hao Wang, Andrew McMurry, Baizhi Song || 11|| TBA || || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=37311stat441F182018-10-28T19:29:22Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Will be added soon || || <br />
|-<br />
|Nov 13 ||Sai Praneeth M, Xudong Peng, Alice Li, Shahrzad Hosseini Vajargah|| 2|| Going deeper with convolutions ||[https://arxiv.org/pdf/1409.4842.pdf paper] || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| The Evolution of Sentiment Analysis|| || <br />
|-<br />
|Nov 15 || Zhaoran Hou, Pei Wei Wang, Chi Zhang, Yiming Li, Daoyi Chen, Ying Chi|| 4|| Extreme Learning Machine for Regression and Multiclass Classification|| [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6035797] || <br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Loung || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Robust Probabilistic Modeling with Bayesian Data Reweighting || [http://proceedings.mlr.press/v70/wang17g/wang17g.pdf Paper] || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su, Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith, Alexandre Xiao, Hudson Ash, Richard Zhang, Stephen Kingston, Ziqiu Zhu || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| TBA || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu, Aden Grant, Yu Hao Wang, Andrew McMurry, Baizhi Song || 11|| TBA || || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=37310stat441F182018-10-28T19:28:54Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Will be added soon || || <br />
|-<br />
|Nov 13 ||Sai Praneeth M, Xudong Peng, Alice Li, Shahrzad Hosseini Vajargah|| 2|| Going deeper with convolutions ||[https://arxiv.org/pdf/1409.4842.pdf paper] || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| The Evolution of Sentiment Analysis|| || <br />
|-<br />
|Nov 15 || Zhaoran Hou, Pei Wei Wang, Chi Zhang, Yiming Li, Daoyi Chen, Ying Chi|| 4|| Extreme Learning Machine for Regression and Multiclass Classification|| [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6035797 Paper] || <br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Loung || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Robust Probabilistic Modeling with Bayesian Data Reweighting || [http://proceedings.mlr.press/v70/wang17g/wang17g.pdf Paper] || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su, Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith, Alexandre Xiao, Hudson Ash, Richard Zhang, Stephen Kingston, Ziqiu Zhu || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| TBA || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu, Aden Grant, Yu Hao Wang, Andrew McMurry, Baizhi Song || 11|| TBA || || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=37309stat441F182018-10-28T19:28:27Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Will be added soon || || <br />
|-<br />
|Nov 13 ||Sai Praneeth M, Xudong Peng, Alice Li, Shahrzad Hosseini Vajargah|| 2|| Going deeper with convolutions ||[https://arxiv.org/pdf/1409.4842.pdf paper] || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| The Evolution of Sentiment Analysis|| || <br />
|-<br />
|Nov 15 || Zhaoran Hou, Pei Wei Wang, Chi Zhang, Yiming Li, Daoyi Chen, Ying Chi|| 4|| Extreme Learning Machine for Regression and Multiclass Classification|| [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6035797 Paper] || <br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Loung || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Robust Probabilistic Modeling with Bayesian Data Reweighting || [http://proceedings.mlr.press/v70/wang17g/wang17g.pdf Paper] || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su, Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith, Alexandre Xiao, Hudson Ash, Richard Zhang, Stephen Kingston, Ziqiu Zhu || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| TBA || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu, Aden Grant, Yu Hao Wang, Andrew McMurry, Baizhi Song || 11|| TBA || || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=37308stat441F182018-10-28T19:26:22Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || Jason Schneider, Jordyn Walton, Zahraa Abbas, Andrew Na || 1|| Will be added soon || || <br />
|-<br />
|Nov 13 ||Sai Praneeth M, Xudong Peng, Alice Li, Shahrzad Hosseini Vajargah|| 2|| Going deeper with convolutions ||[https://arxiv.org/pdf/1409.4842.pdf paper] || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| The Evolution of Sentiment Analysis|| || <br />
|-<br />
|Nov 15 || Eric, Mike, Rebecca, Susan|| 4|| Extreme Learning Machine for Regression and Multiclass Classification|| [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6035797 Paper] || <br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent Loung || 6|| Convolutional Neural Networks for Sentence Classification || [https://arxiv.org/pdf/1408.5882.pdf paper] || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai, Colin Stranc, Philomène Bobichon, Aditya Maheshwari, Zepeng An || 7|| Robust Probabilistic Modeling with Bayesian Data Reweighting || [http://proceedings.mlr.press/v70/wang17g/wang17g.pdf Paper] || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su, Jiacheng Weng, Keqi Li, Yi Qian, Bomeng Liu || 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith, Alexandre Xiao, Hudson Ash, Richard Zhang, Stephen Kingston, Ziqiu Zhu || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam, Yan Jiao, Shuyue Wang, Yutong Wu, Shikun Cui || 10|| TBA || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu, Aden Grant, Yu Hao Wang, Andrew McMurry, Baizhi Song || 11|| TBA || || <br />
|-<br />
|Nov 29 || Qianying Zhao, Hui Huang, Lingyun Yi, Jiayue Zhang, Siao Chen, Rongrong Su, Gezhou Zhang, Meiyu Zhou || 12|| || ||</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=F18-STAT841-Proposal&diff=36609F18-STAT841-Proposal2018-10-06T05:11:50Z<p>Y2748li: </p>
<hr />
<div><br />
'''Use this format (Don’t remove Project 0)'''<br />
<br />
'''Project # 0'''<br />
Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
'''Title:''' Making a String Telephone<br />
<br />
'''Description:''' We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 1'''<br />
Group members:<br />
<br />
Weng, Jiacheng<br />
<br />
Li, Keqi<br />
<br />
Qian, Yi<br />
<br />
Liu, Bomeng<br />
<br />
'''Title:''' RSNA Pneumonia Detection Challenge<br />
<br />
'''Description:''' <br />
<br />
Our team’s project is the RSNA Pneumonia Detection Challenge, a Kaggle competition. The primary goal of this project is to develop a machine learning tool that detects patients with pneumonia from their chest radiographs (CXR). <br />
<br />
Pneumonia is an infection that inflames the air sacs in the lungs, with symptoms such as chest pain, cough, and fever [1]. It can be very dangerous, especially to infants and the elderly: in 2015, 920,000 children under the age of 5 died from the disease [2]. Because it is so often fatal to children, diagnosing pneumonia is a high priority. A common diagnostic method is to obtain a chest radiograph (CXR), a gray-scale X-ray image of the patient’s chest. The region infected by pneumonia usually appears as one or more areas of increased opacity [3] on the CXR. However, many other factors can also increase opacity on a CXR, which makes the diagnosis very challenging. Diagnosis also requires highly skilled clinicians and a great deal of time spent screening CXRs. The Radiological Society of North America (RSNA®) sees an opportunity to use machine learning to accelerate the initial CXR screening process. <br />
<br />
For the scope of this project, our team plans to contribute to solving this problem by applying our machine learning knowledge in image processing and classification. Team members will apply techniques including, but not limited to, logistic regression, random forest, SVM, kNN, and CNN to detect CXRs that show pneumonia.<br />
<br />
<br />
[1] (Accessed 2018, Oct. 4). Pneumonia [Online]. MAYO CLINIC. Available from: https://www.mayoclinic.org/diseases-conditions/pneumonia/symptoms-causes/syc-20354204<br />
[2] (Accessed 2018, Oct. 4). RSNA Pneumonia Detection Challenge [Online]. Kaggle. Available from: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge<br />
[3] Franquet T. Imaging of community-acquired pneumonia. J Thorac Imaging 2018 (epub ahead of print). PMID 30036297<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 2'''<br />
Group members:<br />
<br />
Hou, Zhaoran<br />
<br />
Zhang, Chi<br />
<br />
'''Title:''' <br />
<br />
'''Description:'''<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 3'''<br />
Group members:<br />
<br />
Hanzhen Yang<br />
<br />
Jing Pu Sun<br />
<br />
Ganyuan Xuan<br />
<br />
Yu Su<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:'''<br />
<br />
Our team chose the [https://www.kaggle.com/c/quickdraw-doodle-recognition Quick, Draw! Doodle Recognition Challenge] from the Kaggle Competition. The goal of the competition is to build an image recognition tool that can classify hand-drawn doodles into one of the 340 categories.<br />
<br />
The main challenge of the project is that the training set is very noisy. Hand-drawn artwork may deviate substantially from the actual object and almost certainly differs from person to person. Mislabeled images also present a problem, since they create outlier points when we train our models. <br />
<br />
We plan on learning more about some of the currently mature image recognition algorithms to inspire and develop our own model.<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 4'''<br />
Group members:<br />
<br />
Snaith, Mitchell<br />
<br />
'''Title:''' Reproducibility report: *Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks*<br />
<br />
'''Description:''' <br />
<br />
The paper *Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks* [1] has been submitted to ICLR 2019. It aims to "fix" variational Bayes and turn it into a robust inference tool through two innovations. <br />
<br />
Goals are to: <br />
<br />
- reproduce the deterministic variational inference scheme as described in the paper without referencing the original author's code, providing a 3rd party implementation<br />
<br />
- reproduce experiment results with own implementation, using the same NN framework for reference implementations of compared methods described in the paper<br />
<br />
- reproduce experiment results with the author's own implementation<br />
<br />
- explore other possible applications of variational Bayes besides heteroscedastic regression<br />
<br />
[1] OpenReview location: https://openreview.net/forum?id=B1l08oAct7<br />
<br />
'''Project # 5'''<br />
Group members:<br />
<br />
Rebecca, Chen<br />
<br />
Susan,<br />
<br />
Mike, Li<br />
<br />
Ted, Wang<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:''' <br />
<br />
Classification has attracted more and more attention, especially with the rise of machine learning in recent years. Our team is particularly interested in machine learning algorithms that are optimized for a specific type of image classification. <br />
<br />
In this project, we will dig into the base classifiers we learned in class and try to combine them to find an optimal solution for a certain type of image dataset. Currently, we are looking into a dataset from Kaggle: the Quick, Draw! Doodle Recognition Challenge. The dataset in this competition contains 50M drawings across 340 categories. It is a subset of the world’s largest doodling dataset, which is continually updated by real players of the drawing game. Anyone can contribute by joining it (quickdraw.withgoogle.com).<br />
<br />
As machine learning students, we are eager to help find a better classification method. By “better”, we mean one that balances simplicity and accuracy. We will start with neural networks that use different activation functions in each layer, and we will also combine base classifiers through bagging, random forests, and boosting for ensemble learning. We will also regularize our parameters to avoid overfitting to the training dataset. Finally, we will summarize the features of this type of image dataset, formulate our solutions, and standardize our steps for solving this kind of problem. <br />
<br />
Hopefully, we can not only finish our project successfully, but also make a small contribution to the machine learning research field.<br />
<br />
--------------------------------------------------------------------</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=F18-STAT841-Proposal&diff=36608F18-STAT841-Proposal2018-10-06T05:10:26Z<p>Y2748li: </p>
<hr />
<div><br />
'''Use this format (Don’t remove Project 0)'''<br />
<br />
'''Project # 0'''<br />
Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
'''Title:''' Making a String Telephone<br />
<br />
'''Description:''' We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 1'''<br />
Group members:<br />
<br />
Weng, Jiacheng<br />
<br />
Li, Keqi<br />
<br />
Qian, Yi<br />
<br />
Liu, Bomeng<br />
<br />
'''Title:''' RSNA Pneumonia Detection Challenge<br />
<br />
'''Description:''' <br />
<br />
Our team’s project is the RSNA Pneumonia Detection Challenge, a Kaggle competition. The primary goal of this project is to develop a machine learning tool that detects patients with pneumonia from their chest radiographs (CXR). <br />
<br />
Pneumonia is an infection that inflames the air sacs in the lungs, with symptoms such as chest pain, cough, and fever [1]. It can be very dangerous, especially to infants and the elderly: in 2015, 920,000 children under the age of 5 died from the disease [2]. Because it is so often fatal to children, diagnosing pneumonia is a high priority. A common diagnostic method is to obtain a chest radiograph (CXR), a gray-scale X-ray image of the patient’s chest. The region infected by pneumonia usually appears as one or more areas of increased opacity [3] on the CXR. However, many other factors can also increase opacity on a CXR, which makes the diagnosis very challenging. Diagnosis also requires highly skilled clinicians and a great deal of time spent screening CXRs. The Radiological Society of North America (RSNA®) sees an opportunity to use machine learning to accelerate the initial CXR screening process. <br />
<br />
For the scope of this project, our team plans to contribute to solving this problem by applying our machine learning knowledge in image processing and classification. Team members will apply techniques including, but not limited to, logistic regression, random forest, SVM, kNN, and CNN to detect CXRs that show pneumonia.<br />
<br />
<br />
[1] (Accessed 2018, Oct. 4). Pneumonia [Online]. MAYO CLINIC. Available from: https://www.mayoclinic.org/diseases-conditions/pneumonia/symptoms-causes/syc-20354204<br />
[2] (Accessed 2018, Oct. 4). RSNA Pneumonia Detection Challenge [Online]. Kaggle. Available from: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge<br />
[3] Franquet T. Imaging of community-acquired pneumonia. J Thorac Imaging 2018 (epub ahead of print). PMID 30036297<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 2'''<br />
Group members:<br />
<br />
Hou, Zhaoran<br />
<br />
Zhang, Chi<br />
<br />
'''Title:''' <br />
<br />
'''Description:'''<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 3'''<br />
Group members:<br />
<br />
Hanzhen Yang<br />
<br />
Jing Pu Sun<br />
<br />
Ganyuan Xuan<br />
<br />
Yu Su<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:'''<br />
<br />
Our team chose the [https://www.kaggle.com/c/quickdraw-doodle-recognition Quick, Draw! Doodle Recognition Challenge] from the Kaggle Competition. The goal of the competition is to build an image recognition tool that can classify hand-drawn doodles into one of the 340 categories.<br />
<br />
The main challenge of the project is that the training set is very noisy. Hand-drawn artwork may deviate substantially from the actual object and almost certainly differs from person to person. Mislabeled images also present a problem, since they create outlier points when we train our models. <br />
<br />
We plan on learning more about some of the currently mature image recognition algorithms to inspire and develop our own model.<br />
<br />
--------------------------------------------------------------------<br />
<br />
'''Project # 4'''<br />
Group members:<br />
<br />
Snaith, Mitchell<br />
<br />
'''Title:''' Reproducibility report: *Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks*<br />
<br />
'''Description:''' <br />
<br />
The paper *Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks* [1] has been submitted to ICLR 2019. It aims to "fix" variational Bayes and turn it into a robust inference tool through two innovations. <br />
<br />
Goals are to: <br />
<br />
- reproduce the deterministic variational inference scheme as described in the paper without referencing the original author's code, providing a 3rd party implementation<br />
<br />
- reproduce experiment results with own implementation, using the same NN framework for reference implementations of compared methods described in the paper<br />
<br />
- reproduce experiment results with the author's own implementation<br />
<br />
- explore other possible applications of variational Bayes besides heteroscedastic regression<br />
<br />
[1] OpenReview location: https://openreview.net/forum?id=B1l08oAct7<br />
<br />
'''Project # 5'''<br />
Group members:<br />
<br />
Rebecca, Chen<br />
<br />
Susan,<br />
<br />
Mike, Li<br />
<br />
Ted, Wang<br />
<br />
'''Title:''' Kaggle Challenge: Quick, Draw! Doodle Recognition Challenge<br />
<br />
'''Description:''' <br />
<br />
Classification has attracted more and more attention, especially with the rise of machine learning in recent years. Our team is particularly interested in machine learning algorithms that are optimized for a specific type of image classification. <br />
<br />
In this project, we will dig into the base classifiers we learned in class and try to combine them to find an optimal solution for a certain type of image dataset. Currently, we are looking into a dataset from Kaggle: the Quick, Draw! Doodle Recognition Challenge. The dataset in this competition contains 50M drawings across 340 categories. It is a subset of the world’s largest doodling dataset, which is continually updated by real players of the drawing game. Anyone can contribute by joining it (quickdraw.withgoogle.com).<br />
<br />
As machine learning students, we are eager to help find a better classification method. By “better”, we mean one that balances simplicity and accuracy. We will start with neural networks that use different activation functions in each layer, and we will also combine base classifiers through bagging, random forests, and boosting for ensemble learning. We will also regularize our parameters to avoid overfitting to the training dataset. Finally, we will summarize the features of this type of image dataset, formulate our solutions, and standardize our steps for solving this kind of problem. <br />
<br />
Hopefully, we can not only finish our project successfully, but also make a small contribution to the machine learning research field.</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=36573stat441F182018-10-04T22:28:06Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || || 1|| || || <br />
|-<br />
|Nov 13 || || 2|| || || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| Will be added soon|| || <br />
|-<br />
|Nov 15 || Eric, Mike, Rebecca, Susan|| 4|| Will be added soon|| || <br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent || 6|| Will be added soon || || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai || 7|| Will be added soon || || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su|| 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam|| 10|| TBA || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu || 11|| TBA || || <br />
|-<br />
|Nov 29 || || 12|| || ||</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=36572stat441F182018-10-04T22:27:29Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || || 1|| || || <br />
|-<br />
|Nov 13 || || 2|| || || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| Will be added soon|| || <br />
|-<br />
|Nov 15 || Eric, Yiming Li, Rebecca, Susan|| 4|| Will be added soon|| || <br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent || 6|| Will be added soon || || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai || 7|| Will be added soon || || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su|| 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam|| 10|| TBA || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu || 11|| TBA || || <br />
|-<br />
|Nov 29 || || 12|| || ||</div>Y2748lihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F18&diff=36571stat441F182018-10-04T22:26:52Z<p>Y2748li: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F18-STAT841-Proposal| Project Proposal ]] ==<br />
<br />
[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]<br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="100pt"|Name <br />
|width="30pt"|Paper number <br />
|width="700pt"|Title<br />
|width="30pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|-<br />
|Feb 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [http://wikicoursenote.com/wiki/Stat946f15/Sequence_to_sequence_learning_with_neural_networks#Long_Short-Term_Memory_Recurrent_Neural_Network Summary]<br />
|-<br />
|Nov 13 || || 1|| || || <br />
|-<br />
|Nov 13 || || 2|| || || <br />
|-<br />
|Nov 15 || Yan Yu Chen, Qisi Deng, Hengxin Li, Bochao Zhang|| 3|| Will be added soon|| || <br />
|-<br />
|Nov 15 || Eric, Yiming Li, Rebecca, Susan|| 4|| Will be added soon|| || <br />
|-<br />
|Nov 20 || Kristi Brewster, Isaac McLellan, Ahmad Nayar Hassan, Marina Medhat Rassmi Melek || 5|| A Neural Representation of Sketch Drawings || || <br />
|-<br />
|Nov 20 || Maya (Mahdiyeh) Bayati, Saber Malekmohammadi, Vincent || 6|| Will be added soon || || <br />
|-<br />
|Nov 22 || Qingxi Huo, Yanmin Yang, Jiaqi Wang, Yuanjing Cai || 7|| Will be added soon || || <br />
|-<br />
|Nov 22 || Hanzhen Yang, Jing Pu Sun, Ganyuan Xuan, Yu Su|| 8|| Deep Residual Learning for Image Recognition || [http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || <br />
|-<br />
|Nov 27 || Mitchell Snaith || 9|| You Only Look Once: Unified, Real-Time Object Detection, V1 -> V3 || [https://arxiv.org/pdf/1506.02640.pdf Paper] || <br />
|-<br />
|Nov 27 || Qi Chu, Gloria Huang, Dylan Sang, Amanda Lam|| 10|| TBA || || <br />
|-<br />
|Nov 29 || Jameson Ngo, Amy Xu || 11|| TBA || || <br />
|-<br />
|Nov 29 || || 12|| || ||</div>Y2748li