Search results

  • Estimating parameters of a univariate Gaussian: Assuming the prior is Gaussian: ...
    5 KB (870 words) - 09:45, 30 August 2017
  • ...plication of LSH is large scale nearest neighbour search/classification. A large database of objects (e.g. images) can be partitioned into disjoint buckets ...athcal{D}</math>. The central limit theorem tells us that for sufficiently large <math>t</math>, the random vector ...
    17 KB (2,894 words) - 09:46, 30 August 2017
  • ...mall, the learning algorithm converges very slowly. However, when Z is too large, the algorithm becomes unstable and diverges. ...benchmarking function approximation and classification problems including large complex applications show that the new algorithm can produce best generaliz ...
    10 KB (1,620 words) - 17:50, 9 November 2018
  • * For large-size networks, most local minima are equivalent and yield similar performan ...minimum (i.e. one with a large value in terms of the loss function) may be large for small-size networks, but decreases quickly with network size. ...
    13 KB (2,168 words) - 09:46, 30 August 2017
  • ...stribution of impact in a receptive field distributes as a Gaussian. Since Gaussian distributions generally decay quickly from the center, the effective recept ...becomes large, binomial coefficients distribute with respect to $t$ like a Gaussian distribution. More specifically, when $n \to \infty$ we can write ...
    27 KB (4,400 words) - 15:12, 7 November 2017
  • ...rred to as functional regularisation for Continual Learning, leverages the Gaussian process to construct an approximate posterior belief over the underlying ta ...1) Deciding which data to store often remains heuristic; 2) It requires a large quantity of stored data to achieve good performance. ...
    26 KB (4,302 words) - 23:25, 7 December 2020
  • ...X} </math> by computing <math> \mathbf D[{\mathbf f}(x^a|W),{\mathbf f}(x^b|W)]</math>, where <math> {\mathbf f}(x|W)</math> represents the mapping fun ...he same class and making examples from different classes be separated by a large margin. All these methods rely on linear transformation, which has a limite ...
    20 KB (3,263 words) - 09:45, 30 August 2017
  • This paper describes the use of a neural network language model for large vocabulary continuous speech recognition. ...lion words. It is also shown that this approach can be incorporated into a large vocabulary ...
    15 KB (2,517 words) - 09:46, 30 August 2017
  • ...ing gradient-descent. The approach has been illustrated on localization of large scale sensor networks.<br /> ...he quadratic coefficients in the objective function while the vector <math>b \,</math> collects all the linear coefficients. Note that the matrix <math> ...
    12 KB (1,953 words) - 09:45, 30 August 2017
  • ...ric. A classical similarity matrix for clustering is the diagonally-scaled Gaussian similarity, defined as For two subsets of <math>A,B\subset X</math>, we define ...
    35 KB (5,767 words) - 09:45, 30 August 2017
  • ...hat most of the information content of a signal lies in a few samples with large magnitude. This led to the study and investigation of a class of signals, know ...sible signals in a compressed way? Is there any method to sense only those large value coefficients? In parallel works by Donoho <ref name="R1"> D. Donoho, ...
    23 KB (3,784 words) - 09:45, 30 August 2017
  • ...X, it can be imagined as <math>\,\{Bv | v \in R^d \}</math> where <math>\,B \in R^{n \times d}</math> and <math>\,d \leq n</math> are the dimensions of ...B^T X | B^T X </math> <math>\, \Longleftrightarrow Y \perp (X - B^T X) | B^T X </math>. ...
    26 KB (4,280 words) - 09:45, 30 August 2017
  • ...arning. Low-rank matrix decomposition produces a compact representation of large matrices, which is the key to scaling up a great variety of kernel learning ...the ith eigenvector and eigenvalue of <math>W</math>. In practice, given a large dataset, the Nystrom method selects <math>m</math> landmark points <math>Z< ...
    16 KB (2,675 words) - 09:46, 30 August 2017
  • ...he two existing Gaussian distributions and the outcome is provided by that Gaussian distribution. ...s with a Markov chain dynamics. The expectation function is a multivariate Gaussian function with the chain output as the means, and a covariance matrix repres ...
    18 KB (2,835 words) - 09:46, 30 August 2017
  • ...The signal will be compressible if the above representation has just a few large coefficients and many small coefficients. We shall now briefly overview how *The initial number of samples <math>\,N</math> may be very large even if the desired <math>\ K</math> is small. ...
    18 KB (2,888 words) - 09:45, 30 August 2017
  • ...arch. This impressive breakthrough was awarded first place in the ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC 2015).</p> ...a chain rule expansion of parameters at deeper layers. When a network has large number of layers, the gradient tends to vanish or explode during back propa ...
    19 KB (2,963 words) - 14:42, 22 November 2018
  • ...recognition systems. DNNs have been shown to outperform GMMs in both small and large vocabulary speech recognition tasks. ...ines (RBMs) are used for pretraining except for the first layer, which uses a Gaussian-Bernoulli RBM (GRBM) since the input is real-valued. ...
    24 KB (3,699 words) - 09:46, 30 August 2017
  • ...rained from scratch for each new function. While Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a ne ...distribution over functions (stochastic processes) is another approach -- Gaussian Processes being a commonly used example of this. Such Bayesian methods can ...
    32 KB (4,970 words) - 00:26, 17 December 2018
  • ...stribution. The generated output of these networks is trained to match the Gaussian distribution by minimizing a given loss function. Using this idea, previous ...set consisting of 70k vector sketches along with pixel images was used for large-scale exploration of human sketches. The ShadowDraw system that used 30k ra ...
    30 KB (4,807 words) - 00:40, 17 December 2018
  • ...that when we are far away from the minima, it is beneficial for us to take large steps towards the minima, as it would require fewer steps to c ...er of parameter updates in training a model. This can be achieved by using large batch training, which can be divided across many machines. ...
    27 KB (4,025 words) - 13:28, 17 December 2018