Search results

  • Estimating parameters of a univariate Gaussian: Assuming the prior is Gaussian: ...
    5 KB (870 words) - 09:45, 30 August 2017
  • ...plication of LSH is large scale nearest neighbour search/classification. A large database of objects (e.g. images) can be partitioned into disjoint buckets ...athcal{D}</math>. The central limit theorem tells us that for sufficiently large <math>t</math>, the random vector ...
    17 KB (2,894 words) - 09:46, 30 August 2017
  • ...mall, the learning algorithm converges very slowly. However, when Z is too large, the algorithm becomes unstable and diverges. ...benchmarking function approximation and classification problems including large complex applications show that the new algorithm can produce best generaliz ...
    10 KB (1,620 words) - 17:50, 9 November 2018
  • * For large-size networks, most local minima are equivalent and yield similar performan ...minimum (i.e. one with a large value in terms of the loss function) may be large for small-size networks, but decreases quickly with network size. ...
    13 KB (2,168 words) - 09:46, 30 August 2017
  • ...stribution of impact in a receptive field distributes as a Gaussian. Since Gaussian distributions generally decay quickly from the center, the effective recept ...becomes large, binomial coefficients distribute with respect to $t$ like a Gaussian distribution. More specifically, when $n \to \infty$ we can write ...
    27 KB (4,400 words) - 15:12, 7 November 2017
  • ...rred to as functional regularisation for Continual Learning, leverages the Gaussian process to construct an approximate posterior belief over the underlying ta ...1) Deciding which data to store often remains heuristic; 2) It requires a large quantity of stored data to achieve good performance. ...
    26 KB (4,302 words) - 23:25, 7 December 2020
  • ...X} </math> by computing <math> \mathbf D[{\mathbf f}(x^a|W),{\mathbf f}(x^b|W)]</math>, where <math> {\mathbf f}(x|W)</math> represents the mapping fun ...he same class and making examples from different classes be separated by a large margin. All these methods rely on linear transformation, which has a limite ...
    20 KB (3,263 words) - 09:45, 30 August 2017
  • This paper describes the use of a neural network language model for large vocabulary continuous speech recognition. ...lion words. It is also shown that this approach can be incorporated into a large vocabulary ...
    15 KB (2,517 words) - 09:46, 30 August 2017
  • ...ing gradient-descent. The approach has been illustrated on localization of large scale sensor networks.<br /> ...he quadratic coefficients in the objective function while the vector <math>b \,</math> collects all the linear coefficients. Note that the matrix <math> ...
    12 KB (1,953 words) - 09:45, 30 August 2017
  • ...ric. A classical similarity matrix for clustering is the diagonally-scaled Gaussian similarity, defined as ... For two subsets of <math>A,B\subset X</math>, we define ...
    35 KB (5,767 words) - 09:45, 30 August 2017
  • ...hat most of the information content of a signal lies in a few samples with large magnitude. This led to the study and investigation of a class of signals, know ...sible signals in a compressed way? Is there any method to sense only those large-value coefficients? In parallel works by Donoho <ref name="R1"> D. Donoho, ...
    23 KB (3,784 words) - 09:45, 30 August 2017
  • ...X, it can be imagined as <math>\,\{Bv | v \in R^d \}</math> where <math>\,B \in R^{n \times d}</math> and <math>\,d \leq n</math> are the dimensions of ...B^T X | B^T X </math> <math>\, \Longleftrightarrow Y \perp (X - B^T X) | B^T X </math>. ...
    26 KB (4,280 words) - 09:45, 30 August 2017
  • ...arning. Low-rank matrix decomposition produces a compact representation of large matrices, which is the key to scaling up a great variety of kernel learning ...the ith eigenvector and eigenvalue of <math>W</math>. In practice, given a large dataset, the Nystrom method selects <math>m</math> landmark points <math>Z< ...
    16 KB (2,675 words) - 09:46, 30 August 2017
  • ...he two existing Gaussian distributions and the outcome is provided by that Gaussian distribution. ...s with a Markov chain dynamics. The expectation function is a multivariate Gaussian function with the chain output as the means, and a covariance matrix repres ...
    18 KB (2,835 words) - 09:46, 30 August 2017
  • ...The signal will be compressible if the above representation has just a few large coefficients and many small coefficients. We shall now briefly overview how *The initial number of samples <math>\,N</math> may be very large even if the desired <math>\ K</math> is small. ...
    18 KB (2,888 words) - 09:45, 30 August 2017
  • ...arch. This impressive breakthrough was awarded first place in the ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC 2015). ...a chain rule expansion of parameters at deeper layers. When a network has a large number of layers, the gradient tends to vanish or explode during back propa ...
    19 KB (2,963 words) - 14:42, 22 November 2018
  • ...recognition systems. DNNs have been shown to outperform GMMs in both small and large vocabulary speech recognition tasks. ...ines (RBMs) are used for pretraining, except for the first layer, which uses a Gaussian-Bernoulli RBM (GRBM) since the input is real-valued. ...
    24 KB (3,699 words) - 09:46, 30 August 2017
  • ...rained from scratch for each new function. While Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a ne ...distribution over functions (stochastic processes) is another approach -- Gaussian Processes being a commonly used example of this. Such Bayesian methods can ...
    32 KB (4,970 words) - 00:26, 17 December 2018
  • ...stribution. The generated output of these networks is trained to match the Gaussian distribution by minimizing a given loss function. Using this idea, previous ...set consisting of 70k vector sketches along with pixel images was used for large-scale exploration of human sketches. The ShadowDraw system that used 30k ra ...
    30 KB (4,807 words) - 00:40, 17 December 2018
  • ...that when we are far away from the minima, it is beneficial for us to take large steps towards the minima, as it would require fewer steps to c ...er of parameter updates in training a model. This can be achieved by using large batch training, which can be divided across many machines. ...
    27 KB (4,025 words) - 13:28, 17 December 2018
  • ...hniques for preventing overfitting in deep neural networks, which contain a large number of parameters. The key idea is to randomly drop units from the neura ...m layer <math> l </math>. <math>\ \bold{W}^{(l)} </math> and <math>\ \bold{b}^{(l)} </math> are the weights and biases at layer <math>l </math>. With dr ...
    13 KB (2,182 words) - 09:46, 30 August 2017
  • ...display="inline">N</math> the training set size and <math display="inline">B</math> the batch size. Consequently the optimum batch size is proportional ...ture is small may generalize better than “sharp minima” whose curvature is large (Chaudhari et al., 2016; Hochreiter & Schmidhuber, 1997). Indeed, Dziugaite ...
    34 KB (5,220 words) - 20:32, 10 December 2018
  • ...ne node of the Markov network is assigned to each patch. Figure [[File:MRF1.jpg|thumb|right|Fig.1 Markov network for vision problems. Each node in the netw ... For large networks, the computations of Eq(2) and Eq(3) are infeasible to evaluate dire ...
    18 KB (3,001 words) - 09:46, 30 August 2017
  • ...ifferent sample size (Fig. 1(a)) and different intrinsic dimension (Fig. 1(b)). The result is shown in figure 1. [[File:GarciaF1.jpg]] ...
    15 KB (2,484 words) - 09:46, 30 August 2017
  • ...t and medium scale (~500ms) signals, but rely on external conditioning for long-term dependencies; the proposed model removes the need for external conditi ..., which is a large data set of musical notes inspired by the emergence of large image data sets. This data set serves as a great training/test resource fo ...
    18 KB (2,701 words) - 00:19, 21 April 2018
  • == Set B == ...Component Analysis: Generalizing PCA for more flexible inference in linear-Gaussian models |Summary]] ...
    29 KB (4,816 words) - 09:46, 30 August 2017
  • ..., labeling patches requires specialized annotators; a prohibitive task at large scale. ...es, then <math>P(y_i\ |\ x_{i,j}\ ;\ \theta)</math> is denoised by using a Gaussian kernel for finding <math>P\left(H_{i,j}\,|\,X\right)</math>. The results in the ...
    16 KB (2,470 words) - 14:07, 19 November 2021
  • ...of representation learning were based on supervised approaches, which used large labeled datasets to achieve impressive results. On the other hand, popular ...sults hold for the random decoders as shown by the authors in the appendix B.1. ...
    30 KB (4,923 words) - 19:25, 10 December 2018
  • ...researchers need to put a lot of effort into creating their method in the large design space. AM-LFS takes an optimization approach for selecting hyperpara ..., <math>B</math> models are generated with rewards <math>R(a_i), i \in [1, B]</math>. <math>\mu</math> updates after each epoch from the reward function ...
    26 KB (4,157 words) - 09:51, 15 December 2020
  • ...mages per "synonym set" or "synset" in the WordNet hierarchy. WordNet is a large lexical database of English where nouns, verbs, adjectives and adverbs are ...of their approach. I would be really curious to test their methodology in large models, since adding self-attention layers to models improves robustness. To test ...
    11 KB (1,652 words) - 18:44, 6 December 2020
  • ...ions. The end product is a robust neural network policy that can imitate a large and diverse set of behaviors using few training demonstrations. ** They require large training datasets in order to work for non-trivial tasks ...
    20 KB (3,075 words) - 01:17, 7 April 2018
  • ...ing to be further trained. However, finding these lottery tickets inside a large neural network is computationally expensive (NP hard in general). ...> denotes the Hadamard product so that <math display="inline">\left[A\circ B\right]_{ij}=A_{ij}B_{ij};\ i\in\{1,...,m\},\ j\in\{1,...,n\}</math>. ...
    28 KB (4,367 words) - 00:30, 23 November 2021
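The Hadamard product defined in the excerpt above, <math>\left[A\circ B\right]_{ij}=A_{ij}B_{ij}</math>, is plain element-wise multiplication; a minimal NumPy sketch (these matrices and the pruning mask are illustrative, not values from the paper):

```python
import numpy as np

# Hadamard product: [A ∘ B]_ij = A_ij * B_ij (element-wise, same shape).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[10.0, 0.0],
              [0.0, 10.0]])
H = A * B  # NumPy's * on equal-shaped arrays is element-wise
# H == [[10, 0], [0, 40]]

# In lottery-ticket-style pruning, the same operation applies a binary
# mask to a weight matrix, zeroing the pruned connections while keeping
# the surviving weights intact.
mask = np.array([[1.0, 0.0],
                 [1.0, 1.0]])
pruned = A * mask  # [[1, 0], [3, 4]]
```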
  • ...<math>\mathbf{W}_{xh}</math> is the input-hidden weight matrix), and the offset <math>b</math> terms are bias vectors with appropriate subscripts (e.g. a ... <math>\cdots + \mathbf{W}_{hc}\,\mathbf{h}_{t-1} + \mathbf{b}_{c}</math> ...
    25 KB (3,828 words) - 09:46, 30 August 2017
  • Curiously, much of this approach focuses on approximating gradients and then using f ...ne">y</math> from the uniform distribution on <math display="inline">B_x = B\left( x, \frac{r}{\sqrt{n}} \right) </math> satisfies ...
    11 KB (1,754 words) - 22:06, 9 December 2020
  • *involves four parameters: integers <math>\,a, b, m</math>, and an initial value <math>\,x_0</math> which we call the seed :<math>x_{k+1} \equiv (ax_{k} + b) \mod{m}</math> ...
    139 KB (23,688 words) - 09:45, 30 August 2017
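The linear congruential recurrence quoted above can be sketched directly; the parameter values below are illustrative (chosen to satisfy the full-period Hull–Dobell conditions), not taken from the summary:

```python
from itertools import islice

def lcg(a, b, m, seed):
    """Yield pseudo-random integers via x_{k+1} = (a*x_k + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

# Illustrative parameters: a=5, b=3, m=16 yield the full period of 16.
print(list(islice(lcg(5, 3, 16, 1), 5)))  # → [8, 11, 10, 5, 12]
```

With m = 16, the sequence visits all 16 residues before repeating, which is why the seed only shifts the starting point of the cycle rather than changing it.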
  • When there is a very large number of data points <math>\,n</math>, and a very small portion of them totalling ... (Series B), 58:267–288, 1996.</ref>, the lasso penalty function is a more appropriate ...
    17 KB (2,679 words) - 09:45, 30 August 2017
  • ...is the largest for a specific combination of <math> A </math> and <math> B </math>. ...age to the exact algorithms approach is that for large graphs, which have a large number of nodes, these algorithms take a long time to produce a result. When ...
    100 KB (18,249 words) - 09:45, 30 August 2017
  • ...that generates a synthetic image given a noise vector drawn from either a Gaussian or Uniform distribution. The discriminator is tasked with classifying image ... # Identifying large scale stock market movements/patterns by adding RNN layers to the GAN ...
    18 KB (2,781 words) - 12:35, 4 December 2017
  • ...t learning as they let us design generative models of data and fit them to large data-sets. ...AE a generative model, the generated latent vectors should roughly follow a Gaussian distribution as shown in Fig.2. This allows the user to generate an output ...
    25 KB (4,196 words) - 01:32, 14 November 2018
  • ...l network, takes as input a random ''latent'' vector (typically uniform or Gaussian) and synthesizes novel images resembling the real images (training set). Th # Task B: Learning a joint distribution of a digit and its negative image. ...
    32 KB (4,965 words) - 15:02, 4 December 2017
  • ...p state-action pairs. The disadvantage of BC is that the training requires large amounts of expert data, which is hard to obtain. In addition, an agent trai ...es each action since the transition function to move from state A to state B is not learned. ...
    24 KB (3,880 words) - 23:00, 20 April 2018
  • [[File:Det_vs_sto.jpg]] ...itial position and pattern starts to repeat, but if we make the number set large enough we can prevent the numbers from repeating too early. Although the ps ...
    370 KB (63,356 words) - 09:46, 30 August 2017
  • ...ntensive studies of methods using AI to search for the optimal solution of large-scale, zero-sum and extensive form problems. However, most of these works o ...ors found that the best way to implement the module was to use a medium to large batch size, RMSProp, or Adam optimizers with a learning rate between <math> ...
    25 KB (4,131 words) - 23:55, 6 December 2020
  • ...le, despite these, BNNs do not scale to state-of-the-art techniques or large data sets. There are techniques to explicitly avoid modeling the full weigh ...single best configuration (one-hot encoding). Specifically, BNNs with the Gaussian weight prior $$F_x(y) = \mathcal{N}(0, T^{-1} I)$$ have a score of configuration <math>W ...
    29 KB (4,651 words) - 10:57, 15 December 2020
  • ...ve Congruential Method'''. This involves three integer parameters ''a'', ''b'', and ''m'', and a '''seed''' variable ''x<sub>0</sub>''. This method dete :<math>x_{i+1} = (ax_{i} + b) \mod{m}</math> ...
    145 KB (24,333 words) - 09:45, 30 August 2017
  • # It has a saturation region, so it can dampen variances that are too large; ...e number of nodes is common, so <math display="inline">n</math> is usually large, and by the Central Limit Theorem, <math display="inline">z</math> approach ...
    45 KB (6,836 words) - 23:26, 20 April 2018
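The Central Limit Theorem argument in the excerpt above (a pre-activation <math>z</math>, as a sum over many incoming nodes, approaches a Gaussian) can be checked empirically; the fan-in, weight scale, and sample count here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000          # fan-in (number of incoming nodes) -- illustrative
samples = 100_000

# Each pre-activation z is a sum of n weighted inputs.
x = rng.uniform(-1.0, 1.0, size=(samples, n))
w = rng.uniform(-0.1, 0.1, size=n)
z = x @ w

# For a Gaussian, standardized skewness is 0; with large n the
# empirical skewness of z should be close to 0.
zs = (z - z.mean()) / z.std()
skew = np.mean(zs ** 3)
```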
  • Image:Numerical example of PCA.jpg|Finding two principal components of original data in 2D space. Components o ...en reconstruct the picture using the first d principal components. If d is too large, we cannot completely remove the noise. If it is too small, we will lose s ...
    220 KB (37,901 words) - 09:46, 30 August 2017
  • Also let <math>B = \{2\},\ X_B = \{X_2\}</math> so we can write <math>A \longrightarrow B</math>: <math>A\,\!</math> "causes" <math>B\,\!</math>. ...
    162 KB (28,558 words) - 09:45, 30 August 2017
  • ...an take and/or the number of possible game states is finite. Deep CNNs for large, non-convex continuous action spaces are not directly applicable. To solve ...s chosen as a domain to test the network on. Curling was chosen due to its large action space, the potential for complicated strategies, and the need for pr ...
    35 KB (5,619 words) - 18:39, 10 December 2018
  • ...assic Machine Learning techniques such as k-nearest neighbors (KNNs) [15], Gaussian Processes [16], and Support Vector Machines [17]. However, these models wer ...y-based methods tend to achieve lower accuracies when shown a large open area in the images, as the authors note. The authors put forth ...
    16 KB (2,430 words) - 18:30, 16 December 2018