Search results

Page title matches

Learning to Teach
This is a summary of the paper titled: "Learning to Teach", authored by Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, and Tie-Yan ...ent, determining the appropriate data, loss function, and hypothesis space to facilitate the learning of the student model. ...

21 KB (3,351 words) - 18:40, 16 December 2018
Sandbox to test w2l
#REDIRECT [[sandbox to test w2l]] ...

33 bytes (6 words) - 09:46, 30 August 2017
sandbox to test w2l

54 bytes (12 words) - 09:46, 30 August 2017
Learning What and Where to Draw
...label or a non-localized caption. The authors of 'Learning What and Where to Draw' believe that image synthesis will be drastically enhanced by incorpor ...scription what each image is intended to depict. The proposed model learns to perform location and content-controllable image synthesis on the Caltech-UC ...

18 KB (2,781 words) - 12:35, 4 December 2017
link to my paper
</ref>. Now we turn to ...

204 bytes (22 words) - 09:45, 30 August 2017
From Variational to Deterministic Autoencoders
...er are able to generate samples that are comparable or better when applied to domains of images and structured objects. The authors point to several drawbacks currently associated with VAEs including: ...

15 KB (2,313 words) - 19:11, 2 December 2020
Link to my paper
#REDIRECT [[link to my paper]] ...

30 bytes (5 words) - 09:45, 30 August 2017
Pixels to Graphs by Associative Embedding
...ons between them. An explicit representation of this semantics is referred to as a scene graph where we represent objects grounded in the scene as vertic ...all of the objects in the scene, then isolate individual pairs of objects to identify the relationships between them. This breakdown often restricts the ...

17 KB (2,749 words) - 18:26, 16 December 2018
Learning to Navigate in Cities Without a Map
[https://arxiv.org/pdf/1804.00168.pdf Learning to Navigate in Cities Without a Map] ...forcement learning (RL), it suffers from data inefficiency and sensitivity to changes in the environment. Thus, it is unclear whether this method could b ...

28 KB (4,494 words) - 00:24, 17 December 2018
learning Spectral Clustering, With Application To Speech Separation
...hat generalizes to the unseen datasets when spectral clustering is applied to them. Traditional spectral clustering techniques assume a metric or a simil Clustering refers to partitioning a given dataset into clusters such that data points in the same c ...

35 KB (5,767 words) - 09:45, 30 August 2017
Towards Deep Learning Models Resistant to Adversarial Attacks
This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmid ...e|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.[https://arxiv.org/abs/1412.6572 Source]]] ...

14 KB (2,192 words) - 03:01, 23 November 2018
Learning Spectral Clustering, With Application To Speech Separation
#REDIRECT [[learning Spectral Clustering, With Application To Speech Separation]] ...

81 bytes (9 words) - 09:45, 30 August 2017
DREAM TO CONTROL: LEARNING BEHAVIORS BY LATENT IMAGINATION
...ue function, directly on latent state samples which help to enable scaling to more complex tasks. ...omes with using finite imagination horizons. The authors have also managed to demonstrate empirical performance for visual control by evaluating the mode ...

13 KB (2,072 words) - 06:07, 10 December 2020
Summary of A Probabilistic Approach to Neural Network Pruning
...proposes that the subnetworks can achieve similar accuracy without having to be further trained. However, finding these lottery tickets inside a large n ...theoretical guarantees of pruning. This study, ''A Probabilistic Approach to Neural Network Pruning'' by Xin Qian and Diego Klabjan [18], focuses on the ...

28 KB (4,367 words) - 00:30, 23 November 2021
A Game Theoretic Approach to Class-wise Selective Rationalization
...alternative conclusions. Each class consists of three players who compete to find evidence for both factual and counterfactual circumstances. In a simpl ...ng explanations for a specific class by probing the importance with regard to the relevant class logit. ...

11 KB (1,594 words) - 13:14, 25 November 2021
Neural Machine Translation: Jointly Learning to Align and Translate
#REDIRECT [[neural Machine Translation: Jointly Learning to Align and Translate]] ...

81 bytes (10 words) - 09:46, 30 August 2017
rOBPCA: A New Approach to Robust Principal Component Analysis
...ix. Since the classical estimation of the covariance matrix is very sensitive to the presence of outliers, it is not surprising that the principal component ...to show that a Bayesian robust estimator may be an alternative choice compared to classical robust estimators. ...

15 KB (2,414 words) - 09:46, 30 August 2017
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
...n from Tel Aviv University. This paper is part of the NIPS 2018 conference to be hosted in December 2018 at Montréal, Canada. This paper summary is based ...framework for capturing such effects is structured prediction, which seeks to predict structured objects (such as graphs with nodes and edges) rather tha ...

29 KB (4,603 words) - 21:21, 6 December 2018
neural Machine Translation: Jointly Learning to Align and Translate
...hod is more effective compared to other neural network models when applied to long sentences. ...word. The decoder then selectively combines the most relevant annotations to generate each target word; this implements a mechanism of attention in the ...

14 KB (2,221 words) - 09:46, 30 August 2017
stat946F18/Beyond Word Importance Contextual Decomposition to Extract Interactions from LSTMs
...for analyzing individual predictions made by the LSTMs without any change to the underlying original model. The problem of sentiment analysis is chosen ...n domain, this paper shows how the contextual decomposition method is used to successfully extract positive and negative negations from an LSTM. This pap ...

31 KB (5,069 words) - 18:21, 16 December 2018
ROBPCA: A New Approach to Robust Principal Component Analysis
#REDIRECT [[rOBPCA: A New Approach to Robust Principal Component Analysis]] ...

75 bytes (10 words) - 09:46, 30 August 2017
A Rank Minimization Heuristic with Application to Minimum Order System Approximation
#REDIRECT [[a Rank Minimization Heuristic with Application to Minimum Order System Approximation]] ...

98 bytes (12 words) - 09:45, 30 August 2017
A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
#REDIRECT [[a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization]] ...

105 bytes (12 words) - 09:45, 30 August 2017
STAT946F17/ Teaching Machines to Describe Images via Natural Language Feedback
...rd in that we can easily point to where the mistakes occur and suggest how to correct them. ...n also be seen as a multimodal problem where the whole network/model needs to combine the solution space of learning in both the image processing and tex ...

23 KB (3,760 words) - 10:33, 4 December 2017
a Rank Minimization Heuristic with Application to Minimum Order System Approximation
...stics and signal processing. Except in some special cases the RMP is known to be computationally hard. \mbox{subject to: } & X \in C, ...

8 KB (1,446 words) - 09:45, 30 August 2017
a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
...pular online shopping website Amazon.com for recommending related products to users of Amazon.com based on what these users have recently purchased from Our goal, then, is to predict or infer the other preferences---in a sense, completing the matrix. ...

24 KB (3,853 words) - 09:45, 30 August 2017
Obfuscated Gradients Give a False Sense of Security Circumventing Defenses to Adversarial Examples
...lassify with high confidence. These attacks pose a major threat that needs to be addressed before these systems can be deployed on a large scale, especia ...much lower than claimed. In fact, the majority of these attacks were found to be ineffective against true iterative white box attacks. ...

27 KB (3,974 words) - 17:54, 6 December 2018
a Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis
...rform [http://en.wikipedia.org/wiki/Inference inference] across data sets. To this end, they demonstrate their penalized CCA method on a genomic data set ...r value decomposition will give the best rank-<math>r</math> approximation to the matrix. ...

30 KB (4,829 words) - 09:45, 30 August 2017
A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis
#REDIRECT [[a Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis]] ...

131 bytes (15 words) - 09:45, 30 August 2017
graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns
...ide information and incorporate the side information in the classification to improve the algorithms. ...uctured classification problem in practice, we need both an expressive way to represent our beliefs about the structure, as well as an efficient probabil ...

17 KB (2,924 words) - 09:46, 30 August 2017
Graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns
...RECT [[graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns]] ...

145 bytes (17 words) - 09:46, 30 August 2017
End to end Active Object Tracking via Reinforcement Learning
...g box labeling. In addition, Camera Control is non-trivial, which can lead to many expensive trial-and-errors in the real world. To address these challenges, this paper presents an end-to-end active tracking solution via deep reinforcement learning. More specific ...

29 KB (4,453 words) - 18:27, 16 December 2018
DeepVO Towards end to end visual odometry with deep RNN
...lude the VO field, thus the paper proposes a novel deep-learning based end-to-end VO algorithm and then empirically demonstrates its viability. ...ture based methods and direct methods, which differ in the method employed to select reference points. Sparse feature based methods establish reference p ...

16 KB (2,430 words) - 18:30, 16 December 2018
Stat946f15/Sequence to sequence learning with neural networks
#REDIRECT [[stat946f15/Sequence to sequence learning with neural networks]] ...

75 bytes (10 words) - 09:46, 30 August 2017
stat946f15/Sequence to sequence learning with neural networks
...amount of work to learn more than one language past childhood. The ability to efficiently and quickly translate between languages would then be of great ...s that capture their meaning, as sentences with similar meanings are close to each other while sentences with different meanings will be far. ...

23 KB (3,755 words) - 19:49, 5 February 2018
End-to-End Differentiable Adversarial Imitation Learning
...is that the training requires large amounts of expert data, which is hard to obtain. In addition, an agent trained using BC is unaware of how its action ...re it takes each action since the transition function to move from state A to state B is not learned. ...

24 KB (3,880 words) - 23:00, 20 April 2018
Augmix: New Data Augmentation method to increase the robustness of the algorithm
...s & Dietterich (2019), showing that the classification error rose from 25% to 62% when some corruption was introduced on the ImageNet test set. ...ce that networks trained on translation augmentations are highly sensitive to the shifting of pixels. ...

11 KB (1,652 words) - 18:44, 6 December 2020
U-Time:A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging Summary
...pecially trained to be applied on one dataset alone and might be difficult to use for non-experts in a more general setting (Perslev et al., 2019). ...r architectural tuning to be applied to variable data sets, and it is able to classify sleep stages at any temporal resolution (Perslev et al., 2019). ...

8 KB (1,170 words) - 01:41, 26 November 2021
From Machine Learning to Machine Reasoning
#REDIRECT [[from Machine Learning to Machine Reasoning]] ...

56 bytes (7 words) - 09:46, 30 August 2017
Convolutional Sequence to Sequence Learning
'''Sequence to sequence learning''' has been used to solve many tasks such as machine translation, speech recognition, and text ...other. This allows precise control over the maximum length of dependencies to be modeled. ...

27 KB (4,178 words) - 20:37, 28 November 2017
from Machine Learning to Machine Reasoning
...82, 273–302.</ref>. Algorithms for inference do exist; however, they come at a price of reduced expressive capabilities in logical inference and prob ...ut not yet formal or logical. Informal logic is attractive because we hope to avoid the computational complexity that is associated with combinatorial se ...

21 KB (3,225 words) - 09:46, 30 August 2017
learn what not to learn
...ees to climb"). Then a machine learning model can be trained to generalize to unseen states. ...with high probability. '''Note that the core assumption is that it is easy to predict which actions are invalid or inferior in each state and leverage th ...

29 KB (4,751 words) - 13:38, 17 December 2018

Page text matches

Imagination Augmented Agents for Deep Reinforcement Learning
...experience, yet in complex domains for which a simulator is not available to the agents, the performance of model-based agents employing standard planni ...ct useful knowledge gathered from model simulations. This allows the agent to benefit from model-based imagination without the pitfalls of conventional m ...

2 KB (210 words) - 20:39, 9 March 2018
is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction
...ed using a convex combination to a number of clusters rather than uniquely to one cluster. This is an unsupervised version of the so-called multi-class c ...e data, authors have recently proposed discrete analogues to PCA. We refer to the method as multinomial PCA (mPCA) because it is a precise multinomial ana ...

2 KB (321 words) - 09:45, 30 August 2017
Deep Transfer Learning with Joint Adaptation Networks
...ean discrepancy (JMMD) criterion. Adversarial training strategy is adopted to maximize JMMD such that the distributions of the source and target domains ...

760 bytes (109 words) - 15:32, 2 October 2017
Sandbox to test w2l
#REDIRECT [[sandbox to test w2l]] ...

33 bytes (6 words) - 09:46, 30 August 2017
Link to my paper
#REDIRECT [[link to my paper]] ...

30 bytes (5 words) - 09:45, 30 August 2017
importance Sampling June 2 2009
...lt but there exists a probability distribution function g(x) which is easy to sample from, then <math>I</math> can be written as ...playstyle E_g(w(x)) \rightarrow</math>the expectation of w(x) with respect to g(x) ...

2 KB (395 words) - 09:45, 30 August 2017
Learning Spectral Clustering, With Application To Speech Separation
#REDIRECT [[learning Spectral Clustering, With Application To Speech Separation]] ...

81 bytes (9 words) - 09:45, 30 August 2017
ROBPCA: A New Approach to Robust Principal Component Analysis
#REDIRECT [[rOBPCA: A New Approach to Robust Principal Component Analysis]] ...

75 bytes (10 words) - 09:46, 30 August 2017
Neural Machine Translation: Jointly Learning to Align and Translate
#REDIRECT [[neural Machine Translation: Jointly Learning to Align and Translate]] ...

81 bytes (10 words) - 09:46, 30 August 2017
A Rank Minimization Heuristic with Application to Minimum Order System Approximation
#REDIRECT [[a Rank Minimization Heuristic with Application to Minimum Order System Approximation]] ...

98 bytes (12 words) - 09:45, 30 August 2017
A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
#REDIRECT [[a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization]] ...

105 bytes (12 words) - 09:45, 30 August 2017
deepGenerativeModels
To properly train a neural network a large labeled dataset, however large data These models scale linearly in proportion to the number of classes in the data sets. The number of evaluations could be ...

466 bytes (70 words) - 09:46, 30 August 2017
A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis
#REDIRECT [[a Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis]] ...

131 bytes (15 words) - 09:45, 30 August 2017
contributions on Video-Based Face Recognition Using Adaptive Hidden Markov Models
...the methods used for video-based face recognition were based on the still-to-still techniques which aimed at selecting a good frame and then performed som ...s kind of application, which is called online video. The other scenario is to process the video content offline, like indexing the meeting records or ana ...

3 KB (512 words) - 09:45, 30 August 2017
Pre-Training-Tasks-For-Embedding-Based-Large-Scale-Retrieval
...ifically, this paper explores whether we can train machine learning models to learn from dialog. *Evaluated some baseline models on this data and compared them to standard supervised learning. ...

2 KB (309 words) - 19:52, 17 November 2020
Graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns
...RECT [[graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns]] ...

145 bytes (17 words) - 09:46, 30 August 2017
Stat946f15/Sequence to sequence learning with neural networks
#REDIRECT [[stat946f15/Sequence to sequence learning with neural networks]] ...

75 bytes (10 words) - 09:46, 30 August 2017
From Machine Learning to Machine Reasoning
#REDIRECT [[from Machine Learning to Machine Reasoning]] ...

56 bytes (7 words) - 09:46, 30 August 2017
sign up for your presentation
...following table. Put your name and a link to the paper that you are going to present. Choose a date between Nov 16 and Dec 2 (inclusive). .../Correlate/pmd.pdf], [[A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis|Summary]] ...

4 KB (570 words) - 09:45, 30 August 2017
a Dynamic Bayesian Network Click Model for web search ranking
...a document they will not examine the next document. This model is similar to a hidden Markov model in that there is a conditional dependency between the p ...that if a URL is clicked by a user, it is both examined and relevant to the query. In other words, given a query q, position i and URL u the pr ...

3 KB (593 words) - 09:46, 30 August 2017
Hash Embeddings for Efficient Word Representations
...hence whatever data we feed into the network has to be continuous in nature. Images can easily be represented as real-valued ve ...parameters it needs to learn is quite high. There have been some solutions to it: ...

4 KB (646 words) - 19:44, 26 October 2017
Batch Normalization Summary
...batch-normalization layers right before the activations (to have the input to the activations be normalized as desired). Both networks were trained with ...he 15th, 50th, and 85th percentiles of the input were recorded. The figure to the left demonstrates how these values changed during training. The y axis ...

4 KB (637 words) - 02:07, 28 November 2018
learning2reasoning
...properties (cite). Algorithms for inference do exist; however, they come at a price of reduced expressive capabilities in logical inference and prob ...

852 bytes (116 words) - 09:46, 30 August 2017
paper Summaries
==A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis== [[A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis]] ...

2 KB (222 words) - 09:45, 30 August 2017
acceptance-Rejection Sampling
...>; on the other hand we would reject the samples if the ratio is not close to 1. At x=9; we will reject samples according to the ratio <math> \frac {f(x)}{c \cdot g(x)} </math> after sampling from <ma ...

6 KB (937 words) - 09:45, 30 August 2017
Task Understanding from Confushing Multitask Data
3 data sets are used to compare CSL to existing methods, 1 function regression task and 2 image classification tas ...s <math>f_j</math> as well as determine which mapping function corresponds to each of the <math>m</math> observations. 3 scalar-valued, scalar-input func ...

5 KB (878 words) - 19:25, 15 November 2020
Meta-Learning-For-Domain Generalization
...d during training time. Here by defining tasks as domains, the paper tries to overcome the problem in a model-agnostic way. ...

1 KB (200 words) - 15:47, 9 November 2020
copyofstat341
...sed for the uniform distribution, other methods must be developed in order to generate pseudo random numbers from other distributions. ...he fact that when a random sample from the uniform distribution is applied to the inverse of a cumulative distribution function (cdf) of some distribution, th ...

5 KB (836 words) - 09:45, 30 August 2017
again on Markov Chain
...on of its classes. This decomposition is always possible and it is reduced to one class only in the case of an irreducible chain. ...ath> The state 3 can go to every other state but none of the others can go to it ...

7 KB (1,129 words) - 09:45, 30 August 2017
proof of Lemma 1
-\textbf{u}^T\textbf{a} \; \textrm{ subject } \; \textrm{ to } \; \|\textbf{u}\|^2_2 \leq 1, \; \|\textbf{u}\|_1 \leq c_1 and we differentiate, set the derivative to 0 and solve for <math>\textbf{u}</math>: ...

2 KB (311 words) - 09:45, 30 August 2017
main Page
'''NOTE: Wiki has been migrated from wikicoursenote.com to wiki.math.uwaterloo.ca/statwiki''' ==Go to [[stat841f10|Stat441/841 & CM 463/763-Fall 2010]] == ...

5 KB (769 words) - 22:53, 5 September 2021
s13Stat946proposal
...pefully, the pattern of the teams and lineups in the latent space can lead to interesting conclusions. Secondly, we apply the selected methods to lineup data sets and get the plots of the lineups in the low-dimensional sp ...

6 KB (983 words) - 09:46, 30 August 2017
markov Chain Definitions
...<math>f(x)</math> so that a variation of importance estimation can be used to estimate an integral in the form All that is required is a Markov chain which eventually converges to <math>f(x)</math>. ...

5 KB (865 words) - 09:45, 30 August 2017
Batch Normalization
...ork, the inputs are no longer normalized at each hidden layer. So, we want to reduce this internal covariate shift by normalizing the input at each hidde ...However, this is a very expensive operation, and does not necessarily lead to a gradient function that is well defined. ...

6 KB (931 words) - 21:10, 28 November 2018
proposal for STAT946 (Deep Learning) final projects Fall 2015
...r the gander , some of which occasionally amuses but none of which amounts to much of a story” contains negative sentiment, but it is not immediately cle This competition seeks to implement machine learning algorithms that can determine the sentiment of a ...

7 KB (1,125 words) - 09:46, 30 August 2017
bayesian and Frequentist Schools of Thought
...n the Bayesian and Frequentist views on probability, along with references to '''Bayesian Inference'''. ...enough, by the central limit theorem, the Normal distribution can be used to approximate a Binomial distribution. ...

6 KB (924 words) - 09:45, 30 August 2017
f11Stat946presentation
...n up your name at the moment. When you choose the paper that you would like to present, add its title and a link to the paper. ...

3 KB (418 words) - 09:45, 30 August 2017
Deep Residual Learning for Image Recognition Summary
...ces as the parameters in the model are tuned, and thus the model is unable to evolve. ...would result in the error values of the deeper network being at most equal to those of the shallower network. However, this result is not seen in practic ...

6 KB (1,020 words) - 12:01, 3 December 2021
Convolutional Neural Networks for Sentence Classiﬁcation
...riants of this model have been introduced by the authors, two of which try to learn task-specific word vectors for words. It is observed that learning ta ...different models for doing different tasks. For instance, they can be fed to CNNs for document or sentence classification. The vector representations us ...

7 KB (1,086 words) - 22:49, 13 November 2018
a Deeper Look into Importance Sampling
...playstyle E_g(h(x)) \rightarrow</math>the expectation of h(x) with respect to g(x), where <math>\displaystyle \frac{f(x)}{g(x)} </math> is a weight <math The method of Importance Sampling is simple but can lead to some problems. The <math> \displaystyle \hat I </math> estimated by Importa ...

6 KB (1,083 words) - 09:45, 30 August 2017
stat940F21
|width="30pt"|Link to the paper |width="30pt"|Link to the video ...

5 KB (642 words) - 23:29, 1 December 2021
test1
{{Cleanup|date=September 2010|reason=explain what needs to be done}} ...

255 bytes (46 words) - 09:45, 30 August 2017
Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network
...nvolutional Neural Network, and Support Vector Machine models are proposed to address this issue. In 2019, Aashrith et al. used a CNN to recognize traffic signs. They achieved 99.18% accuracy on Belgium Data and ...

4 KB (515 words) - 18:44, 17 December 2021
the Indian Buffet Process: An Introduction and Review
The Indian buffet process can also be used to define a prior distribution in any setting where the where <math> \alpha </math> is a hyper-parameter, which is similar to the parameter defined in DP. ...

6 KB (1,032 words) - 09:46, 30 August 2017
deflation Method for Penalized Matrix Decomposition Sparse PCA
...Trevor Hastie. (2009) "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis". ''Biostati The penalized matrix decomposition can be used to obtain a version of sparse PCA. In this case, ...

2 KB (277 words) - 09:45, 30 August 2017
metric and Kernel Learning Using a Linear Transformation
...tructure in the data explicitly, but most of them are unable to generalize to newly added data points, as only an implicit non-linear transformation is given. ...that it can handle out-of-sample extensions. Also, even though the matrix to be learned may be infinite-dimensional, it can be fully represented in term ...

6 KB (1,007 words) - 09:46, 30 August 2017
genetics
...o account to detect variants of mutations. This procedure should enable us to prognose, diagnose, and/or control a wide variety of diseases. ...type of interruptions on this important step of gene expression would lead to various kinds of disease such as cancers and neurological disorders. ...

6 KB (980 words) - 09:46, 30 August 2017
proof
...<math>\lambda_{\max}(\cdot)</math> at the matrix <math>X \in S_n</math>. To do this we must first define the subgradient. ...tion we are interested in is <math>\lambda_{\max}(\cdot)</math>. In order to define the subgradient of this function we must first ensure it is convex. ...

3 KB (589 words) - 09:45, 30 August 2017
importance Sampling and Monte Carlo Simulation
...order to get a distribution for the probability 'p' of a Binomial, we have to divide the Binomial distribution by n. This new distribution has the same s # Compute <math>\displaystyle \delta = p_1 - p_2</math> in order to get n values for <math>\displaystyle \delta</math>; ...

7 KB (1,232 words) - 09:45, 30 August 2017
link to my paper
</ref>. Now we turn to ...

204 bytes (22 words) - 09:45, 30 August 2017
Deep Learning for Extreme Multi-label Text Classification
...out. However, the shortcomings of the existing methods are inevitable due to data sparsity and scalability. With deep learning and Convolutional Neural ...interpret. Therefore, the concept of compressing label space is introduced to effectively create lower-dimensional label vectors using either linear or n ...

6 KB (969 words) - 21:50, 13 November 2021
binomial Probability Monte Carlo Sampling June 2 2009
...order to get a distribution for the probability 'p' of a Binomial, we have to divide the Binomial distribution by n. This new distribution has the same s # Compute <math>\displaystyle \delta = p_1 - p_2</math> in order to get n values for <math>\displaystyle \delta</math>; ...

5 KB (788 words) - 09:45, 30 August 2017
Dynamic Routing Between Capsulesl
...cases, we want to reduce the number of dimensions because we always want to save computations. The reason behind this kind of pooling method is based o ...od is that it only passes the local patterns into the next layer. That is to say, if our original data set doesn't have the good property of neighborhoo ...

8 KB (1,394 words) - 19:54, 20 March 2018
Unsupervised Machine Translation Using Monolingual Corpora Only
The paper presents an unsupervised method for machine translation using only monolingual corpora without any alignment bet The general approach of the methodology is to first use an unsupervised word-by-word translation model proposed by [Connea ...

8 KB (1,359 words) - 22:48, 19 November 2018
generating Random Numbers
...lling a fair die repetitively to produce a series of random numbers from 1 to 6). One way to generate pseudo random numbers from the uniform distribution is using the ' ...

8 KB (1,324 words) - 09:45, 30 August 2017
a Dynamic Bayesian Network Click Model for Web Search Ranking
...users click on what appears among the first search results and are unlikely to click on results that do not appear at the beginning, even if relevant. ...el'' of user behavior, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisf ...

11 KB (1,852 words) - 09:45, 30 August 2017
hierarchical Dirichlet Processes
...osal generally cannot model shared information between groups. One idea is to make <math>G_0</math> become discrete by limiting the choice of <math> G_0 ...e measure. Note that <math>G_0</math> is discrete with probability one due to the fact of Dirichlet process. ...

8 KB (1,341 words) - 09:46, 30 August 2017
large-Scale Supervised Sparse Principal Component Analysis
...s that it is computationally expensive. Many algorithms have been proposed to solve the sparse PCA problem, and the authors introduced a fast block coord ...nsion of the data. Since <math>\hat{n}</math> could be very small compared to the dimension <math>n</math> of the data, this algorithm is computationally ...

7 KB (1,209 words) - 09:46, 30 August 2017
nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization
...s with computing k-nearest neighbors of each input and adding a constraint to preserve distances and angles between k-nearest neighbors: and also a constraint on outputs to be centered on the origin: ...

7 KB (1,093 words) - 09:45, 30 August 2017
deep Sparse Rectifier Neural Networks
...easy to train and easy to generalize, while neuroscientists' objective is to produce useful representations of the scientific data. In other words, machi ...e at 1/2 of their maximum rate when at zero. A solution to this problem is to use a rectifier neuron which does not fire at its zero value. This rectifi ...

9 KB (1,338 words) - 09:46, 30 August 2017
stat441w18/summary 1
...based methods where they learn the i-th training examples are "remembered" to learn for corresponding weights. Prediction on untrained examples are then ...nal feature space and then apply existing linear methods. The main goal is to reduce the bottleneck of kernel-based inference methods. ...

5 KB (753 words) - 12:51, 7 March 2018
stat946F18
|width="30pt"|Link to the paper |width="30pt"|Link to the summary ...

14 KB (1,851 words) - 03:22, 2 December 2018
stat441F21
|width="15pt"|Link to the paper |width="30pt"|Link to the summary ...

8 KB (1,194 words) - 04:28, 1 December 2021
measuring and testing dependence by correlation of distances
...o random variables could be in different dimensions. Second, dCov is equal to zero if and only if the two variables are independent. ...ritten in terms of the expectations of Euclidean distances which is easier to interpret: ...

4 KB (586 words) - 09:46, 30 August 2017
a Rank Minimization Heuristic with Application to Minimum Order System Approximation
...stics and signal processing. Except in some special cases the RMP is known to be computationally hard. \mbox{subject to: } & X \in C, ...

8 KB (1,446 words) - 09:45, 30 August 2017
on the Number of Linear Regions of Deep Neural Networks
...rger. Furthermore, having many layers can theoretically cause problems due to vanishing gradients. ...number of input regions. This is caused by the deep hierarchy which allows to apply the same computation across different regions of the input space. ...

8 KB (1,391 words) - 09:46, 30 August 2017
STAT946F20/BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
..."bank" as a "financial institution" or the "land alongside or sloping down to a river or lake". ...e positional encoding, which has the same dimension as the word embedding, to obtain the sequential information of the inputs. BERT is built by the N uni ...

9 KB (1,342 words) - 06:36, 10 December 2020
Convolutional neural network for diagnosis of viral pneumonia and COVID-19 alike diseases
...f pneumonia in CT Scan images. Then they carried out 10 k cross validation to estimate how the model will perform on an unseen dataset. And finally they evaluat ...iologists, radiologists and computer scientists have been working together to detect microbial diseases such as tuberculosis, malaria and pneumonia using ...

7 KB (974 words) - 14:56, 21 November 2021
techniques for Normal and Gamma Sampling
...sform Method and sample from independent uniform distributions seen before to generate a sample following a Gamma distribution. ...le to use the Acceptance-Rejection method, but there are still better ways to sample from a Standard Normal Distribution. ...

7 KB (1,114 words) - 09:45, 30 August 2017
test
...imal). This creates a big problem, as this method becomes very susceptible to poor data (i.e., not very robust). This intuitively makes sense, as the age ...e noisy demonstration to be ranked according to their relative performance to each other. Another similar method requires extra labelling of the data wit ...

10 KB (1,526 words) - 17:39, 26 November 2021
context Adaptive Training with Factorized Decision Trees for HMM-Based Speech Synthesis
...by its ability to be understood. There are two major approaches introduced to deal with this problem. One is unit-selection synthesis and the other one i ...make the sewing part of the process more natural. The latter objective is to optimize the concatenation cost. The overall cost for this approach can be ...

10 KB (1,678 words) - 09:46, 30 August 2017
adaptive dimension reduction for clustering high dimensional data
...somewhere close to the initial configurations. The conventional method is to try a number of initial values, and pick up the best one of the results. ...ated with clustering process (ii) make effective use of cluster membership to connect reduced dimensional space and full dimension space. ...

9 KB (1,428 words) - 09:46, 30 August 2017
importance Sampling and Markov Chain Monte Carlo (MCMC)
Apply the idea of importance sampling to both numerator and denominator: This is very important and useful especially when f is known only up to a proportionality constant. Often, this is the case in Bayesian approach wh ...

6 KB (1,113 words) - 09:45, 30 August 2017
mULTIPLE OBJECT RECOGNITION WITH VISUAL ATTENTION
...eep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image. It has been shown that the pr ...e process continues until the model decides that there are no more objects to process. ...

11 KB (1,714 words) - 09:46, 30 August 2017
This Looks Like That: Deep Learning for Interpretable Image Recognition
...ks. The goal of the algorithm is to utilize human-understandable reasoning to perform image classification tasks. ...ility is critical, where diagnosis using X-ray scans is based on comparing to other prototypical scans. [1] ...

10 KB (1,573 words) - 23:36, 9 December 2020
Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree
...wx + b = 0 separate the set exactly with the distance from the hyperplane to the nearest sample is maximum, then this hyperplane will be called an optim ...methods. In the kernel methods, we map the sample from the original space to a higher dimension and then we construct a hyperplane that separates the sa ...

9 KB (1,392 words) - 01:45, 23 November 2021
cardinality Restricted Boltzmann Machines
...ls. The first notion is the sparsity of the graph (i.e., something related to the number of edges in the graph). In this sense, a sparse model is one tha ...neural network literature, it means that we do not want all of the neurons to be activated at the same time. For our RBM, we want that the values of hidd ...

9 KB (1,501 words) - 09:46, 30 August 2017
semi-supervised Learning with Deep Generative Models
...n machine based Deep Belief Network (DBN). Where layers of RBM are trained to learn unsupervised features of the data and then a final classification lay ...e on benchmark tasks and uses deep neural networks in an innovative manner to create a layered semi-supervised classification/generation model. ...

9 KB (1,554 words) - 09:46, 30 August 2017
self-Taught Learning
...her type (i.e. has class labels that do not apply to data set that we wish to classify). ...hasize that the unlabeled data need not belong to the class labels we wish to assign, as long as it is related. This fact distinguishes it from semi-supe ...

12 KB (1,871 words) - 09:45, 30 August 2017
contributions on Context Adaptive Training with Factorized Decision Trees for HMM-Based Speech Synthesis
...synthesis requires a much larger and more complex set of contexts in order to achieve high quality synthesised speech. Examples of such contexts are the * Identity of neighbouring phones to the central phone. Two phones to the left and the right of the centre phone are usually considered as phonet ...

8 KB (1,374 words) - 09:45, 30 August 2017
deep Learning of the tissue-regulated splicing code
...twork (DNN) in predicting outcome of splicing, and compare the performance to formerly trained model Bayesian Neural Network<ref>https://www.cs.cmu.edu/a The cost function we want to minimize here during training is <math>E=-\sum_a\sum_{k=1}^{C}{y_{n,k}log(h ...

8 KB (1,353 words) - 09:46, 30 August 2017
f17Stat946PaperSignUp
|width="30pt"|Link to the paper |width="30pt"|Link to the summary ...

10 KB (1,213 words) - 19:28, 19 November 2020
Automatic Bank Fraud Detection Using Support Vector Machines
...rvised learning). These methods were then tested on various bank databases to determine how effective they were in detecting fraud. ...a higher probability of being fraudulent. Different methods have been used to tackle this problem impacting financial institutions such as: Bayesian Netw ...

12 KB (1,776 words) - 19:07, 24 November 2021
video-based face recognition using Adaptive HMM
...e or still images. It is a very complex problem with high dimensionality due to the nature of digital images. Face recognition benefits many fields such a ...by speaker adaptation, this paper presents an Adaptive Hidden Markov model to recognize human faces from frame sequences. The proposed model trains HMM on ...

10 KB (1,640 words) - 09:46, 30 August 2017
positive Semidefinite Metric Learning Using Boosting-like Algorithms
...etric learning algorithms because it uses a special optimization technique to solve the semi-definite programming (SDP) problem. ...etric to be positive semi-definite. Semi-definite programming is difficult to implement and does not scale well. Based on observation that any positive s ...

9 KB (1,558 words) - 09:46, 30 August 2017
maximum-Margin Matrix Factorization
...ome <math>\pm1</math> while some other cells are unknown. The main goal is to find matrix X such that it preserves the knowledge in Y, and predicts the v ...oblem of over-fitting in the prediction process. If the rank of X is equal to the rank of Y, we will have the trivial solution of X = Y. ...

12 KB (2,046 words) - 09:45, 30 August 2017
hamming Distance Metric Learning
...umber of samples. Like other metric learning methods this paper also tries to optimize some cost function which is based on a similarity measure between The task is to learn a mapping from b(x) that projects p-dimensional real valued input x on ...

10 KB (1,792 words) - 09:46, 30 August 2017
strategies for Training Large Scale Neural Network Language Models
...-based implementation of a class based maximum entropy model, that allows to easily control the trade-off between memory complexity and computational ...training neural network language models with maximum entropy models leads to better performance in terms of computational complexity. ...

9 KB (1,542 words) - 09:46, 30 August 2017
monte Carlo Integration
Note that <math>\displaystyle\lim_{N\to\infty}\tilde{\mu} = \bar{x}</math>. Also note that when <math>N = 0</math> ...mann as a Los Alamos code word for the stochastic simulations they applied to building better atomic bombs. ...

5 KB (870 words) - 09:45, 30 August 2017
A Knowledge-Grounded Neural Conversation Model
...igh demand there is incentive to build systems that can respond seamlessly to requests. ...d opinion or fact-based content. The ability to do this would elevate them to the level of task-oriented conversational applications. ...

11 KB (1,713 words) - 13:09, 20 March 2018
U-Time:A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging Summary
...pecially trained to be applied on one dataset alone and might be difficult to use for non-experts in a more general setting (Perslev et al., 2019). ...r architectural tuning to be applied to variable data sets, and it is able to classify sleep stages at any temporal resolution (Perslev et al., 2019). ...

8 KB (1,170 words) - 01:41, 26 November 2021
Do Vision Transformers See Like CNN
...ransformers, to learn how exactly the vision transformer solves its tasks, to compare and contrast the results of these two different architectures, and ...ResNet model. Typical CNNs often suffer if the model is deep, largely due to vanishing gradients. ...

13 KB (2,006 words) - 00:11, 17 November 2021
measuring statistical dependence with Hilbert-Schmidt norms
...and kernel dependence measures based on RKHSs, and generalizes the measure to metric spaces. ...ween random variables is simple. However, testing independence is hard due to the unknown non-linearity of the data. There is a theorem that binds the tw ...

8 KB (1,240 words) - 09:46, 30 August 2017
On The Convergence Of ADAM And Beyond
...ial moving averages of squared past gradients, thereby limiting the update to only rely on the past few gradients. The following formula shows the per-pa ...is suggest that this can be prevented through novel but simple adjustments to the ADAM optimization algorithm, which can improve convergence. This paper ...

13 KB (2,153 words) - 16:54, 20 April 2018
Poison Frogs Neural Networks
...fic test instance, and ''clean-label'' attacks do not require the attacker to have control over the poison’s labeling. ...on a pretrained InceptionV3 network and sees up to 70% success rate on end-to-end trained scaled-down AlexNet architecture when using watermarks and mult ...

11 KB (1,590 words) - 18:29, 26 November 2021
kernel Dimension Reduction in Regression
...ression methods, marginal distribution of explanatory variables are needed to calculate the independency measurement. This paper proposes that conditiona ...ptions with respect to the probability distribution of X which can be hard to justify[ref]. Most of the previous regression methods assume linearity betw ...

6 KB (1,132 words) - 09:46, 30 August 2017
sparse PCA
...>d</math> variables has its own special meaning and it may be desirable to come up with some directions, as principal components, each of which is a combin ...have a limited number of non-zero elements. In other words, this helps us to perform feature selection, by selecting a subset of features in each direct ...

13 KB (2,202 words) - 09:45, 30 August 2017
deep Generative Stochastic Networks Trainable by Backprop
from <math>P_o(X|\bar{X})</math>, which is trained to estimate the ground truth <math>P(X|\bar{X})</math> P(X) and its partition function is thus easier to approximate. ...

12 KB (1,906 words) - 09:46, 30 August 2017
stat441w18/Saliency-based Sequential Image Attention with Multiset Prediction
...exhibit unexpected and unintuitive behaviour, allowing minor perturbations to cause a complete misclassification. In addition, the classifier may accurat ...a saliency detection method to determine the focus of the classifier, and to understand how the classifier makes its decisions. ...

12 KB (1,840 words) - 14:09, 20 March 2018
very Deep Convoloutional Networks for Large-Scale Image Recognition
...of very small (3 × 3) convolution filters in all layers. As a result, they come up with significantly more accurate ConvNet architectures. During training, the only preprocessing step is to subtract the mean RGB value computed on the training data. Then, the image ...

11 KB (1,680 words) - 09:46, 30 August 2017
