Search results

Page title matches

Learning to Teach
This is a summary of the paper titled: "Learning to Teach", authored by Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, and Tie-Yan ...ent, determining the appropriate data, loss function, and hypothesis space to facilitate the learning of the student model. ...

21 KB (3,351 words) - 18:40, 16 December 2018
sandbox to test w2l

54 bytes (12 words) - 09:46, 30 August 2017
Learning What and Where to Draw
...label or a non-localized caption. The authors of 'Learning What and Where to Draw' believe that image synthesis will be drastically enhanced by incorpor ...scription what each image is intended to depict. The proposed model learns to perform location and content-controllable image synthesis on the Caltech-UC ...

18 KB (2,781 words) - 12:35, 4 December 2017
link to my paper
</ref>. Now we turn to ...

204 bytes (22 words) - 09:45, 30 August 2017
From Variational to Deterministic Autoencoders
...er are able to generate samples that are comparable or better when applied to domains of images and structured objects. The authors point to several drawbacks currently associated with VAEs, including: ...

15 KB (2,313 words) - 19:11, 2 December 2020
Link to my paper
#REDIRECT [[link to my paper]] ...

30 bytes (5 words) - 09:45, 30 August 2017
Sandbox to test w2l
#REDIRECT [[sandbox to test w2l]] ...

33 bytes (6 words) - 09:46, 30 August 2017
Pixels to Graphs by Associative Embedding
...ons between them. An explicit representation of this semantics is referred to as a scene graph where we represent objects grounded in the scene as vertic ...all of the objects in the scene, then isolate individual pairs of objects to identify the relationships between them. This breakdown often restricts the ...

17 KB (2,749 words) - 18:26, 16 December 2018
learning Spectral Clustering, With Application To Speech Separation
...hat generalizes to the unseen datasets when spectral clustering is applied to them. Traditional spectral clustering techniques assume a metric or a simil Clustering refers to partitioning a given dataset into clusters such that data points in the same c ...

35 KB (5,767 words) - 09:45, 30 August 2017
Towards Deep Learning Models Resistant to Adversarial Attacks
This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmid ...e|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.[https://arxiv.org/abs/1412.6572 Source]]] ...

14 KB (2,192 words) - 03:01, 23 November 2018
Learning Spectral Clustering, With Application To Speech Separation
#REDIRECT [[learning Spectral Clustering, With Application To Speech Separation]] ...

81 bytes (9 words) - 09:45, 30 August 2017
Learning to Navigate in Cities Without a Map
[https://arxiv.org/pdf/1804.00168.pdf Learning to Navigate in Cities Without a Map] ...forcement learning (RL), it suffers from data inefficiency and sensitivity to changes in the environment. Thus, it is unclear whether this method could b ...

28 KB (4,494 words) - 00:24, 17 December 2018
DREAM TO CONTROL: LEARNING BEHAVIORS BY LATENT IMAGINATION
...ue function, directly on latent state samples which help to enable scaling to more complex tasks. ...omes with using finite imagination horizons. The authors have also managed to demonstrate empirical performance for visual control by evaluating the mode ...

13 KB (2,072 words) - 06:07, 10 December 2020
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
...n from Tel Aviv University. This paper is part of the NIPS 2018 conference to be hosted in December 2018 at Montréal, Canada. This paper summary is based ...framework for capturing such effects is structured prediction, which seeks to predict structured objects (such as graphs with nodes and edges) rather tha ...

29 KB (4,603 words) - 21:21, 6 December 2018
neural Machine Translation: Jointly Learning to Align and Translate
...hod is more effective compared to other neural network models when applied to long sentences. ...word. The decoder then selectively combines the most relevant annotations to generate each target word; this implements a mechanism of attention in the ...

14 KB (2,221 words) - 09:46, 30 August 2017
stat946F18/Beyond Word Importance Contextual Decomposition to Extract Interactions from LSTMs
...for analyzing individual predictions made by the LSTMs without any change to the underlying original model. The problem of sentiment analysis is chosen ...n domain, this paper shows how the contextual decomposition method is used to successfully extract positive and negative negations from an LSTM. This pap ...

31 KB (5,069 words) - 18:21, 16 December 2018
ROBPCA: A New Approach to Robust Principal Component Analysis
#REDIRECT [[rOBPCA: A New Approach to Robust Principal Component Analysis]] ...

75 bytes (10 words) - 09:46, 30 August 2017
Summary of A Probabilistic Approach to Neural Network Pruning
...proposes that the subnetworks can achieve similar accuracy without having to be further trained. However, finding these lottery tickets inside a large n ...theoretical guarantees of pruning. This study, ''A Probabilistic Approach to Neural Network Pruning'' by Xin Qian and Diego Klabjan [18], focuses on the ...

28 KB (4,367 words) - 00:30, 23 November 2021
A Game Theoretic Approach to Class-wise Selective Rationalization
...alternative conclusions. Each class consists of three players who compete to find evidence for both factual and counterfactual circumstances. In a simpl ...ng explanations for a specific class by probing the importance with regard to the relevant class logit. ...

11 KB (1,594 words) - 13:14, 25 November 2021
Neural Machine Translation: Jointly Learning to Align and Translate
#REDIRECT [[neural Machine Translation: Jointly Learning to Align and Translate]] ...

81 bytes (10 words) - 09:46, 30 August 2017
rOBPCA: A New Approach to Robust Principal Component Analysis
...ix. Since the classical estimation for covariance matrix is very sensitive to the presence of outliers, it is not surprising that the principal component ...to show that a Bayesian robust estimator may be an alternative choice compared to classical robust estimators. ...

15 KB (2,414 words) - 09:46, 30 August 2017
STAT946F17/ Teaching Machines to Describe Images via Natural Language Feedback
...rd in that we can easily point to where the mistakes occur and suggest how to correct them. ...n also be seen as a multimodal problem where the whole network/model needs to combine the solution space of learning in both the image processing and tex ...

23 KB (3,760 words) - 10:33, 4 December 2017
a Rank Minimization Heuristic with Application to Minimum Order System Approximation
...stics and signal processing. Except in some special cases the RMP is known to be computationally hard. \mbox{subject to: } & X \in C, ...

8 KB (1,446 words) - 09:45, 30 August 2017
a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
...pular online shopping website Amazon.com for recommending related products to users of Amazon.com based on what these users have recently purchased from Our goal, then, is to predict or infer the other preferences---in a sense, completing the matrix. ...

24 KB (3,853 words) - 09:45, 30 August 2017
A Rank Minimization Heuristic with Application to Minimum Order System Approximation
#REDIRECT [[a Rank Minimization Heuristic with Application to Minimum Order System Approximation]] ...

98 bytes (12 words) - 09:45, 30 August 2017
A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
#REDIRECT [[a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization]] ...

105 bytes (12 words) - 09:45, 30 August 2017
a Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis
...rform [http://en.wikipedia.org/wiki/Inference inference] across data sets. To this end, they demonstrate their penalized CCA method on a genomic data set ...r value decomposition will give the best rank-<math>r</math> approximation to the matrix. ...

30 KB (4,829 words) - 09:45, 30 August 2017
Obfuscated Gradients Give a False Sense of Security Circumventing Defenses to Adversarial Examples
...lassify with high confidence. These attacks pose a major threat that needs to be addressed before these systems can be deployed on a large scale, especia ...much lower than claimed. In fact, the majority of these attacks were found to be ineffective against true iterative white box attacks. ...

27 KB (3,974 words) - 17:54, 6 December 2018
A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis
#REDIRECT [[a Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis]] ...

131 bytes (15 words) - 09:45, 30 August 2017
Graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns
...RECT [[graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns]] ...

145 bytes (17 words) - 09:46, 30 August 2017
graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns
...ide information and incorporate the side information in the classification to improve the algorithms. ...uctured classification problem in practice, we need both an expressive way to represent our beliefs about the structure, as well as an efficient probabil ...

17 KB (2,924 words) - 09:46, 30 August 2017
End to end Active Object Tracking via Reinforcement Learning
...g box labeling. In addition, Camera Control is non-trivial, which can lead to many expensive trial-and-errors in the real world. To address these challenges, this paper presents an end-to-end active tracking solution via deep reinforcement learning. More specific ...

29 KB (4,453 words) - 18:27, 16 December 2018
DeepVO Towards end to end visual odometry with deep RNN
...lude the VO field, thus the paper proposes a novel deep-learning based end-to-end VO algorithm and then empirically demonstrates its viability. ...ture based methods and direct methods, which differ in the method employed to select reference points. Sparse feature based methods establish reference p ...

16 KB (2,430 words) - 18:30, 16 December 2018
stat946f15/Sequence to sequence learning with neural networks
...amount of work to learn more than one language past childhood. The ability to efficiently and quickly translate between languages would then be of great ...s that capture their meaning, as sentences with similar meanings are close to each other while sentences with different meanings will be far. ...

23 KB (3,755 words) - 19:49, 5 February 2018
End-to-End Differentiable Adversarial Imitation Learning
...is that the training requires large amounts of expert data, which is hard to obtain. In addition, an agent trained using BC is unaware of how its action ...re it takes each action since the transition function to move from state A to state B is not learned. ...

24 KB (3,880 words) - 23:00, 20 April 2018
Stat946f15/Sequence to sequence learning with neural networks
#REDIRECT [[stat946f15/Sequence to sequence learning with neural networks]] ...

75 bytes (10 words) - 09:46, 30 August 2017
Augmix: New Data Augmentation method to increase the robustness of the algorithm
...s & Dietterich (2019), showing that the classification error rose from 25% to 62% when some corruption was introduced on the ImageNet test set. ...ce that networks trained on translation augmentations are highly sensitive to the shifting of pixels. ...

11 KB (1,652 words) - 18:44, 6 December 2020
U-Time:A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging Summary
...pecially trained to be applied on one dataset alone and might be difficult to use for non-experts in a more general setting (Perslev et al., 2019). ...r architectural tuning to be applied to variable data sets, and it is able to classify sleep stages at any temporal resolution (Perslev et al., 2019). ...

8 KB (1,170 words) - 01:41, 26 November 2021
From Machine Learning to Machine Reasoning
#REDIRECT [[from Machine Learning to Machine Reasoning]] ...

56 bytes (7 words) - 09:46, 30 August 2017
Convolutional Sequence to Sequence Learning
'''Sequence to sequence learning''' has been used to solve many tasks such as machine translation, speech recognition, and text ...other. This allows precise control of the maximum length of dependencies to be modeled. ...

27 KB (4,178 words) - 20:37, 28 November 2017
from Machine Learning to Machine Reasoning
...82, 273–302.</ref>. Algorithms for inference do exist; however, they come at a price of reduced expressive capabilities in logical inference and prob ...ut not yet formal or logical. Informal logic is attractive because we hope to avoid the computational complexity that is associated with combinatorial se ...

21 KB (3,225 words) - 09:46, 30 August 2017
learn what not to learn
...ees to climb"). Then a machine learning model can be trained to generalize to unseen states. ...with high probability. '''Note that the core assumption is that it is easy to predict which actions are invalid or inferior in each state and leverage th ...

29 KB (4,751 words) - 13:38, 17 December 2018

Page text matches

Imagination Augmented Agents for Deep Reinforcement Learning
...experience, yet in complex domains for which a simulator is not available to the agents, the performance of model-based agents employing standard planni ...ct useful knowledge gathered from model simulations. This allows the agent to benefit from model-based imagination without the pitfalls of conventional m ...

2 KB (210 words) - 20:39, 9 March 2018
is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction
...ed using a convex combination to a number of clusters rather than uniquely to one cluster. This is an unsupervised version of the so-called multi-class c ...e data, authors have recently proposed discrete analogues to PCA. We refer to the method as multinomial PCA(mPCA) because it is a precise multinomial ana ...

2 KB (321 words) - 09:45, 30 August 2017
Deep Transfer Learning with Joint Adaptation Networks
...ean discrepancy (JMMD) criterion. Adversarial training strategy is adopted to maximize JMMD such that the distributions of the source and target domains ...

760 bytes (109 words) - 15:32, 2 October 2017
Link to my paper
#REDIRECT [[link to my paper]] ...

30 bytes (5 words) - 09:45, 30 August 2017
Sandbox to test w2l
#REDIRECT [[sandbox to test w2l]] ...

33 bytes (6 words) - 09:46, 30 August 2017
importance Sampling June 2 2009
...lt but there exists a probability distribution function g(x) which is easy to sample from, then <math>I</math> can be written as ...playstyle E_g(w(x)) \rightarrow</math>the expectation of w(x) with respect to g(x) ...

2 KB (395 words) - 09:45, 30 August 2017
Learning Spectral Clustering, With Application To Speech Separation
#REDIRECT [[learning Spectral Clustering, With Application To Speech Separation]] ...

81 bytes (9 words) - 09:45, 30 August 2017
Neural Machine Translation: Jointly Learning to Align and Translate
#REDIRECT [[neural Machine Translation: Jointly Learning to Align and Translate]] ...

81 bytes (10 words) - 09:46, 30 August 2017
ROBPCA: A New Approach to Robust Principal Component Analysis
#REDIRECT [[rOBPCA: A New Approach to Robust Principal Component Analysis]] ...

75 bytes (10 words) - 09:46, 30 August 2017
A Rank Minimization Heuristic with Application to Minimum Order System Approximation
#REDIRECT [[a Rank Minimization Heuristic with Application to Minimum Order System Approximation]] ...

98 bytes (12 words) - 09:45, 30 August 2017
A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
#REDIRECT [[a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization]] ...

105 bytes (12 words) - 09:45, 30 August 2017
deepGenerativeModels
To properly train a neural network a large labeled dataset, however large data These models scale linearly in proportion to the number of classes in the data sets. The number of evaluations could be ...

466 bytes (70 words) - 09:46, 30 August 2017
A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis
#REDIRECT [[a Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis]] ...

131 bytes (15 words) - 09:45, 30 August 2017
contributions on Video-Based Face Recognition Using Adaptive Hidden Markov Models
...the methods used for video-based face recognition were based on the still-to-still techniques which aimed at selecting good frame and then performed som ...s kind of application, which is called online video. The other scenario is to process the video content offline, like indexing the meeting records or ana ...

3 KB (512 words) - 09:45, 30 August 2017
Pre-Training-Tasks-For-Embedding-Based-Large-Scale-Retrieval
...ifically, this paper explores whether we can train machine learning models to learn from dialog. *Evaluated some baseline models on this data and compared them to standard supervised learning. ...

2 KB (309 words) - 19:52, 17 November 2020
Graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns
...RECT [[graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns]] ...

145 bytes (17 words) - 09:46, 30 August 2017
Stat946f15/Sequence to sequence learning with neural networks
#REDIRECT [[stat946f15/Sequence to sequence learning with neural networks]] ...

75 bytes (10 words) - 09:46, 30 August 2017
From Machine Learning to Machine Reasoning
#REDIRECT [[from Machine Learning to Machine Reasoning]] ...

56 bytes (7 words) - 09:46, 30 August 2017
sign up for your presentation
...following table. Put your name and a link to the paper that you are going to present. Choose a date between Nov 16 and Dec 2 (inclusive). .../Correlate/pmd.pdf], [[A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis|Summary]] ...

4 KB (570 words) - 09:45, 30 August 2017
a Dynamic Bayesian Network Click Model for web search ranking
...a document they will not examine the next document. This model is similar to a hidden Markov model in that there is a conditional dependency between the p ...that if a URL is clicked by a user, it is both examined and relevant to the query. In other words, given a query q, position i and URL u the pr ...

3 KB (593 words) - 09:46, 30 August 2017
Hash Embeddings for Efficient Word Representations
...hence whatever data we feed into the network has to be continuous in nature. Images can easily be represented as real-valued ve ...parameters it needs to learn is quite high. There have been some solutions to it: ...

4 KB (646 words) - 19:44, 26 October 2017
Batch Normalization Summary
...batch-normalization layers right before the activations (to have the input to the activations be normalized as desired). Both networks were trained with ...he 15th, 50th, and 85th percentiles of the input were recorded. The figure to the left demonstrates how these values changed during training. The y axis ...

4 KB (637 words) - 02:07, 28 November 2018
learning2reasoning
...properties (cite). Algorithms for inference do exist; however, they come at a price of reduced expressive capabilities in logical inference and prob ...

852 bytes (116 words) - 09:46, 30 August 2017
paper Summaries
==A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis== [[A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis]] ...

2 KB (222 words) - 09:45, 30 August 2017
acceptance-Rejection Sampling
...>; on the other hand we would reject the samples if the ratio is not close to 1. At x=9; we will reject samples according to the ratio <math> \frac {f(x)}{c \cdot g(x)} </math> after sampling from <ma ...

6 KB (937 words) - 09:45, 30 August 2017
Task Understanding from Confushing Multitask Data
3 data sets are used to compare CSL to existing methods, 1 function regression task and 2 image classification tas ...s <math>f_j</math> as well as determine which mapping function corresponds to each of the <math>m</math> observations. 3 scalar-valued, scalar-input func ...

5 KB (878 words) - 19:25, 15 November 2020
Meta-Learning-For-Domain Generalization
...d during training time. Here by defining tasks as domains, the paper tries to overcome the problem in a model-agnostic way. ...

1 KB (200 words) - 15:47, 9 November 2020
copyofstat341
...sed for the uniform distribution, other methods must be developed in order to generate pseudo random numbers from other distributions. ...he fact that when a random sample from the uniform distribution is applied to the inverse of a cumulative density function (cdf) of some distribution, th ...

5 KB (836 words) - 09:45, 30 August 2017
again on Markov Chain
...on of its classes. This decomposition is always possible and it is reduced to one class only in the case of an irreducible chain. ...ath> The state 3 can go to every other state but none of the others can go to it ...

7 KB (1,129 words) - 09:45, 30 August 2017
proof of Lemma 1
-\textbf{u}^T\textbf{a} \; \textrm{ subject } \; \textrm{ to } \; \|\textbf{u}\|^2_2 \leq 1, \; \|\textbf{u}\|_1 \leq c_1 and we differentiate, set the derivative to 0 and solve for <math>\textbf{u}</math>: ...

2 KB (311 words) - 09:45, 30 August 2017
main Page
'''NOTE: Wiki has been migrated from wikicoursenote.com to wiki.math.uwaterloo.ca/statwiki''' ==Go to [[stat841f10|Stat441/841 & CM 463/763-Fall 2010]] == ...

5 KB (769 words) - 22:53, 5 September 2021
s13Stat946proposal
...pefully, the pattern of the teams and lineups in the latent space can lead to interesting conclusions. Secondly, we apply the selected methods to lineup data sets and get the plots of the lineups in the low-dimensional sp ...

6 KB (983 words) - 09:46, 30 August 2017
markov Chain Definitions
...<math>f(x)</math> so that a variation of importance estimation can be used to estimate an integral in the form All that is required is a Markov chain which eventually converges to <math>f(x)</math>. ...

5 KB (865 words) - 09:45, 30 August 2017
Batch Normalization
...ork, the inputs are no longer normalized at each hidden layer. So, we want to reduce this internal covariate shift by normalizing the input at each hidde ...However, this is a very expensive operation, and does not necessarily lead to a gradient function that is well defined. ...

6 KB (931 words) - 21:10, 28 November 2018
proposal for STAT946 (Deep Learning) final projects Fall 2015
...r the gander , some of which occasionally amuses but none of which amounts to much of a story” contains negative sentiment, but it is not immediately cle This competition seeks to implement machine learning algorithms that can determine the sentiment of a ...

7 KB (1,125 words) - 09:46, 30 August 2017
bayesian and Frequentist Schools of Thought
...n the Bayesian and Frequentist views on probability, along with references to '''Bayesian Inference'''. ...enough, by the central limit theorem, the Normal distribution can be used to approximate a Binomial distribution. ...

6 KB (924 words) - 09:45, 30 August 2017
f11Stat946presentation
...n up your name at the moment. When you choose the paper that you would like to present, add its title and a link to the paper. ...

3 KB (418 words) - 09:45, 30 August 2017
Deep Residual Learning for Image Recognition Summary
...ces as the parameters in the model are tuned, and thus the model is unable to evolve. ...would result in the error values of the deeper network being at most equal to those of the shallower network. However, this result is not seen in practic ...

6 KB (1,020 words) - 12:01, 3 December 2021
Convolutional Neural Networks for Sentence Classiﬁcation
...riants of this model have been introduced by the authors, two of which try to learn task-specific word vectors for words. It is observed that learning ta ...different models for doing different tasks. For instance, they can be fed to CNNs for document or sentence classification. The vector representations us ...

7 KB (1,086 words) - 22:49, 13 November 2018
a Deeper Look into Importance Sampling
...playstyle E_g(h(x)) \rightarrow</math>the expectation of h(x) with respect to g(x), where <math>\displaystyle \frac{f(x)}{g(x)} </math> is a weight <math The method of Importance Sampling is simple but can lead to some problems. The <math> \displaystyle \hat I </math> estimated by Importa ...

6 KB (1,083 words) - 09:45, 30 August 2017
stat940F21
|width="30pt"|Link to the paper |width="30pt"|Link to the video ...

5 KB (642 words) - 23:29, 1 December 2021
test1
{{Cleanup|date=September 2010|reason=explain what needs to be done}} ...

255 bytes (46 words) - 09:45, 30 August 2017
Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network
...nvolutional Neural Network, and Support Vector Machine models are proposed to address this issue. In 2019, Aashrith et al. used a CNN to recognize traffic signs. They achieved 99.18% accuracy on Belgium Data and ...

4 KB (515 words) - 18:44, 17 December 2021
the Indian Buffet Process: An Introduction and Review
The Indian buffet process can also be used to define a prior distribution in any setting where the where <math> \alpha </math> is a hyper-parameter, which is similar to the parameter defined in DP. ...

6 KB (1,032 words) - 09:46, 30 August 2017
deflation Method for Penalized Matrix Decomposition Sparse PCA
...Trevor Hastie. (2009) "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis". ''Biostati The penalized matrix decomposition can be used to obtain a version of sparse PCA. In this case, ...

2 KB (277 words) - 09:45, 30 August 2017
metric and Kernel Learning Using a Linear Transformation
...tructure in the data explicitly but most of them are unable to generalize to newly added data points as only implicit non-linear transformation is given. ...that it can handle out-of-sample extensions. Also, even though the matrix to be learned may be infinite-dimensional, it can be fully represented in term ...

6 KB (1,007 words) - 09:46, 30 August 2017
genetics
...o account to detect variants of mutations. This procedure should enable us to prognose, diagnose, and/or control a wide variety of diseases. ...type of interruptions on this important step of gene expression would lead to various kinds of disease such as cancers and neurological disorders. ...

6 KB (980 words) - 09:46, 30 August 2017
proof
...<math>\lambda_{\max}(\cdot)</math> at the matrix <math>X \in S_n</math>. To do this we must first define the subgradient. ...tion we are interested in is <math>\lambda_{\max}(\cdot)</math>. In order to define the subgradient of this function we must first ensure it is convex. ...

3 KB (589 words) - 09:45, 30 August 2017
importance Sampling and Monte Carlo Simulation
...order to get a distribution for the probability 'p' of a Binomial, we have to divide the Binomial distribution by n. This new distribution has the same s # Compute <math>\displaystyle \delta = p_1 - p_2</math> in order to get n values for <math>\displaystyle \delta</math>; ...

7 KB (1,232 words) - 09:45, 30 August 2017
link to my paper
</ref>. Now we turn to ...

204 bytes (22 words) - 09:45, 30 August 2017
Deep Learning for Extreme Multi-label Text Classification
...out. However, the shortcomings of the existing methods are inevitable due to data sparsity and scalability. With deep learning and Convolutional Neural ...interpret. Therefore, the concept of compressing label space is introduced to effectively create lower-dimensional label vectors using either linear or n ...

6 KB (969 words) - 21:50, 13 November 2021
binomial Probability Monte Carlo Sampling June 2 2009
...order to get a distribution for the probability 'p' of a Binomial, we have to divide the Binomial distribution by n. This new distribution has the same s # Compute <math>\displaystyle \delta = p_1 - p_2</math> in order to get n values for <math>\displaystyle \delta</math>; ...

5 KB (788 words) - 09:45, 30 August 2017
Dynamic Routing Between Capsulesl
...cases, we want to reduce the number of dimensions because we always want to save computations. The reason behind this kind of pooling method is based o ...od is that it only passes the local patterns into the next layer. That is to say, if our original data set doesn't have the good property of neighborhoo ...

8 KB (1,394 words) - 19:54, 20 March 2018
Unsupervised Machine Translation Using Monolingual Corpora Only
The paper presents an unsupervised method for machine translation using only monolingual corpora without any alignment bet The general approach of the methodology is to first use an unsupervised word-by-word translation model proposed by [Connea ...

8 KB (1,359 words) - 22:48, 19 November 2018
generating Random Numbers
...lling a fair die repetitively to produce a series of random numbers from 1 to 6). One way to generate pseudo random numbers from the uniform distribution is using the ' ...

8 KB (1,324 words) - 09:45, 30 August 2017
a Dynamic Bayesian Network Click Model for Web Search Ranking
...users click on what appears as the first search results and are unlikely to click on results that do not appear at the beginning, even if relevant. ...el'' of user behavior, assumes that the user scans search results from top to bottom and eventually stops because either their information need is satisf ...

11 KB (1,852 words) - 09:45, 30 August 2017
hierarchical Dirichlet Processes
...osal generally cannot model shared information between groups. One idea is to make <math>G_0</math> become discrete by limiting the choice of <math> G_0 ...e measure. Note that <math>G_0</math> is discrete with probability one due to the nature of the Dirichlet process. ...

8 KB (1,341 words) - 09:46, 30 August 2017
large-Scale Supervised Sparse Principal Component Analysis
...s that it is computationally expensive. Many algorithms have been proposed to solve the sparse PCA problem, and the authors introduced a fast block coord ...nsion of the data. Since <math>\hat{n}</math> could be very small compared to the dimension <math>n</math> of the data, this algorithm is computationally ...

7 KB (1,209 words) - 09:46, 30 August 2017
nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization
...s with computing k-nearest neighbors of each input and adding a constraint to preserve distances and angles between k-nearest neighbors: and also a constraint on outputs to be centered on the origin: ...

7 KB (1,093 words) - 09:45, 30 August 2017
deep Sparse Rectifier Neural Networks
...easy to train and easy to generalize, while neuroscientists' objective is to produce useful representation of the scientific data. In other words, machi ...e at 1/2 of their maximum rate when at zero. A solution to this problem is to use a rectifier neuron which does not fire at its zero value. This rectifi ...

9 KB (1,338 words) - 09:46, 30 August 2017
stat441w18/summary 1
...based methods where they learn the i-th training examples are "remembered" to learn for corresponding weights. Prediction on untrained examples are then ...nal feature space and then apply existing linear methods. The main goal is to reduce the bottleneck of kernel-based inference methods. ...

5 KB (753 words) - 12:51, 7 March 2018
stat946F18
|width="30pt"|Link to the paper |width="30pt"|Link to the summary ...

14 KB (1,851 words) - 03:22, 2 December 2018
stat441F21
|width="15pt"|Link to the paper |width="30pt"|Link to the summary ...

8 KB (1,194 words) - 04:28, 1 December 2021
measuring and testing dependence by correlation of distances
...o random variables could be in different dimensions. Second, dCov is equal to zero if and only if the two variables are independent. ...ritten in terms of the expectations of Euclidean distances which is easier to interpret: ...

4 KB (586 words) - 09:46, 30 August 2017
a Rank Minimization Heuristic with Application to Minimum Order System Approximation
...stics and signal processing. Except in some special cases the RMP is known to be computationally hard. \mbox{subject to: } & X \in C, ...

8 KB (1,446 words) - 09:45, 30 August 2017
on the Number of Linear Regions of Deep Neural Networks
...rger. Furthermore, having many layers can theoretically cause problems due to vanishing gradients. ...number of input regions. This is caused by the deep hierarchy which allows to apply the same computation across different regions of the input space. ...

8 KB (1,391 words) - 09:46, 30 August 2017
STAT946F20/BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
..."bank" as a "financial institution" or the "land alongside or sloping down to a river or lake". ...e positional encoding, which has the same dimension as the word embedding, to obtain the sequential information of the inputs. BERT is built by the N uni ...

9 KB (1,342 words) - 06:36, 10 December 2020
Convolutional neural network for diagnosis of viral pneumonia and COVID-19 alike diseases
...f pneumonia in CT Scan images. Then they carried out 10-fold cross validation to estimate how the model will perform on unseen data. And finally they evaluat ...iologists, radiologists and computer scientists have been working together to detect microbial diseases such as tuberculosis, malaria and pneumonia using ...

7 KB (974 words) - 14:56, 21 November 2021
techniques for Normal and Gamma Sampling
...sform Method and sample from independent uniform distributions seen before to generate a sample following a Gamma distribution. ...le to use the Acceptance-Rejection method, but there are still better ways to sample from a Standard Normal Distribution. ...

7 KB (1,114 words) - 09:45, 30 August 2017
test
...imal). This creates a big problem, as this method becomes very susceptible to poor data (i.e., not very robust). This intuitively makes sense, as the age ...e noisy demonstration to be ranked according to their relative performance to each other. Another similar method requires extra labelling of the data wit ...

10 KB (1,526 words) - 17:39, 26 November 2021
context Adaptive Training with Factorized Decision Trees for HMM-Based Speech Synthesis
...by its ability to be understood. There are two major approaches introduced to deal with this problem. One is unit-selection synthesis and the other one i ...make the sewing part of the process more natural. The latter objective is to optimize the concatenation cost. The overall cost for this approach can be ...

10 KB (1,678 words) - 09:46, 30 August 2017
adaptive dimension reduction for clustering high dimensional data
...somewhere close to the initial configurations. The conventional method is to try a number of initial values, and pick up the best one of the results. ...ated with clustering process (ii) make effective use of cluster membership to connect reduced dimensional space and full dimension space. ...

9 KB (1,428 words) - 09:46, 30 August 2017
importance Sampling and Markov Chain Monte Carlo (MCMC)
Apply the idea of importance sampling to both numerator and denominator: This is very important and useful especially when f is known only up to a proportionality constant. Often, this is the case in Bayesian approach wh ...

6 KB (1,113 words) - 09:45, 30 August 2017
mULTIPLE OBJECT RECOGNITION WITH VISUAL ATTENTION
...eep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image. It has been shown that the pr ...e process continues until the model decides that there are no more objects to process. ...

11 KB (1,714 words) - 09:46, 30 August 2017
This Looks Like That: Deep Learning for Interpretable Image Recognition
...ks. The goal of the algorithm is to utilize human-understandable reasoning to perform image classification tasks. ...ility is critical, where diagnosis using X-ray scans is based on comparing to other prototypical scans. [1] ...

10 KB (1,573 words) - 23:36, 9 December 2020
Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree
...wx + b = 0 separate the set exactly with the distance from the hyperplane to the nearest sample is maximum, then this hyperplane will be called an optim ...methods. In the kernel methods, we map the sample from the original space to a higher dimension and then we construct a hyperplane that separates the sa ...

9 KB (1,392 words) - 01:45, 23 November 2021
cardinality Restricted Boltzmann Machines
...ls. The first notion is the sparsity of the graph (i.e., something related to the number of edges in the graph). In this sense, a sparse model is one tha ...neural network literature, it means that we do not want all of the neurons to be activated at the same time. For our RBM, we want that the values of hidd ...

9 KB (1,501 words) - 09:46, 30 August 2017
semi-supervised Learning with Deep Generative Models
...n machine based Deep Belief Network (DBN). Where layers of RBM are trained to learn unsupervised features of the data and then a final classification lay ...e on benchmark tasks and uses deep neural networks in an innovative manner to create a layered semi-supervised classification/generation model. ...

9 KB (1,554 words) - 09:46, 30 August 2017
self-Taught Learning
...her type (i.e. has class labels that do not apply to data set that we wish to classify). ...hasize that the unlabeled data need not belong to the class labels we wish to assign, as long as it is related. This fact distinguishes it from semi-supe ...

12 KB (1,871 words) - 09:45, 30 August 2017
contributions on Context Adaptive Training with Factorized Decision Trees for HMM-Based Speech Synthesis
...synthesis requires a much larger and more complex set of contexts in order to achieve high quality synthesised speech. Examples of such contexts are the * Identity of neighbouring phones to the central phone. Two phones to the left and the right of the centre phone are usually considered as phonet ...

8 KB (1,374 words) - 09:45, 30 August 2017
deep Learning of the tissue-regulated splicing code
...twork (DNN) in predicting outcome of splicing, and compare the performance to formerly trained model Bayesian Neural Network<ref>https://www.cs.cmu.edu/a The cost function we want to minimize here during training is <math>E=-\sum_a\sum_{k=1}^{C}{y_{n,k}log(h ...

8 KB (1,353 words) - 09:46, 30 August 2017
f17Stat946PaperSignUp
|width="30pt"|Link to the paper |width="30pt"|Link to the summary ...

10 KB (1,213 words) - 19:28, 19 November 2020
Automatic Bank Fraud Detection Using Support Vector Machines
...rvised learning). These methods were then tested on various bank databases to determine how effective they were in detecting fraud. ...a higher probability of being fraudulent. Different methods have been used to tackle this problem impacting financial institutions such as: Bayesian Netw ...

12 KB (1,776 words) - 19:07, 24 November 2021
video-based face recognition using Adaptive HMM
...e or still images. It is a very complex problem with high dimensionality due to the nature of digital images. Face recognition benefits many fields such a ...by speaker adaptation, this paper presents an Adaptive Hidden Markov model to recognize human faces from frame sequences. The proposed model trains HMM on ...

10 KB (1,640 words) - 09:46, 30 August 2017
positive Semidefinite Metric Learning Using Boosting-like Algorithms
...etric learning algorithms because it uses a special optimization technique to solve the semi-definite programming (SDP) problem. ...etric to be positive semi-definite. Semi-definite programming is difficult to implement and does not scale well. Based on observation that any positive s ...

9 KB (1,558 words) - 09:46, 30 August 2017
maximum-Margin Matrix Factorization
...ome <math>\pm1</math> while some other cells are unknown. The main goal is to find matrix X such that it preserves the knowledge in Y, and predicts the v ...oblem of over-fitting in the prediction process. If the rank of X is equal to the rank of Y, we will have the trivial solution of X = Y. ...

12 KB (2,046 words) - 09:45, 30 August 2017
hamming Distance Metric Learning
...umber of samples. Like other metric learning methods this paper also tries to optimize some cost function which is based on a similarity measure between The task is to learn a mapping from b(x) that projects p-dimensional real valued input x on ...

10 KB (1,792 words) - 09:46, 30 August 2017
strategies for Training Large Scale Neural Network Language Models
...-based implementation of a class based maximum entropy model, that allows to easily control the trade-off between memory complexity and computational ...training neural network language models with maximum entropy models leads to better performance in terms of computational complexity. ...

9 KB (1,542 words) - 09:46, 30 August 2017
monte Carlo Integration
Note that <math>\displaystyle\lim_{N\to\infty}\tilde{\mu} = \bar{x}</math>. Also note that when <math>N = 0</math> ...mann as a Los Alamos code word for the stochastic simulations they applied to building better atomic bombs. ...

5 KB (870 words) - 09:45, 30 August 2017
A Knowledge-Grounded Neural Conversation Model
...igh demand there is incentive to build systems that can respond seamlessly to requests. ...d opinion or fact-based content. The ability to do this would elevate them to the level of task-oriented conversational applications. ...

11 KB (1,713 words) - 13:09, 20 March 2018
U-Time:A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging Summary
...pecially trained to be applied on one dataset alone and might be difficult to use for non-experts in a more general setting (Perslev et al., 2019). ...r architectural tuning to be applied to variable data sets, and it is able to classify sleep stages at any temporal resolution (Perslev et al., 2019). ...

8 KB (1,170 words) - 01:41, 26 November 2021
Do Vision Transformers See Like CNN
...ransformers, to learn how exactly the vision transformer solves its tasks, to compare and contrast the results of these two different architectures, and ...ResNet model. Typical CNNs often suffer if the model is deep, largely due to vanishing gradients. ...

13 KB (2,006 words) - 00:11, 17 November 2021
measuring statistical dependence with Hilbert-Schmidt norms
...and kernel dependence measures based on RKHSs, and generalizes the measure to metric spaces. ...ween random variables is simple. However, testing independence is hard due to the unknown non-linearity of the data. There is a theorem that binds the tw ...

8 KB (1,240 words) - 09:46, 30 August 2017
On The Convergence Of ADAM And Beyond
...ial moving averages of squared past gradients, thereby limiting the update to only rely on the past few gradients. The following formula shows the per-pa ...is suggest that this can be prevented through novel but simple adjustments to the ADAM optimization algorithm, which can improve convergence. This paper ...

13 KB (2,153 words) - 16:54, 20 April 2018
Poison Frogs Neural Networks
...fic test instance, and ''clean-label'' attacks do not require the attacker to have control over the poison’s labeling. ...on a pretrained InceptionV3 network and sees up to 70% success rate on end-to-end trained scaled-down AlexNet architecture when using watermarks and mult ...

11 KB (1,590 words) - 18:29, 26 November 2021
kernel Dimension Reduction in Regression
...ression methods, marginal distribution of explanatory variables are needed to calculate the independency measurement. This paper proposes that conditiona ...ptions with respect to the probability distribution of X which can be hard to justify[ref]. Most of the previous regression methods assume linearity betw ...

6 KB (1,132 words) - 09:46, 30 August 2017
sparse PCA
...>d</math> variables has its own special meaning and it may be desirable to come up with some directions, as principal components, each of which is a combin ...have a limited number of non-zero elements. In other words, this helps us to perform feature selection, by selecting a subset of features in each direct ...

13 KB (2,202 words) - 09:45, 30 August 2017
deep Generative Stochastic Networks Trainable by Backprop
from <math>P_o(X|\bar{X})</math>, which is trained to estimate the ground truth <math>P(X|\bar{X})</math> P(X) and its partition function is thus easier to approximate. ...

12 KB (1,906 words) - 09:46, 30 August 2017
stat441w18/Saliency-based Sequential Image Attention with Multiset Prediction
...exhibit unexpected and unintuitive behaviour, allowing minor perturbations to cause a complete misclassification. In addition, the classifier may accurat ...a saliency detection method to determine the focus of the classifier, and to understand how the classifier makes its decisions. ...

12 KB (1,840 words) - 14:09, 20 March 2018
very Deep Convoloutional Networks for Large-Scale Image Recognition
...of very small (3 × 3) convolution filters in all layers. As a result, they come up with significantly more accurate ConvNet architectures. During training, the only preprocessing step is to subtract the mean RGB value computed on the training data. Then, the image ...

11 KB (1,680 words) - 09:46, 30 August 2017
a fast learning algorithm for deep belief nets
...astive version of the wake-sleep algorithm. The result is an efficient way to train a deep belief network with substantial accuracy, as is shown by top-n The following figure shows the network used to model the joint distribution ...

12 KB (1,919 words) - 09:46, 30 August 2017
the Wake-Sleep Algorithm for Unsupervised Neural Networks
...BN) are difficult to learn because the posterior distribution is difficult to infer. ...learn an efficient representation which accurately characterizes the input to the system. ...

16 KB (2,512 words) - 09:46, 30 August 2017
compressive Sensing (Candes)
In order to make quantitative observations about our environment, we must often acquire ...cquire these signals, we must have some minimum number of samples in order to exactly reconstruct the signal. ...

13 KB (2,258 words) - 09:45, 30 August 2017
Wide and Deep Learning for Recommender Systems
...o variables as a dot product between two low dimensional embedding vectors to achieve generalization. 4. '''Collaborative deep learning''' has been used to couple deep learning for content information and collaborative filtering fo ...

8 KB (1,119 words) - 04:28, 1 December 2021
stat441F18
|width="30pt"|Link to the paper |width="30pt"|Link to the summary ...

6 KB (827 words) - 11:33, 5 September 2020
learning Phrase Representations
...rformance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by ...ence. From a probabilistic perspective, this new model is a general method to learn the conditional distribution over a variable-length sequence conditio ...

12 KB (1,906 words) - 09:46, 30 August 2017
Robust Probabilistic Modeling with Bayesian Data Reweighting
Probabilistic models approximate the distribution of data to help with analysis and prediction by relying on a set of assumptions. Data ...s other animated kids movies for her. One day her parents forget to switch to their Netflix account and watch a horror movie. ...

9 KB (1,489 words) - 02:35, 19 November 2018
The Detection of Black Ice Accidents Using CNNs
...ne way an AV can prevent an accident is going from a passive safety system to an active safety system once a risk is identified. ...since it is a thin, transparent layer of ice. Because of this, focus needs to be placed on AVs identifying black ice. ...

12 KB (1,983 words) - 15:54, 14 November 2021
stat946w18
|width="30pt"|Link to the paper |width="30pt"|Link to the summary ...

9 KB (1,240 words) - 18:05, 19 November 2018
an HDP-HMM for Systems with State Persistence
...have an infinite number of parameters, only a finite number of them is required to explain the observed data. ...rkov Model. The new model, which is named HDP-HMM, allows the number of states to be infinite. ...

12 KB (2,039 words) - 09:46, 30 August 2017
neural Machine Translation: Jointly Learning to Align and Translate
...hod is more effective compared to other neural network models when applied to long sentences. ...word. The decoder then selectively combines the most relevant annotations to generate each target word; this implements a mechanism of attention in the ...

14 KB (2,221 words) - 09:46, 30 August 2017
The Curious Case of Degeneration
...ine text. For example, in the figure below, the GPT2 model tries to generate the continuation text given the context. On the left side, the bea ...is too probable which indicates the lack of diversity (variance) compared to human-generated texts ...

13 KB (2,144 words) - 05:41, 10 December 2020
parametric Local Metric Learning for Nearest Neighbor Classiﬁcation
...ther overcome overfitting. Also, large margin triplet constraints are used to find basis metrics, which further improves the results. ...a <math>(\alpha , \beta , p)</math>-Lipschitz smooth function with respect to a vector norm <math>\| . \|</math> if <math>\| f(x) - f(x^') \| \leq \alp ...

9 KB (1,589 words) - 09:46, 30 August 2017
Bayesian Network as a Decision Tool for Predicting ALS Disease
In order to propose the best decision tool for Amyotrophic Lateral Sclerosis (ALS) pred ...ntrol. Its origin is still unknown, though in some instances it is thought to be hereditary. Sadly, at this point of time, it is not curable and the prog ...

8 KB (1,188 words) - 10:31, 17 May 2022
neural Turing Machines
...limitation and name their approach Neural Turing Machine (NTM) as analogy to [https://en.wikipedia.org/wiki/Turing_machine Turing machines] that are fin ...rs propose to ignore the known capacity limitations of working memory, and to introduce sophisticated gating and memory addressing operations that are ty ...

12 KB (1,896 words) - 09:46, 30 August 2017
stat441w18/A New Method of Region Embedding for Text Classification
...ective approach for text classification is the bag-of-words model. That is to represent documents as vectors and train a classifier based on these repres To utilize the order information, people developed N-gram model, which is to predict the Nth word base on the last N-1 words with Markov Chain Model. Ye ...

13 KB (2,188 words) - 12:42, 15 March 2018
paper 13
...math> is called the Number of Random projections (<math>M</math>) required to project a K-dimensional manifold from <math>R^{N}</math> into <math>R^{M}</ ...structure, it is reasonable to use random projection (non-adaptive method) to map data into lower dimension (<math>M</math>) and then apply clustering al ...

13 KB (2,128 words) - 09:45, 30 August 2017
wikicoursenote:cleanup
...can find the description for k classes in the next pages which is referred to as FDA for multi class problems. ...

551 bytes (116 words) - 09:45, 30 August 2017
visualizing Similarity Data with a Mixture of Maps
...to show how we can utilize several different two-dimensional maps in order to visualize a set of pairwise similarities. Aspect maps resemble both cluster ...ays: Despite difficulty of optimizing the SNE objective function, it leads to much better solutions and since SNE is based on probabilistic model, it is ...

15 KB (2,530 words) - 09:45, 30 August 2017
Learning Combinatorial Optimzation
The current approaches to tackling NP-hard combinatorial optimization problems are good heuristics or ...mine the greedy action, the current state of the problem is taken as input to a graph embedding network from which an action will be given by its output. ...

12 KB (1,976 words) - 23:37, 20 March 2018
continuous space language models
The underlying idea of this approach is to attack the data sparseness problem by performing the language model probabi ...th> or the source sentence to be translated <math>\,e</math>, it is common to model these problems as finding the sequence of words <math>\,w^*</math> th ...

15 KB (2,517 words) - 09:46, 30 August 2017
stat441w18
|width="30pt"|Link to the paper |width="30pt"|Link to the summary ...

5 KB (694 words) - 18:02, 31 August 2018
scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers Machines
...object like the sky or water could span the entire image, and figuring out to which class a particular blue pixel belongs could be challenging). ...single object). The convolutional network feature extractor is trained end-to-end from raw pixels, so there is no need for engineered features. ...

12 KB (1,895 words) - 09:46, 30 August 2017
Gradient Episodic Memory for Continual Learning
...sk Minimization (ERM) is one of the common supervised learning methods used to minimize a loss function by having multiple passes over the training set. where <math>\ell :\mathcal {Y} \times \mathcal {Y} \to [0, \infty)</math> ...

13 KB (2,164 words) - 13:34, 21 November 2018
decentralised Data Fusion: A Graphical Model Approach (Summary)
...aking. SPIE Press, 2004. </ref>. The process of fusing data is categorized into centralized and decentralized data fusion. ...t reliable operation; In fact, the main problem of DDF is finding a method to understand and formulate this process. Having the significant advantages of ...

9 KB (1,332 words) - 09:45, 30 August 2017
kernel Spectral Clustering for Community Detection in Complex Networks
...mework and make use of out-of-sample extension. They also propose a method to extract from a network a subgraph representative for the overall community ...h> indicating the degree of node ''i'', i.e. the number of edges connected to node ''i''. ...

10 KB (1,675 words) - 09:46, 30 August 2017
GradientLess Descent
...o minimise an objective function <math display="inline">f : \mathbb{R}^n \to \mathbb{R}</math>, which means finding: ...consider convex smooth objective noiseless functions, where we have access to function computations but not gradient computations. This class of function ...

11 KB (1,754 words) - 22:06, 9 December 2020
Depthwise Convolution Is All You Need for Learning Multiple Visual Domains
...ng applied to new domains. Additionally, we introduce a gating mechanism to promote soft sharing between different domains. The approach was evaluated 2. Add new tasks to the model without introducing additional parameters. ...

10 KB (1,371 words) - 00:44, 14 November 2021
f11Stat946ass
According to the product rule we have: Now, we need to show that this statement, <math>p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})</math>, ...

14 KB (2,497 words) - 09:45, 30 August 2017
dropout
...eural network which contains a large number of parameters. The key idea is to randomly drop units from the neural network during training. During trainin ...set using a validation set, or can be set to 0.5, which seems to be close to optimal for a wide range of networks and tasks). ...

13 KB (2,182 words) - 09:46, 30 August 2017
Streaming Bayesian Inference for Crowdsourced Classification
...ffective in processing high volumes of small tasks that would be expensive to achieve in other methods. The primary limitation of this method to acquire data is that respondents can submit incorrect responses so that we ...

13 KB (2,239 words) - 23:20, 4 December 2020
DREAM TO CONTROL: LEARNING BEHAVIORS BY LATENT IMAGINATION
...ue function, directly on latent state samples which help to enable scaling to more complex tasks. ...omes with using finite imagination horizons. The authors have also managed to demonstrate empirical performance for visual control by evaluating the mode ...

13 KB (2,072 words) - 06:07, 10 December 2020
Learning The Difference That Makes A Difference With Counterfactually-Augmented Data
...mentation. The authors apply this method to the IMDB sentiment dataset and to SNLI and show that many models can not perform well on the augmented datase ...t bias within the data. ML models then learn the inherent bias which leads to biased predictions. ...

10 KB (1,605 words) - 19:42, 6 December 2020
summary
...his tree boosting system is highly scalable, which means it could be used to solve various problems with significantly less time and fewer resources. ...on how he used algorithmic optimizations as well as some important systems to develop XGBoost. He explained it in the following manner: ...

12 KB (1,916 words) - 17:34, 18 March 2018
STAT946F17/ Learning Important Features Through Propagating Activation Differences
...ing to the difference. This is a form of sensitivity analysis and it helps to better understand the model. ...ityanalysis.asp Investopedia]], a sensitivity analysis is a technique used to determine how changes in an independent variable influence a particular dep ...

14 KB (2,347 words) - 10:26, 4 December 2017
Augmix: New Data Augmentation method to increase the robustness of the algorithm
...s & Dietterich (2019), showing that the classification error rose from 25% to 62% when some corruption was introduced on the ImageNet test set. ...ce that networks trained on translation augmentations are highly sensitive to the shifting of pixels. ...

11 KB (1,652 words) - 18:44, 6 December 2020
show, Attend and Tell: Neural Image Caption Generation with Visual Attention
...attended to to generate each specific word in the output. This can be used to get a sense of what is going on in the model and is especially useful for u ...hat distill information in image down to the most salient objects can lead to losing information which could be useful for richer, more descriptive capti ...

12 KB (1,882 words) - 09:46, 30 August 2017
nonparametric Latent Feature Models for Link Prediction
...the classes of the corresponding pair of nodes. The idea is fairly similar to the stochastic blockmodel <ref>Krzysztof Nowicki and Tom A. B. Snijders. Es ...h each node has binary-valued latent features that influences its relation to other nodes. Known covariates information can also be incorporated. The mod ...

12 KB (1,942 words) - 09:46, 30 August 2017
graph Laplacian Regularization for Larg-Scale Semidefinite Programming
...zation for solving very sophisticated problems of the above type that lead to much smaller and faster SDPs than previous approaches. This factorization c ...airwise distances to nearby sensors via radio transmitters, the problem is to identify the whole network topology. In other words, knowing that we have n ...

12 KB (1,953 words) - 09:45, 30 August 2017
stat841F18/
In the past two decades, due to their surprising classification capability, support vector machine (SVM) ...cations directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. ...

10 KB (1,620 words) - 17:50, 9 November 2018
deep Convolutional Neural Networks For LVCSR
deep neural networks to large vocabulary speech recognition,” submitted ...ons: first, they are translation invariant which makes them an alternative to various speaker adaptation techniques. Second, spectral representation of t ...

11 KB (1,587 words) - 09:46, 30 August 2017
multi-Task Feature Learning
Re-wrote the introduction since it was very similar to that of the original paper Imagine that a user wishes to perform some '''task''', such as shopping for a book. ...

17 KB (2,834 words) - 09:45, 30 August 2017
discLDA: Discriminative Learning for Dimensionality Reduction and Classification
A recent trend in dimensionality reduction is to focus on probabilistic models. These models, which include [http://en.wikip ...data. A supervised dimensionality reduction technique can also be applied to build a classifier or regressor on the reduced space. ...

17 KB (2,695 words) - 09:45, 30 August 2017
Going Deeper with Convolutions
...itional 1 X 1 convolutional layers, serving as dimension reduction modules to significantly reduce the number of parameters of the model. The paper also ...two major bottlenecks. One disadvantage is that the enlarged network tends to overfit the train data, especially if there is only limited labeled example ...

9 KB (1,389 words) - 00:29, 7 December 2020
maximum likelihood estimation of intrinsic dimension
...by cross-validation-- a reliable estimate is still helpful as a guideline to reduce the computational cost of cross-validation. ...th>f</math> and the mapping <math>\psi</math> are all unknown. The task is to give an estimator of <math>m</math> based on above settings. ...

15 KB (2,484 words) - 09:46, 30 August 2017
CRITICAL ANALYSIS OF SELF-SUPERVISION
...level semantic information captured by manual labels. This paper also aims to figure out whether current self-supervision techniques can learn deep featu ...upervised learning is to take advantage of a vast amount of unlabeled data to train CNNs and find good generalized image representations. ...

12 KB (1,792 words) - 00:08, 13 December 2020
a Direct Formulation For Sparse PCA Using Semidefinite Programming
...to find the sparse principal components using sparse PCA, it is necessary to make some sacrifices such as: ...the original data captured by the sparse principal components as compared to PCA. ...

20 KB (3,146 words) - 09:45, 30 August 2017
F21-STAT 441/841 CM 763-Proposal
Description: We use paper cups to make a string phone and talk with friends while learning about sound waves ...UAD, LUSD, and MESO), and the images have been split into patches in order to reduce the computational difficulty. The classification task is decomposed ...

12 KB (1,520 words) - 09:48, 22 December 2021
stat441w18/e-gan
...l training objective and always preserves the best offspring, contributing to progress in and the success of GANs. ...to determine fake currency. The generative model is analogous to learn how to detect the counterfeit currency. ...

15 KB (2,279 words) - 22:00, 14 March 2018
Representations of Words and Phrases and their Compositionality
...using a neural network framework. Notably, the Skip-gram model can be made to train faster and produce higher accuracy via a number of simple adjustments ...ngs for words that appear in similar contexts. While the model can be used to evaluate certain probabilities, this is considered a side effect of its lea ...

10 KB (1,716 words) - 13:24, 21 November 2018
f15Stat946PaperSignUp
|width="30pt"|Link to the paper |width="30pt"|Link to the summary ...

11 KB (1,453 words) - 13:01, 16 October 2018
Towards Deep Learning Models Resistant to Adversarial Attacks
This is a summary of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks" by Aleksander Madry, Aleksandar Makelov, Ludwig Schmid ...e|'''Figure 1.''' Before and after an input image of a panda was subjected to perturbations.[https://arxiv.org/abs/1412.6572 Source]]] ...

14 KB (2,192 words) - 03:01, 23 November 2018
F21-STAT 940-Proposal
...from start to end with no human interference. We would be the first to try to automate such a process using deep learning. - Place the swab in our nose up to a particular depth ...

13 KB (2,036 words) - 12:50, 16 December 2021
learning Convolutional Feature Hierarchies for Visual Recognition
...n among them. In sparse coding, the sparse feature vector z is constructed to reconstruct the input x with a dictionary D. The procedure produces a code ...m is trained on single image patches in most applications of sparse coding to image analysis, which produces a dictionary of filters that are essentially ...

12 KB (1,872 words) - 09:46, 30 August 2017
Another look at distance-weighted discrimination
...elatively complicated for DWD. This paper proposed a new thrifty algorithm to solve the standard problem DWD and generalized DWD, which is faster than th ...s a 'data piling' issue and reveals competitive performance. SOCP was used to solve DWD by reformulating problem (Alizadeh and Goldfarb, 2004; Boyd and V ...

10 KB (1,433 words) - 03:02, 13 November 2021
dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces
...setting of supervised learning. Before explaining '''KDR''', it is useful to review supervised learning and establish notations. ...problem. Because the theory developed in the paper allows the output space to be continuous or discrete, '''KDR''' is, therefore, a dimensionality reduc ...

14 KB (2,403 words) - 09:45, 30 August 2017
Breaking Certified Defenses: Semantic Adversarial Examples With Spoofed Robustness Certificates
...al attack where a model is deceived by an attacker by adding a small noise to an input image and as a result, the prediction of the model changes. ...important to design the classifiers such that these classifiers are immune to such adversarial attacks. ...

15 KB (2,325 words) - 06:58, 6 December 2020
Robust Imitation Learning from Noisy Demonstrations
...imal). This creates a big problem, as this method becomes very susceptible to poor data (i.e., not very robust). This intuitively makes sense, as the age ...e noisy demonstration to be ranked according to their relative performance to each other. Another similar method requires extra labelling of the data wit ...

13 KB (2,031 words) - 19:23, 27 November 2021
matrix Completion with Noise
We are curious to find out if this accurate recovery is possible and, if so, the accuracy at ...s the few observed entries. The error of the recovery task is proportional to the noise level when the number of noisy samples is about <math>nr\log^{2}{ ...

14 KB (2,342 words) - 09:45, 30 August 2017
question Answering with Subgraph Embeddings
...stion answers (or open QA). However, the scale and difficulty for machines to interpret natural language still makes this problem challenging. ...-negligible interventions (hand-crafted lexicons, grammars and KB schemas) to be effective. ...

15 KB (2,417 words) - 09:46, 30 August 2017
graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns
...ide information and incorporate the side information in the classification to improve the algorithms. ...uctured classification problem in practice, we need both an expressive way to represent our beliefs about the structure, as well as an efficient probabil ...

17 KB (2,924 words) - 09:46, 30 August 2017
Pixels to Graphs by Associative Embedding
...ons between them. An explicit representation of this semantics is referred to as a scene graph where we represent objects grounded in the scene as vertic ...all of the objects in the scene, then isolate individual pairs of objects to identify the relationships between them. This breakdown often restricts the ...

17 KB (2,749 words) - 18:26, 16 December 2018
Dynamic Routing Between Capsules
...e computational power and huge data collected today, scientists are racing to start their careers in the machine learning field. ...tion field, we are glad to present this paper, which offers a modification to solve for the flaws in CNN and largely improves the prediction precision. ...

14 KB (2,384 words) - 12:36, 29 March 2018
neighbourhood Components Analysis
...ere is also the problem of determining which distance metric is to be used to define "nearest" points. ...at defines which points are nearest. It can restrict this distance metric to be low rank, reducing the dimensionality of the data and thus reducing stor ...

16 KB (2,630 words) - 09:45, 30 August 2017
joint training of a convolutional network and a graphical model for human pose estimation
...el features which are more tolerant to variations. However, it’s difficult to incorporate prior knowledge about the structure of the human body. ...convolutional neural network is combined with a graphical model, in order to capture the spatial dependencies between the variables of interest which is ...

12 KB (1,800 words) - 09:46, 30 August 2017
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
...rs than BERT-large, but it still produces better results. The changes made to BERT model are Factorized embedding parameterization and Cross-layer parame ...construct a layer's activations from its next layer, to eliminate the need to store these activations, freeing up the memory. In addition, Raffel et al. ...

14 KB (2,170 words) - 21:39, 9 December 2020
A Game Theoretic Approach to Class-wise Selective Rationalization
...alternative conclusions. Each class consists of three players who compete to find evidence for both factual and counterfactual circumstances. In a simpl ...ng explanations for a specific class by probing the importance with regard to the relevant class logit. ...

11 KB (1,594 words) - 13:14, 25 November 2021
Roberta
...r, it is difficult to determine which parts of the methods contribute most to their success. This paper proposed Roberta, a model which replicates BERT p ...g the alternatives in design choices and training schemes of BERT, leading to better downstream task performance. ...

14 KB (2,156 words) - 00:54, 13 December 2020
Co-Teaching
...deep learning models trained by the co-teaching approach is much superior to state-of-the-art baselines ...he two networks are able to filter different types of errors, and flow back to itself and the other network. As a result, the two models learn together, f ...

15 KB (2,318 words) - 21:02, 11 December 2018
deflation Methods for Sparse PCA
...method matrix deflation] (which is done by modifying the covariance matrix to eliminate the influence of that eigenvector). ...avoid this drawback of PCA is the formulation of sparse PCA (SPCA), a link to the 2004 paper of which by Iain M. Johnstone ''et al.''<ref name="R1">Iain ...

20 KB (3,332 words) - 09:45, 30 August 2017
stat946w18/Predicting Floor-Level for 911 Calls with Neural Networks and Smartphone Sensor Data
...s can provide the rescuers with the geographic location. However, GPS fails to give an accurate floor level inside a tall building. Previous work has exp ...nsors including barometers and magnetometers. Deep learning can be applied to predict floor level based on these sensor readings. ...

14 KB (2,153 words) - 15:01, 18 April 2018
generating text with recurrent neural networks
...ploding gradients by applying a technique called Hessian-Free optimization to effectively train a recurrent network that, when unrolled in time, has appr ...quence Data"] ICML, (2009) </ref> and a mixture of context models referred to as PAQ <ref> Mahoney, M. [https://repository.lib.fit.edu/bitstream/handle/1 ...

18 KB (2,926 words) - 09:46, 30 August 2017
Predicting Hurricane Trajectories Using a Recurrent Neural Network
...ibbean Sea and the Atlantic Ocean; they generally travel from their origin to the north, northwest, or northeast. Hurricanes are usually accompanied by s ...are a kind of artificial neural network, where the weights can be modified to make the model learn complex dynamic time-dependent behavior. An RNN can ef ...

12 KB (1,826 words) - 09:46, 20 November 2021
kernelized Sorting
...es of objects to be matched. Instead, we develop an approach which is able to perform matching by requiring a similarity measure only within each of the drawn independently according to the same law. ...

16 KB (2,875 words) - 09:45, 30 August 2017
stat946w18/IMPROVING GANS USING OPTIMAL TRANSPORT
Recently, the problem of how to learn models that generate media such as images, video, audio and text has ...ribution distance between the generated data and the real one is important to train the generator. ...

18 KB (2,794 words) - 00:23, 21 April 2018
quantifying cancer progression with conjunctive Bayesian networks.
...fic order, due to which a single node can have multiple parents. This led to the use of a more general framework for tree models called the co ...>r\neq q</math> and <math>p<r<q</math>. Denote <math>p\rightarrow q</math> to say that <math>p</math> is the parent of <math>q</math>. The set of all par ...

15 KB (2,588 words) - 09:46, 30 August 2017
contributions on Quantifying Cancer Progression with Conjunctive Bayesian Networks
...fic order, due to which a single node can have multiple parents. This led to the use of a more general framework for tree models called the co ...>r\neq q</math> and <math>p<r<q</math>. Denote <math>p\rightarrow q</math> to say that <math>p</math> is the parent of <math>q</math>. The set of all par ...

15 KB (2,589 words) - 09:45, 30 August 2017
Memory-Based Parameter Adaptation
...also presents experimental results where the model in question is applied to continual and incremental learning tasks. ...t <math>x</math> and the contents of the memory, <math>M</math>, according to ...

12 KB (1,963 words) - 23:48, 9 November 2018
quantifying cancer progression with conjunctive Bayesian networks
...fic order, due to which a single node can have multiple parents. This led to the use of a more general framework for tree models called the co ...>r\neq q</math> and <math>p<r<q</math>. Denote <math>p\rightarrow q</math> to say that <math>p</math> is the parent of <math>q</math>. The set of all par ...

15 KB (2,605 words) - 09:46, 30 August 2017
visualizing Data using t-SNE
...rowding problem". In addition, the author showed that t-SNE can be applied to large data sets as well, by using random walks on neighborhood graphs. The ...mathbf x_j </math> as its neighbor when neighbors are picked in proportion to their probability density under a Gaussian centered on <math> \mathbf x_i < ...

19 KB (3,223 words) - 09:45, 30 August 2017
the loss surfaces of multilayer networks (Choromanska et al.)
...mial, providing an analysis with results from random matrix theory applied to spherical spin glasses. ...pirically show that the cost functions of neural networks behave similarly to Gaussian error functions in high-dimensional spaces, but no theoretical jus ...

13 KB (2,168 words) - 09:46, 30 August 2017
proof of Theorem 1
...Trevor Hastie. (2009) "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis". ''Biostati ...

1 KB (192 words) - 09:45, 30 August 2017
natural language processing (almost) from scratch.
...reexisting systems. The authors argue that the goal of NLP is, ideally, to translate natural language text into a data structure which fully and unamb This paper describes a single learning system to perform four standard NLP tasks (described below), using minimal pre-process ...

13 KB (2,118 words) - 09:46, 30 August 2017
imageNet Classification with Deep Convolutional Neural Networks
...rom millions of images, Convolutional Neural Network (CNN) is utilized due to its large learning capacity, fewer connections and parameters and outstandi ...CNNs. Thus, they trained one of the largest convolutional neural networks to date on the datasets of ILSVRC-2010 and ILSVRC-2012 and achieved the best r ...

13 KB (1,939 words) - 09:46, 30 August 2017
One pixel attack for fooling deep neural networks
...short amount of time, the ML community went from no one using neural networks to everyone using neural networks. Today we have 97% accuracy in using deep ...ng into Figure 1, as shown in Figure 2, a small amount of perturbation led to a dog being misclassified as a hummingbird. ...

17 KB (2,650 words) - 23:54, 30 March 2018
Do Deep Neural Networks Suffer from Crowding
...mpact of crowding on DNNs trained for object recognition by adding clutter to the images and then analyzing which models and settings suffer less from su ...identify the "A" in the two circles. You should see that it is much easier to make out the "A" in the right than in the left circle. The same "A" exists ...

19 KB (3,066 words) - 00:21, 21 April 2018
Meta-Learning For Domain Generalization
...ickly master a new game. Hereby defining tasks as domains, the paper tries to overcome the problem in a model-agnostic way. ...domain [2]. Finally, a domain-invariant feature representation is learned to minimize the gap between multiple source domains and it should provide a do ...

14 KB (2,177 words) - 00:41, 7 December 2020
on using very large target vocabulary for neural machine translation
...translation have avoided this problem by restricting the model vocabulary to only include some shortlist of words in the target language. Words not in t In this paper Jean and his colleagues aim to solve this problem by proposing a training method based on importance sampl ...

14 KB (2,301 words) - 09:46, 30 August 2017
incremental Learning, Clustering and Hierarchy Formation of Whole Body Motion Patterns using Adaptive Hidden Markov Chains(Summary)
...at online incremental learning of human motion patterns with applications to humanoids and other robotic agents. The algorithm automatically abstracts t ..., where K is the number of outputs. Mixtures of Gaussians are commonly used to model the output distribution. This distribution is a mixture of Bernoulli an ...

18 KB (2,835 words) - 09:46, 30 August 2017
From Variational to Deterministic Autoencoders
...er are able to generate samples that are comparable or better when applied to domains of images and structured objects. The authors point to several drawbacks currently associated with VAE's including: ...

15 KB (2,313 words) - 19:11, 2 December 2020
stat441w18/mastering-chess-and-shogi-self-play
...marking a huge step forward in artificial intelligence that would continue to be built upon for the coming decades. The methods in Deep Blue, as well as ...Elmo is also highly dependent on real human input and modifications compared to solely computation learning methods. ...

14 KB (2,311 words) - 13:58, 15 March 2018
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
...ning, the goal is to train a model to perform the task of training a model to perform a task. Hence, in this case, the term "meta-Learning" has the exact ...ata for that specific task. In other words, we would like to train a model to perform the following procedure: ...

17 KB (2,846 words) - 00:12, 21 April 2018
kernelized Locality-Sensitive Hashing
...ucket share the same low dimensional representation which is used as a key to that bucket. At query time, the low dimensional representation of the query ...kernel maps the data to an infinite dimensional space which is intractable to explicitly work with. This paper generalizes locality sensitive hashing by ...

17 KB (2,894 words) - 09:46, 30 August 2017
learning Spectral Clustering, With Application To Speech Separation
...hat generalizes to the unseen datasets when spectral clustering is applied to them. Traditional spectral clustering techniques assume a metric or a simil Clustering refers to partitioning a given dataset into clusters such that data points in the same c ...

35 KB (5,767 words) - 09:45, 30 August 2017
One-Shot Imitation Learning
...proposed model aims to achieve 'one-shot' imitation learning, i.e., learning to complete a new task from just a single demonstration of it without any othe ** Behavioural learning uses supervised learning to map from observations to actions (e.g. [https://papers.nips.cc/paper/95-alvinn-an-autonomous-land-ve ...

20 KB (3,247 words) - 00:27, 21 April 2018
orthogonal gradient descent for continual learning
...ase diseases very well, but its utility is limited if it cannot be adapted to learn novel diseases - like local/rare/or new diseases (like Covid-19). ...>(X_1,X_2,\ldots, X_m)</math> iid from this distribution, which is assumed to be "fixed" during training. In continual learning, this distribution typica ...

15 KB (2,322 words) - 23:30, 7 December 2020
proposal for STAT946 projects Fall 2010
...\,n</math>, and a very small portion of them totalling <math>\,k</math> is to be sampled as landmarks for landmark MDS, there are two common approaches: ::1. Use simple random sampling (SRS) to sample the <math>\,k</math> landmarks. ...

17 KB (2,679 words) - 09:45, 30 August 2017
extracting and Composing Robust Features with Denoising Autoencoders
robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be ...

14 KB (2,189 words) - 09:46, 30 August 2017
on the difficulty of training recurrent neural networks
...ncies. In this paper the authors propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradi ...Through Time (BPTT) algorithm. The authors rewrote the equations in order to highlight the exploding gradients problem: ...

17 KB (2,685 words) - 09:46, 30 August 2017
stat946w18/Spectral normalization for generative adversial network
...uccess as a framework of generative models in recent years. The concept is to consecutively train the model distribution and the discriminator in turn, w ...turns out to be 0. This motivates us to introduce some form of restriction to the choice of discriminator. ...

16 KB (2,645 words) - 10:31, 18 April 2018
Semantic Relation Classification——via Convolution Neural Network
...M is an artificial recurrent neural network (RNN) architecture well suited to classifying, processing and making predictions based on time series data. I .... (2015a) and Santos et al. (2015) both applied CNN with negative sampling to finish task7. The 2017 SemEval Task 10 also featured relation extraction wi ...

15 KB (2,408 words) - 21:25, 5 December 2020
optimal Solutions forSparse Principal Component Analysis
...ipedia.org/wiki/Singular_value_decomposition singular value decomposition] to the data matrix. ...loadings) in ''sparse principal components'' to a very low number relative to the total number of coefficients whilst having these sparse vectors explain ...

22 KB (3,725 words) - 09:45, 30 August 2017
When Does Self-Supervision Improve Few-Shot Learning?
This paper proposes a technique utilizing self-supervised learning (SSL) to improve the generalization of few-shot learned representations on small lab ...rupted. It also encompasses classification of unlabeled images that belong to the domain which is not present in the training dataset. ...

17 KB (2,644 words) - 01:46, 13 December 2020
a fair comparison of graph neural networks for graph classification
...I faces a reproducibility crisis" [1]. It has been argued that the ability to reproduce existing AI code, and making these codes and new ones open source been developed to effectively tackle graph classification. However, experimental ...

16 KB (2,430 words) - 00:54, 7 December 2020
supervised Dictionary Learning
...decompositions, in conjunction with predefined dictionaries, were applied to face and signal recognition. ...ation <ref name="VZ2009">M. Varma and A. Zisserman. A statistical approach to material classification using image patch exemplars. ''IEEE Trans. PAMI'', ...

21 KB (3,291 words) - 09:45, 30 August 2017
Multi-scale Dense Networks for Resource Efficient Image Classification
...O-slidedeck.pdf ensemble of CNNs], which are likely far too resource-heavy to be used in any resource-limited application. ...varying levels of computational requirements. The two cases that are used to evaluate the network are: ...

18 KB (2,750 words) - 22:45, 20 April 2018
Deep Residual Learning for Image Recognition
...ul GPUs and the increasingly available data sets, two components essential to training of deep architectures. One of the more recent progresses in the fi ...increase the depth of a network by simply repeating a small module and aim to achieve higher accuracy</li> ...

19 KB (2,963 words) - 14:42, 22 November 2018
Influenza Forecasting Framework based on Gaussian Processes
...data from the CDC and other real-time data sources, such as Google Trends to forecast influenza activities. ...put. Spatio-temporal effects would therefore require adequate data sources to achieve good performance. ...

17 KB (2,683 words) - 14:13, 7 December 2020
Learning What and Where to Draw
...label or a non-localized caption. The authors of 'Learning What and Where to Draw' believe that image synthesis will be drastically enhanced by incorpor ...scription what each image is intended to depict. The proposed model learns to perform location and content-controllable image synthesis on the Caltech-UC ...

18 KB (2,781 words) - 12:35, 4 December 2017
Understanding Image Motion with Group Representations
...h as optical flow and visual odometry, where a sequence of images is used to calculate either the pixel level (local) motion or the motion of the entire ...model captures motion in both 2D and 3D settings. This method can be used to extract useful information for vehicle localization, tracking, and odometry ...

19 KB (2,946 words) - 16:09, 20 April 2018
stat946w18/MaskRNN: Instance Level Video Object Segmentation
* In unsupervised video object segmentation, the task is to find the salient objects and track the main objects in the video. ...lient objects is provided for the first frame. The task is thus simplified to only track the objects required. ...

21 KB (3,174 words) - 00:15, 21 April 2018
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
...n then be used for object detection. However, point clouds are challenging to process because: ...atial arrangement of the points contains useful information, thus it needs to be encoded. ...

19 KB (2,990 words) - 22:59, 20 April 2018
probabilistic PCA with GPLVM
...onality reduction, spans from the initial formulation of PCA 100 years ago to the more recent formulation of the dual problem (Dual PCA) and subsequent k ...al representation of this can be seen in Figure 1. A probabilistic approach to the PCA problem is appealing as it provides several advantages such as enab ...

21 KB (3,433 words) - 09:45, 30 August 2017
stat946w18/Implicit Causal Models for Genome-wide Association Studies
...ied with neural networks, implicit densities, and with scalable algorithms to very large data for their Bayesian inference. However, most of the models a ...which one or more SNPs cause the disease; second, target the selected SNPs to cure the disease. ...

16 KB (2,613 words) - 23:52, 20 April 2018
probabilistic Matrix Factorization
The problem provides an approach to [http://en.wikipedia.org/wiki/Collaborative_filtering| collaborative filter ...ere each row corresponds to a user, each column to a movie and the entries to a rating. Ratings take on values <math>\,1,...,K</math>. The difficulty in ...

18 KB (2,938 words) - 09:45, 30 August 2017
memory Networks
...class label for an image or a translation of a sentence from one language to another). In this sense, ...network is constituted largely by the weights on the recurrent connections to its hidden layer (along with the layer's activities). As is well known, thi ...

23 KB (3,946 words) - 09:46, 30 August 2017
Wavelet Pooling CNN
...g and stochastic pooling. All these methods employ a neighborhood approach to the sub-sampling which, albeit fast and simple, can produce artifacts such ...as a lossy process, the reason for employing a wavelet approach is to try to minimize this loss. ...

15 KB (2,396 words) - 22:57, 20 April 2018
goingDeeperWithConvolutions
In the last three years, due to the advances of deep learning and more concretely convolutional networks [h ...otivation for their "Inception module" approach to CNN architecture is due to: ...

15 KB (2,289 words) - 09:46, 30 August 2017
stat946w18/AmbientGAN: Generative Models from Lossy Measurements
...lurred) with some noise from a <math>N(0, 0.5^2)</math> distribution added to each pixel: ...erative model directly on the measured data. This will obviously be unable to generate the true distribution before measurement has occurred. ...

19 KB (2,916 words) - 22:25, 20 April 2018
stat946w18/Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolutional Layers
...PS)] which is less than 5% of the Titan Xp. Clearly, it would be difficult to deploy and run these models on low-power devices. ...CNN). Past work has mostly focused on norm-based or error-based heuristics to prune channels; instead, Ye et al. (2018) show that their approach is easil ...

13 KB (1,942 words) - 00:18, 21 April 2018
deep Neural Nets as a Method for Quantitative Structure–Activity Relationships
...computer intensive or require the adjustment of many sensitive parameters to achieve good prediction. In this sense, the machine learning methods can be ...mization algorithms can substantially reduce the experimental work that needs to be done. It was hypothesized that DNN models outperform RF models. ...

17 KB (2,705 words) - 09:46, 30 August 2017
independent Component Analysis: algorithms and applications
...here two people are speaking at the same time and two microphones are used to record the speech signals. Denoting the speech signals by <math>s_1(t) \,</ ...give us an objective in finding matrix <math>\ A</math>, that is, we want to find components which are as statistically independent and non-Gaussian as ...

15 KB (2,422 words) - 09:45, 30 August 2017
Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness
The approaches to solving optical flow problems, albeit widely successful, have mostly been a ...of the standard Flownet architecture with a spatial transformer component to devise a "self-supervising" loss function. ...

16 KB (2,542 words) - 17:26, 26 November 2018
Self-Supervised Learning of Pretext-Invariant Representations
...problem is '''self-supervised Learning'''. Self-Supervised Learning tries to learn meaningful semantics by just using the inputs themselves rather than ...struction where a square chunk of the image is deleted and the model tries to reconstruct that part. ...

20 KB (3,045 words) - 23:02, 12 December 2020
Learning to Teach
This is a summary of the paper titled: "Learning to Teach", authored by Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, and Tie-Yan ...ent, determining the appropriate data, loss function, and hypothesis space to facilitate the learning of the student model. ...

21 KB (3,351 words) - 18:40, 16 December 2018
residual Component Analysis: Generalizing PCA for more flexible inference in linear-Gaussian models
...le component analysis (PPCA) is a method based on an isotropic error model to estimate the principal axes when any data vector has one or more missing va ...probabilistic PCA. This is analogous to the Dual form of PCA and similarly to the primal form, the max likelihood solution solves for the latent coordina ...

14 KB (2,347 words) - 09:46, 30 August 2017
inductive Kernel Low-rank Decomposition with Priors: A Generalized Nystrom Method
...ilable in the training stage, it is difficult to generalize the decomposition to new samples. ...hod as a bilateral extrapolation of a dictionary kernel, and generalize it to incorporate prior information in computing improved low-rank decomposition ...

16 KB (2,675 words) - 09:46, 30 August 2017
Mask RCNN
...e image where the object of interest is proposed to lie) and then attempts to classify the object within it. ...reover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. Mask R-CNN achieved top results ...

20 KB (3,056 words) - 22:37, 7 December 2020
statf09841Proposal
==Project 1 : How to Make a Birdhouse == ...ne game users is rapidly increasing. Computer play-programs are often used to automatically perform actions on behalf of a human player. This type of che ...

15 KB (2,344 words) - 09:45, 30 August 2017
compressive Sensing
...applications such as medical scanners and radar, it is usually too costly to increase the sampling rate. ...f compressible signal, how to construct a compressible signal and then how to reconstruct the compressed signal. ...

18 KB (2,888 words) - 09:45, 30 August 2017
a Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis
...rform [http://en.wikipedia.org/wiki/Inference inference] across data sets. To this end, they demonstrate their penalized CCA method on a genomic data set ...r value decomposition will give the best rank-<math>r</math> approximation to the matrix. ...

30 KB (4,829 words) - 09:45, 30 August 2017
Predicting Floor Level For 911 Calls with Neural Network and Smartphone Sensor Data
In this paper, a novel approach is presented to accurately predict floor level for 911 calls by leveraging neural networks ...with tall buildings, relying on GPS or Wi-Fi signals does not always lead to an accurate location of a caller. ...

18 KB (2,896 words) - 18:43, 16 December 2018
Point-of-Interest Recommendation: Exploiting Self-Attentive Autoencoders with Neighbor-Aware Influence
...hese challenges, this paper will introduce a novel autoencoder-based model to learn non-linear user-POI relations, which is called SAE-NAD. SAE stands fo ...ed average of similar users or POIs. Model-based methods use user-POI data to build a model for generating recommendations. Both methods typically model ...

17 KB (2,662 words) - 05:15, 16 December 2020
stat946w18/Synthetic and natural noise both break neural machine translation
* "Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the l A person's ability to read this text comes as no surprise to the Psychology literature ...

17 KB (2,634 words) - 00:15, 21 April 2018
rOBPCA: A New Approach to Robust Principal Component Analysis
...ix. Since the classical estimation for covariance matrix is very sensitive to the presence of outliers, it is not surprising that the principal component ...to show that Bayesian robust estimator may be an alternative choice compared to classical robust estimators. ...

15 KB (2,414 words) - 09:46, 30 August 2017
Adversarial Fisher Vectors for Unsupervised Representation Learning
...th a parameterised variational distribution is also a minimax game similar to the one in GAN. Although they are similar, an advantage of this EBM view is ...a challenging task. In fact, as we will see, we can use the Fisher kernel to calculate the distance between two sets of images which is not a trivial ta ...

22 KB (3,540 words) - 17:50, 6 December 2020
Fairness Without Demographics in Repeated Loss Minimization
...he model. When unbalanced group risk gets worse over time this is referred to as '''''disparity amplification'''''. ...step of time, and is fair for models that ERM turns unfair by applying it to Amazon Mechanical Turk task. ...

20 KB (3,120 words) - 00:42, 17 December 2018
distributed Representations of Words and Phrases and their Compositionality
...ec(“Madrid”) - vec(“Spain”) + vec(“France”) is closer to vec(“Paris”) than to any other word vector. The authors of this paper show that subsampling of f ...ntrastive estimation of unnormalized statistical models, with applications to natural image statistics"] in The Journal of Machine Learning Research, (201 ...

19 KB (2,931 words) - 09:46, 30 August 2017
markov Random Fields for Super-Resolution
...ations. Here we focus on super-resolution application where the problem is to estimate high resolution details from low resolution images. ...by lines indicate statistical dependencies. Each “scene” node is connected to its corresponding “image” node as well as its neighbors. ...

18 KB (3,001 words) - 09:46, 30 August 2017
Wasserstein Auto-Encoders
...t way to combine them together, but a principled unifying framework is yet to be discovered. ...l evaluation is performed on MNIST and CelebA datasets, where WAE is found to generate samples of better quality than VAE while preserving training stabi ...

21 KB (3,416 words) - 22:25, 25 April 2018
uncovering Shared Structures in Multiclass Classification
...earning-to-learn'' or ''interclass transfer'' (Thrun, 1996) <ref> Learning to learn: Introduction. Kluwer Academic Publishers.</ref>. ...a striped texture. If such true underlying characteristics that are common to the many different classes can be found, then the effective complexity of t ...

24 KB (3,815 words) - 09:45, 30 August 2017
learning Hierarchical Features for Scene Labeling
...etween the two. An example input image and resultant output is shown below to demonstrate this. ...': The desired result (which is the same format as the training data given to the network for supervised learning) is an image with large features labell ...

18 KB (2,935 words) - 09:46, 30 August 2017
F18-STAT946-Proposal
'''Description:''' We use paper cups to make a string phone and talk with friends while learning about sound waves ...n map for the ships first, augment the images and train a simple CNN model to detect them. ...

17 KB (2,400 words) - 15:50, 14 December 2018
IPBoost
...tion <math> \mathcal D_t</math> towards the misclassified examples leading to <math> \mathcal D_{t+1}</math>, usually a relatively good model will have a ...mal decision boundary represented by the dotted line, and XGBoost was used to learn a classifier: ...

18 KB (2,846 words) - 00:18, 5 December 2020
SuperGLUE
There have been several benchmarks attempting to standardize the field of language understanding tasks. SentEval [6] evaluat ...guage models with all the transformer-based models started with attempting to achieve high scores on GLUE. Original GPT and BERT models scored 72.8 and 8 ...

16 KB (2,331 words) - 16:58, 6 December 2020
THE LOGICAL EXPRESSIVENESS OF GRAPH NEURAL NETWORKS
...s motivated by convolutional and recurrent neural networks and generalizes to both of them (Battaglia et al., 2018). Despite the fact that GNNs have rece ...true or false to every node—that is refined by the WL test. This work aims to answer the question of what are the node classifiers that can be captured b ...

17 KB (2,786 words) - 17:02, 6 December 2020
MULTI-VIEW DATA GENERATION WITHOUT VIEW SUPERVISION
...r of objects under various views. The distribution of the data is assumed to be driven by two independent latent factors: the content, which represents ...same content but different view. A number of approaches have been proposed to disentangle the content from the view (i.e. methods based on unlabeled samp ...

24 KB (4,054 words) - 00:34, 14 December 2018
parsing natural scenes and natural language with recursive neural networks
...syntactic parser for natural language sentences from the Penn Treebank and to outperform alternative approaches for semantic scene segmentation, annotati ...ure from raw images. In addition, the same network can be used recursively to achieve classification instead of building a hierarchy by a convolutional n ...

16 KB (2,588 words) - 09:46, 30 August 2017
Dynamic Routing Between Capsules STAT946
the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Active capsules at one level ...

22 KB (3,375 words) - 22:40, 20 April 2018
convex and Semi Nonnegative Matrix Factorization
...pplication areas of NMF algorithm and also provide better interpretability to matrix factors. ...always sparse and many different sparsification schemes have been applied to NMF. ...

23 KB (3,920 words) - 09:45, 30 August 2017
A universal SNP and small-indel variant caller using deep neural networks
...length k, called k-mers, and then piecing them together or aligning them to a reference genome. Next-generation sequencing is relatively fast and inexp ...s and small indels are technically challenging since it requires a program to distinguish between genuinely novel mutations and errors in the sequencing ...

18 KB (2,856 words) - 04:24, 16 December 2020
schedule946
...toluie (Paper: "Visualizing Similarity Data with a Mixture of Maps") (Link to paper: [http://www.cs.toronto.edu/~hinton/absps/ampaper.pdf ]) ...

1 KB (193 words) - 09:45, 30 August 2017
Don't Just Blame Over-parametrization
...y of 87%, while the actual test accuracy is only 72% (Guo et al., 2017). To address this issue, more and more researchers work on improving the '''cali ...ms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binn ...

9 KB (1,353 words) - 19:11, 15 November 2021
Annotating Object Instances with a Polygon RNN
...o separate entities is key to understanding what is around us and it helps to reason about the behavior of objects in the scene. ...med "object detection". There are four distinct levels of detection (refer to Figure 1 for a visual cue): ...

21 KB (3,323 words) - 18:41, 16 December 2018
Neural Audio Synthesis of Musical Notes with WaveNet autoencoders
...trol signals for manipulating tone, timbre, and dynamics during playback. To train such a data-expensive model, the authors highlight the need for a larg Proposed WaveNet-style autoencoder that learns to encode temporal data over long-term audio structures without requiring ext ...

18 KB (2,701 words) - 00:19, 21 April 2018
Extreme Multi-label Text Classification
...thod takes advantage of unbalanced label distributions by forming clusters to reduce training time. The authors experimented on five different datasets a ...ining the Probabilistic Label Tree [5] method and the Adaptive Softmax [6] to propose APLC. ...

15 KB (2,456 words) - 22:04, 7 December 2020
learning Long-Range Vision for Autonomous Off-Road Driving
Stereo-vision has been used extensively by mobile robots to identify near-to-far obstacles in their path, but it is limited by its maximum range of 12 meters. F ...ers. This approach has been implemented and tested on the Learning Applied to Ground Robots (LAGR) provided by the National Robotics Engineering Center ( ...

20 KB (3,026 words) - 09:46, 30 August 2017
Dense Passage Retrieval for Open-Domain Question Answering
...to the ability of a model to also assign a confidence score for its answer to the question. ...l reader since neural reading comprehension is expensive. It's impractical to process billions of documents using a neural reader. ...

17 KB (2,691 words) - 22:57, 7 December 2020
compressed Sensing Reconstruction via Belief Propagation
...content of a signal lies in a few samples with large magnitude. This led to the study and investigation of a class of signals, known as compressible signal ...an we sample compressible signals in a compressed way? Is there any method to sense only those large value coefficients? In parallel works by Donoho <ref ...

23 KB (3,784 words) - 09:45, 30 August 2017
XGBoost: A Scalable Tree Boosting System
...olve a wide range of problems. We mainly introduce XGBoost, a scalable end-to-end tree boosting system on this page. We demonstrate the exact greedy algo So the target function that needs to be optimized is:<math>\sum_{i=1}^n l(y_i,\hat y_i)+\sum^K_{k=1}\omega(f_k), f_k ...

15 KB (2,406 words) - 18:07, 28 November 2018
stat441F18/TCNLM
...the global semantic, the probability of each learned latent topic is used to learn the local structure of a word sequence. ...on of a document to predict the topic distribution of the document in order to identify the global semantic meaning of documents. ...

18 KB (2,810 words) - 23:45, 14 November 2018
learning a Nonlinear Embedding by Preserving Class Neighborhood Structure
...escribes a method to learn a nonlinear transformation from the input space to a low-dimensional ...sed dimensions for nearest neighbor classifications are used by the method to explicitly represent ...

20 KB (3,263 words) - 09:45, 30 August 2017
Label-Free Supervision of Neural Networks with Physics and Domain Knowledge
...d look like. This work explores whether a similar principle can be applied to teaching machines: can we supervise networks without individual examples by ...thout labels by learning from constraints that are known to hold according to prior domain knowledge. By training without direct examples of the values o ...

21 KB (3,358 words) - 00:04, 21 April 2018
a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
...pular online shopping website Amazon.com for recommending related products to users of Amazon.com based on what these users have recently purchased from Our goal, then, is to predict or infer the other preferences---in a sense, completing the matrix. ...

24 KB (3,853 words) - 09:45, 30 August 2017
Adversarial Attacks on Copyright Detection Systems
...Adversarial attacks are instances where people intentionally design inputs to cause misclassification in the model. Copyright detection systems are vulnerable to attacks for three reasons: ...

23 KB (3,604 words) - 15:03, 7 December 2020
Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
...ms. In such situations, researchers are faced with the challenge of trying to generate results based on partial or incomplete datasets. Regularisation te ...as a regularisation agent, constraining the space of acceptable solutions to help the optimisation converge more quickly and more accurately. ...

23 KB (3,762 words) - 15:51, 6 December 2020
overfeat: integrated recognition, localization and detection using convolutional networks
Recognizing the category of the dominant object in an image is a task to which Convolutional Networks (ConvNets) have been applied for many years. C 1. The first idea in addressing this is to apply a ConvNet at multiple locations in the image, in a sliding window fas ...

19 KB (2,961 words) - 09:46, 30 August 2017
stat946f15/Sequence to sequence learning with neural networks
...amount of work to learn more than one language past childhood. The ability to efficiently and quickly translate between languages would then be of great ...s that capture their meaning, as sentences with similar meanings are close to each other while sentences with different meanings will be far. ...

23 KB (3,755 words) - 19:49, 5 February 2018
stat946w18/Spectral
...amount of work to learn more than one language past childhood. The ability to efficiently and quickly translate between languages would then be of great ...s that capture their meaning, as sentences with similar meanings are close to each other while sentences with different meanings will be far. ...

23 KB (3,755 words) - 17:51, 22 February 2018
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
...amount of work to learn more than one language past childhood. The ability to efficiently and quickly translate between languages would then be of great ...s that capture their meaning, as sentences with similar meanings are close to each other while sentences with different meanings will be far. ...

23 KB (3,755 words) - 22:22, 23 February 2018
DeepVO Towards end to end visual odometry with deep RNN
...lude the VO field, thus the paper proposes a novel deep-learning based end-to-end VO algorithm and then empirically demonstrates its viability. ...ture based methods and direct methods, which differ in the method employed to select reference points. Sparse feature based methods establish reference p ...

16 KB (2,430 words) - 18:30, 16 December 2018
stat441F18/YOLO
...single regression problem, and uses a single convolutional neural network to predict bounding boxes and class probabilities for each box. #''Extreme speed''. This is attributable to the framing of detection as a single regression problem. Both training and ...

19 KB (2,746 words) - 16:04, 20 November 2018
consistency of Trace Norm Minimization
...ia.org/wiki/Document_classification classification], the main objective is to estimate a low-[http://en.wikipedia.org/wiki/Rank_%28linear_algebra%29 rank ...sum_{i=1}^n (z_i - \textrm{tr}(W^TM_i))^2 \;\; \textrm{subject} \; \textrm{to} \;\; \textrm{rank}(W) \leq \delta, ...

24 KB (4,053 words) - 09:45, 30 August 2017
BERTScore: Evaluating Text Generation with BERT
...arried out various experiments in Machine Translation and Image Captioning to show why BertScore is more reliable and robust than the previous approaches ...s create embeddings of a dimensionality much lower than sparse BoW and aim to capture semantics and context. Word embeddings differ in that they will be ...

17 KB (2,510 words) - 01:32, 13 December 2020
STAT946F17/ Automated Curriculum Learning for Neural Networks
at least to Elman (1993). The basic idea is to start small, learn easier aspects of the task or easier sub-tasks, and then ...tly become prevalent in the field (e.g., Bengio et al., 2009), due in part to the greater complexity of problems now being considered. In particular, ...

16 KB (2,534 words) - 14:37, 30 November 2017
CatBoost: unbiased boosting with categorical features
...theoretical derivation of the algorithm along with the intuitive examples to demonstrate the strength and the efficiency of the model. ...work in paper [1,15,18,22] inspired the authors to propose a new algorithm to deal with categorical features. Here are the two main aspects considered in ...

17 KB (2,504 words) - 02:36, 23 November 2021
proposal for STAT946 projects
points in the original high dimensional dataset are equal to the angles between those same three ...to preserve the global geometric property of the manifold while LLE tries to approximate the local geometric property [1]. ...

15 KB (2,332 words) - 09:45, 30 August 2017
F18-STAT841-Proposal
'''Description:''' We use paper cups to make a string phone and talk with friends while learning about sound waves ...on. The primary goal of this project is to develop a machine learning tool to detect patients with pneumonia based on their chest radiographs (CXR). ...

20 KB (2,757 words) - 14:41, 13 December 2018
A Neural Representation of Sketch Drawings
...generation mode, the authors explore the model's latent space that it uses to express the vector image. The paper also explores many applications of thes ...k that employed Sequence-to-Sequence models with Variational Auto-encoders to model English sentences in latent vector space. ...

22 KB (3,638 words) - 21:48, 20 April 2018
relevant Component Analysis
...e may be very unreliable. The goal of Relevant Component Analysis (RCA) is to find a transformation that amplifies relevant variability and suppresses ir ...m a single speaker" [1]. The authors coin the term ''adjustment learning'' to describe learning using chunklets; adjustment learning can be viewed as fal ...

21 KB (3,516 words) - 09:45, 30 August 2017
Synthesizing Programs for Images usingReinforced Adversarial Learning
* It is not clear how to inject knowledge about the data into the model. The provided solution in this paper is to generate programs to incorporate tools, e.g. graphics editors, illustration software, CAD. and ' ...

18 KB (2,816 words) - 18:31, 16 December 2018
Model Agnostic Learning of Semantic Features
...ation, and domain adaptation; however in domain adaptation, we have access to the target domain data somehow, while that is not the case in domain genera ...r domain generalization. The authors claim that their method is orthogonal to previous works. ...

15 KB (2,189 words) - 01:58, 13 December 2020
Deep Learning for Cardiologist-level Myocardial Infarction Detection in Electrocardiograms
...tion of data fusion and machine learning techniques exhibits great promise to healthcare innovation, and the analyses in this paper help further this rea ...gnal from lead II at a time as input. The decision to use lead II compared to the other leads was not explained. ...

21 KB (3,373 words) - 07:19, 15 December 2020
measuring Statistical Dependence with Hilbert-Schmidt Norm
(RKHSs). The measure is referred to as Hilbert-Schmidt Independence ...nd W.-S. Lee (Eds.), 2005.</ref>, a criterion is introduced which is used to measure the dependence between two ''multivariate'' random variables. More ...

27 KB (4,561 words) - 09:45, 30 August 2017
Robust Imitation of Diverse Behaviors
...ing this challenge, the authors combine several deep generative approaches to imitation learning in a way that accentuates their individual strengths and ...tions of the two approaches and try to combine the two approaches in order to get the best of both worlds. ...

20 KB (3,075 words) - 01:17, 7 April 2018
ShakeDrop Regularization
...an Shake-Shake needs <math>2\times</math> memory as it can use less memory to achieve the same performance. ...seeks to formulate a general expansion of Shake-Shake that can be applied to any residual block based network. ...

21 KB (3,187 words) - 00:34, 17 December 2018
meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
...arning step, update only this minimal subset of the parameters. This leads to sparsified gradients because only highly relevant parameters are updated an ...ggest that accuracy is improved rather than being degraded. The name given to the proposed technique is minimal effort back propagation method (meProp). ...

20 KB (3,272 words) - 20:40, 28 November 2017
STAT946F17/Decoding with Value Networks for Neural Machine Translation
...ents the sentence meaning; a decoder, then, processes the "meaning" vector to emit a translation. (Figure 1)[[#References|[1]]] *'''Generalization: Sequence-to-Sequence(Seq2Seq) Model''' ...

22 KB (3,543 words) - 00:09, 3 December 2017
Patch Based Convolutional Neural Network for Whole Slide Tissue Image Classification
...he end the authors proved their claims and findings by testing their model on the classification of glioma and non-small-cell lung carcinoma cases. ...ch labels are provided [1, 2], allowing patch-level supervised classifiers to learn the assortment of cancer subtypes. However, labeling patches requires ...

16 KB (2,470 words) - 14:07, 19 November 2021
Reinforcement Learning of Theorem Proving
...r performance. As a result, in recent years machine learning has been used to replace such heuristics and improve the performance of ATPs. ...at reinforcement learning results in a 42.1% performance increase compared to the base prover (without learning). ...

20 KB (3,127 words) - 20:45, 10 December 2018
One-Shot Object Detection with Co-Attention and Co-Excitation
...the class and location of all the objects present in the image. The aim is to take a query image patch whose class label is not included in the training ...urately predict the class and spatial location for unseen images belonging to the classes the model has been trained on. When a model is trained with K l ...

22 KB (3,609 words) - 21:53, 6 December 2020
Graph Structure of Neural Networks
...rmation through its input neurons through the hidden layers and ultimately to the output neurons. ...predictive performance, which is the main focus of this paper. The aim is to help explain how the addition or deletion of layers, their links, and the n ...

24 KB (3,827 words) - 17:06, 7 December 2020
Summary for survey of neural networked-based cancer prediction models from microarray data
...chers use this technology to compare normal and abnormal cancerous tissues to gain insights into cancer pathology. ...ant feature transformations, aims to combine existing features or map them to a new low-dimensional space. ...

25 KB (3,828 words) - 00:08, 8 December 2020
Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition
...address these issues, it is imperative to automate this task. In addition to applications in human behavior analysis, automating AU recognition also has ...-batches. Typically, in a batch strategy, each iteration uses a batch only to update the parameters and the features learned in that iteration are discar ...

21 KB (3,321 words) - 15:00, 4 December 2017
Generating Image Descriptions
...ons of these types of models were too restrictive, leading to an inability to generate rich descriptions that the human mind is capable of. ...l captions as weak labels in which contiguous segments of words correspond to some particular, but unknown location in the image. By inferring the locati ...

21 KB (3,271 words) - 10:58, 29 March 2018
Music Recommender System Based using CRNN
The objective of this paper is to implement a personalized music recommender system that takes user listening ...should combine music feature recognition and audio processing technologies to extract music features, and combine them with data on user preferences. ...

26 KB (4,154 words) - 04:38, 16 December 2020
the Manifold Tangent Classifier
...d into the models. However, some generic "prior" hypotheses are considered to aid in the general task of learning, and three very common ones are present ...ference on (Vol. 1, pp. 87-94). IEEE.</ref> This hypothesis lends credence to not only the theory of strict semi-supervised learning, but also unsupervis ...

22 KB (3,505 words) - 09:46, 30 August 2017
Time-series Generative Adversarial Networks
...em for various use cases, from handling irregular sampling [13] to building probabilistic forecasting models for univariate time-series.[ ...rmed as '''Time-series Generative Adversarial Network''' or '''TimeGAN'''. To incorporate supervised learning of data into the GAN architecture, this app ...

21 KB (3,059 words) - 00:28, 13 December 2020
from Machine Learning to Machine Reasoning
...82, 273–302.</ref>. Algorithms for inference do exist; however, they come at a price of reduced expressive capabilities in logical inference and prob ...ut not yet formal or logical. Informal logic is attractive because we hope to avoid the computational complexity that is associated with combinatorial se ...

21 KB (3,225 words) - 09:46, 30 August 2017
Word translation without parallel data
...ilding block. The method proposed in this paper uses an adversarial method to find this mapping between the embedding spaces of two languages without the ...embeddings and the target embeddings, while the mapping is jointly trained to fool the discriminator. ...

24 KB (3,873 words) - 17:24, 18 April 2018
learning Fast Approximations of Sparse Coding
...r being negligibly small. This provides us with a procedure which attempts to flexibly represent unseen instances of the input space. ...eviated by the fact that we would like to assign the majority of influence to only a subset of the new features. We implement this goal using the notion ...

22 KB (3,321 words) - 09:46, 30 August 2017
STAT946F17/ Improved Variational Inference with Inverse Autoregressive Flow
...erence within the mixture of Gaussians model. Note that all the parameters to be estimated can be wrapped into a long vector $\theta = (\pi_{1}, \ldots, ...ll only develop the small part of it necessary for our purposes. But refer to [VISurvey] for a survey. ...

29 KB (5,002 words) - 03:56, 29 October 2017
STAT946F17/Cognitive Psychology For Deep Neural Networks: A Shape Bias Case Study
...olve a variety of complex tasks which earlier methodologies have struggled to excel in. ...parameters of DNNs hinders both basic research as well as its application to real-world problems. ...

22 KB (3,531 words) - 20:30, 28 November 2017
Summary - A Neural Representation of Sketch Drawings
...he possibility of many applications: to be used as a way to teach children to draw, or extend the capacity of an artist by generating many possible next ...sketch-rnn takes a new step by using the same combination and applying it to vector images. ...

25 KB (4,196 words) - 01:32, 14 November 2018
stat946w18/Unsupervised Machine Translation Using Monolingual Corpora Only
# To translate between languages for which large parallel corpora do not exist ...ower bound that any semi-supervised machine translation system is supposed to yield ...

28 KB (4,522 words) - 21:29, 20 April 2018
f10 Stat841 digest
...ning supervised learning] that systematically assigns unlabeled novel data to their label through the characteristics and attributes obtained from observ ...X} </math>, where <math> \mathcal{Y} </math> represents the label assigned to a new data input and <math> \mathcal{X} </math> represents the known featur ...

26 KB (4,027 words) - 09:45, 30 August 2017
Evaluating Machine Accuracy on ImageNet
...images and corresponding labels for over 1000 classes. This paper intends to explore the causes for performance differences between human experts and ma ...some images could belong to multiple classes. As a result, it is possible to underestimate the performance if we assign each image with only one label, ...

29 KB (4,464 words) - 00:08, 15 December 2020
Hierarchical Representations for Efficient Architecture Search
...rk, we need to train it first, and training takes time. So it is important to define a proxy task that can help us better evaluate a network. Here, this ...ues implement Reinforcement Learning where a policy based controller seeks to optimize the expected accuracy of new architectures based on rewards (accur ...

30 KB (4,568 words) - 12:53, 11 December 2018
Searching For Efficient Multi Scale Architectures For Dense Image Prediction
...years, the field of Neural Architecture Search (NAS) has emerged, which aims to automatically find an optimal neural architecture for a given task in a wel This paper presents a meta-learning technique to have computers search for a neural architecture that performs well on the t ...

21 KB (3,227 words) - 18:12, 14 December 2018
MarrNet: 3D Shape Reconstruction via 2.5D Sketches
Humans are able to quickly recognize 3D shapes from images, even in spite of drastic differenc ...in constant, and can be seen as an abstraction of the object which is used to reconstruct the 3D shape.]] ...

21 KB (3,383 words) - 22:42, 20 April 2018
what game are we playing
...solution using a primal-dual Newton Method and then using back-propagation to analytically compute the gradients of all the relevant game parameters. ...ation areas as it allows for game-solving (both extensive and normal form) to be integrated as a module in a deep neural network. ...

25 KB (4,131 words) - 23:55, 6 December 2020
Deep Alternative Neural Network: Exploring Contexts As Early As Possible For Action Recognition
...video clip. Note that due to its pervasive nature, different domains refer to action recognition by different names like plan recognition, behavior recog ...by a recurrent layer. In addition, the authors also propose a new approach to select network input based on optical flow. The validity of DANN is carried ...

16 KB (2,500 words) - 13:19, 30 November 2017
Research Papers Classification System
...DA), and K-means clustering. The most important technology the system used to process big data is the Hadoop Distributed File System (HDFS). The system i Use the LDA to group the keywords into topics ...

27 KB (4,484 words) - 04:18, 15 December 2020
Pre-Training Tasks For Embedding-Based Large-Scale Retrieval
...to a real value which represents the score of how relevant the passage is to the query. We desire high scores for relevant passages and low scores other ...ient approximate nearest neighbor search algorithms in the embedding space to find the nearest documents. ...

22 KB (3,409 words) - 22:17, 12 December 2020
Visual Reinforcement Learning with Imagined Goals
...ble to set our own goals and learn from our experiences, and thus are able to accomplish specific tasks without ever having been trained explicitly for t ...t. The robot learns to set and achieve goals with only images as the input to the system. ...

26 KB (4,080 words) - 21:47, 11 December 2018
Superhuman AI for Multiplayer Poker
...the optimal choice is to remain silent, the individuals have an incentive to act in their own self-interest, which results in a less than optimal outcome ...mat in the world. The algorithm that is used is not guaranteed to converge to a Nash equilibrium outside of two-player zero-sum games. However, it uses a s ...

26 KB (4,248 words) - 00:06, 8 December 2020
stat946w18/Wavelet Pooling For Convolutional Neural Networks
...classification of images and objects. Researchers continue to focus on CNNs to improve their performance. ...reduces spatial dimensions of the data throughout the network. It is done to reduce parameters, increase computational efficiency and regulate overfitti ...

26 KB (3,974 words) - 20:50, 11 December 2018
End-to-End Differentiable Adversarial Imitation Learning
...is that the training requires large amounts of expert data, which is hard to obtain. In addition, an agent trained using BC is unaware of how its action ...re it takes each action since the transition function to move from state A to state B is not learned. ...

24 KB (3,880 words) - 23:00, 20 April 2018
human-level control through deep reinforcement learning
...formation about the reward generated by the action. The natural connection to neuroscience and animal behaviour makes reinforcement learning an attractiv ..., using systems involving dopamine in the neurons with a similar structure to reinforcement learning algorithms <ref> Schultz, W., Dayan, P. & Montague, ...

25 KB (4,026 words) - 09:46, 30 August 2017
Efficient kNN Classification with Different Numbers of Nearest Neighbors
...ry popular due to its relatively robust performance given how simple it is to implement. It is robust because the predicted value depends only on the l ...t sample. The authors of this paper presented the kTree and k*Tree methods to solve this research question. ...

23 KB (3,748 words) - 03:46, 16 December 2020
Universal Style Transfer via Feature Transforms
...mage is called the style image, whose style, but not the content, is copied to the content image. ...n these two extremes by using only whitening and coloring transforms (WCT) to transfer a style within a feedforward image reconstruction architecture. No ...

25 KB (4,065 words) - 20:10, 28 November 2017
XGBoost
...orithm. Built up from the tree gradient boosting algorithm, it can be used to solve problems such as classification, regression and ranking problems in a ...to minimize, has two additional overfitting prevention techniques applied to it; shrinkage and column subsampling. ...

21 KB (3,313 words) - 02:21, 5 December 2021
FeUdal Networks for Hierarchical Reinforcement Learning
...g has been hugely successful in a variety of domains, it has not been able to succeed in environments which have sparsely spaced reward signals. Take for ...redit assignment. Essentially, the agent is not able to attribute a reward to an action taken several timesteps back. ...

20 KB (3,237 words) - 01:59, 3 December 2017
Unsupervised Neural Machine Translation
...from having poor resources for translation (e.g. Basque), which could lead to the problem of the dataset being too small (Koehn & Knowles, 2017). Other authors have recently tried to address this problem using semi-supervised approaches (small set of paralle ...

28 KB (4,293 words) - 00:28, 17 December 2018
stat946w18/Tensorized LSTMs
...ing the number of units in a hidden layer) causes the number of parameters to increase quadratically which in turn increases the time required for model ...xperiments that were conducted on five challenging sequence learning tasks to show the potential of the proposed model. ...

25 KB (4,099 words) - 22:50, 20 April 2018
Functional regularisation for continual learning with gaussian processes
...nction. The posterior belief is then used in optimisation as a regulariser to prevent the model from completely deviating from the earlier tasks. The est ...re often remains heuristic; 2) It requires a large quantity of stored data to achieve good performance. ...

26 KB (4,302 words) - 23:25, 7 December 2020
When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, l2-consistency and Neuroscience Applications: Summary
...ding to small datasets. Similar datasets from multiple sites can be pooled to potentially ...ow-up questions along with the original data set facilitate as necessities to deduce a viable prediction. ...

23 KB (3,530 words) - 20:45, 28 November 2017
regression on Manifold using Kernel Dimension Reduction
...hm for discovering a manifold that best preserves the information relevant to a non-linear regression. The approach introduced by the authors involves co ...ce of manifold to represent the covariance vector and to choose a function to represent the boundary for classification (i.e. regression surface). As a r ...

26 KB (4,280 words) - 09:45, 30 August 2017
Obfuscated Gradients Give a False Sense of Security Circumventing Defenses to Adversarial Examples
...lassify with high confidence. These attacks pose a major threat that needs to be addressed before these systems can be deployed on a large scale, especia ...much lower than claimed. In fact, the majority of these attacks were found to be ineffective against true iterative white box attacks. ...

27 KB (3,974 words) - 17:54, 6 December 2018
Task Understanding from Confusing Multi-task Data
...iciency and cut costs, the limitations of Narrow AI encouraged researchers to look into General AI. ...rresponds to 3 labels: “red”, “apple” and “sweet”. These labels correspond to 3 different classification tasks: color, fruit, and taste. ...

27 KB (4,358 words) - 15:35, 7 December 2020
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
...s general and model-agnostic, in the sense that it can be directly applied to any learning problem and model that is trained with a gradient descent proc ...model’s parameters such that a small number of gradient updates will lead to fast learning on a new task. The paper shows the effectiveness of the propo ...

26 KB (4,205 words) - 10:18, 4 December 2017
DETECTING STATISTICAL INTERACTIONS FROM NEURAL NETWORK WEIGHTS
...ant to know what interactions are accounted for in risk prediction models, to compare against known interactions from existing medical literature. ...trees, etc. which are much more interpretable. In this paper, we are going to present one way of implementing interpretability in a neural network. ...

21 KB (3,121 words) - 01:08, 14 December 2018
Loss Function Search for Face Recognition
...ed a new loss function using a scale parameter to produce higher gradients to well-separated samples which can reduce the softmax probability. ...matically determines the search space by leveraging reinforcement learning to search loss functions during the training process, though the drawback ...

26 KB (4,157 words) - 09:51, 15 December 2020
Hierarchical Question-Image Co-Attention for Visual Question Answering
...the COCO-QA dataset. By using ResNet, the performance is further improved to 62.1% for VQA and 65.4% for COCO-QA.'' In VQA, an algorithm needs to answer text-based questions about images in ...

27 KB (4,375 words) - 19:50, 28 November 2017
Deep Double Descent Where Bigger Models and More Data Hurt
...d on this idea, concepts like overfitting and under-fitting are introduced to the classification model training process. However, the paper presents the mode ...nventional idea. However, once the model has sufficiently large complexity to interpolate (- training error), then modern intuition as this paper suggests ...

19 KB (2,731 words) - 21:29, 20 November 2021
stat946s13
|width="800pt"|Second paper (The paper that you are going to write a critique on. This is different from the paper that you have chosen ...doi/abs/10.1198/004017004000000563#.UdMOPu2RBDQ]||[[ROBPCA: A New Approach to Robust Principal Component Analysis|Summary]] ...

29 KB (4,816 words) - 09:46, 30 August 2017
A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques
...essential for the text mining field, from preprocessing and classification to clustering and extraction techniques, and also touches on applications of t ...odels used in the context of text mining aims to assign predefined classes to text documents, and some of the various models used include Naive Bayes, Ne ...

21 KB (3,252 words) - 14:03, 27 November 2018
stat441w18/Convolutional Neural Networks for Sentence Classification
...illion publicly available words on Google News. These words were then used to train a simple CNN with one layer of convolution. The simple model achieved ...nd customer reviews. Despite its limitations, the results were competitive with more complex models that use pooling schemes. ...

21 KB (3,330 words) - 03:15, 13 March 2018
Understanding the Effective Receptive Field in Deep Convolutional Neural Networks
...or every input image giving a total of 5*5*3 = 75 weights. It is important to note that the extent of the connectivity along the depth axis is 3 as the d ...ve field of CNNs can be found [https://syncedreview.com/2017/05/11/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks/ here] ...

27 KB (4,400 words) - 15:12, 7 November 2017
Spherical CNNs
...d to another position in the image. However, this does not correspond well to spherical signals since projecting a spherical signal onto a plane will res ...perfectly symmetrical grids for the sphere exists which makes it difficult to define the rotation of a spherical filter by one pixel and the computationa ...

23 KB (3,814 words) - 22:53, 20 April 2018
Neural Speed Reading via Skim-RNN
...fed back to the network as inputs and hence creating a recurrent structure to handle varying lengths of data. This makes it suitable for tasks such as un ...em, it is not uncommon to encounter parts of a passage that are irrelevant to answering the query. ...

27 KB (4,321 words) - 05:09, 16 December 2020
deep neural networks for acoustic modeling in speech recognition
...replace GMMs with DNNs in the speech recognition systems. DNNs have been shown to outperform GMMs in both small and large vocabulary speech recognition tasks ...used by stopping the training when the accuracy over validation set starts to decrease. The pretraining is essential when the amount of training data is ...

24 KB (3,699 words) - 09:46, 30 August 2017
Dialog-based Language Learning
...ifically, this paper explores whether we can train machine learning models to learn from dialog. *Evaluated some baseline models on this data and compared them to standard supervised learning. ...

26 KB (4,081 words) - 13:59, 21 November 2021
stat946F18/differentiableplasticity
...cal agents contrast this learning style by exhibiting a remarkable ability to learn quickly and efficiently from ongoing experience. ...ectively, learning stops with the training step. If a different task needs to be considered, then the agent must be trained again from scratch. ...

27 KB (4,100 words) - 18:28, 16 December 2018
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
...subfield of computer science concerned with using computational techniques to learn, understand, and produce human language content” (Hirschberg & Mannin ...ry-consuming. This serves as the major motivation for this paper’s authors to develop a new technique utilized in RNN, which is particularly efficient at ...

28 KB (4,651 words) - 20:18, 28 November 2017
Summary of A Probabilistic Approach to Neural Network Pruning
...proposes that the subnetworks can achieve similar accuracy without having to be further trained. However, finding these lottery tickets inside a large n ...theoretical guarantees of pruning. This study, ''A Probabilistic Approach to Neural Network Pruning'' by Xin Qian and Diego Klabjan [18], focuses on the ...

28 KB (4,367 words) - 00:30, 23 November 2021
Zero-Shot Visual Imitation
...at the fact that new tasks need a set of new demonstrations for the robot to learn from. In this paper, an alternative ...se the expert can now distill a large number of tasks easily (and quickly) to the agent. ...

31 KB (4,977 words) - 18:42, 16 December 2018
STAT946F17/ Teaching Machines to Describe Images via Natural Language Feedback
...rd in that we can easily point to where the mistakes occur and suggest how to correct them. ...n also be seen as a multimodal problem where the whole network/model needs to combine the solution space of learning in both the image processing and tex ...

23 KB (3,760 words) - 10:33, 4 December 2017
Conditional Image Synthesis with Auxiliary Classifier GANs
...ddition, 84.7% of the classes have samples exhibiting diversity comparable to real ImageNet data." [[#References | (Odena et al., 2016)]] ...m the ImageNet dataset. They show that this architecture makes it possible to split the generation process into many sub-models. They further suggest tha ...

33 KB (5,219 words) - 10:24, 4 December 2017
stat946w18/Towards Image Understanding From Deep Compression Without Decoding
...hitectures such as convolutional autoencoders or recurrent neural networks to compress and reconstruct RGB images and outperform classical techniques suc ...t of symbols <math>z </math>. These symbols are then losslessly compressed to a bitstream, from which a decoder reconstructs an image <math>{\hat{x}} </m ...

29 KB (4,246 words) - 20:18, 10 December 2018
Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias
...ese approaches have the capability to be robust enough to domain shift and to be used for real-world data. It is an undeniable fact that there is a wide ...earning-based approaches need to move out of simulators in the labs and go to real environments such as real homes so that they can learn from real datas ...

26 KB (4,201 words) - 18:21, 14 December 2018
Training And Inference with Integers in Deep Neural Networks
...ith low energy availability, the energy costs must be reduced while trying to maintain as high network performance as possible and/or practical. ...cessing Unit developed by Google for Tensor operations. TPU is comparable to a GPU but produces higher IO per second for low precision computations. ...

20 KB (2,998 words) - 21:23, 20 April 2018
learn what not to learn
...ees to climb"). Then a machine learning model can be trained to generalize to unseen states. ...with high probability. '''Note that the core assumption is that it is easy to predict which actions are invalid or inferior in each state and leverage th ...

29 KB (4,751 words) - 13:38, 17 December 2018
Improving neural networks by preventing co-adaption of feature detectors
...ning data. Called “dropout,” this process is also an efficient alternative to training many separate networks and averaging their predictions on the test set. ...n to become more robust. Another interpretation is that dropout is similar to training an ensemble of models since each epoch with randomly dropped neuro ...

29 KB (4,639 words) - 05:51, 15 December 2020
Adacompress: Adaptive compression for online computer vision services
...optimized for Human Visual System (HVS) but not the machines (i.e. DNNs). To be aligned with HVS the authors reconfigure the JPEG while maintaining the ...r the benefit of an improved compression ratio. Therefore, it is important to develop deep learning model-based image compression methods that reduce dat ...

27 KB (4,274 words) - 00:07, 8 December 2020
Neural ODEs
...the set of parameters or weights in state <math>t</math>. It is important to note that it has been shown (Lu et al., 2017)(Haber ...escription, if the number of layers and step size between layers are taken to their limits, then Equation 1 can instead be described continuously in the ...

24 KB (3,891 words) - 15:01, 7 December 2020
STAT946F17/ Learning a Probabilistic Latent Space of Object Shapes via 3D GAN
...teria, allows the generator to implicitly capture object structure leading to high quality and novel 3D objects ...m latent space to the space of generated objects automatically allowing it to bypass the need for reference CAD models when generating new 3D samples ...

26 KB (4,005 words) - 10:58, 28 October 2017
Learning the Number of Neurons in Deep Networks
...odel is still challenging, especially for very large datasets, which leads to high cost on memory and reduction in speed. ...the effects of a particular neuron. Therefore, the approach does not need to learn a redundant network successfully and then reduce its parameters, inst ...

24 KB (3,886 words) - 01:20, 3 December 2017
STAT946F17/Conditional Image Generation with PixelCNN Decoders
...d insight into the invariances of the embeddings which enabled the authors to generate different poses of the same person based on a single image. Finall | style="text-align: center;" | generated samples tend to be blurry. ...

31 KB (4,917 words) - 12:47, 4 December 2017
Imagination-Augmented Agents for Deep Reinforcement Learning
...RL model needs planning and inference. This kind of game raises challenges to RL. ...lized to new tasks in the same environment. A model-based method is trying to build a model for the environment. By querying the model, agents can avoid ...

29 KB (4,491 words) - 20:24, 28 November 2017
DON'T DECAY THE LEARNING RATE , INCREASE THE BATCH SIZE
...orter training times. The authors present conclusive experimental evidence to prove the empirical benefits of decaying learning rate can be achieved by i ...er et al., 2017; You et al., 2017a), this has motivated researchers to try to speed up this optimization process by taking bigger steps, and hence reduce ...

27 KB (4,025 words) - 13:28, 17 December 2018
Surround Vehicle Motion Prediction
...aster improves the recognition time of the leading vehicle and contributes to the improvement of prediction ability. ...used in sensor fusion for state estimation that allows the vehicle's state to be predicted while taking into account the uncertainty associated with inpu ...

29 KB (4,569 words) - 23:12, 14 December 2020
conditional neural process
...on at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. Hence the authors propose a family of neural mod ...e first phase learns the statistics of a generic domain without committing to a specific learning task; the second phase learns a function for a specific ...

32 KB (4,970 words) - 00:26, 17 December 2018
Learning to Navigate in Cities Without a Map
[https://arxiv.org/pdf/1804.00168.pdf Learning to Navigate in Cities Without a Map] ...forcement learning (RL), it suffers from data inefficiency and sensitivity to changes in the environment. Thus, it is unclear whether this method could b ...

28 KB (4,494 words) - 00:24, 17 December 2018
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
...n from Tel Aviv University. This paper is part of the NIPS 2018 conference to be hosted in December 2018 at Montréal, Canada. This paper summary is based ...framework for capturing such effects is structured prediction, which seeks to predict structured objects (such as graphs with nodes and edges) rather tha ...

29 KB (4,603 words) - 21:21, 6 December 2018
graves et al., Speech recognition with deep recurrent neural networks
...idirectional multilayer Long Short-term Memory (LSTM) ANNs with 1–5 layers to phoneme recognition on the TIMIT acoustic phoneme corpus, which is the stand ...s different numbers of iterations taken in the optimization algorithm used to train the models, and multiple trials for statistical validity were not per ...

25 KB (3,828 words) - 09:46, 30 August 2017
Unsupervised Domain Adaptation with Residual Transfer Networks
...hout the ability to observe this shift directly. The goal of this paper is to simultaneously learn adaptive classifiers and transferable features from la # A kernel-based penalty to ensure that the abstract representations generated by the networks hidden l ...

35 KB (5,630 words) - 10:07, 4 December 2017
Describtion of Text Mining
...approaches can be considered. The different text mining approaches relate to two main methods: knowledge delivery and traditional data mining methods. ...orm of structured, semi-structured, and unstructured text. It is important to note that text mining covers different sets of algorithms and topics that i ...

31 KB (4,992 words) - 05:11, 15 December 2020
Being Bayesian about Categorical Probability
...s confident predictive behaviors (Xie et al., 2016; Pereyra et al., 2017). To achieve performance with better generalization, some more effective regular ...label. This technique gives a computationally cheap way of being Bayesian to get well-calibrated uncertainty estimates on neural network classifications ...

29 KB (4,651 words) - 10:57, 15 December 2020
Deep Exploration via Bootstrapped DQN
...ntains background knowledge from Section 2-7 (except Section 5). Feel free to skip if you already know. == Intro to Reinforcement Learning == ...

33 KB (5,439 words) - 14:17, 3 December 2017
DCN plus: Mixed Objective And Deep Residual Coattention for Question Answering
...ally very detailed, having a shallow knowledge from the context would lead to poor and unacceptable performance. Moreover, the model should gather all th ...s, the importance of the QA tasks and their practical uses encouraged many to gather and crowdsource useful and more realistic datasets. The Stanford Que ...

24 KB (3,769 words) - 17:49, 14 December 2018
End to end Active Object Tracking via Reinforcement Learning
...g box labeling. In addition, Camera Control is non-trivial, which can lead to many expensive trial-and-errors in the real world. To address these challenges, this paper presents an end-to-end active tracking solution via deep reinforcement learning. More specific ...

29 KB (4,453 words) - 18:27, 16 December 2018
policy optimization with demonstrations
...nts where reward signals are sparse and rare. There are currently two ways to solve such exploration problems in RL: 1) Guide the agent to explore states that have never been seen. ...

30 KB (4,632 words) - 00:32, 17 December 2018
Speech2Face: Learning the Face Behind a Voice
...revealed correlations between craniofacial features and voice in addition to the correlation between dominant features (gender, age, ethnicity, etc.) an ...y further investigation or practical use of this technology will be tested to represent the intended population and also if the data does not reflect thi ...

32 KB (5,152 words) - 03:36, 15 December 2020
Convolutional Sequence to Sequence Learning
'''Sequence to sequence learning''' has been used to solve many tasks such as machine translation, speech recognition, and text ...other. This allows to precisely control the maximum length of dependencies to be modeled. ...

27 KB (4,178 words) - 20:37, 28 November 2017
STAT946F17/ Coupled GAN
...The authors of the paper we are reviewing focus on proposing an extension to the class of GANs. ...for multi-domain image generation. One way around this bottleneck is thus to use their proposed CoGAN methodology. More details of how the authors achie ...

32 KB (4,965 words) - 15:02, 4 December 2017
Countering Adversarial Images Using Input Transformations
...ation is applied to the original image of a panda, changing the prediction to a gibbon. ...mage classification systems by transforming the images before feeding them to a Convolutional Network Classifier. ...

32 KB (4,769 words) - 18:45, 16 December 2018
proposal Fall 2010
...d at the same time also, after suitable weighting has been done, be closer to the directions of variation of its assigned class. ...h> dimensions, and a new data point is given. To assign the new data point to a class, we can proceed using the following steps: ...

28 KB (4,210 words) - 09:45, 30 August 2017
Deep Reinforcement Learning in Continuous Action Spaces a Case Study in the Game of Simulated Curling
...or large, non-convex continuous action spaces are not directly applicable. To solve this issue, we conduct a policy search with an efficient stochastic c ...not necessarily suitable for continuous action space problems. This is due to the fact that deterministic discretization of a continuous action space cau ...

35 KB (5,619 words) - 18:39, 10 December 2018
stat441w18/Image Question Answering using CNN with Dynamic Parameter Prediction
...t decade has seen a resurgence of interest in computer vision research due to breakthroughs in deep learning and advancements in computational capabiliti ...the question. A broader Image Q&A model must be flexible in order to adapt to different questions. ...

32 KB (5,284 words) - 22:03, 19 March 2018
Bag of Tricks for Efficient Text Classification
...ting their usage for very large datasets. The motivation for this paper is to determine whether a simpler text classifier, which is inexpensive in terms ...are used. The simplicity of linear classifiers allows a model to be scaled to very large data set while maintaining its good performance. ...

32 KB (5,160 words) - 22:32, 27 March 2018
Wasserstein Auto-encoders
...and GANs, but a model which combines the best of both GANs and VAEs is yet to be discovered. To be more specific on the OT: ...

30 KB (4,923 words) - 19:25, 10 December 2018
stat946F18/Beyond Word Importance Contextual Decomposition to Extract Interactions from LSTMs
...for analyzing individual predictions made by the LSTMs without any change to the underlying original model. The problem of sentiment analysis is chosen ...n domain, this paper shows how the contextual decomposition method is used to successfully extract positive and negative negations from an LSTM. This pap ...

31 KB (5,069 words) - 18:21, 16 December 2018
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
...the prediction. They also propose a method called sub-modular optimization to explain models globally by selecting a representative individual prediction ...en overestimate their model's accuracy, propagate feedback loops, and fail to notice data leaks. Common solutions in the literature are Gestalt and Model ...

36 KB (5,713 words) - 20:21, 28 November 2017
stat946w18/Self Normalizing Neural Networks
...rare occasions, they have very shallow network architectures with just up to four layers [10]. ...the aforementioned normalization techniques involve adding external layers to the model and can slow down computation, which may already be slow when wor ...

45 KB (6,836 words) - 23:26, 20 April 2018
f11Stat841proposal
should be able to predict the disease state of the patient/animal, of markers required in order to make good predictions. Our results ...

26 KB (4,036 words) - 14:56, 11 October 2020
CapsuleNets
...tecture, the authors create CapsuleNets, a network that they claim is able to learn image representations in a more robust, human-like manner. With only ...ent the existence of the entity and to force the orientation of the vector to represent the properties of the entity. The length of the vector output of ...

32 KB (5,106 words) - 00:36, 17 December 2018
stat946F18/Autoregressive Convolutional Neural Networks for Asynchronous Time Series
...t be sufficient. However, their relatively good performance could allow us to combine such linear econometric models with deep neural networks that can l ...ip between past and future observations is not deterministic, this amounts to expressing the conditional probability distribution as a function of the pa ...

29 KB (4,577 words) - 10:13, 14 December 2018
Modular Multitask Reinforcement Learning with Policy Sketches
...t policy structure, without indicating how high-level behaviors should try to use primitive percepts or actions. ...ub tasks. However, in the actual settings, the learner can only get access to encoded results. ...

32 KB (4,994 words) - 14:25, 3 December 2017
Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin
...pact, is known as DNA flanking region or ‘gene region’ which is considered to cover 10k base pair centered at the transcription start site (TSS) (i.e., a ...reads and the rest are signal reads of various histone marks. The goal is to understand which histone marks are the most important and how they interact ...

33 KB (4,924 words) - 20:52, 10 December 2018
a neural representation of sketch drawings
...e authors present a recurrent neural network, sketch-rnn, that can be used to construct stroke-based drawings. Besides new robust training methods, they ...ges so that it might generalize abstract concepts in a manner more similar to how humans do. ...

30 KB (4,807 words) - 00:40, 17 December 2018
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
...rs consider two questions: how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that gen ...ed by the Bayesian evidence, which penalizes sharp minima but is invariant to model parameterization. They also demonstrate that, when one holds the lear ...

34 KB (5,220 words) - 20:32, 10 December 2018
Fix your classifier: the marginal value of training the last weight layer
...example, convolutional neural networks (CNNs) are used to classify images to a semantic category. Typically, a learned affine transformation is placed a ...m, the authors propose that the final layer of the classifier be fixed (up to a global scale constant). They argue that with little or no loss of accurac ...

34 KB (5,105 words) - 00:39, 17 December 2018
stat946f11pool
...two nodes in the graph. This algorithm can be easily modified to allow us to determine the same information in an undirected graph. An undirected graph [[File:UnDirGraphCanon.png|thumb|right|Fig.20 The only way to connect 3 nodes in an undirected graph.]] ...

100 KB (18,249 words) - 09:45, 30 August 2017
stat946f10
...is also obtained from the data. The main proposal of this technique is not to choose a kernel function a priori like classical kernel PCA or construct a ...ective functions should be considered. The aim of dimensional reduction is to map high dimension data into a low dimension space with the minimum informa ...

65 KB (11,332 words) - 09:45, 30 August 2017
stat341f11
Please contribute to the discussion of splitting up this page into multiple pages on the [[{{TAL ...were posted on the Wiki Course Note page for [[Stat946f11|STAT 946]]. Add to them as necessary for consistent notation on this page. ...

139 KB (23,688 words) - 09:45, 30 August 2017
stat341 / CM 361
...lling a fair die repetitively to produce a series of random numbers from 1 to 6). One way to generate pseudo random numbers from the uniform distribution is using the ' ...

145 KB (24,333 words) - 09:45, 30 August 2017
stat946f11
data, etc. A problem related to medical diagnosis is, "detecting and quantifying the causes of a disease". ...but this approach will simplify the computations and as mentioned help us to solve a lot of problems in different research areas. ...

162 KB (28,558 words) - 09:45, 30 August 2017
stat841f14
...in 1901, is a statistical technique for data analysis. Its main purpose is to reduce the dimensionality of the data. ...tors are called the ''''Principal Components''''. In other words, PCA aims to reduce the dimensionality of the data, while preserving its information (or ...

220 KB (37,901 words) - 09:46, 30 August 2017
stat841
...mounts of data are generated constantly, and the goal of classification is to learn from data. Potential application areas include handwritten post codes ...on <math>\,h</math>, by using a training data set, which will then be able to accurately classify new data inputs. ...

263 KB (43,685 words) - 09:45, 30 August 2017
stat340s13
in your `wikicoursenote' contribution, you have to cite the ...in addition to citing the original source you have to use quotation marks to ...

370 KB (63,356 words) - 09:46, 30 August 2017
stat841f11
Students will need to contribute to the wiki for 20% of their grade. Go to editor sign-up, and use your UW userid for your account name, and use your ...

314 KB (52,298 words) - 12:30, 18 November 2020
stat841f10
...lem of how to systematically assign unlabeled (classes unknown) novel data to their labels (classes or groups or types) by using knowledge of their featu ...nown feature values into the model to determine how much the input belongs to each class. ...

451 KB (73,277 words) - 09:45, 30 August 2017
