Search results


Page text matches

  • ...experience, yet in complex domains for which a simulator is not available to the agents, the performance of model-based agents employing standard planni ...ct useful knowledge gathered from model simulations. This allows the agent to benefit from model-based imagination without the pitfalls of conventional m ...
    2 KB (210 words) - 20:39, 9 March 2018
  • ...ed using a convex combination to a number of clusters rather than uniquely to one cluster. This is an unsupervised version of the so-called multi-class c ...e data, authors have recently proposed discrete analogues to PCA. We refer to the method as multinomial PCA(mPCA) because it is a precise multinomial ana ...
    2 KB (321 words) - 09:45, 30 August 2017
  • ...ean discrepancy (JMMD) criterion. An adversarial training strategy is adopted to maximize JMMD such that the distributions of the source and target domains ...
    760 bytes (109 words) - 15:32, 2 October 2017
  • #REDIRECT [[link to my paper]] ...
    30 bytes (5 words) - 09:45, 30 August 2017
  • #REDIRECT [[sandbox to test w2l]] ...
    33 bytes (6 words) - 09:46, 30 August 2017
  • ...lt but there exists a probability distribution function g(x) which is easy to sample from, then <math>I</math> can be written as<br> ...playstyle E_g(w(x)) \rightarrow</math>the expectation of w(x) with respect to g(x) ...
    2 KB (395 words) - 09:45, 30 August 2017
  • #REDIRECT [[learning Spectral Clustering, With Application To Speech Separation]] ...
    81 bytes (9 words) - 09:45, 30 August 2017
  • #REDIRECT [[neural Machine Translation: Jointly Learning to Align and Translate]] ...
    81 bytes (10 words) - 09:46, 30 August 2017
  • #REDIRECT [[rOBPCA: A New Approach to Robust Principal Component Analysis]] ...
    75 bytes (10 words) - 09:46, 30 August 2017
  • #REDIRECT [[a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization]] ...
    105 bytes (12 words) - 09:45, 30 August 2017
  • #REDIRECT [[a Rank Minimization Heuristic with Application to Minimum Order System Approximation]] ...
    98 bytes (12 words) - 09:45, 30 August 2017
  • To properly train a neural network a large labeled dataset, however large data These models scale linearly in proportion to the number of classes in the data sets. The number of evaluations could be ...
    466 bytes (70 words) - 09:46, 30 August 2017
  • #REDIRECT [[a Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis]] ...
    131 bytes (15 words) - 09:45, 30 August 2017
  • ...the methods used for video-based face recognition were based on the still-to-still techniques which aimed at selecting good frame and then performed som ...s kind of application, which is called online video. The other scenario is to process the video content offline, like indexing the meeting records or ana ...
    3 KB (512 words) - 09:45, 30 August 2017
  • ...ifically, this paper explores whether we can train machine learning models to learn from dialog. *Evaluated some baseline models on this data and compared them to standard supervised learning. ...
    2 KB (309 words) - 19:52, 17 November 2020
  • ...RECT [[graphical models for structured classification, with an application to interpreting images of protein subcellular location patterns]] ...
    145 bytes (17 words) - 09:46, 30 August 2017
  • #REDIRECT [[stat946f15/Sequence to sequence learning with neural networks]] ...
    75 bytes (10 words) - 09:46, 30 August 2017
  • #REDIRECT [[from Machine Learning to Machine Reasoning]] ...
    56 bytes (7 words) - 09:46, 30 August 2017
  • ...following table. Put your name and a link to the paper that you are going to present. Choose a date between Nov 16 and Dec 2 (inclusive). .../Correlate/pmd.pdf], [[A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis|Summary]] ...
    4 KB (570 words) - 09:45, 30 August 2017
  • ...a document they will not examine the next document. This model is similar to a hidden Markov model in that there is a conditional dependency between the p ...that if a URL is clicked by a user, that means it’s both examined and relevant to the query. In other words, given a query q, position i, and URL u, the pr ...
    3 KB (593 words) - 09:46, 30 August 2017
  • ...hence whatever data we feed into the network has to be continuous in nature. Images can easily be represented as real-valued ve ...parameters it needs to learn is quite high. There have been some solutions to it: ...
    4 KB (646 words) - 19:44, 26 October 2017
  • ...batch-normalization layers right before the activations (to have the input to the activations be normalized as desired). Both networks were trained with ...he 15th, 50th, and 85th percentiles of the input were recorded. The figure to the left demonstrates how these values changed during training. The y axis ...
    4 KB (637 words) - 02:07, 28 November 2018
  • ...properties (cite). Algorithms for inference do exist; however, they come at a price of reduced expressive capabilities in logical inference and prob ...
    852 bytes (116 words) - 09:46, 30 August 2017
  • ==A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis== [[A Penalized Matrix Decomposition, with Applications to Sparse Principal Components and Canonical Correlation Analysis]] ...
    2 KB (222 words) - 09:45, 30 August 2017
  • ...>; on the other hand we would reject the samples if the ratio is not close to 1. At x=9; we will reject samples according to the ratio <math> \frac {f(x)}{c \cdot g(x)} </math> after sampling from <ma ...
    6 KB (937 words) - 09:45, 30 August 2017
  • 3 data sets are used to compare CSL to existing methods, 1 function regression task and 2 image classification tas ...s <math>f_j</math> as well as determine which mapping function corresponds to each of the <math>m</math> observations. 3 scalar-valued, scalar-input func ...
    5 KB (878 words) - 19:25, 15 November 2020
  • ...d during training time. Here by defining tasks as domains, the paper tries to overcome the problem in a model-agnostic way. ...
    1 KB (200 words) - 15:47, 9 November 2020
  • ...sed for the uniform distribution, other methods must be developed in order to generate pseudo random numbers from other distributions. ...he fact that when a random sample from the uniform distribution is applied to the inverse of a cumulative distribution function (cdf) of some distribution, th ...
    5 KB (836 words) - 09:45, 30 August 2017
  • ...on of its classes. This decomposition is always possible and it is reduced to one class only in the case of an irreducible chain. ...ath> The state 3 can go to every other state but none of the others can go to it ...
    7 KB (1,129 words) - 09:45, 30 August 2017
  • -\textbf{u}^T\textbf{a} \; \textrm{ subject } \; \textrm{ to } \; \|\textbf{u}\|^2_2 \leq 1, \; \|\textbf{u}\|_1 \leq c_1 and we differentiate, set the derivative to 0 and solve for <math>\textbf{u}</math>: ...
    2 KB (311 words) - 09:45, 30 August 2017
  • '''NOTE: Wiki has been migrated from wikicoursenote.com to wiki.math.uwaterloo.ca/statwiki''' ==Go to [[stat841f10|Stat441/841 & CM 463/763-Fall 2010]] == ...
    5 KB (769 words) - 22:53, 5 September 2021
  • ...pefully, the pattern of the teams and lineups in the latent space can lead to interesting conclusions. Secondly, we apply the selected methods to lineup data sets and get the plots of the lineups in the low-dimensional sp ...
    6 KB (983 words) - 09:46, 30 August 2017
  • ...<math>f(x)</math> so that a variation of importance estimation can be used to estimate an integral in the form<br /> All that is required is a Markov chain which eventually converges to <math>f(x)</math>. ...
    5 KB (865 words) - 09:45, 30 August 2017
  • ...ork, the inputs are no longer normalized at each hidden layer. So, we want to reduce this internal covariate shift by normalizing the input at each hidde ...However, this is a very expensive operation, and does not necessarily lead to a gradient function that is well defined. ...
    6 KB (931 words) - 21:10, 28 November 2018
  • ...r the gander , some of which occasionally amuses but none of which amounts to much of a story” contains negative sentiment, but it is not immediately cle This competition seeks to implement machine learning algorithms that can determine the sentiment of a ...
    7 KB (1,125 words) - 09:46, 30 August 2017
  • ...n the Bayesian and Frequentist views on probability, along with references to '''Bayesian Inference'''. ...enough, by the central limit theorem, the Normal distribution can be used to approximate a Binomial distribution. ...
    6 KB (924 words) - 09:45, 30 August 2017
  • ...n up your name at the moment. When you choose the paper that you would like to present, add its title and a link to the paper. ...
    3 KB (418 words) - 09:45, 30 August 2017
  • ...ces as the parameters in the model are tuned, and thus the model is unable to evolve. ...would result in the error values of the deeper network being at most equal to those of the shallower network. However, this result is not seen in practic ...
    6 KB (1,020 words) - 12:01, 3 December 2021
  • ...riants of this model have been introduced by the authors, two of which try to learn task-specific word vectors for words. It is observed that learning ta ...different models for doing different tasks. For instance, they can be fed to CNNs for document or sentence classification. The vector representations us ...
    7 KB (1,086 words) - 22:49, 13 November 2018
  • ...playstyle E_g(h(x)) \rightarrow</math>the expectation of h(x) with respect to g(x), where <math>\displaystyle \frac{f(x)}{g(x)} </math> is a weight <math The method of Importance Sampling is simple but can lead to some problems. The <math> \displaystyle \hat I </math> estimated by Importa ...
    6 KB (1,083 words) - 09:45, 30 August 2017
  • |width="30pt"|Link to the paper |width="30pt"|Link to the video ...
    5 KB (642 words) - 23:29, 1 December 2021
  • {{Cleanup|date=September 2010|reason=explain what needs to be done}} ...
    255 bytes (46 words) - 09:45, 30 August 2017
  • ...nvolutional Neural Network, and Support Vector Machine models are proposed to address this issue. In 2019, Aashrith et al. used a CNN to recognize traffic signs. They achieved 99.18% accuracy on Belgium Data and ...
    4 KB (515 words) - 18:44, 17 December 2021
  • The Indian buffet process can also be used to define a prior distribution in any setting where the where <math> \alpha </math> is a hyper-parameter, which is similar to the parameter defined in DP. ...
    6 KB (1,032 words) - 09:46, 30 August 2017
  • ...Trevor Hastie. (2009) "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis". ''Biostati The penalized matrix decomposition can be used to obtain a version of sparse PCA. In this case, ...
    2 KB (277 words) - 09:45, 30 August 2017
  • ...tructure in the data explicitly but most of them are unable to generalize to newly added data points as only an implicit non-linear transformation is given. ...that it can handle out-of-sample extensions. Also, even though the matrix to be learned may be infinite-dimensional, it can be fully represented in term ...
    6 KB (1,007 words) - 09:46, 30 August 2017
  • ...o account to detect variants of mutations. This procedure should enable us to prognose, diagnose, and/or control a wide variety of diseases. ...type of interruptions on this important step of gene expression would lead to various kinds of diseases such as cancers and neurological disorders. ...
    6 KB (980 words) - 09:46, 30 August 2017
  • ...<math>\lambda_{\max}(\cdot)</math> at the matrix <math>X \in S_n</math>. To do this we must first define the subgradient. ...tion we are interested in is <math>\lambda_{\max}(\cdot)</math>. In order to define the subgradient of this function we must first ensure it is convex. ...
    3 KB (589 words) - 09:45, 30 August 2017
  • ...order to get a distribution for the probability 'p' of a Binomial, we have to divide the Binomial distribution by n. This new distribution has the same s # Compute <math>\displaystyle \delta = p_1 - p_2</math> in order to get n values for <math>\displaystyle \delta</math>; ...
    7 KB (1,232 words) - 09:45, 30 August 2017
  • </ref>. Now we turn to ...
    204 bytes (22 words) - 09:45, 30 August 2017