Search results

  • ...ot)</math> at the matrix <math>X \in S_n</math>. To do this we must first define the subgradient. A matrix <math>V \in R^{n \times n}</math> is a subgradient of a convex function <ma ...
    3 KB (589 words) - 09:45, 30 August 2017
  • ...(or in other words <math>n</math> <math>d</math>-dimensional data points), our goal is to find directions in the space of the data set that correspond to ...problem, which makes the PCA problem much harder to solve. That's because we have just added a combinatorial constraint to the optimization problem. This pa ...
    13 KB (2,202 words) - 09:45, 30 August 2017
  • ...ne of maximizing a quadratic assignment problem with special structure and we present a simple algorithm for finding a locally optimal solution. ...ortedly covering the same content, written in two different languages. Can we determine the correspondence between these two sets of documents without us ...
    16 KB (2,875 words) - 09:45, 30 August 2017
  • ...task, documents can then be represented as a bag of region embeddings and we can train a classifier on the basis of these region embeddings. ...the local context units to produce region embedding. In the following, we first introduce local context unit, then two architectures to generate the region ...
    13 KB (2,188 words) - 12:42, 15 March 2018
  • ...scale well for large inputs. The main contribution of this paper is to use matrix factorization for solving very sophisticated problems of the above type tha ...em is to identify the whole network topology. In other words, knowing that we have n sensors with <math>d_{ij}</math> as an estimate of local distance be ...
    12 KB (1,953 words) - 09:45, 30 August 2017
  • The update for the parameter in the next step is calculated using the matrix-vector product: ...ework as a generalization to all training algorithms, allowing us to fully define any specific variant such as AMSGrad or SGD entirely within it: ...
    13 KB (2,153 words) - 16:54, 20 April 2018
  • ...pendence between two ''multivariate'' random variables. More specifically, we are looking for an appropriate function of two random variables whose outpu If instead of "independence" we were looking for "uncorrelation" the situation would be much easier to hand ...
    27 KB (4,561 words) - 09:45, 30 August 2017
  • ...problem as a "regression" problem; when the output takes discrete values, we refer to the supervised learning problem as a "classification" proble We are given data consisting of observations of <math>(X,Y)\,</math> pairs, wh ...
    14 KB (2,403 words) - 09:45, 30 August 2017
  • ...s slower but this is not a major concern in certain cases. So, the optimal first-order minimization algorithm is going to be applied for solving the optimiz ...rogramming]. Then, they show how this method can be used for decomposing a matrix into a limited number of variables. As their problem size is large and can ...
    20 KB (3,146 words) - 09:45, 30 August 2017
  • ...-[http://en.wikipedia.org/wiki/Rank_%28linear_algebra%29 rank] rectangular matrix. More formally, this problem can be written as follows: ...ther words, because using the rank-function results in an NP-hard problem, we resort to the trace norm as a proxy measure for the rank. ...
    24 KB (4,053 words) - 09:45, 30 August 2017
  • ...lustering which makes use of dimension reduction and learning a similarity matrix that generalizes to the unseen datasets when spectral clustering is applied However, by learning a specific kernel for generating the similarity matrix, this new approach is significantly more robust in the presence of irreleva ...
    35 KB (5,767 words) - 09:45, 30 August 2017
  • ...>, unobserved states <math>q_t</math>, transition matrix A, and emission matrix B. HMM characterized by <math>\lambda=(A,B,\pi)</math> :[[File:HMM2.png|thu A is a transition matrix where <math>a_{ij}</math> is the (i,j) entry in A: ...
    10 KB (1,640 words) - 09:46, 30 August 2017
  • According to the product rule we have: This is the most general case for a directed graph, as we can represent each and every graphical model with a fully connected graph. ...
    14 KB (2,497 words) - 09:45, 30 August 2017
  • ...ormed on individual weights or on entire neurons (whole column in a weight matrix). In the paper, only pruning individual weights has been discussed. ...In the pruned network, the mask is multiplied element-wise with the weight matrix before re-training. ...
    28 KB (4,367 words) - 00:30, 23 November 2021
  • Two main challenges that we usually come across in supervised learning are making a choice of manifold We can define a ''minimal subspace'' as the intersection of all dimension reduction subsp ...
    26 KB (4,280 words) - 09:45, 30 August 2017
  • ...both are generated from a Gaussian distribution and have the same covariance matrix. ...een classes <math>k</math> and <math>l</math> is linear (LDA). However, if we do not assume the same covariance between the two classes, the decision boundar ...
    26 KB (4,027 words) - 09:45, 30 August 2017
  • ...iki/Singular_value_decomposition singular value decomposition] to the data matrix. In this paper we are going to focus on the problem of sparse PCA which can be written as: ...
    22 KB (3,725 words) - 09:45, 30 August 2017
  • ...view the data recorded about a user's preferences as a partially observed matrix of the user's preferences of all items available. ...is to predict or infer the other preferences---in a sense, completing the matrix. ...
    24 KB (3,853 words) - 09:45, 30 August 2017
  • In the previous sections we discussed the Bayes Ball algorithm and the way we can use it to determine if there exists a conditional independence between As before we must define a set of canonical graphs. The nice thing is that for undirected graphs the ...
    100 KB (18,249 words) - 09:45, 30 August 2017
  • ...on, decision trees, etc. which are much more interpretable. In this paper, we are going to present one way of implementing interpretability in a neural n ...n layer via all weights above and doing a 2D traversal of the input weight matrix. The authors also provide theoretical justifications as to why interactions ...
    21 KB (3,121 words) - 01:08, 14 December 2018