Search results

  • Dg[W](H)= H^T W + W^T H. D^\ast g[W](H)= WH^T +WH. ...
    24 KB (3,873 words) - 17:24, 18 April 2018
  • # $\bar{h} = \frac{1}{T_x}\sum\limits^{T_x}_{l=1}h_l$ # $𝜇_{CC} = f_{CC}([\bar{c_{t}},\bar{h}])$ ...
    22 KB (3,543 words) - 00:09, 3 December 2017
  • <br><math>H\left({\boldsymbol{\alpha} }\right)=\frac{1}{N}\sum^N_{n=1}{F\left({{\mathbf ...g to make the solution sparse. The learning algorithm is to minimize <math>H\left({\boldsymbol{\alpha} }\right)</math> with respect to <math>{\boldsymbo ...
    35 KB (5,767 words) - 09:45, 30 August 2017
  • H &= tanh(W_xX + (W_gg)𝟙^T)\\ a_x &= softmax(w_{hx}^T H)\\ ...
    27 KB (4,375 words) - 19:50, 28 November 2017
  • ...rying k with different hidden unit sizes <math>h</math> by keeping <math>k*h</math> or a similarly related term constant. This is better studied in [5] # Speech and Language Processing. Daniel Jurafsky & James H. Martin. 2017. Draft of August 28, 2017. ...
    20 KB (3,272 words) - 20:40, 28 November 2017
  • ...f dimensions h x w, a stacked hourglass (Appendix 2) is used to generate an h x w x f representation of the image. It should be noted that the dimension ...
    17 KB (2,749 words) - 18:26, 16 December 2018
  • <center><math> \frac{H}{\theta} = \frac{T}{1-\theta} </math></center> \begin{center} H = \# of all <math>x_i = 1</math>, e.g. \# of heads <br /> ...
    100 KB (18,249 words) - 09:45, 30 August 2017
  • ...the model, the observed points are encoded using a three-layer MLP encoder h with a 128-dimensional output representation. The representations are aggre of the encoder h to include convolution layers as ...
    32 KB (4,970 words) - 00:26, 17 December 2018
  • ...ion. We use this to solve an integral of the form: <math> I = \int_{a}^{b} h(x) dx </math> \displaystyle I & = \int_{a}^{b} h(x)dx \\ ...
    139 KB (23,688 words) - 09:45, 30 August 2017
  • Lee, H., Battle, A., Raina, R., and Ng, A.Y. Efficient Lee, H., Chaitanya, E., and Ng, A. Y. Sparse deep belief ...
    22 KB (3,321 words) - 09:46, 30 August 2017
  • ...minimize here during training is <math>E=-\sum_n\sum_{k=1}^{C}{y_{n,k}\log(h_{n,k})}</math>, where <math>n</math> denotes the training example, and <math ...
    8 KB (1,353 words) - 09:46, 30 August 2017
  • ...ion problem is generally NP-hard<ref name="fazel2004">Fazel, M. and Hindi, H. and Boyd, S. Rank minimization and applications in system theory. Proceedi ...ine Learning Research'', 7:2541-2563, 2006.</ref> and Zou<ref name="Z2006">H. Zou. The adaptive lasso and its oracle properties. ''Journal of the Amer ...
    24 KB (4,053 words) - 09:45, 30 August 2017
  • [2] Y. Song, J. Huang, D. Zhou, H. Zha, and C. L. Giles, “IKNN: Informative K-nearest neighbor pattern classi [12] Z. H. Zhou and Y. Yu, “Ensembling local learners through multimodal perturbation, ...
    23 KB (3,748 words) - 03:46, 16 December 2020
  • 3. Dulac-Arnold, G.; Evans, R.; van Hasselt, H.; Sunehag, P.; Lillicrap, T.; Hunt, J.; Mann, T.; Weber, T.; Degris, T.; an 6. Van Hasselt, H., and Wiering, M. A. 2009. Using continuous action spaces to solve discrete problems. ...
    29 KB (4,751 words) - 13:38, 17 December 2018
  • :<math>\begin{align}I &= \displaystyle\int_a^b h(x)\,dx :<math>\displaystyle w(x) = h(x)(b-a)</math> ...
    145 KB (24,333 words) - 09:45, 30 August 2017
  • ...e distributed data fusion technique, Channel Filter <ref> A. Makarenko and H. Durrant-Whyte, “Decentralized Bayesian algorithms for active sensor networ ...
    9 KB (1,332 words) - 09:45, 30 August 2017
  • Use the cluster membership <math>H=(h_i^k) </math> obtained to reconstruct the K centres <math>C_{\mu}^* = [ \ ...
    9 KB (1,428 words) - 09:46, 30 August 2017
  • ...y but all three have the same fundamental idea. This is given by <math>2^{{H(p)}}=2^{{-\sum _{x}p(x)\log _{2}p(x)}} </math> Suppose you have a four-side ...of input elements. The output of l-th block of decoder is denoted by <math>h^l = (h_1^l,....,h_n^l)</math> and <math>z^l = (z_1^l,....,z_m^l)</math>. Ea ...
    27 KB (4,178 words) - 20:37, 28 November 2017
  • \min_{u \in \mathbb{R}^n} \max_{v \in \mathbb{R}^m} \ u^T P v -H(v) + H(u) \\ where H(y) is the Gibbs entropy <math> \sum_i y_i \log y_i</math>. ...
    25 KB (4,131 words) - 23:55, 6 December 2020
  • To avoid overfitting, the authors add causal entropy <math>-H(\pi_{\theta})</math> as the regularization term. Thus, the learning objec \[\min_{\theta}\mathcal{L}=-\eta(\pi_{\theta})-\lambda_{2}H(\pi_{\theta})+\lambda_{1} \sup_{{D\in(0,1)}^{S\times A}} \mathbb{E}_{\pi_{\ ...
    30 KB (4,632 words) - 00:32, 17 December 2018