Search results

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
* <math>T :=(L_T, P_T(x), P_T(x_t | x_{t-1}, a_{t-1}), H )</math> (A Task) * <math>H</math>: The horizon of the MDP. This is a fixed natural number specifying t ...

17 KB (2,846 words) - 00:12, 21 April 2018
stat946w18/Self Normalizing Neural Networks
...ntly, if the largest singular value of <math display="inline">\mathcal{H}</math> is less than 1. To find the singular values of <math display="inline">\mathcal{H}</math>, the authors used an explicit formula derived by Blinn [2] for <mat ...

45 KB (6,836 words) - 23:26, 20 April 2018
a neural representation of sketch drawings
...}, h_{ \leftarrow})</math> are concatenated to form a latent vector, <math>h</math>, of size <math>N_{z}</math>, <math>h = [h_{\rightarrow}; h_{\leftarrow}]</math>. ...

30 KB (4,807 words) - 00:40, 17 December 2018
Robust Imitation Learning from Noisy Demonstrations
[3] Brodersen, K. H., Ong, C. S., Stephan, K. E., and Buhmann, J. M. (2010). The balanced accur [13] Wu, Y., Charoenphakdee, N., Bao, H., Tangkaratt, V., and Sugiyama, M. (2019). Imitation learning from imperfec ...

13 KB (2,031 words) - 19:23, 27 November 2021
on using very large target vocabulary for neural machine translation
...the translation vector of y based on the encoded sequence of hidden states h: <math>p(y_t\,|\,y_{<t},x)\propto \exp\{q(y_{t-1}, z_t, c_t)\}</math> where ...

14 KB (2,301 words) - 09:46, 30 August 2017
the Indian Buffet Process: An Introduction and Review
...t one non-zero component, follow a <math>Poisson(\alpha H_N)</math>, where H<sub>N</sub> is the ''N''th harmonic number, i.e. <math>H_N=\sum_{j=1}^N \fr ...

6 KB (1,032 words) - 09:46, 30 August 2017
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Let <math>h^{c}_{t-1}, h^{r}_{t-1} \in \mathbb{R}^m</math> denote the two hidden layers where m = d : <math>h^{c}_{t-1} = f(W x_{t-1}^{c} + U h_{t-1}^{r} + b) </math> ...

28 KB (4,651 words) - 20:18, 28 November 2017
STAT946F17/Cognitive Psychology For Deep Neural Networks: A Shape Bias Case Study
$(x, y) = \displaystyle \arg\min_{(x_i,y_i) \in S} d(h(x_i), h(\hat{x}))$ The function h is parameterized by Inception – one of the best performing ImageNet classif ...

22 KB (3,531 words) - 20:30, 28 November 2017
Unsupervised Machine Translation Using Monolingual Corpora Only
...onneau, 2017]''' Conneau, A., Lample, G., Ranzato, M., Denoyer, L., Jégou, H., "Word Translation without Parallel Data". arXiv:1710.04087 ...

8 KB (1,359 words) - 22:48, 19 November 2018
Word translation without parallel data
Dg[W](H)= H^T W + W^T H. D^\ast g[W](H)= WH^T +WH. ...

24 KB (3,873 words) - 17:24, 18 April 2018
STAT946F17/Decoding with Value Networks for Neural Machine Translation
# $\bar{h} = \frac{1}{T_x}\sum\limits^{T_x}_{l=1}h_l$ # $\mu_{CC} = f_{CC}([\bar{c_{t}},\bar{h}])$ ...

22 KB (3,543 words) - 00:09, 3 December 2017
learning Spectral Clustering, With Application To Speech Separation
<br><math>H\left({\boldsymbol{\alpha} }\right)=\frac{1}{N}\sum^N_{n=1}{F\left({{\mathbf ...g to make the solution sparse. The learning algorithm is to minimize <math>H\left({\boldsymbol{\alpha} }\right)</math> with respect to <math>{\boldsymbo ...

35 KB (5,767 words) - 09:45, 30 August 2017
Hierarchical Question-Image Co-Attention for Visual Question Answering
H &= tanh(W_xX + (W_gg)𝟙^T)\\ a_x &= softmax(w_{hx}^T H)\\ ...

27 KB (4,375 words) - 19:50, 28 November 2017
meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
...rying k with different hidden unit sizes <math>h</math> by keeping <math>k*h</math> or a similarly related term constant. This is better studied in [5] # Speech and Language Processing. Daniel Jurafsky & James H. Martin. 2017. Draft of August 28, 2017. ...

20 KB (3,272 words) - 20:40, 28 November 2017
Pixels to Graphs by Associative Embedding
...f dimensions h x w, a stacked hourglass (Appendix 2) is used to generate a h x w x f representation of the image. It should be noted that the dimension ...

17 KB (2,749 words) - 18:26, 16 December 2018
stat946f11pool
<center><math> \frac{H}{\theta} = \frac{T}{1-\theta} </math></center> \begin{center} H = \# of all <math>x_i = 1</math>, e.g. \# of heads <br /> ...

100 KB (18,249 words) - 09:45, 30 August 2017
conditional neural process
...the model, the observed points are encoded using a three-layer MLP encoder h with a 128-dimensional output representation. The representations are aggre of the encoder h to include convolution layers as ...

32 KB (4,970 words) - 00:26, 17 December 2018
stat341f11
...ion. We use this to solve an integral of the form: <math> I = \int_{a}^{b} h(x) dx </math> \displaystyle I & = \int_{a}^{b} h(x)dx \\ ...

139 KB (23,688 words) - 09:45, 30 August 2017
learning Fast Approximations of Sparse Coding
Lee, H., Battle, A., Raina, R., and Ng, A.Y. Efficient Lee, H., Chaitanya, E., and Ng, A. Y. Sparse deep belief ...

22 KB (3,321 words) - 09:46, 30 August 2017
deep Learning of the tissue-regulated splicing code
...minimize here during training is <math>E=-\sum_n\sum_{k=1}^{C}{y_{n,k}\log(h_{n,k})}</math>, where <math>n</math> denotes the training example, and <math ...

8 KB (1,353 words) - 09:46, 30 August 2017
