Search results

Memory-Based Parameter Adaptation
kern(h,q) = \frac{1}{\epsilon + ||h-q||^2_2}. ...

12 KB (1,963 words) - 23:48, 9 November 2018
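
The MbPA snippet gives an inverse squared-distance kernel, so memory keys h closer to the query q get larger weights. The following is a minimal sketch that assumes plain Python lists as vectors and a small ε = 1e-3, which the snippet does not specify. The normalizing helper `normalized_weights` is hypothetical and only illustrates one way such kernel values might be turned into weights over retrieved neighbours.

```python
def kern(h, q, eps=1e-3):
    """Inverse squared-distance kernel: 1 / (eps + ||h - q||_2^2).

    eps is an assumed small constant that keeps the value finite when h == q.
    """
    sq_dist = sum((hi - qi) ** 2 for hi, qi in zip(h, q))
    return 1.0 / (eps + sq_dist)


def normalized_weights(keys, query, eps=1e-3):
    """Hypothetical helper: kernel values rescaled so they sum to 1."""
    raw = [kern(k, query, eps) for k in keys]
    total = sum(raw)
    return [w / total for w in raw]


keys = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
query = [0.0, 0.0]
weights = normalized_weights(keys, query)
print(weights)
```

The nearest key dominates the weights, and ε is what keeps an exact match (distance 0) from producing a division by zero.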
Summary - A Neural Representation of Sketch Drawings
...vectors are concatenated to form a vector <math>h</math>. The vector <math>h</math> is then projected to <math>\mu</math> and <math>\sigma</math> via t <math>\mu = W_\mu h + b_\mu</math> ...

25 KB (4,196 words) - 01:32, 14 November 2018
Loss Function Search for Face Recognition
<math>a</math> is considered as a modulating factor and <math>h(a,p)=\frac{1}{ap+(1-a)} \in (0,1]</math> is a modulating function [1]. Th ...e because it could be larger than the softmax probability, while <math>p_m=h(a,p) \cdot p < p</math> always holds. ...

26 KB (4,157 words) - 09:51, 15 December 2020
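
The Loss Function Search snippet states that h(a,p) lies in (0,1], so the modulated probability p_m = h(a,p)·p never exceeds p. The snippet does not show the allowed range of a; the sketch below assumes a ≤ 0, since then ap + (1 − a) = 1 − a(1 − p) ≥ 1 for p ∈ (0,1], which is exactly what puts h in (0,1]. It is an illustrative check of the formula, not the paper's training code.

```python
def modulating_fn(a, p):
    """h(a, p) = 1 / (a*p + (1 - a))."""
    return 1.0 / (a * p + (1.0 - a))


def modulated_prob(a, p):
    """p_m = h(a, p) * p, the modulated softmax probability."""
    return modulating_fn(a, p) * p


# Assumption (not stated in the snippet): a <= 0.
# Then a*p + (1 - a) = 1 - a*(1 - p) >= 1, so 0 < h(a, p) <= 1 and p_m <= p.
for a in (0.0, -0.5, -2.0):
    for p in (0.1, 0.5, 0.9, 1.0):
        h = modulating_fn(a, p)
        assert 0.0 < h <= 1.0
        assert modulated_prob(a, p) <= p
        print(f"a={a:+.1f} p={p:.1f} h={h:.3f} p_m={modulated_prob(a, p):.3f}")
```

With a = 0 the function reduces to h = 1 (plain softmax); the inequality p_m < p is strict whenever a < 0 and p < 1.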
Do Vision Transformers See Like CNN
...ResNet50x1, ResNet152x2 to the ViTs ViT-B/32, ViT-B/16, ViT-L/16, and ViT-H/14. The data used to train the models, unless specified, is the JFT-300M da ... * M. Naseer, K. Ranasinghe, S. Khan, M. Hayat, F. S. Khan, and M.-H. Yang. Intriguing properties of vision transformers, 2021. ...

13 KB (2,006 words) - 00:11, 17 November 2021
generating text with recurrent neural networks
...previous states, and the use of Echo State networks, <ref> Jaeger, H. and H. Haas. [http://www.sciencemag.org/content/304/5667/78.short "Harnessing Non ...essian of the cost function. In fact, instead of computing and inverting the H matrix in the update equations, the Gauss-Newton approximation is used for ...

18 KB (2,926 words) - 09:46, 30 August 2017
f10 Stat841 digest
...e input. The classification rule used by a classifier has the form <math>\,h: \mathcal{X} \to \mathcal{Y} </math>. ...mpirical error rate is the fraction of inputs in the training set that the classification rule <math>\,h</math> misclassifies. In e ...

26 KB (4,027 words) - 09:45, 30 August 2017
proposal for STAT946 projects Fall 2010
...n of the conformation problem formulation <ref name="bis"/> <ref>Leung N. H., and Toh K.-C. (2009) An SDP-based divide-and-conquer algorithm for large- ...d local tangent space alignment (LTSA) <ref name="zhan">Zhang, Z. and Zha, H. (2002) Principal manifolds and nonlinear dimension reduction via local tan ...

17 KB (2,679 words) - 09:45, 30 August 2017
Self-Supervised Learning of Pretext-Invariant Representations
h(v_I,v_{I^t})=\frac{\exp \biggl( \frac{s(v_I,v_{I^t})}{\tau} \biggr)}{\exp \ ...{t})=-\log[h(f(v_I),g(v_{I^t}))]-\sum_{I'\in D_N} \log[1-h(g(v_{I^t}),f(v_{I'}))] ...

20 KB (3,045 words) - 23:02, 12 December 2020
Dense Passage Retrieval for Open-Domain Question Answering
...xtbf{P}} = [\textbf{P}^{[CLS]}_1,...,\textbf{P}^{[CLS]}_k] \in \mathbb{R}^{h \times k}</math>. Here <math> \textbf{w}_{start},\textbf{w}_{end},\textbf{w ...

17 KB (2,691 words) - 22:57, 7 December 2020
Extreme Multi-label Text Classification
<div align="center">Figure 2: Architecture of the 3-cluster APLC. h denotes the hidden state. Vh denotes the head cluster. V1 and V2 denote the [3] Jain, H., Prabhu, Y., and Varma, M. Extreme multi-label loss ...

15 KB (2,456 words) - 22:04, 7 December 2020
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
* <math>T := (L_T, P_T(x), P_T(x_t \mid x_{t-1}, a_{t-1}), H)</math> (A Task)
* <math>H</math>: The horizon of the MDP. This is a fixed natural number specifying t ...

17 KB (2,846 words) - 00:12, 21 April 2018
stat946w18/Self Normalizing Neural Networks
...ntly, if the largest singular value of <math display="inline">\mathcal{H}</math> is less than 1. To find the singular values of <math display="inline">\mathcal{H}</math>, the authors used an explicit formula derived by Blinn [2] for <mat ...

45 KB (6,836 words) - 23:26, 20 April 2018
a neural representation of sketch drawings
...}, h_{ \leftarrow})</math> are concatenated to form a latent vector, <math>h</math>, of size <math>N_{z}</math>: <math>h = [h_{\rightarrow}; h_{\leftarrow}]</math>. ...

30 KB (4,807 words) - 00:40, 17 December 2018
Robust Imitation Learning from Noisy Demonstrations
[3] Brodersen, K. H., Ong, C. S., Stephan, K. E., and Buhmann, J. M. (2010). The balanced accur [13] Wu, Y., Charoenphakdee, N., Bao, H., Tangkaratt, V., and Sugiyama, M. (2019). Imitation learning from imperfec ...

13 KB (2,031 words) - 19:23, 27 November 2021
on using very large target vocabulary for neural machine translation
...the translation vector of y based on the encoded sequence of hidden states h: <math>p(y_t\,|\,y_{<t},x)\propto \exp\{q(y_{t-1}, z_t, c_t)\}</math> where ...

14 KB (2,301 words) - 09:46, 30 August 2017
the Indian Buffet Process: An Introduction and Review
...t one non-zero component, follow a <math>\mathrm{Poisson}(\alpha H_N)</math>, where H<sub>N</sub> is the ''N''th harmonic number, i.e. <math>H_N=\sum_{j=1}^N \fr ...

6 KB (1,032 words) - 09:46, 30 August 2017
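
The IBP snippet says the number of features (columns with at least one non-zero entry) follows Poisson(αH_N). Because customer i introduces Poisson(α/i) new dishes and independent Poisson counts add, the total is Poisson(α(1 + 1/2 + ... + 1/N)). The sketch below simulates the standard buffet description (customer i takes existing dish k with probability m_k/i, then samples Poisson(α/i) new dishes) and compares the empirical mean number of dishes to αH_N. Knuth's multiplication method is swapped in as the Poisson sampler because Python's `random` module has none; α = 2, N = 10, the seed, and the trial count are arbitrary choices.

```python
import math
import random


def harmonic(n):
    """N-th harmonic number H_N = sum_{j=1}^N 1/j."""
    return sum(1.0 / j for j in range(1, n + 1))


def poisson(lam, rng):
    """Knuth's method for a Poisson(lam) variate (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def ibp_num_features(n_customers, alpha, rng):
    """Run the Indian buffet process once; return the number of dishes K."""
    dish_counts = []  # m_k: number of customers who have taken dish k so far
    for i in range(1, n_customers + 1):
        # Customer i takes each existing dish k with probability m_k / i ...
        for k, m in enumerate(dish_counts):
            if rng.random() < m / i:
                dish_counts[k] += 1
        # ... then samples Poisson(alpha / i) brand-new dishes.
        dish_counts.extend([1] * poisson(alpha / i, rng))
    return len(dish_counts)


rng = random.Random(0)
alpha, n, trials = 2.0, 10, 5000
mean_k = sum(ibp_num_features(n, alpha, rng) for _ in range(trials)) / trials
print(f"empirical mean K = {mean_k:.3f}, alpha * H_N = {alpha * harmonic(n):.3f}")
```

Only the new-dish draws affect K; the loop over existing dishes is kept so the sketch follows the full generative process.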
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Let <math>h^{c}_{t-1}, h^{r}_{t-1} \in \mathbb{R}^m</math> denote the two hidden layers, where m = d: <math>h^{c}_{t-1} = f(W x_{t-1}^{c} + U h_{t-1}^{r} + b)</math> ...

28 KB (4,651 words) - 20:18, 28 November 2017
STAT946F17/Cognitive Psychology For Deep Neural Networks: A Shape Bias Case Study
$(x, y) = \displaystyle \arg\min_{(x_i,y_i) \in S} d(h(x_i), h(\hat{x}))$. The function h is parameterized by Inception – one of the best performing ImageNet classif ...

22 KB (3,531 words) - 20:30, 28 November 2017
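
The shape-bias snippet classifies a query x̂ by returning the support pair (x_i, y_i) whose embedding h(x_i) is closest to h(x̂). A minimal sketch follows; the identity "embedding" stands in for the Inception network and is purely hypothetical, and d is assumed to be Euclidean distance because the snippet does not say which d is used.

```python
import math


def euclidean(u, v):
    """Assumed distance d: Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_support(support, query, embed, dist=euclidean):
    """Return the (x_i, y_i) in support minimizing dist(embed(x_i), embed(query))."""
    q = embed(query)
    return min(support, key=lambda pair: dist(embed(pair[0]), q))


# Hypothetical stand-in for the Inception embedding: the raw feature vector itself.
identity_embed = lambda x: x

support = [((0.0, 0.0), "cat"), ((5.0, 5.0), "dog")]
x_i, y_i = nearest_support(support, (1.0, 0.5), identity_embed)
print(y_i)
```

Swapping `identity_embed` for a real feature extractor, or `euclidean` for cosine distance, changes only the arguments, not the arg-min rule itself.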
Unsupervised Machine Translation Using Monolingual Corpora Only
...onneau, 2017]''' Conneau, A., Lample, G., Ranzato, M., Denoyer, L., Jégou, H., "Word Translation without Parallel Data". arXiv:1710.04087 ...

8 KB (1,359 words) - 22:48, 19 November 2018
Word translation without parallel data
Dg[W](H)= H^T W + W^T H. D^\ast g[W](H)= WH^T +WH. ...

24 KB (3,873 words) - 17:24, 18 April 2018
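
The last snippet lists a derivative Dg[W](H) = HᵀW + WᵀH and its adjoint D*g[W](H) = WHᵀ + WH. The snippet does not define g; the sketch assumes g(W) = WᵀW, the map whose directional derivative has exactly that form. It checks two things numerically on a random 3×3 W: a finite difference of g matches Dg[W](H), and the Frobenius inner products satisfy ⟨Dg[W](H), K⟩ = ⟨H, D*g[W](K)⟩, which is what makes D*g the adjoint. Pure-Python matrix helpers are used so the example needs only the standard library.

```python
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def add(A, B, scale=1.0):
    """Return A + scale * B."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def frob(A, B):
    """Frobenius inner product <A, B> = trace(A^T B)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def g(W):
    """Assumed map g(W) = W^T W (not defined in the snippet)."""
    return matmul(transpose(W), W)


def Dg(W, H):
    """Directional derivative Dg[W](H) = H^T W + W^T H."""
    return add(matmul(transpose(H), W), matmul(transpose(W), H))


def Dg_adjoint(W, K):
    """Adjoint D*g[W](K) = W K^T + W K."""
    return add(matmul(W, transpose(K)), matmul(W, K))


def random_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
W, H, K = (random_matrix(3, rng) for _ in range(3))

# Finite difference: (g(W + tH) - g(W)) / t = Dg[W](H) + t * H^T H.
t = 1e-6
fd = [[x / t for x in row] for row in add(g(add(W, H, t)), g(W), -1.0)]
err_fd = max(abs(a - b) for ra, rb in zip(fd, Dg(W, H)) for a, b in zip(ra, rb))

# Adjoint identity: <Dg[W](H), K> = <H, D*g[W](K)>.
err_adj = abs(frob(Dg(W, H), K) - frob(H, Dg_adjoint(W, K)))
print(err_fd, err_adj)
```

Both errors should be tiny: the finite-difference error is dominated by the t·HᵀH term plus rounding, and the adjoint error by floating-point rounding alone.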
