Search results

importance Sampling June 2 2009
In <math>I = \displaystyle\int h(x)f(x)\,dx</math>, Monte Carlo simulation can be used only if it is easy to sa :: <math>I = \displaystyle\int h(x)f(x)\,dx </math> ...

2 KB (395 words) - 09:45, 30 August 2017
a Deeper Look into Importance Sampling
...I = \displaystyle\int h(x)f(x)\,dx </math> <math>= \displaystyle\int \frac{h(x)f(x)}{g(x)}g(x)\,dx</math> We continue our discussion of Importance Sampl ...s just <math> \displaystyle E_g(h(x)) \rightarrow</math>the expectation of h(x) with respect to g(x), where <math>\displaystyle \frac{f(x)}{g(x)} </math ...

6 KB (1,083 words) - 09:45, 30 August 2017
importance Sampling and Markov Chain Monte Carlo (MCMC)
<math> I = \displaystyle\int h(x)f(x)\,dx </math> :: <math>= \displaystyle\int h(x)\frac{f(x)}{g(x)}g(x)\,dx</math> ...

6 KB (1,113 words) - 09:45, 30 August 2017
monte Carlo Integration
:<math>I = \displaystyle\int_a^b h(x)\,dx</math> :<math>w(x) = h(x)(b-a)</math> ...

5 KB (870 words) - 09:45, 30 August 2017
importance Sampling and Monte Carlo Simulation
In <math>I = \displaystyle\int h(x)f(x)\,dx</math>, Monte Carlo simulation can be used only if it is easy to sa :: <math>I = \displaystyle\int h(x)f(x)\,dx </math> ...

7 KB (1,232 words) - 09:45, 30 August 2017
Task Understanding from Confushing Multitask Data
h+1 & = \dfrac{abc}{\text{def}}\\ ...th>h</math> agrees with the task-assignment ability of humans <math>\tilde h</math> on whether each observation in the data "is" or "is not" in task <ma ...

5 KB (878 words) - 19:25, 15 November 2020
hamming Distance Metric Learning
...nary codes <math>h</math> and <math>g</math> with hamming distance <math>||h-g||_H</math> and a similarity label <math>s \in {0,1}</math> the pairwise h l_{pair}(h,g,\rho)= ...

10 KB (1,792 words) - 09:46, 30 August 2017
cardinality Restricted Boltzmann Machines
Assume <math> v \in \{0,1\}^{N_v}</math> and <math> h \in \{0,1\}^{N_h}</math> are the vectors of binary valued variables, corres P(v,h) = \frac{1}{Z} exp(v^{T}Wh+v^{T}b_{v}+h^{T}b_{h}) ...

9 KB (1,501 words) - 09:46, 30 August 2017
a Rank Minimization Heuristic with Application to Minimum Order System Approximation
...euristic with application to minimum order system approximation, M. Fazel, H. Hindi, and S. Boyd]</ref> focuses on the following problems: ...utorial.pdf Rank Minimization and Applications in System Theory, M. Fazel, H. Hindi, and S. Boyd]</ref>]] ...

8 KB (1,446 words) - 09:45, 30 August 2017
hierarchical Dirichlet Processes
...ath> drawn from other Dirichlet process <math>DP(\lambda, H)</math>, where H is any base measure. Note that <math>G_0</math> is discrete with probabilit <math> G_0 </math> ~ <math> DP(\lambda,H) </math> ...

8 KB (1,341 words) - 09:46, 30 August 2017
strategies for Training Large Scale Neural Network Language Models
<math>P(w|h)=\frac{e^{\sum_{k=1}^N \lambda_i f_i(s,w)}} {\sum_{w=1} e^{ \sum_{k=1}^N\l ...e\sum_{k=1}^N \lambda_i f_i(h,w)} {\sum_{w=1} e \sum_{k=1}^N\lambda_i f_i(h,w)}</math> ...

9 KB (1,542 words) - 09:46, 30 August 2017
STAT946F17/ Improved Variational Inference with Inverse Autoregressive Flow
...egrate. Additionally, we would like to be able to compute the posterior $p(h\mid x)$ over hidden variables and, by Bayes' rule, this requires computatio ...his lower bound. Observe that, for any parametrized distribution $q_{\phi}(h\mid x)$, we have ...

29 KB (5,002 words) - 03:56, 29 October 2017
on the Number of Linear Regions of Deep Neural Networks
...can be absorbed in the connections weights to the next layer. <math>\tilde{h}_j(\mathbf{x}) = h_1(\mathbf{x}) - h_2(\mathbf{x}) ...th>n_0</math> dimensional function <math>\tilde{h} = {[\tilde{h}_1, \tilde{h}_2, \ldots, ...

8 KB (1,391 words) - 09:46, 30 August 2017
kernelized Sorting
'''Proof''': Firstly, we need to establish <math>H</math> and <math>\pi</math> matrices commute. Since <math>H</math> is a centering matrix, we can write it as <math>H=I_{n}-11^{T}</math>. ...

16 KB (2,875 words) - 09:45, 30 August 2017
convex and Semi Nonnegative Matrix Factorization
...in both F and G simultaneously <ref name='S. S Lee'> Lee S. S and Seung S. H; “Algorithms for Non-negative Matrix Factorization”. </ref> Also, the facto ...rent value by some factor. In <ref name='S. S Lee'> Lee S. S and Seung S. H; “Algorithms for Non-negative Matrix Factorization”. </ref>, they prove tha ...

23 KB (3,920 words) - 09:45, 30 August 2017
GradientLess Descent
...nt to the eigenvalues of the Hessian matrix <math display="inline">\textbf{H}(f)</math> being bounded between <math display="inline">\alpha</math> and < ...y <math display="inline">H</math> iterations (where <math display="inline">H</math> is determined by <math display="inline">Q</math>). ...

11 KB (1,754 words) - 22:06, 9 December 2020
graves et al., Speech recognition with deep recurrent neural networks
...izing cursive handwriting <ref> A. Graves, S. Fernandez, M. Liwicki, H. Bunke, and J. Schmidhuber, [http://papers.nips.cc/paper/3213-unconstrai ...sis of the more complicated LSTM network that has composite <math>\mathcal{H}</math> functions instead of sigmoids and additional parameter vectors asso ...

25 KB (3,828 words) - 09:46, 30 August 2017
the Manifold Tangent Classifier
...<math>g\,</math> reconstructs <math>x\,</math>. When <math>L\left(x,g\left(h\left(x\right)\right)\right)</math> denotes the average reconstruction error ...mathcal{J}_{AE}\left(\theta\right) = \sum_{x\in\mathcal{D}}L\left(x,g\left(h\left(x\right)\right)\right) </math> ...

22 KB (3,505 words) - 09:46, 30 August 2017
stat841F18/
...layer output matrix <math>\mathbf{H}</math> of ELM is given as: <math>{\bf H}=\left[\begin{matrix} {\bf h}({\bf x}_1)\\ ...

10 KB (1,620 words) - 17:50, 9 November 2018
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
.../filter size to be 4*H and the number of attention heads to be H/64 (where H is the size of the hidden layer). Next, we explain the changes that have be ...which usually is harder. However, if we increase <math display="inline">\\H</math> and <math display="inline">\\E</math> together, it will result in a ...

14 KB (2,170 words) - 21:39, 9 December 2020
deep Generative Stochastic Networks Trainable by Backprop
variables H in addition to X, with the Markov chain state (and mixing) involving both X and H. Here H is the angle about ...

12 KB (1,906 words) - 09:46, 30 August 2017
the loss surfaces of multilayer networks (Choromanska et al.)
...is the <math>0^{\text{th}}</math> layer and the output layer is the <math>H^{\text{th}}</math> layer). The input <math>X</math> is a vector with <math> ...dom network output <math>Y</math> is <math>Y = q\sigma(W_H^{\top}\sigma(W_{H-1}^{\top}\dots\sigma(W_1^{\top}X)))\dots),</math> where <math>q</math> is a ...

13 KB (2,168 words) - 09:46, 30 August 2017
learning Phrase Representations
...selects whether the hidden state is to be updated with a new hidden state h˜. The reset gate r decides whether the previous hidden state is ignored. ]] ::<math> r_j=\sigma([\mathbf{W}_r\mathbf{x}]_j+[\mathbf{U}_r\mathbf{h}_{t-1}]_j )</math> ...

12 KB (1,906 words) - 09:46, 30 August 2017
stat441w18/A New Method of Region Embedding for Text Classification
The vocabulary is represented by a matrix <math> \mathbf{E}\in \mathbb{R}^{h \times v} </math> with a look up layer, denoted by the embedding <math> e_\ ...define the local context unit <math> \mathbf{K}_{\omega_i}\in \mathbb{R}^{h\times\left (2c+1\right )}</math>. Let <math> \mathbf{K}_{\omega_i,t} </math ...

13 KB (2,188 words) - 12:42, 15 March 2018
continuous space language models
...ight matrix from the projection layer to the hidden layer and the state of H would be: <math>\,h=tanh(Ha + b)</math> where A is the concatenation of all <math>\,a_i</math> ...

15 KB (2,517 words) - 09:46, 30 August 2017
f11Stat946ass
...ode G which passes the ball to nodes I & D. Node F passes the ball to node H which passes the ball to the already visited node, I. Therefore all nodes a H ...

14 KB (2,497 words) - 09:45, 30 August 2017
Patch Based Convolutional Neural Network for Whole Slide Tissue Image Classification
...dentically distributed), <math>X</math> and associated hidden labels <math>H</math> are generated by the following model: $$P(X, H) = \prod_{i = 1}^N P(X_{i,1}, \dots , X_{i,N_i}| H_i)P(H_i) \quad \quad \ ...

16 KB (2,470 words) - 14:07, 19 November 2021
measuring Statistical Dependence with Hilbert-Schmidt Norm
...ngle af+bg,h\rangle=a\langle f,h\rangle+b\langle g,h\rangle,\,\forall\,f,g,h\in\mathcal{F}</math> and all real <math>\,\!a</math> and <math>\,\!b</math> ...f\otimes g)h:=f\langle g,h\rangle_{\mathcal{G}} \quad</math> for all <math>h\in\mathcal{G}</math> ...

27 KB (4,561 words) - 09:45, 30 August 2017
measuring statistical dependence with Hilbert-Schmidt norms
<math>(f\otimes g)h:=f<g,h>_\mathcal{G}</math> for all <math>h\in \mathcal{G}</math> where <math>H,K,L\in \mathbb{R}^{m\times m},K_{ij}:=k(x_i,x_j),L_{i,j}:=l(y_i,y_j) and H_ ...

8 KB (1,240 words) - 09:46, 30 August 2017
Wide and Deep Learning for Recommender Systems
[3] H. B. McMahan. Follow-the-regularized-leader and mirror descent: Equivalence [4] T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. H. Cernocky. Strategies for training large scale neural network language mode ...

8 KB (1,119 words) - 04:28, 1 December 2021
rOBPCA: A New Approach to Robust Principal Component Analysis
The projection pursuit concept was developed by Jerome H. Friedman and John Tukey in 1974. ...x to obtain a subspace of dimension <math>k_{0}</math>. The value of <math>h</math> is chosen as ...

15 KB (2,414 words) - 09:46, 30 August 2017
kernelized Locality-Sensitive Hashing
A valid hash function <math>h</math> must satisfy the property Pr[h(x_i)= h(x_j)] = sim(x_i, x_j) ...

17 KB (2,894 words) - 09:46, 30 August 2017
Learning Combinatorial Optimzation
<math> \hat{Q}(h(S), v;\Theta) = \theta_5^{T} relu([\theta_6 \sum_{u \in V} \mu_u^{(T)}, \th <math>r(S,v) = c(h(S'),G) - c(h(S),G);</math> ...

12 KB (1,976 words) - 23:37, 20 March 2018
Breaking Certified Defenses: Semantic Adversarial Examples With Spoofed Robustness Certificates
...\times H} </math>, where the size of the image is <math>3 \times W \times H</math> as the perturbation. In this case, <math>Dissim(\delta)=0 </math>. ...nels of a pixel are not equal and it uses <math> \delta_{3 \times W \times H} </math> with the <math>Dissim(\delta) = || \delta_{R}- \delta_{B}||_p + | ...

15 KB (2,325 words) - 06:58, 6 December 2020
an HDP-HMM for Systems with State Persistence
...\alpha)</math> is defined using two parameters. The first parameter, <math>H</math>, is a base distribution. This parameter can be considered as the mea <math>\, \theta_k</math>~<math>\, H</math> ...

12 KB (2,039 words) - 09:46, 30 August 2017
Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition
...opposed to computing the inner product. Denoting the weak classifiers by $h(\cdot)$, we obtain the strong classifier as: H(x_i) = \sum\limits_{j = 1}^K \alpha_j h(x_{ij}; \lambda_j) ...

21 KB (3,321 words) - 15:00, 4 December 2017
relevant Component Analysis
where |Ω| is the size of the data set, Hn is the nth chunklet, |Hn| is the size of the nth chunklet, and N is the number of chunkl ...ximize the entropy of Y, H(Y). This is because I(X,Y) = H(Y) – H(Y|X), and H(Y|X) is constant since the transformation is deterministic. Intuitively, si ...

21 KB (3,516 words) - 09:45, 30 August 2017
stat946w18/Tensorized LSTMs
a_{t} =h_{t-1}^{cat} W^h + b^h \hspace{2cm} (2) <math>W^h∈R^{(R+M)\times M} </math> guarantees each hidden state provided by the prev ...

25 KB (4,099 words) - 22:50, 20 April 2018
Depthwise Convolution Is All You Need for Learning Multiple Visual Domains
Bilen, H., and Vedaldi, A. 2017. Universal representa- tions: The missing link betwe Rebuffi, S.-A.; Bilen, H.; and Vedaldi, A. 2017. Learning multiple visual domains with residual adap ...

10 KB (1,371 words) - 00:44, 14 November 2021
This Looks Like That: Deep Learning for Interpretable Image Recognition
..., which are then multiplied by the weight matrix <math>w_h</math> in <math>h</math> to produce the output logits as shown in Figure 1. ...

10 KB (1,573 words) - 23:36, 9 December 2020
independent Component Analysis: algorithms and applications
...<math>g \,</math> and <math>h \,</math>, <math>g(y_i) \,</math> and <math>h(y_j) \,</math> are uncorrelated. ...possible values <math>\{x_1, x_2, ..., x_n\} \,</math> is defined as <math>H(X) = -\sum_{i=1}^n {p(x_i) \log p(x_i)}</math> ...

15 KB (2,422 words) - 09:45, 30 August 2017
stat946w18/Unsupervised Machine Translation Using Monolingual Corpora Only
...finite sequences of words in the source and target language, and let <math>H'</math> denote the set of finite sequences of vectors in the latent space. ...s a sequence of hidden states <math display="inline">(h_1,\ldots, h_m) \in H'</math> in the latent space. Crucially, because the word vectors of the tw ...

28 KB (4,522 words) - 21:29, 20 April 2018
stat946w18/Spectral normalization for generative adversial network
...to the largest singular value of A. Therefore, for a linear layer <math> g(h)=Wh </math>, the norm is given by <math> ||g||_{Lip}=\sigma(W) </math>. Obs ...ator more sensitive, one would hope to make the norm of <math> \bar{W_{WN}}h </math> large. For weight normalization, however, this comes at the cost of ...

16 KB (2,645 words) - 10:31, 18 April 2018
Task Understanding from Confusing Multi-task Data
...The authors define the deconfusing function as an indicator function <math>h(x, y, g_k) </math> which takes some sample <math>(x,y)</math> and determine $$ R(g,h) = \int_x \sum_{j,k} (f_j(x) - g_k(x))^2 \; h(x, f_j(x), g_k) \;p(f_j) \; p(x) \;\mathrm{d}x $$ ...

27 KB (4,358 words) - 15:35, 7 December 2020
learning a Nonlinear Embedding by Preserving Class Neighborhood Structure
stochastic binary feature vector <math> \mathbf h </math> are modeled by products of conditional Bernoulli distributions: <center> <math> \mathbf p(x_i=1|h)= \sigma(b_i+\sum_{j}W_{ij}h_j) </math> </center> ...

20 KB (3,263 words) - 09:45, 30 August 2017
dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces
Let <math>({ H}_1, k_1)</math> and <math>({H}_2, k_2)</math> be RKHS over <math>(\Omega_1, { B}_1)</math> and <math>(\Om <math><f, \Sigma_{YU}g>_{{H}_1} \approx \frac{1}{n} ...

14 KB (2,403 words) - 09:45, 30 August 2017
STAT946F17/Conditional Image Generation with PixelCNN Decoders
...purpose of the latent vector is to model the conditional distribution $p(x|h)$ such that we get a probability of whether the image suits this descriptio $$p(x|h) = \prod\limits_{i=1}^{n^2} p(x_i | x_1, ..., x_{i-1}, h)$$ ...

31 KB (4,917 words) - 12:47, 4 December 2017
deep neural networks for acoustic modeling in speech recognition
<math> E\left(\mathbf{v}, \mathbf{h}; \mathbf{W}\right) = - \sum_{i \in visible}a_iv_i - \sum_{j \in hidden}b_j * <math>\mathbf{h}</math> is the vector of hidden units, with components <math>h_j</math> and ...

24 KB (3,699 words) - 09:46, 30 August 2017
The Curious Case of Degeneration
:<math>PP(p) := 2^{H(p)}=2^{-\sum_x p(x)\log_2 p(x)}</math> Here <math>H(p)</math> is the entropy in bits and <math>p(x)</math> is the probability o ...

13 KB (2,144 words) - 05:41, 10 December 2020
stat441F18/YOLO
h <math>(x, y)</math> and <math>(w, h)</math> are normalized to the range <math>(0, 1)</math>. Further, <math>p_c ...

19 KB (2,746 words) - 16:04, 20 November 2018
Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree
[1] S. Y. Xia, H. Pan, and L. Z. Jin, “Multi-class SVM method based on a non-balanced binary H. Yu and C. K. Mao, “Automatic three-way decision clustering algorithm based ...

9 KB (1,392 words) - 01:45, 23 November 2021
markov Chain Definitions
<math> I = \displaystyle\int h(x)f(x)\,dx </math> by <math>\hat{I} = \frac{1}{N}\displaystyle\sum_{i=1}^Nh ...

5 KB (865 words) - 09:45, 30 August 2017
Summary of A Probabilistic Approach to Neural Network Pruning
...n its value never changes quicker than the function <math display="inline">h(x)=Kx</math>. The reason the activation functions are Lipschitz continuous [3] Han, S., Mao, H., and Dally, W. J. Deep compression: Compressing deep neural networks with ...

28 KB (4,367 words) - 00:30, 23 November 2021
Unsupervised Domain Adaptation with Residual Transfer Networks
...all $f \in \mathcal{H}_K$. Now, if we take $\phi: \mathcal{X} \to \mathcal{H}_K$, then we can define the MMD between two distributions $p$ and $q$ as fo ...thbf{E}_{x\sim p}(\phi(x^s)) - \mathbf{E}_{x\sim q}(\phi(x^t))||_{\mathcal{H}_K} ...

35 KB (5,630 words) - 10:07, 4 December 2017
stat946s13
...to the subspace spanned by the columns of <math>U_d</math>. A unique <math>H^+</math> solution can be obtained by finding the pseudo inverse of <math>X< ...ath> <math>X= U \Sigma V^T</math> <math>X^+ = V \Sigma^+ U^T</math> <math>H^+= U \Sigma V^T V \Sigma^+ U^T =UU^T</math> For each rank <math>d</math>, ...

29 KB (4,816 words) - 09:46, 30 August 2017
stat946w18/Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolutional Layers
* Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H. P. (2016). Pruning filters for efficient convnets. arXiv preprint arXiv:16 * Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks ...

13 KB (1,942 words) - 00:18, 21 April 2018
summary
...hbf{h}_1), (\mathbf{x}_{2k}, \mathbf{h}_2) , ... (\mathbf{x}_{nk}, \mathbf{h}_n) } ...

12 KB (1,916 words) - 17:34, 18 March 2018
Neural Speed Reading via Skim-RNN
...\bf h}_{t-1} \in \mathbb{R}^d</math> and outputs the new state <math>{\bf h}_t </math> (although the dimensions of the hidden state and input are the ...\alpha({\bf x}_t, {\bf h}_{t-1})) = \text{softmax}({\bf W}[{\bf x}_t; {\bf h}_{t-1}]+{\bf b}) \in \mathbb{R}^k</math> ...

27 KB (4,321 words) - 05:09, 16 December 2020
visualizing Similarity Data with a Mixture of Maps
...^m-y_j^m ||^2, \quad z_i=\sum_{h}\sum_{m} \pi_{i}^{m} \pi_{h}^{m} e^{-d_{i,h}^{m}} </math> </center> ...

15 KB (2,530 words) - 09:45, 30 August 2017
CRITICAL ANALYSIS OF SELF-SUPERVISION
...uch that <math>\beta \leq \frac{wh}{WH}</math> and <math>\gamma \leq \frac{h}{w} \leq \gamma^{-1}</math>. The smallest size of crops is at least <math>\b ...

12 KB (1,792 words) - 00:08, 13 December 2020
When Does Self-Supervision Improve Few-Shot Learning?
...oth mappings of labelled and unlabelled images by <math>g</math> and <math>h</math> respectively will be utilized. ...tion loss <math>\mathcal{L}_{ss}</math> utilizes a separate function <math>h</math> which maps the embeddings of unlabeled images to a separate label sp ...

17 KB (2,644 words) - 01:46, 13 December 2020
XGBoost: A Scalable Tree Boosting System
where x's are the feature values of each data point, and h's are the weights of the corresponding x's. <math>r_k(z) = \frac{1}{\sum_{(x,h) \in D_k} h} \sum_{(x,h) \in D_k, x<z} h,</math> ...

15 KB (2,406 words) - 18:07, 28 November 2018
Countering Adversarial Images Using Input Transformations
...</math> equal to the prediction on the corresponding clean example <math> h(x) </math>. ...h>x</math> is a perturbed image <math>x'</math>, such that <math>h(x) \neq h(x')</math> and <math>d(x, x') \leq \rho</math> for some dissimilarity func ...

32 KB (4,769 words) - 18:45, 16 December 2018
Adversarial Fisher Vectors for Unsupervised Representation Learning
...{x})}}[E(\mathbf{x})]- E_{\mathbf{x} \sim q(\mathbf{x})}[E(\mathbf{x})] + H(q) ...lity was used to obtain the variational lower bound on the NLL given <math>H(q) </math>. This bound is tight if <math> q(x) \propto e^{-E(\mathbf{x})} \ ...

22 KB (3,540 words) - 17:50, 6 December 2020
stat441w18/Convolutional Neural Networks for Sentence Classification
...h>-dimensional vector <math> \boldsymbol{c} = \left[ c_1, c_2, \dots, c_{n-h+1} \right] </math>, called a ''feature map''. ...et, we set all the hyperparameters: rectified linear units, filter windows(h) of 3, 4, 5 with 100 feature maps each, dropout rate (p) of 0.5, l2 constr ...

21 KB (3,330 words) - 03:15, 13 March 2018
stat441F18/TCNLM
...h> \mathcal{U} \in \mathbb{R}^{n_{h} x n_{x} x T} </math>, where <math> n_{h} </math> is the number of hidden units and <math> n_{x} </math> is the size ...multiplication of three terms: <math>\boldsymbol W_{a} \in \mathbb{R}^{n_{h}xn_{f}}, \boldsymbol W_{b} \in \mathbb{R}^{n_{f} x T}, </math>and <math> \b ...

18 KB (2,810 words) - 23:45, 14 November 2018
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
...on distribution $q(\mathbf{x}_{t+1}|\mathbf{x}_t)$, and an episode length $H$. In i.i.d. supervised learning problems, the length $H =1$. The model may generate samples of length $H$ by choosing an output $\mathbf{a}_t$ at each time $t$. The cost $\mathcal{L}$ provides ...

26 KB (4,205 words) - 10:18, 4 December 2017
markov Random Fields for Super-Resolution
...low, L, frequency components. The assumption is that high frequency band, H, is conditionally independent of the lower frequency bands, given the middl P(H|M,L) = P(H|M) ...

18 KB (3,001 words) - 09:46, 30 August 2017
Bag of Tricks for Efficient Text Classification
...th> n </math> and the output value of the hidden layer of the model, <math>h</math>. The idea of this method is to represent the output classes as the l ...\frac{\partial Err}{\partial v_{n_i}^{'}h} \cdot \frac{\partial v_{n_i}^{'}h }{\partial v_{n_i}^{'}} </math> </div> ...

32 KB (5,160 words) - 22:32, 27 March 2018
Neural ODEs
...set of transformations through hidden states (a.k.a layers) <math>\mathbf{h}</math>, given by the equation ...le="text-align:center;"><math> \mathbf{h}_{t+1} = \mathbf{h}_t + f(\mathbf{h}_t,\theta_t) </math> (1) </div> ...

24 KB (3,891 words) - 15:01, 7 December 2020
FeUdal Networks for Hierarchical Reinforcement Learning
Manager and Worker are recurrent networks (<math>{h^M}</math> and <math>{h^W}</math> being their internal states). <math>\phi</math> is a linear trans ...ed by the following equations: <math>\hat{h}_t^{t\%r},g_t = LSTM(s_t, \hat{h}_{t-1}^{t\%r};\theta^{LSTM})</math> where % denotes the modulo operation an ...

20 KB (3,237 words) - 01:59, 3 December 2017
Generating Image Descriptions
To create a common embedding, every image is represented by a set of h-dimensional vectors <math> \{v_i | i = 1 ... 20\}</math> where each <math ...fully connected layer. The matrix <math> W_m </math> has dimension <math> h \times 4096</math>. ...

21 KB (3,271 words) - 10:58, 29 March 2018
CatBoost: unbiased boosting with categorical features
[12] J. H. Friedman. Greedy function approximation: a gradient boosting machine. Anna [13] J. H. Friedman. Stochastic gradient boosting. Computational Statistics & Data An ...

17 KB (2,504 words) - 02:36, 23 November 2021
extracting and Composing Robust Features with Denoising Autoencoders
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layerwise Larochelle, H., Erhan, D., Courville, A., Bergstra, J., & Bengio, Y. (2007). ...

14 KB (2,189 words) - 09:46, 30 August 2017
A Neural Representation of Sketch Drawings
...y each encoder model is then concatenated into a single hidden state <math>h</math>. ...ightarrow(S), h_\leftarrow = \text{encode}_\leftarrow(S_{\text{reverse}}), h=[h_\rightarrow; h_\leftarrow] ...

22 KB (3,638 words) - 21:48, 20 April 2018
stat946f10
...problem, let <math>\mathbf M_S=\mathbf {HH^T}</math> and <math>\mathbf {Q=H^TW}</math>, we get: ...n Q-((H^T)^{-1}Q)^T M_D (H^T)^{-1}Q)=\min_W Trace(Q^T I_n Q-Q^TH^{-1} M_D (H^{-1})^T Q)}</math> ...

65 KB (11,332 words) - 09:45, 30 August 2017
stat946w18/Towards Image Understanding From Deep Compression Without Decoding
...math>C</math> dimensional representation, where <math>w </math> and <math>h </math> are the spatial dimensions of <math>x </math>, and the number of ch <math>H(q)</math>. <math>H(q)</math> is the entropy of the probability distribution over the symbols a ...

29 KB (4,246 words) - 20:18, 10 December 2018
Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin
...> x_T^j </math>, which outputs the embedding vector <math> \overrightarrow{h^t_j} </math>, of size <math> d </math> for each bin <math> t </math> ...h> x_1^j </math>, which outputs the embedding vector <math> \overleftarrow{h^j_t} </math>, of size <math> d </math> for each bin <math> t </math> ...

33 KB (4,924 words) - 20:52, 10 December 2018
XGBoost
...= \frac{1}{\sum_{(x,h) \in D_k} h} \displaystyle\sum_{(x,h) \in D_k, x<z} h,</math> [7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using grad ...

21 KB (3,313 words) - 02:21, 5 December 2021
Augmix: New Data Augmentation method to increase the robustness of the algorithm
filter(z, \delta) [i,j] = \frac{z[i,j]}{freq(w,h) [i,j]^\delta} mask(\lambda , g)[i,j] = \chi_{ top(\lambda w h, g g) } ...

11 KB (1,652 words) - 18:44, 6 December 2020
Memory-Based Parameter Adaptation
kern(h,q) = \frac{1}{\epsilon + ||h-q||^2_2}. ...

12 KB (1,963 words) - 23:48, 9 November 2018
Summary - A Neural Representation of Sketch Drawings
...vectors are concatenated to form a vector <math>h</math>. The vector <math>h</math> is then projected to <math>\mu</math> and <math>\sigma</math> via t <math>\mu =W_\mu h + b\mu</math> ...

25 KB (4,196 words) - 01:32, 14 November 2018
Loss Function Search for Face Recognition
<math>a</math> is considered as a modulating factor and <math>h{(a,p)}=\frac{1}{ap+(1-a)} \in (0,1]</math> is a modulating function [1]. Th ...e because it could be larger than the softmax probability, while <math>p_m=h(a, p)*p</math> always holds. ...

26 KB (4,157 words) - 09:51, 15 December 2020
Do Vision Transformers See Like CNN
...ResNet50x1, ResNet152x2 to the ViTs ViT-B/32, ViT-B/16, ViT-L/16, and ViT-H/14. The data used to train the models, unless specified, is the JFT-300M da * M. Naseer, K. Ranasinghe, S. Khan, M. Hayat, F. S. Khan, and M.-H. Yang. Intriguing properties of vision transformers, 2021. ...

13 KB (2,006 words) - 00:11, 17 November 2021
generating text with recurrent neural networks
...previous states, and the use of Echo State networks, <ref> Jaeger, H. and H. Haas. [http://www.sciencemag.org/content/304/5667/78.short "Harnessing Non ...essian of the cost function. In fact, instead of computing and inverting the H matrix when updating equations, the Gauss-Newton approximation is used for ...

18 KB (2,926 words) - 09:46, 30 August 2017
f10 Stat841 digest
...e input. The classification rule used by a classifier has the form <math>\,h: \mathcal{X} \mapsto \mathcal{Y} </math>. ...mpirical error rate is the frequency where the classification rule <math>\,h</math> does not correctly classify any data input in the training set. In e ...

26 KB (4,027 words) - 09:45, 30 August 2017
proposal for STAT946 projects Fall 2010
...n of the conformation problem formulation <ref name="bis"/> <ref>Leung N. H., and Toh K.-C. (2009) An SDP-based divide-and-conquer algorithm for large- ...d local tangent space alignment (LTSA) <ref name="zhan">Zhang, Z. and Zha, H. (2002) Principal manifolds and nonlinear dimension reduction via local tan ...

17 KB (2,679 words) - 09:45, 30 August 2017
Self-Supervised Learning of Pretext-Invariant Representations
h(v_I,v_{I^t})=\frac{\exp \biggl( \frac{s(v_I,v_{I^t})}{\tau} \biggr)}{\exp \ ...{t})=-\text{log}[h(f(v_I),g(v_{I^t}))]-\sum_{I^{'}\in D_N}^{} \text{log}[1-h(g(v_{I^t}),f(v_{I^{'}}))] ...

20 KB (3,045 words) - 23:02, 12 December 2020
Dense Passage Retrieval for Open-Domain Question Answering
...xtbf{P}} = [\textbf{P}^{[CLS]}_1,...,\textbf{P}^{[CLS]}_k] \in \mathbb{R}^{h \times k}</math>. Here <math> \textbf{w}_{start},\textbf{w}_{end},\textbf{w ...

17 KB (2,691 words) - 22:57, 7 December 2020
Extreme Multi-label Text Classification
<div align="center">Figure 2: Architecture of the 3-cluster APLC. h denotes the hidden state. Vh denotes the head cluster. V1 and V2 denote the [3] Jain, H., Prabhu, Y., and Varma, M. Extreme multi-label loss ...

15 KB (2,456 words) - 22:04, 7 December 2020
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
* <math>T :=(L_T, P_T(x), P_T(x_t | x_{t-1}, a_{t-1}), H )</math> (A Task) * <math>H</math>: The horizon of the MDP. This is a fixed natural number specifying t ...

17 KB (2,846 words) - 00:12, 21 April 2018
stat946w18/Self Normalizing Neural Networks
...ntly, if the largest singular value of <math display="inline">\mathcal{H}</math> is less than 1. To find the singular values of <math display="inline">\mathcal{H}</math>, the authors used an explicit formula derived by Blinn [2] for <mat ...

45 KB (6,836 words) - 23:26, 20 April 2018
a neural representation of sketch drawings
...}, h_{ \leftarrow})</math> are concatenated to form a latent vector, <math>h</math>, of size <math>N_{z}</math>, &h = [h_{\rightarrow}; h_{\leftarrow}]. ...

30 KB (4,807 words) - 00:40, 17 December 2018
Robust Imitation Learning from Noisy Demonstrations
[3] Brodersen, K. H., Ong, C. S., Stephan, K. E., and Buhmann, J. M. (2010). The balanced accur [13] Wu, Y., Charoenphakdee, N., Bao, H., Tangkaratt, V., and Sugiyama, M. (2019). Imitation learning from imperfec ...

13 KB (2,031 words) - 19:23, 27 November 2021
on using very large target vocabulary for neural machine translation
...the translation vector of y based on the encoded sequence of hidden states h: <math>p(y_t\,|\,y_{<t},x)\propto \exp\{q(y_{t-1}, z_t, c_t)\}</math> where ...

14 KB (2,301 words) - 09:46, 30 August 2017
the Indian Buffet Process: An Introduction and Review
...t one non-zero component, follow a <math>Poisson(\alpha H_N)</math>, where <math>H_N</math> is the ''N''th harmonic number, i.e. <math>H_N=\sum_{j=1}^N \fr ...

6 KB (1,032 words) - 09:46, 30 August 2017
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Let <math>h^{c}_{t-1}, h^{r}_{t-1} \in \mathbb{R}^m</math> denote the two hidden layers where m = d : <math>h^{c}_{t-1} = f(W x_{t-1}^{c} + U h_{t-1}^{r} + b) </math> ...

28 KB (4,651 words) - 20:18, 28 November 2017
STAT946F17/Cognitive Psychology For Deep Neural Networks: A Shape Bias Case Study
$(x, y) = \displaystyle arg \min_{(x_i,y_i) \in S} d(h(x_i), h(\hat{x})) $ The function h is parameterized by Inception – one of the best performing ImageNet classif ...

22 KB (3,531 words) - 20:30, 28 November 2017
Unsupervised Machine Translation Using Monolingual Corpora Only
...onneau, 2017]''' Conneau, A., Lample, G., Ranzato, M., Denoyer, L., Jégou, H., "Word Translation without Parallel Data". arXiv:1710.04087 ...

8 KB (1,359 words) - 22:48, 19 November 2018
Word translation without parallel data
Dg[W](H)= H^T W + W^T H. D^\ast g[W](H)= WH^T +WH. ...

24 KB (3,873 words) - 17:24, 18 April 2018
STAT946F17/Decoding with Value Networks for Neural Machine Translation
# $\bar{h} = \frac{1}{T_x}\sum\limits^{T_x}_{l=1}h_l$ # $𝜇_{CC} = f_{CC}([\bar{c_{t}},\bar{h}])$ ...

22 KB (3,543 words) - 00:09, 3 December 2017
learning Spectral Clustering, With Application To Speech Separation
 <math>H\left({\boldsymbol{\alpha} }\right)=\frac{1}{N}\sum^N_{n=1}{F\left({{\mathbf ...g to make the solution sparse. The learning algorithm is to minimize <math>H\left({\boldsymbol{\alpha} }\right)</math> with respect to <math>{\boldsymbo ...

35 KB (5,767 words) - 09:45, 30 August 2017
Hierarchical Question-Image Co-Attention for Visual Question Answering
H &= tanh(W_xX + (W_gg)𝟙^T)\\ a_x &= softmax(w_{hx}^T H)\\ ...

27 KB (4,375 words) - 19:50, 28 November 2017
meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
...rying k with different hidden unit sizes <math>h</math> by keeping <math>k*h</math> or a similarly related term constant. This is better studied in [5] # Speech and Language Processing. Daniel Jurafsky & James H. Martin. 2017. Draft of August 28, 2017. ...

20 KB (3,272 words) - 20:40, 28 November 2017
Pixels to Graphs by Associative Embedding
...f dimensions h x w, a stacked hourglass (Appendix 2) is used to generate a h x w x f representation of the image. It should be noted that the dimension ...

17 KB (2,749 words) - 18:26, 16 December 2018
stat946f11pool
<center><math> \frac{H}{\theta} = \frac{T}{1-\theta} </math></center> \begin{center} H = \# of all <math>x_i = 1</math>, e.g. \# of heads ...

100 KB (18,249 words) - 09:45, 30 August 2017
conditional neural process
...the model, the observed points are encoded using a three-layer MLP encoder h with a 128-dimensional output representation. The representations are aggre of the encoder h to include convolution layers as ...

32 KB (4,970 words) - 00:26, 17 December 2018
stat341f11
...ion. We use this to solve an integral of the form: <math> I = \int_{a}^{b} h(x) dx </math> \displaystyle I & = \int_{a}^{b} h(x)dx \\ ...

139 KB (23,688 words) - 09:45, 30 August 2017
learning Fast Approximations of Sparse Coding
Lee, H., Battle, A., Raina, R., and Ng, A.Y. Efficient Lee, H., Chaitanya, E., and Ng, A. Y. Sparse deep belief ...

22 KB (3,321 words) - 09:46, 30 August 2017
deep Learning of the tissue-regulated splicing code
...minimize here during training is <math>E=-\sum_n\sum_{k=1}^{C}{y_{n,k}\log(h_{n,k})}</math>, where <math>n</math> denotes the training example, and <math ...

8 KB (1,353 words) - 09:46, 30 August 2017
consistency of Trace Norm Minimization
...ion problem is generally NP-hard<ref name="fazel2004">Fazel, M. and Hindi, H. and Boyd, S. Rank minimization and applications in system theory. Proceedi ...ine Learning Research'', 7:2541-2563, 2006.</ref> and Zou<ref name="Z2006">H. Zou. The adaptive lasso and its oracle properties. ''Journal of the Amer ...

24 KB (4,053 words) - 09:45, 30 August 2017
Efficient kNN Classification with Different Numbers of Nearest Neighbors
[2] Y. Song, J. Huang, D. Zhou, H. Zha, and C. L. Giles, “IKNN: Informative K-nearest neighbor pattern classi [12] Z. H. Zhou and Y. Yu, “Ensembling local learners through multimodal perturbation, ...

23 KB (3,748 words) - 03:46, 16 December 2020
learn what not to learn
3. Dulac-Arnold, G.; Evans, R.; van Hasselt, H.; Sunehag, P.; Lillicrap, T.; Hunt, J.; Mann, T.; Weber, T.; Degris, T.; an 6. Van Hasselt, H., and Wiering, M. A. 2009. Using continuous action spaces to solve discrete problems. ...

29 KB (4,751 words) - 13:38, 17 December 2018
stat341 / CM 361
:<math>\begin{align}I &= \displaystyle\int_a^b h(x)\,dx :<math>\displaystyle w(x) = h(x)(b-a)</math> ...

145 KB (24,333 words) - 09:45, 30 August 2017
decentralised Data Fusion: A Graphical Model Approach (Summary)
...e distributed data fusion technique, Channel Filter <ref> A. Makarenko and H. Durrant-Whyte, “Decentralized Bayesian algorithms for active sensor networ ...

9 KB (1,332 words) - 09:45, 30 August 2017
adaptive dimension reduction for clustering high dimensional data
Use the cluster membership <math>H=(h_i^k) </math> obtained to reconstruct the K centres <math>C_{\mu}^* = [ \ ...

9 KB (1,428 words) - 09:46, 30 August 2017
Convolutional Sequence to Sequence Learning
...y but all three have the same fundamental idea. This is given by <math>2^{{H(p)}}=2^{{-\sum _{x}p(x)\log _{2}p(x)}} </math> Suppose you have a four-side ...of input elements. The output of l-th block of decoder is denoted by <math>h^l = (h_1^l,....,h_n^l)</math> and <math>z^l = (z_1^l,....,z_m^l)</math>. Ea ...

27 KB (4,178 words) - 20:37, 28 November 2017
what game are we playing
\min_{u \in \mathbb{R}^n} \max_{v \in \mathbb{R}^m} \ u^T P v -H(v) + H(u) \\ where H(y) is the Gibbs entropy <math> \sum_i y_i log y_i</math>. ...

25 KB (4,131 words) - 23:55, 6 December 2020
policy optimization with demonstrations
To avoid overfitting, the authors add causal entropy <math>−H (\pi_{\theta}) </math> as the regularization term. Thus, the learning objec \[\min_{\theta}\mathcal{L}=-\eta(\pi_{\theta})-\lambda_{2}H(\pi_{\theta})+\lambda_{1} \sup_{{D\in(0,1)}^{S\times A}} \mathbb{E}_{\pi_{\ ...

30 KB (4,632 words) - 00:32, 17 December 2018
One-Shot Imitation Learning
...h or horizon of a demonstration, and some evaluation function $$R_t(d): R^H \rightarrow R$$ are given, and that successful demonstrations are available ...

20 KB (3,247 words) - 00:27, 21 April 2018
DeepVO Towards end to end visual odometry with deep RNN
[1] S. Wang, R. Clark, H. Wen and N. Trigoni, "DeepVO: Towards end-to-end visual odometry with deep [15] R. Roberts, H. Nguyen, N. Krishnamurthi, and T. Balch, “Memory-based learning for visual ...

16 KB (2,430 words) - 18:30, 16 December 2018
proposal for STAT946 projects
with Hilbert-Schmidt norms. In S. Jain, H. U. Simon, and E. Tomita, editors, Proceedings ...esented above. ii) The kernel matrices have to become centered via matrix H. ...

15 KB (2,332 words) - 09:45, 30 August 2017
Dialog-based Language Learning
...roduced by I to an individual memory slot, and just updates the memory at $H(I(x))$. # Li, Jiwei; Miller, Alexander H.; Chopra, Sumit; Ranzato, Marc'Aurelio; Weston, Jason. "Dialogue Learning W ...

26 KB (4,081 words) - 13:59, 21 November 2021
Wasserstein Auto-Encoders
...\cdot)dP_z(z) - \int_{{\mathcal{Z}}} k(z,\cdot)dQ_z(z) \parallel_{\mathcal{H}_k}, where <math>\mathcal{H}_k</math> is the reproducing kernel Hilbert space of real-valued functions ...

21 KB (3,416 words) - 22:25, 25 April 2018
Learning What and Where to Draw
# Z. Akata, S. Reed, S. Mohan, S. Tenka, B. Schiele, H.Lee. Learning What and Where to Draw. In NIPS 2016 # Z. Akata, S. Reed, D. Walter, H. Lee, and B. Schiele. Evaluation of Output Embeddings for Fine-Grained Imag ...

18 KB (2,781 words) - 12:35, 4 December 2017
context Adaptive Training with Factorized Decision Trees for HMM-Based Speech Synthesis
A.W. Black, H. Zen, and K. Tokuda, “Statistical parametric speech synthesis,” in Proc. IC ...

10 KB (1,678 words) - 09:46, 30 August 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
...\sqrt{\lambda/2\pi e^{-C(\omega;M)}} </math>, where <math>C(\omega;M) = H(\omega;M) + \lambda\omega^2/2 </math> denotes the L2 regularized cross en #Richard H Byrd, Gillian M Chin, Jorge Nocedal, and Yuchen Wu. Sample size selection i ...

34 KB (5,220 words) - 20:32, 10 December 2018
Fix your classifier: the marginal value of training the last weight layer
To recall, Hadamard matrix (Hedayat et al., 1978) <math> H </math> is an <math> n × n </math> matrix, where all of its entries are eit ...he entire Hadamard matrix <math>H</math>, a truncated version, <math> \hat{H} ∈ </math> {<math> {-1, 1}</math>}<math>^{C \times N}</math> where all ...

34 KB (5,105 words) - 00:39, 17 December 2018
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
# Construct <math>H</math> be a perfect hash function with <math>L</math> buckets, and <math>\p # <math>*</math>Construct <math>\phi(z_i, z_{i,j}, z_j) = \mathbf{1}[H(z_j)] z_{i,j}</math>, which intuitively means that <math>\phi</math> stores ...

29 KB (4,603 words) - 21:21, 6 December 2018
Searching For Efficient Multi Scale Architectures For Dense Image Prediction
11. J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. Deformable convolutional networks. In ICCV, 2017. ...

21 KB (3,227 words) - 18:12, 14 December 2018
regression on Manifold using Kernel Dimension Reduction
Let <math>\,({ H}_1, k_1)</math> and <math>\,({H}_2, k_2)</math> be RKHS over <math>\,(\Omega_1, { B}_1)</math> and <math>\, ...

26 KB (4,280 words) - 09:45, 30 August 2017
deep Convolutional Neural Networks For LVCSR
O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, “Applying convolutional ...

11 KB (1,587 words) - 09:46, 30 August 2017
End-to-End Differentiable Adversarial Imitation Learning
...hbb{E}_{\pi}[log(D(s,a)]\ +\ \mathbb{E}_{\pi_E}[log(1 - D(s,a))] - \lambda H(\pi)) where <math> H(\pi) \triangleq \mathbb{E}_{\pi}[-log\: \pi(a|s)]</math> is the entropy. ...

24 KB (3,880 words) - 23:00, 20 April 2018
graph Laplacian Regularization for Larg-Scale Semidefinite Programming
...a. Readers are referred to the book "Introduction to algorithms" by Thomas H. Cormen for the formal definition of Schur complement and the proof of the ...

12 KB (1,953 words) - 09:45, 30 August 2017
show, Attend and Tell: Neural Image Caption Generation with Visual Attention
...., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). [http://arxiv.org/pdf/1406.1078.pdf Learning phrase ...

12 KB (1,882 words) - 09:46, 30 August 2017
learning Convolutional Feature Hierarchies for Visual Recognition
...el, x is a w×h image, zk is a feature map of dimension (w+s-1)×(h+s-1), and * denotes the discrete convolution operator. ...

12 KB (1,872 words) - 09:46, 30 August 2017
scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers Machines
...y "Segmentation, minimum spanning tree and hierarchies."] In L. Najman and H. Talbot, editors, Mathematical Morphology: from theory to application, chap ...

12 KB (1,895 words) - 09:46, 30 August 2017
Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias
...we encode symbols from <math>y</math> using the wrong tool <math> {\hat h}</math> . This consists of encoding the <math> {i_{th}}</math> symbol using H(y,\hat y) = \sum_i{y_i\log{\frac{1}{\hat y_i}}} ...

26 KB (4,201 words) - 18:21, 14 December 2018
neural Machine Translation: Jointly Learning to Align and Translate
...4) <ref>Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H., and Bengio, Y. (2014a). ...

14 KB (2,221 words) - 09:46, 30 August 2017
Conditional Image Synthesis with Auxiliary Classifier GANs
# Reed, S. E., Akata, Z., Mohan, S., Tenka, S., Schiele, B., & Lee, H. (2016). Learning what and where to draw. In Advances in Neural Information # Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Berg, A. C. (2015). Imagenet larg ...

33 KB (5,219 words) - 10:24, 4 December 2017
End to end Active Object Tracking via Reinforcement Learning
...(s_t))\bigtriangledown_\theta log\pi(a_t|s_t)+\beta\bigtriangledown_\theta H(\pi(.|s_t))</math> ...factor <math>0 < \gamma \leq 1, \alpha</math> is the learning rate, <math>H (·)</math> is an entropy regularizer, and <math>\beta</math> is the regular ...

29 KB (4,453 words) - 18:27, 16 December 2018
stat340s13
...The Multiplicative Congruential Method, invented by Berkeley professor D. H. Lehmer, may also refer to the special case where <math>b=0</math> and the Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. ...

370 KB (63,356 words) - 09:46, 30 August 2017
Wasserstein Auto-encoders
...(z)} - \int \limits_{\mathcal{Z}} {k(z, \cdot)dQ_Z(z)} \right \|_{\mathcal{H}_k} where <math>\mathcal{H}_k</math> is the RKHS (reproducing kernel Hilbert space) of real-valued fun ...

30 KB (4,923 words) - 19:25, 10 December 2018
Wavelet Pooling CNN
...T is first applied on the rows and then the columns. If a low (L) and high (H) sub-band is extracted from the rows and similarly for the columns, then at ...

15 KB (2,396 words) - 22:57, 20 April 2018
stat441w18/Image Question Answering using CNN with Dynamic Parameter Prediction
...ions (h,w,c), i.e. height, width, and # of channels, and the output being (h’, w’, k), i.e. output height, width, and # of filters, we know that the ...tion ''h'' maps the keys to an element from the set {1...M} -- i.e. <math>h(k) ∈ {1...M}</math>, <math>∀ k ∈ U</math>. This allows for ...

32 KB (5,284 words) - 22:03, 19 March 2018
supervised Dictionary Learning
.... PAMI'', vol.31, no. 2, pp. 210-227, 2009.</ref><ref>R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng. Self-taught learning: transfer learning from ...

21 KB (3,291 words) - 09:45, 30 August 2017
Co-Teaching
[2] S. Reed, H. Lee, D. Anguelov, C. Szegedy, D. Erhan, and A. Rabinovich. Training deep n ...

15 KB (2,318 words) - 21:02, 11 December 2018
compressive Sensing
...tor in the <math>\,(N-M)</math> dimensional translated null space <math>\,H=N(\theta)+s</math>. Related to the concept of [http://en.wikipedia.org/wiki ...

18 KB (2,888 words) - 09:45, 30 August 2017
stat841
In classification, we attempt to approximate a function <math>\,h</math>, by using a training data set, which will then be able to accurately ...e set of labels, We try to determine a ''''classification rule'''' <math>\,h</math> such that, ...

263 KB (43,685 words) - 09:45, 30 August 2017
Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness
l_{smoothness}(\mathbf{u}, \mathbf{v}) = \sum\limits_j^H\sum\limits_i^W \Big(\rho_S(u_{i,j}, u_{i+1, j}) + \rho_S(u_{i,j} - u_{i, j+ ...

16 KB (2,542 words) - 17:26, 26 November 2018
f11Stat841proposal
[1] Ince, H., Trafalis, T.B., "Kernel principal component analysis and support vector m H. White, “Learning in artificial neural networks: A statistical ...

26 KB (4,036 words) - 14:56, 11 October 2020
Influenza Forecasting Framework based on Gaussian Processes
[6] Bussell E. H., Dangerfield C. E., Gilligan C. A. and Cunniffe N. J. 2019 Applying optimal ...

17 KB (2,683 words) - 14:13, 7 December 2020
stat841f14
...ality from the space X to the dimensionality of space Y by passing through H without having to know '''<math>\Phi(X)</math>''' exactly. :<math>K = -\frac{1}{2}HD^{(X)}H</math> ...

220 KB (37,901 words) - 09:46, 30 August 2017
uncovering Shared Structures in Multiclass Classification
The ultimate goal of multiclass classification is to learn a mapping <math>\,H : \mathcal{X} \mapsto \mathcal{Y}</math> from instances in <math>\,\mathcal ...

24 KB (3,815 words) - 09:45, 30 August 2017
Fairness Without Demographics in Repeated Loss Minimization
...epeated Loss Minimization]" by Hashimoto, T. B., Srivastava, M., Namkoong, H., & Liang, P. which was published at the International Conference of Machin ...

20 KB (3,120 words) - 00:42, 17 December 2018
THE LOGICAL EXPRESSIVENESS OF GRAPH NEURAL NETWORKS
...Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Çaglar Gülçehre, H. Francis Song, Andrew J. Ballard, Justin Gilmer, George E. Dahl, Ashish V ...

17 KB (2,786 words) - 17:02, 6 December 2020
STAT946F17/ Automated Curriculum Learning for Neural Networks
1 &\quad \text{if } \hat{r}_t > q^{h}_t\\ ...

16 KB (2,534 words) - 14:37, 30 November 2017
Multi-scale Dense Networks for Resource Efficient Image Classification
# Teerapittayanon, Surat, Bradley McDanel, and H. T. Kung. "Branchynet: Fast inference via early exiting from deep neural ne ...

18 KB (2,750 words) - 22:45, 20 April 2018
Predicting Floor Level For 911 Calls with Neural Network and Smartphone Sensor Data
[4] W Falcon, H Schulzrinne, Predicting Floor-Level for 911 Calls with Neural Networks and ...

18 KB (2,896 words) - 18:43, 16 December 2018
a New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
...optimization problem can be written as:<center> <math> \min_{f\in\mathcal{H}_{\otimes}} R_{N}(f)+\lambda\parallel f \parallel^2_\otimes </math> </cente ...

24 KB (3,853 words) - 09:45, 30 August 2017
ShakeDrop Regularization
...hen <math>\alpha \in [-1,1]</math> and <math>\beta \in [0,1]</math>. Cases H and F outperform PyramidNet, suggesting that the strong perturbations impos ...

21 KB (3,187 words) - 00:34, 17 December 2018
stat841f11
The '''true error rate''' for classifier <math>h</math> is the error with respect to the unknown underlying distribution whe <math>L(h) = P(h(X) \neq Y )</math> ...

314 KB (52,298 words) - 12:30, 18 November 2020
Annotating Object Instances with a Polygon RNN
.../math> denote the input, forget, and output gate, <math display = "inline">h</math> is the hidden state and <math display = "inline">c</math> is the cel ...

21 KB (3,323 words) - 18:41, 16 December 2018
stat946f15/Sequence to sequence learning with neural networks
...h>\,W_{oh}</math> is the hidden to output weight matrix. Vector <math>\,b_{h}</math> and <math>\,b_{o}</math> are the biases. When t=1, the undefined <m ...

23 KB (3,755 words) - 19:49, 5 February 2018
stat946w18/Spectral
...h>\,W_{oh}</math> is the hidden to output weight matrix. Vector <math>\,b_{h}</math> and <math>\,b_{o}</math> are the biases. When t=1, the undefined <m ...

23 KB (3,755 words) - 17:51, 22 February 2018
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
...h>\,W_{oh}</math> is the hidden to output weight matrix. Vector <math>\,b_{h}</math> and <math>\,b_{o}</math> are the biases. When t=1, the undefined <m ...

23 KB (3,755 words) - 22:22, 23 February 2018
compressed Sensing Reconstruction via Belief Propagation
An LDPC code is a block code that has a sparse parity check matrix <math>\ H </math>. The parity check matrix of an LDPC code can be represented by a bi ...

23 KB (3,784 words) - 09:45, 30 August 2017
MarrNet: 3D Shape Reconstruction via 2.5D Sketches
...ion into the human representation and processing of visual information. W. H. Freeman and Company, 1982. ...

21 KB (3,383 words) - 22:42, 20 April 2018
from Machine Learning to Machine Reasoning
...hat intrinsically difficult; it just seems so when we do it." <ref>Moravec H. (1988). Mind Children: The future of robot and human intelligence. Massach ...

21 KB (3,225 words) - 09:46, 30 August 2017
stat946f11
\psi_{c_i} (x_{c_i}) = exp (- H(x_i)) P(x_{V}) = \frac{1}{Z} \prod_{c_i \epsilon C} exp(-H(x_i)) = \frac{1}{Z} exp (- \sum_{c_i} {H_{c_i} (x_i)}) ...

162 KB (28,558 words) - 09:45, 30 August 2017
Training And Inference with Integers in Deep Neural Networks
# Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks ...

20 KB (2,998 words) - 21:23, 20 April 2018
Universal Style Transfer via Feature Transforms
[14] Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M.-H. Yang. Diversified texture synthesis with feed-forward networks. In CVPR, 2 ...

25 KB (4,065 words) - 20:10, 28 November 2017
Functional regularisation for continual learning with gaussian processes
[6] Kuo, H. "Introduction to Stochastic Integration Springer." Berlin Heidelberg (2006 ...

26 KB (4,302 words) - 23:25, 7 December 2020
human-level control through deep reinforcement learning
Hasselt, H. V., et al. [http://arxiv.org/pdf/1509.06461.pdf " Deep reinforcement learn ...

25 KB (4,026 words) - 09:46, 30 August 2017
stat841f10
...hose feature values is <math>\,x</math> is given the label <math>\,\hat{Y}=h(x)</math>. ...r training data, we could use the classifier's classification rule <math>\ h </math> to classify any newly-given vegetable or fruit such as the one show ...

451 KB (73,277 words) - 09:45, 30 August 2017
Learning the Number of Neurons in Deep Networks
H. Zhou, J. M. Alvarez, and F. Porikli. Less is more: Towards compact CNNs. I ...

24 KB (3,886 words) - 01:20, 3 December 2017
Evaluating Machine Accuracy on ImageNet
[1] Shankar, V., Roelofs, R., Mania, H., Fang, A., Recht, B., & Schmidt, L. (2020). Evaluating Machine Accuracy on ...

29 KB (4,464 words) - 00:08, 15 December 2020
DCN plus: Mixed Objective And Deep Residual Coattention for Question Answering
E_1^D = BiLSTM_1(L^D) \in R^{(h×(m+1))} E_1^Q = tanh(W BiLSTM_1(L^Q) \in R^{(h×(n+1))} ...

24 KB (3,769 words) - 17:49, 14 December 2018
DON'T DECAY THE LEARNING RATE , INCREASE THE BATCH SIZE
#Richard H Byrd, Gillian M Chin, Jorge Nocedal, and Yuchen Wu. Sample size selection i ...

27 KB (4,025 words) - 13:28, 17 December 2018
Being Bayesian about Categorical Probability
[15] Jospin, L. V., Buntine, W. V., Boussaid, F. V., Laga, H. V., & Bennamoun, M. V. (2020). Hands-on Bayesian Neural Networks - a Tutor ...

29 KB (4,651 words) - 10:57, 15 December 2020
Zero-Shot Visual Imitation
[3] Albert Bandura and Richard H Walters. Social learning theory, volume 1. Prentice-hall Englewood ...

31 KB (4,977 words) - 18:42, 16 December 2018
Surround Vehicle Motion Prediction
[3] B. Kim, C. M. Kang, J. Kim, S. H. Lee, C. C. Chung, and J. W. Choi, “Probabilistic vehicle trajectory predic ...

29 KB (4,569 words) - 23:12, 14 December 2020
stat946F18/Autoregressive Convolutional Neural Networks for Asynchronous Time Series
[9] Heaton, J. B., Polson, N. G., and Witte, J. H. Deep learning in finance, February 2016. ...

29 KB (4,577 words) - 10:13, 14 December 2018
proposal Fall 2010
2. Papadimitriou, Christos H., Mihalis Yannakakis. Structure in Complexity Theory Conference. IEEE. May ...

28 KB (4,210 words) - 09:45, 30 August 2017
Speech2Face: Learning the Face Behind a Voice
[4] L. Castrejon, Y. Aytar, C. Vondrick, H. Pirsiavash, and A. Torralba. Learning aligned cross-modal representations ...

32 KB (5,152 words) - 03:36, 15 December 2020
Modular Multitask Reinforcement Learning with Policy Sketches
...d by parameter vector $\theta$, $$\displaystyle \max_{\Theta}E[\sum_{t=0}^{H}R(s_{t})|\pi_{\theta}]$$ $\pi_{\theta}(u|s)$ is the probability of action u ...

32 KB (4,994 words) - 14:25, 3 December 2017
Deep Reinforcement Learning in Continuous Action Spaces a Case Study in the Game of Simulated Curling
# Yamamoto, M., Kato, S., and Iizuka, H. Digital curling strategy based on game tree search. In Proceedings of the ...

35 KB (5,619 words) - 18:39, 10 December 2018
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
[13] M. T. Dzindolet, S. A. Peterson, R. A. Pomranky, L. G. Pierce, and H. P. Beck. The role of trust in automation reliance. Int. J. Hum.-Comput. St ...

36 KB (5,713 words) - 20:21, 28 November 2017
