Main Page

== State Space Models ==
=== Introduction ===
State Space Models (SSMs) have been introduced as powerful alternatives to traditional sequence modeling approaches. These models demonstrate good performance across various modalities, including time-series analysis, audio generation, and image processing, and they can capture long-range dependencies more efficiently.
SSMs initially struggled to match the performance of Transformers on language modeling tasks, leaving a noticeable gap between the two. To address these challenges, recent architectural advances such as the Structured State Space Model (S4) were introduced; S4 succeeded on long-range reasoning tasks and allowed more efficient computation while preserving the theoretical strengths of SSMs. However, its implementation remains complex and computationally demanding, so further research led to simplified variants such as the Diagonal State Space Model (DSS), which achieves comparable performance with a more straightforward formulation. In parallel, hybrid approaches such as the H3 model, which integrate SSMs with attention mechanisms, try to bridge this gap; for example, in H3 the authors replace almost all of the attention layers in a Transformer with SSMs. More recently, models like Mamba have pushed the boundaries of SSMs by parameterizing the state matrices as functions of the input, allowing more flexible and adaptive information propagation.
Research on SSMs continues to resolve the remaining challenges, and the case for substituting attention-based architectures with SSMs grows stronger. They will likely play a crucial role in the next generation of sequence modeling frameworks.
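At their core, discretized SSMs apply a linear recurrence to a hidden state: h_t = A h_{t-1} + B x_t, y_t = C h_t. The following is a minimal illustrative sketch of that recurrence only, assuming small dense matrices; real models such as S4, DSS, or Mamba use structured or input-dependent parameterizations and much faster computation schemes.

```python
# Minimal linear state-space recurrence (illustrative sketch only).
#   h_t = A h_{t-1} + B * x_t ;  y_t = C . h_t
# A is an n x n list of lists; B and C are length-n lists; xs is a
# scalar input sequence. Real SSMs (S4, Mamba) do not run this naive loop.
def ssm_scan(A, B, C, xs):
    n = len(B)
    h = [0.0] * n                      # hidden state, initialized to zero
    ys = []
    for x in xs:                       # sequential scan over the input
        h = [sum(A[i][j] * h[j] for j in range(n)) + B[i] * x
             for i in range(n)]
        ys.append(sum(C[i] * h[i] for i in range(n)))
    return ys
```

Because the recurrence is linear and time-invariant (for fixed A, B, C), it can also be computed as a convolution, which is what makes models like S4 efficient to train.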

Latest revision as of 15:54, 2 March 2025

NOTE: Wiki has been migrated from wikicoursenote.com to wiki.math.uwaterloo.ca/statwiki

LLMs & Generative Models (STAT 946- Winter 2025)

Deep Learning (STAT 940- Winter 2025)

Deep Learning (STAT 940- Fall 2021)

Statistical Learning - Classification (STAT 441/841 CM 763- Fall 2021)

Archive

Deep Learning (STAT 946- Fall 2018)

Statistical Learning - Classification (STAT 441/841 CM 763- Fall 2018)

Deep Learning (STAT 946- Winter 2018)

Statistical Learning - Classification (STAT 441/841 CM 763- Winter 2018)

Deep Learning (STAT 946- Fall 2017)

Deep Learning (STAT 946- Fall 2015)

Data Visualization (Stat 442 / 842, CM 762 - Fall 2014)

Computer Simulation of Complex Systems (Stat 340 - Spring 2013)

Dimensionality Reduction and Metric Learning (Stat 946 - Spring 2013)

Classification (Stat441/841 & CM 463/763-Fall 2011)

Probabilistic Graphical Models (Stat946-Fall 2011)

Computational Statistics and Data Analysis (Stat 341 & CM 361- Fall 2011)

Probabilistic Graphical Models (Stat946-Fall 2011) -- Material Pool

Go to Stat441/841 & CM 463/763-Fall 2010

Go to stat946-Fall 2010


Go to Stat441/841 & CM 463-Fall 2009

Go to stat946-Spring 2009

Go to Stat341 & CM 361

Go to Stat441/841 & CM 463/763-Fall 2011

HowTo Use Wiki

You can take a look at the Simple Editing Howto to quickly learn how to edit a wiki.

For writing formulae in wikicoursenote, please take a look at Help:Displaying a formula. It will definitely help you.

A solution to a common problem (New)

You may have faced the situation where the math formulas in the body of wiki notes appear extraordinarily small (compared to the usual font for math formulas). Sometimes this small font helps and sometimes it hurts! One way to correct this is to simply insert a \, at the beginning of the formula. This solves the problem without having any effect on the rest of the formula. For example, write <math>\,p_{x,y}</math> instead of <math>p_{x,y}</math>, to see <math>\,\!p_{x,y}</math> instead of <math>p_{x,y}</math>.

Examples

Carl Gustav Jung

According to scientists, the Sun is pretty big.<ref>E. Miller, The Sun, (New York: Academic Press, 2005), 23-5.</ref> The Moon, however, is not so big.<ref>R. Smith, "Size of the Moon", Scientific American, 46 (April 1978): 44-6.</ref>

<math>\sqrt{x^2+2x+1}=|x+1| - \left(\left(\frac{2x^2}{x}\right)^2\right)^2</math>

Summary: During the lecture on May 9th, we introduced the concept of pseudo-random variables. We used the example of "mod" to clarify the basic idea of generating Uniform(0,1) random variables, and the example of an invertible cdf to show how to generate random variables from Uniform(0,1). For each example given in class, the instructor used Matlab to show how to reach the desired results.

Multiplicative Congruential Algorithm

We use the operator "mod", e.g. (10 mod 3) = 1.

The recursive form is x_{k+1} = (a*x_k + b) mod m. Let a=2, b=1, m=3.

If x_0 = 10: (2*10+1) mod 3 = 0, (2*0+1) mod 3 = 1, (2*1+1) mod 3 = 0, and so on.

Example: a=13, b=0, m=31. The first 30 numbers in the sequence are a permutation of the integers from 1 to 30, and then the sequence repeats itself. Values are between 0 and m-1. If the values are normalized by dividing by m-1, then the result is numbers uniformly distributed on the interval [0,1]. There are only finitely many values: 30 in this case.
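The recursion above can be sketched in a few lines (shown here in Python rather than the Matlab used in class):

```python
def lcg(a, b, m, x0, n):
    """Congruential generator: x_{k+1} = (a*x_k + b) mod m.

    With b = 0 this is the multiplicative congruential generator
    from the lecture; returns the first n values after the seed x0.
    """
    xs = []
    x = x0
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs
```

For example, lcg(2, 1, 3, 10, 3) reproduces the sequence 0, 1, 0 from the worked example, and lcg(13, 0, 31, 1, 30) produces a permutation of the integers 1 to 30, as claimed above.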

Question: How can we generate Exp(lambda) from Uniform(0,1)?

Inverse Transform Method

Theorem

Take u ~ U(0,1) and let x = F^{-1}(u).
Then x has distribution function F( ), where F(x) = Pr(X <= x) and F^{-1}( ) denotes the inverse function of F( ).

Proof

F(x) = Pr(X <= x)
     = Pr(F^{-1}(u) <= x)
     = Pr(F(F^{-1}(u)) <= F(x))   (applying F to both sides; F is non-decreasing)
     = Pr(u <= F(x))
     = F(x)   (since u ~ U(0,1))

Example 1

Let f(x) = a*exp(-a*x)
F(x) = 1 - exp(-a*x)
 u = 1 - exp(-a*x)
 x = -(1/a)*ln(1-u)
 F^{-1}(u) = -(1/a)*ln(1-u)

Therefore, the algorithm is:
1. Draw u ~ U(0,1)
2. Let x = -(1/a)*ln(1-u)
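The two steps above translate directly into code (a Python sketch, in place of the Matlab used in class):

```python
import math
import random

def exp_inverse_transform(a, rng=random):
    """Draw one sample from Exp(a) by inverse transform.

    1. Draw u ~ U(0,1).
    2. Return x = -(1/a) * ln(1 - u).
    """
    u = rng.random()
    return -math.log(1.0 - u) / a
```

With rate a, the samples should have mean 1/a; averaging many draws gives a quick sanity check.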

Additional Example: Write an algorithm to generate a random variable from F(x) = x^12, 0 < x < 1.

Solution:
1. Generate u ~ U(0,1)
2. Set u = x^12, i.e. x = u^(1/12)
3. Output x

We need to show that pi is the stationary distribution of this Markov chain, i.e. [pi] = [pi]P (detailed balance).

Remark 1: A common choice for q(y|x) is a normal distribution centered at x with standard deviation b, q(y|x) = N(x, b^2); in this case q(y|x) is symmetric.
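The inverse-transform algorithm for F(x) = x^12 above can be sketched the same way (Python sketch; the density is f(x) = 12*x^11 on (0,1), so the samples should have mean 12/13):

```python
import random

def sample_x12(rng=random):
    """Sample from F(x) = x**12 on (0,1) by inverse transform.

    1. Generate u ~ U(0,1).
    2. Output x = u**(1/12).
    """
    u = rng.random()
    return u ** (1.0 / 12.0)
```

All samples fall in (0,1) and cluster near 1, since the density 12*x^11 puts most of its mass there.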