# Difference between revisions of "stat340s13"

## Introduction

Four Fundamental Problems:

1. Classification: Given an input object X, we have a function which will take in this input X and identity which 'class (Y)' it belongs to. (Discrete Case) 2.Regression: Same as classification but in the continuous case 3.Clustering: Use common features of objects in same class or group to form clusters. 4. Dimensionality Reduction

## Applications

Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity Examples Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning Search and recommendation (eg. Google) Automatic speech recognition, speaker verification Text parsing Face identification Tracking objects in video Financial prediction, fraud detection

## Course Information

No required textbook, recommended: "Simulation" by Sheldon M. Ross Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it) Learn for announcements, assignments, and emails. Other course material on: <<wikicoursenote.com>> Log on to both Learn and wikicoursenote frequently.

Must do both Two types: primary contributor: 1 lecture during the whole semester, must put a summary of the lecture up within 48 hours. General contributor: elaborate on concepts, add example, add code, add pictures, reference, etc… responsible for 50% of lectures. A general contribution could be correcting mistakes, etc… or technical in examples, etc… At least half of your contributions should be technical. Make contribution no later than 2 weeks after the lecture. Do not submit copyrighted work without permission, cite original sources. Each time you make a contribution, check mark the table. (Marks calculated on honour system, although everything can be verified through the history [randomly verified])

## Tentative Marking Scheme

Item Value
Assignments (~6) 30%
WikiCourseNote 10%
Midterm 20%
Final 40%

## Class 2 - Thursday, May 9

Sampling (Generating random numbers) Some people believe that activities such as rolling a dice and flipping a coin are not truly random but are deterministic – that is, the result can be reliably calculated using things such as physics and math.

A computer cannot generate truly random numbers since computers can only run algorithms, which are deterministic in nature. They can, however, generate Pseudo Random Numbers; numbers that seem random but are actually deterministic.

Multiplicative Congenital Algorithm – algorithm to generate uniform pseudo random numbers Take a number x and divide it by m The remainder is a number between 0 and m-1 We use the operator “mod” X mod m 1 = 10 mod 3 x = x mod m – we take the remainder and go remainder mod m, but we will get x every time we do this We can modify this: Z = ax+b mod m Example: a=2, b=1, m=3; if x=10 Step 1: 0 = 2(10)+1 mod 3 Step 2: 1 = 2(0) + 1 mod 3 Step 3: 0 = 2(1) + 1 mod 3 You will get a sequence of numbers

MatLab for Multiplicative Congenital Algorithm Before your start: Clear all Close all

A=17 B=3 M=31 X=5 Mod(a*x+b,m) Ans = 26 X=mod(a*x+b,m) Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer.

X=mod(ax+b,m) For ii=2:1000 X(ii)=mod(a*x(ii-1)+b,m); end Note: This generates 1000 numbers. Size(x) Hist(x)

X(1)=5 For ii=2:5000 X(ii)=mod(a*x(ii-1)+b,m); End Hist(x)

This algorithm involves three integer parameters a, b, and m and an initial value, x_0 called seed. A sequence of numbers is defined by x(k+1) = a*x(k) + b mod m Mod m means take the remainder after division by m

Example: a=13, b=0, m=31 The first 30 numbers in the sequence are a permutation of integers for 1 to 30 and then the sequence repeats itself.

Values are between 0 and m-1. If the values are normalized by dividing by m-1, then result is numbers uniformly distributed in the interval [0,1].

There are only a finite number of values (30 in this case).

For many years “rand” function in Matlab used this algorithm with these parameters A=7^5=16807, b=0, m=2^31 -1, 2147483647 – recommended in a 1993 paper by Park and Miller (Important part is that m should be large)

Inverse Transform Method U ~ U[0,1]

Theorem: Take U ~ U[0,1] and let x=F^(-1)(U) Then x has distribution function F(.) Where F(x) = P(X<x) cdf; F^(-1)(U) denotes the function inverse of F(.)

Ie. F(x)=U -> x=F^(-1)(U)

Example: f(x) = λe^(-λx) F(x)=∫0tox〖f(x)dx〗

= ∫0tox〖λe^(-λx) dx〗

=λ/(-λ) e^(-λx) [0 to x]

=(-e^(-λx)+e^0 )

=1-e^(-λx)

y=1-e^(-λx);

1-y=e^(-λx);

x=-ln(1-y)/λ;

y=-ln(1-x)/λ;

F^(-1) (x)=-ln(1-x)/λ;

Step 1: Draw U ~U[0,1];

Step 2: x=-ln(1-U)/ λ;

MatLab:

rand

rand(1,1000)

u=rand(1,1000)

hist(u)

F(x) =P(X<=x) =P(F^(-1)(U)<=x) =P(F(F^(-1)(U))<=F(x)) =P(U<=F(x))