User talk:Dsevern
Newton-Raphson Method
Previously we had derivated the log likelihood function for the logistic function.
[math]\displaystyle{ \begin{align} {\ell(\mathbf{\beta\,})} & {} = \sum_{i=1}^n \left(y_i {\mathbf{\beta\,}^T \mathbf{x_i}} - \ln({1+e^{\mathbf{\beta\,}^T \mathbf{x_i}}})\right) \end{align} }[/math]
Our goal is to find the [math]\displaystyle{ \beta\, }[/math] that maximizes [math]\displaystyle{ {\ell(\mathbf{\beta\,})} }[/math]. We use calculus to do this ie solve [math]\displaystyle{ {\frac{\partial \ell}{\partial \mathbf{\beta\,}}}=0 }[/math]. To do this we use the famous numerical method of Newton-Raphson.
[math]\displaystyle{ \begin{align} {\frac{\partial \ell}{\partial \mathbf{\beta\,}}}&{} = \sum_{i=1}^n \left(y_i \mathbf{x_i} - \frac{e^{\mathbf{\beta\,}^T \mathbf{x}}}{1+e^{\mathbf{\beta\,}^T \mathbf{x}}} \mathbf{x_i} \right) \\[8pt] \end{align} }[/math]
The first derivative is typically called the score vector.
[math]\displaystyle{ \begin{align} {\frac{\partial \ell}{\partial \mathbf{\beta\,} \partial \mathbf{\beta\,}^T}}&{} = -\sum_{i=1}^n \left(y_i \mathbf{x_i}\mathbf{x_i}^T (\frac{e^{\mathbf{\beta\,}^T \mathbf{x}}}{1+e^{\mathbf{\beta\,}^T \mathbf{x}}})(\frac{1}{1+e^{\mathbf{\beta\,}^T \mathbf{x}}}) \right) \\[8pt] \end{align} }[/math]
The negative of the second derivative is typically called the information matrix.