Terms Explanations in Deep Learning


Binary Classification

A classification task whose output takes one of two labels. A linear model $\hat{y} = wx+b$ takes a specific input $x$ and produces the corresponding predicted value of $y$ (the so-called $\hat{y}$).

Logistic Regression

Logistic regression transforms the raw prediction $\hat{y}$ into a value in $(0, 1)$ that can be interpreted as a probability, which makes it usable as a binary classification model.

A common transformation is the sigmoid function:

$\sigma(z) = \frac{1}{1+e^{-z}}$
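A minimal NumPy sketch of the sigmoid (the function name is my own):

```python
import numpy as np

def sigmoid(z):
    """Map any real z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))  # 0.5: a raw prediction of 0 maps to probability one half
```

Large positive $z$ pushes the output toward 1, large negative $z$ toward 0.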

Loss Functions

A loss (error) function measures the error on a single training example; its gradient tells the model in which direction to adjust $\hat{y}$.

One example is the squared loss:

$L(\hat{y}, y) = \frac{(\hat{y} - y)^2}{2}$

Another is the logistic loss:

$L(\hat{y}, y) = -(y\log{\hat{y}} + (1-y)\log(1-\hat{y}))$
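Both losses can be sketched directly from the formulas (function names are my own):

```python
import numpy as np

def square_loss(y_hat, y):
    """(y_hat - y)^2 / 2"""
    return (y_hat - y) ** 2 / 2

def logistic_loss(y_hat, y):
    """-(y*log(y_hat) + (1-y)*log(1-y_hat))"""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(square_loss(0.8, 1))    # ≈ 0.02
print(logistic_loss(0.8, 1))  # ≈ 0.223, i.e. -ln(0.8)
```

Note that when $y = 1$ the logistic loss reduces to $-\log\hat{y}$, which grows without bound as $\hat{y} \to 0$.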

Cost Functions

The cost function is what the back-propagation process minimizes: it averages the loss over the entire training set, so optimizing it seeks an optimum for the whole set rather than for a single example.

$J(w, b) = \frac{1}{m} \sum\limits_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$

where $L$ is the loss function.
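A sketch of the cost with the logistic loss, averaged over $m$ examples (names and data are my own):

```python
import numpy as np

def cost(y_hat, y):
    """Average logistic loss over all m training examples."""
    m = y.shape[0]
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return losses.sum() / m

y_hat = np.array([0.9, 0.2, 0.7])  # predicted probabilities
y     = np.array([1.0, 0.0, 1.0])  # true labels
print(cost(y_hat, y))              # ≈ 0.228
```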

Gradient Descent

The process (algorithm) of finding the minimum cost and the corresponding $w$ and $b$. Note that $w$ and $b$ here stand for all the weights and biases across the regression units (neurons).

To justify that gradient descent minimizes the error function, one needs to show that the logistic regression cost is convex, so the minimum it finds is global: http://mathgotchas.blogspot.com/2011/10/why-is-error-function-minimized-in.html

Repeat until convergence {

$w:=w - \alpha\frac{\partial J(w, b)}{\partial w}$

$b:=b - \alpha\frac{\partial J(w, b)}{\partial b}$

}

where $\alpha$ is the learning rate, which is kept small.
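The update rule above can be sketched for one-feature logistic regression (toy data and variable names are my own):

```python
import numpy as np

# Toy 1-D data: label is 1 exactly when x is positive
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b, alpha, m = 0.0, 0.0, 0.1, len(x)
for _ in range(1000):                        # "repeat until converge"
    a = 1.0 / (1.0 + np.exp(-(w * x + b)))   # sigmoid(wx + b)
    dw = ((a - y) * x).sum() / m             # dJ/dw
    db = (a - y).sum() / m                   # dJ/db
    w -= alpha * dw                          # w := w - alpha * dJ/dw
    b -= alpha * db                          # b := b - alpha * dJ/db

print(w, b)  # w ends up positive: larger x -> higher predicted probability
```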

Download the example implementation:

wget https://raw.githubusercontent.com/Ex10si0n/machine-learning/main/Lab-Assignment/lr-gd.py

Vectorization

Using scientific computing packages such as numpy makes the code both shorter and faster. Whenever possible, avoid explicit for loops.

z = np.dot(w, x)
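The speedup is easy to check: an explicit Python loop and `np.dot` compute the same dot product, but the latter runs in optimized native code (example sizes are my own):

```python
import numpy as np

w = np.random.rand(100_000)
x = np.random.rand(100_000)

# Explicit for loop: slow, interpreted element by element
z_loop = 0.0
for i in range(len(w)):
    z_loop += w[i] * x[i]

# Vectorized: one call into optimized C
z_vec = np.dot(w, x)

print(np.isclose(z_loop, z_vec))  # True: identical result, far less overhead
```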

Using vectorization in the feed-forward pass, to compute $A$ for all $m$ examples at once:

$z^{(i)} = w^\top x^{(i)}+b$

$Z = [z^{(1)}, z^{(2)}, \dots, z^{(m)}]$

$A = \sigma(Z)$

Stacking the inputs as the columns of $X$, we can have:

$Z=[z^{(1)}, z^{(2)}, \dots, z^{(m)}] = w^\top X + [b, b, \dots, b]$

# + b (a scalar) works via numpy broadcasting: b -> [b, b, ...]
Z = np.dot(w.T, X) + b

Using vectorization in back-propagation, to compute:

$\partial z = A - Y$

$\partial w = \frac{1}{m}X\partial z^\top$

$\partial b = \frac{1}{m}\sum\partial z$

$w:=w-\alpha \partial w$

$b:=b-\alpha \partial b$
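The vectorized forward and backward passes above combine into a single training step; a sketch assuming $X$ has shape (features, $m$) and $Y$ shape (1, $m$), with names of my own:

```python
import numpy as np

def train_step(w, b, X, Y, alpha):
    """One vectorized gradient-descent step for logistic regression."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b           # forward: shape (1, m)
    A = 1.0 / (1.0 + np.exp(-Z))     # A = sigmoid(Z)
    dZ = A - Y                       # backward: dz = A - Y
    dw = np.dot(X, dZ.T) / m         # dw = (1/m) X dz^T
    db = dZ.sum() / m                # db = (1/m) sum(dz)
    return w - alpha * dw, b - alpha * db

X = np.array([[-2.0, -1.0, 1.0, 2.0]])  # 1 feature, 4 examples
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
w, b = np.zeros((1, 1)), 0.0
for _ in range(1000):
    w, b = train_step(w, b, X, Y, alpha=0.1)
print(w, b)  # learned weight is positive for this data
```

No explicit loop over the $m$ examples is needed; only the outer iteration over gradient-descent steps remains.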

Linear Algebra Terms

Vector dot product

$a \cdot b = \sum\limits_{i=1}^{n}a_ib_i =ab^\top$

$[1, 3, -5]\cdot[4, -2, -1] = (1\times4)+(3\times-2)+(-5\times-1)=3$

$[1, 3, -5]\cdot[4, -2, -1]=\begin{bmatrix}1 & 3 & -5\end{bmatrix}\begin{bmatrix}4 \\ -2 \\ -1\end{bmatrix}=3$

c = a.dot(b)
c = a @ b
Vector norms

For a vector $x = [x_1, x_2, ..., x_m]$

https://www.math.usm.edu/lambers/mat610/sum10/lecture2.pdf
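The notes linked above cover the common $p$-norms; they can be computed with `np.linalg.norm` (example vector is my own):

```python
import numpy as np

x = np.array([3.0, -4.0])

print(np.linalg.norm(x, 1))       # L1 norm: |3| + |-4| = 7.0
print(np.linalg.norm(x))          # L2 norm (default): sqrt(9 + 16) = 5.0
print(np.linalg.norm(x, np.inf))  # max norm: max(|3|, |4|) = 4.0
```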