Term Explanations in Deep Learning
Binary Classification
A classification task in which a model, for example the linear model $\hat{y} = wx+b$, takes a specific input $x$ and produces the corresponding predicted label (called $\hat{y}$), which belongs to one of two classes.
Logistic Regression
Transforms the prediction $\hat{y}$ into an interpretable value between 0 and 1, suitable for a binary classification model.
A common transformation function is the sigmoid function:
$\sigma(z) = \frac{1}{1+e^{-z}}$
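As a rough illustration, the sigmoid can be written in plain Python as below. The function name `sigmoid` and the two-branch form are choices made for this sketch; the second branch is algebraically the same formula, rearranged so that `math.exp` never receives a large positive argument.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z into the interval between 0 and 1."""
    if z >= 0:
        # Direct form of 1 / (1 + e^(-z)); safe because -z <= 0.
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form e^z / (1 + e^z), used for negative z to avoid overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

For example, `sigmoid(0)` gives 0.5, large positive inputs approach 1, and large negative inputs approach 0.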
Loss Functions
A loss (error) function is evaluated on a single example from the training set. It measures how far $\hat{y}$ is from the true label $y$, which tells the model in which direction to adjust $\hat{y}$.
Square loss:
$L(\hat{y}, y) = \frac{(\hat{y} - y)^2}{2}$
Logistic loss:
$L(\hat{y}, y) = -(y\log{\hat{y}} + (1-y)\log(1-\hat{y}))$
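The two losses can be sketched in Python as follows. The names `square_loss` and `logistic_loss` and the clipping constant `eps` are assumptions of this sketch, not part of the formulas; the clipping only keeps `math.log` away from 0.

```python
import math

def square_loss(y_hat: float, y: float) -> float:
    """Square loss: (y_hat - y)^2 / 2."""
    return (y_hat - y) ** 2 / 2

def logistic_loss(y_hat: float, y: float, eps: float = 1e-12) -> float:
    """Logistic loss: -(y log(y_hat) + (1 - y) log(1 - y_hat))."""
    # Clip y_hat into [eps, 1 - eps] so log never receives 0 (assumed safeguard).
    y_hat = min(max(y_hat, eps), 1 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
```

With a true label of 1, a prediction of 0.9 yields a smaller logistic loss than a prediction of 0.1, as expected.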
Cost Functions
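The cost function aggregates the loss over the entire training set. It is the quantity minimized during training (back-propagation supplies its gradients) in search of a global optimum.
$J(w, b) = \frac{1}{m} \sum\limits_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$
where $L$ is the loss function and $m$ is the number of training examples.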
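A minimal sketch of the cost as the mean of per-example losses. The helper names `cost` and `square_loss` and the sample numbers are made up for illustration; any loss function with the same two-argument shape could be passed in.

```python
def square_loss(y_hat, y):
    """Per-example square loss, used here only as a sample L."""
    return (y_hat - y) ** 2 / 2

def cost(y_hats, ys, loss):
    """Average the per-example loss over all m training examples."""
    m = len(ys)
    return sum(loss(y_hat, y) for y_hat, y in zip(y_hats, ys)) / m

# Three hypothetical predictions and their true labels.
J = cost([0.9, 0.2, 0.6], [1, 0, 1], square_loss)
```

Here the individual losses are 0.005, 0.02 and 0.08, so `J` is their mean, 0.035 (up to floating-point rounding).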
Gradient Descent
The algorithm for finding the values of $w$ and $b$ that minimize the cost. Note that $w$ and $b$ here stand for the full set of weights and biases of the regression units (neurons).
To justify that gradient descent minimizes the error function, we need to show that the logistic regression cost is convex: http://mathgotchas.blogspot.com/2011/10/why-is-error-function-minimized-in.html
Repeat until convergence {
$w:=w - \alpha\frac{\partial J(w, b)}{\partial w}$
$b:=b - \alpha\frac{\partial J(w, b)}{\partial b}$
}
where $\alpha$ is the learning rate, typically a small positive value.
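The update rule above can be sketched for a single-feature logistic regression with explicit loops. This is an illustrative sketch, not the linked lab code: the function name, the toy data, the default `alpha` and the fixed step count (used in place of a real convergence test) are all assumptions. The gradients use the standard result for the logistic cost, $\frac{\partial J}{\partial w} = \frac{1}{m}\sum(\hat{y}^{(i)} - y^{(i)})x^{(i)}$ and $\frac{\partial J}{\partial b} = \frac{1}{m}\sum(\hat{y}^{(i)} - y^{(i)})$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(xs, ys, alpha=0.1, steps=1000):
    """Fit w and b by repeating the update rule a fixed number of times."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        dw = db = 0.0
        for x, y in zip(xs, ys):
            dz = sigmoid(w * x + b) - y   # y_hat - y for this example
            dw += dz * x / m              # accumulate dJ/dw
            db += dz / m                  # accumulate dJ/db
        w -= alpha * dw                   # w := w - alpha * dJ/dw
        b -= alpha * db                   # b := b - alpha * dJ/db
    return w, b

# Toy data: negative inputs belong to class 0, positive inputs to class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = gradient_descent(xs, ys)
```

On this toy data the learned `w` is positive, so positive inputs are pushed toward a prediction of 1 and negative inputs toward 0.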
Download code to implement:
wget https://raw.githubusercontent.com/Ex10si0n/machine-learning/main/Lab-Assignment/lr-gd.py
Vectorization
Use a scientific computing package such as numpy to make the code shorter and faster. Whenever possible, avoid explicit for loops.
z = np.dot(w, x)
Using vectorization in the feed-forward pass, $A$ is computed from:
$z^{(i)} = w^\top x^{(i)}+b$
$Z = [z^{(1)}, z^{(2)}, ..., z^{(m)}]$
$A = \sigma(Z)$
We can have:
$Z=[z^{(1)}, z^{(2)}, ..., z^{(m)}] = w^\top X + [b, b, ..., b]$
# The scalar b is added to every column by numpy broadcasting: b -> [b, b, ..., b]
Z = np.dot(w.T, X) + b
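To make the shapes concrete, the sketch below compares the vectorized forward pass with an explicit loop over examples. The sizes `n` and `m`, the random data and the value of `b` are arbitrary choices for illustration; `X` is assumed to hold one training example per column.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

rng = np.random.default_rng(0)
n, m = 3, 5                      # n features, m training examples
X = rng.normal(size=(n, m))      # one example per column
w = rng.normal(size=(n, 1))      # column vector of weights
b = 0.5                          # scalar bias

# Vectorized: one matrix product, b is broadcast across all m columns.
Z = np.dot(w.T, X) + b           # shape (1, m)
A = sigmoid(Z)                   # shape (1, m)

# Equivalent explicit loop, shown only for comparison.
Z_loop = np.zeros((1, m))
for i in range(m):
    Z_loop[0, i] = np.dot(w[:, 0], X[:, i]) + b
```

Both versions produce the same `Z`; the vectorized one avoids the Python-level loop.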
Using vectorization in back-propagation, compute:
$\partial z = A - Y$
$\partial w = \frac{1}{m}X\partial z^\top$
$\partial b = \frac{1}{m}\sum\partial z$
$w:=w-\alpha \partial w$
$b:=b-\alpha \partial b$
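Put together, the vectorized forward and backward steps might look like the following sketch. The function name `train`, the toy data, the default `alpha` and the fixed number of steps are assumptions for illustration; `X` has shape (n, m) and `Y` has shape (1, m).

```python
import numpy as np

def train(X, Y, alpha=0.1, steps=1000):
    """Vectorized gradient descent for logistic regression (sketch)."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(steps):
        Z = np.dot(w.T, X) + b          # forward pass, shape (1, m)
        A = 1.0 / (1.0 + np.exp(-Z))    # sigmoid activations
        dz = A - Y                      # shape (1, m)
        dw = np.dot(X, dz.T) / m        # shape (n, 1)
        db = np.sum(dz) / m             # scalar
        w = w - alpha * dw
        b = b - alpha * db
    return w, b

# Toy data: one feature, four examples.
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0, 0, 1, 1]])
w, b = train(X, Y)
```

The only remaining loop runs over the gradient-descent iterations; there is no loop over training examples or features.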
Linear Algebra Terms
Vector dot product
$a \cdot b = \sum\limits_{i=1}^{n}a_ib_i =ab^\top$
$[1, 3, -5]\cdot[4, -2, -1] = (1\times4)+(3\times-2)+(-5\times-1)=3$
$[1, 3, -5]\cdot[4, -2, -1]=\begin{bmatrix}1 &3 &-5\end{bmatrix}\begin{bmatrix}4 \\ -2 \\ -1\end{bmatrix}=3$
c = a.dot(b)
c = a @ b
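The worked example can be checked with numpy as in this short sketch, which compares the element-wise sum against both spellings of the dot product. The variable names are chosen for illustration.

```python
import numpy as np

a = np.array([1, 3, -5])
b = np.array([4, -2, -1])

# Definition: the sum of element-wise products.
c_sum = sum(a_i * b_i for a_i, b_i in zip(a, b))

# Two equivalent numpy spellings of the same dot product.
c_dot = a.dot(b)
c_at = a @ b
```

All three values equal 3, matching the hand calculation.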
Vector norms
For a vector $x = [x_1, x_2, ..., x_m]$
Reference: