# Terms Explanations in Deep Learning

#### Binary Classification

A classification task in which each input is assigned one of two labels. A simple model computes $\hat{y} = wx+b$ for a given input $x$, where $\hat{y}$ is the predicted value of $y$.

#### Logistic Regression

Transforms the raw linear prediction $wx + b$ into a value between 0 and 1 that can be read as a probability, which suits a binary classification model.

A common transform function is the **Sigmoid function**:

$\sigma(z) = \frac{1}{1+e^{-z}}$
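As a minimal sketch of the formula above, the sigmoid can be written with `numpy` so that it accepts either a single number or an array. The function name `sigmoid` is chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real value (or array of values) into the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                          # 0.5
print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # values below, at, and above 0.5
```

Large positive inputs approach 1 and large negative inputs approach 0, which is why the output can be treated as a probability.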

#### Loss Functions

A loss (error) function is evaluated on **a single example** from the training set. It measures how far $\hat{y}$ is from the true label $y$, which tells the model in which direction to adjust $\hat{y}$.

Loss functions like **Square loss**:

$L(\hat{y}, y) = \frac{(\hat{y} - y)^2}{2}$

Loss functions like **Logistic loss**:

$L(\hat{y}, y) = -(y\log{\hat{y}} + (1-y)\log(1-\hat{y}))$
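Both losses can be sketched in a few lines of `numpy`. The function names and the small clipping constant `eps` are assumptions added here; the clipping only guards against `log(0)` and is not part of the formulas above.

```python
import numpy as np

def square_loss(y_hat, y):
    """Square loss for one example: (y_hat - y)^2 / 2."""
    return (y_hat - y) ** 2 / 2

def logistic_loss(y_hat, y, eps=1e-12):
    """Logistic loss for one example, with y in {0, 1} and y_hat in (0, 1)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # assumption: avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(square_loss(0.9, 1.0))    # about 0.005
print(logistic_loss(0.9, 1.0))  # about 0.105, i.e. -log(0.9)
print(logistic_loss(0.1, 1.0))  # much larger: a confident wrong prediction
```

The logistic loss grows quickly as a prediction moves toward the wrong label, while the square loss stays bounded for predictions in (0, 1).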

#### Cost Functions

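The cost function averages the loss over the **entire training set**. Training (back-propagation with gradient descent) searches for the parameters that minimize it, ideally reaching the **global optimum**.

$J(w, b) = \frac{1}{m} \sum\limits_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$

where $L$ is the loss function and $m$ is the number of training examples.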
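A short sketch of the cost as the mean of per-example losses, using the logistic loss as $L$. The function name `cost`, the sample arrays, and the clipping constant `eps` are illustrative assumptions.

```python
import numpy as np

def cost(y_hat, y, eps=1e-12):
    """Average logistic loss over all m examples."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # assumption: avoid log(0)
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return losses.mean()

# Hypothetical labels and predictions for m = 3 examples
y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.6])
print(cost(y_hat, y))
```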

#### Gradient Descent

The algorithm for finding the values of $w$ and $b$ that give the **minimum cost**. Note that $w$ and $b$ stand for the weights and biases of each regression unit (neuron).

Gradient descent reaches the global minimum only when the cost function is convex, so we need to show that the logistic regression cost is convex: http://mathgotchas.blogspot.com/2011/10/why-is-error-function-minimized-in.html

Repeat until convergence {

$w:=w - \alpha\frac{\partial J(w, b)}{\partial w}$

$b:=b - \alpha\frac{\partial J(w, b)}{\partial b}$

}

where $\alpha$ is the learning rate, usually a small positive value.
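The update rule can be sketched independently of any particular model. The example below is an illustration only: it uses a hypothetical toy cost $J(w, b) = (w-3)^2 + (b+1)^2$ whose minimum is known, and the stopping tolerance `tol` and iteration cap `max_iters` are assumptions.

```python
def gradient_descent(grad, w, b, alpha=0.1, tol=1e-8, max_iters=10_000):
    """Repeat the simultaneous update of w and b until the steps become tiny."""
    for _ in range(max_iters):
        dw, db = grad(w, b)
        w_new, b_new = w - alpha * dw, b - alpha * db
        if abs(w_new - w) < tol and abs(b_new - b) < tol:
            return w_new, b_new
        w, b = w_new, b_new
    return w, b

# Toy convex cost J(w, b) = (w - 3)^2 + (b + 1)^2, minimized at w = 3, b = -1
def toy_grad(w, b):
    return 2 * (w - 3), 2 * (b + 1)

w, b = gradient_descent(toy_grad, w=0.0, b=0.0)
print(w, b)  # close to 3 and -1
```

Both parameters are updated from the same pair of gradients, so neither update sees the other's new value within a step.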

**Download** code to implement:

```
wget https://raw.githubusercontent.com/Ex10si0n/machine-learning/main/Lab-Assignment/lr-gd.py
```

#### Vectorization

Use a scientific computing package such as `numpy` to make the code shorter and faster. Whenever possible, avoid explicit `for` loops.

```
z = np.dot(w, x)
```
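To make the comparison concrete, here is a small sketch that computes the same dot product with an explicit loop and with `np.dot`. The random vectors and their length are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
x = rng.normal(size=1000)

# Explicit loop: one multiply-add per Python iteration
z_loop = 0.0
for i in range(len(w)):
    z_loop += w[i] * x[i]

# Vectorized: a single call that runs in compiled code
z_vec = np.dot(w, x)

print(np.isclose(z_loop, z_vec))  # both give the same value up to rounding
```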

Using vectorization in the feed-forward pass to compute $A$. For each example $i$:

$z^{(i)} = w^\top x^{(i)}+b$

Stacking the examples as the columns of $X = [x^{(1)}, x^{(2)}, ..., x^{(m)}]$, we can have:

$Z=[z^{(1)}, z^{(2)}, ..., z^{(m)}] = w^\top X + [b, b, ..., b]$

$A = \sigma(Z)$

```
# b is a scalar; numpy broadcasting expands it to [b, b, ..., b]
Z = np.dot(w.T, X) + b
```
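A runnable sketch of the whole forward pass, assuming `X` has shape `(n, m)` with one example per column and `w` has shape `(n, 1)`. The function name `forward` and the sample values are made up for illustration.

```python
import numpy as np

def forward(w, b, X):
    """Compute A = sigmoid(w^T X + b) for all m examples at once."""
    Z = np.dot(w.T, X) + b          # shape (1, m); b is broadcast
    return 1.0 / (1.0 + np.exp(-Z))

# Hypothetical data: n = 2 features, m = 3 examples (one per column)
X = np.array([[1.0, 2.0, 3.0],
              [0.5, -1.0, 2.0]])
w = np.array([[0.1], [-0.2]])
b = 0.3

A = forward(w, b, X)
print(A.shape)  # (1, 3)
print(A)
```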

Using vectorization in back-propagation, to compute:

$\partial z = A - Y$

$\partial w = \frac{1}{m}X\partial z^\top$

$\partial b = \frac{1}{m}\sum\partial z$

$w:=w-\alpha \partial w$

$b:=b-\alpha \partial b$
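The five formulas above map almost line by line onto `numpy`. This is a sketch under stated assumptions: `X` has shape `(n, m)`, `Y` has shape `(1, m)`, and the function name `gradient_step`, the toy data, the learning rate, and the iteration count are all chosen here for illustration.

```python
import numpy as np

def gradient_step(w, b, X, Y, alpha):
    """One vectorized gradient-descent update for logistic regression."""
    m = X.shape[1]
    A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))  # forward pass, shape (1, m)
    dz = A - Y                       # shape (1, m)
    dw = np.dot(X, dz.T) / m         # shape (n, 1)
    db = np.sum(dz) / m              # scalar
    return w - alpha * dw, b - alpha * db

# Toy, linearly separable data: one feature, four examples
X = np.array([[0.0, 1.0, 2.0, 3.0]])
Y = np.array([[0, 0, 1, 1]])

w, b = np.zeros((1, 1)), 0.0
for _ in range(5000):
    w, b = gradient_step(w, b, X, Y, alpha=0.5)

A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))
predictions = (A > 0.5).astype(int)
print(predictions)  # matches Y on this toy data
```

No loop over the examples is needed; the only loop left is the one over gradient-descent iterations.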

#### Linear Algebra Terms

##### Vector dot product

$a \cdot b = \sum\limits_{i=1}^{n}a_ib_i =ab^\top$

$[1, 3, -5]\cdot[4, -2, -1] = (1\times4)+(3\times-2)+(-5\times-1)=3$

$[1, 3, -5]\cdot[4, -2, -1]=\begin{bmatrix}1 &3 &-5\end{bmatrix}\begin{bmatrix}4 \\ -2 \\ -1\end{bmatrix}=3$

```
c = a.dot(b)
c = a @ b
```
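The worked example above can be checked directly in `numpy`. The element-wise form is included only to mirror the summation formula.

```python
import numpy as np

a = np.array([1, 3, -5])
b = np.array([4, -2, -1])

print(np.sum(a * b))  # 3, the sum of element-wise products
print(a.dot(b))       # 3
print(a @ b)          # 3
```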

##### Vector norms

For a vector $x = [x_1, x_2, ..., x_m]$

Reference: