๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๊ณต๋ถ€๊ธฐ๋ก/Python

NumPy ๋งŒ์œผ๋กœ 2-Layer Neural Network์™€ Backpropagation ๊ตฌํ˜„ํ•ด๋ณด๊ธฐ

by kaizen_bh 2025. 9. 18.


1. ๋“ค์–ด๊ฐ€๋ฉฐ: ์™œ ์ง์ ‘ ๊ตฌํ˜„ํ•˜๋Š”๊ฐ€?

 

๋”ฅ๋Ÿฌ๋‹์„ ๋ฐฐ์šฐ๋Š” ๊ณผ์ •์—์„œ ์šฐ๋ฆฌ๋Š” 10์— 10์€ TensorFlow๋‚˜ PyTorch ๊ฐ™์€ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค

๋ฌผ๋ก  ์ด๋“ค ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ๊ฐ•๋ ฅํ•˜๊ณ  ํšจ์œจ์ ์ด์ง€๋งŒ, ์—ฐ์‚ฐ ๊ณผ์ •์ด ๋‚ด๋ถ€์ ์œผ๋กœ ๊ฐ์ถฐ์ ธ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์‹ ๊ฒฝ๋ง์ด ์‹ค์ œ๋กœ ๋ฌด์—‡์„ ํ•˜๋Š”์ง€ ์ง๊ด€์ ์œผ๋กœ ์ดํ•ดํ•˜๊ธฐ ์–ด๋ ต๊ธฐ๋„ ํ•˜๋‹ค

 

๊ทธ๋ž˜์„œ ์ด๋ฒˆ ์‹ค์Šต์—์„œ๋Š” NumPy๋งŒ ์‚ฌ์šฉํ•˜์—ฌ 2-layer Neural Network๋ฅผ ์ง์ ‘ ๊ตฌํ˜„ํ•˜๊ณ , Forward Pass์™€ Backpropagation์„ ์ฐจ๊ทผ์ฐจ๊ทผ ์‚ดํŽด๋ณด์•˜๋‹ค

ํ•ต์‹ฌ ๋ชฉํ‘œ:  “์ฝ”๋“œ๊ฐ€ ๋Œ์•„๊ฐ€๋Š” ์ด์œ ๋ฅผ ์ดํ•ดํ•˜๊ณ , ๋ชจ๋“  ์—ฐ์‚ฐ๊ณผ ๋ฏธ๋ถ„ ๊ณผ์ •์ด ๋ˆˆ์— ๋ณด์ด๋„๋ก ํ•™์Šต”


2. ๋ฐ์ดํ„ฐ ์ค€๋น„: MNIST ์†๊ธ€์”จ ์ด๋ฏธ์ง€

 

๋ฐ์ดํ„ฐ ํŠน์ง•

  • 28x28 ํ”ฝ์…€ ํ‘๋ฐฑ ์ด๋ฏธ์ง€
  • Flattenํ•˜์—ฌ 784์ฐจ์› ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜
  • 0~9 ์ˆซ์ž ์ด 10๊ฐœ ํด๋ž˜์Šค
  • ๋ผ๋ฒจ์€ ์›-ํ•ซ ์ธ์ฝ”๋”ฉ(One-Hot Encoding)
import numpy as np
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 784) / 255.0
X_test = X_test.reshape(-1, 784) / 255.0

Y_train = to_categorical(y_train, 10)
Y_test = to_categorical(y_test, 10)

 

  • ์›๋ž˜ X_train์˜ shape๋Š” (60000, 28, 28)
  • reshape๋กœ 0์ฐจ์›์€ -1๋กœ ๋‘๊ณ  ๋’ค์˜ 2-3์ฐจ์› 28x28 = 784๋กœ ์ฃผ์–ด์„œ 3์ฐจ์›์„ 2์ฐจ์›์œผ๋กœ ์ถ•์†Œํ•ด์ค€๋‹ค
  • ์›๋ž˜ Y_train์€ ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๋งŒํผ ์ •๋‹ต๊ฐ’์ด ํ•œ์ค„๋กœ ์žˆ๋Š”, (60000,) ํ˜•ํƒœ์˜ 1์ฐจ์› ๋ฒกํ„ฐ์ด๋‹ค
  • ๊ทธ๋Ÿฌ๋‚˜ to_categorical์„ ์‚ฌ์šฉํ•ด์ฃผ๋ฉด ์ •๋‹ต๊ฐ’์ด ์žˆ๋Š” ์ธ๋ฑ์Šค๋ฅผ 1๋กœ ํ‘œ๊ธฐํ•ด์ฃผ๊ณ  ๋‚˜๋จธ์ง€๋Š” 0์œผ๋กœ ํ‘œ๊ธฐํ•˜๋Š”, ์›ํ•ซ ์ธ์ฝ”๋”ฉ ์ฒ˜๋ฆฌ๋ฅผ ํ•ด์ค€๋‹ค
    • Y_train[0] : np.uint8(5), ์ˆซ์ž 5๊ฐ€ ์ •๋‹ต
    • to_categorical(Y_train, 10)[0] : array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.])
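Since everything else in this post is NumPy-only, the same one-hot encoding can be reproduced without Keras; a minimal sketch using np.eye:

```python
import numpy as np

def one_hot(labels, num_classes):
    # Each label selects a row of the identity matrix:
    # a 1 at the label's position and 0 everywhere else.
    return np.eye(num_classes)[labels]

y = np.array([5, 0, 3])
Y = one_hot(y, 10)
print(Y[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```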

 

ํฌ์ธํŠธ

  • ์ž…๋ ฅ๊ฐ’ ์ •๊ทœํ™”(0~1)
  • ์›-ํ•ซ ๋ผ๋ฒจ ์ค€๋น„: Cross-Entropy ๊ณ„์‚ฐ์— ํ•„์š”


3. ์‹ ๊ฒฝ๋ง ๊ตฌ์กฐ ์„ค๊ณ„

์ด๋ฒˆ ์‹ ๊ฒฝ๋ง์€ 2๊ฐœ์˜ ๋ ˆ์ด์–ด๋ฅผ ๊ฐ€์ง€๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ตฌ์กฐ์ด๋‹ค:

 
Input(784) -> Hidden Layer(100, Sigmoid or ReLU) -> Output Layer(10, Softmax)
  • ์ž…๋ ฅ์ธต: 28x28 ์ด๋ฏธ์ง€๋ฅผ 784์ฐจ์› ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜
  • ํžˆ๋“ ์ธต: 100๊ฐœ ๋…ธ๋“œ, ํ™œ์„ฑํ™” ํ•จ์ˆ˜ Sigmoid ๋˜๋Š” ReLU
  • ์ถœ๋ ฅ์ธต: 10๊ฐœ ๋…ธ๋“œ, Softmax๋กœ ํด๋ž˜์Šค ํ™•๋ฅ  ๊ณ„์‚ฐ


4. Weight Initialization Strategies: Xavier vs He

 

Xavier Initialization

  • Suited to Sigmoid and Tanh

He Initialization

  • Suited to ReLU-family activations

 

์›จ์ดํŠธ ์ดˆ๊ธฐํ™”์™€ ๊ด€๋ จ๋œ ๊ธ€์„ ์ด์ „์— ์ •๋ฆฌํ•œ ์ ์ด ์žˆ๋‹ค

 

ํ˜ํŽœํ•˜์ž„์˜ Easy! ๋”ฅ๋Ÿฌ๋‹ - ๊ธฐ์šธ๊ธฐ ๋ฌธ์ œ๋ถ€ํ„ฐ ๋ฐฐ์น˜ & ๋ ˆ์ด์–ด ์ •๊ทœํ™”๊นŒ์ง€

 

ํ˜ํŽœํ•˜์ž„์˜ Easy! ๋”ฅ๋Ÿฌ๋‹ - ๊ธฐ์šธ๊ธฐ ๋ฌธ์ œ๋ถ€ํ„ฐ ๋ฐฐ์น˜ & ๋ ˆ์ด์–ด ์ •๊ทœํ™”๊นŒ์ง€

Easy! ๋”ฅ๋Ÿฌ๋‹ใ€ŽEasy! ๋”ฅ๋Ÿฌ๋‹ใ€์€ ๋”ฅ๋Ÿฌ๋‹์„ ์ฒ˜์Œ ์ ‘ํ•˜๋Š” ๋…์ž๋“ค์„ ์œ„ํ•œ ํ•„์ˆ˜ ๊ฐ€์ด๋“œ๋กœ, ์ธ๊ณต์ง€๋Šฅ์˜ ๊ธฐ์ดˆ ๊ฐœ๋…๋ถ€ํ„ฐ CNN, RNN ๋“ฑ ๋”ฅ๋Ÿฌ๋‹์˜ ์ฃผ์š” ์ฃผ์ œ๋ฅผ ํญ๋„“๊ฒŒ ๋‹ค๋ฃจ๊ณ  ์žˆ๋‹ค. KAIST ๋ฐ•์‚ฌ์ด์ž ์œ ํŠœ๋ฒ„๋กœ ํ™œ๋™

bh-kaizen.tistory.com

 

 

ํ˜ํŽœํ•˜์ž„ - AI DEEP DIVE

 

์ˆ˜์‹ ๊ทธ๋Œ€๋กœ ๊ตฌํ˜„ํ•ด๋ณด๋ฉด ์•„๋ž˜์˜ ์ฝ”๋“œ์™€ ๊ฐ™๋‹ค

def init_params(input_dim, hidden_dim, output_dim, method="xavier"):
    if method == "xavier":
        limit1 = np.sqrt(6 / (input_dim + hidden_dim))
        limit2 = np.sqrt(6 / (hidden_dim + output_dim))
    elif method == "he":
        limit1 = np.sqrt(6 / input_dim)
        limit2 = np.sqrt(6 / hidden_dim)
    else:
        raise ValueError(f"unknown init method: {method}")
    W1 = np.random.uniform(-limit1, limit1, (input_dim, hidden_dim)) # shape: (input_dim, hidden_dim) == (784, 100)
    b1 = np.zeros(hidden_dim) # shape: (hidden_dim,) == (100,)
    W2 = np.random.uniform(-limit2, limit2, (hidden_dim, output_dim)) # shape: (hidden_dim, output_dim) == (100, 10)
    b2 = np.zeros(output_dim) # shape: (output_dim,) == (10,)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

 

์ดˆ๊ธฐํ™”์— ๋”ฐ๋ผ ํ•™์Šต ์†๋„์™€ ์•ˆ์ •์„ฑ์ด ํฌ๊ฒŒ ๋‹ฌ๋ผ์ง„๋‹ค


5. Implementing the Forward Pass

 

ํ•ต์‹ฌ ์ˆ˜์‹

 

$$ z_{1} = XW_{1} + b_{1} $$

$$ a_1 = \text{activation}(z_1) $$

$$ z_2 = a_1W_2 + b_2 $$

$$ y = \text{softmax}(z_2) $$

 

 

์ฃผ์˜

  • ํ–‰๋ ฌ ๊ณฑ์˜ ์ˆœ์„œ์™€ shape๊ฐ€ ํ•ต์‹ฌ
  • Bias๋Š” ๊ฐ ๋ฐฐ์น˜๋งˆ๋‹ค ๋ธŒ๋กœ๋“œ์บ์ŠคํŒ…๋จ
  • Sigmoid๋Š” 0~1 ๋ฒ”์œ„, ReLU๋Š” 0~๋ฌดํ•œ

์ถœ๋ ฅ๊ฐ’ ํ™•์ธ

  • Softmax ๊ฒฐ๊ณผ: ๊ฐ ์ƒ˜ํ”Œ๋ณ„ 10์ฐจ์› ํ™•๋ฅ  ๋ฒกํ„ฐ
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

def forward(X, params):
    z1 = X @ params["W1"] + params["b1"] # (32, 784) x (784, 100) => (32, 100) + (100,)
    a1 = sigmoid(z1) # (32, 100)
    z2 = a1 @ params["W2"] + params["b2"] # (32, 100) x (100, 10) => (32, 10) + (10,)
    y_hat = softmax(z2) # (32, 10)
    cache = {"X": X, "z1": z1, "a1": a1, "z2": z2, "y_hat": y_hat}
    return y_hat, cache
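One side note on the softmax above: subtracting the row-wise max before exponentiating guards against overflow without changing the result. A quick check with deliberately large logits:

```python
import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# np.exp(1000) on its own overflows to inf; after subtracting the
# row max the logits become [-2, -1, 0] and stay representable.
z = np.array([[1000.0, 1001.0, 1002.0]])
p = softmax(z)
print(p, p.sum())  # finite probabilities that sum to 1
```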

 

 

์ž…๋ ฅ์œผ๋กœ ๋“ค์–ด์˜ค๋Š” X์™€ ๊ฐ€์ค‘์น˜์˜ ํ–‰๋ ฌ ๊ณฑ ์—ฐ์‚ฐ ์ „์— shape๊ฐ€ ์ •ํ™•ํžˆ ์ผ์น˜ํ•˜๋Š”์ง€ ์ž˜ ํ™•์ธํ•ด์•ผํ•œ๋‹ค

 

 

6. Computing the Loss: Cross-Entropy with Softmax

 

์ˆ˜์‹

 

$$
L = - \frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})
$$

 

๋ฏธ๋ถ„

 

Differentiating the Softmax + Cross-Entropy combination:

  • Working through the differentiation, it collapses to the simple form below: subtract the true label from the predicted probability


$$
\frac{\partial L}{\partial z_i} = \hat{y}_i - y_i
$$
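A sketch of the loss in code, plus a finite-difference check that the derivative really does collapse to y_hat - y (helper names here are illustrative, not the post's final code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(Y, Y_hat, eps=1e-12):
    # Mean over samples of -sum(y * log(y_hat)); eps avoids log(0)
    return -np.mean(np.sum(Y * np.log(Y_hat + eps), axis=1))

rng = np.random.default_rng(0)
z = rng.standard_normal((1, 5))   # one sample, 5 classes
Y = np.eye(5)[[2]]                # one-hot label for class 2

analytic = softmax(z) - Y         # the claimed dL/dz

# Central finite difference on each logit
numeric = np.zeros_like(z)
h = 1e-6
for i in range(z.shape[1]):
    zp, zm = z.copy(), z.copy()
    zp[0, i] += h
    zm[0, i] -= h
    numeric[0, i] = (cross_entropy(Y, softmax(zp)) - cross_entropy(Y, softmax(zm))) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # tiny: the identity holds
```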


7. Implementing Backpropagation

 

์ดํ•ดํ•˜๋Š”๋ฐ ๊ฐ€์žฅ ์˜ค๋ž˜ ๊ฑธ๋ ธ๋˜ ๋ถ€๋ถ„

 

$ z_{1} = XW_{1} + b_{1} $

$ a_1 = \text{activation}(z_1) $

$ z_2 = a_1W_2 + b_2 $

$ y = \text{softmax}(z_2) $

 

์œ„์˜ Forward ์— ์žˆ๋Š” ์ˆ˜์‹๋“ค ๊ทธ๋Œ€๋กœ ์•„๋ž˜์—์„œ ์œ„๋กœ ํŽธ๋ฏธ๋ถ„์„ ํ•ด์ฃผ๋ฉด์„œ ์˜ฌ๋ผ๊ฐ€๋ฉด ๋œ๋‹ค

Loss → z2 → W2, b2 → a1 → z1 → W1, b1 ์ˆœ์œผ๋กœ ์ง„ํ–‰ํ•œ๋‹ค

 

์ฒด์ธ๋ฃฐ ์ ์šฉ

 

์ถœ๋ ฅ์ธต:

$ y = \text{softmax}(z_2) $

  • ์˜ˆ์ธก๊ฐ’ y_hat์— ๋Œ€ํ•œ ๋ฏธ๋ถ„๊ฐ’์€ ๋ฐ”๋กœ ์œ„์˜ Softmax + Cross-Entropy์˜ ์กฐํ•ฉ์—์„œ ๋ฏธ๋ถ„์„ ์ฐธ๊ณ ํ•˜๋ฉด ๋œ๋‹ค
  • ๊ทธ๋Ÿผ ์˜ˆ์ธก๊ฐ’ - ์‹ค์ œ๊ฐ’์ด๋ผ๋Š” ๊ฐ„๋‹จํ•œ ์‹์„ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค
  • ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๋ฅผ 32์ด๋ผ ๊ฐ€์ •, ๋‚˜์˜ค๋Š” shape๋Š” (32, 10)
dl_dz2 = y_hat - Y


ํžˆ๋“ ์ธต:

$ z_2 = a_1W_2 + b_2 $

  • ํ•ด๋‹น ์‹์— ์žˆ๋Š” a1, W2, b2์— ๋Œ€ํ•ด์„œ ํŽธ๋ฏธ๋ถ„์„ ํ†ตํ•ด ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค
  • ์ด ๋•Œ ์œ„์˜ ์ถœ๋ ฅ์ธต์—์„œ ๊ตฌํ•œ ๊ธฐ์šธ๊ธฐ์™€ ํ–‰๋ ฌ๊ณฑ์ด ๋˜๊ธฐ์— ์ˆœ์„œ์— ์ฃผ์˜ํ•  ๊ฒƒ
  • ์ˆœ์„œ๋Š” ํŽธ๋ฏธ๋ถ„์„ ํ–ˆ์„ ๋•Œ ์–ด๋””๊ฐ€ ๋‚จ๋Š”์ง€๋ฅผ ์ž˜ ์‚ดํŽด๋ณด๋ฉด ๋œ๋‹ค
  • ์˜ˆ๋ฅผ ๋“ค์–ด dl_dW2 ๋Š” W2์— ๋Œ€ํ•ด ํŽธ๋ฏธ๋ถ„ ํ–ˆ์„ ๋•Œ a1์ด ์•ž์— ๋‚จ๊ณ  ๋’ค์— ์ฒด์ธ๋ฃฐ๋กœ ์œ„์—์„œ ๊ตฌํ•œ ๊ธฐ์šธ๊ธฐ๊ฐ€ ๊ณฑํ•ด์ง€๊ธฐ์— 
    dl_dW2 = a1.T @ dl_dz2 ์ด๋Ÿฐ ์‹์ด ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค

Weight/ํŽธํ–ฅ ๊ธฐ์šธ๊ธฐ:

dl_dW2 = a1.T @ dl_dz2
dl_db2 = np.sum(dl_dz2, axis=0)
dl_da1 = dl_dz2 @ W2.T

 

  • ์œ„์—์„œ ๋‚˜์˜จ a1์˜ shape๋Š” (32, 100)
  • dl_dz2์˜ shape๋Š” (32,10)
  • (32, 100) x (32, 10)  : shape ๋ถˆ์ผ์น˜ โŒ ๋”ฐ๋ผ์„œ ์•ž์˜ a1์„ ํŠธ๋žœ์Šคํฌ์ฆˆ ์ทจํ•ด์ค€๋‹ค
  • (100, 32) x (32, 10) โœ…


ํ™œ์„ฑํ™” ํ•จ์ˆ˜:

$ a_1 = \text{sigmoid}(z_1) $

 

Sigmoid์˜ ๊ฒฝ์šฐ

  • ์ด์ „ ๊ธ€์—์„œ ๋ณด์•˜๋“ฏ์ด ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜๋ฅผ ๋ฏธ๋ถ„ํ•˜๋ฉด sigmoid(x) * (1 - sigmoid(x)) ๋ผ๋Š” ํ˜•ํƒœ๊ฐ€ ๋‚˜์˜จ๋‹ค
  • a1์€ ์‹œ๊ทธ๋ชจ์ด๋“œ๋ฅผ ํ†ต๊ณผํ•œ ๊ฐ’. ๋”ฐ๋ผ์„œ ํŽธ๋ฏธ๋ถ„์‹œ a1์„ ์‚ฌ์šฉํ•œ๋‹ค
  • ์œ„์˜ ์‹œ๊ทธ๋ชจ์ด๋“œ ์‹์„ z1์— ๋Œ€ํ•ด ํŽธ๋ฏธ๋ถ„ํ•˜๋ฉด ์ฒด์ธ๋ฃฐ์— ์˜ํ•ด ์ด์ „ ๊ธฐ์šธ๊ธฐ dl_da1๊ณผ ์‹œ๊ทธ๋ชจ์ด๋“œ ๋ฏธ๋ถ„, sigmoid(x) * (1 - sigmoid(x)) ์ด ๊ณฑํ•ด์ง„ ํ˜•ํƒœ๊ฐ€ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค
dl_dz1 = dl_da1 * a1 * (1 - a1)


ReLU์˜ ๊ฒฝ์šฐ

  • ReLU๋Š” ์–‘์ˆ˜๋Š” y = x, 0๊ณผ ์Œ์ˆ˜๋Š” 0์œผ๋กœ ๋งŒ๋“ค๊ธฐ ๋•Œ๋ฌธ์— ์ด์ „ ๊ธฐ์šธ๊ธฐ๊ฐ’์ด ์Œ์ˆ˜๋ƒ ์–‘์ˆ˜๋ƒ์— ๋”ฐ๋ผ ๊ธฐ์šธ๊ธฐ๊ฐ€ ์ „๋‹ฌ๋˜๊ฑฐ๋‚˜ 0์œผ๋กœ ์ „๋‹ฌ๋˜์ง€ ์•Š๋Š”๋‹ค
  • ์–‘์ˆ˜๋ผ๋ฉด y = x์˜ ๊ธฐ์šธ๊ธฐ๋Š” 1, ์ด์ „ ๊ธฐ์šธ๊ธฐ๊ฐ€ ๊ทธ๋Œ€๋กœ ์ „๋‹ฌ๋˜๋ฉฐ 0์ดํ•˜์ผ ๊ฒฝ์šฐ 0์œผ๋กœ ๋งŒ๋“ ๋‹ค
  • z1 > 0 ์ด๋ผ๋Š” ์กฐ๊ฑด์—์„œ bool๋กœ 0, 1์„ ๊ณฑํ•˜์—ฌ ํ•„ํ„ฐ๋ง์„ ํ•ด์ฃผ๋Š” ์‹์ด๋‹ค
dl_dz1 = dl_da1 * (z1 > 0)

 

ํฌ์ธํŠธ

  • ๊ฐ ํ–‰๋ ฌ ๊ณฑ๊ณผ ์ฐจ์› ํ™•์ธ
  • ํ™œ์„ฑํ™” ํ•จ์ˆ˜ ๋ฏธ๋ถ„ ์ ์šฉ ํ•„์ˆ˜

 

์œ„์˜ ๋‚ด์šฉ๋“ค์„ ์ •๋ฆฌํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค

def backward(Y, cache, params):
    m = Y.shape[0]
    dz2 = (cache["y_hat"] - Y) / m
    dW2 = cache["a1"].T @ dz2
    db2 = np.sum(dz2, axis=0)

    da1 = dz2 @ params["W2"].T
    dz1 = da1 * cache["a1"] * (1 - cache["a1"])  # sigmoid derivative
    dW1 = cache["X"].T @ dz1
    db1 = np.sum(dz1, axis=0)

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
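A good way to trust a hand-written backward pass is to check it against numerical gradients. Below is a self-contained sketch on tiny toy dimensions; the forward/backward mirror the post's functions, trimmed to what the check needs:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    a1 = sigmoid(X @ params["W1"] + params["b1"])
    y_hat = softmax(a1 @ params["W2"] + params["b2"])
    return y_hat, {"X": X, "a1": a1, "y_hat": y_hat}

def backward(Y, cache, params):
    m = Y.shape[0]
    dz2 = (cache["y_hat"] - Y) / m
    dW2 = cache["a1"].T @ dz2
    db2 = dz2.sum(axis=0)
    da1 = dz2 @ params["W2"].T
    dz1 = da1 * cache["a1"] * (1 - cache["a1"])
    dW1 = cache["X"].T @ dz1
    db1 = dz1.sum(axis=0)
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

def loss(X, Y, params):
    y_hat, _ = forward(X, params)
    return -np.mean(np.sum(Y * np.log(y_hat + 1e-12), axis=1))

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))            # 4 samples, 6 features
Y = np.eye(3)[rng.integers(0, 3, size=4)]  # random one-hot labels
params = {"W1": rng.standard_normal((6, 5)) * 0.1, "b1": np.zeros(5),
          "W2": rng.standard_normal((5, 3)) * 0.1, "b2": np.zeros(3)}

_, cache = forward(X, params)
grads = backward(Y, cache, params)

# Central finite difference on every entry of W1
h = 1e-6
num = np.zeros_like(params["W1"])
for i in range(num.shape[0]):
    for j in range(num.shape[1]):
        params["W1"][i, j] += h
        lp = loss(X, Y, params)
        params["W1"][i, j] -= 2 * h
        lm = loss(X, Y, params)
        params["W1"][i, j] += h
        num[i, j] = (lp - lm) / (2 * h)

max_err = np.max(np.abs(grads["dW1"] - num))
print(max_err)  # tiny: the analytic gradient matches
```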

8. Training with Gradient Descent

Mini-batch training

  • Batch size: 32
  • Learning rate: 0.001

์—…๋ฐ์ดํŠธ

  • ์ž๋น„์— ๋˜๋Š” ์นด์ด๋ฐ ํ—ˆ ์ดˆ๊ธฐํ™”๋ฅผ ํ†ตํ•ด ์„ค์ •ํ•œ ๊ฐ€์ค‘์น˜์—์„œ ์—ญ์ „ํŒŒ๋ฅผ ํ†ตํ•ด ๊ฐ๊ฐ ๊ตฌํ–ˆ๋˜ ๊ธฐ์šธ๊ธฐ์— ํ•™์Šต๋ฅ ์„ ๊ณฑํ•˜์—ฌ ๋บŒ์œผ๋กœ์จ ์—…๋ฐ์ดํŠธ๋ฅผ ํ•ด์ค€๋‹ค
def update_params(params, grads, lr=0.1):
    for key in params.keys():
        params[key] -= lr * grads["d" + key]
    return params

---
# Equivalently, written out parameter by parameter
# (keys match the grads dict returned by backward):
params['W2'] = params['W2'] - lr * grads['dW2']
params['b2'] = params['b2'] - lr * grads['db2']
params['W1'] = params['W1'] - lr * grads['dW1']
params['b1'] = params['b1'] - lr * grads['db1']
  • Here the keys of params are W1, W2, b1, b2, the names set during weight initialization


9. ๋ชจ๋ธ ๊ฒ€์ฆ (Evaluate)

 

์œ„์—์„œ ์ˆœ์ „ํŒŒ-์—ญ์ „ํŒŒ, ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ ๋ฟ ์•„๋‹ˆ๋ผ epoch๋งˆ๋‹ค ๋ชจ๋ธ์˜ ์ง€์†์ ์ธ ์„ฑ๋Šฅ ํ™•์ธ์ด ํ•„์š”ํ•˜๋‹ค

์ˆœ์ „ํŒŒ๋ฅผ ํ†ตํ•ด ์–ป์€ ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’๋งŒ ์žˆ๋‹ค๋ฉด ์–ด๋А ์ •๋„์˜ ์ •ํ™•๋„๋ฅผ ๊ฐ€์ง€๋Š”์ง€ ์ธก์ •ํ•  ์ˆ˜ ์žˆ๋‹ค

 

def evaluate(Y, Y_hat):
    # compute accuracy
    pred = np.argmax(Y_hat, axis=1)
    true = np.argmax(Y, axis=1)
    accuracy = np.mean(pred == true)
    return accuracy

 

์•ž์—์„œ ์šฐ๋ฆฌ๋Š” to_categorical ๋ฉ”์„œ๋“œ๋ฅผ ํ†ตํ•ด ๋ผ๋ฒจ๊ฐ’๋“ค์„ 10๊ฐœ ์ค‘ ์œ„์น˜ํ•œ ์ธ๋ฑ์Šค๋ฅผ ํ‘œ์‹œํ•˜๋Š” ์›-ํ•ซ ์ธ์ฝ”๋”ฉ์„ ํ•ด์ฃผ์—ˆ๋‹ค

 

1. Converting predictions with np.argmax

Y_hat is the model output: a probability distribution over the classes produced by the softmax.
For example, if one sample's Y_hat looks like this:

y_hat_sample = [0.05, 0.01, 0.88, 0.03, 0.01, 0.01, 0.01, 0.0, 0.0, 0.0]
  • Take the index of the largest softmax probability (0.88) as the predicted class
  • np.argmax(y_hat_sample) → 2 (predicts class 2)

2. ์ •๋‹ต ๋ผ๋ฒจ ๋ณ€ํ™˜

Y๋Š” ์›-ํ•ซ ์ธ์ฝ”๋”ฉ ํ˜•ํƒœ๋กœ ๋˜์–ด ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ํด๋ž˜์Šค 2๊ฐ€ ์ •๋‹ต์ด๋ฉด:

Y_sample = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
  • np.argmax(Y_sample) → 2 (the true class)

3. ๋น„๊ต ๋ฐ ํ‰๊ท 

pred == true
  • ์˜ˆ์ธก ํด๋ž˜์Šค์™€ ์‹ค์ œ ํด๋ž˜์Šค๊ฐ€ ์ผ์น˜ํ•˜๋ฉด True, ์•„๋‹ˆ๋ฉด False
  • ์—ฌ๋Ÿฌ ์ƒ˜ํ”Œ์„ ์ฒ˜๋ฆฌํ•  ๋•Œ np.mean()๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์ „์ฒด ์ผ์น˜ ๋น„์œจ = accuracy

์˜ˆ์‹œ:

pred true pred == true
2 2 True
0 0 True
4 3 False
  • Accuracy = (2/3) = 0.6667
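The same toy comparison, run through the evaluate function above:

```python
import numpy as np

def evaluate(Y, Y_hat):
    pred = np.argmax(Y_hat, axis=1)
    true = np.argmax(Y, axis=1)
    return np.mean(pred == true)

# Three samples: predictions decode to classes [2, 0, 4],
# the true labels to [2, 0, 3] -> two out of three match
Y_hat = np.eye(5)[[2, 0, 4]]
Y = np.eye(5)[[2, 0, 3]]
acc = evaluate(Y, Y_hat)
print(acc)  # 0.666...
```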

๋งˆ๋ฌด๋ฆฌ

์ด๋ฒˆ ๋‚ด์šฉ์€ ์‚ฌ์‹ค์ƒ ์ด์ „์— ์ •๋ฆฌํ•œ ๊ธ€, "MSE Loss์™€ ์‹œ๊ทธ๋ชจ์ด๋“œ์—์„œ์˜ ์—ญ์ „ํŒŒ ์ดํ•ดํ•˜๊ธฐ" ์™€ ๋น„์Šทํ•œ ๋‚ด์šฉ์ž„์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ์ดํ•ดํ•˜๋Š”๋ฐ ์˜ค๋žœ ์‹œ๊ฐ„์ด ๊ฑธ๋ ธ๋‹ค

์ด์ „์—๋Š” ์—ญ์ „ํŒŒ ์ผ๋ถ€๋งŒ ๋‹ค๋ฃจ์—ˆ๋‹ค๋ฉด ์ด๋ฒˆ์—๋Š” ๊ฐ„๋‹จํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์™€์„œ ๊ฐ€์ค‘์น˜์™€ ํŽธํ–ฅ๋ถ€ํ„ฐ ์ง์ ‘ ์„ค์ •ํ•˜๊ณ  ์ด๊ฑธ ์ˆœ์ „ํŒŒ์™€ ์—ญ์ „ํŒŒ๋ฅผ ํ•  ๋•Œ ์–ด๋–ค ์š”์†Œ๋“ค์„ ์ ์ ˆํ•˜๊ฒŒ ์‚ฌ์šฉํ•ด์•ผํ•˜๋Š”์ง€, ๋” ํ™•์žฅ๋œ ๋‚ด์šฉ์ด์—ˆ๊ธฐ์— ์• ๋ฅผ ๋จน์—ˆ๋˜ ๊ฒƒ ๊ฐ™๋‹ค

 

๋’ค์˜ ๊ณต๋ถ€๋“ค์ด ๋ฐ€๋ ธ์ง€๋งŒ.. ์—ฌ๊ธฐ์„œ ๋ฐฐ์šด ๊ฒƒ๋“ค์ด ์ •๋ง ๊ธฐ๋ณธ์ ์ด๊ณ  ๊ทผ๋ณธ์ด ๋˜๋Š” ๋‚ด์šฉ์ด๊ธฐ์— ์‹œ๊ฐ„์ด ์˜ค๋ž˜ ๊ฑธ๋ฆฌ๋”๋ผ๋„ ์ตœ๋Œ€ํ•œ ์ดํ•ดํ•˜๊ณ  ๋„˜์–ด๊ฐ€๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค๊ณ  ์ƒ๊ฐํ•˜์—ฌ ์ •๋ฆฌํ•ด๋ณด์•˜๋‹ค


์ฐธ๊ณ ์ž๋ฃŒ

 

https://humankind.tistory.com/59

 

[๋ฐ‘๋ฐ”๋‹ฅ๋”ฅ๋Ÿฌ๋‹] 10. ์˜ค์ฐจ์—ญ์ „ํŒŒ๋ฒ•(backpropagation) ๊ตฌํ˜„(1)

๋ณธ ๊ฒŒ์‹œ๊ธ€์€ ํ•œ๋น›๋ฏธ๋””์–ด ใ€Ž๋ฐ‘๋ฐ”๋‹ฅ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹, ์‚ฌ์ดํ†  ๊ณ ํ‚ค, 2020ใ€์˜ ๋‚ด์šฉ์„ ์ฐธ์กฐํ•˜์˜€์Œ์„ ๋ฐํž™๋‹ˆ๋‹ค. ์ง€๋‚œ ์žฅ์—์„œ๋Š” ๋ง์…ˆ ๋…ธ๋“œ์™€ ๊ณฑ์…ˆ ๋…ธ๋“œ์—์„œ์˜ ์ˆœ์ „ํŒŒ์™€ ์—ญ์ „ํŒŒ ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด์„œ ์‚ด

humankind.tistory.com

 

์ด ๊ธ€ ๋‚ด์šฉ๋„ ๊ดœ์ฐฎ์•˜๋‹ค. ๋ฐ‘๋ฐ”๋”ฅ์„ ๋ณด๊ณ  ์‹ถ์ง€๋งŒ.. ๋‚ด์šฉ์ด ๋งŽ์ด ๊ฒน์น˜๊ธฐ๋„ ํ•˜๊ณ  ๋‹ค ๋ณผ ์ˆ˜๋Š” ์—†์–ด์„œ ํŒจ์Šค. ๋Œ€์‹  ํ˜ํŽœ๋‹˜ Easy ๋”ฅ๋Ÿฌ๋‹ ์ฑ… ๋ณ‘ํ–‰ํ•˜๋ฉด์„œ ์ถ”์„ ์ „๊นŒ์ง€๋Š” ๊ผญ ๋‹ค๋ณด์ž