1. Error Backpropagation Algorithm

  • Run a forward pass on the training data to compute the loss value
  • Store the intermediate values of each layer for use in backpropagation
  • Differentiate the loss function with respect to the trainable parameters (weights, biases),
    applying the chain rule one layer at a time from the last layer backward, reusing the values stored in each layer
  • Propagate the error backward while updating the trainable parameters little by little (a small sketch follows this list)
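The following is a minimal sketch of these steps for a single linear unit with a squared-error loss; the variable names (w, b, lr) and values are illustrative and not part of the original post.

# One training step repeated a few times: forward pass (intermediate kept),
# backward pass via the chain rule, then a small parameter update.
w, b = 0.2, 0.0          # trainable parameters
x, t = 2.0, 1.0          # one sample and its target
lr = 0.1                 # learning rate

for step in range(3):
    # forward: compute the loss, keeping the intermediate value y
    y = w * x + b
    loss = 0.5 * (y - t) ** 2
    # backward: chain rule from the loss toward each parameter
    dloss_dy = y - t              # dL/dy
    dloss_dw = dloss_dy * x       # dL/dw = dL/dy * dy/dw
    dloss_db = dloss_dy * 1.0     # dL/db = dL/dy * dy/db
    # update: move the parameters slightly against the gradient
    w -= lr * dloss_dw
    b -= lr * dloss_db
    print(step, round(loss, 4))   # the loss decreases every step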

 

  - Characteristics of learning with backpropagation

  • The loss function is evaluated only once per update, and the derivatives are obtained with the chain rule, so training time is greatly reduced (see the comparison sketch after this list)
  • All intermediate values needed for the derivatives are stored, so memory usage is high
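As a rough, illustrative comparison (an assumed example, not from the original post): estimating the gradient numerically takes extra forward passes for every single parameter, whereas backpropagation obtains all of them from one forward pass plus one backward pass.

import numpy as np

# Toy loss: L = sum((W @ x)**2). Numerical differentiation below runs
# 2 forward passes per parameter; backpropagation needs only one
# forward pass (whose intermediate y is cached) and one backward pass.
def forward(W, x):
    return np.sum((W @ x) ** 2)

np.random.seed(0)
W = np.random.randn(20, 10)
x = np.random.randn(10)

# Numerical gradient: 2 * W.size = 400 forward passes
eps = 1e-5
num_grad = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_grad[i, j] = (forward(W_plus, x) - forward(W_minus, x)) / (2 * eps)

# Backpropagation: one forward pass and one backward pass
y = W @ x                            # forward (cached intermediate)
bp_grad = np.outer(2 * y, x)         # chain rule: dL/dW = (dL/dy) x^T

print(np.allclose(num_grad, bp_grad))   # True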

 

  - The importance of differentiability in neural network training

  • In gradient descent, derivatives are used to find the minimum, i.e. the optimal value, of the loss (cost) function
  • The derivatives are used to update the trainable parameters, gradually moving the model's weights toward their optimal values (a minimal example follows this list)
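A minimal gradient-descent sketch (illustrative values): the update rule w <- w - lr * f'(w) is only possible because f is differentiable.

# Minimize f(w) = (w - 3)^2 by gradient descent
w = 0.0
lr = 0.1
for step in range(30):
    grad = 2 * (w - 3)   # f'(w)
    w -= lr * grad
print(w)                 # close to 3, the minimizer of f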

https://www.pinterest.co.kr/pin/424816177350692379/

 

2. Differentiation of Composite Functions (Chain Rule)

$$ \frac{d}{dx}[f(g(x))]=f'(g(x))g'(x) $$

  • The rule can be applied repeatedly through any number of intermediate variables (a numerical check follows this list)
    \( \frac{\partial f}{\partial x}=\frac{\partial f}{\partial u} \times \frac{\partial u}{\partial m} \times \frac{\partial m}{\partial n} \times \cdots \times \frac{\partial l}{\partial k} \times \frac{\partial k}{\partial g} \times \frac{\partial g}{\partial x} \)
  • A partial derivative can be taken for each factor individually
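As a quick sanity check (an added illustration, not part of the original post), the chain-rule product can be compared against a finite-difference estimate of the full composite function:

import numpy as np

# Chain rule for f(g(x)) with f(u) = u**2 and g(x) = sin(x):
# df/dx = f'(g(x)) * g'(x) = 2*sin(x)*cos(x)
x = 0.7
analytic = 2 * np.sin(x) * np.cos(x)

eps = 1e-6
numeric = (np.sin(x + eps) ** 2 - np.sin(x - eps) ** 2) / (2 * eps)

print(np.isclose(analytic, numeric))   # True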

https://www.freecodecamp.org/news/demystifying-gradient-descent-and-backpropagation-via-logistic-regression-based-image-classification-9b5526c2ed46/

  • Intuitive understanding of backpropagation
    • During training, i.e. while searching for the minimum of the loss function, it tells us how strongly the loss is affected by a change in each weight or bias

 

  - Composite function differentiation example

https://medium.com/spidernitt/breaking-down-neural-networks-an-intuitive-approach-to-backpropagation-3b2ff958794c

\( a=-1, \,\, b=3, \,\, c=4,\)

When \( x=a+b, \,\, y=b+c, \,\, f=x*y \),

 

\( \begin{matrix} \frac{\partial f}{\partial x} &=& y+x\frac{\partial y}{\partial x} \\ &=& (b+c)+(a+b)\times 0 \\ &=& 7 \end{matrix} \)

 

\( \begin{matrix} \frac{\partial f}{\partial y} &=& x+\frac{\partial x}{\partial y}y \\ &=& (a+b)+0 \times (b+c) \\ &=& 2 \end{matrix} \)

 

\( \begin{matrix} \frac{\partial x}{\partial a} &=& 1+\frac{\partial b}{\partial a} \\ &=& 1 \end{matrix} \)

 

\( \begin{matrix} \frac{\partial y}{\partial c} &=& \frac{\partial b}{\partial c}+1 \\ &=& 1 \end{matrix} \)

 

\( \begin{matrix} \frac{\partial f}{\partial a} &=& \frac{\partial f}{\partial x} \times \frac{\partial x}{\partial a} \\ &=& y \times 1 \\ &=& 7 \times 1 = 7 \end{matrix} \)

 

\( \begin{matrix} \frac{\partial f}{\partial b} &=& \frac{\partial x}{\partial b}y+x\frac{\partial y}{\partial b} \\ &=& 1 \times 7+2 \times 1 = 9 \end{matrix} \)

 

  - Backpropagation through addition and multiplication layers

  • From the example above, we can see the following:
    1. When \( z=x+y \),
      \( \frac {\partial z}{\partial x}=1, \frac {\partial z}{\partial y}=1 \)
    2. When \( t = xy \),
      \( \frac {\partial t}{\partial x}=y, \frac {\partial t}{\partial y}=x \)
# Multiplication operation
class Mul():

    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        result = x*y
        return result
    
    def backward(self, dresult):
        dx = dresult * self.y
        dy = dresult * self.x
        return dx, dy

# Addition operation
class Add():
    
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        result = x + y
        return result
    
    def backward(self, dresult):
        dx = dresult * 1
        dy = dresult * 1
        return dx, dy

a, b, c = -1, 3, 4
x = Add()
y = Add()
f = Mul()
# forward
x_result = x.forward(a, b)
y_result = y.forward(b, c)

print(x_result)
print(y_result)
print(f.forward(x_result, y_result))

# Output
2
7
14
# backward
dresult = 1
dx_mul, dy_mul = f.backward(dresult)

da_add, db_add_1 = x.backward(dx_mul)
db_add_2, dc_add = y.backward(dy_mul)

print(dx_mul, dy_mul)
print(da_add)
print(db_add_1 + db_add_2)
print(dc_add)

# Output
7 2
7
9
2

https://medium.com/spidernitt/breaking-down-neural-networks-an-intuitive-approach-to-backpropagation-3b2ff958794c

 

3. Backpropagation in Activation Functions

  - Sigmoid function

https://www.geeksforgeeks.org/implement-sigmoid-function-using-numpy/

  • Formula
    When \( y=\frac{1}{1+e^{-x}} \),
    \( \begin{matrix} y' &=& \left ( \frac{1}{1+e^{-x}} \right )' \\
    &=& \frac{-1}{(1+e^{-x})^{2}} \times (-e^{-x}) \\
    &=& \frac{1}{1+e^{-x}} \times \frac{e^{-x}}{1+e^{-x}} \\
    &=& \frac{1}{1+e^{-x}} \times \left ( 1-\frac{1}{1+e^{-x}} \right ) \\
    &=& y(1-y) \end{matrix} \)
class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out                           # keep the output for the backward pass
        return out
    
    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out  # y' = y(1 - y)
        return dx
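A quick finite-difference check (an added illustration, assuming NumPy is imported as np and using the corrected Sigmoid class above) confirms that the y(1 - y) form used in backward matches the derivative of the forward pass:

import numpy as np

x = np.array([-2.0, 0.0, 1.5])
sig = Sigmoid()
y = sig.forward(x)

analytic = sig.backward(np.ones_like(x))   # dout = 1
eps = 1e-6
numeric = (1 / (1 + np.exp(-(x + eps))) - 1 / (1 + np.exp(-(x - eps)))) / (2 * eps)

print(np.allclose(analytic, numeric))      # True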

 

   - ReLU function

https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/

  • Formula
    \( y=\left \{ \begin{matrix} x & (x \geq 0) \\ 0 & (x<0) \end{matrix} \right. \)
class ReLU():
    def __init__(self):
        self.mask = None
    
    def forward(self, x):
        self.mask = (x < 0)
        out = x.copy()
        # mask the negative inputs (set them all to 0)
        out[self.mask] = 0
        return out
    
    def backward(self, dout):
        dout[self.mask] = 0
        dx = dout
        return dx
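A small usage example (added for illustration, assuming the ReLU class above and NumPy as np) shows how the stored mask zeroes the gradient wherever the input was negative:

import numpy as np

x = np.array([-1.0, 2.0, -3.0, 4.0])
relu = ReLU()

print(relu.forward(x))                   # [0. 2. 0. 4.]
print(relu.backward(np.ones_like(x)))    # [0. 1. 0. 1.]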

 

4. Backpropagation for Matrix Operations

$$ Y=X \bullet W+B $$

  - Forward pass

  • The shapes must be aligned
  • This combines the multiplication and addition layers
# When the shapes match
import numpy as np

X = np.random.rand(3)
W = np.random.rand(3, 2)
B = np.random.rand(2)

print(X.shape)
print(W.shape)
print(B.shape)

# Output
(3,)
(3, 2)
(2,)

Y = np.dot(X, W) + B
print(Y.shape)

# Output
(2,)
# When the shapes do not match
import numpy as np

X = np.random.rand(3)
W = np.random.rand(2, 2)
B = np.random.rand(2)

Y = np.dot(X, W) + B
print(Y.shape)

# Output
ValueError: shapes (3,) and (2,2) not aligned: 3 (dim 0) != 2 (dim 0)

 

  - Backward pass (1)

$$ Y=X \bullet W $$

  • \( X \): (2, )
  • \( W \): (2, 3)
  • \( X \bullet W \): (3, )
  • \( \frac{\partial L}{\partial Y} \): (3, )
  • \( \frac{\partial L}{\partial X}=\frac{\partial L}{\partial Y} \bullet W^{T}, (2, ) \)
  • \( \frac{\partial L}{\partial W}=X^{T} \bullet \frac{\partial L}{\partial Y}, (2, 3) \)
# Forward pass
X = np.random.rand(2)
W = np.random.rand(2, 3)
Y = np.dot(X, W)

print("X\n{}".format(X))
print("W\n{}".format(W))
print("Y\n{}".format(Y))

# Output
X
[0.82112094 0.52401537]
W
[[0.98913291 0.3114957  0.74020997]
 [0.0272213  0.29891712 0.30511339]]
Y
[0.82646212 0.41241281 0.76768601]
# Backward pass
dL_dY = np.random.randn(3)
dL_dX = np.dot(dL_dY, W.T)
dL_dW = np.dot(X.reshape(-1, 1), dL_dY.reshape(1, -1))

print("dL_dY\n{}".format(dL_dY))
print("dL_dX\n{}".format(dL_dX))
print("dL_dW\n{}".format(dL_dW))

# Output
dL_dY
[ 2.14017912 -1.88100173 -0.33160328]
dL_dX
[ 1.28554159 -0.60518177]
dL_dW
[[ 1.75734588 -1.5445299  -0.2722864 ]
 [ 1.12148676 -0.98567383 -0.17376522]]

 

  - Backward pass (2)

$$ Y=X \bullet W+B $$

  • X and W are the same as above
  • B: (3, )
  • \( \frac{\partial L}{\partial B}=\frac{\partial L}{\partial Y}, (3, ) \)
# Forward pass
X = np.random.randn(2)
W = np.random.randn(2, 3)
B = np.random.randn(3)
Y = np.dot(X, W) + B
print(Y)

# Output
[1.32055282 0.71833361 1.73777915]
# Backward pass
dL_dY = np.random.randn(3)
dL_dX = np.dot(dL_dY, W.T)
dL_dW = np.dot(X.reshape(-1, 1), dL_dY.reshape(1, -1))
dL_dB = dL_dY

print("dL_dY\n{}".format(dL_dY))
print("dL_dX\n{}".format(dL_dX))
print("dL_dW\n{}".format(dL_dW))
print("dL_dB\n{}".format(dL_dB))

# Output
dL_dY
[-0.00997423  0.34937897  1.55598133]
dL_dX
[ 0.9368195  -0.10629718]
dL_dW
[[ 0.00182182 -0.06381513 -0.28420473]
 [ 0.00772224 -0.2704957  -1.20466967]]
dL_dB
[-0.00997423  0.34937897  1.55598133]

 

   - Batch matrix product layer

  • For N data samples,

$$ Y=X \bullet W+B $$

    • \( X \): (N, 3)
    • \( W \): (3, 2)
    • \( B \): (2, )
X = np.random.rand(4, 3)
W = np.random.rand(3, 2)
B = np.random.rand(2)

print(X.shape)
print(W.shape)
print(B.shape)

# Output
(4, 3)
(3, 2)
(2,)

print("X\n{}".format(X))
print("W\n{}".format(W))
print("B\n{}".format(B))

# Output
X
[[0.5345643  0.82120127 0.38182761]
 [0.07479261 0.99042377 0.50473867]
 [0.47142528 0.72266964 0.44472929]
 [0.16390528 0.94442809 0.78815273]]
W
[[0.90326978 0.75695534]
 [0.24771738 0.05041714]
 [0.5838499  0.60451043]]
B
[0.90558133 0.19752999]
# Forward pass
Y = np.dot(X, W) + B
print("Y\n{}".format(Y))
print("Y.shape:", Y.shape)

# Output
Y
[[1.81479294 0.87439269]
 [1.51317603 0.60919879]
 [1.77007852 0.85965631]
 [1.74774615 0.84566088]]
Y.shape: (4, 2)
# Backward pass
dL_dY = np.random.randn(4, 2)
dL_dX = np.dot(dL_dY, W.T)
dL_dW = np.dot(X.T, dL_dY)
dL_dB = np.sum(dL_dY, axis = 0)

print("dL_dY\n{}".format(dL_dY))
print("dL_dX\n{}".format(dL_dX))
print("dL_dW\n{}".format(dL_dW))
print("dL_dB\n{}".format(dL_dB))

# Output
dL_dY
[[-0.70142117  1.06162232]
 [-0.114932    0.16975345]
 [-1.51024593  0.19728549]
 [ 1.93432977  0.58605845]]
dL_dX
[[ 0.17002814 -0.12023025  0.23223709]
 [ 0.02468118 -0.01991217  0.0355147 ]
 [-1.2148232  -0.36416759 -0.76249579]
 [ 2.1908417   0.50871449  1.48363669]]
dL_dW
[[-0.77847203  0.76926514]
 [ 0.04558715  1.73599575]
 [ 0.52706409  1.04068005]]
dL_dB
[-0.39226933  2.01471971]

 

  • Creating an example Layer
class Layer():
    def __init__(self):
        self.W = np.random.randn(3, 2)
        self.b = np.random.randn(2)
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        return out
    
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis = 0)
        return dx

np.random.seed(111)
layer = Layer()

# Forward pass
X = np.random.rand(2, 3)
Y = layer.forward(X)

print(X)

# Output
[[0.23868214 0.33765619 0.99071246]
 [0.23772645 0.08119266 0.66960024]]
 
 
# Backward pass
dout = np.random.rand(2, 2)
dout_dx = layer.backward(dout)

print(dout_dx)

# Output
[[-0.57717814  0.8894841  -1.01146255]
 [-0.5434705   0.86783399 -1.09728643]]

 

5. MNIST Classification with Backpropagation

  • Module Import
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from collections import OrderedDict
  • Load the data
np.random.seed(42)

mnist = tf.keras.datasets.mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

num_classes = 10
  • Data preprocessing
X_train, X_test = X_train.reshape(-1, 28 * 28).astype(np.float32), X_test.reshape(-1, 28 * 28).astype(np.float32)

# Normalize pixel values to the [0, 1] range
X_train /= 255.
X_test /= 255.

# Convert labels to one-hot vectors
y_train = np.eye(num_classes)[y_train]

# Check the shapes
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

# Output
(60000, 784)
(60000, 10)
(10000, 784)
(10000,)
  • Hyperparameters
epochs = 1000
learning_rate = 1e-3
batch_size = 100
train_size = X_train.shape[0]
  • Util Functions
def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis = 0)
        y = np.exp(x) / np.sum(np.exp(x), axis = 0)
        return y.T
    
    # Prevent overflow by subtracting the maximum
    x = x - np.max(x)
    return np.exp(x) / np.sum(np.exp(x))

def mean_squared_error(pred_y, true_y):
    return 0.5 * np.sum((pred_y - true_y)**2)

def cross_entropy_error(pred_y, true_y):
    if pred_y.ndim == 1:
        true_y = true_y.reshape(1, true_y.size)
        pred_y = pred_y.reshape(1, pred_y.size)

    # If the labels are one-hot vectors, convert them to class indices
    if true_y.size == pred_y.size:
        true_y = true_y.argmax(axis = 1)
    
    batch_size = pred_y.shape[0]
    return -np.sum(np.log(pred_y[np.arange(batch_size), true_y] + 1e-7)) / batch_size

def softmax_loss(X, true_y):
    # Compare the softmax output with the true one-hot labels and return the cross-entropy error
    pred_y = softmax(X)
    return cross_entropy_error(pred_y, true_y)
  • Util Classes

  - ReLU

class ReLU():
    def __init__(self):
        self.mask = None
    
    def forward(self, x):
        self.mask = (x < 0)
        out = x.copy()
        out[self.mask] = 0
        return out
    
    def backward(self, dout):
        dout[self.mask] = 0
        dx = dout
        return dx

  - Sigmoid

class Sigmoid():
    def __init__(self):
        self.out = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        return out
    
    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out
        return dx

  - Layer

class Layer():
    def __init__(self, W, b):
        self.W = W
        self.b = b

        self.x = None
        self.origin_x_shape = None

        self.dL_dW = None
        self.dL_db = None
    
    def forward(self, x):
        self.origin_x_shape = x.shape

        x = x.reshape(x.shape[0], -1)
        self.x = x
        out = np.dot(self.x, self.W) + self.b

        return out
    
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dL_dW = np.dot(self.x.T, dout)
        self.dL_db = np.sum(dout, axis = 0)
        dx = dx.reshape(*self.origin_x_shape)
        return dx

  - SoftMax

class Softmax():
    def __init__(self):
        self.loss = None
        self.y = None
        self.x = None
    
    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        self.loss = cross_entropy_error(self.y, self.t)

        return self.loss
    
    def backward(self, dout = 1):
        batch_size = self.t.shape[0]

        # When the labels are one-hot encoded
        if self.t.size == self.y.size:
            dx = (self.y - self.t) / batch_size
        else:
            dx = self.y.copy()
            dx[np.arange(batch_size), self.t] -= 1
            dx = dx / batch_size
        
        return dx
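For reference (a standard result added here, not stated in the original code), the expression \( (y - t)/\text{batch size} \) in backward comes from differentiating the combined softmax + cross-entropy loss: for a one-hot target \( t \),

$$ \frac{\partial L}{\partial x_{k}} = y_{k} - t_{k} $$

so backward simply subtracts the labels from the softmax output and divides by the batch size to average the gradient over the mini-batch; the else branch does the same when the labels are given as class indices rather than one-hot vectors.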

 

  • Model creation and training
class MyModel():
    def __init__(self, input_size, hidden_size_list, output_size, activation = 'relu'):
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size_list = hidden_size_list
        self.hidden_layer_num = len(hidden_size_list)
        self.params = {}

        self.__init_weights(activation)

        activation_layer = {'sigmoid': Sigmoid, 'relu': ReLU}
        self.layers = OrderedDict()
        for idx in range(1, self.hidden_layer_num + 1):
            self.layers['Layer' + str(idx)] = Layer(self.params['W' + str(idx)], self.params['b' + str(idx)])
            self.layers['Activation_function' + str(idx)] = activation_layer[activation]()
        
        idx = self.hidden_layer_num + 1

        self.layers['Layer' + str(idx)] = Layer(self.params['W' + str(idx)], self.params['b' + str(idx)])

        self.last_layer = Softmax()

    
    def __init_weights(self, activation):
        weight_std = None
        # Sizes of all layers: input, hidden layers, output
        all_size_list = [self.input_size] + self.hidden_size_list + [self.output_size]
        for idx in range(1, len(all_size_list)):
            if activation.lower() == 'relu':
                weight_std = np.sqrt(2.0 / self.input_size)
            elif activation.lower() =='sigmoid':
                weight_std = np.sqrt(1.0 / self.input_size)
            
            self.params['W' + str(idx)] = weight_std * np.random.randn(all_size_list[idx-1], all_size_list[idx])
            self.params['b' + str(idx)] = np.random.randn(all_size_list[idx])

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x
    
    def loss(self, x, true_y):
        pred_y = self.predict(x)

        return self.last_layer.forward(pred_y, true_y)
    
    def accuracy(self, x, true_y):
        pred_y = self.predict(x)
        pred_y = np.argmax(pred_y, axis = 1)

        if true_y.ndim != 1:
            true_y = np.argmax(true_y, axis = 1)
        
        accuracy = np.sum(pred_y == true_y) / float(x.shape[0])
        return accuracy
    
    def gradient(self, x, t):
        self.loss(x, t)

        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)
        
        grads = {}
        for idx in range(1, self.hidden_layer_num + 2):
            grads['W' + str(idx)] = self.layers['Layer' + str(idx)].dL_dW
            grads['b' + str(idx)] = self.layers['Layer' + str(idx)].dL_db
        return grads

model = MyModel(28*28, [100, 64, 32], 10, activation = 'relu')
# Lists to store the loss and accuracy values
train_lost_list = []
train_acc_list = []
test_acc_list = []
for epoch in range(epochs):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = X_train[batch_mask]
    y_batch = y_train[batch_mask]

    grad = model.gradient(x_batch, y_batch)

    for key in model.params.keys():
        model.params[key] -= learning_rate * grad[key]
    
    loss = model.loss(x_batch, y_batch)
    train_lost_list.append(loss)

    if epoch % 50 == 0:
        train_acc = model.accuracy(X_train, y_train)
        test_acc = model.accuracy(X_test, y_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print("Epoch: {}, Train Accuracy: {:.4f}, Test Accuracy: {:.4f}".format(epoch, train_acc, test_acc))

# Accuracy visualization
plt.plot(np.arange(1000//50), train_acc_list, 'r--', label = 'train_acc')
plt.plot(np.arange(1000//50), test_acc_list, 'b', label = 'test_acc')

plt.title('Result')
plt.legend(loc = 5)
plt.grid()
plt.show()

 

# Loss visualization
plt.plot(np.arange(1000), train_lost_list, 'green', label = 'train_loss')
plt.title('train loss')
plt.xlabel('Epochs')
plt.legend(loc = 5)
plt.grid()
plt.show()
