Posts in the 'Python/Deep Learning' category (page 4)

1. Convex Function

Whatever the starting point, gradient descent can reach the optimum (the point where the loss function is minimized)
1-D Convex Function

https://www.researchgate.net/figure/A-strictly-convex-function_fig5_313821095

2-D Convex Function

https://www.researchgate.net/figure/Sphere-function-D-2_fig8_275069197

2. Non-Convex Function

On a non-convex function, gradient descent can end up at different optima depending on the starting point
1-D Non-Convex Function

https://www.slideserve.com/betha/local-and-global-optima

2-D Non-Convex Function

https://commons.wikimedia.org/wiki/File:Non-Convex_Objective_Function.gif

3. Derivatives and the Gradient

The gradient is the derivative of a scalar-valued function with respect to a vector
$\frac{df(x)}{dx}=\lim_{\Delta x \to 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}$

https://ko.wikipedia.org/wiki/%EA%B8%B0%EC%9A%B8%EA%B8%B0_(%EB%B2%A1%ED%84%B0)

$\nabla f(x)=\left ( \frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}}, \cdots, \frac{\partial f}{\partial x_{N}} \right )$
Where the function is changing, the derivative is nonzero; where it is flat, the derivative is 0
The larger the magnitude of the derivative, the faster the function changes
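
The limit definition above can be checked numerically. The sketch below approximates each partial derivative with a central difference; `numerical_gradient`, `sphere`, and the step size `h` are illustrative names and values chosen here, not part of the original post.

```python
def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # Central difference for the i-th partial derivative
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def sphere(x):
    # f(x) = x_1^2 + x_2^2, whose exact gradient is (2*x_1, 2*x_2)
    return x[0] ** 2 + x[1] ** 2


approx_grad = numerical_gradient(sphere, [3.0, -4.0])
print(approx_grad)  # close to [6.0, -8.0]
```

A smaller `h` is not always better: very small steps amplify floating-point rounding error.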

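4. The Gradient Descent Procedure

At every step, gradient descent uses the derivative at the current point to decide which direction to move
This is repeated until the value of $f(x)$ stops changing

$$ x_{n}=x_{n-1}-\eta \frac{\partial f}{\partial x} $$

$\eta$: learning rate
In other words, it is a method for finding a point where the derivative is 0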
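
As a small worked example of the update rule, the sketch below applies three updates to $f(x)=x^{2}$ with $\eta=0.1$ starting from $x_{0}=5$. Since $f'(x)=2x$, each step multiplies $x$ by $(1-2\eta)=0.8$. The helper name `gd_step` and the chosen values are assumptions made for illustration.

```python
def gd_step(x, grad_fn, eta):
    # One update: x_n = x_{n-1} - eta * f'(x_{n-1})
    return x - eta * grad_fn(x)


x = 5.0
eta = 0.1
trajectory = [x]
for _ in range(3):
    x = gd_step(x, lambda v: 2 * v, eta)
    trajectory.append(x)

print(trajectory)  # approximately [5.0, 4.0, 3.2, 2.56]
```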

https://www.kdnuggets.com/2018/06/intuitive-introduction-gradient-descent.html

- 2-D gradient descent

https://gfycat.com/ko/angryinconsequentialdiplodocus

5. Implementing Gradient Descent

$f_{1}(x)=x^{2}$

# Loss function
def f1(x):
    return x**2

# Derivative of the loss function
def df_dx1(x):
    return 2*x

# Gradient descent
def gradient_descent(f, df_dx, init_x, learning_rate = 0.01, step_num = 100):
    x = init_x
    x_log, y_log = [x], [f(x)]
    for i in range(step_num):
        grad = df_dx(x)
        # Move x by the learning rate times the gradient, stepping toward the optimum
        x -= learning_rate * grad

        x_log.append(x)
        y_log.append(f(x))

    return x_log, y_log

# Visualization
import matplotlib.pyplot as plt
import numpy as np

x_init = 5
x_log, y_log = gradient_descent(f1, df_dx1, init_x = x_init)
plt.scatter(x_log, y_log, color = 'red')

x = np.arange(-5, 5, 0.01)
plt.plot(x, f1(x))
plt.grid()
plt.show()

- Gradient descent on a non-convex function

# Loss function
def f2(x):
    return 0.01*x**4 - 0.3*x**3 - 1.0*x + 10.0

# Derivative of the loss function
def df_dx2(x):
    return 0.04*x**3 - 0.9*x**2 - 1.0

# Visualization
x_init = 2
x_log, y_log = gradient_descent(f2, df_dx2, init_x = x_init)
plt.scatter(x_log, y_log, color = 'red')

x = np.arange(-5, 30, 0.01)
plt.plot(x, f2(x))
plt.xlim(-5, 30)
plt.grid()
plt.show()

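6. Global Optimum vs Local Optimum

Depending on the initial value, gradient descent may reach the minimum of the whole function (global) or only the minimum of a local region (local)

https://www.kdnuggets.com/2017/06/deep-learning-local-minimum.html

Graph of $f_{3}(x)=x\sin(x^{2})+1$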

# Loss function
def f3(x):
    return x*np.sin(x**2) + 1

# Derivative of the loss function
def df_dx3(x):
    return np.sin(x**2) + x*np.cos(x**2)*2*x

# Visualization
x_init1 = -0.5
x_log1, y_log1 = gradient_descent(f3, df_dx3, init_x = x_init1)
plt.scatter(x_log1, y_log1, color = 'red')

x_init2 = 1.5
x_log2, y_log2 = gradient_descent(f3, df_dx3, init_x = x_init2)
plt.scatter(x_log2, y_log2, color = 'blue')

x = np.arange(-3, 3, 0.01)
plt.plot(x, f3(x), '--')

plt.scatter(x_init1, f3(x_init1), color = 'red')
plt.text(x_init1 - 1.0, f3(x_init1) + 0.3, "x_init1 ({})".format(x_init1), fontdict = {'size': 13})
plt.scatter(x_init2, f3(x_init2), color = 'blue')
plt.text(x_init2 - 0.7, f3(x_init2) + 0.4, "x_init2 ({})".format(x_init2), fontdict = {'size': 13})

plt.grid()
plt.show()

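In the graph above
- the 'global optimum' (within the plotted range) lies where x is slightly larger than -3
- starting from x = -0.5, gradient descent moves to a 'local optimum' where x is slightly smaller than -1
- starting from x = 1.5, gradient descent moves to a 'local optimum' where x is slightly larger than 2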
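
One common way to reduce the dependence on the starting point is to run gradient descent from several initial values and keep the best result. The sketch below does this for $f_{3}$ in plain Python; the helper name `descend` and the particular list of starting points are assumptions made for illustration.

```python
import math


def f3(x):
    return x * math.sin(x ** 2) + 1


def df3(x):
    return math.sin(x ** 2) + 2 * x ** 2 * math.cos(x ** 2)


def descend(x, lr=0.01, steps=100):
    # Plain gradient descent with a fixed number of steps
    for _ in range(steps):
        x -= lr * df3(x)
    return x


starts = [-2.2, -0.5, 1.5]
# Pair each final loss value with the start that produced it
results = [(f3(descend(s)), s) for s in starts]
best_value, best_start = min(results)
print(best_start, best_value)
```

With these three starting points, the run that begins at -2.2 ends in the lowest of the three minima. Multiple restarts only make a good result more likely; they do not guarantee the global optimum.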

7. Implementing Gradient Descent (2)

A version that stops early once the updates to x become negligible, i.e. once it has effectively reached a minimum

def gradient_descent2(f, df_dx, init_x, learning_rate = 0.01, step_num = 100):
    eps = 1e-5
    count = 0

    old_x = init_x
    min_x = old_x
    min_y = f(min_x)

    x_log, y_log = [min_x], [min_y]
    for i in range(step_num):
        grad = df_dx(old_x)
        new_x = old_x - learning_rate * grad
        new_y = f(new_x)
        
        if min_y > new_y:
            min_x = new_x
            min_y = new_y

        if np.abs(old_x - new_x) < eps:
            break

        # Log the updated point so that x_log and y_log stay paired
        x_log.append(new_x)
        y_log.append(new_y)

        old_x = new_x
        count += 1

    return x_log, y_log, count

Graph of $f_{3}(x)=x\sin(x^{2})+1$

# Visualization
x_init1 = -2.2
x_log1, y_log1, count1 = gradient_descent2(f3, df_dx3, init_x = x_init1)
plt.scatter(x_log1, y_log1, color = 'red')
print("count:", count1)

x_init2 = -0.5
x_log2, y_log2, count2 = gradient_descent2(f3, df_dx3, init_x = x_init2)
plt.scatter(x_log2, y_log2, color = 'blue')
print("count:", count2)

x_init3 = 1.5
x_log3, y_log3, count3 = gradient_descent2(f3, df_dx3, init_x = x_init3)
plt.scatter(x_log3, y_log3, color = 'green')
print("count:", count3)

x = np.arange(-3, 3, 0.01)
plt.plot(x, f3(x), '--')

plt.scatter(x_init1, f3(x_init1), color = 'red')
plt.text(x_init1 + 0.2, f3(x_init1) + 0.2, "x_init1 ({})".format(x_init1), fontdict = {'size': 13})
plt.scatter(x_init2, f3(x_init2), color = 'blue')
plt.text(x_init2 + 0.1, f3(x_init2) - 0.3, "x_init2 ({})".format(x_init2), fontdict = {'size': 13})
plt.scatter(x_init3, f3(x_init3), color = 'green')
plt.text(x_init3 - 1.0, f3(x_init3) + 0.3, "x_init3 ({})".format(x_init3), fontdict = {'size': 13})

plt.grid()
plt.show()

# Output
count: 17
count: 100
count: 28

8. Learning Rate

The learning rate has to be chosen carefully
If it is too large, the updates diverge; if it is too small, training barely progresses
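
For $f_{1}(x)=x^{2}$ the effect of the learning rate can be worked out exactly: the update is $x_{n}=(1-2\eta)x_{n-1}$, so the iterates shrink only when $|1-2\eta|<1$, that is $0<\eta<1$. The sketch below checks this numerically; the function name `run` and the tested values are illustrative choices.

```python
def run(eta, x0=10.0, steps=20):
    # Gradient descent on f(x) = x^2, where f'(x) = 2x
    x = x0
    for _ in range(steps):
        x -= eta * 2 * x
    return x


for eta in [0.01, 0.1, 0.9, 1.05]:
    # |x| after 20 steps: slow for 0.01, fast for 0.1,
    # oscillating but shrinking for 0.9, growing for 1.05
    print(eta, abs(run(eta)))
```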

https://mc.ai/an-introduction-to-gradient-descent-algorithm/

# A very large learning rate (learning_rate = 1.05)
x_init = 10
x_log, y_log, _ = gradient_descent2(f1, df_dx1, init_x = x_init, learning_rate = 1.05)
plt.plot(x_log, y_log, color = 'red')

plt.scatter(x_init, f1(x_init), color = 'green')
plt.text(x_init - 2.2, f1(x_init) - 2, "x_init ({})".format(x_init), fontdict = {'size': 10})
x = np.arange(-50, 30, 0.01)
plt.plot(x, f1(x), '--')
plt.grid()
plt.show()

- Gradient descent for different learning rates

lr_list = [0.001, 0.01, 0.1, 1.01]

init_x = 30.0
x = np.arange(-30, 50, 0.01)
fig = plt.figure(figsize = (12,10))

for i, lr in enumerate(lr_list):
    # Start from init_x (30.0), the same point that is marked in green below
    x_log, y_log, count = gradient_descent2(f1, df_dx1, init_x = init_x, learning_rate = lr)
    ax = fig.add_subplot(2, 2, i+1)
    ax.scatter(init_x, f1(init_x), color = 'green')
    ax.plot(x_log, y_log, color = 'red', linewidth = 4)
    ax.plot(x, f1(x), '--')
    ax.grid()
    ax.title.set_text('learning_rate = {}'.format(str(lr)))
    print("learning rate = {}, count = {}".format(str(lr), str(count)))

plt.show()

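9. Saddle Point

The gradient is 0, but the point is not an extremum
Gradient descent can get stuck at a saddle point, because the update vanishes where the gradient is 0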
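
A two-dimensional example makes the idea concrete. The sketch below uses $f(x, y)=x^{2}-y^{2}$, which has a saddle point at the origin; this function and the helper name `descend_saddle` are assumptions introduced for illustration and do not appear in the original post.

```python
def descend_saddle(x, y, lr=0.1, steps=50):
    # f(x, y) = x^2 - y^2 has gradient (2x, -2y)
    for _ in range(steps):
        gx, gy = 2 * x, -2 * y
        x, y = x - lr * gx, y - lr * gy
    return x, y


# Starting exactly on the x-axis: the y-gradient is 0, so the iterates
# slide into the saddle point (0, 0) and stay there
on_axis = descend_saddle(1.0, 0.0)

# A tiny offset in y is amplified at every step and the iterate escapes
off_axis = descend_saddle(1.0, 1e-3)

print(on_axis)
print(off_axis)
```

So plain gradient descent stalls only when it lands on (or extremely close to) the direction that leads into the saddle; in practice, small perturbations or noise often help it move away.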

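Checking with the graph of $f_{2}(x)=0.01x^{4}-0.3x^{3}-1.0x+10.0$
First starting point
- count is 100, i.e. the loop ran for the full step_num iterations, yet the loss stalls around 10 and barely changes
- Near x = 0 the curve is almost flat (the derivative is about -1, so with a learning rate of 0.01 each step moves x by only about 0.01)
- This has to be addressed by tuning the learning rate or choosing a different initial value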

x_init1 = -10.0
x_log1, y_log1, count1 = gradient_descent2(f2, df_dx2, init_x = x_init1)
plt.scatter(x_log1, y_log1, color = 'red')
print("count:", count1)

x_init2 = 5.0
x_log2, y_log2, count2 = gradient_descent2(f2, df_dx2, init_x = x_init2)
plt.scatter(x_log2, y_log2, color = 'blue')
print("count:", count2)

x_init3 = 33.0
x_log3, y_log3, count3 = gradient_descent2(f2, df_dx2, init_x = x_init3)
plt.scatter(x_log3, y_log3, color = 'green')
print("count:", count3)

x = np.arange(-15, 35, 0.01)
plt.plot(x, f2(x), '--')

plt.scatter(x_init1, f2(x_init1), color = 'red')
plt.text(x_init1 + 2, f2(x_init1), "x_init1 ({})".format(x_init1), fontdict = {'size': 13})
plt.scatter(x_init2, f2(x_init2), color = 'blue')
plt.text(x_init2 + 2, f2(x_init2) + 53, "x_init2 ({})".format(x_init2), fontdict = {'size': 13})
plt.scatter(x_init3, f2(x_init3), color = 'green')
plt.text(x_init3 - 18, f2(x_init3), "x_init3 ({})".format(x_init3), fontdict = {'size': 13})

plt.grid()
plt.show()

# Output
count: 100
count: 82
count: 50

Checking with the graph of $f_{3}(x)=x\sin(x^{2})+1$

x_init1 = -2.2
x_log1, y_log1, count1 = gradient_descent2(f3, df_dx3, init_x = x_init1)
plt.scatter(x_log1, y_log1, color = 'red')
print("count:", count1)

x_init2 = 1.2
x_log2, y_log2, count2 = gradient_descent2(f3, df_dx3, init_x = x_init2)
plt.scatter(x_log2, y_log2, color = 'blue')
print("count:", count2)

x = np.arange(-3, 3, 0.01)
plt.plot(x, f3(x), '--')

plt.scatter(x_init1, f3(x_init1), color = 'red')
plt.text(x_init1 + 0.2, f3(x_init1) + 0.2, "x_init1 ({})".format(x_init1), fontdict = {'size': 13})
plt.scatter(x_init2, f3(x_init2), color = 'blue')
plt.text(x_init2 - 1.0, f3(x_init2) + 0.3, "x_init2 ({})".format(x_init2), fontdict = {'size': 13})

plt.grid()
plt.show()

# Output
count: 17
count: 100


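1. Supervised Learning vs Unsupervised Learning

Supervised Learning
- A correct answer (label, ground truth) exists for each input
- The model learns the [input-answer] relationship so that it can predict the answer for new inputs
Unsupervised Learning
- There are no answers
- Useful information is extracted from the data by some algorithm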

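2. Trainable Parameters

Trainable parameter: a parameter whose value changes during training; the learned model changes with these parameters
Model: a formula that maps an input to an output
Taking linear regression as an example,
$$ Y = aX + b $$
- $X$: input
- $Y$: output
- $a, b$: trainable parameters
Training starts from an initialized model and repeatedly adjusts the trainable parameters until the model fits the training data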
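
The idea of adjusting $a$ and $b$ step by step can be sketched with gradient descent on the mean squared error. The toy data (generated from $Y=2X+1$), the learning rate, and the iteration count below are assumptions chosen for illustration.

```python
# Toy data generated from Y = 2X + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Trainable parameters, initialized to 0
a, b = 0.0, 0.0
lr = 0.05
n = len(xs)

for _ in range(2000):
    # Gradients of the mean squared error with respect to a and b
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)  # close to 2 and 1
```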

https://learningstatisticswithr.com/book/regression.html

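3. Hyperparameters

Parameters that a person has to set by hand
They are fixed before training and treated as constants
- Loss function (Cost Function)
- Learning rate
- Number of training iterations (Epochs)
- Mini-batch size (Batch Size)
- Number of nodes in the hidden layers (Units)
- Noise
- Regularization
- Weight initialization
In contrast, the weights, which are the network's own parameters, are updated automatically by the learning algorithm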

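4. Loss Function (Cost Function)

A measure of how well training is going as it progresses
The trainable parameters are adjusted according to the value of the loss function
In optimization theory, it is the function to be minimized
A differentiable function is used

- The mathematical meaning of training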

https://www.internalpointers.com/post/cost-function-logistic-regression

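$$ \widetilde{\theta}=\arg\min_{\theta}L(x, y; \theta) $$

$L$: loss function
$x$: input values of the training data
$y$: output values (answers) of the training data
$\theta$: the vector collecting all trainable parameters
$\widetilde{\theta}$: the estimated optimal parameters
All parameters used in training can be written collectively as $\theta$; training means finding the optimal value of $\theta$
The prediction computed from the training input ($x$) and $\theta$ is compared with the answer ($y$), and $\theta$ is adjusted accordingly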

https://medium.com/@dhartidhami/machine-learning-basics-model-cost-function-and-gradient-descent-79b69ff28091

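In other words, training is the search for the $\theta$ at which the loss function reaches its lowest point (minimum)
A loss function is indispensable for supervised learning algorithms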

5. One-Hot Encoding

Used to represent categorical variables
Also called a dummy variable
Every position except the correct label is set to 0

https://medium.com/@michaeldelsole/what-is-one-hot-encoding-and-how-to-do-it-f0ae272f1179

import numpy as np

def convert_one_hot(labels, num_classes):
    one_hot_result = np.zeros((len(labels), num_classes))
    for idx, label in enumerate(labels):
        one_hot_result[idx][label] = 1

    return one_hot_result

x_label = [1, 3, 3, 4, 2, 0, 5, 3, 0]
print(convert_one_hot(x_label, max(x_label) + 1))

# Output
# The leftmost column corresponds to class 0
[[0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 0.]]

# One-hot encoding with Keras
from keras.utils import to_categorical

x_label = [1, 3, 3, 4, 2, 0, 5, 3, 0]
one_hot_label = to_categorical(x_label)
print(one_hot_label)

# Output
[[0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 0.]]

# One-hot encoding with sklearn
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder

def convert_one_hot_sklearn(class_label):
    encoder = LabelEncoder()
    encoder.fit(class_label)
    labels = encoder.transform(class_label)

    labels = labels.reshape(-1, 1)

    oh_encoder = OneHotEncoder()
    oh_encoder.fit(labels)
    oh_labels = oh_encoder.transform(labels)

    return oh_labels.toarray()

# Create example data
marvel_labels = ['아이언맨', '캡틴 아메리카', '헐크', '블랙 위도우', '스파이더맨', '앤트맨']
ohe = convert_one_hot_sklearn(marvel_labels)
print(ohe)
print("One hot encoder datatype:", type(ohe))
print("One hot encoder shape:", ohe.shape)
print("-----------------------------")

classes = [3, 2, 1, 3, 0, 4, 5, 3, 0]
ohe = convert_one_hot_sklearn(classes)
print(ohe)
print("One hot encoder datatype:", type(ohe))
print("One hot encoder shape:", ohe.shape)

# Output
[[0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]]
One hot encoder datatype: <class 'numpy.ndarray'>
One hot encoder shape: (6, 6)
-----------------------------
[[0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 0.]]
One hot encoder datatype: <class 'numpy.ndarray'>
One hot encoder shape: (9, 6)

# One-hot encoding with pandas
import pandas as pd

df = pd.DataFrame({'labels': ['아이언맨', '캡틴 아메리카', '헐크', '블랙 위도우', '스파이더맨', '앤트맨']})
ohe_df = pd.get_dummies(df['labels'])
ohe_df

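6. Mean Absolute Error (MAE)

The loss grows at a constant rate as the error grows
Robust to outliers
- When some [input-answer] pairs in the data are wrong, even a good estimate produces a large error on those pairs
- With MAE, such outliers have only a limited influence on where the minimum of the loss function lies
Related to the median
Often used for regression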

https://heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0

$$ E=\frac{1} {n} \sum_{i=1}^{n} \left |y_{i}-\widetilde{y_{i}} \right | $$

$y_{i}$: the $i$-th answer in the training data
$\widetilde{y_{i}}$: the $i$-th output estimated from the training input

def MAE(y, pred_y):
    return np.mean(np.abs((y - pred_y)))
    
import matplotlib.pyplot as plt

y = np.array([-3, -1, -2, 1, 3, -2, 2, 5, 3, 3, -2, -1, 2])
yhat = np.array([-3, -1, -5, 0, 3, -1, 2, 4, 3, 3, -2, -1, -1])
x = list(range(len(y)))

plt.scatter(x, y, color = 'b', label = 'True')
plt.plot(x, yhat, color = 'r', label = 'Pred')
plt.legend()
plt.grid()
plt.show()

# Compute the MAE for the y and predicted y shown in the graph above
print(MAE(y, yhat))

# Output
0.6923076923076923

7. Mean Squared Error (MSE)

One of the most widely used loss functions
The loss grows faster as the error grows
- The larger the gap between the answer and the prediction, the heavier the penalty
Used for regression

$$ E=\frac{1} {n} \sum_{i=1}^{n} (y_{i}-\widetilde{y_{i}})^{2} $$

$y_{i}$: the $i$-th answer in the training data
$\widetilde{y_{i}}$: the $i$-th output estimated from the training input

def MSE(y, pred_y):
    # Mean of the squared errors, matching the formula above
    return np.mean(np.square(y - pred_y))

print(MSE(y, yhat))

# Output
1.6153846153846154

- Comparing MAE and MSE as loss functions
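
One way to see the difference is to ask which single constant prediction minimizes each loss on data that contains an outlier. The sketch below searches a grid of candidates: MAE is minimized at the median, MSE at the mean. The data values and helper names are assumptions chosen for illustration.

```python
import statistics

# Four ordinary values and one outlier
values = [1.0, 2.0, 3.0, 4.0, 100.0]


def mae(c):
    return sum(abs(v - c) for v in values) / len(values)


def mse(c):
    return sum((v - c) ** 2 for v in values) / len(values)


# Candidate constant predictions 0.0, 0.1, ..., 100.0
candidates = [i / 10 for i in range(0, 1001)]
best_mae = min(candidates, key=mae)
best_mse = min(candidates, key=mse)

print(best_mae, statistics.median(values))  # both 3.0
print(best_mse, statistics.mean(values))    # both 22.0
```

The outlier pulls the MSE-optimal prediction far away from the bulk of the data, while the MAE-optimal prediction stays at 3.0.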

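8. Cross Entropy Error (CEE)

Used for binary and multi-class classification
Measures the distance between the softmax output and the one-hot encoded answer
The error is computed only for the correct class
- If the predicted probability of the correct class is 1 the error is 0; the further it falls toward 0, the more the error grows without bound

$$ E=-\frac{1}{N} \sum_{n} \sum_{i} y_{i} \log\widetilde{y_{i}} $$

$y_{i}$: the $i$-th answer in the training data
$\widetilde{y_{i}}$: the $i$-th output estimated from the training input
$N$: total number of data points
$i$: class index, running over the classes of each data point
For $y=-\log(x)$,
- the closer $x$ is to 0, the more $y$ grows without bound
- the closer $x$ is to 1, the closer $y$ is to 0
The answer label ($y_{i}$) is one-hot encoded: 1 at the correct index and 0 everywhere else
So, for a single data point whose correct class is $i$, the formula above reduces to

$$ E=-\log\widetilde{y_{i}} $$

If the softmax output of the network for the correct class is 0.6, then $-\log 0.6 \approx 0.51$; if it is 0.3, then $-\log 0.3 \approx 1.2$
The closer the prediction is to the answer, the smaller the error
During training, one-hot encoding leaves only the correct index in the comparison, but the other indices still affect training
This is because, in multi-class classification, the softmax function involves every term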
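
The last point can be checked numerically. For softmax followed by cross entropy, the derivative of the loss with respect to each input (logit) equals the softmax output minus the one-hot label, so the inputs of the incorrect classes receive nonzero gradients too. The sketch below compares that expression with a finite-difference estimate; the helper names and the example logits are assumptions made for illustration.

```python
import math


def softmax_list(logits):
    # Subtract the maximum for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    # Loss for one data point whose correct class index is `target`
    return -math.log(softmax_list(logits)[target])


logits = [2.0, 1.0, 0.1]
target = 0

# Analytic gradient: softmax output minus the one-hot label
probs = softmax_list(logits)
analytic = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

# Central-difference estimate of the same gradient
h = 1e-6
numeric = []
for i in range(len(logits)):
    up = list(logits)
    down = list(logits)
    up[i] += h
    down[i] -= h
    numeric.append((cross_entropy(up, target) - cross_entropy(down, target)) / (2 * h))

print(analytic)
print(numeric)
```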

https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a

def CEE(y_pred, y_true):
    delta = 1e-7
    return -np.sum(y_true * np.log(y_pred + delta))

y = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
yhat = np.array([0.01, 0.1, 0.05, 0.0, 0.1, 0.7, 0.0, 0.03, 0.01, 0.0])
print("yhat 합:", np.sum(yhat))
print(CEE(yhat, y))
print("-----------------")

# Swap the positions of 0.03 and 0.7 in yhat to make the error larger
y = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0 ,0])
yhat = np.array([0.01, 0.1, 0.05, 0.0, 0.1, 0.03, 0.0, 0.7, 0.01, 0.0])
print("yhat 합:", np.sum(yhat))
print(CEE(yhat, y))

# Output
yhat 합: 1.0
0.3566748010815999
-----------------
yhat 합: 1.1
3.506554563992204

- Binary Cross Entropy (BCE)

Cross entropy error can also be used as the loss function for binary classification problems

$$ \begin{aligned}
E &= -\sum_{i=1}^{2} y_{i}\log\widetilde{y_{i}} \\
&= -y_{1}\log\widetilde{y}_{1}-(1-y_{1})\log(1-\widetilde{y}_{1}) \quad (\because y_{2}=1-y_{1},\ \widetilde{y}_{2}=1-\widetilde{y}_{1})
\end{aligned} $$

$y_{i}$: the $i$-th answer in the training data
$\widetilde{y_{i}}$: the $i$-th output estimated from the training input
Suppose that in a two-class problem the predicted probability that class 1 is correct is 0.8. If class 1 really is the answer, the formula gives

$$ \begin{aligned}
E &= -\sum_{i=1}^{2} y_{i}\log\widetilde{y_{i}} \\
&= -1\cdot\log 0.8-(1-1)\log(1-0.8) \\
&= -\log 0.8 \\
&\approx 0.22
\end{aligned} $$

Conversely, if the answer is actually class 2, the formula gives

$$ \begin{aligned}
E &= -\sum_{i=1}^{2} y_{i}\log\widetilde{y_{i}} \\
&= -0\cdot\log 0.8-(1-0)\log(1-0.8) \\
&= -\log 0.2 \\
&\approx 1.61
\end{aligned} $$

# Class 2 is the answer
# The predicted probability of class 2 is 0.85: correct
y = np.array([0, 1])
yhat = np.array([0.15, 0.85])
print("yhat 합:", np.sum(yhat))
print(CEE(yhat, y))
print("-----------------")

# Class 1 is the answer
# The predicted probability of class 1 is 0.15: wrong
y = np.array([1, 0])
yhat = np.array([0.15, 0.85])
print("yhat 합:", np.sum(yhat))
print(CEE(yhat, y))

# Output
yhat 합: 1.0
0.1625188118507231
-----------------
yhat 합: 1.0
1.8971193182194368


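1. Perceptron

A type of artificial neural network
Multiple inputs ($x_{1}, x_{2}, x_{3}, \cdots, x_{n}$) are multiplied by weights ($w_{1}, w_{2}, w_{3}, \cdots, w_{n}$) and summed, a bias ($b$) is added, and when the result exceeds a threshold ($\theta$) the perceptron emits the output of its activation function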

https://towardsdatascience.com/rosenblatts-perceptron-the-very-first-neural-network-37a3ec09038a

2. Mathematical Representation of a Neuron

https://cs231n.github.io/convolutional-networks/

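The signal on an incoming axon ($x_{0}$) passes through a synapse, where it is given a weight ($w_{0}$) → $w_{0}x_{0}$ enters the cell body through a dendrite → $w_{1}x_{1}$, $w_{2}x_{2}$, and so on enter the cell body in the same way
The cell body sums these values and finally adds the bias → the result passes through the activation function and is emitted on the output axon
$y=f(\sum _{i}w_{i}x_{i}+b)$
$f$: activation function
- The output changes at the threshold ($\theta$)
$b$: bias
- Shifts the decision boundary away from the origin
- Assumed to be present by default even when it is not written explicitly
$\sum _{i}w_{i}x_{i}$: can be written as the inner product of two vectors
$x_{1}w_{1} + x_{2}w_{2} + \cdots + x_{n}w_{n} = w^{T}x$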

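3. Mathematical Representation of a Fully-Connected Layer

A structure in which all nodes are connected
$W=[w_{0}, w_{1}, \cdots, w_{M-1}]^{T}$
Each $w_{k}$ is an $N \times 1$ vector
$W$ is an $M \times N$ matrix (its rows are $w_{k}^{T}$), so that $Wx$ is defined for an $N \times 1$ input $x$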
$b=[b_{0}, b_{1}, \cdots, b_{M-1}]\\
y_{0}=f(w_{0}^{T}x+b_{0})\\
y_{1}=f(w_{1}^{T}x+b_{1})\\
y_{2}=f(w_{2}^{T}x+b_{2})\\
\cdots\\
y_{M-1}=f(w_{M-1}^{T}x+b_{M-1})\\
\to y=f(Wx+b)$
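
The equivalence between the per-neuron form $y_{k}=f(w_{k}^{T}x+b_{k})$ and the matrix form $y=f(Wx+b)$ can be checked directly. In the sketch below the rows of `W` are the vectors $w_{k}^{T}$, so `W` has shape (M, N); the sizes, the numeric values, and the use of a sigmoid for $f$ are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


N, M = 3, 2                        # N inputs, M output neurons
W = np.array([[0.1, 0.2, 0.3],     # row 0 is w_0^T
              [0.4, 0.5, 0.6]])    # row 1 is w_1^T
b = np.array([0.5, -0.5])
x = np.array([1.0, 2.0, 3.0])

# Matrix form: all neurons at once
y_matrix = sigmoid(W @ x + b)

# Per-neuron form: one inner product per output
y_loop = np.array([sigmoid(W[k] @ x + b[k]) for k in range(M)])

print(W.shape)    # (2, 3)
print(y_matrix)
print(y_loop)
```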

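4. Logic Circuits

Logic Gates
- AND: 1 only if both inputs are 1
- OR: 1 if at least one input is 1
- NOT: outputs 0 if the input is 1, and 1 if the input is 0
- NAND: 0 only if both inputs are 1
- NOR: 1 only if both inputs are 0
Diagrams and truth tables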

http://www.schoolphysics.co.uk/age14-16/Electronics/text/Logic_gates/index.html

- AND gate

A logic circuit that outputs 1 only when both inputs are 1

https://www.tutorialspoint.com/computer_logical_organization/logic_gates.htm

import numpy as np
import matplotlib.pyplot as plt

def AND(a, b):
    input = np.array([a, b])
    weights = np.array([0.4, 0.4])
    bias = -0.6
    value = np.sum(input * weights) + bias

    if (value <= 0):
        return 0
    else:
        return 1
        
print(AND(0, 0))  # 0
print(AND(0, 1))  # 0
print(AND(1, 0))  # 0
print(AND(1, 1))  # 1

x1 = np.arange(-2, 2, 0.01)
x2 = np.arange(-2, 2, 0.01)
bias = -0.6

y = (-0.4 * x1 - bias) / 0.4

plt.axvline(x = 0)
plt.axhline(y = 0)
plt.plot(x1, y, 'r--')
plt.scatter(0, 0, color = 'orange', marker = 'o', s = 150)
plt.scatter(0, 1, color = 'orange', marker = 'o', s = 150)
plt.scatter(1, 0, color = 'orange', marker = 'o', s = 150)
plt.scatter(1, 1, color = 'black', marker = '^', s = 150)
plt.xlim(-0.5, 1.5)
plt.ylim(-0.5, 1.5)
plt.grid()
plt.show()

The region beyond the red line (the decision boundary) outputs 1

- OR gate

A logic circuit that outputs 1 if at least one of the two inputs is 1

def OR(a, b):
    input = np.array([a, b])
    weights = np.array([0.4, 0.5])
    bias = -0.3
    value = np.sum(input * weights) + bias

    if (value <= 0):
        return 0
    else:
        return 1

print(OR(0, 0))  # 0
print(OR(0, 1))  # 1
print(OR(1, 0))  # 1
print(OR(1, 1))  # 1

x1 = np.arange(-2, 2, 0.01)
x2 = np.arange(-2, 2, 0.01)
bias = -0.3

y = (-0.4 * x1 - bias) / 0.5

plt.axvline(x = 0)
plt.axhline(y = 0)
plt.plot(x1, y, 'r--')
plt.scatter(0, 0, color = 'orange', marker = 'o', s = 150)
plt.scatter(0, 1, color = 'black', marker = '^', s = 150)
plt.scatter(1, 0, color = 'black', marker = '^', s = 150)
plt.scatter(1, 1, color = 'black', marker = '^', s = 150)
plt.xlim(-0.5, 1.5)
plt.ylim(-0.5, 1.5)
plt.grid()
plt.show()

The region beyond the red line (the decision boundary) outputs 1

- NAND gate

A logic circuit that outputs 0 only when both inputs are 1

def NAND(a, b):
    input = np.array([a, b])
    weights = np.array([-0.6, -0.5])
    bias = 0.7
    value = np.sum(input * weights) + bias

    if (value <= 0):
        return 0
    else:
        return 1
        
print(NAND(0, 0))  # 1
print(NAND(0, 1))  # 1
print(NAND(1, 0))  # 1
print(NAND(1, 1))  # 0

x1 = np.arange(-2, 2, 0.01)
x2 = np.arange(-2, 2, 0.01)
bias = 0.7

y = (0.6 * x1 - bias) / -0.5

plt.axvline(x = 0)
plt.axhline(y = 0)
plt.plot(x1, y, 'r--')
plt.scatter(0, 0, color = 'black', marker = '^', s = 150)
plt.scatter(0, 1, color = 'black', marker = '^', s = 150)
plt.scatter(1, 0, color = 'black', marker = '^', s = 150)
plt.scatter(1, 1, color = 'orange', marker = 'o', s = 150)
plt.xlim(-0.5, 1.5)
plt.ylim(-0.5, 1.5)
plt.grid()
plt.show()

The region beyond the red line (the decision boundary) outputs 0

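5. XOR Gate

Triggered the first AI winter, the first crisis for deep learning
Linearly separable problems such as AND and NAND can be solved by a single perceptron, but XOR cannot be separated by one straight line
It is solved with a multi-layer perceptron
built by combining AND, NAND, and OR gates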
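
The claim that a single perceptron cannot represent XOR can be illustrated with a brute-force search. The sketch below tries every combination of two weights and a bias on a coarse grid and counts how many reproduce each truth table. The grid range, the step of 0.1, and the helper names are assumptions; a search like this illustrates the claim but is not a proof.

```python
import itertools


def perceptron(a, b, w1, w2, bias):
    # Same rule as the gates above: output 1 when the weighted sum is positive
    return 1 if a * w1 + b * w2 + bias > 0 else 0


AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Candidate values -2.0, -1.9, ..., 2.0 for each of w1, w2, bias
grid = [i / 10 for i in range(-20, 21)]


def count_solutions(table):
    count = 0
    for w1, w2, bias in itertools.product(grid, repeat=3):
        if all(perceptron(a, b, w1, w2, bias) == out
               for (a, b), out in table.items()):
            count += 1
    return count


and_solutions = count_solutions(AND_TABLE)
xor_solutions = count_solutions(XOR_TABLE)
print("AND:", and_solutions)   # many parameter settings work
print("XOR:", xor_solutions)   # none do
```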

6. Multi-Layer Perceptron (MLP)

- Structure of a multi-layer perceptron

Input layer
Hidden layer
- One or more
- A network with about five or more is usually called a Deep Neural Network
Output layer

https://www.researchgate.net/figure/A-schematic-diagram-of-artificial-neural-network-and-architecture-of-the-feed-forward_fig1_26614896

Formulas
- (input layer → hidden layer)
  $z=f_{L}(W_{L}x+b_{L})$
- (hidden layer → output layer)
  $y=a_{K}(W_{K}z+b_{K})$

- XOR gate

Returns 1 when the two inputs differ

def XOR(x1, x2):
    s1 = NAND(x1, x2)
    s2 = OR(x1, x2)
    y = AND(s1, s2)
    return y

print(XOR(0, 0))  # 0
print(XOR(0, 1))  # 1
print(XOR(1, 0))  # 1
print(XOR(1, 1))  # 0

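7. Activation Function

A function that converts the sum of the input signals into an output signal
The output value is determined by the activation function
Used in both single-layer and multi-layer perceptrons
Representative activation functions
- Sigmoid
- ReLU
- tanh
- Identity Function
- Softmax
When passing from one layer to the next, the signal always goes through an activation function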

- Step Function

$y=\begin{cases}
0&(x<0) \\
1& (x\geq 0)
\end{cases}$

https://www.intmath.com/laplace-transformation/1a-unit-step-functions-definition.php

def step_function(x):
    # Returns 1 for x >= 0, matching the definition above
    if x >= 0:
        return 1
    else:
        return 0

def step_function_for_numpy(x):
    y = x >= 0
    return y.astype(int)


print(step_function(-3))   # 0
print(step_function(5))    # 1

# Use this version when the input is a NumPy array
a = np.array([5, 3, -4, 2.0])
print(step_function_for_numpy(a))   # [1 1 0 1]

- Sigmoid Function

Mainly used for binary classification
- Used as the activation function of the final output layer
The output lies between 0 and 1, so it can be interpreted as a probability
$y=\frac{1}{1+e^{-x}}$

https://www.geeksforgeeks.org/implement-sigmoid-function-using-numpy/

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(3))   # 0.9525741268224334, close to 1
print(sigmoid(-3))  # 0.04742587317756678, close to 0

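- Comparing the sigmoid function and the step function

Similarities
- The output lies in the range 0 to 1
- The output depends on the size of the input: an important (large) input yields a large output
Differences
Compared with the step function, the sigmoid function
- changes its output continuously with the input
- has a 'smooth' output
  which means it is differentiable at every point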

plt.grid()
x = np.arange(-5.0, 5.0, 0.01)
y1 = sigmoid(x)
y2 = step_function_for_numpy(x)
plt.plot(x, y1, 'r--', x, y2, 'b--')
plt.show()
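
Differentiability matters because gradient-based training needs the derivative of the activation. For the sigmoid it has the closed form $\sigma'(x)=\sigma(x)(1-\sigma(x))$, which the sketch below compares with a finite-difference estimate. The helper names `sigmoid_scalar` and `sigmoid_grad` are assumptions introduced for illustration.

```python
import math


def sigmoid_scalar(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Closed-form derivative: s * (1 - s)
    s = sigmoid_scalar(x)
    return s * (1.0 - s)


h = 1e-6
checks = []
for x in [-3.0, 0.0, 3.0]:
    numeric = (sigmoid_scalar(x + h) - sigmoid_scalar(x - h)) / (2 * h)
    checks.append((x, sigmoid_grad(x), numeric))
    print(x, sigmoid_grad(x), numeric)
```

The derivative peaks at 0.25 for x = 0 and shrinks toward 0 for large |x|. The step function, by contrast, has derivative 0 everywhere except at the jump, so it gives gradient descent nothing to follow.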

- ReLU(Rectified Linear Unit)

One of the most widely used activation functions

$y=\begin{cases}
0&(x\leq0) \\
x& (x> 0)
\end{cases}$

https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/

def ReLU(x):
    if x > 0:
        return x
    else:
        return 0

print(ReLU(5))   # 5
print(ReLU(-3))  # 0
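
The scalar version above cannot take a NumPy array, because `if x > 0` is ambiguous for an array with several elements. A vectorized sketch using `np.maximum` is shown below; the name `relu_for_numpy` is an assumption that mirrors the naming of `step_function_for_numpy`.

```python
import numpy as np


def relu_for_numpy(x):
    # Element-wise max(0, x); works for scalars and arrays alike
    return np.maximum(0, x)


a = np.array([-3.0, 0.0, 5.0])
relu_out = relu_for_numpy(a)
print(relu_out)
```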

- Hyperbolic Tangent Function (tanh)

$y=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$

https://www.researchgate.net/figure/Hyperbolic-tangent-activation-function_fig1_326279910

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

print(tanh(3))   # 0.9950547536867306
print(tanh(-3))  # -0.9950547536867306

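- Identity Function

Mainly used in regression problems
- Used as the activation function of the output layer
$y=x$
It returns the input unchanged, so it does not strictly need to be defined, but it is used to keep the output layer consistent with the flow of the other layers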

https://math.info/Algebra/Identity_Function/

def identity_function(x):
    return x

print(identity_function(4))   # 4
print(identity_function(-1))  # -1

X = np.array([2, -3, 0.4])
print(identity_function(X))   # [ 2.  -3.   0.4]

- Softmax

Used for multi-class classification
Strongly affected by the input values
A larger input gives a larger output
The outputs can be interpreted as probabilities
The outputs sum to 1
Formula
$y_{k}=\frac{\exp(a_{k})} {\sum_{i=1}^{n}\exp(a_{i})}$

def softmax(a):
    exp_a = np.exp(a)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y

a = np.array([0.3, 0.2, 4.0, -1.2])
print(softmax(a))                 # [0.02348781 0.02125265 0.9500187  0.00524084]
print(np.sum(softmax(a)))    # 1.0

- A caveat with the softmax function

Overflow problem

A = np.array([1000, 900, 1050, 500])
print(softmax(A))

# Output
[nan nan nan  0.]
RuntimeWarning: overflow encountered in exp
  exp_a = np.exp(a)
RuntimeWarning: invalid value encountered in true_divide
  y = exp_a / sum_exp_a

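Because softmax uses the exponential function, np.exp overflows to infinity (inf) when an input is too large
Improved formula (multiplying the numerator and denominator by the same constant $C$ leaves the result unchanged; in practice $C'=\log C$ is set to $-\max(a)$, which simply shifts the inputs)
$y_{k}=\frac {\exp(a_{k})} {\sum_{i=1}^{n}\exp(a_{i})} = \frac {C\exp(a_{k})} {C\sum_{i=1}^{n}\exp(a_{i})}\\
\quad = \frac {\exp(a_{k} + \log C)} {\sum_{i=1}^{n}\exp(a_{i} + \log C)}\\
\quad = \frac {\exp(a_{k}+C')} {\sum_{i=1}^{n}\exp(a_{i}+C')}$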

def softmax(a):
    C = np.max(a)
    return (np.exp(a - C) / np.sum(np.exp(a - C)))

A = np.array([1000, 900, 1050, 500])
print(softmax(A))

# Output
[1.92874985e-022 7.17509597e-066 1.00000000e+000 1.37415257e-239]

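- Why the activation function must be nonlinear

To make deeper networks meaningful
If the activation function were linear, having several hidden layers would be pointless
For example, if $h(x)=cx$ and there are 3 hidden layers, then
$y=h(h(h(x)))\\
\quad=c \times c \times c \times x\\
\quad=c^{3}x$
which is still just a linear function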
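
The same collapse happens with weight matrices: two linear layers applied in sequence are equivalent to one linear layer whose weight is the product of the two. The sketch below checks this with random matrices; the shapes and the random seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first layer: 3 inputs -> 4 units
W2 = rng.normal(size=(2, 4))   # second layer: 4 units -> 2 outputs
x = rng.normal(size=3)

# Two layers with no nonlinear activation in between
two_layers = W2 @ (W1 @ x)

# A single layer with the combined weight matrix
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a nonlinear function such as ReLU between the two layers breaks this equivalence, which is what lets depth add expressive power.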

- Other activation functions

LeakyReLU
$f_{a}(x)=\begin{cases}
x& x \geq 0\\
ax& x<0
\end{cases}$

https://knowhowspot.com/technology/ai-and-machine-learning/artificial-neural-network-activation-function/

def LeakyReLU(x):
    a = 0.01
    return np.maximum(a*x, x)
    
x = np.array([0.5, -1.4, 3, 0, 5])
print(LeakyReLU(x))

# Output
[ 0.5   -0.014  3.     0.     5.   ]

ELU(Exponential Linear Units)
$f(\alpha, x)=\begin{cases}
\alpha(e^{x}-1)& x \leq 0\\
x& x>0
\end{cases}$

https://www.researchgate.net/figure/Exponential-Linear-Unit-activation-function-input-output-mapping-The-activation-function_fig1_331794632

def ELU(x):
    alpha = 1.0
    return (x >= 0) * x + (x < 0) * alpha * (np.exp(x) - 1)

print(ELU(4))       # 4.0
print(ELU(-0.5))   # -0.3934693402873666

x = np.array([-2, 0.1, 4])
print(ELU(x))      # [-0.86466472  0.1         4.        ]

- Notes on choosing an activation function

A commonly suggested order of preference
1. ELU
2. LeakyReLU
3. ReLU
4. tanh
5. sigmoid
Order mentioned in the Stanford lecture
1. ReLU
2. ReLU Family(LeakyReLU, ELU)
3. sigmoid: not recommended

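8. Implementing a 3-Layer Neural Network

2-class classification
Input layer
- Number of neurons: 3
Hidden layers
- Neurons in the first hidden layer: 3
- Neurons in the second hidden layer: 2
Output layer
- Number of neurons: 2

- Defining the activation function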

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

X = np.array([1.0, 0.5, 0.4])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6], [0.3, 0.5, 0.7]])
B1 = np.array([1, 1, 1])

print(X.shape)    # (3,)
print(W1.shape)   # (3, 3)
print(B1.shape)   # (3,)

A1 = np.dot(X, W1) + B1
Z1 = sigmoid(A1)

print(A1)
print(Z1)

# Output
[1.32 1.7  2.08]
[0.78918171 0.84553473 0.88894403]

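# Pass through the second layer
W2 = np.array([[0.2, 0.4, 0.6], [0.1, 0.3, 0.5], [0.4, 0.6, 0.8]])
B2 = np.array([1, 1, 1])

print(W2.shape)   # (3, 3)
print(B2.shape)   # (3,)

# Note: this walkthrough feeds the pre-activation values (A1, then A2, A3) forward,
# and the printed results reflect that; a layer normally receives the previous activation (Z1, Z2, Z3)
A2 = np.dot(A1, W2) + B2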
Z2 = sigmoid(A2)

print(A2)
print(Z2)

# Output
[2.266 3.286 4.306]
[0.90602176 0.96394539 0.9866921 ]

# Pass through the third layer
W3 = np.array([[0.1, 0.3], [-0.1, -0.5], [0.3, 0.5]])
B3 = np.array([1, 1])

print(W3.shape)   # (3, 2)
print(B3.shape)   # (2,)

A3 = np.dot(A2, W3) + B3
Z3 = sigmoid(A3)

print(A3)
print(Z3)

# Output
[2.1898 2.1898]
[0.8993298 0.8993298]

# Pass through the fourth layer
W4 = np.array([[0.1, 0.2], [0.3, 0.5]])
B4 = np.array([1, 1])

print(W4.shape)   # (2, 2)
print(B4.shape)   # (2,)

A4 = np.dot(A3, W4) + B4
Z4 = sigmoid(A4)

print(A4)
print(Z4)

# Output
[1.87592 2.53286]
[0.86714179 0.92641356]

# Combined into a single network, this becomes
def network():
    network = {}

    # First layer
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6], [0.3, 0.5, 0.7]])
    network['B1'] = np.array([1, 1, 1])

    # Second layer
    network['W2'] = np.array([[0.2, 0.4, 0.6], [0.1, 0.3, 0.5], [0.4, 0.6, 0.8]])
    network['B2'] = np.array([1, 1, 1])

    # Third layer
    network['W3'] = np.array([[0.1, 0.3], [-0.1, -0.5], [0.3, 0.5]])
    network['B3'] = np.array([1, 1])

    # Fourth layer
    network['W4'] = np.array([[0.1, 0.2], [0.3, 0.5]])
    network['B4'] = np.array([1, 1])

    return network


def forward(network, x):
    W1, W2, W3, W4 = network['W1'], network['W2'], network['W3'], network['W4']
    B1, B2, B3, B4 = network['B1'], network['B2'], network['B3'], network['B4']

    # Each layer uses its own weights and bias and takes the previous activation as input
    A1 = np.dot(x, W1) + B1
    Z1 = sigmoid(A1)

    A2 = np.dot(Z1, W2) + B2
    Z2 = sigmoid(A2)

    A3 = np.dot(Z2, W3) + B3
    Z3 = sigmoid(A3)

    A4 = np.dot(Z3, W4) + B4
    y = sigmoid(A4)

    return y

- Inference with the network

net = network()
x = np.array([0.3, 1.3, -2.2])
y = forward(net, x)
print(y)

# Output
[0.78781193 0.82428264]


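1. Tensor

The term tensor is usually reserved for data with three or more dimensions, but here every value used to represent data is called a tensor
a, b, c, and d below can all be called tensors

Rank: the number of axes of a tensor, available through NumPy's ndim (number of dimensions) attribute
The number of nested square brackets equals the rank (number of axes)

Shape: how many elements the tensor has along each axis, given as a Python tuple

- Scalar (0-dimensional tensor)

A tensor holding a single number
Its shape is empty: ()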

x = np.array(3)
print(x)
print(x.shape)
print(np.ndim(x))

# Output
3
()
0

- Vector (1-dimensional tensor)

A tensor representing an array of numbers

x = np.array([1, 2, 3, 4])
print(x)
print(x.shape)
print(np.ndim(x))

# Output
[1 2 3 4]
(4,)
1

- Vector addition

When the shapes are the same, the sum is computed element by element

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = a + b
print(a)
print(b)
print(c)
print(c.shape)
print(np.ndim(c))

# Output
[1 2 3 4]
[5 6 7 8]
[ 6  8 10 12]
(4,)
1

- Vector products

When $A=(x_{1}, x_{2}, x_{3}, \cdots, x_{n})$ and
$B=(y_{1}, y_{2}, y_{3}, \cdots, y_{n})$,
Element-wise product
- When the shapes are the same, the product is computed element by element
  $A\times B=(x_{1}, x_{2}, x_{3}, \cdots, x_{n})\times (y_{1}, y_{2}, y_{3}, \cdots, y_{n})$
  $\qquad\quad=(x_{1}y_{1}, x_{2}y_{2}, x_{3}y_{3}, \cdots, x_{n}y_{n})$
Dot product (inner product)
- For two 1-dimensional vectors, multiply the corresponding components and add up all the products

$$ A\bullet B\Rightarrow AB^{T}=(x_{1}, x_{2}, x_{3}, \cdots, x_{n})\begin{pmatrix}
y_{1} \\
y_{2}  \\
y_{3}  \\
\vdots \\
y_{n}  \\
\end{pmatrix} \\=x_{1}y_{1} + x_{2}y_{2} + x_{3}y_{3} + \cdots + x_{n}y_{n} $$

# Element-wise product
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = a*b
print(c)
print(c.shape)
print(np.ndim(c))
print("-------------------------")
# Dot product
x = np.array([1, 2, 0])
y = np.array([0, 2, 1])
z = np.dot(x, y)
print(z)
print(z.shape)
print(np.ndim(z))

# Output
[ 5 12 21 32]
(4,)
1
-------------------------
4
()
0

- Product of a scalar and a vector

# Scalar
a = np.array(10)
# Vector
b = np.array([1, 2, 3])
print(a*b)

# Output
[10 20 30]

2. 2-Dimensional Tensor (Matrix)

A 2-dimensional tensor can be thought of as a matrix
- An array of shape (m$\times$n)

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix)
print(matrix.shape)
print(np.ndim(matrix))
print("-------------------------")
matrix2 = np.array([[1, 2, 3, 4]])
print(matrix2)
print(matrix2.shape)
print(np.ndim(matrix2))

# Output
[[1 2 3]
 [4 5 6]]
(2, 3)
2
-------------------------
[[1 2 3 4]]
(1, 4)
2

- Element-wise matrix product

When the shapes are the same, operations such as addition and multiplication are performed element by element

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 10], [10, 10]])
print("행렬 A\n", A)
print("행렬 B\n", B)
print("A * B\n", A*B)

# Output
행렬 A
 [[1 2]
 [3 4]]
행렬 B
 [[10 10]
 [10 10]]
A * B
 [[10 20]
 [30 40]]

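- Matrix product (dot)

As with the dot product of 1-dimensional vectors, the inner sizes must match: the number of columns of the first matrix must equal the number of rows of the second

# 2*2 matrix
M = np.array([[1, 2], [3, 4]])
# 2*3 matrix
N = np.array([[2, 3, 4], [2, 3, 4]])
# The columns of the first matrix match the rows of the second, so the product is defined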
print("행렬 M\n", M)
print("행렬 N\n", N)
L = np.dot(M, N)
print("행렬 L\n", L)
print(L.shape)
print(np.ndim(L))

# Output
행렬 M
 [[1 2]
 [3 4]]
행렬 N
 [[2 3 4]
 [2 3 4]]
행렬 L
 [[ 6  9 12]
 [14 21 28]]
(2, 3)
2

# 3*1 matrix
m = np.array([[1], [2], [3]])
# 3*1 matrix
n = np.array([[1], [2], [3]])
# The columns of the first matrix (1) do not match the rows of the second (3), so the product is not defined
l = np.dot(m, n)

print(l)
print(l.shape)
print(np.ndim(l))

# 출력 결과
ValueError: shapes (3,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)
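
One way to make the failing product above well defined is to transpose one operand so that the inner dimensions match. The sketch below reuses the same `m` and `n`; which product is wanted (a 1x1 inner product or a 3x3 outer product) depends on the use case, so both are shown as possibilities rather than as the post's intended fix.

```python
import numpy as np

m = np.array([[1], [2], [3]])   # shape (3, 1)
n = np.array([[1], [2], [3]])   # shape (3, 1)

# (1, 3) x (3, 1): inner dimensions match, result has shape (1, 1)
inner = np.dot(m.T, n)
# (3, 1) x (1, 3): inner dimensions match, result has shape (3, 3)
outer = np.dot(m, n.T)

print(inner, inner.shape)
print(outer.shape)
```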

- Inverse matrix

For a matrix A, if there is a matrix B whose product with A is the identity matrix (E), then B is the inverse of A

A = np.array([[1, 2], [3, 4]])
B = np.linalg.inv(A)
print(A)
print(B)
print(np.dot(A, B))

# Output
[[1 2]
 [3 4]]
[[-2.   1. ]
 [ 1.5 -0.5]]
 # Multiplying A by its inverse gives the identity matrix (1.11e-16 is floating-point rounding error, effectively 0)
[[1.00000000e+00 1.11022302e-16]
 [0.00000000e+00 1.00000000e+00]]
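
Because the product contains tiny rounding errors such as 1.11e-16, comparing it with the identity matrix is usually done with a tolerance. A minimal sketch follows; the singular matrix `S` is an illustrative value added here to show that not every matrix has an inverse.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

# Compare with the identity matrix up to floating-point tolerance
left_ok = np.allclose(np.dot(A, A_inv), np.eye(2))
right_ok = np.allclose(np.dot(A_inv, A), np.eye(2))
print(left_ok, right_ok)

# A singular matrix (determinant 0) has no inverse; S is an assumed example
S = np.array([[1, 2], [2, 4]])
try:
    np.linalg.inv(S)
    singular_error = False
except np.linalg.LinAlgError:
    singular_error = True
print(singular_error)
```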

- Transpose

The array with its rows and columns swapped

A = np.array([[1, 2, 3], [4, 5, 6]])
print("A\n", A)
print("A.shape\n", A.shape)
print("-------------------------")
A_ = A.T
print("Transpose of A\n", A_)
print("(A.T).shape\n", A_.shape)

# Output
A
 [[1 2 3]
 [4 5 6]]
A.shape
 (2, 3)
-------------------------
Transpose of A
 [[1 4]
 [2 5]
 [3 6]]
(A.T).shape
 (3, 2)

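3. 3-D tensor

A tensor commonly used to represent an image
- (height, width, channels)
- Usually represented as a NumPy array

The values in the red, green, and blue channels combine to form a color image (a cat photo in the original figure)

Also used to represent time-series or sequence data
- (samples, timesteps, features)
- (Examples) a stock price dataset, the number of disease cases over time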

X = np.array([[[5, 3, 2, 1],
               [5, 5, 3, 1],
               [6, 1, 2, 3]],
              [[1, 1, 1, 1],
               [3, 4, 7, 5],
               [1, 8, 3, 4]],
               [[10, 9, 3, 9],
                [5, 4, 3, 2],
                [7, 6, 3, 4]]])

print("X\n", X, end = '\n\n')
print("X.shape:", X.shape)
print("X.ndim:", X.ndim)

# Output
X
 [[[ 5  3  2  1]
  [ 5  5  3  1]
  [ 6  1  2  3]]

 [[ 1  1  1  1]
  [ 3  4  7  5]
  [ 1  8  3  4]]

 [[10  9  3  9]
  [ 5  4  3  2]
  [ 7  6  3  4]]]

X.shape: (3, 3, 4)
X.ndim: 3

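B = np.array([[[2, 3, 4],[2, 3, 4]],
              [[1, 1, 1], [1, 1, 1]]])

# For a 3-D tensor, .T reverses the order of the axes: shape (2, 2, 3) becomes (3, 2, 2)
print("Tensor B\n", B, end = '\n\n')
print("Transpose of B\n", B.T)

# Output
Tensor B
 [[[2 3 4]
  [2 3 4]]

 [[1 1 1]
  [1 1 1]]]

Transpose of B
 [[[2 1]
  [2 1]]

 [[3 1]
  [3 1]]

 [[4 1]
  [4 1]]]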

- Example use of a 3-D tensor (images)

MNIST Dataset
Consists of 28$\times$28 grayscale images
Grayscale: brightness is expressed with values from 0 to 255; values closer to 0 are darker and values closer to 255 are brighter

# %pip install keras
# %pip install tensorflow
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

print(train_images.ndim)
print(train_images.shape)
print(train_images.dtype)

# Output
3
(60000, 28, 28)
uint8

temp_image = train_images[3]
plt.imshow(temp_image, cmap = 'gray')
plt.show()

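4. Broadcasting

NumPy can compute with arrays of different shapes

The smaller array must be expandable to the shape of the larger one: comparing the shapes from the last axis backwards, each pair of dimensions must be equal or one of them must be 1

(Note) The following cases also work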

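# 0-D tensor (scalar) and 1-D tensor
a = np.array(10)
b = np.array([10, 20, 30])
# With a scalar operand, np.dot reduces to ordinary multiplication
print(np.dot(a, b))
print(a*b)

# The shapes differ, but broadcasting makes the computation possible
# Output
[100 200 300]
[100 200 300]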

# 2-D tensor
A = np.array([[1, 2], [3, 4]])
B = np.array([10, 20])
print("Matrix A\n", A)
print("Vector B\n", B)
print("A * B\n", A*B)

# Output
Matrix A
 [[1 2]
 [3 4]]
Vector B
 [10 20]
A * B
 [[10 40]
 [30 80]]

# 3-D tensor
A = np.array([[[1, 1, 1],
               [2, 2, 2]],
               [[3, 3, 3],
                [4, 4, 4]]])
B = np.array([[10, 10, 10]])

print("Tensor A\n", A)
print("A.shape:", A.shape)
print("Matrix B\n", B)
print("B.shape:", B.shape)
print("A * B\n", A*B)

# Output
Tensor A
 [[[1 1 1]
  [2 2 2]]

 [[3 3 3]
  [4 4 4]]]
A.shape: (2, 2, 3)
Matrix B
 [[10 10 10]]
B.shape: (1, 3)
A * B
 [[[10 10 10]
  [20 20 20]]

 [[30 30 30]
  [40 40 40]]]

# A case that does not satisfy the broadcasting rule
# 2x3 matrix
A = np.array([[1, 2, 3], [4, 5, 6]])
# 1-D vector of length 2, shape (2,)
B = np.array([10, 10])
print(A*B)

# Output
ValueError: operands could not be broadcast together with shapes (2,3) (2,)
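
The failure above follows from comparing shapes from the last axis backwards: (2, 3) against (2,) pairs 3 with 2, which are neither equal nor 1. The sketch below shows one possible adjustment, reshaping `B` into a column so the shapes become compatible; it assumes the intent was to scale each row of `A`, which the post does not state. `np.broadcast_shapes` is available in NumPy 1.20 and later.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
B = np.array([10, 10])                 # shape (2,)

# Reshape B to (2, 1): trailing axes are now 3 vs 1, then 2 vs 2, which is compatible
B_col = B.reshape(2, 1)
result_shape = np.broadcast_shapes(A.shape, B_col.shape)
result = A * B_col

print(result_shape)
print(result)
```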

5. 4-D and 5-D tensors

Color image datasets (4-D)
- (samples, height, width, channels) (Keras, Tensorflow)
- (samples, channels, height, width) (Pytorch)
Video (5-D)
- (samples, frames, height, width, channels) (Keras, Tensorflow)
- (samples, frames, channels, height, width) (Pytorch)
- Example 1) (4, 300, 1920, 1080, 3) → a batch of 4 videos, each with 300 frames of size 1920×1080 and 3 channels
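
Moving between the two layouts listed above is a matter of reordering axes. The following sketch uses a small dummy batch (the sizes are made up for illustration) and `np.transpose` to go from channels-last to channels-first.

```python
import numpy as np

# Dummy batch in channels-last layout: (samples, height, width, channels)
batch_nhwc = np.zeros((4, 32, 24, 3))

# Reorder the axes to channels-first layout: (samples, channels, height, width)
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

print(batch_nhwc.shape)
print(batch_nchw.shape)
```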

6. Reshaping tensors

reshape changes the shape of a tensor
The total number of elements must be the same before and after the change

A = np.array([[1, 2, 3], [4, 5, 6]])
print("Array A\n", A)
print("A.shape:", A.shape)
print("-------------------------")
# Flatten into a 1-D array
A = A.reshape(6)
print("Array A\n", A)
print("A.shape:", A.shape)

# Output
Array A
 [[1 2 3]
 [4 5 6]]
A.shape: (2, 3)
-------------------------
Array A
 [1 2 3 4 5 6]
A.shape: (6,)

B = np.array([[[2, 3, 4], [2, 3, 4]],
              [[1, 1, 1], [1, 1, 1]]])
print("Array B\n", B)
print("B.shape:", B.shape)
print("-------------------------")
# Convert the 2x2x3 array to a 3x4 array
B = B.reshape(3, 4)
print("Array B\n", B)
print("B.shape:", B.shape)

# Output
Array B
 [[[2 3 4]
  [2 3 4]]

 [[1 1 1]
  [1 1 1]]]
B.shape: (2, 2, 3)
-------------------------
Array B
 [[2 3 4 2]
 [3 4 1 1]
 [1 1 1 1]]
B.shape: (3, 4)

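Passing -1 lets NumPy infer one dimension automatically
NumPy chooses that dimension so that the total number of elements is preserved
- (2, 2, 3) → (3, -1) (o)
  → (2, 1, 6) (o)
  → (2, -1, -1) (x): only one dimension may be -1
  → (2, 5, -1) (x): 12 elements cannot be split evenly into 2×5 groups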

B = np.array([[[2, 3, 4], [2, 3, 4]],
              [[1, 1, 1], [1, 1, 1]]])
print("Array B\n", B)
print("B.shape:", B.shape)
print("-------------------------")
# Specify only 4 rows and let NumPy infer the number of columns, giving a 4x3 array
B = B.reshape(4, -1)
print("Array B\n", B)
print("B.shape:", B.shape)

# Output
Array B
 [[[2 3 4]
  [2 3 4]]

 [[1 1 1]
  [1 1 1]]]
B.shape: (2, 2, 3)
-------------------------
Array B
 [[2 3 4]
 [2 3 4]
 [1 1 1]
 [1 1 1]]
B.shape: (4, 3)
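
The valid and invalid target shapes listed for a (2, 2, 3) array can be checked directly. This is a small sketch; the array contents are illustrative (`np.arange(12)`), and the failing cases are caught so the script runs to completion.

```python
import numpy as np

B = np.arange(12).reshape(2, 2, 3)   # 12 elements, illustrative values

shape_a = B.reshape(3, -1).shape     # -1 is inferred as 4
shape_b = B.reshape(2, 1, 6).shape   # 2 * 1 * 6 = 12, so this is valid
print(shape_a, shape_b)

results = {}
for shape in [(2, -1, -1), (2, 5, -1)]:
    try:
        B.reshape(shape)
        results[shape] = "ok"
    except ValueError:
        # Raised when more than one dimension is -1 or the sizes do not divide evenly
        results[shape] = "ValueError"
print(results)
```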


import math
import numpy as np
import matplotlib.pyplot as plt

1. Linear function

$y=ax+b$
- $a$: slope, $b$: y-intercept
Its graph is a straight line (linear)

def linear_function(x):
    a = 0.5
    b = 2

    return a*x+b
    
print(linear_function(5))
# 0.5 * 5 + 2 = 4.5
# Output
4.5

# Create an array of x values from -5 up to (but not including) 5 in steps of 0.1
x = np.arange(-5, 5, 0.1)
# Pass every x value to linear_function to get the corresponding y values
y = linear_function(x)

# Plot y against x
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Linear Function')

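2. Quadratic function

$y=ax^2+bx+c$ ($a\ne0$)
Has two distinct real roots when the discriminant $b^2-4ac$ is positive, one repeated root when it is zero, and no real roots when it is negative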

def quadratic_function(x):
    a = 1
    b = -1
    c = -2

    return a*x**2 + b*x + c 
    
print(quadratic_function(2))
# 1*2**2 + -1*2 -2 = 4 -2 -2 = 0
# Output
0

x = np.arange(-5, 5, 0.1)
y = quadratic_function(x)

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Quadratic Function')
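
The number of real roots of a quadratic depends on the sign of its discriminant $b^2-4ac$. As a rough check, the sketch below applies this to the coefficients used in `quadratic_function` above (a = 1, b = -1, c = -2); using `np.roots` is a choice made for this sketch, not something the post does.

```python
import numpy as np

a, b, c = 1, -1, -2            # same coefficients as quadratic_function above
discriminant = b**2 - 4*a*c    # positive, so two distinct real roots are expected
roots = np.sort(np.roots([a, b, c]))

print(discriminant)
print(roots)
```

The roots agree with the earlier result that `quadratic_function(2)` is 0.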

3. Cubic function (polynomial function)

$y=ax^3+bx^2+cx+d$

def cubic_function(x):
    a = 4
    b = 0
    c = -1
    d = -8

    return a*x**3 + b*x**2 + c*x + d
    
print(cubic_function(3))
# 4*3**3 + 0*3**2 + -1*3 + -8 = 108 + 0 + -3 + -8 = 97
# Output
97

x = np.arange(-5, 5, 0.1)
y = cubic_function(x)

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Cubic Function')

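4. Minimum / maximum of a function

# my_func is not defined in this part of the post. The definition below is an
# assumption, chosen only because it matches the marked minimum (1.5, 7.75).
def my_func(x):
    return x**2 - 3*x + 10

x = np.arange(-10, 10, 0.1)
y = my_func(x)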

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(1.5, my_func(1.5))
plt.text(1.5-1.5, my_func(1.5)+10, 'min value of f(x)\n({}, {})'.format(1.5, my_func(1.5)), fontdict={'size': 10})
plt.title('my_func')
plt.show()

min_val = min(y)
print(min_val)

# Output
7.75

5. Finding the minimum within an interval

# Find the minimum value between x1 and x2
def get_minimum(x1, x2, f):
    x = np.arange(x1, x2, 0.1)
    y = f(x)
    
    plt.plot(x, y)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title('get_minimum')
    plt.show()

    return min(y)

print(get_minimum(1, 4, my_func))

# Output
7.75
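
`get_minimum` returns only the smallest sampled value. If the location of the minimum is also needed, `np.argmin` gives the index of that sample. The sketch below uses a hypothetical function `f` (not the post's `my_func`), and the answer is only as precise as the 0.1 grid spacing.

```python
import numpy as np

def f(x):
    # Hypothetical example function for this sketch; its minimum is at x = 2
    return (x - 2)**2 + 1

x = np.arange(1, 4, 0.1)
y = f(x)

idx = np.argmin(y)        # index of the smallest sampled value
x_min, y_min = x[idx], y[idx]
print(x_min, y_min)
```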

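6. Exponential / logarithmic functions

Exponential and logarithmic functions with the same base are inverses of each other (symmetric about the line $y=x$)
Both can be implemented directly in Python

- Exponential function

$y=a^{x}\ (a>0,\ a\ne1)$ (general form)
$y=e^{x}\ (e\approx2.71828)$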

def exponential_function(x):
    a = 4
    return a**x
    
print(exponential_function(4))
print(exponential_function(0))

# Output
256
1

x = np.arange(-3, 2, 0.1)
y = exponential_function(x)

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.ylim(-1, 15)
plt.xlim(-4, 3)
plt.title('exponential_function')
plt.show()

def exponential_function2(x):
    a = 4
    return math.pow(a, x)
    
print(exponential_function2(4))
print(exponential_function2(0))

# Output (math.pow always returns a float)
256.0
1.0

- Exponential function with base $e$

# Use exp from the math library or the numpy library
print(math.exp(4))
print(np.exp(4))

# Output
54.598150033144236
54.598150033144236

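- Logarithmic function

$y=\log_{a}(x)\ (a>0,\ a\ne1)$ (general form)
$y=\log_{10}(x)$ (common logarithm)
$y=\ln(x)$ (natural logarithm, base $e$)

# Logarithm of 2 with base 3
print(math.log(2, 3))
# Base-2 logarithm
print(np.log2(4))
# np.log with no number after "log" is the natural logarithm (base e)
print(np.log(4))

# Output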
0.6309297535714574
2.0
1.3862943611198906
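
NumPy has no function for an arbitrary base, but the change-of-base identity $\log_{a}(x)=\ln(x)/\ln(a)$ covers that case. The following sketch checks the identity against `math.log(2, 3)` and also confirms that `np.exp` undoes `np.log`; the variable names are illustrative.

```python
import math
import numpy as np

x, base = 2, 3
direct = math.log(x, base)               # log base 3 of 2
via_natural = np.log(x) / np.log(base)   # change-of-base formula
print(direct, via_natural)

# exp and log with the same base are inverses of each other
round_trip = np.exp(np.log(4.0))
print(round_trip)
```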

- Inverse relationship

Symmetric about the line $y=x$

# Exponential function
x1 = np.arange(-1, 5, 0.01)
y1 = np.exp(x1)

# Logarithmic function
x2 = np.arange(0.000001, 5, 0.01)
y2 = np.log(x2)

# The line y=x
y3 = x1

plt.plot(x1, y1, 'r-', x2, y2, 'b-', x1, y3, 'k--')

plt.ylim(-2, 6)
plt.axvline(x = 0, color = 'k')
plt.axhline(y = 0, color = 'k')

plt.xlabel('x')
plt.ylabel('y')
plt.show()

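7. Manipulating functions

$y=-\log_{a}(x)$ and $y=-\log_{a}(1-x)$
The two graphs are mirror images of each other about the line $x=0.5$
These functions are used in logistic regression

x = np.arange(-10, 10, 0.01)
# -log(x) is defined only for x > 0 and -log(1-x) only for x < 1;
# outside those ranges NumPy returns nan, so the warnings are suppressed here
with np.errstate(divide = 'ignore', invalid = 'ignore'):
    y1 = -np.log(x)
    y2 = -np.log(1-x)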

plt.axvline(x = 0, color = 'k')
plt.axhline(y = 0, color = 'k')

plt.grid()
plt.plot(x, y1, 'b-', x, y2, 'r-')
plt.text(0.9, 2.0, 'y=-log(1-x)', fontdict={'size': 15})
plt.text(0.1, 3, 'y=-log(x)', fontdict={'size': 15})
plt.xlim(-0.3, 1.4)
plt.ylim(-0.5, 4)
plt.scatter(0.5, -np.log(0.5))
plt.show()
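
In logistic regression these two curves are typically combined into one loss: $-\log(p)$ penalizes a predicted probability $p$ when the true label is 1, and $-\log(1-p)$ when it is 0. The sketch below is an illustration under that assumption; the function name `binary_cross_entropy` and the sample probabilities are not from the post.

```python
import numpy as np

def binary_cross_entropy(y_true, p):
    # -log(p) when y_true is 1, -log(1 - p) when y_true is 0
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# A confident correct prediction gives a small loss, a confident wrong one a large loss
loss_good = binary_cross_entropy(1, 0.9)
loss_bad = binary_cross_entropy(1, 0.1)
# By the symmetry about x = 0.5, label 0 with p = 0.1 costs the same as label 1 with p = 0.9
loss_mirror = binary_cross_entropy(0, 0.1)

print(loss_good, loss_bad, loss_mirror)
```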

8. Limits

As x approaches a value a, it gets arbitrarily close to a but never actually reaches it
A limit is often expressed with a very small quantity called epsilon (e.g., 0.00001)

from sympy import *
init_printing()

x, y, z = symbols('x y z')
a, b, c, t = symbols('a b c t')

$\underset{x\to 1}{lim}(\frac{x^{3}-1}{x-1})=3$

print("Limit:", limit((x**3-1) / (x-1), x, 1))

# Output
Limit: 3

plot( ((x**3-1) / (x-1)), xlim = (-5, 5), ylim = (-1, 10))
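
The epsilon idea can also be seen numerically: evaluating the expression at 1 ± epsilon for shrinking epsilon should approach 3 from both sides, even though the expression is undefined at exactly x = 1. This is a plain-Python sketch; the helper name `g` and the list of epsilons are illustrative.

```python
def g(x):
    # Same expression as in the limit above; undefined at x = 1 (0/0)
    return (x**3 - 1) / (x - 1)

for eps in [1e-1, 1e-3, 1e-5]:
    # Approach x = 1 from the left and from the right
    print(eps, g(1 - eps), g(1 + eps))

left, right = g(1 - 1e-5), g(1 + 1e-5)
```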

$\underset{x\to \infty }{lim}(\frac{1+x}{x})=1$

print("Limit:", limit((1+x) / (x), x, oo))

# Output
Limit: 1

plot( ((1+x) / (x)), xlim = (-10, 10), ylim = (-5, 5))

$\underset{x\to 1 }{lim}(\frac{\sqrt{x+3}-2}{x-1})=\frac{1}{4}$

print("Limit:", limit((sqrt(x+3)-2) / (x-1), x, 1))

# Output
Limit: 1/4

plot( ((sqrt(x+3)-2) / (x-1)), xlim = (-5, 12), ylim = (-0.5, 1))

- Limits of trigonometric functions

$\underset{x\to \frac{\pi }{2}+0 }{lim} \tan x=-\infty$
$\underset{x\to \frac{\pi }{2}-0 }{lim} \tan x=\infty$

print("Limit:", limit(tan(x), x, pi/2, '+'))
print("Limit:", limit(tan(x), x, pi/2, '-'))

# Output
Limit: -oo
Limit: oo

plot(tan(x), xlim = (-3.14, 3.14), ylim = (-6, 6))

$\underset{x\to 0 }{lim}\begin{pmatrix}\frac{\sin x}{x}\end{pmatrix}=1$

print("Limit:", limit(sin(x)/x, x, 0))

# Output
Limit: 1

plot(sin(x)/x, ylim = (-2, 2))

$\underset{x\to 0 }{lim} x\sin\begin{pmatrix}\frac{1}{x}\end{pmatrix}=0$

print("Limit:", limit(x * sin(1/x), x, 0))

# Output
Limit: 0

plot(x * sin(1/x), xlim = (-2, 2), ylim = (-1, 1.5))

- Limits of exponential and logarithmic functions

$\underset{x\to \infty }{lim} \begin{pmatrix}\frac{2^{x}-2^{-x}}{2^{x}+2^{-x}}\end{pmatrix}=1$

print("Limit:", limit((2**x - 2**(-x)) / (2**x + 2**(-x)), x, oo))

# Output
Limit: 1

plot((2**x - 2**(-x)) / (2**x + 2**(-x)), xlim = (-10, 10), ylim = (-3, 3))

$\underset{x\to \infty }{lim} (log_{2}(x+1)-log_{2}(x))=0$

print("Limit:", limit(log(x+1, 2) - log(x, 2), x, oo))

# Output
Limit: 0

plot(log(x, 2), log(x+1, 2), xlim = (-4, 6), ylim = (-4, 4))

- The base of the natural logarithm ($e$)

(1) $\underset{x\to \infty }{lim} \begin{pmatrix}1+\frac{1}{x}\end{pmatrix}^{x}=e$
(2) $\underset{x\to \infty }{lim} \begin{pmatrix}1+\frac{2}{x}\end{pmatrix}^{x}=e^{2}$
(3) $\underset{x\to 0 }{lim} \frac{(e^{x}-1)}{x}=1$
(4) $\underset{x\to 0 }{lim} \frac{ln(1+x)}{x}=1$

print('(1):', limit( (1+1/x)**x, x, oo ))
print('(2):', limit( (1+2/x)**x, x, oo ))
print('(3):', limit( (exp(x) -1) / x, x, 0))
print('(4):', limit( (ln(1+x) / x), x, 0))

# Output
(1): E
(2): exp(2)
(3): 1
(4): 1

# Only (4) is plotted
plot( ln(1+x) / x, xlim = (-4, 6), ylim = (-2, 8))

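9. Differentiation

A derivative expresses the rate of change at a single instant

- Derivative and slope

For the graph of a function, the derivative (differential coefficient) at a point is the slope of the tangent line at that point
The slope has a direction
- Formula used here (numerical differentiation)
  $\frac{df(x)}{dx}=\displaystyle \lim_{h \to 0}\frac{f(x+h)-f(x-h)}{2h}$
$h$ is a very small number, but if it is too small (for example 1e-50) it is lost to floating-point rounding: np.float32(1e-50) is 0.0, and even with 64-bit floats x+h becomes indistinguishable from x
In deep learning, a value of about 1e-4 is therefore a reasonable choice

# Function that computes a numerical derivative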
def numerical_differential(f, x):
    h = 1e-4
    return (f(x+h)-f(x-h)) / (2*h)
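
The formula above uses a central difference (x+h and x-h) rather than the one-sided forward difference. As a rough illustration of why, the sketch below compares the two on sin(x) at x = 1, where the exact derivative cos(1) is known; the function names and the test point are assumptions made for this comparison.

```python
import math

def forward_difference(f, x, h=1e-4):
    # One-sided estimate: error shrinks roughly in proportion to h
    return (f(x + h) - f(x)) / h

def central_difference(f, x, h=1e-4):
    # Two-sided estimate: error shrinks roughly in proportion to h**2
    return (f(x + h) - f(x - h)) / (2 * h)

exact = math.cos(1.0)   # derivative of sin at x = 1
err_forward = abs(forward_difference(math.sin, 1.0) - exact)
err_central = abs(central_difference(math.sin, 1.0) - exact)
print(err_forward, err_central)
```

With the same h, the central difference is expected to be several orders of magnitude more accurate.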

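- Equation of the tangent line at a point $(a, b)$ on a function

Example: the slope at the point (1, 7)

# A quadratic function chosen for this example
def my_func(x):
    return 2*x**2 + 3*x + 2

# Linear function (the tangent line)
# a: x-coordinate of the point the line passes through
# b: y-coordinate of the point the line passes through
# c: slope of the line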
def linear_func(a, b, c, x):
    return c*(x-a) + b

c = numerical_differential(my_func, 1)

x = np.arange(-5, 5, 0.01)
y = linear_func(1, my_func(1), c, x)

plt.xlabel('x')
plt.ylabel('y')
plt.scatter(1, my_func(1))
plt.plot(x, my_func(x), x, y, 'r-')
plt.title('f(x) & linear function')
plt.show()

- Differentiation rules

$\frac{d}{dx}(c)=0$ ($c$ is a constant)
$\frac{d}{dx}[cf(x)]=cf'(x)$
$\frac{d}{dx}[f(x)+g(x)]=f'(x)+g'(x)$
$\frac{d}{dx}[f(x)-g(x)]=f'(x)-g'(x)$
$\frac{d}{dx}[f(x)g(x)]=f(x)g'(x)+f'(x)g(x)$ (product rule)
$\frac{d}{dx}[\frac{f(x)}{g(x)}]=\frac{g(x)f'(x)-f(x)g'(x)}{[g(x)]^{2}}$ (quotient rule)
$\frac{d}{dx}[x^{n}]=nx^{n-1}$ (power rule)
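
These rules can be spot-checked symbolically with SymPy. The sketch below tests the product, quotient, and power rules on example functions chosen here (sin(x) and x²+1); it checks particular instances and is not a proof.

```python
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)      # example f(x), chosen for this sketch
g = x**2 + 1       # example g(x), chosen for this sketch

# Product rule: (fg)' = f g' + f' g
product_ok = sp.simplify(sp.diff(f*g, x) - (f*sp.diff(g, x) + sp.diff(f, x)*g)) == 0
# Quotient rule: (f/g)' = (g f' - f g') / g^2
quotient_ok = sp.simplify(sp.diff(f/g, x) - (g*sp.diff(f, x) - f*sp.diff(g, x)) / g**2) == 0
# Power rule: (x^n)' = n x^(n-1), here with n = 5
power_ok = sp.diff(x**5, x) == 5*x**4

print(product_ok, quotient_ok, power_ok)
```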

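- Partial derivatives

Unlike the single-variable numerical differentiation above, differentiation of a function of two or more variables is called partial differentiation
In a multivariable function, every variable except the one of interest is treated as a constant while differentiating
The partial derivative with respect to each variable is written with the symbol $\partial$
ex) $f(x_{0}, x_{1})=x_{0}^{2}+x_{1}^{2}$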

from mpl_toolkits.mplot3d import Axes3D

x = np.arange(-4.0, 4.0, 0.4)
y = np.arange(-4.0, 4.0, 0.4)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

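fig = plt.figure()
# Axes3D(fig) no longer attaches the axes to the figure in recent Matplotlib versions, so add_subplot is used instead
ax = fig.add_subplot(projection = '3d')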
ax.plot_surface(X, Y, Z, rstride = 1, cstride = 1, cmap = 'coolwarm')
ax.set_title('f(x, y) = x**2 + y**2')
plt.show()

Partial derivative example 1: the partial derivative with respect to $x_{0}$, $\frac{\partial f}{\partial x_{0}}$

x = np.array([1, 2])

def f0_function(x0):
    return (x0**2) + (2**2)

print(numerical_differential(f0_function, x[0]))

# Output
1.9999999999997797

Partial derivative example 2: the partial derivative with respect to $x_{1}$, $\frac{\partial f}{\partial x_{1}}$

x = np.array([1, 2])

def f1_function(x1):
    return (1**2) + (x1**2)

print(numerical_differential(f1_function, x[1]))

# Output
4.000000000004
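
The two numerical results are close to 2 and 4, which matches the exact partial derivatives $2x_{0}$ and $2x_{1}$ at the point (1, 2). The sketch below confirms this symbolically with SymPy; the symbol names `x0` and `x1` are chosen here to mirror the notation above.

```python
import sympy as sp

x0, x1 = sp.symbols('x0 x1')
f = x0**2 + x1**2

# Differentiate with respect to one variable while treating the other as a constant
df_dx0 = sp.diff(f, x0)
df_dx1 = sp.diff(f, x1)

point = {x0: 1, x1: 2}
value_x0 = df_dx0.subs(point)
value_x1 = df_dx1.subs(point)
print(df_dx0, df_dx1)
print(value_x0, value_x1)
```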

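- Gradient

Has a direction (it is the vector of all partial derivatives)

# Function that computes the gradient numerically
def numerical_diff(f, x):
    h = 1e-5
    # Array of zeros with the same shape as x
    grad = np.zeros_like(x)

    for i in range(x.size):
        tmp = x[i]

        x[i] = tmp + h
        fxh1 = f(x)

        x[i] = tmp - h
        fxh2 = f(x)

        grad[i] = (fxh1 - fxh2) / (2*h)
        # Restore the original value
        x[i] = tmp

    # Return only after every component has been processed
    # (returning inside the loop leaves all but the first entry at 0)
    return grad

# my_func2 is not defined in the post; it is assumed here to be f(x0, x1) = x0**2 + x1**2
def my_func2(x):
    return x[0]**2 + x[1]**2

print(numerical_diff(my_func2, np.array([3.0, 4.0])))
print(numerical_diff(my_func2, np.array([1.0, 2.0])))

# Output (up to floating-point rounding)
[6. 8.]
[2. 4.]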

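- Checking the meaning of the gradient with a plot

The gradient grows as the distance from the point where it is smallest (the center) increases
A large gradient means the function value is strongly affected by a change in the input at that point
A small gradient means it is only weakly affected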

X = np.arange(-20, 20, 2)
Y = np.arange(-20, 20, 2)
U, V = np.meshgrid(X, Y)

fig, ax = plt.subplots()
# quiver draws each gradient vector as an arrow
q = ax.quiver(X, Y, U, V)
ax.quiverkey(q, X = 0.4, Y = 1.0, U = 20, label = 'Quiver Key', labelpos = 'E')
plt.grid()
plt.show()
