[Deep Learning Basics] CNN (Convolutional Neural Networks) (1)

2023. 3. 23. 12:45

● Convolutional Neural Networks (CNNs)

Frequently used for image recognition, speech recognition, and similar tasks
In image recognition in particular, they appear in nearly every deep learning approach

https://medium.com/@pechyonkin/key-deep-learning-architectures-lenet-5-6fc3c59e6f4

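- Differences from fully-connected layers

A fully-connected layer ignores the shape (3D) of data such as images
It treats every input value equally
In other words, it loses the spatial features of the data
A convolution layer takes the relationships between image pixels into account
A fully-connected layer loses spatial information, whereas a convolution layer preserves it
- It keeps the 2D (grayscale) or 3D (color) shape of the image
- It preserves spatial information and reuses the same small filter at every position, so it needs fewer parameters than a fully-connected layer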
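To make the parameter comparison concrete, here is a minimal sketch. The input size (32×32×3), the 100 output units, and the 100 filters of size 3×3 are illustrative assumptions, not values from the post.

```python
# Illustrative sizes (assumptions, not taken from the post)
H, W, C = 32, 32, 3        # input height, width, channels
units = 100                # output units of a fully-connected layer
num_filters, f = 100, 3    # 100 filters, each 3x3 across C channels

# Fully-connected: every input value connects to every output unit, plus one bias per unit
fc_params = (H * W * C) * units + units

# Convolution: each filter has f*f*C weights reused at every position, plus one bias
conv_params = (f * f * C) * num_filters + num_filters

print(fc_params)    # 307300
print(conv_params)  # 2800
```

The convolution count does not change if the input image gets larger, while the fully-connected count grows with H × W × C.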

- Example CNN architecture

https://www.oreilly.com/library/view/neural-network-projects/9781789138900/8e87ad66-6de3-4275-81a4-62b54436bf16.xhtml

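1. Convolution operation

Filter operation
- Applies an operation to the input data through a filter
- Multiplies the input by the filter element by element at matching positions, then sums the products
- The resulting output is called a feature map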
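The multiply-and-sum step can be sketched in a few lines of NumPy. The function name `conv2d_valid` and the toy 4×4 input are assumptions made for illustration; the loop uses no padding and a step of 1.

```python
import numpy as np

def conv2d_valid(x, w):
    """Slide filter w over x (no padding, step of 1) and return the feature map."""
    n, f = x.shape[0], w.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the current window and the filter, then sum
            out[i, j] = np.sum(x[i:i + f, j:j + f] * w)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 input: 0..15
w = np.ones((3, 3))                            # toy 3x3 filter of ones
feature_map = conv2d_valid(x, w)
print(feature_map.tolist())  # [[45.0, 54.0], [81.0, 90.0]]
```

With a filter of ones, each output value is simply the sum of the 3×3 window it covers, which makes the result easy to check by hand.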
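Filter
- Also called a kernel
- Similar in concept to the image filters found in photo apps
- The filter size is almost always odd
  - With an even size, the padding becomes asymmetric
  - The left and right sides would need different amounts of padding
  - An odd size has a center position, i.e. a single distinguished pixel (the center pixel)
- The number of learnable parameters in a filter stays the same regardless of the input size,
  which helps limit overfitting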

http://incredible.ai/artificial-intelligence/2016/06/12/Convolutional-Neural-Networks-Part1.5/

- Visualizing the operation

https://www.researchgate.net/figure/An-example-of-convolution-operation-in-2D-2_fig3_324165524

- In general, the data size after a convolution operation is

$ \qquad (n-f+1) \times (n-f+1) $

$ n $: size of the input data

$ f $: size of the filter (kernel)

- Example

The input size ($ n $) is 5 and the filter size ($ f $) is 3, so the output size is $ (5-3+1)=3 $
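As a quick check of the size formula, the sketch below counts every window position for a 5×5 input and a 3×3 filter. It assumes NumPy 1.20 or later, where `sliding_window_view` is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n, f = 5, 3
x = np.zeros((n, n))

# One entry per valid f x f window position (no padding, step of 1)
windows = sliding_window_view(x, (f, f))

print(windows.shape[:2])     # (3, 3)
print((n - f + 1, n - f + 1))  # (3, 3)
```

The first two axes of `windows` index the window positions, so their lengths match the output size given by the formula.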

https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

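2. Padding and stride

Together with the filter (kernel) size, these determine the output image size for a given input image
Both are chosen by the user

- Padding

A technique that fills the border around the input data with a specific value
- Most often filled with 0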

https://tensorflow.blog/a-guide-to-convolution-arithmetic-for-deep-learning/

Output data size
$ \qquad (n+2p-f+1) \times (n+2p-f+1) $
In the figure above, the input size ($ n $) is 5, the filter size ($ f $) is 4, and the padding ($ p $) is 2,
so the output size is $ (5+2 \times 2-4+1)=6 $
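The same numbers can be reproduced with `np.pad`. This is only a sketch: the input values are arbitrary, and only the shapes matter here.

```python
import numpy as np

n, f, p = 5, 4, 2
x = np.arange(n * n).reshape(n, n)

# Zero padding of width p on every side of the input
padded = np.pad(x, p, mode="constant", constant_values=0)

# Sliding an f x f filter over the padded input: same as n + 2p - f + 1
out_size = padded.shape[0] - f + 1

print(padded.shape)  # (9, 9)
print(out_size)      # 6
```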

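- 'valid' and 'same'

'valid'
- No padding is applied
- padding=0 (meaning no padding at all, not a border filled with zeros)
'same'
- Pads the input so that the image size after the operation equals the input image size
- If the filter (kernel) size is $ f $,
  the padding size is $ p=\frac{f-1}{2} $ (assuming stride=1)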
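A small sketch of the 'same' rule, writing the filter size as `f`. The helper name `same_padding` and the input size 28 are assumptions for illustration; the check inside the loop confirms that the output size equals the input size at stride 1.

```python
def same_padding(f):
    """Padding that keeps output size equal to input size (stride 1, odd filter size)."""
    if f % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd filter size")
    return (f - 1) // 2

n = 28  # arbitrary input size
for f in (1, 3, 5, 7):
    p = same_padding(f)
    # Output size with padding p must equal the input size
    assert n + 2 * p - f + 1 == n
    print(f, p)
# 1 0
# 3 1
# 5 2
# 7 3
```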

- Stride

The interval at which the filter is applied
The figure below uses a stride of 2

- Output data size

$$ OH=\frac{H+2P-FH}{S}+1 $$

$$ OW=\frac{W+2P-FW}{S}+1 $$

Input size: $ (H, W) $
Filter size: $ (FH, FW) $
Output size: $ (OH, OW) $
Padding, stride: $ P, S $
(Note)
- In the formulas above, $ \frac{H+2P-FH}{S} $ and $ \frac{W+2P-FW}{S} $ must both be integers
- If they do not divide evenly,
  adjust the padding or stride so that they do
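The formulas and the divisibility note can be written as a small helper. The name `conv_output_shape` is hypothetical, and raising an error is just one way to follow the note above; many libraries round down instead.

```python
def conv_output_shape(H, W, FH, FW, P=0, S=1):
    """Return (OH, OW); raise if the stride does not divide evenly."""
    if (H + 2 * P - FH) % S != 0 or (W + 2 * P - FW) % S != 0:
        raise ValueError("adjust padding or stride so the division is exact")
    OH = (H + 2 * P - FH) // S + 1
    OW = (W + 2 * P - FW) // S + 1
    return OH, OW

print(conv_output_shape(7, 7, 3, 3, P=0, S=2))  # (3, 3)
print(conv_output_shape(4, 4, 3, 3, P=1, S=1))  # (4, 4)
```

For example, a 6×6 input with a 3×3 filter, no padding, and stride 2 gives (6 - 3) / 2 = 1.5, so this helper raises an error for that combination.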

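3. Pooling

The process of extracting a particular value from within the filter (kernel) window

- Max Pooling

The most widely used method
The output size is computed the same way as for the convolution operation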

$$ OH=\frac{H+2P-FH}{S}+1 $$

$$ OW=\frac{W+2P-FW}{S}+1 $$

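Typically uses stride=2, kernel_size=2 to halve the size of the feature map
Helps the model learn the salient features of an object,
and gives the CNN a degree of translation invariance
- For example, in the figure below, even if the 2 and the 8 inside the green square swap positions, max pooling still extracts 8
Shrinks the feature map, which reduces the parameters needed by later layers and speeds up computation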
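A minimal sketch of max pooling with kernel_size=2 and stride=2. The 4×4 array is an illustrative assumption and is not guaranteed to match the figure; the swap at the end mirrors the idea that moving values around inside one window does not change the pooled result.

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Max pooling over k x k windows with stride s (no padding)."""
    oh = (x.shape[0] - k) // s + 1
    ow = (x.shape[1] - k) // s + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
pooled = max_pool2d(x)
print(pooled.tolist())  # [[6, 8], [3, 4]]

# Swap the 2 and the 8 inside the top-right window: the pooled output is unchanged
x_swapped = x.copy()
x_swapped[0, 2], x_swapped[1, 3] = x[1, 3], x[0, 2]
print(max_pool2d(x_swapped).tolist())  # [[6, 8], [3, 4]]
```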

https://cs231n.github.io/convolutional-networks/

- Average Pooling

Computes the mean of the pixel values inside the filter window
Widely used in the past, less common today
Like max pooling, uses stride=2, kernel_size=2 to reduce the size of the feature map
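For comparison, here is a sketch of average pooling. It assumes kernel_size=2 and stride=2 so that the feature map is halved, and it uses an illustrative 4×4 array that is not taken from the figure.

```python
import numpy as np

def avg_pool2d(x, k=2, s=2):
    """Average pooling over k x k windows with stride s (no padding)."""
    oh = (x.shape[0] - k) // s + 1
    ow = (x.shape[1] - k) // s + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].mean()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
averaged = avg_pool2d(x)
print(averaged.tolist())  # [[3.25, 5.25], [2.0, 2.0]]
```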

https://www.researchgate.net/figure/Average-pooling-example_fig21_329885401

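4. Meaning of the convolution operation

- Example of a filter operation on a 2D image

Edge detection
Sobel filter
- Horizontal: acts as a filter that computes the derivative in the horizontal direction
- Vertical: acts as a filter that computes the derivative in the vertical direction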
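Before moving to a real photo, the behaviour of the two Sobel filters can be checked on a tiny synthetic image. The toy image (dark left half, bright right half) and the helper name `apply_filter` are assumptions for illustration; the filter values are the standard Sobel kernels.

```python
import numpy as np

# Derivative along the horizontal direction
sobel_horizontal = np.array([[1., 0., -1.],
                             [2., 0., -2.],
                             [1., 0., -1.]])
# Derivative along the vertical direction
sobel_vertical = np.array([[1., 2., 1.],
                           [0., 0., 0.],
                           [-1., -2., -1.]])

def apply_filter(x, w):
    """Multiply-and-sum over every window (no padding, step of 1)."""
    f = w.shape[0]
    oh, ow = x.shape[0] - f + 1, x.shape[1] - f + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + f, j:j + f] * w)
    return out

# Toy image: columns 0-2 are 0, columns 3-5 are 1 (one vertical edge)
img = np.zeros((5, 6))
img[:, 3:] = 1.0

gx = apply_filter(img, sobel_horizontal)
gy = apply_filter(img, sobel_vertical)

print(np.abs(gx[0]).tolist())  # [0.0, 4.0, 4.0, 0.0]
print(np.abs(gy).max())        # 0.0
```

The horizontal-derivative filter responds only where the brightness changes from left to right, while the vertical-derivative filter gives zero everywhere because every row of the toy image is identical.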

https://www.cloras.com/blog/image-recognition/

module import

# %pip install opencv-python
import cv2
import numpy as np
import matplotlib.pyplot as plt
import urllib.request  # urlopen lives in the urllib.request submodule
import requests
from io import BytesIO

util functions

def url_to_image(url, gray = False):
    resp = urllib.request.urlopen(url)
    image = np.asarray(bytearray(resp.read()), dtype = 'uint8')

    if gray == True:
        image = cv2.imdecode(image, cv2.IMREAD_GRAYSCALE)
    else:
        image = cv2.imdecode(image, cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    return image

def filtered_image(image, filter, output_size):
    filtered_img = np.zeros((output_size, output_size))
    filter_size = filter.shape[0]

    for i in range(output_size):
        for j in range(output_size):
            # Visit each output position (i, j), apply the multiply-and-sum there, and return the filtered image at the end
            multiply_values = image[i:(i+filter_size), j:(j+filter_size)] * filter
            sum_value = np.sum(multiply_values)

            if (sum_value > 255):
                sum_value = 255
            
            filtered_img[i, j] = sum_value
    
    return filtered_img

Check the image (a square image is used because this is just an example)

# The "hello world" image of image processing
img_url = "https://upload.wikimedia.org/wikipedia/ko/thumb/2/24/Lenna.png/440px-Lenna.png"
image = url_to_image(img_url, gray = True)
print("image.shape:", image.shape)

plt.imshow(image, cmap = "gray")
plt.show()

image.shape: (440, 440)

Apply the filter operation

vertical_filter = np.array([[1., 2., 1.],
                            [0., 0., 0.,],
                            [-1., -2., -1.]])
horizontal_filter = np.array([[1., 0., -1.],
                              [2., 0., -2.],
                              [1., 0., -1.]])
output_size = int((image.shape[0] - 3) / 1 + 1)
print("output_size:", output_size)

vertical_filtered = filtered_image(image, vertical_filter, output_size)
horizontal_filtered = filtered_image(image, horizontal_filter, output_size)

plt.figure(figsize = (10, 10))
plt.subplot(1, 2, 1)
plt.title("Vertical")
plt.imshow(vertical_filtered, cmap = 'gray')

plt.subplot(1, 2, 2)
plt.title("Horizontal")
plt.imshow(horizontal_filtered, cmap = 'gray')
plt.show()

output_size: 438

Final result after applying the image filters

# Square the vertical and horizontal filter results, add them, and take the square root
sobel_img = np.sqrt(np.square(horizontal_filtered) + np.square(vertical_filtered))

plt.imshow(sobel_img, cmap = 'gray')

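- Convolution on 3D data

An image is made up of three dimensions
- (height, width, number of channels)
- Channels: RGB
The color is determined by the intensity of each channel's value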
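A tiny hand-made image shows this layout. NumPy arrays decoded from images are typically stored as (height, width, channels); the 2×2 pixel values below are assumptions chosen for illustration.

```python
import numpy as np

# 2x2 RGB image: red, green / blue, white
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(img.shape)      # (2, 2, 3)

# Index the last axis to pull out a single channel as a 2D array
red = img[:, :, 0]
print(red.tolist())   # [[255, 0], [0, 255]]
```

The red channel is 255 only for the red and white pixels, which is why those two positions are bright when the channel is shown on its own.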

https://www.projectorcentral.com/All-About-Bit-Depth.htm?page=What-Bit-Depth-Looks-Like

Check the image

img_url = "https://upload.wikimedia.org/wikipedia/ko/thumb/2/24/Lenna.png/440px-Lenna.png"
# Unlike the grayscale case above, the 'gray = True' option is omitted
image = url_to_image(img_url)
print("image.shape:", image.shape)

# The "cmap = 'gray'" option is also omitted when displaying
plt.imshow(image)
plt.show()

image.shape: (440, 440, 3)

# Red Image
image_copy = image.copy()
# The third axis of the image shape holds the R, G, B values; its size is 3, so all three are present
# Setting indices 1 and 2 to 0 removes G and B, leaving an image with only R
image_copy[:, :, 1] = 0
image_copy[:, :, 2] = 0
image_red = image_copy

plt.imshow(image_red)
plt.show()

image_copy = image.copy()
# Setting indices 0 and 2 to 0 removes R and B, leaving an image with only G
image_copy[:, :, 0] = 0
image_copy[:, :, 2] = 0
image_green = image_copy

plt.imshow(image_green)
plt.show()

image_copy = image.copy()
# Setting indices 0 and 1 to 0 removes R and G, leaving an image with only B
image_copy[:, :, 0] = 0
image_copy[:, :, 1] = 0
image_blue = image_copy

plt.imshow(image_blue)
plt.show()

# Show them all at once and compare with the grayscale versions
fig = plt.figure(figsize = (12, 8))

title_list = ['R', 'G', 'B',
              'R - grayscale', 'G - grayscale', 'B - grayscale']
image_list = [image_red, image_green, image_blue,
              image_red[:, :, 0], image_green[:, :, 1], image_blue[:, :, 2]]

# Use a separate loop variable so the original `image` is not overwritten
for i, img in enumerate(image_list):
    ax = fig.add_subplot(2, 3, i+1)
    ax.title.set_text("{}".format(title_list[i]))

    if i >= 3:
        plt.imshow(img, cmap = 'gray')
    else:
        plt.imshow(img)

plt.show()

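- Computation steps

Apply the convolution operation to each channel
- The three channels together are referred to as 'one filter'

Add up the per-channel results

Add the bias to the summed result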
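The three steps above can be sketched as follows. The function name `conv_multichannel`, the (H, W, C) input layout, the (C, f, f) filter layout, and the scalar bias are all assumptions made for this illustration.

```python
import numpy as np

def conv_multichannel(x, w, b=0.0):
    """x: (H, W, C) input, w: (C, f, f) filter, b: scalar bias."""
    H, W, C = x.shape
    f = w.shape[1]
    out = np.zeros((H - f + 1, W - f + 1))
    for c in range(C):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # Steps 1 and 2: filter channel c with its own 2D slice, accumulate the sum
                out[i, j] += np.sum(x[i:i + f, j:j + f, c] * w[c])
    # Step 3: add the bias once to the summed result
    return out + b

x = np.ones((4, 4, 3))   # toy input of ones
w = np.ones((3, 3, 3))   # toy filter of ones
y = conv_multichannel(x, w, b=1.0)
print(y.shape)   # (2, 2)
print(y[0, 0])   # 28.0
```

Each channel contributes 9 (a 3×3 window of ones), the three channels sum to 27, and the bias brings every output value to 28.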

modules import

# %pip install opencv-python
import cv2
import numpy as np
import matplotlib.pyplot as plt
import urllib.request  # urlopen lives in the urllib.request submodule
import requests
from io import BytesIO

util functions

def url_to_image(url, gray = False):
    resp = urllib.request.urlopen(url)
    image = np.asarray(bytearray(resp.read()), dtype = 'uint8')

    if gray == True:
        image = cv2.imdecode(image, cv2.IMREAD_GRAYSCALE)
    else:
        image = cv2.imdecode(image, cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    return image

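def conv_op(image, kernel, pad = 0, stride = 1):
    # image: (H, W, C); kernel: (C, KH, KW), one 2D slice per input channel
    H, W, C = image.shape
    kernel_size = kernel.shape[1]

    out_h = (H + 2*pad - kernel_size) // stride + 1
    out_w = (W + 2*pad - kernel_size) // stride + 1

    filtered_img = np.zeros((out_h, out_w))
    # Zero-pad only the spatial axes, not the channel axis
    img = np.pad(image, [(pad, pad), (pad, pad), (0, 0)], 'constant')

    for i in range(out_h):
        for j in range(out_w):
            # Top-left corner of the current window in the padded image
            top, left = i * stride, j * stride
            for c in range(C):
                # Multiply channel c of the window by channel c of the kernel, then accumulate
                multiply_values = img[top:(top + kernel_size), left:(left + kernel_size), c] * kernel[c]
                sum_value = np.sum(multiply_values)

                filtered_img[i, j] += sum_value

    # Reshape to (batch, channel, height, width) = (1, 1, out_h, out_w)
    filtered_img = filtered_img.reshape(1, out_h, out_w, -1).transpose(0, 3, 1, 2)

    # Clip to the valid pixel range before casting so values do not wrap around
    return np.clip(filtered_img, 0, 255).astype(np.uint8)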

Check the image

img_url = "https://upload.wikimedia.org/wikipedia/ko/thumb/2/24/Lenna.png/440px-Lenna.png"
image = url_to_image(img_url)
print("image.shape:", image.shape)

plt.imshow(image)
plt.show()

Apply the filter operation
- Five 3-channel filters of size 3×3
- (5, 3, 3, 3) → (count, channels, height, width)
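To see how a (5, 3, 3, 3) filter bank yields five feature maps, here is a vectorized sketch. The random 8×8×3 input, the seed, and the use of `sliding_window_view` with `einsum` are assumptions for illustration (NumPy 1.20 or later), not the approach used in the code that follows.

```python
import numpy as np

rng = np.random.default_rng(0)
filters = rng.standard_normal((5, 3, 3, 3))  # (count, channels, height, width)
x = rng.standard_normal((8, 8, 3))           # toy (H, W, C) input

# Every 3x3 window over the two spatial axes: shape (6, 6, 3, 3, 3) = (OH, OW, C, FH, FW)
windows = np.lib.stride_tricks.sliding_window_view(x, (3, 3), axis=(0, 1))

# For each filter n, multiply matching channels and sum over channel and window axes
feature_maps = np.einsum("ijchw,nchw->nij", windows, filters)

print(windows.shape)       # (6, 6, 3, 3, 3)
print(feature_maps.shape)  # (5, 6, 6)
```

Each of the five filters produces one 6×6 feature map, so the number of output channels equals the number of filters.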

# Example 1
filter1 = np.random.randn(3, 3, 3)

print(filter1.shape)
print(filter1)

# Output
(3, 3, 3)
[[[ 1.03527724 -0.91961541  1.12674622]
  [ 0.90570621  2.43452234 -0.58178937]
  [-0.20276794 -0.69609947 -0.22246946]]

 [[-0.19455091  0.96691228  1.18181353]
  [-0.75600052 -2.92070965  0.42929136]
  [-0.43024675  0.61458207 -0.52046698]]

 [[ 0.82826973  0.55922214  0.27557231]
  [-0.47029333 -0.53727015  1.44036126]
  [-0.74869707  1.89852507  1.45523256]]]

filtered_img1 = conv_op(image, filter1)
print(filtered_img1.shape)

(1, 1, 438, 438)

# Example 2
filter2 = np.random.randn(3, 3, 3)

print(filter2.shape)
print(filter2)

# Output
(3, 3, 3)
[[[ 1.03641458  1.4153158  -0.56486124]
  [-0.1553772  -1.86455138 -0.00522765]
  [ 0.1220599   0.43514984  0.32804735]]

 [[ 0.81778856  1.64887384 -1.29579815]
  [-0.45742362 -0.23823593  1.17207619]
  [ 0.29878226  0.02336725 -0.95649443]]

 [[-0.97517188  0.91275201 -1.00159311]
  [-1.80679889 -0.40762195 -2.10950021]
  [ 1.94690784 -0.80022143 -0.04150088]]]

filtered_img2 = conv_op(image, filter2)
print(filtered_img2.shape)

plt.figure(figsize = (10, 10))
plt.subplot(1, 2, 1)
plt.title("Used Filter")
plt.imshow(filter2, cmap = 'gray')

plt.subplot(1, 2, 2)
plt.title("Result")
plt.imshow(filtered_img2[0, 0, :, :], cmap = 'gray')
plt.show()

(1, 1, 438, 438)

Final result after applying the filter operations

# Sum all of the examples above
filtered_img = np.stack([filtered_img1, filtered_img2]).sum(axis = 0)
print(filtered_img.shape)

plt.imshow(filtered_img[0, 0, :, :], cmap = 'gray')
plt.show()

(1, 1, 438, 438)

The whole process at a glance

# Create 5 random filters
np.random.seed(222)

fig = plt.figure(figsize = (8, 20))

filter_num = 5
filtered_img = []

for i in range(filter_num):
    ax = fig.add_subplot(5, 2, 2*i+1)
    ax.title.set_text("Filter {}".format(i + 1))

    filter = np.random.randn(3, 3, 3)
    plt.imshow(filter)

    ax = fig.add_subplot(5, 2, 2*i+2)
    ax.title.set_text("Result")

    filtered = conv_op(image, filter)
    filtered_img.append(filtered)
    plt.imshow(filtered[0, 0, :, :], cmap = 'gray')

plt.show()


# Sum the results from all the filters to complete the convolution in one step
filtered_img = np.stack(filtered_img).sum(axis = 0)
print(filtered_img.shape)

plt.imshow(filtered_img[0, 0, :, :], cmap = 'gray')
plt.show()
