1. Recurrent Neural Network (RNN)

  • A type of neural network with a loop
  • Iterates over the elements of a sequence while storing the information processed so far in a state

https://aditi-mittal.medium.com/understanding-rnn-and-lstm-f7cdf6dfc14e

 

  - RNN Layer

  • Input: (timesteps, input_features)
  • Output: (timesteps, output_features)
# Expressing an RNN forward pass with NumPy
import numpy as np

timesteps = 100
input_features = 32
output_features = 64

inputs = np.random.random((timesteps, input_features))

state_t = np.zeros((output_features, ))  # initial state: all zeros

W = np.random.random((output_features, input_features))   # input weights
U = np.random.random((output_features, output_features))  # recurrent weights
b = np.random.random((output_features, ))                 # bias vector

successive_outputs = []

for input_t in inputs:
    # combine the current input with the previous state, then update the state
    output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    successive_outputs.append(output_t)
    state_t = output_t

final_output_sequence = np.stack(successive_outputs, axis = 0)  # (timesteps, output_features)

 

  - Recurrent layers in Keras

  • SimpleRNN layer
  • Input: (batch_size, timesteps, input_features)
  • Output
    • Controlled by the return_sequences argument
    • 3D tensor
      • Returns the full sequence of outputs for every timestep
      • (batch_size, timesteps, output_features)
    • 2D tensor
      • Returns only the last output for each input sequence
      • (batch_size, output_features)
from tensorflow.keras.layers import SimpleRNN, Embedding
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32))  # adding return_sequences = True to SimpleRNN makes it return the full sequence
model.summary()

# Output
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, None, 32)          320000    
                                                                 
 simple_rnn (SimpleRNN)      (None, 32)                2080      
                                                                 
=================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________
  • Stacking several recurrent layers one after another is sometimes useful for increasing the representational power of the network
    • In this setup, the intermediate layers must return their full output sequences
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences = True))
model.add(SimpleRNN(32, return_sequences = True))
model.add(SimpleRNN(32, return_sequences = True))
model.add(SimpleRNN(32))
model.summary()

# Output
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_2 (Embedding)     (None, None, 32)          320000    
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, None, 32)          2080      
                                                                 
 simple_rnn_3 (SimpleRNN)    (None, None, 32)          2080      
                                                                 
 simple_rnn_4 (SimpleRNN)    (None, None, 32)          2080      
                                                                 
 simple_rnn_5 (SimpleRNN)    (None, 32)                2080      
                                                                 
=================================================================
Total params: 328,320
Trainable params: 328,320
Non-trainable params: 0
_________________________________________________________________

 

  - Applying it to the IMDB dataset

  - Loading the data

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

num_words = 10000
max_len = 500
batch_size = 32

(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words = num_words)
print(len(input_train))  # 25000
print(len(input_test))   # 25000

input_train = sequence.pad_sequences(input_train, maxlen = max_len)
input_test = sequence.pad_sequences(input_test, maxlen = max_len)
print(input_train.shape) # (25000, 500)
print(input_test.shape)  # (25000, 500)

 

  - Building the model

from tensorflow.keras.layers import Dense

model = Sequential()

model.add(Embedding(num_words, 32))
model.add(SimpleRNN(32))
model.add(Dense(1, activation = 'sigmoid'))

model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['acc'])

model.summary()

# Output
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_3 (Embedding)     (None, None, 32)          320000    
                                                                 
 simple_rnn_6 (SimpleRNN)    (None, 32)                2080      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
=================================================================
Total params: 322,113
Trainable params: 322,113
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

history = model.fit(input_train, y_train,
                    epochs = 10,
                    batch_size = 128,
                    validation_split = 0.2)

 

 

  - Visualization

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['acc']
val_acc = history.history['val_acc']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'train loss')
plt.plot(epochs, val_loss, 'r:', label = 'validation loss')
plt.grid()
plt.legend()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'train accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'validation accuracy')
plt.grid()
plt.legend()

model.evaluate(input_test, y_test)

# Output
loss: 0.6755 - acc: 0.7756
[0.6754735112190247, 0.7755600214004517]
  • Performance is low partly because each review was cut to 500 words instead of using the full sequence
  • SimpleRNN is not well suited to processing long sequences

 

 

2. LSTM and GRU Layers

  • SimpleRNN is too simplistic for practical use
  • In theory, a SimpleRNN at time t can retain information from all previous timesteps, but in practice it cannot learn long-range dependencies
  • Vanishing gradient problem (see the sketch below)
    • Layers such as LSTM and GRU were introduced to avoid it
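
  • A minimal NumPy sketch of the vanishing gradient (an added illustration, not part of the original code): backpropagation through a tanh RNN multiplies the gradient by a Jacobian factor diag(1 - h²)·Uᵀ at every timestep, and composing many such factors drives the gradient norm toward zero
# Vanishing gradient sketch: compose 100 tanh-step Jacobian factors
import numpy as np

np.random.seed(0)
units = 32
U = np.random.normal(scale = 0.05, size = (units, units))  # small recurrent weights

h = np.zeros(units)
grad = np.ones(units)  # gradient arriving at the last timestep
norms = []

for t in range(100):
    h = np.tanh(U @ h + np.random.normal(size = units))  # a hidden state
    grad = (1 - h ** 2) * (U.T @ grad)                   # one backprop factor
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the norm collapses toward 0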

 

  - LSTM (Long Short-Term Memory)

  • The Long Short-Term Memory algorithm
  • Stores information for later use, preventing older signals from gradually fading out (a one-step sketch follows the link below)

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
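
  • A minimal NumPy sketch of one LSTM step (an added illustration following the common formulation, not Keras's exact internals): the cell state c is updated additively, which is what lets old signals survive many timesteps
# One LSTM step in NumPy
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    z = W @ x + U @ h + b                 # all four gate pre-activations at once
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    c = f * c + i * np.tanh(g)            # forget part of the old cell, write new
    h = o * np.tanh(c)                    # expose a filtered view of the cell
    return h, c

units, features = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size = (4 * units, features))
U = rng.normal(size = (4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
h, c = lstm_step(rng.normal(size = features), h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)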

 

  - Example 1) Reuters

  • A text dataset similar to IMDB
  • Consists of 46 mutually exclusive topics
    • A multi-class classification problem

  - Loading the dataset

from tensorflow.keras.datasets import reuters

num_words = 10000
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words = num_words)

print(x_train.shape) # (8982,)
print(y_train.shape) # (8982,)
print(x_test.shape)  # (2246,)
print(y_test.shape)  # (2246,)
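
  • The labels are integers 0-45, one topic per sample, so sparse_categorical_crossentropy can be used later without one-hot encoding; a quick sanity check (added here):
import numpy as np

print(len(np.unique(y_train)))  # 46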

 

  - Preprocessing and inspecting the data

from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len = 500

pad_x_train = pad_sequences(x_train, maxlen = max_len)
pad_x_test = pad_sequences(x_test, maxlen = max_len)

print(len(pad_x_train[0]))  # 500

pad_x_train[0]
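
  • To see words instead of indices, the word index can be inverted (a convenience snippet added here; Keras reserves indices 0-2 for padding/start/unknown, so word ids are offset by 3)
word_index = reuters.get_word_index()
index_word = {i + 3: w for w, i in word_index.items()}  # undo the reserved-id offset

decoded = ' '.join(index_word.get(i, '?') for i in x_train[0])
print(decoded[:100])  # beginning of the first newswire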

 

  - Building the model

  • Like SimpleRNN, the LSTM layer also accepts the return_sequences argument
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

model = Sequential()
model.add(Embedding(input_dim = num_words, output_dim = 64))
model.add(LSTM(64, return_sequences = True))
model.add(LSTM(32))
model.add(Dense(46, activation = 'softmax'))

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['acc'])
model.summary()

# Output
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_1 (Embedding)     (None, None, 64)          640000    
                                                                 
 lstm (LSTM)                 (None, None, 64)          33024     
                                                                 
 lstm_1 (LSTM)               (None, 32)                12416     
                                                                 
 dense (Dense)               (None, 46)                1518      
                                                                 
=================================================================
Total params: 686,958
Trainable params: 686,958
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

history = model.fit(pad_x_train, y_train,
                    epochs = 20,
                    batch_size = 32,
                    validation_split = 0.2)

 

  - Visualization

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['acc']
val_acc = history.history['val_acc']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'train loss')
plt.plot(epochs, val_loss, 'r:', label = 'validation loss')
plt.grid()
plt.legend()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'train accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'validation accuracy')
plt.grid()
plt.legend()

 

  - Evaluating the model

model.evaluate(pad_x_test, y_test)

# Output
loss: 1.6927 - acc: 0.6336
[1.692732810974121, 0.6335707902908325]

 

  - Example 2) IMDB dataset

  - Loading the data

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

num_words = 10000
max_len = 500
batch_size = 32

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = num_words)

pad_x_train = pad_sequences(x_train, maxlen = max_len)
pad_x_test = pad_sequences(x_test, maxlen = max_len)

 

  - Building the model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding

model = Sequential()
model.add(Embedding(num_words, 32))
model.add(LSTM(32))
model.add(Dense(1, activation = 'sigmoid'))

model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['acc'])
model.summary()

# Output
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_3 (Embedding)     (None, None, 32)          320000    
                                                                 
 lstm_3 (LSTM)               (None, 32)                8320      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 328,353
Trainable params: 328,353
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

history = model.fit(pad_x_train, y_train,
                    epochs = 10,
                    batch_size = 128,
                    validation_split = 0.2)

 

  - Visualization

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['acc']
val_acc = history.history['val_acc']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'train loss')
plt.plot(epochs, val_loss, 'r:', label = 'validation loss')
plt.grid()
plt.legend()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'train accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'validation accuracy')
plt.grid()
plt.legend()

 

  - Evaluating the model

model.evaluate(pad_x_test, y_test)

# Output
loss: 0.9135 - acc: 0.7898
[0.9135046601295471, 0.7898399829864502]
  • Compared with the earlier SimpleRNN result (loss 0.6755, acc 0.7756), the LSTM achieves better accuracy

 

 

3. A Recurrent Network on a Cosine Function

# Cosine time-series data
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(111)
time = np.arange(30 * 12 + 1)
month_time = (time % 30) / 30
time_series = 20 * np.where(month_time < 0.5,
                            np.cos(2 * np.pi * month_time),
                            np.cos(2 * np.pi * month_time) + np.random.random(361))
plt.figure(figsize = (15, 8))
plt.xlabel('Time')
plt.ylabel('Value')
plt.plot(np.arange(0, 30 * 11 + 1),
         time_series[:30 * 11 + 1],
         color = 'blue', alpha = 0.6, label = 'Train Data')
plt.plot(np.arange(30 * 11, 30 * 12 + 1),
         time_series[30 * 11:],
         color = 'orange', label = 'Test Data')
plt.show()

 

  - Preprocessing the data

def make_data(time_series, n):
    x_train_full, y_train_full = list(), list()

    for i in range(len(time_series)):
        x = time_series[i:(i + n)]
        if (i + n) < len(time_series):
            x_train_full.append(x)
            y_train_full.append(time_series[i + n])
        else:
            break
    
    x_train_full, y_train_full = np.array(x_train_full), np.array(y_train_full)

    return x_train_full, y_train_full

n = 10
x_train_full, y_train_full = make_data(time_series, n)

print(x_train_full.shape) # (351, 10)
print(y_train_full.shape) # (351,)


# Add a trailing feature dimension of 1
x_train_full = x_train_full.reshape(-1, n, 1)
y_train_full = y_train_full.reshape(-1, 1)

print(x_train_full.shape) # (351, 10, 1)
print(y_train_full.shape) # (351, 1)

 

  - Creating the test dataset

# Split into train and test data
x_train = x_train_full[:30 * 11]
y_train = y_train_full[:30 * 11]

x_test = x_train_full[30 * 11:]
y_test = y_train_full[30 * 11:]

print(x_train.shape) # (330, 10, 1)
print(y_train.shape) # (330, 1)
print(x_test.shape)  # (21, 10, 1)
print(y_test.shape)  # (21, 1)

 

  - Inspecting the data

sample_series = np.arange(100)
a, b = make_data(sample_series, 10)

print(a[0])  # [0 1 2 3 4 5 6 7 8 9]
print(b[0])  # 10

 

  - Building the model

from tensorflow.keras.layers import SimpleRNN, Flatten, Dense
from tensorflow.keras.models import Sequential

def build_model(n):
    model = Sequential()

    model.add(SimpleRNN(units = 32, activation = 'tanh', input_shape = (n, 1)))
    model.add(Dense(1))

    model.compile(optimizer = 'adam',
                  loss = 'mse')
    return model

model = build_model(10)
model.summary()

# Output
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 simple_rnn (SimpleRNN)      (None, 32)                1088      
                                                                 
 dense_3 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 1,121
Trainable params: 1,121
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

model.fit(x_train, y_train,
          epochs = 100, batch_size = 12)

 

  - Plotting the predictions

prediction = model.predict(x_test)

pred_range = np.arange(len(y_train), len(y_train) + len(prediction))

plt.figure(figsize = (12, 5))
plt.xlabel('Time')
plt.ylabel('Value')
plt.plot(pred_range, y_test.flatten(), color = 'orange', label = 'Ground Truth')
plt.plot(pred_range, prediction.flatten(), color = 'blue', label = 'Prediction')
plt.legend()
plt.show()

 

  - Rebuilding the model

  • Using LSTM
from tensorflow.keras.layers import LSTM

def build_model2(n):
    model = Sequential()

    model.add(LSTM(units = 64, return_sequences = True, input_shape = (n, 1)))
    model.add(LSTM(32))
    model.add(Dense(1))

    model.compile(optimizer = 'adam',
                  loss = 'mse')
    return model

model_2 = build_model2(10)
model_2.summary()

# Output
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_4 (LSTM)               (None, 10, 64)            16896     
                                                                 
 lstm_5 (LSTM)               (None, 32)                12416     
                                                                 
 dense_5 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 29,345
Trainable params: 29,345
Non-trainable params: 0
_________________________________________________________________

 

  - Retraining the model and plotting predictions

model_2.fit(x_train, y_train,
            epochs = 100, batch_size = 12)

prediction_2 = model_2.predict(x_test)

pred_range = np.arange(len(y_train), len(y_train) + len(prediction_2))

plt.figure(figsize = (12, 5))
plt.xlabel('Time')
plt.ylabel('Value')
plt.plot(pred_range, y_test.flatten(), color = 'orange', label = 'Ground Truth')
plt.plot(pred_range, prediction.flatten(), 'r:', label = 'Model1 Prediction')
plt.plot(pred_range, prediction_2.flatten(), color = 'blue', label = 'Model2 Prediction')
plt.legend()
plt.show()

 

  - Rebuilding the model

  • Using GRU (a simpler structure than LSTM; see the sketch after the summary below)
from tensorflow.keras.layers import GRU

def build_model3(n):
    model = Sequential()

    model.add(GRU(units = 30, return_sequences = True, input_shape = (n, 1)))
    model.add(GRU(30))
    model.add(Dense(1))

    model.compile(optimizer = 'adam',
                  loss = 'mse')
    return model

model_3 = build_model3(10)
model_3.summary()

# Output
Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 gru (GRU)                   (None, 10, 30)            2970      
                                                                 
 gru_1 (GRU)                 (None, 30)                5580      
                                                                 
 dense_6 (Dense)             (None, 1)                 31        
                                                                 
=================================================================
Total params: 8,581
Trainable params: 8,581
Non-trainable params: 0
_________________________________________________________________
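
  • A minimal NumPy sketch of one GRU step (an added illustration using the common formulation, not Keras's exact reset_after variant): two gates and no separate cell state, hence fewer parameters than an LSTM of the same width
# One GRU step in NumPy
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # blend old and new state

units, features = 4, 1
rng = np.random.default_rng(0)
Wz, Wr, Wh = (rng.normal(size = (units, features)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size = (units, units)) for _ in range(3))

h = np.zeros(units)
h = gru_step(np.array([0.5]), h, Wz, Uz, Wr, Ur, Wh, Uh)
print(h.shape)  # (4,)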

 

  - Retraining the model and plotting predictions

model_3.fit(x_train, y_train,
            epochs = 100, batch_size = 12)

prediction_3 = model_3.predict(x_test)

pred_range = np.arange(len(y_train), len(y_train) + len(prediction_3))

plt.figure(figsize = (12, 5))
plt.xlabel('Time')
plt.ylabel('Value')
plt.plot(pred_range, y_test.flatten(), color = 'orange', label = 'Ground Truth')
plt.plot(pred_range, prediction.flatten(), 'r:', label = 'Model1 Prediction')
plt.plot(pred_range, prediction_2.flatten(), color = 'blue', label = 'Model2 Prediction')
plt.plot(pred_range, prediction_3.flatten(), color = 'green', label = 'Model3 Prediction')
plt.legend()
plt.show()

 

  - Conv1D

  • Performs well on simple problems such as text classification and time-series forecasting, as well as audio generation and machine translation
  • Not sensitive to the order of timesteps
  • 2D convolution
    • Recognizes local features
  • 1D convolution
    • Recognizes local context in a sequence

 

  - Conv1D Layer

  • Input: (batch_size, timesteps, channels)
  • Output: (batch_size, timesteps, filters)
  • Because the parameter count does not grow sharply with the filter size, a variety of kernel sizes can be used
  • If the data quality is good, it may not even be necessary to combine several different sizes (see the shape check below)
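
  • A quick shape check (an added illustration): with the default 'valid' padding, the timestep axis shrinks by kernel_size - 1
import tensorflow as tf
from tensorflow.keras.layers import Conv1D

x = tf.random.normal((2, 100, 8))             # (batch_size, timesteps, channels)
y = Conv1D(filters = 16, kernel_size = 7)(x)  # 'valid' padding by default
print(y.shape)                                # (2, 94, 16): 100 - 7 + 1 = 94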

 

  - MaxPooling1D Layer

  • Provides a downsampling effect
  • Simply the 1D counterpart of MaxPooling2D

 

  - GlobalMaxPooling1D Layer

  • Collapses the 2D (timesteps, channels) output to 1D, keeping only the batch dimension
  • A Flatten layer can be used instead (see the comparison below)
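
  • A small shape comparison (an added illustration): MaxPooling1D downsamples locally, GlobalMaxPooling1D keeps one max per channel, and Flatten keeps every timestep
import tensorflow as tf
from tensorflow.keras.layers import MaxPooling1D, GlobalMaxPooling1D, Flatten

x = tf.random.normal((2, 20, 32))        # (batch_size, timesteps, channels)
print(MaxPooling1D(5)(x).shape)          # (2, 4, 32): local downsampling
print(GlobalMaxPooling1D()(x).shape)     # (2, 32): one max per channel
print(Flatten()(x).shape)                # (2, 640): all timesteps kept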

 

  - IMDB dataset

  - Loading and preprocessing the data

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.layers import Dense, Embedding, Conv1D, MaxPooling1D, GlobalMaxPooling1D

num_words = 10000
max_len = 500
batch_size = 32

(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words = num_words)

print(len(input_train))  # 25000
print(len(input_test))   # 25000

pad_x_train = pad_sequences(input_train, maxlen = max_len)
pad_x_test = pad_sequences(input_test, maxlen = max_len)

print(pad_x_train.shape) # (25000, 500)
print(pad_x_test.shape)  # (25000, 500)

 

  - Building the model

def build_model():
    model = Sequential()

    model.add(Embedding(input_dim = num_words, output_dim = 32,
                        input_length = max_len))
    model.add(Conv1D(32, 7, activation = 'relu'))
    model.add(MaxPooling1D(7))
    model.add(Conv1D(32, 5, activation = 'relu'))
    model.add(MaxPooling1D(5))
    model.add(GlobalMaxPooling1D())
    model.add(Dense(1, activation = 'sigmoid'))

    model.compile(optimizer = RMSprop(learning_rate = 1e-4),
                  loss ='binary_crossentropy',
                  metrics = ['accuracy'])
    
    return model

model = build_model()
model.summary()

# Output
Model: "sequential_13"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_5 (Embedding)     (None, 500, 32)           320000    
                                                                 
 conv1d_2 (Conv1D)           (None, 494, 32)           7200      
                                                                 
 max_pooling1d_2 (MaxPooling  (None, 70, 32)           0         
 1D)                                                             
                                                                 
 conv1d_3 (Conv1D)           (None, 66, 32)            5152      
                                                                 
 max_pooling1d_3 (MaxPooling  (None, 13, 32)           0         
 1D)                                                             
                                                                 
 global_max_pooling1d_1 (Glo  (None, 32)               0         
 balMaxPooling1D)                                                
                                                                 
 dense_12 (Dense)            (None, 1)                 33        
                                                                 
=================================================================
Total params: 332,385
Trainable params: 332,385
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

history = model.fit(pad_x_train, y_train,
                    epochs = 30,
                    batch_size = 128,
                    validation_split = 0.2)

 

  - Visualization

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'train loss')
plt.plot(epochs, val_loss, 'r:', label = 'validation loss')
plt.grid()
plt.legend()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'train accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'validation accuracy')
plt.grid()
plt.legend()

model.evaluate(pad_x_test, y_test)

# Output
loss: 0.3534 - accuracy: 0.8526
[0.35335206985473633, 0.8525999784469604]
  • The model overfits, but there are various things to try, such as a different optimizer or adding regularization (e.g. the sketch below)
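
  • A hedged sketch (an added example, not a tuned model) adding dropout and L2 weight decay to the same architecture, reusing num_words and max_len from above:
from tensorflow.keras.layers import Dropout
from tensorflow.keras.regularizers import l2

def build_regularized_model():
    model = Sequential()

    model.add(Embedding(input_dim = num_words, output_dim = 32,
                        input_length = max_len))
    model.add(Conv1D(32, 7, activation = 'relu', kernel_regularizer = l2(1e-4)))
    model.add(MaxPooling1D(7))
    model.add(Dropout(0.5))  # randomly drop features during training
    model.add(Conv1D(32, 5, activation = 'relu', kernel_regularizer = l2(1e-4)))
    model.add(MaxPooling1D(5))
    model.add(GlobalMaxPooling1D())
    model.add(Dense(1, activation = 'sigmoid'))

    model.compile(optimizer = 'adam',  # e.g. try a different optimizer
                  loss = 'binary_crossentropy',
                  metrics = ['accuracy'])

    return model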
