1. Recurrent Neural Network (RNN)

  • A type of neural network that contains a loop
  • Iterates over the elements of a sequence while storing the information processed so far in a state

https://aditi-mittal.medium.com/understanding-rnn-and-lstm-f7cdf6dfc14e

 

  - RNN Layer

  • Input: (timesteps, input_features)
  • Output: (timesteps, output_features)
# expressing the RNN structure with numpy
import numpy as np

timesteps = 100
input_features = 32
output_features = 64

inputs = np.random.random((timesteps, input_features))

state_t = np.zeros((output_features, ))

W = np.random.random((output_features, input_features))
U = np.random.random((output_features, output_features))
b = np.random.random((output_features, ))

successive_outputs = []

for input_t in inputs:
    output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    successive_outputs.append(output_t)
    state_t = output_t

final_output_sequence = np.stack(successive_outputs, axis = 0)

 

  - Recurrent layers in Keras

  • SimpleRNN layer
  • Input: (batch_size, timesteps, input_features)
  • Output
    • Determined by the return_sequences argument
    • 3D tensor
      • Returns the full sequence of outputs for all timesteps
      • (batch_size, timesteps, output_features)
    • 2D tensor
      • Returns only the last output for each input sequence
      • (batch_size, output_features)
from tensorflow.keras.layers import SimpleRNN, Embedding
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32))  # adding return_sequences = True inside SimpleRNN would make it return the full sequence
model.summary()

# Output
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, None, 32)          320000    
                                                                 
 simple_rnn (SimpleRNN)      (None, 32)                2080      
                                                                 
=================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________
  • Stacking several recurrent layers one after another can be useful to increase the network's representational power
    • In such a setup, the intermediate layers must return their full output sequences
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences = True))
model.add(SimpleRNN(32, return_sequences = True))
model.add(SimpleRNN(32, return_sequences = True))
model.add(SimpleRNN(32))
model.summary()

# Output
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_2 (Embedding)     (None, None, 32)          320000    
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, None, 32)          2080      
                                                                 
 simple_rnn_3 (SimpleRNN)    (None, None, 32)          2080      
                                                                 
 simple_rnn_4 (SimpleRNN)    (None, None, 32)          2080      
                                                                 
 simple_rnn_5 (SimpleRNN)    (None, 32)                2080      
                                                                 
=================================================================
Total params: 328,320
Trainable params: 328,320
Non-trainable params: 0
_________________________________________________________________

 

  - Applying it to the IMDB data

  - Loading the data

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

num_words = 10000
max_len = 500
batch_size = 32

(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words = num_words)
print(len(input_train))  # 25000
print(len(input_test))   # 25000

input_train = sequence.pad_sequences(input_train, maxlen = max_len)
input_test = sequence.pad_sequences(input_test, maxlen = max_len)
print(input_train.shape) # (25000, 500)
print(input_test.shape)  # (25000, 500)

 

  - Building the model

from tensorflow.keras.layers import Dense

model = Sequential()

model.add(Embedding(num_words, 32))
model.add(SimpleRNN(32))
model.add(Dense(1, activation = 'sigmoid'))

model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['acc'])

model.summary()

# Output
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_3 (Embedding)     (None, None, 32)          320000    
                                                                 
 simple_rnn_6 (SimpleRNN)    (None, 32)                2080      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
=================================================================
Total params: 322,113
Trainable params: 322,113
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

history = model.fit(input_train, y_train,
                    epochs = 10,
                    batch_size = 128,
                    validation_split = 0.2)

 

 

  - Visualization

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['acc']
val_acc = history.history['val_acc']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'train loss')
plt.plot(epochs, val_loss, 'r:', label = 'validation loss')
plt.grid()
plt.legend()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'train accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'validation accuracy')
plt.grid()
plt.legend()

model.evaluate(input_test, y_test)

# Output
loss: 0.6755 - acc: 0.7756
[0.6754735112190247, 0.7755600214004517]
  • Performance is modest because only 500 words per review were fed in rather than the full sequences
  • SimpleRNN is not well suited to processing long sequences

 

 

2. LSTM and GRU Layers

  • SimpleRNN is too simple to be of much practical use
  • In theory SimpleRNN can retain information from all previous timesteps at time t, but in practice it cannot learn such long-range dependencies
  • The vanishing gradient problem (see the short numpy sketch below)
    • Layers such as LSTM and GRU were introduced to address this
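  • A minimal, hypothetical numpy sketch (not part of the original notes) of why this happens: backpropagating through many timesteps multiplies the gradient again and again by the recurrent weights and the tanh derivative, so its norm collapses toward zero
# illustrative sketch of the vanishing gradient in a simple RNN (assumed toy sizes)
import numpy as np

np.random.seed(0)
hidden = 32
U = np.random.randn(hidden, hidden) * 0.1        # small recurrent weight matrix (assumption)
grad = np.ones(hidden)                           # gradient arriving at the last timestep

for t in range(100):                             # backpropagate through 100 timesteps
    h = np.random.randn(hidden)                  # stand-in for the hidden pre-activation
    grad = U.T @ (grad * (1 - np.tanh(h) ** 2))  # chain rule through tanh and U

print(np.linalg.norm(grad))                      # essentially 0: early timesteps receive no learning signal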

 

  - LSTM (Long Short-Term Memory)

  • The Long Short-Term Memory algorithm
  • Saves information for later use, preventing older signals from gradually fading away (a rough single-step sketch follows the link below)

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
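  • As a rough sketch of the idea (the standard LSTM formulation, written here for illustration and not taken from these notes; biases are omitted), a single LSTM step mixes the new input and the previous state through input/forget/output gates and a separate carry (cell) state
# minimal single-timestep LSTM sketch (illustrative only; weight names are assumptions)
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    Wf, Wi, Wo, Wc, Uf, Ui, Uo, Uc = params
    f = sigmoid(Wf @ x_t + Uf @ h_prev)                    # forget gate
    i = sigmoid(Wi @ x_t + Ui @ h_prev)                    # input gate
    o = sigmoid(Wo @ x_t + Uo @ h_prev)                    # output gate
    c = f * c_prev + i * np.tanh(Wc @ x_t + Uc @ h_prev)   # carry/cell state keeps long-term information
    h = o * np.tanh(c)                                     # new hidden state / output
    return h, c

dim, units = 4, 8
params = [np.random.randn(units, dim) for _ in range(4)] + [np.random.randn(units, units) for _ in range(4)]
h, c = lstm_step(np.random.randn(dim), np.zeros(units), np.zeros(units), params)
print(h.shape, c.shape)  # (8,) (8,)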

 

  - Example 1) Reuters

  • A text dataset similar to IMDB
  • Newswires labeled with 46 mutually exclusive topics
    • A multi-class classification problem

  - Loading the dataset

from tensorflow.keras.datasets import reuters

num_words = 10000
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words = num_words)

print(x_train.shape) # (8982,)
print(y_train.shape) # (8982,)
print(x_test.shape)  # (2246,)
print(y_test.shape)  # (2246,)

 

  - Preprocessing and checking the data

from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len = 500

pad_x_train = pad_sequences(x_train, maxlen = max_len)
pad_x_test = pad_sequences(x_test, maxlen = max_len)

print(len(pad_x_train[0]))  # 500

pad_x_train[0]

 

  - Building the model

  • Like SimpleRNN, the LSTM layer also accepts the return_sequences argument
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

model = Sequential()
model.add(Embedding(input_dim = num_words, output_dim = 64))
model.add(LSTM(64, return_sequences = True))
model.add(LSTM(32))
model.add(Dense(46, activation = 'softmax'))

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['acc'])
model.summary()

# Output
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_1 (Embedding)     (None, None, 64)          640000    
                                                                 
 lstm (LSTM)                 (None, None, 64)          33024     
                                                                 
 lstm_1 (LSTM)               (None, 32)                12416     
                                                                 
 dense (Dense)               (None, 46)                1518      
                                                                 
=================================================================
Total params: 686,958
Trainable params: 686,958
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

history = model.fit(pad_x_train, y_train,
                    epochs = 20,
                    batch_size = 32,
                    validation_split = 0.2)

 

  - Visualization

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['acc']
val_acc = history.history['val_acc']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'train loss')
plt.plot(epochs, val_loss, 'r:', label = 'validation loss')
plt.grid()
plt.legend()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'train accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'validation accuracy')
plt.grid()
plt.legend()

 

  - Evaluating the model

model.evaluate(pad_x_test, y_test)

# Output
loss: 1.6927 - acc: 0.6336
[1.692732810974121, 0.6335707902908325]

 

  - Example 2) IMDB dataset

  - Loading the data

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

num_words = 10000
max_len = 500
batch_size = 32

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = num_words)

pad_x_train = pad_sequences(x_train, maxlen = max_len)
pad_x_test = pad_sequences(x_test, maxlen = max_len)

 

  - Building the model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding

model = Sequential()
model.add(Embedding(num_words, 32))
model.add(LSTM(32))
model.add(Dense(1, activation = 'sigmoid'))

model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['acc'])
model.summary()

# Output
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_3 (Embedding)     (None, None, 32)          320000    
                                                                 
 lstm_3 (LSTM)               (None, 32)                8320      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 328,353
Trainable params: 328,353
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

history = model.fit(pad_x_train, y_train,
                    epochs = 10,
                    batch_size = 128,
                    validation_split = 0.2)

 

  - Visualization

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['acc']
val_acc = history.history['val_acc']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'train loss')
plt.plot(epochs, val_loss, 'r:', label = 'validation loss')
plt.grid()
plt.legend()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'train accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'validation accuracy')
plt.grid()
plt.legend()

 

  - Evaluating the model

model.evaluate(pad_x_test, y_test)

# Output
loss: 0.9135 - acc: 0.7898
[0.9135046601295471, 0.7898399829864502]
  • Compared with the earlier SimpleRNN model (loss 0.6755, acc 0.7756), the LSTM reaches a better accuracy

 

 

3. A Recurrent Network Using a Cosine Function

# cosine time-series data
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(111)
time = np.arange(30 * 12 + 1)
month_time = (time % 30) / 30
time_series = 20 * np.where(month_time < 0.5,
                            np.cos(2 * np.pi * month_time),
                            np.cos(2 * np.pi * month_time) + np.random.random(361))
plt.figure(figsize = (15, 8))
plt.xlabel('Time')
plt.ylabel('Value')
plt.plot(np.arange(0, 30 * 11 + 1),
         time_series[:30 * 11 + 1],
         color = 'blue', alpha = 0.6, label = 'Train Data')
plt.plot(np.arange(30 * 11, 30 * 12 + 1),
         time_series[30 * 11:],
         color = 'orange', label = 'Test Data')
plt.show()

 

  - Preprocessing the data

def make_data(time_series, n):
    x_train_full, y_train_full = list(), list()

    for i in range(len(time_series)):
        x = time_series[i:(i + n)]
        if (i + n) < len(time_series):
            x_train_full.append(x)
            y_train_full.append(time_series[i + n])
        else:
            break
    
    x_train_full, y_train_full = np.array(x_train_full), np.array(y_train_full)

    return x_train_full, y_train_full

n = 10
x_train_full, y_train_full = make_data(time_series, n)

print(x_train_full.shape) # (351, 10)
print(y_train_full.shape) # (351,)


# add a trailing feature dimension of size 1
x_train_full = x_train_full.reshape(-1, n, 1)
y_train_full = y_train_full.reshape(-1, 1)

print(x_train_full.shape) # (351, 10, 1)
print(y_train_full.shape) # (351, 1)

 

  - Creating the test dataset

x_train_full = x_train_full.reshape(-1, n, 1)
y_train_full = y_train_full.reshape(-1, 1)

print(x_train_full.shape)
print(y_train_full.shape)


# split into train and test data
x_train = x_train_full[:30 * 11]
y_train = y_train_full[:30 * 11]

x_test = x_train_full[30 * 11:]
y_test = y_train_full[30 * 11:]

print(x_train.shape) # (330, 10, 1)
print(y_train.shape) # (330, 1)
print(x_test.shape)  # (21, 10, 1)
print(y_test.shape)  # (21, 1)

 

  - Checking the data

sample_series = np.arange(100)
a, b = make_data(sample_series, 10)

print(a[0])  # [0 1 2 3 4 5 6 7 8 9]
print(b[0])  # 10

 

  - Building the model

from tensorflow.keras.layers import SimpleRNN, Flatten, Dense
from tensorflow.keras.models import Sequential

def build_model(n):
    model = Sequential()

    model.add(SimpleRNN(units = 32, activation = 'tanh', input_shape = (n, 1)))
    model.add(Dense(1))

    model.compile(optimizer = 'adam',
                  loss = 'mse')
    return model

model = build_model(10)
model.summary()

# Output
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 simple_rnn (SimpleRNN)      (None, 32)                1088      
                                                                 
 dense_3 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 1,121
Trainable params: 1,121
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

model.fit(x_train, y_train,
          epochs = 100, batch_size = 12)

 

  - Plotting the predictions

prediction = model.predict(x_test)

pred_range = np.arange(len(y_train), len(y_train) + len(prediction))

plt.figure(figsize = (12, 5))
plt.xlabel('Time')
plt.ylabel('Value')
plt.plot(pred_range, y_test.flatten(), color = 'orange', label = 'Ground Truth')
plt.plot(pred_range, prediction.flatten(), color = 'blue', label = 'Prediction')
plt.legend()
plt.show()

 

  - Rebuilding the model

  • Using LSTM
from tensorflow.keras.layers import LSTM

def build_model2(n):
    model = Sequential()

    model.add(LSTM(units = 64, return_sequences = True, input_shape = (n, 1)))
    model.add(LSTM(32))
    model.add(Dense(1))

    model.compile(optimizer = 'adam',
                  loss = 'mse')
    return model

model2 = build_model2(10)
model2.summary()

# Output
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_4 (LSTM)               (None, 10, 64)            16896     
                                                                 
 lstm_5 (LSTM)               (None, 32)                12416     
                                                                 
 dense_5 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 29,345
Trainable params: 29,345
Non-trainable params: 0
_________________________________________________________________

 

  - Retraining the model and plotting the predictions

model2.fit(x_train, y_train,
           epochs = 100, batch_size = 12)

prediction_2 = model2.predict(x_test)

pred_range = np.arange(len(y_train), len(y_train) + len(prediction_2))

plt.figure(figsize = (12, 5))
plt.xlabel('Time')
plt.ylabel('Value')
plt.plot(pred_range, y_test.flatten(), color = 'orange', label = 'Ground Truth')
plt.plot(pred_range, prediction.flatten(), 'r:', label = 'Model1 Prediction')
plt.plot(pred_range, prediction_2.flatten(), color = 'blue', label = 'Model2 Prediction')
plt.legend()
plt.show()

 

  - Rebuilding the model

  • Using GRU (a simpler structure than LSTM)
from tensorflow.keras.layers import GRU

def build_model3(n):
    model = Sequential()

    model.add(GRU(units = 30, return_sequences = True, input_shape = (n, 1)))
    model.add(GRU(30))
    model.add(Dense(1))

    model.compile(optimizer = 'adam',
                  loss = 'mse')
    return model

model_3 = build_model3(10)
model_3.summary()

# Output
Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 gru (GRU)                   (None, 10, 30)            2970      
                                                                 
 gru_1 (GRU)                 (None, 30)                5580      
                                                                 
 dense_6 (Dense)             (None, 1)                 31        
                                                                 
=================================================================
Total params: 8,581
Trainable params: 8,581
Non-trainable params: 0
_________________________________________________________________

 

  - Retraining the model and plotting the predictions

model_3.fit(x_train, y_train,
           epochs = 100, batch_size = 12)

prediction_3 = model_3.predict(x_test)

pred_range = np.arange(len(y_train), len(y_train) + len(prediction_3))

plt.figure(figsize = (12, 5))
plt.xlabel('Time')
plt.ylabel('Value')
plt.plot(pred_range, y_test.flatten(), color = 'orange', label = 'Ground Truth')
plt.plot(pred_range, prediction.flatten(), 'r:', label = 'Model1 Prediction')
plt.plot(pred_range, prediction_2.flatten(), color = 'blue', label = 'Model2 Prediction')
plt.plot(pred_range, prediction_3.flatten(), color = 'green', label = 'Model3 Prediction')
plt.legend()
plt.show()

 

  - Conv1D

  • Performs well on simple problems such as text classification and time-series forecasting, and also on audio generation and machine translation
  • Not sensitive to the order of timesteps
  • 2D convolution
    • Recognizes local features
  • 1D convolution
    • Recognizes context

 

  - Conv1D Layer

  • Input: (batch_size, timesteps, channels)
  • Output: (batch_size, timesteps, filters)
  • Various filter sizes can be used because the model does not grow drastically as the filter size increases
  • If the data quality is good, using several different filter sizes may not be necessary

 

  - MaxPooling1D Layer

  • Gives a downsampling effect
  • Simply the 1D version of max pooling

 

  - GlobalMaxPooling1D Layer

  • A layer that collapses the 2D per-sample shape into a 1D shape, excluding the batch dimension
  • A Flatten layer can be used instead (see the shape check below)
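  • A quick shape check (toy numbers, assumed for illustration) of how Conv1D, MaxPooling1D, GlobalMaxPooling1D and Flatten transform a (batch_size, timesteps, channels) input
# illustrative shape check for the 1D layers (toy sizes)
import numpy as np
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D, Flatten

x = np.random.random((2, 500, 32)).astype('float32')  # (batch_size, timesteps, channels)

y = Conv1D(16, 7, activation = 'relu')(x)   # (2, 494, 16): timesteps shrink by kernel_size - 1
p = MaxPooling1D(5)(y)                      # (2, 98, 16): downsampled by the pool size
g = GlobalMaxPooling1D()(p)                 # (2, 16): one value per filter
f = Flatten()(p)                            # (2, 1568): all positions concatenated instead

print(y.shape, p.shape, g.shape, f.shape)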

 

  - IMDB dataset

  - Loading and preprocessing the data

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.layers import Dense, Embedding, Conv1D, MaxPooling1D, GlobalMaxPooling1D

num_words = 10000
max_len = 500
batch_size = 32

(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words = num_words)

print(len(input_train))  # 25000
print(len(input_test))   # 25000

pad_x_train = pad_sequences(input_train, maxlen = max_len)
pad_x_test = pad_sequences(input_test, maxlen = max_len)

print(pad_x_train.shape) # (25000, 500)
print(pad_x_test.shape)  # (25000, 500)

 

  - Building the model

def build_model():
    model = Sequential()

    model.add(Embedding(input_dim = num_words, output_dim = 32,
                        input_length = max_len))
    model.add(Conv1D(32, 7, activation = 'relu'))
    model.add(MaxPooling1D(7))
    model.add(Conv1D(32, 5, activation = 'relu'))
    model.add(MaxPooling1D(5))
    model.add(GlobalMaxPooling1D())
    model.add(Dense(1, activation = 'sigmoid'))

    model.compile(optimizer = RMSprop(learning_rate = 1e-4),
                  loss ='binary_crossentropy',
                  metrics = ['accuracy'])
    
    return model

model = build_model()
model.summary()

# Output
Model: "sequential_13"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_5 (Embedding)     (None, 500, 32)           320000    
                                                                 
 conv1d_2 (Conv1D)           (None, 494, 32)           7200      
                                                                 
 max_pooling1d_2 (MaxPooling  (None, 70, 32)           0         
 1D)                                                             
                                                                 
 conv1d_3 (Conv1D)           (None, 66, 32)            5152      
                                                                 
 max_pooling1d_3 (MaxPooling  (None, 13, 32)           0         
 1D)                                                             
                                                                 
 global_max_pooling1d_1 (Glo  (None, 32)               0         
 balMaxPooling1D)                                                
                                                                 
 dense_12 (Dense)            (None, 1)                 33        
                                                                 
=================================================================
Total params: 332,385
Trainable params: 332,385
Non-trainable params: 0
_________________________________________________________________

 

  - Training the model

history = model.fit(pad_x_train, y_train,
                    epochs = 30,
                    batch_size = 128,
                    validation_split = 0.2)

 

  - Visualization

import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'train loss')
plt.plot(epochs, val_loss, 'r:', label = 'validation loss')
plt.grid()
plt.legend()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'train accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'validation accuracy')
plt.grid()
plt.legend()

model.evaluate(pad_x_test, y_test)

# Output
loss: 0.3534 - accuracy: 0.8526
[0.35335206985473633, 0.8525999784469604]
  • The model overfits, but there is room to experiment further, e.g. with a different optimizer or by adding regularization

9. Training Word2Vec Directly in Keras

  - Preparing the data

from tensorflow.keras.datasets import imdb

(x_train, y_train), (x_test, y_test) = imdb.load_data()
  • Build a dictionary mapping word indices to words
  • Index 1 is reserved for the start of a sentence and index 2 for out-of-vocabulary (OOV) words
word_index = imdb.get_word_index()
index_word = {idx + 3 : word for word, idx in word_index.items()}

index_word[1] = '<START>'
index_word[2] = '<UNKNOWN>'

' '.join(index_word[i] for i in x_train[0])

# Output
"<START> this film was just brilliant casting location scenery story direction everyone's really
suited the part they played and you could just imagine being there robert redford's is an
amazing actor and now the same being director norman's father came from the same scottish
island as myself so i loved the fact there was a real connection with this film the witty
remarks throughout the film were great it was just brilliant so much that i bought the film
as soon as it was released for retail and would recommend it to everyone to watch and the fly
fishing was amazing really cried at the end it was so sad and you know what they say if you
cry at a film it must have been good and this definitely was also congratulations to the two
little boy's that played the part's of norman and paul they were just brilliant children are
often left out of the praising list i think because the stars that play them all grown up are
such a big profile for the whole film but these children are amazing and should be praised for
what they have done don't you think the whole story was so lovely because it was true and was
someone's life after all that was shared with us all"
num_words = max(index_word) + 1

 

  - Converting the texts to word indices

texts = []
for data in x_train:
    text = ' '.join(index_word[i] for i in data)
    texts.append(text)

len(texts)  # 25000
  • Use a Tokenizer to convert the texts into word indices
from tensorflow.keras.preprocessing.text import Tokenizer

tok = Tokenizer()
tok.fit_on_texts(texts)

new_data = tok.texts_to_sequences(texts)
new_data[0][:10]

# Output
[28, 11, 19, 13, 41, 526, 968, 1618, 1381, 63]
# every review is tokenized; the review above is converted to these token indices and the first 10 are printed

 

  - Building word pairs

from tensorflow.keras.preprocessing.sequence import make_sampling_table, skipgrams

# total number of tokens
VOCAB_SIZE = len(tok.word_index)
print(VOCAB_SIZE)  # 88581
  • If words were sampled purely at random, frequent words would be drawn far more often
  • To prevent this, create a sampling table that balances the probability of drawing each word
table = make_sampling_table(VOCAB_SIZE)
  • Generate the data by drawing pairs of words and checking whether they occur within two words of each other (window_size = 2)
couples, labels = skipgrams(data, VOCAB_SIZE, window_size = 2, sampling_table = table)
couples[:5]

# Output
[[16876, 497], [9685, 21], [16876, 21917], [383, 5452], [2098, 13577]]
  • labels contains 1 if the pair occurs within the window and 0 otherwise
labels[:5]

# Output
[1, 1, 0, 0, 0]
  • Collect the target words into word_target and the context words into word_context
word_target, word_context = zip(*couples)
  • Convert to arrays
word_target = np.asarray(word_target, dtype = 'int32')
word_context = np.asarray(word_context, dtype = 'int32')
labels = np.asarray(labels, dtype = 'int32')

word_target.shape    # (288,)
word_context.shape   # (288,)

 

  - The skip-gram model

  • The skip-gram model has to use the functional API because it takes two inputs
from tensorflow.keras.layers import Activation, Dot, Embedding, Flatten, Input, Reshape
from tensorflow.keras.models import Model

def build_model():
    input_target = Input(shape = (1, ))
    input_context = Input(shape = (1, ))

    emb = Embedding(input_dim = VOCAB_SIZE, output_dim = 8)
    target = emb(input_target)
    context = emb(input_context)

    dot = Dot(axes = 2)([target, context])
    flatten = Reshape((1, ))(dot)
    output = Activation('sigmoid')(flatten)
    skipgram = Model(inputs = [input_target, input_context], outputs = output)

    return skipgram

model = build_model()
model.summary()

# Output
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_3 (InputLayer)           [(None, 1)]          0           []                               
                                                                                                  
 input_4 (InputLayer)           [(None, 1)]          0           []                               
                                                                                                  
 embedding_5 (Embedding)        (None, 1, 8)         708648      ['input_3[0][0]',                
                                                                  'input_4[0][0]']                
                                                                                                  
 dot (Dot)                      (None, 1, 1)         0           ['embedding_5[0][0]',            
                                                                  'embedding_5[1][0]']            
                                                                                                  
 reshape (Reshape)              (None, 1)            0           ['dot[0][0]']                    
                                                                                                  
 activation (Activation)        (None, 1)            0           ['reshape[0][0]']                
                                                                                                  
==================================================================================================
Total params: 708,648
Trainable params: 708,648
Non-trainable params: 0
__________________________________________________________________________________________________

 

  - Compiling and training the model

from tensorflow.keras.optimizers import Adam

model.compile(optimizer = Adam(),
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])

model.fit([word_target, word_context], labels, epochs = 30)

 

  - Saving and loading the embedding layer

emb = model.layers[2]
emb.get_weights()

# Output
[array([[ 0.01938832,  0.01921825, -0.0462908 , ...,  0.01147114,
         -0.04764376,  0.01121316],
        [-0.01068624, -0.04315212,  0.00839611, ..., -0.02030395,
         -0.02321514, -0.03680412],
        [ 0.00915837,  0.00973357,  0.00904005, ...,  0.01291057,
          0.04295233,  0.0488804 ],
        ...,
        [ 0.01314208,  0.02786795,  0.01130085, ...,  0.03705814,
          0.0427903 ,  0.0109529 ],
        [-0.03585767, -0.04641544, -0.02590518, ..., -0.00451361,
         -0.03019956,  0.01893195],
        [ 0.00769577, -0.02014879, -0.03623866, ..., -0.03457584,
         -0.02138668,  0.02141118]], dtype=float32)]
# save the embedding weights
np.save('emb.npy', emb.get_weights()[0])
  • Loading the embedding weights
w = np.load('emb.npy')
  • When adding the embedding layer, setting trainable = False prevents the weights from being updated further
emb_ff = Embedding(input_dim = num_words, output_dim = 8, input_length = 30,
                   weights = [w], trainable = False)

 

 

10. Using Pre-trained Word Embeddings: GloVe Embeddings

  - Downloading the raw IMDB text

import wget
import os
import zipfile

wget.download("http://mng.bz/0tIo")

local_zip = '0tIo'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall()
zip_ref.close()

imdb_dir = "aclImdb"
train_dir = os.path.join(imdb_dir, 'train')

labels = []
texts = []
for label_type in ['neg', 'pos']:
    dir_name = os.path.join(train_dir, label_type)

    for fname in os.listdir(dir_name):
        if fname[-4:] == '.txt':
            f = open(os.path.join(dir_name, fname), encoding = 'utf-8')
            texts.append(f.read())
            f.close()

            if label_type == 'neg':
                labels.append(0)
            else:
                labels.append(1)

texts[0]

# Output
"Story of a man who has unnatural feelings for a pig. Starts out with a opening scene that is
a terrific example of absurd comedy. A formal orchestra audience is turned into an insane,
violent mob by the crazy chantings of it's singers. Unfortunately it stays absurd the WHOLE
time with no general narrative eventually making it just too off putting. Even those from the
era should be turned off. The cryptic dialogue would make Shakespeare seem easy to a third grader.
On a technical level it's better than you might think with some good cinematography by future
great Vilmos Zsigmond. Future stars Sally Kirkland and Frederic Forrest can be seen briefly."

labels[0]  # 0 (a negative review)

 

  - Tokenizing the data

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len = 100
training_samples = 200
validation_samples = 10000
max_words = 10000

tokenizer = Tokenizer(num_words = max_words)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

word_index = tokenizer.word_index
print(len(word_index))  # 88582
data = pad_sequences(sequences, maxlen = max_len)
labels = np.asarray(labels)

print(data.shape)    # (25000, 100)
print(labels.shape)  # (25000,)
indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]

x_train = data[:training_samples]
y_train = labels[:training_samples]
x_val = data[training_samples : training_samples + validation_samples]
y_val = labels[training_samples : training_samples + validation_samples]

print(x_train.shape)  # (200, 100)
print(y_train.shape)  # (200,)
print(x_val.shape)    # (10000, 100)
print(y_val.shape)    # (10000,)

 

  - Downloading the GloVe word embeddings

import wget

wget.download("http://nlp.stanford.edu/data/glove.6B.zip")

# unzip the archive
local_zip = 'glove.6B.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall()
zip_ref.close()

 

  - Preprocessing the embeddings

  • Parsing the GloVe file
# read the data line by line
glove_dir = "glove.6B"
embeddings_index = {}
f = open(os.path.join(glove_dir, 'glove.6B.100d.txt'), encoding = 'utf8')
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype = 'float32')
    embeddings_index[word] = coefs

f.close()

print(len(embeddings_index))  # 400000
embedding_dim = 100
embedding_mat = np.zeros((max_words, embedding_dim))
for word, i in word_index.items():
    if i < max_words:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_mat[i] = embedding_vector

embedding_mat

# Output
array([[ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [-0.038194  , -0.24487001,  0.72812003, ..., -0.1459    ,
         0.82779998,  0.27061999],
       [-0.071953  ,  0.23127   ,  0.023731  , ..., -0.71894997,
         0.86894   ,  0.19539   ],
       ...,
       [ 0.13787   , -0.17727   , -0.62436002, ...,  0.35506001,
         0.33443999,  0.14436001],
       [-0.88968998,  0.55208999, -0.50498998, ..., -0.54351002,
        -0.21874   ,  0.51186001],
       [-0.17381001, -0.037609  ,  0.068837  , ..., -0.097167  ,
         1.08840001,  0.22676   ]])

 

  - Defining the model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense

model = Sequential()

model.add(Embedding(max_words, embedding_dim, input_length = max_len))
model.add(Flatten())
model.add(Dense(32, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))
model.summary()

# Output
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_7 (Embedding)     (None, 100, 100)          1000000   
                                                                 
 flatten_2 (Flatten)         (None, 10000)             0         
                                                                 
 dense_2 (Dense)             (None, 32)                320032    
                                                                 
 dense_3 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 1,320,065
Trainable params: 1,320,065
Non-trainable params: 0
_________________________________________________________________
# set the pre-trained GloVe weights on the embedding layer
model.layers[0].set_weights([embedding_mat])

# freeze the embedding layer so the pre-trained weights are used without further training
model.layers[0].trainable = False
model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])

history = model.fit(x_train, y_train,
                    epochs = 10,
                    batch_size = 32,
                    validation_data = (x_val, y_val))

# save the model weights
model.save_weights('pre_trained_glove_model.h5')

 

  - Visualization

loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'Training Loss')
plt.plot(epochs, val_loss, 'r:', label = 'Validation Loss')
plt.legend()
plt.grid()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'Training Accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'Validation Accuracy')
plt.legend()
plt.grid()

 

11. Training the Same Model Without Pre-trained Word Embeddings

model2 = Sequential()

model2.add(Embedding(max_words, embedding_dim, input_length = max_len))
model2.add(Flatten())
model2.add(Dense(32, activation = 'relu'))
model2.add(Dense(1, activation = 'sigmoid'))
model2.summary()

# Output
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_8 (Embedding)     (None, 100, 100)          1000000   
                                                                 
 flatten_3 (Flatten)         (None, 10000)             0         
                                                                 
 dense_4 (Dense)             (None, 32)                320032    
                                                                 
 dense_5 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 1,320,065
Trainable params: 1,320,065
Non-trainable params: 0
_________________________________________________________________
model2.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])
history2 = model2.fit(x_train, y_train,
                    epochs = 10,
                    batch_size = 32,
                    validation_data = (x_val, y_val))

loss = history2.history['loss']
val_loss = history2.history['val_loss']
acc = history2.history['accuracy']
val_acc = history2.history['val_accuracy']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'b--', label = 'Training Loss')
plt.plot(epochs, val_loss, 'r:', label = 'Validation Loss')
plt.legend()
plt.grid()

plt.figure()
plt.plot(epochs, acc, 'b--', label = 'Training Accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'Validation Accuracy')
plt.legend()
plt.grid()

 

  - Tokenizing the test data

test_dir = os.path.join(imdb_dir, 'test')

labels = []
texts = []
for label_type in ['neg', 'pos']:
    dir_name = os.path.join(test_dir, label_type)

    for fname in os.listdir(dir_name):
        if fname[-4:] == '.txt':
            f = open(os.path.join(dir_name, fname), encoding = 'utf8')
            texts.append(f.read())
            f.close()

            if label_type == 'neg':
                labels.append(0)
            else:
                labels.append(1)

sequences = tokenizer.texts_to_sequences(texts)
x_test = pad_sequences(sequences, maxlen = max_len)
y_test = np.asarray(labels)

print(x_test.shape)  # (25000, 100)
print(y_test.shape)  # (25000,)
model.load_weights('pre_trained_glove_model.h5')
model.evaluate(x_test, y_test)

# Output
loss: 0.7546 - accuracy: 0.5566
[0.754594087600708, 0.5565599799156189]

1. Terminology

  • Token
    • The unit into which text is split
    • Tokenization: the task of splitting text into tokens
  • n-gram
    • A group of N (or fewer) consecutive words extracted from a sentence
    • The same concept can also be applied to characters (a small sketch follows below)
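  • A minimal sketch (toy sentence, made up for illustration) of extracting word-level n-grams with a sliding window
# toy word-level n-gram extraction (illustrative only)
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "The cat sat on the mat".split()
print(ngrams(words, 2))  # [('The', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(ngrams(words, 3))  # [('The', 'cat', 'sat'), ('cat', 'sat', 'on'), ('sat', 'on', 'the'), ('on', 'the', 'mat')]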

https://www.sqlservercentral.com/articles/nasty-fast-n-grams-part-1-character-level-unigrams

 

 

2. Word-level One-hot Encoding

import numpy as np

samples = ['The cat sat on the mat.',
           'The dog ate my homeworks.']

token_index = {}

for sample in samples:
    for word in sample.split():
        if word not in token_index:
            token_index[word] = len(token_index) + 1

max_len = 10
results = np.zeros(shape = (len(samples), max_len,
                            max(token_index.values()) + 1))

# one-hot encoding
for i, sample in enumerate(samples):
    for j, word in list(enumerate(sample.split()))[:max_len]:
        index = token_index.get(word)
        results[i, j, index] = 1.
results

# Output
array([[[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],  # The
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],  # cat
        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],  # sat
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],  # on
        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],  # the
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],  # mat
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],

       [[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],  # The
        [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],  # dog
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],  # ate
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],  # my
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],  # homeworks
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]])

 

 

3. Word-level One-hot Encoding with Keras

  • fit_on_texts()
  • texts_to_sequences()
  • texts_to_matrix()
from tensorflow.keras.preprocessing.text import Tokenizer

samples = ['The cat sat on the mat.',
           'The dog ate my homeworks.']

tokenizer = Tokenizer(num_words = 1000)
tokenizer.fit_on_texts(samples)

sequences = tokenizer.texts_to_sequences(samples)

ohe_results = tokenizer.texts_to_matrix(samples, mode = 'binary')

word_index = tokenizer.word_index
print(len(word_index))

# Output
9
# the tokenizer holds 9 tokens
# the sequences of word indices
sequences

# Output
[[1, 2, 3, 4, 1, 5], [1, 6, 7, 8, 9]]
# one-hot encoding result
print(ohe_results.shape)
print(ohe_results)

# Output
(2, 1000)
[[0. 1. 1. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]]
word_index

# Output
{'the': 1,
 'cat': 2,
 'sat': 3,
 'on': 4,
 'mat': 5,
 'dog': 6,
 'ate': 7,
 'my': 8,
 'homeworks': 9}
 
 # the values in sequences are determined by the word index

 

  - Tokenization example

  • OOV: Out Of Vocabulary
    • Words in a new sentence that were not in the fitted vocabulary are replaced by the OOV token
from tensorflow.keras.preprocessing.text import Tokenizer

samples = ["I'm the smartest student.",
           "I'm the best student."]
tokenizer = Tokenizer(num_words = 10, oov_token = '<OOV>')
tokenizer.fit_on_texts(samples)

sequences = tokenizer.texts_to_sequences(samples)

binary_results = tokenizer.texts_to_matrix(samples, mode = 'binary')

print(tokenizer.word_index)

# Output
# the word_index of the current tokenizer
{'<OOV>': 1, "i'm": 2, 'the': 3, 'student': 4, 'smartest': 5, 'best': 6}
binary_results

# Output
array([[0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
       [0., 0., 1., 1., 1., 0., 1., 0., 0., 0.]])
  • Test
test = ["I'm the fastest student."]
test_seq = tokenizer.texts_to_sequences(test)

print("word index:", tokenizer.word_index)
print("Test Text:", test)
print("Test Seq:", test_seq)

# Output
word index: {'<OOV>': 1, "i'm": 2, 'the': 3, 'student': 4, 'smartest': 5, 'best': 6}
Test Text: ["I'm the fastest student."]
Test Seq: [[2, 3, 1, 4]]

# 'fastest' is an out-of-vocabulary (OOV) word, so it is mapped to index 1

 

 

4. One-hot Word Vectors vs. Word Embeddings

  • One-hot word vectors
    • Sparse
    • High-dimensional
  • Word embeddings
    • Dense
    • Low-dimensional (see the small comparison sketch below)
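  • A small numpy sketch (toy sizes, assumed) of the difference: a one-hot vector is as long as the vocabulary and almost entirely zeros, while an embedding is a short dense vector looked up from a table
# one-hot vector vs. dense embedding lookup (toy example)
import numpy as np

vocab_size, emb_dim = 10000, 8
word_id = 42                                  # some word index

one_hot = np.zeros(vocab_size)                # 10,000 numbers, all but one are zero
one_hot[word_id] = 1.0

embedding_table = np.random.random((vocab_size, emb_dim))
dense_vector = embedding_table[word_id]       # just 8 learned numbers

print(one_hot.shape, dense_vector.shape)      # (10000,) (8,)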

https://freecontent.manning.com/deep-learning-for-text/

 

 

5. Word Embeddings

  • Similar words are embedded close to one another, i.e. the distance between their vectors is small
  • Besides distance, particular directions in the embedding space can also carry meaning

https://towardsdatascience.com/creating-word-embeddings-coding-the-word2vec-algorithm-in-python-using-deep-learning-b337d0ba17a8

 

  - Embedding Layer

  • A dictionary-like layer that maps integer word indices to dense vectors
  • Input: (samples, sequence_length)
  • Output: (samples, sequence_length, dim) (see the shape check after the code below)
from tensorflow.keras.layers import Embedding

embedding_layer = Embedding(1000, 64)
embedding_layer

# Output
<keras.layers.core.embedding.Embedding at 0x265f5b12fa0>
# an Embedding layer object is returned
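  • Calling the layer created above on a batch of integer indices shows the input/output shapes described earlier (the batch below is a made-up toy example)
# shape check: (samples, sequence_length) in, (samples, sequence_length, dim) out
import numpy as np

batch = np.random.randint(0, 1000, size = (2, 10))  # (samples, sequence_length)
out = embedding_layer(batch)
print(out.shape)  # (2, 10, 64)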

 

 

6. Example: IMDB Data

  • The Internet Movie Database
  • A dataset of 50,000 highly polarized reviews
    • Training data: 25,000 reviews
    • Test data: 25,000 reviews

 

  - Importing modules

from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, Flatten

 

  - Loading the data

num_words = 1000
max_len = 20

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = num_words)

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

# Output
(25000,)
(25000,)
(25000,)
(25000,)

 

  - Checking the data

  • Positive: 1
  • Negative: 0
print(x_train[0])
print(y_train[0])

# Output
# the review's word-index sequence and its positive/negative label
[1, 14, 22, 16, 43, 530, 973, 2, 2, 65, 458, 2, 66, 2, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 2, 2, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2, 19, 14, 22, 4, 2, 2, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 2, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2, 2, 16, 480, 66, 2, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 2, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 2, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 2, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 2, 88, 12, 16, 283, 5, 16, 2, 113, 103, 32, 15, 16, 2, 19, 178, 32]
1

 

  - Note) The most frequently used words in the IMDB dataset

word_index = {}

for key, val in imdb.get_word_index().items():
    word_index[val] = key

for i in range(1, 6):
    print(word_index[i])

# Output
the
and
a
of
to

 

  - Preprocessing the data

  • Make every sample the same length
    • pad_sequences()
      • If a sample is longer than maxlen, it is truncated
      • If a sample is shorter, it is padded according to the padding argument
        • pre: pad with zeros at the front
        • post: pad with zeros at the end (a toy comparison follows after the code below)
  • Every sample (each sentence) must have the same length before it can go through the Embedding layer
from tensorflow.keras.preprocessing.sequence import pad_sequences

pad_x_train = pad_sequences(x_train, maxlen = max_len, padding = 'pre')
pad_x_test = pad_sequences(x_test, maxlen = max_len, padding = 'pre')

print(len(x_train[0]))
print(len(pad_x_train[0]))

# Output
218
20
# truncated down to the maximum length
print(x_train[0])
print(pad_x_train[0])

# Output
[1, 14, 22, 16, 43, 530, 973, 2, 2, 65, 458, 2, 66, 2, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 2, 2, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2, 19, 14, 22, 4, 2, 2, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 2, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2, 2, 16, 480, 66, 2, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 2, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 2, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 2, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 2, 88, 12, 16, 283, 5, 16, 2, 113, 103, 32, 15, 16, 2, 19, 178, 32]
[ 65  16  38   2  88  12  16 283   5  16   2 113 103  32  15  16   2  19  178  32]
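  • A tiny check (toy sequence, not from these notes) of the difference between padding = 'pre' and padding = 'post'
# padding = 'pre' vs. 'post' on a toy sequence (illustrative only)
from tensorflow.keras.preprocessing.sequence import pad_sequences

seq = [[1, 2, 3]]
print(pad_sequences(seq, maxlen = 5, padding = 'pre'))   # [[0 0 1 2 3]]
print(pad_sequences(seq, maxlen = 5, padding = 'post'))  # [[1 2 3 0 0]]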

 

  - Building the model

model = Sequential()

model.add(Embedding(input_dim = num_words, output_dim = 32, input_length = max_len))
model.add(Flatten())
model.add(Dense(1, activation = 'sigmoid'))

model.summary()

# Output
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_1 (Embedding)     (None, 20, 32)            32000     
                                                                 
 flatten (Flatten)           (None, 640)               0         
                                                                 
 dense (Dense)               (None, 1)                 641       
                                                                 
=================================================================
Total params: 32,641
Trainable params: 32,641
Non-trainable params: 0
_________________________________________________________________

 

  - Compiling and training the model

model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])

history = model.fit(pad_x_train, y_train,
                    epochs = 10,
                    batch_size = 32,
                    validation_split = 0.2)

 

  - Visualization

import matplotlib.pyplot as plt

hist_dict = history.history

plt.plot(hist_dict['loss'], 'b--', label = 'Train Loss')
plt.plot(hist_dict['val_loss'], 'r:', label = 'Validation Loss')
plt.legend()
plt.grid()

plt.figure()
plt.plot(hist_dict['accuracy'], 'b--', label = 'Train Accuracy')
plt.plot(hist_dict['val_accuracy'], 'r:', label = 'Validation Accuracy')
plt.legend()
plt.grid()

plt.show()

 

  - Evaluating the model

model.evaluate(pad_x_test, y_test)

# Output
loss: 0.5986 - accuracy: 0.7085
[0.5986294150352478, 0.7085199952125549]

 

  - Retraining after increasing the sequence length

num_words = 1000
max_len = 500

pad_x_train_2 = pad_sequences(x_train, maxlen = max_len, padding = 'pre')
pad_x_test_2 = pad_sequences(x_test, maxlen = max_len, padding = 'pre')

print(x_train[0])
print(pad_x_train_2[0])

# Output
[1, 14, 22, 16, 43, 530, 973, 2, 2, 65, 458, 2, 66, 2, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 2, 2, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2, 19, 14, 22, 4, 2, 2, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 2, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2, 2, 16, 480, 66, 2, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 2, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 2, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 2, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 2, 88, 12, 16, 283, 5, 16, 2, 113, 103, 32, 15, 16, 2, 19, 178, 32]
[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   1  14  22  16  43 530
 973   2   2  65 458   2  66   2   4 173  36 256   5  25 100  43 838 112
  50 670   2   9  35 480 284   5 150   4 172 112 167   2 336 385  39   4
 172   2   2  17 546  38  13 447   4 192  50  16   6 147   2  19  14  22
   4   2   2 469   4  22  71  87  12  16  43 530  38  76  15  13   2   4
  22  17 515  17  12  16 626  18   2   5  62 386  12   8 316   8 106   5
   4   2   2  16 480  66   2  33   4 130  12  16  38 619   5  25 124  51
  36 135  48  25   2  33   6  22  12 215  28  77  52   5  14 407  16  82
   2   8   4 107 117   2  15 256   4   2   7   2   5 723  36  71  43 530
 476  26 400 317  46   7   4   2   2  13 104  88   4 381  15 297  98  32
   2  56  26 141   6 194   2  18   4 226  22  21 134 476  26 480   5 144
  30   2  18  51  36  28 224  92  25 104   4 226  65  16  38   2  88  12
  16 283   5  16   2 113 103  32  15  16   2  19 178  32]

# padded to the maximum length of 500 with zeros; since padding is 'pre', the zeros go at the front
model = Sequential()

model.add(Embedding(input_dim = num_words, output_dim = 32, input_length = max_len))
model.add(Flatten())
model.add(Dense(1, activation = 'sigmoid'))

model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])

history2 = model.fit(pad_x_train_2, y_train,
                    epochs = 10,
                    batch_size = 32,
                    validation_split = 0.2)

hist_dict_2 = history2.history

plt.plot(hist_dict_2['loss'], 'b--', label = 'Train Loss')
plt.plot(hist_dict_2['val_loss'], 'r:', label = 'Validation Loss')
plt.legend()
plt.grid()

plt.figure()
plt.plot(hist_dict_2['accuracy'], 'b--', label = 'Train Accuracy')
plt.plot(hist_dict_2['val_accuracy'], 'r:', label = 'Validation Accuracy')
plt.legend()
plt.grid()

plt.show()

model.evaluate(pad_x_test_2, y_test)

# Output
loss: 0.5295 - accuracy: 0.8316
[0.5295160412788391, 0.8316400051116943]

  - The accuracy above is not bad, but the model is overfitting

  - The reasons are

  • Relationships between words and sentence structure, i.e. semantic connections, are not taken into account
  • To learn features over the whole sequence, it is better to add an RNN layer or a 1D convolution on top of the Embedding layer

 

 

● Types of word embeddings

  • LSA
  • Word2Vec
  • GloVe
  • FastText
  • etc...

 

 

7. Word2Vec

  • Can be trained from raw text alone, without separate labels such as class targets
  • How Word2Vec works (it uses relationships between neighboring words)
    • CBOW (Continuous Bag-Of-Words)
      • Predicts the target word from the sum of the surrounding words' embeddings
    • Skip-Gram
      • Predicts the surrounding words from the target word's embedding
      • Generally performs better than CBOW
      • Inefficient because several words must be predicted at once
      • More recently, a technique called negative sampling is used instead (see the toy pair-generation sketch below)
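  • The sketch below is a hypothetical toy illustration (sentence and window size are made up) of the training pairs the two schemes produce: CBOW maps the surrounding context to the center word, skip-gram maps the center word to each context word
# toy (context -> target) pairs for CBOW and (target -> context) pairs for skip-gram
words = "the quick brown fox jumps over the lazy dog".split()
window = 2

cbow_pairs, skipgram_pairs = [], []
for i, target in enumerate(words):
    context = [words[j] for j in range(max(0, i - window), min(len(words), i + window + 1)) if j != i]
    cbow_pairs.append((context, target))                   # CBOW: predict the target from its context
    skipgram_pairs.extend((target, c) for c in context)    # skip-gram: predict each context word from the target

print(cbow_pairs[3])       # (['quick', 'brown', 'jumps', 'over'], 'fox')
print(skipgram_pairs[:4])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]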

https://www.researchgate.net/figure/CBOW-and-Skip-Gram-neural-architectures_fig14_328160770

 

 

8. Project Gutenberg Example

import requests
import re

 

  - Downloading the data

res = requests.get('https://www.gutenberg.org/files/2591/2591-0.txt')
res

# Output
<Response [200]>
# 200 means the request succeeded
# 404 would indicate an error

 

  - Preprocessing the data

grimm = res.text[2801:530661]
grimm = re.sub(r'[^a-zA-Z\. ]', ' ', grimm)
sentences = grimm.split('. ')
data = [s.split() for s in sentences]

len(data)  # 3468


data[0]

# Output
['SECOND',
 'STORY',
 'THE',
 'SALAD',
 'THE',
 'STORY',
 'OF',
 'THE',
 'YOUTH',
 'WHO',
 'WENT',
 'FORTH',
 'TO',
 'LEARN',
 'WHAT',
 'FEAR',
 'WAS',
 'KING',
 'GRISLY',
 'BEARD',
 'IRON',
 'HANS',
 'CAT',
 'SKIN',
 'SNOW',
...
 'tree',
 'which',
 'bore',
 'golden',
 'apples']
# import Word2Vec from the gensim package
from gensim.models.word2vec import Word2Vec
# sg = 0 gives CBOW, sg = 1 gives skip-gram
# keep only words that appear at least 3 times; use 4 worker threads
model = Word2Vec(data, sg = 1, vector_size = 100, window = 3, min_count = 3, workers = 4)

 

  - Saving and loading the model

# save
model.save('word2vec.model')

# load
pretrained_model = Word2Vec.load('word2vec.model')

 

  - Converting a word to a vector

  • wv
pretrained_model.wv['princess']

# Output
array([-0.19268924,  0.17087255, -0.13460916,  0.20450976,  0.03542079,
       -0.31665406,  0.13296   ,  0.54076153, -0.18337499, -0.21417093,
        0.02725333, -0.31845513,  0.01819889,  0.10720193,  0.16601542,
       -0.19728081,  0.05753807, -0.12273175, -0.17903367, -0.22576232,
        0.2438455 ,  0.13664703,  0.18498562, -0.1679803 ,  0.07735273,
       -0.00432668, -0.00775897, -0.08363435, -0.12566872, -0.07055762,
        0.02887373, -0.08917326,  0.17351009, -0.18784055, -0.20769958,
        0.19657052,  0.01372425, -0.074237  , -0.10052767, -0.11275681,
        0.06725535, -0.09701315,  0.02844668,  0.05958825, -0.02586031,
       -0.01711333, -0.11226629, -0.08671231,  0.1945969 ,  0.01690222,
        0.07196116, -0.08172472, -0.05373074, -0.14637838,  0.16281295,
        0.06222549,  0.10643765,  0.07477342, -0.16238536,  0.03527208,
       -0.04292673,  0.04597842,  0.13826323, -0.19217554, -0.25257504,
        0.10983958,  0.03293723,  0.4319519 , -0.21335553,  0.24770555,
       -0.00888118,  0.02231867,  0.17330043, -0.10485211,  0.35415375,
       -0.08000654,  0.01478033, -0.03938808, -0.06453493,  0.02249427,
       -0.21435274, -0.01287377, -0.2137464 ,  0.21174915, -0.1006554 ,
        0.00902446,  0.05607878,  0.16368881,  0.13859129, -0.01395336,
        0.09382439,  0.08065708, -0.056269  ,  0.09765122,  0.188912  ,
        0.1668056 , -0.01361183, -0.14287405, -0.11452819, -0.20357099],
      dtype=float32)

# the vector representation of the word 'princess'

 

  - Analogy

  • Passing two words to wv.similarity() returns their cosine similarity (checked by hand in the sketch below)
pretrained_model.wv.similarity('king', 'prince')

# Output
0.8212076
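  • The same number can be reproduced by hand from the word vectors (this check reuses the pretrained_model trained above)
# cosine similarity computed directly from the two word vectors
import numpy as np

a = pretrained_model.wv['king']
b = pretrained_model.wv['prince']
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # matches wv.similarity('king', 'prince') above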
  • Passing a word to wv.most_similar() returns the most similar words
pretrained_model.wv.most_similar('king')

# 출력 결과
[('daughter', 0.9241937398910522),
 ('son', 0.9213796257972717),
 ('woman', 0.9177201390266418),
 ('man', 0.897368848323822),
 ('queen', 0.8747967481613159),
 ('miller', 0.8610494136810303),
 ('old', 0.8595746755599976),
 ('young', 0.8504902124404907),
 ('wolf', 0.8450464010238647),
 ('But', 0.8406485319137573)]
  • wv.most_similar()에 positive와 negative라는 옵션을 넘길 수 있음
# 'man + princess - woman'을 벡터 계산을 한 값을 출력
# man이고 princess인데 woman이 아닌 단어
pretrained_model.wv.most_similar(positive = ['man', 'princess'], negative = ['woman'])

# 출력 결과
[('bird', 0.9595717787742615),
 ('prince', 0.9491060376167297),
 ('cook', 0.9410891532897949),
 ('bride', 0.9401964545249939),
 ('huntsman', 0.9375050067901611),
 ('mouse', 0.9356588125228882),
 ('cat', 0.9344455003738403),
 ('giant', 0.9341970682144165),
 ('gardener', 0.9327394366264343),
 ('maid', 0.9326624870300293)]

 

  - gensim으로 학습된 단어 임베딩을 Keras에서 불러오기 

from keras.models import Sequential
from keras.layers import Embedding

num_words, emb_dim = pretrained_model.wv.vectors.shape

print(num_words)
print(emb_dim)

# 출력 결과
2446
100

 

  - gensim으로 학습된 단어 임베딩을 Keras의 임베딩 레이어의 가중치로 설정

emb = Embedding(input_dim = num_words, output_dim = emb_dim,
                trainable = False, weights = [pretrained_model.wv.vectors])

model = Sequential()
model.add(emb)

model.summary()

# 출력 결과
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_3 (Embedding)     (None, None, 100)         244600    
                                                                 
=================================================================
Total params: 244,600
Trainable params: 0
Non-trainable params: 244,600
_________________________________________________________________
# princess에 대한 결과 벡터
i = pretrained_model.wv.index_to_key.index('princess')

model.predict([i])

# 출력 결과
array([[-0.19268924,  0.17087255, -0.13460916,  0.20450976,  0.03542079,
        -0.31665406,  0.13296   ,  0.54076153, -0.18337499, -0.21417093,
         0.02725333, -0.31845513,  0.01819889,  0.10720193,  0.16601542,
        -0.19728081,  0.05753807, -0.12273175, -0.17903367, -0.22576232,
         0.2438455 ,  0.13664703,  0.18498562, -0.1679803 ,  0.07735273,
        -0.00432668, -0.00775897, -0.08363435, -0.12566872, -0.07055762,
         0.02887373, -0.08917326,  0.17351009, -0.18784055, -0.20769958,
         0.19657052,  0.01372425, -0.074237  , -0.10052767, -0.11275681,
         0.06725535, -0.09701315,  0.02844668,  0.05958825, -0.02586031,
        -0.01711333, -0.11226629, -0.08671231,  0.1945969 ,  0.01690222,
         0.07196116, -0.08172472, -0.05373074, -0.14637838,  0.16281295,
         0.06222549,  0.10643765,  0.07477342, -0.16238536,  0.03527208,
        -0.04292673,  0.04597842,  0.13826323, -0.19217554, -0.25257504,
         0.10983958,  0.03293723,  0.4319519 , -0.21335553,  0.24770555,
        -0.00888118,  0.02231867,  0.17330043, -0.10485211,  0.35415375,
        -0.08000654,  0.01478033, -0.03938808, -0.06453493,  0.02249427,
        -0.21435274, -0.01287377, -0.2137464 ,  0.21174915, -0.1006554 ,
         0.00902446,  0.05607878,  0.16368881,  0.13859129, -0.01395336,
         0.09382439,  0.08065708, -0.056269  ,  0.09765122,  0.188912  ,
         0.1668056 , -0.01361183, -0.14287405, -0.11452819, -0.20357099]],
      dtype=float32)

● 케라스 전이학습(transfer learning)

https://medium.com/the-official-integrate-ai-blog/transfer-learning-explained-7d275c1e34e2

  • 새로운 모델을 만들 때 기존에 학습된 모델을 사용
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Flatten, BatchNormalization, Activation
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import *


# 예시로 학습된 vgg 데이터 불러오기
vgg16 = VGG16(weights = 'imagenet',
              input_shape = (32, 32, 3), include_top = False)

model = Sequential()
model.add(vgg16)

model.add(Flatten())
model.add(Dense(256))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(10, activation = 'softmax'))

model.summary()

# 출력 결과
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 vgg16 (Functional)          (None, 1, 1, 512)         14714688  
                                                                 
 flatten (Flatten)           (None, 512)               0         
                                                                 
 dense (Dense)               (None, 256)               131328    
                                                                 
 batch_normalization (BatchN  (None, 256)              1024      
 ormalization)                                                   
                                                                 
 activation (Activation)     (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 10)                2570      
                                                                 
=================================================================
Total params: 14,849,610
Trainable params: 14,849,098
Non-trainable params: 512
_________________________________________________________________
  • VGG16 이외에 MobileNet, ResNet50, Xception 등의 모델도 전이 학습에 이용 가능(아래 예시 참고)
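  • 예를 들어 ResNet50을 특성 추출기로 쓰면 아래와 같은 형태가 됨(입력 크기 32x32와 클래스 수 10은 예시 값이며 데이터에 맞게 바꿔야 함)
# ResNet50 기반 전이 학습 스케치
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.applications import ResNet50

resnet = ResNet50(weights = 'imagenet', include_top = False, input_shape = (32, 32, 3))
resnet.trainable = False   # 사전 학습된 가중치 동결

model_resnet = Sequential([resnet,
                           GlobalAveragePooling2D(),
                           Dense(10, activation = 'softmax')])
model_resnet.summary()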

 

1. 예제: Dogs vs Cats

 

  - modules import

import tensorflow as tf
from tensorflow.keras.preprocessing.image import array_to_img, img_to_array, load_img, ImageDataGenerator
from tensorflow.keras.layers import Conv2D, Flatten, MaxPool2D, Input, Dropout, Dense
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam

import os
import zipfile
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

 

  - 데이터 로드

# 외부에서 데이터 가져오기
import wget

wget.download("https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip")


# 압축 해제
local_zip = 'cats_and_dogs_filtered.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
# 현재 폴더에 압축해제
zip_ref.extractall()
zip_ref.close()


# 압축해제된 폴더를 기본 경로로 지정, 폴더 내의 train과 validation 폴더에 각각 접근
base_dir = 'cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')


# 압축해제된 폴더 내의 train cat, validation cat, train dog, validation dog 폴더에 각각 접근
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

train_cat_frames = os.listdir(train_cats_dir)
train_dog_frames = os.listdir(train_dogs_dir)

 

  - 이미지 보강된 데이터 확인

# ImageDataGenerator 정의
datagen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = 'nearest'
)


# 이미지 로드
img_path = os.path.join(train_cats_dir, train_cat_frames[2])
img = load_img(img_path, target_size = (150, 150))
x = img_to_array(img)
x = x.reshape((1, ) + x.shape)

i = 0
for batch in datagen.flow(x, batch_size = 1):
    plt.figure(i)
    imgplot = plt.imshow(array_to_img(batch[0]))
    i += 1
    if i % 5 == 0:
        break

 

  - 학습, 검증 데이터셋의 Data Generator

train_datagen = ImageDataGenerator(
    rescale = 1. / 255,
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True
)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size = (150, 150),
    batch_size = 20,
    class_mode = 'binary'
)

val_datagen = ImageDataGenerator(rescale = 1. / 255)

validation_generator = val_datagen.flow_from_directory(
    validation_dir,
    target_size = (150, 150),
    batch_size = 20,
    class_mode = 'binary'
)


# 출력 결과
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.

 

  - 모델 구성 및 컴파일

model = Sequential()
model.add(Conv2D(32, (3, 3), activation = 'relu', input_shape = (150, 150, 3)))
model.add(MaxPool2D(2, 2))
model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(MaxPool2D(2, 2))
model.add(Conv2D(128, (3, 3), activation = 'relu'))
model.add(MaxPool2D(2, 2))
model.add(Conv2D(128, (3, 3), activation = 'relu'))
model.add(MaxPool2D(2, 2))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(512, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))

model.compile(loss = 'binary_crossentropy',
              optimizer = Adam(learning_rate = 1e-4),
              metrics = ['acc'])

model.summary()

# 출력 결과
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 148, 148, 32)      896       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 74, 74, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 72, 72, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 36, 36, 64)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 34, 34, 128)       73856     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 17, 17, 128)      0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 15, 15, 128)       147584    
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 7, 7, 128)        0         
 2D)                                                             
                                                                 
 flatten_1 (Flatten)         (None, 6272)              0         
                                                                 
 dropout (Dropout)           (None, 6272)              0         
                                                                 
 dense_2 (Dense)             (None, 512)               3211776   
                                                                 
 dense_3 (Dense)             (None, 1)                 513       
                                                                 
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________

 

  - 모델 학습 및 학습 과정 시각화

# 제너레이터를 입력으로 쓸 때는 batch_size를 따로 지정하지 않음(제너레이터의 batch_size를 사용)
history = model.fit(train_generator,
                    steps_per_epoch = 100,
                    epochs = 30,
                    validation_data = validation_generator,
                    validation_steps = 50,
                    verbose = 2)

# 시각화
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))

plt.plot(epochs, loss, 'b--', label = 'Train Loss')
plt.plot(epochs, val_loss, 'r:', label = 'Validation Loss')
plt.grid()
plt.legend()

plt.plot(epochs, acc, 'b--', label = 'Train Accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'Validation Accuracy')
plt.grid()
plt.legend()

plt.show()

 

  - 모델 저장

model.save('cats_and_dogs_model.h5')

 

  - 사전 훈련된 모델 사용

from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential

conv_base = VGG16(weights = 'imagenet',
                  input_shape = (150, 150, 3), include_top = False)

def build_model_with_pretrained(conv_base):
    model = Sequential()
    model.add(conv_base)
    model.add(Flatten())
    model.add(Dense(256, activation = 'relu'))
    model.add(Dense(1, activation = 'sigmoid'))

    model.compile(loss = 'binary_crossentropy',
                  optimizer = RMSprop(learning_rate = 2e-5),
                  metrics = ['accuracy'])
    return model
  • 파라미터 수 확인
model = build_model_with_pretrained(conv_base)
model.summary()

# 출력 결과
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 vgg16 (Functional)          (None, 4, 4, 512)         14714688  
                                                                 
 flatten_2 (Flatten)         (None, 8192)              0         
                                                                 
 dense_4 (Dense)             (None, 256)               2097408   
                                                                 
 dense_5 (Dense)             (None, 1)                 257       
                                                                 
=================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
_________________________________________________________________

 

  - 레이어 동결

  • 훈련하기 전, 합성곱 기반 레이어들의 가중치 학습을 막기 위해 이를 동결
# 동결 전
print(len(model.trainable_weights))

# 출력 결과
30


# 동결 후
conv_base.trainable = False
print(len(model.trainable_weights))

# 출력 결과
4
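  • 필요하면 합성곱 기반 전체를 동결하는 대신 마지막 블록만 풀어서 미세 조정(fine-tuning)할 수도 있음(아래는 케라스 VGG16의 층 이름이 block1~block5로 시작한다는 점을 가정한 스케치)
# VGG16의 마지막 합성곱 블록(block5)만 학습 가능하게 두는 미세 조정 스케치
conv_base.trainable = True
for layer in conv_base.layers:
    layer.trainable = layer.name.startswith('block5')

# trainable 속성을 바꿨으므로 아래 절처럼 모델을 다시 컴파일해야 함
print(len(model.trainable_weights))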

 

  - 모델 컴파일

  • trainable 속성을 변경했기 때문에 모델을 다시 컴파일해야 함
model.compile(loss = 'binary_crossentropy',
              optimizer = RMSprop(learning_rate = 2e-5),
              metrics = ['accuracy'])

 

  - 이미지 제너레이터

train_datagen = ImageDataGenerator(
    rescale = 1. / 255,
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True
)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size = (150, 150),
    batch_size = 20,
    class_mode = 'binary'
)

val_datagen = ImageDataGenerator(rescale = 1. / 255)

validation_generator = val_datagen.flow_from_directory(
    validation_dir,
    target_size = (150, 150),
    batch_size = 20,
    class_mode = 'binary'
)

# 출력 결과
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.

 

  - 모델 재학습

# 제너레이터를 입력으로 쓸 때는 batch_size를 따로 지정하지 않음
history2 = model.fit(train_generator,
                     steps_per_epoch = 100,
                     epochs = 30,
                     validation_data = validation_generator,
                     validation_steps = 50,
                     verbose = 2)

acc = history2.history['accuracy']
val_acc = history2.history['val_accuracy']
loss = history2.history['loss']
val_loss = history2.history['val_loss']
epochs = range(len(acc))

plt.plot(epochs, loss, 'b--', label = 'Train Loss')
plt.plot(epochs, val_loss, 'r:', label = 'Validation Loss')
plt.grid()
plt.legend()

plt.plot(epochs, acc, 'b--', label = 'Train Accuracy')
plt.plot(epochs, val_acc, 'r:', label = 'Validation Accuracy')
plt.grid()
plt.legend()

plt.show()

 

  - 모델 저장

model.save('cats_and_dogs_with_pretrained_model.h5')

 

 

2. Feature Map 시각화

  - 모델 구성

import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image


# 저장된 모델 로드
model = load_model('cats_and_dogs_model.h5')
model.summary()

# 출력 결과
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 148, 148, 32)      896       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 74, 74, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 72, 72, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 36, 36, 64)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 34, 34, 128)       73856     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 17, 17, 128)      0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 15, 15, 128)       147584    
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 7, 7, 128)        0         
 2D)                                                             
                                                                 
 flatten_1 (Flatten)         (None, 6272)              0         
                                                                 
 dropout (Dropout)           (None, 6272)              0         
                                                                 
 dense_2 (Dense)             (None, 512)               3211776   
                                                                 
 dense_3 (Dense)             (None, 1)                 513       
                                                                 
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________
img_path = 'cats_and_dogs_filtered/validation/dogs/dog.2000.jpg'

img = image.load_img(img_path, target_size = (150, 150))
img_tensor = image.img_to_array(img)
img_tensor = img_tensor[np.newaxis, ...]
img_tensor /= 255.
print(img_tensor.shape)

# 출력 결과
(1, 150, 150, 3)
plt.imshow(img_tensor[0])
plt.show()

# 레이어 중 일부만(8개) 출력
conv_output = [layer.output for layer in model.layers[:8]]
conv_output

# 출력 결과
[<KerasTensor: shape=(None, 148, 148, 32) dtype=float32 (created by layer 'conv2d')>,
 <KerasTensor: shape=(None, 74, 74, 32) dtype=float32 (created by layer 'max_pooling2d')>,
 <KerasTensor: shape=(None, 72, 72, 64) dtype=float32 (created by layer 'conv2d_1')>,
 <KerasTensor: shape=(None, 36, 36, 64) dtype=float32 (created by layer 'max_pooling2d_1')>,
 <KerasTensor: shape=(None, 34, 34, 128) dtype=float32 (created by layer 'conv2d_2')>,
 <KerasTensor: shape=(None, 17, 17, 128) dtype=float32 (created by layer 'max_pooling2d_2')>,
 <KerasTensor: shape=(None, 15, 15, 128) dtype=float32 (created by layer 'conv2d_3')>,
 <KerasTensor: shape=(None, 7, 7, 128) dtype=float32 (created by layer 'max_pooling2d_3')>]
activation_model = Model(inputs = [model.input], outputs = conv_output)
activations = activation_model.predict(img_tensor)
len(activations)

# 출력 결과
8

 

  - 시각화

print(activations[0].shape)
plt.matshow(activations[0][0, :, :, 7], cmap = 'viridis')
plt.show()

# 출력 결과
(1, 148, 148, 32)

print(activations[0].shape)
plt.matshow(activations[0][0, :, :, 10], cmap = 'viridis')
plt.show()

# 출력 결과
(1, 148, 148, 32)

 

  - 중간의 모든 활성화에 대해 시각화

# 각 layer에서 이미지의 변환과정을 시각화
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

images_per_row = 16

for layer_name, layer_activation in zip(layer_names, activations):
    num_features = layer_activation.shape[-1]

    size = layer_activation.shape[1]

    num_cols = num_features // images_per_row
    display_grid = np.zeros((size * num_cols, size * images_per_row))

    for col in range(num_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0, :, :, col * images_per_row + row]
            channel_image -= channel_image.mean()
            channel_image /= channel_image.std()
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size, row * size : (row + 1) * size] = channel_image
        
    scale = 1. / size

    plt.figure(figsize = (scale * display_grid.shape[1],
                          scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect = 'auto', cmap = 'viridis')

plt.show()

● CIFAR 10

  • 50,000개의 학습 데이터, 10,000개의 테스트 데이터로 구성
  • 데이터 복잡도가 MNIST보다 훨씬 높은 특징이 있음
    • 신경망이 특징을 검출하기 어려움

1. modules import

import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Flatten, Input, Dropout, BatchNormalization
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
import numpy as np

 

 

2. 데이터 로드 및 전처리

(x_train_full, y_train_full), (x_test, y_test) = cifar10.load_data()
print(x_train_full.shape, y_train_full.shape)
print(x_test.shape, y_test.shape)

# 출력 결과
(50000, 32, 32, 3) (50000, 1)
(10000, 32, 32, 3) (10000, 1)


# 정답 데이터의 값은 레이블로 되어있음
print(y_test[0])

# 출력 결과
[3]


# 예시 데이터
np.random.seed(777)

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

sample_size = 9
random_idx = np.random.randint(50000, size = sample_size)  # 학습 데이터는 50,000개

plt.figure(figsize = (5, 5))
for i, idx in enumerate(random_idx):
    plt.subplot(3, 3, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(x_train_full[idx])
    plt.xlabel(class_names[int(y_train_full[idx])])

plt.show()

  • 32 * 32 이미지라 화질이 낮음
# x 데이터 정규화
x_mean = np.mean(x_train_full, axis = (0, 1, 2))
x_std = np.std(x_train_full, axis = (0, 1, 2))
x_train_full = (x_train_full - x_mean) / x_std
x_test = (x_test - x_mean) / x_std


# 학습데이터와 검증데이터 분리
x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full, test_size = 0.3)


# 전처리한 데이터 형태 출력
print(x_train.shape)
print(y_train.shape)

print(x_val.shape)
print(y_val.shape)

print(x_test.shape)
print(y_test.shape)

# 출력 결과
(35000, 32, 32, 3)
(35000, 1)
(15000, 32, 32, 3)
(15000, 1)
(10000, 32, 32, 3)
(10000, 1)

 

 

3. 모델 구성 및 컴파일

def model_build():
    model = Sequential()

    input = Input(shape = (32, 32, 3))

    output = Conv2D(filters = 32, kernel_size = 3, padding = 'same', activation = 'relu')(input)
    output = MaxPool2D(pool_size = (2, 2), strides = 2, padding = 'same')(output)

    output = Conv2D(filters = 64, kernel_size = 3, padding = 'same', activation = 'relu')(output)
    output = MaxPool2D(pool_size = (2, 2), strides = 2, padding = 'same')(output)

    output = Conv2D(filters = 128, kernel_size = 3, padding = 'same', activation = 'relu')(output)
    output = MaxPool2D(pool_size = (2, 2), strides = 2, padding = 'same')(output)

    output = Flatten()(output)
    output = Dense(256, activation = 'relu')(output)
    output = Dense(128, activation = 'relu')(output)
    output = Dense(10, activation = 'softmax')(output)

    model = Model(inputs = [input], outputs = [output])

    model.compile(optimizer = Adam(learning_rate = 1e-4),
                  loss = 'sparse_categorical_crossentropy',
                  metrics = ['accuracy'])
    return model
model = model_build()
model.summary()

# 출력 결과
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_3 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 conv2d_3 (Conv2D)           (None, 32, 32, 32)        896       
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 16, 16, 64)        18496     
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 8, 8, 64)         0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 8, 8, 128)         73856     
                                                                 
 max_pooling2d_5 (MaxPooling  (None, 4, 4, 128)        0         
 2D)                                                             
                                                                 
 flatten_1 (Flatten)         (None, 2048)              0         
                                                                 
 dense_3 (Dense)             (None, 256)               524544    
                                                                 
 dense_4 (Dense)             (None, 128)               32896     
                                                                 
 dense_5 (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 651,978
Trainable params: 651,978
Non-trainable params: 0
_________________________________________________________________

 

 

4. 모델 학습 및 평가

history = model.fit(x_train, y_train,
                    epochs = 30,
                    batch_size = 256,
                    validation_data = (x_val, y_val))

 

 

5. 학습 과정 시각화

plt.figure(figsize = (12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], 'b--', label = 'loss')
plt.plot(history.history['val_loss'], 'r:', label = 'val_loss')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], 'b--', label = 'accuracy')
plt.plot(history.history['val_accuracy'], 'r:', label = 'val_accuracy')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

  - 해당 모델은 성능이 좋지 않음

  - 규제화, 배치 정규화, 드롭아웃 등 과대적합을 방지하는 기법이 필요(규제화는 아래 스케치 참고)
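  • 규제화는 예를 들어 아래처럼 층에 kernel_regularizer를 지정하는 식으로 적용할 수 있음(규제 계수 1e-4는 임의의 예시 값)
# L2 규제를 층에 추가하는 최소 스케치(계수는 예시 값)
from tensorflow.keras.regularizers import l2

reg_conv = Conv2D(filters = 32, kernel_size = 3, padding = 'same',
                  activation = 'relu', kernel_regularizer = l2(1e-4))
reg_dense = Dense(256, activation = 'relu', kernel_regularizer = l2(1e-4))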

def model_build2():
    model = Sequential()

    input = Input(shape = (32, 32, 3))

    output = Conv2D(filters = 32, kernel_size = 3, padding = 'same', activation = 'relu')(input)
    output = BatchNormalization()(output)
    output = MaxPool2D(pool_size = (2, 2), strides = 2, padding = 'same')(output)

    output = Conv2D(filters = 64, kernel_size = 3, padding = 'same', activation = 'relu')(output)
    output = BatchNormalization()(output)
    output = MaxPool2D(pool_size = (2, 2), strides = 2, padding = 'same')(output)

    output = Conv2D(filters = 128, kernel_size = 3, padding = 'same', activation = 'relu')(output)
    output = BatchNormalization()(output)
    output = MaxPool2D(pool_size = (2, 2), strides = 2, padding = 'same')(output)
    output = Dropout(0.5)(output)

    output = Flatten()(output)
    output = Dense(256, activation = 'relu')(output)
    output = Dropout(0.5)(output)
    output = Dense(128, activation = 'relu')(output)
    output = Dense(10, activation = 'softmax')(output)

    model = Model(inputs = [input], outputs = [output])

    model.compile(optimizer = Adam(learning_rate = 1e-4),
                  loss = 'sparse_categorical_crossentropy',
                  metrics = ['accuracy'])
    return model
model2 = model_build2()
model2.summary()

# 출력 결과
Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_4 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 conv2d_6 (Conv2D)           (None, 32, 32, 32)        896       
                                                                 
 batch_normalization (BatchN  (None, 32, 32, 32)       128       
 ormalization)                                                   
                                                                 
 max_pooling2d_6 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 conv2d_7 (Conv2D)           (None, 16, 16, 64)        18496     
                                                                 
 batch_normalization_1 (Batc  (None, 16, 16, 64)       256       
 hNormalization)                                                 
                                                                 
 max_pooling2d_7 (MaxPooling  (None, 8, 8, 64)         0         
 2D)                                                             
                                                                 
 conv2d_8 (Conv2D)           (None, 8, 8, 128)         73856     
                                                                 
 batch_normalization_2 (Batc  (None, 8, 8, 128)        512       
 hNormalization)                                                 
                                                                 
 max_pooling2d_8 (MaxPooling  (None, 4, 4, 128)        0         
 2D)                                                             
                                                                 
 dropout (Dropout)           (None, 4, 4, 128)         0         
                                                                 
 flatten_2 (Flatten)         (None, 2048)              0         
                                                                 
 dense_6 (Dense)             (None, 256)               524544    
                                                                 
 dropout_1 (Dropout)         (None, 256)               0         
                                                                 
 dense_7 (Dense)             (None, 128)               32896     
                                                                 
 dense_8 (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 652,874
Trainable params: 652,426
Non-trainable params: 448
_________________________________________________________________

 

 

6. 모델 학습 및 평가

history2 = model2.fit(x_train, y_train,
                      epochs = 30,
                      batch_size = 256,
                      validation_data = (x_val, y_val))

 

 

7. 학습 과정 시각화

plt.figure(figsize = (12, 4))

plt.subplot(1, 2, 1)
plt.plot(history2.history['loss'], 'b--', label = 'loss')
plt.plot(history2.history['val_loss'], 'r:', label = 'val_loss')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history2.history['accuracy'], 'b--', label = 'accuracy')
plt.plot(history2.history['val_accuracy'], 'r:', label = 'val_accuracy')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

  • 검증데이터의 결과가 많이 개선됨
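  • 테스트 데이터에 대한 일반화 성능도 아래처럼 확인해 볼 수 있음(결과 수치는 실행 환경에 따라 달라짐)
model2.evaluate(x_test, y_test)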

1. modules import 

%load_ext tensorboard
import datetime
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.keras import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Dropout, Input, Flatten

 

 

2. 데이터 로드 및 전처리

(x_train, y_train), (x_test, y_test) = load_data()

x_train = x_train[..., np.newaxis]
x_test = x_test[..., np.newaxis]

x_train = x_train / 255.
x_test = x_test / 255.

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

# 출력 결과
(60000, 28, 28, 1)
(60000,)
(10000, 28, 28, 1)
(10000,)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

 

3. 모델 구성 및 컴파일

def build_model():
    model = Sequential()

    input = Input(shape = (28, 28, 1))
    output = Conv2D(filters = 32, kernel_size = (3, 3))(input)
    output = Conv2D(filters = 64, kernel_size = (3, 3))(output)
    output = Conv2D(filters = 64, kernel_size = (3, 3))(output)
    output = Flatten()(output)
    output = Dense(units = 128, activation = 'relu')(output)
    output = Dense(units = 64, activation = 'relu')(output)
    output = Dense(units = 10, activation = 'softmax')(output)

    model = Model(inputs = [input], outputs = [output])

    model.compile(optimizer = 'adam',
                  loss = 'sparse_categorical_crossentropy',
                  metrics = ['acc'])
    return model

model_1 = build_model()
model_1.summary()

# 출력 결과
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 64)        18496     
                                                                 
 conv2d_2 (Conv2D)           (None, 22, 22, 64)        36928     
                                                                 
 flatten (Flatten)           (None, 30976)             0         
                                                                 
 dense (Dense)               (None, 128)               3965056   
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 10)                650       
                                                                 
=================================================================
Total params: 4,029,706
Trainable params: 4,029,706
Non-trainable params: 0
_________________________________________________________________

 

 

4. 모델 학습

hist_1 = model_1.fit(x_train, y_train,
                     epochs = 25,
                     validation_split = 0.3,
                     batch_size = 128)

 

 

5. 학습 결과 시각화

plt.figure(figsize = (12, 4))
plt.subplot(1, 2, 1)
plt.plot(hist_1.history['loss'], 'b--', label = 'loss')
plt.plot(hist_1.history['val_loss'], 'r:', label = 'val_loss')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(hist_1.history['acc'], 'b--', label = 'accuracy')
plt.plot(hist_1.history['val_acc'], 'r:', label = 'val_accuracy')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

 

 

6. 모델 평가

model_1.evaluate(x_test, y_test)

# 출력 결과
loss: 1.1168 - acc: 0.8566
[1.116817831993103, 0.8565999865531921]

 

 

7. 모델 재구성(학습 파라미터 수 비교)

def build_model_2():
    model = Sequential()

    input = Input(shape = (28, 28, 1))
    output = Conv2D(filters = 32, kernel_size = (3, 3))(input)
    output = MaxPool2D(strides = (2, 2))(output)
    output = Conv2D(filters = 64, kernel_size = (3, 3))(output)
    output = MaxPool2D(strides = (2, 2))(output)
    output = Conv2D(filters = 64, kernel_size = (3, 3))(output)
    output = MaxPool2D(strides = (2, 2))(output)
    output = Flatten()(output)
    output = Dense(units = 128, activation = 'relu')(output)
    output = Dropout(0.3)(output)
    output = Dense(units = 64, activation = 'relu')(output)
    output = Dropout(0.3)(output)
    output = Dense(units = 10, activation = 'softmax')(output)

    model = Model(inputs = [input], outputs = [output])

    model.compile(optimizer = 'adam',
                  loss = 'sparse_categorical_crossentropy',
                  metrics = ['acc'])
    return model

model_2 = build_model_2()
model_2.summary()

# 출력 결과
Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_3 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_6 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_7 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_8 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 1, 1, 64)         0         
 2D)                                                             
                                                                 
 flatten_2 (Flatten)         (None, 64)                0         
                                                                 
 dense_6 (Dense)             (None, 128)               8320      
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_7 (Dense)             (None, 64)                8256      
                                                                 
 dropout_1 (Dropout)         (None, 64)                0         
                                                                 
 dense_8 (Dense)             (None, 10)                650       
                                                                 
=================================================================
Total params: 72,970
Trainable params: 72,970
Non-trainable params: 0
_________________________________________________________________
  • 풀링으로 Flatten 출력이 작아져 학습 파라미터 수가 크게 줄어듦
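  • 두 모델의 파라미터 수는 아래처럼 count_params()로도 확인할 수 있음(값은 위 summary 기준)
# 두 모델의 학습 파라미터 수 비교
print(model_1.count_params())  # 4,029,706
print(model_2.count_params())  # 72,970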

 

 

8. 모델 재학습

hist_2 = model_2.fit(x_train, y_train,
                     epochs = 25,
                     validation_split = 0.3,
                     batch_size = 128)

# 재학습 결과 시각화
plt.figure(figsize = (12, 4))
plt.subplot(1, 2, 1)
plt.plot(hist_2.history['loss'], 'b--', label = 'loss')
plt.plot(hist_2.history['val_loss'], 'r:', label = 'val_loss')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(hist_2.history['acc'], 'b--', label = 'accuracy')
plt.plot(hist_2.history['val_acc'], 'r:', label = 'val_accuracy')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

  • 처음 모델보다 학습데이터에 오버피팅이 덜 된 모습

 

9. 모델 재평가

model_2.evaluate(x_test, y_test)

# 출력 결과
loss: 0.4026 - acc: 0.8830
[0.4026452302932739, 0.8830000162124634]

 

 

10. 모델 성능 높이기(많은 레이어 쌓기)

from tensorflow.keras.layers import BatchNormalization, ReLU

def build_model_3():
    model = Sequential()

    input = Input(shape = (28, 28, 1))
    output = Conv2D(filters = 32, kernel_size = 3, activation = 'relu', padding = 'same')(input)
    output = Conv2D(filters = 64, kernel_size = 3, activation = 'relu', padding = 'valid')(output)
    output = MaxPool2D(strides = (2, 2))(output)
    output = Dropout(0.5)(output)

    output = Conv2D(filters = 128, kernel_size = 3, activation = 'relu', padding = 'same')(output)
    output = Conv2D(filters = 256, kernel_size = 3, activation = 'relu', padding = 'valid')(output)
    output = MaxPool2D(strides = (2, 2))(output)
    output = Dropout(0.5)(output)

    output = Flatten()(output)
    output = Dense(units = 256, activation = 'relu')(output)
    output = Dropout(0.5)(output)
    output = Dense(units = 100, activation = 'relu')(output)
    output = Dropout(0.5)(output)
    output = Dense(units = 10, activation = 'softmax')(output)

    model = Model(inputs = [input], outputs = [output])

    model.compile(optimizer = 'adam',
                  loss = 'sparse_categorical_crossentropy',
                  metrics = ['acc'])
    return model

model_3 = build_model_3()
model_3.summary()

# 출력 결과
Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_4 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_9 (Conv2D)           (None, 28, 28, 32)        320       
                                                                 
 conv2d_10 (Conv2D)          (None, 26, 26, 64)        18496     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 13, 13, 64)       0         
 2D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 13, 13, 64)        0         
                                                                 
 conv2d_11 (Conv2D)          (None, 13, 13, 128)       73856     
                                                                 
 conv2d_12 (Conv2D)          (None, 11, 11, 256)       295168    
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 5, 5, 256)        0         
 2D)                                                             
                                                                 
 dropout_3 (Dropout)         (None, 5, 5, 256)         0         
                                                                 
 flatten_3 (Flatten)         (None, 6400)              0         
                                                                 
 dense_9 (Dense)             (None, 256)               1638656   
                                                                 
 dropout_4 (Dropout)         (None, 256)               0         
                                                                 
 dense_10 (Dense)            (None, 100)               25700     
                                                                 
 dropout_5 (Dropout)         (None, 100)               0         
                                                                 
 dense_11 (Dense)            (None, 10)                1010      
                                                                 
=================================================================
Total params: 2,053,206
Trainable params: 2,053,206
Non-trainable params: 0
_________________________________________________________________

 

  - 모델 학습 및 결과 시각화

hist_3 = model_3.fit(x_train, y_train,
                     epochs = 25,
                     validation_split = 0.3,
                     batch_size = 128)

  - 층을 늘려도 과적합되지 않으면서 좋은 성능을 낼 수 있음

plt.figure(figsize = (12, 4))
plt.subplot(1, 2, 1)
plt.plot(hist_3.history['loss'], 'b--', label = 'loss')
plt.plot(hist_3.history['val_loss'], 'r:', label = 'val_loss')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(hist_3.history['acc'], 'b--', label = 'accuracy')
plt.plot(hist_3.history['val_acc'], 'r:', label = 'val_accuracy')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

model_3.evaluate(x_test, y_test)

# 출력 결과
loss: 0.2157 - acc: 0.9261
[0.21573999524116516, 0.9261000156402588]

 

 

11. 모델 성능 높이기(이미지 보강, Image Augmentation)

from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_generator = ImageDataGenerator(
    rotation_range = 10,
    zoom_range = 0.2,
    shear_range = 0.6,
    width_shift_range = 0.1,
    height_shift_range = 0.1,
    horizontal_flip = True,
    vertical_flip = False
)

augment_size = 200

print(x_train.shape)
print(x_train[0].shape)

# 출력 결과
(60000, 28, 28, 1)
(28, 28, 1)
# 첫 번째 이미지를 augment_size장 복제해 (N, 28, 28, 1) 형태로 만든 뒤 보강 적용
x_augment = image_generator.flow(np.tile(x_train[0].reshape(28 * 28), augment_size).reshape(-1, 28, 28, 1),
                                 np.zeros(augment_size), batch_size = augment_size, shuffle = False).next()[0]

plt.figure(figsize = (10, 10))
for i in range(1, 101):
    plt.subplot(10, 10, i)
    plt.axis('off')
    plt.imshow(x_augment[i - 1].reshape(28, 28), cmap = 'gray')

  • 위의 코드를 사용해 학습에 사용할 데이터 추가
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_generator = ImageDataGenerator(
    rotation_range = 15,
    zoom_range = 0.1,
    shear_range = 0.6,
    width_shift_range = 0.15,
    height_shift_range = 0.1,
    horizontal_flip = True,
    vertical_flip = False
)

augment_size = 30000

random_mask = np.random.randint(x_train.shape[0], size = augment_size)
x_augmented = x_train[random_mask].copy()
y_augmented = y_train[random_mask].copy()

x_augmented = image_generator.flow(x_augmented, np.zeros(augment_size),
                                   batch_size = augment_size, shuffle = False).next()[0]
x_train = np.concatenate((x_train, x_augmented))
y_train = np.concatenate((y_train, y_augmented))

# 생성한 augment 30000개가 더 추가됨
print(x_train.shape)

# 출력 결과
(90000, 28, 28, 1)

 

  - 모델 학습 및 결과 시각화

model_4 = build_model_3()
model_4.summary()

# 출력 결과
Model: "model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_5 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_13 (Conv2D)          (None, 28, 28, 32)        320       
                                                                 
 conv2d_14 (Conv2D)          (None, 26, 26, 64)        18496     
                                                                 
 max_pooling2d_5 (MaxPooling  (None, 13, 13, 64)       0         
 2D)                                                             
                                                                 
 dropout_6 (Dropout)         (None, 13, 13, 64)        0         
                                                                 
 conv2d_15 (Conv2D)          (None, 13, 13, 128)       73856     
                                                                 
 conv2d_16 (Conv2D)          (None, 11, 11, 256)       295168    
                                                                 
 max_pooling2d_6 (MaxPooling  (None, 5, 5, 256)        0         
 2D)                                                             
                                                                 
 dropout_7 (Dropout)         (None, 5, 5, 256)         0         
                                                                 
 flatten_4 (Flatten)         (None, 6400)              0         
                                                                 
 dense_12 (Dense)            (None, 256)               1638656   
                                                                 
 dropout_8 (Dropout)         (None, 256)               0         
                                                                 
 dense_13 (Dense)            (None, 100)               25700     
                                                                 
 dropout_9 (Dropout)         (None, 100)               0         
                                                                 
 dense_14 (Dense)            (None, 10)                1010      
                                                                 
=================================================================
Total params: 2,053,206
Trainable params: 2,053,206
Non-trainable params: 0
_________________________________________________________________
hist_4 = model_4.fit(x_train, y_train,
                     epochs = 25,
                     validation_split = 0.3,
                     batch_size = 128)

plt.figure(figsize = (12, 4))
plt.subplot(1, 2, 1)
plt.plot(hist_4.history['loss'], 'b--', label = 'loss')
plt.plot(hist_4.history['val_loss'], 'r:', label = 'val_loss')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(hist_4.history['acc'], 'b--', label = 'accuracy')
plt.plot(hist_4.history['val_acc'], 'r:', label = 'val_accuracy')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

model_4.evaluate(x_test, y_test)

# 출력 결과
loss: 0.2023 - acc: 0.9313
[0.2023032009601593, 0.9312999844551086]

 

  - 이미지 보강 파라미터나 학습 인자를 바꿔 가며 학습하면 성능이 더 개선될 수 있음

1. 주요 레이어

  - Conv2D

  • tensorflow.keras.layers.Conv2D
  • tf.nn.conv2d
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

import matplotlib.pyplot as plt

import numpy as np
from sklearn.datasets import load_sample_image

china = load_sample_image('china.jpg') / 255.
print(china.dtype)
print(china.shape)

# 출력 결과
float64
(427, 640, 3)


plt.imshow(china)
plt.show()

flower = load_sample_image('flower.jpg') / 255.
print(flower.dtype)
print(flower.shape)

# 출력 결과
float64
(427, 640, 3)


plt.imshow(flower)
plt.show()

images = np.array([china, flower])
batch_size, height, width, channels = images.shape
print(images.shape)

# 출력 결과
(2, 427, 640, 3)
# 필터 적용
filters = np.zeros(shape = (7, 7, channels, 2), dtype = np.float32)
# 수직선 추가
filters[:, 3, :, 0] = 1
# 수평선 추가
filters[3, :, :, 1] = 1

print(filters.shape)

# 출력 결과
(7, 7, 3, 2)
# 텐서플로우로 conv2d 사용하는 방법
outputs = tf.nn.conv2d(images, filters, strides = 1, padding = 'SAME')
print(outputs.shape)
plt.imshow(outputs[0, :, :, 1], cmap = 'gray')
plt.show()

# 출력 결과
(2, 427, 640, 2)

plt.imshow(outputs[0, :, :, 0], cmap = 'gray')
plt.show()

# keras로 conv2d 사용하는 방법
conv = Conv2D(filters = 32, kernel_size = 3, strides = 1,
              padding = 'same', activation = 'relu')

 

  - MaxPool2D

  • 텐서플로우 저수준 딥러닝 API
    • tf.nn.max_pool
    • 사용자가 사이즈를 맞춰줘야함
    • keras 모델의 층으로 사용하고 싶으면 Lambda 층으로 감싸줘야 함
  • Keras 고수준 API
    • keras.layers.MaxPool2D
import tensorflow as tf
from tensorflow.keras.layers import MaxPool2D, Lambda

output = tf.nn.max_pool(images,
                        ksize = (1, 1, 1, 3),
                        strides = (1, 1, 1, 3),
                        padding = 'VALID')

# tf.nn.max_pool을 keras 모델의 층으로 쓰고 싶을 때 Lambda로 감싸는 방법
output_keras = Lambda(
    lambda X: tf.nn.max_pool(X, ksize = (1, 1, 1, 3), strides = (1, 1, 1, 3), padding = 'VALID')
)


# 케라스에서 max pool 사용하는 방법
max_pool = MaxPool2D(pool_size = 2)
flower = load_sample_image('flower.jpg') / 255.
print(flower.dtype)
print(flower.shape)

# 출력 결과
float64
(427, 640, 3)


# 차원 추가
flower = np.expand_dims(flower, axis = 0)
flower.shape

# 출력 결과
(1, 427, 640, 3)


# pool size를 2로 maxpool 적용으로 데이터 수는 1/2
output = Conv2D(filters = 32, kernel_size = 3, strides = 1, padding = 'SAME', activation = 'relu')(flower)
output = MaxPool2D(pool_size = 2)(output)
output.shape

# 출력 결과
TensorShape([1, 213, 320, 32])
plt.imshow(output[0, :, :, 8], cmap = 'gray')
plt.show()

  • 사이즈가 줄어든 만큼 원본보다 해상도가 줄어듦

 

  - AvgPool2D

  • 텐서플로우 저수준 딥러닝 API
    • tf.nn.avg_pool
  • 케라스 고수준 API
    • keras.layers.AvgPool2D
from tensorflow.keras.layers import AvgPool2D

# 원본
flower.shape

# 출력 결과
(1, 427, 640, 3)


# AvgPool 적용(데이터 크기 1/2)
output = Conv2D(filters = 32, kernel_size = 3, strides = 1, padding = 'SAME', activation = 'relu')(flower)
output = AvgPool2D(pool_size = 2)(output)
output.shape

# 출력 결과
TensorShape([1, 213, 320, 32])
plt.imshow(output[0, :, : , 8], cmap = 'gray')
plt.show()

 

  - GlobalAvgPool2D(전역 평균 풀링 층)

  • keras.layers.GlobalAvgPool2D()
  • 특성 맵 각각의 공간 평균값을 출력하므로 특성 맵의 위치 정보 대부분을 잃음
  • 대신 파라미터가 없고 출력 크기가 작아 출력층 직전(분류기 헤드)에는 유용할 수 있음(아래 스케치 참고)
from tensorflow.keras.layers import GlobalAvgPool2D

output = Conv2D(filters = 32, kernel_size=  3, strides = 1, padding = 'SAME', activation = 'relu')(flower)
output = GlobalAvgPool2D()(output)
output.shape

# 출력 결과
TensorShape([1, 32])
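  • 아래는 Flatten 대신 GlobalAvgPool2D를 분류기 헤드로 쓰는 최소 스케치(클래스 수 10은 예시 값)
# GlobalAvgPool2D를 출력층 직전에 쓰는 분류기 헤드 예시
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

head = Sequential([Conv2D(32, 3, padding = 'same', activation = 'relu', input_shape = (427, 640, 3)),
                   GlobalAvgPool2D(),
                   Dense(10, activation = 'softmax')])
head.summary()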

 

 

2. 예제로 보는 CNN 구조와 학습

● 일반적인 구조

  - modules import

%load_ext tensorboard

import datetime
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.keras import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, AvgPool2D, Dropout

from tensorflow.keras import datasets
from tensorflow.keras.utils import to_categorical, plot_model

 

  - 데이터 로드 및 전처리

(x_train, y_train), (x_test, y_test) = datasets.fashion_mnist.load_data()

# 원본 데이터 형태
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

# 출력 결과
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)


# x 데이터에 축 하나씩 추가
x_train = x_train[:, :, :, np.newaxis]
x_test = x_test[:, :, :, np.newaxis]
print(x_train.shape)
print(x_test.shape)

# 출력 결과
(60000, 28, 28, 1)
(10000, 28, 28, 1)


# y 데이터 카테고리화
num_classes = 10

y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
print(y_train.shape)
print(y_test.shape)

# 출력 결과
(60000, 10)
(10000, 10)


# x 데이터 표준화
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255.
x_test /= 255.

 

  -  CNN을 위한 간단한 모델

def build():
    model = Sequential([Conv2D(64, 7, activation = 'relu', padding = 'same', input_shape = [28, 28, 1]),
                        MaxPool2D(pool_size = 2),
                        Conv2D(128, 3, activation = 'relu', padding = 'same'),
                        MaxPool2D(pool_size = 2),
                        Conv2D(256, 3, activation = 'relu', padding = 'SAME'),
                        MaxPool2D(pool_size = 2),
                        Flatten(),
                        Dense(128, activation = 'relu'),
                        Dropout(0.5),
                        Dense(64, activation = 'relu'),
                        Dropout(0.5),
                        Dense(10, activation = 'softmax')])
    return model

 

  - 모델 컴파일

model = build()
model.compile(optimizer = 'adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
model.summary()

# 출력 결과
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_11 (Conv2D)          (None, 28, 28, 64)        3200      
                                                                 
 max_pooling2d_9 (MaxPooling  (None, 14, 14, 64)       0         
 2D)                                                             
                                                                 
 conv2d_12 (Conv2D)          (None, 14, 14, 128)       73856     
                                                                 
 max_pooling2d_10 (MaxPoolin  (None, 7, 7, 128)        0         
 g2D)                                                            
                                                                 
 conv2d_13 (Conv2D)          (None, 7, 7, 256)         295168    
                                                                 
 max_pooling2d_11 (MaxPoolin  (None, 3, 3, 256)        0         
 g2D)                                                            
                                                                 
 flatten_2 (Flatten)         (None, 2304)              0         
                                                                 
 dense_6 (Dense)             (None, 128)               295040    
                                                                 
 dropout_4 (Dropout)         (None, 128)               0         
                                                                 
 dense_7 (Dense)             (None, 64)                8256      
                                                                 
 dropout_5 (Dropout)         (None, 64)                0         
                                                                 
 dense_8 (Dense)             (None, 10)                650       
                                                                 
=================================================================
Total params: 676,170
Trainable params: 676,170
Non-trainable params: 0
_________________________________________________________________
plot_model(model)

 

  - Hyper Parameters

callbacks = [tf.keras.callbacks.TensorBoard(log_dir = './logs')]
EPOCHS = 20
BATCH_SIZE = 200
VERBOSE = 1

 

  - 모델 학습(GPU 추천)

  • validation_split을 통해 검증 데이터셋을 생성
hist = model.fit(x_train, y_train,
                 epochs = EPOCHS,
                 batch_size = BATCH_SIZE,
                 validation_split = 0.3,
                 callbacks = callbacks,
                 verbose = VERBOSE)
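
  • As a quick follow-up (a sketch, not part of the original post): the trained model can be checked against the test set that was normalized above
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss: {:.4f}, Test accuracy: {:.4f}'.format(test_loss, test_acc))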

 

  • Check the run with TensorBoard
log_dir = './logs/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
%tensorboard --logdir logs/
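
  • Note (an assumption, not from the original post): the timestamped log_dir above is never actually passed anywhere; to keep each run in its own directory it would be wired into the callback like this
# hypothetical wiring of the timestamped directory into the TensorBoard callback
callbacks = [tf.keras.callbacks.TensorBoard(log_dir = log_dir)]
# model.fit(..., callbacks = callbacks) would then write this run's events under that directory,
# and %tensorboard --logdir logs/ can compare runs by timestamp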

 

● LeNet-5 (code source: https://datahacker.rs/lenet-5-implementation-tensorflow-2-0/)

  • One of the earliest CNN models
  • Designed for handwritten digit recognition

https://www.researchgate.net/figure/The-LeNet-5-Architecture-a-convolutional-neural-network_fig4_321586653

  - modules import

import datetime
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.keras import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, AvgPool2D, Dropout

from tensorflow.keras import datasets
from tensorflow.keras.utils import to_categorical, plot_model

from sklearn.model_selection import train_test_split

 

  - Data loading and preprocessing

(x_train_full, y_train_full), (x_test, y_test) = datasets.mnist.load_data()

x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full, test_size = 0.3, random_state = 777)

x_train = x_train[..., np.newaxis]
x_val = x_val[..., np.newaxis]
x_test = x_test[..., np.newaxis]

num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_val = to_categorical(y_val, num_classes)
y_test = to_categorical(y_test, num_classes)

x_train = x_train.astype('float32')
x_val = x_val.astype('float32')
x_test = x_test.astype('float32')

x_train /= 255.
x_val /= 255.
x_test /= 255.

print(x_train.shape)
print(y_train.shape)
print(x_val.shape)
print(y_val.shape)
print(x_test.shape)
print(y_test.shape)

# 출력 결과
(42000, 28, 28, 1)
(42000, 10)
(18000, 28, 28, 1)
(18000, 10)
(10000, 28, 28, 1)
(10000, 10)

 

  - Building and compiling the model

class LeNet(Sequential):
    def __init__(self, input_shape, nb_classes):
        super().__init__()

        self.add(Conv2D(6, kernel_size = (5, 5), strides = (1, 1), activation = 'tanh', input_shape = input_shape, padding = 'same'))
        self.add(AvgPool2D(pool_size = (2, 2), strides = (2, 2), padding = 'valid'))
        self.add(Conv2D(16, kernel_size = (5, 5), strides = (1, 1), activation = 'tanh', padding = 'valid'))
        self.add(AvgPool2D(pool_size = (2, 2), strides = (2, 2), padding = 'valid'))
        self.add(Flatten())
        self.add(Dense(120, activation = 'tanh'))
        self.add(Dense(84, activation = 'tanh'))
        self.add(Dense(nb_classes, activation = 'softmax'))

        self.compile(optimizer = 'adam',
                     loss = 'categorical_crossentropy',
                     metrics = ['accuracy'])

model = LeNet(input_shape = (28, 28, 1), nb_classes = 10)
model.summary()

# 출력 결과
Model: "le_net_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_17 (Conv2D)          (None, 28, 28, 6)         156       
                                                                 
 average_pooling2d_3 (Averag  (None, 14, 14, 6)        0         
 ePooling2D)                                                     
                                                                 
 conv2d_18 (Conv2D)          (None, 10, 10, 16)        2416      
                                                                 
 average_pooling2d_4 (Averag  (None, 5, 5, 16)         0         
 ePooling2D)                                                     
                                                                 
 flatten_3 (Flatten)         (None, 400)               0         
                                                                 
 dense_9 (Dense)             (None, 120)               48120     
                                                                 
 dense_10 (Dense)            (None, 84)                10164     
                                                                 
 dense_11 (Dense)            (None, 10)                850       
                                                                 
=================================================================
Total params: 61,706
Trainable params: 61,706
Non-trainable params: 0
_________________________________________________________________
plot_model(model, show_shapes = True)

 

  - Hyper Parameters

EPOCHS = 20
BATCH_SIZE = 128
VERBOSE = 1

 

  - Model training

hist = model.fit(x_train, y_train,
                 epochs = EPOCHS,
                 batch_size = BATCH_SIZE,
                 validation_data = (x_val, y_val),
                 verbose = VERBOSE)

 

  - Visualizing the training results

plt.figure(figsize = (12, 6))

plt.subplot(1, 2, 1)
plt.plot(hist.history['loss'], 'b-', label = 'loss')
plt.plot(hist.history['val_loss'], 'm--', label = 'val_loss')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(hist.history['accuracy'], 'g-', label = 'accuracy')
plt.plot(hist.history['val_accuracy'], 'r-', label = 'val_accuracy')
plt.xlabel('Epochs')
plt.grid()
plt.legend()

plt.show()

 

  - Model evaluation

model.evaluate(x_test, y_test)

# 출력 결과
313/313 [==============================] - 3s 7ms/step - loss: 0.0564 - accuracy: 0.9854
[0.0564129501581192, 0.9854000210762024]

1. Data API

 

Reference: Module tf.data (TensorFlow v2.12.0) — the tf.data.Dataset API for input pipelines (www.tensorflow.org)

  • tensorflow_datasets (tfds)

 

  - tensorflow_datasets (tfds)

import tensorflow as tf
import tensorflow_datasets as tfds

# list the available dataset builders
builders = tfds.list_builders()
print(builders)

# 출력 결과
['abstract_reasoning',
'accentdb',
'aeslc',
'aflw2k3d',
'ag_news_subset',
...
'yelp_polarity_reviews',
'yes_no',
'youtube_vis']
# load the MNIST dataset (with dataset info)
data, info = tfds.load('mnist', with_info = True)
train_data, test_data = data['train'], data['test']

print(info)

# 출력 결과
tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='C:\\Users\\YONG\\tensorflow_datasets\\mnist\\3.0.1',
    file_format=tfrecord,
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=uint8),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)
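
  • Each element of the tfds dataset is a dict with the keys shown in FeaturesDict above; a small sketch (not in the original post) of reading one example:
# take a single example and unpack the 'image' and 'label' features
for example in train_data.take(1):
    image, label = example['image'], example['label']
    print(image.shape, label.numpy())   # (28, 28, 1) and an integer class id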

 

  - tf.data

  • Creation
    • from_tensor_slices(): accepts one or more NumPy arrays and supports batching
    • from_tensors(): does not support batching
    • from_generator(): takes its input from a generator function
  • Transformation
    • batch(): splits the dataset into batches of the given size, in order
    • repeat(): repeats the data
    • shuffle(): shuffles the data randomly
    • map(): applies a function to each element
    • filter(): keeps only the elements you want
  • Iteration
    • use next_batch = iterator.get_next() (a short pipeline sketch combining these methods follows below)
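
  • A minimal pipeline sketch (not from the original post) chaining creation, transformation, and iteration; the sizes and values are arbitrary
import tensorflow as tf
import numpy as np

data = np.arange(10)                           # 0 ~ 9
ds = tf.data.Dataset.from_tensor_slices(data)  # creation
ds = (ds.map(lambda x: x * 2)                  # 0, 2, 4, ..., 18
        .filter(lambda x: x < 12)              # keep values below 12
        .shuffle(buffer_size = 10)             # shuffle within a buffer
        .batch(3)                              # batches of 3
        .repeat(2))                            # two passes over the data

for batch in ds:
    print(batch.numpy())

# get_next()-style iteration also works
iterator = iter(ds)
print(iterator.get_next())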

 

  - from_tensor_slices

import numpy as np

num_items = 20
num_list = np.arange(num_items)

num_list_dataset = tf.data.Dataset.from_tensor_slices(num_list)
num_list_dataset

# 출력 결과
# element shape is () because each element is a scalar
<TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.int32, name=None)>
for item in num_list_dataset:
    print(item)

# 출력 결과
# 20 tensors are produced
tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)
tf.Tensor(5, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(8, shape=(), dtype=int32)
tf.Tensor(9, shape=(), dtype=int32)
tf.Tensor(10, shape=(), dtype=int32)
tf.Tensor(11, shape=(), dtype=int32)
tf.Tensor(12, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
tf.Tensor(14, shape=(), dtype=int32)
tf.Tensor(15, shape=(), dtype=int32)
tf.Tensor(16, shape=(), dtype=int32)
tf.Tensor(17, shape=(), dtype=int32)
tf.Tensor(18, shape=(), dtype=int32)
tf.Tensor(19, shape=(), dtype=int32)

 

  - from_generator()

  • This class method builds a dataset from a generator function
  • The output dtype and shape must be given via the output_types and output_shapes arguments
import itertools

# i increases by 1 each step and is yielded together with a list of i ones
def gen():
    for i in itertools.count(1):
        yield(i, [1] * i)

# the generator defined above
# output dtypes: int64
# output shapes: TensorShape([]) and TensorShape([None])
dataset = tf.data.Dataset.from_generator(
    gen,
    (tf.int64, tf.int64),
    (tf.TensorShape([]), tf.TensorShape([None]))
)
list(dataset.take(3).as_numpy_iterator())

# 출력 결과
[(1, array([1], dtype=int64)),
 (2, array([1, 1], dtype=int64)),
 (3, array([1, 1, 1], dtype=int64))]
# without a stop condition the generator above runs forever;
# this version stops once i reaches stop
def gen(stop):
    for i in itertools.count(1):
        if i < stop:
            yield(i, [1] * i)
        else:
            break

dataset = tf.data.Dataset.from_generator(
    gen, args = [10],
    output_types = (tf.int64,  tf.int64),
    output_shapes = (tf.TensorShape([]), tf.TensorShape([None]))
)

list(dataset.take(5).as_numpy_iterator())

# 출력 결과
[(1, array([1], dtype=int64)),
 (2, array([1, 1], dtype=int64)),
 (3, array([1, 1, 1], dtype=int64)),
 (4, array([1, 1, 1, 1], dtype=int64)),
 (5, array([1, 1, 1, 1, 1], dtype=int64))]

 

  - batch, repeat

  • batch(): batch size
  • repeat(): number of repetitions
# batch size 7, repeated 3 times
dataset = num_list_dataset.repeat(3).batch(7)
for item in dataset:
    print(item)

# 출력 결과
# the data is split into chunks of 7 (the batch size),
# and the whole sequence is repeated 3 times
tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int32)
tf.Tensor([ 7  8  9 10 11 12 13], shape=(7,), dtype=int32)
tf.Tensor([14 15 16 17 18 19  0], shape=(7,), dtype=int32)
tf.Tensor([1 2 3 4 5 6 7], shape=(7,), dtype=int32)
tf.Tensor([ 8  9 10 11 12 13 14], shape=(7,), dtype=int32)
tf.Tensor([15 16 17 18 19  0  1], shape=(7,), dtype=int32)
tf.Tensor([2 3 4 5 6 7 8], shape=(7,), dtype=int32)
tf.Tensor([ 9 10 11 12 13 14 15], shape=(7,), dtype=int32)
tf.Tensor([16 17 18 19], shape=(4,), dtype=int32)
# to make every batch come out at exactly the requested size with no leftover elements,
# set drop_remainder = True
dataset = num_list_dataset.repeat(3).batch(7, drop_remainder = True)

for item in dataset:
    print(item)

# 출력 결과
# the final batch, which had only 4 elements, is dropped
tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int32)
tf.Tensor([ 7  8  9 10 11 12 13], shape=(7,), dtype=int32)
tf.Tensor([14 15 16 17 18 19  0], shape=(7,), dtype=int32)
tf.Tensor([1 2 3 4 5 6 7], shape=(7,), dtype=int32)
tf.Tensor([ 8  9 10 11 12 13 14], shape=(7,), dtype=int32)
tf.Tensor([15 16 17 18 19  0  1], shape=(7,), dtype=int32)
tf.Tensor([2 3 4 5 6 7 8], shape=(7,), dtype=int32)
tf.Tensor([ 9 10 11 12 13 14 15], shape=(7,), dtype=int32)

 

  - map, filter

  • Applied at the preprocessing stage to filter out unwanted data
  • Operates on tf.Tensor values
# applying a map function
from tensorflow.data import Dataset

# a dataset containing [1, 2, 3, 4, 5]
dataset = Dataset.range(1, 6)
# multiply each element by 2 using map()
dataset = dataset.map(lambda x: x * 2)
list(dataset.as_numpy_iterator())

# 출력 결과
[2, 4, 6, 8, 10]


# printing the dataset directly instead of via as_numpy_iterator()
dataset = Dataset.range(5)
result = dataset.map(lambda x: x + 1)
result

# 출력 결과
<MapDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
# map() can also be used to pull out only the parts of each element you want
elements = [(1, 'one'), (2, 'two'), (3, 'three')]
dataset = Dataset.from_generator(lambda: elements, (tf.int32, tf.string))
result = dataset.map(lambda x_int, y_str: x_int)
list(result.as_numpy_iterator())

# 출력 결과
[1, 2, 3]
dataset = Dataset.range(3)

# 1. basic usage
def g(x):
    return tf.constant(10.5), tf.constant(['One', 'Two', 'Three'])

result = dataset.map(g)
# check the spec of each element
result.element_spec

# 출력 결과
(TensorSpec(shape=(), dtype=tf.float32, name=None),
 TensorSpec(shape=(3,), dtype=tf.string, name=None))
 
 
# 2. values are converted to TensorFlow types automatically, even without tf.constant
def h(x):
    return 10.5, ['One', 'Two', 'Three'], np.array([1., 2.], dtype = np.float64)

result = dataset.map(h)
result.element_spec

# 출력 결과
(TensorSpec(shape=(), dtype=tf.float32, name=None),
 TensorSpec(shape=(3,), dtype=tf.string, name=None),
 TensorSpec(shape=(2,), dtype=tf.float64, name=None))
 
 
# 3. nested lists inside an element are also supported
def i(x):
    return (10.5, [12.5, 11.1]), "One", "Two"

result = dataset.map(i)
result.element_spec

# 출력 결과
((TensorSpec(shape=(), dtype=tf.float32, name=None),
  TensorSpec(shape=(2,), dtype=tf.float32, name=None)),
 TensorSpec(shape=(), dtype=tf.string, name=None),
 TensorSpec(shape=(), dtype=tf.string, name=None))
# 1. specify the condition directly in filter()
dataset = Dataset.from_tensor_slices([1, 2, 3])
dataset = dataset.filter(lambda x: x < 3)
list(dataset.as_numpy_iterator())

# 출력 결과
[1, 2]


# 2. the filter can also be given as a function
# keep only the elements equal to 1
def filter_fn(x):
    return tf.math.equal(x, 1)

dataset = dataset.filter(filter_fn)
list(dataset.as_numpy_iterator())
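
  • Since the previous filter already reduced the dataset to [1, 2], this second filter leaves only [1]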

 

  - shuffle, take

# load the data
dataset, info = tfds.load('imdb_reviews', with_info = True, as_supervised = True)

train_dataset = dataset['train']
# split into batches of 5, shuffle them, and take 2 batches
train_dataset = train_dataset.batch(5).shuffle(5).take(2)

for data in train_dataset:
    print(data)

# 출력 결과 (two of the shuffled batches of 5 movie reviews each)
(<tf.Tensor: shape=(5,), dtype=string, numpy=
array([b'It was disgusting and painful. What a waste of a cast! I swear, the audience (1/2 full) laughed TWICE in 90 minutes. This is not a lie. Do not even rent it.<br /><br />Zeta Jones was just too mean to be believable.<br /><br />Cusack was OK. Just OK. I felt sorry for him (the actor) in case people remember this mess.<br /><br />Roberts was the same as she always is. Charming and sweet, but with no purpose. The "romance" with John was completely unbelievable.',
       b'This is a straight-to-video movie, so it should go without saying that it\'s not going to rival the first Lion King, but that said, this was downright good.<br /><br />My kids loved this, but that\'s a given, they love anything that\'s a cartoon. The big shock was that *I* liked it too, it was laugh out loud funny at some parts (even the fart jokes*), had lots of rather creative tie-ins with the first movie, and even some jokes that you had to be older to understand (but without being risqu\xc3\xa9 like in Shrek ["do you think he\'s compensating for something?"]).<br /><br />A special note on the fart jokes, I was surprised to find that none of the jokes were just toilet noises (in fact there were almost no noises/imagery at all, the references were actually rather subtle), they actually had a setup/punchline/etc, and were almost in good taste. I\'d like my kids to think that there\'s more to humor than going to the bathroom, and this movie is fine in those regards.<br /><br />Hmm what else? The music was so-so, not nearly as creative as in the first or second movie, but plenty of fun for the kids. No painfully corny moments, which was a blessing for me. A little action but nothing too scary (the Secret of NIMH gave my kids nightmares, not sure a G rating was appropriate for that one...)<br /><br />All in all I\'d say this is a great movie for kids of any age, one that\'s 100% safe to let them watch (I try not to be overly sensitive but I\'ve had to jump up and turn off the TV during a few movies that were less kid-appropriate than expected) - but you\'re safe to leave the room during this one. I\'d say stick around anyway though, you might find that you enjoy it too :)',
       b'Finally, Timon and Pumbaa in their own film...<br /><br />\'The Lion King 1 1/2: Hakuna Matata\' is an irreverent new take on a classic tale. Which classic tale, you ask? Why, \'The Lion King\' of course!<br /><br />Yep, if there\'s one thing that Disney is never short of, it\'s narcissism.<br /><br />But that doesn\'t mean that this isn\'t a good film. It\'s basically the events of \'The Lion King\' as told from Timon and Pumbaa\'s perspective. And it\'s because of this that you\'ll have to know the story of \'The Lion King\' by heart to see where they\'re coming from.<br /><br />Anyway, at one level I was watching this and thinking "Oh my god this is so lame..." and on another level I was having a ball. Much of the humour is predictable - I mean, when Pumbaa makes up two beds, a big one for himself and a small one for Timon, within the first nanosecond we all know that Timon is going to take the big one. But that doesn\'t stop it from being hilarious, which, IMO, is \'Hakuna Matata\' in a nutshell. It\'s not what happens, it\'s how.<br /><br />And a note of warning: there are also some fart jokes. Seriously, did you expect anything else in a film where Pumbaa takes centre stage? But as fart jokes go, these are especially good, and should satisfy even the most particular connoisseur.<br /><br />The returning voice talent is great. I\'m kinda surprised that some of the actors were willing to return, what with most of them only having two or three lines (if they\'re lucky). Whoopi Goldberg is particularly welcome.<br /><br />The music is also great. From \'Digga Tunnah\' at the start to \'That\'s all I need\', an adaption of \'Warthog Rhapsody\' (a song that was cut from \'The Lion King\' and is frankly much improved in this incarnation), the music leaves me with nothing to complain about whatsoever.<br /><br />In the end, Timon and Pumbaa are awesome characters, and while it may be argued that \'Hakuna Matata\' is simply an excuse to see them in various fun and assorted compromising situations then so be it. It\'s rare to find characters that you just want to spend time with.<br /><br />Am I starting to sound creepy?<br /><br />Either way, \'The Lion King 1 1/2\' is great if you\'ve seen \'The Lion King\' far too many times. Especially if you are right now thinking "Don\'t be silly, there\'s no such thing as seeing \'The Lion King\' too many times!"',
       b'Indian Directors have it tough, They have to compete with movies like "Laggan" where 11 henpecked,Castrated males defend their village and half of them are certifiable idiots. "Devdas", a hapless, fedar- festooned foreign return drinking to oblivion, with characters running in endless corridors oblivious to any one\'s feelings or sentiments-alas they live in an ornate squalor of red tapestry and pageantry. But to make a good movie, you have to tight-rope walk to appease the frontbenchers who are the quentessential gapers who are mesmerized with Split skirts and Dishum-Dishum fights preferably involving a nitwit "Bollywood" leading actor who is marginally handsome. So you can connect with a director who wants to tell a tale of Leonine village head who in own words "defending his Village" this is considered a violent movie or too masculine for a male audience. There are very few actors who can convey the anger and pathos like Nana Patekar (Narasimhan). Nana Patekar lets you in his courtyard and watch him beret and mock the Politician when his loyal admirers burst in laughter with every word of satire thrown at him, meanwhile his daughter is bathing his Grandson.This is as authentic a scene you can get in rural India. Nana Patekar is the essential actor who belongs to the old school of acting which is a disappearing breed in Hindi Films. The violence depicted is an intricate part of storytelling with Song&Dances thrown in for the gawkers without whom movies won\'t sell, a sad but true state of affairs. Faster this changes better for "Bollywood". All said and done this is one good Movie.',
       b"Nathan Detroit runs illegal craps games for high rollers in NYC, but the heat is on and he can't find a secure location. He bets chronic gambler Sky Masterson that Sky can't make a prim missionary, Sarah Brown, go out to dinner with him. Sky takes up the challenge, but both men have some surprises in store \xc2\x85<br /><br />This is one of those expensive fifties MGM musicals in splashy colour, with big sets, loud music, larger-than-life roles and performances to match; Broadway photographed for the big screen if you like that sort of thing, which I don't. My main problem with these type of movies is simply the music. I like all kinds of music, from Albinoni to ZZ Top, but Broadway show tunes in swing time with never-ending pah-pah-tah-dah trumpet flourishes at the end of every fourth bar aren't my cup of tea. This was written by the tag team of Frank Loesser, Mankiewicz, Jo Swerling and Abe Burrows (based on a couple of Damon Runyon stories), and while the plot is quite affable the songs are weak. Blaine's two numbers for example are identical, unnecessary, don't advance the plot and grate on the ears (and are also flagrantly misogynistic if that sort of thing bothers you). There are only two memorable tunes, Luck Be A Lady (sung by Brando, not Sinatra as you might expect) and Sit Down, You're Rockin' The Boat (nicely performed by Kaye) but you have to sit through two hours to get to them. The movie's trump card is a young Brando giving a thoughtful, laid-back performance; he also sings quite well and even dances a little, and is evenly matched with the always interesting Simmons. The sequence where the two of them escape to Havana for the night is a welcome respite from all the noise, bustle and vowel-murdering of Noo Yawk. Fans of musicals may dig this, but in my view a musical has to do something more than just film the stage show."],
      dtype=object)>, <tf.Tensor: shape=(5,), dtype=int64, numpy=array([0, 1, 1, 1, 0], dtype=int64)>)
(<tf.Tensor: shape=(5,), dtype=string, numpy=
array([b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.",
       b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. I cant recommend this film at all.',
       b'Mann photographs the Alberta Rocky Mountains in a superb fashion, and Jimmy Stewart and Walter Brennan give enjoyable performances as they always seem to do. <br /><br />But come on Hollywood - a Mountie telling the people of Dawson City, Yukon to elect themselves a marshal (yes a marshal!) and to enforce the law themselves, then gunfighters battling it out on the streets for control of the town? <br /><br />Nothing even remotely resembling that happened on the Canadian side of the border during the Klondike gold rush. Mr. Mann and company appear to have mistaken Dawson City for Deadwood, the Canadian North for the American Wild West.<br /><br />Canadian viewers be prepared for a Reefer Madness type of enjoyable howl with this ludicrous plot, or, to shake your head in disgust.',
       b'This is the kind of film for a snowy Sunday afternoon when the rest of the world can go ahead with its own business as you descend into a big arm-chair and mellow for a couple of hours. Wonderful performances from Cher and Nicolas Cage (as always) gently row the plot along. There are no rapids to cross, no dangerous waters, just a warm and witty paddle through New York life at its best. A family film in every sense and one that deserves the praise it received.',
       b'As others have mentioned, all the women that go nude in this film are mostly absolutely gorgeous. The plot very ably shows the hypocrisy of the female libido. When men are around they want to be pursued, but when no "men" are around, they become the pursuers of a 14 year old boy. And the boy becomes a man really fast (we should all be so lucky at this age!). He then gets up the courage to pursue his true love.'],
      dtype=object)>, <tf.Tensor: shape=(5,), dtype=int64, numpy=array([0, 0, 0, 1, 1], dtype=int64)>)

 

  - get_next()

dataset = Dataset.range(2)
for element in dataset:
    print(element)

# 출력 결과
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
dataset = Dataset.range(2)
iterator = iter(dataset)

print(dataset)
# access the next element
print(iterator.get_next())
print(iterator.get_next())

# 출력 결과
<RangeDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
a = np.random.randint(0, 10, size = (2, 3))
print(a)

dataset = Dataset.from_tensor_slices(a)
iterator = iter(dataset)

print(iterator.get_next())
print(iterator.get_next())

# 출력 결과
# each call to get_next() returns the next row of the 2-row array a
[[0 7 2]
 [6 1 4]]
tf.Tensor([0 7 2], shape=(3,), dtype=int32)
tf.Tensor([6 1 4], shape=(3,), dtype=int32)

 

2. Fashion-MNIST classification with tf.data

  - modules import

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.data import Dataset
from tensorflow.keras.layers import Dense, Input, Flatten, Dropout, Activation, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.datasets.fashion_mnist import load_data

 

  - Data loading

(x_train, y_train), (x_test, y_test) = load_data()

# check the data shapes
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

# 출력 결과
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)

 

  - Data preprocessing

x_train = x_train / 255.
x_test = x_test / 255.

 

  - Using tf.data

train_ds = Dataset.from_tensor_slices((x_train, y_train))
train_ds = train_ds.shuffle(1000)
train_ds = train_ds.batch(32)

test_ds = Dataset.from_tensor_slices((x_test, y_test))
test_ds = test_ds.batch(32)

 

  - Data check

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneakers', 'Bag', 'Ankle boot']

for image, label in train_ds.take(2):
    plt.title("{}".format(class_names[label[0]]))
    plt.imshow(image[0, :, :], cmap = 'gray')
    plt.show()

 

  - Building the model

  • an arbitrary fully connected model
def build_model():
    input = Input(shape = (28, 28), name = 'input')
    flatten = Flatten(input_shape = [28, 28], name = 'flatten')(input)
    hidden1 = Dense(256, kernel_initializer = 'he_normal', name = 'hidden1')(flatten)
    hidden1 = BatchNormalization()(hidden1)
    hidden1 = Activation('relu')(hidden1)
    dropout1 = Dropout(0.5)(hidden1)

    hidden2 = Dense(100, kernel_initializer = 'he_normal', name = 'hidden2')(dropout1)
    hidden2 = BatchNormalization()(hidden2)
    hidden2 = Activation('relu')(hidden2)
    dropout2 = Dropout(0.5)(hidden2)

    hidden3 = Dense(100, kernel_initializer = 'he_normal', name = 'hidden3')(dropout2)
    hidden3 = BatchNormalization()(hidden3)
    hidden3 = Activation('relu')(hidden3)
    dropout3 = Dropout(0.5)(hidden3)

    hidden4 = Dense(50, kernel_initializer = 'he_normal', name = 'hidden4')(dropout3)
    hidden4 = BatchNormalization()(hidden4)
    hidden4 = Activation('relu')(hidden4)
    dropout4 = Dropout(0.5)(hidden4)

    output = Dense(10, activation = 'softmax', name = 'output')(dropout4)

    model = Model(inputs = [input], outputs = [output])

    return model
model = build_model()

model.summary()

# 출력 결과
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input (InputLayer)          [(None, 28, 28)]          0         
                                                                 
 flatten (Flatten)           (None, 784)               0         
                                                                 
 hidden1 (Dense)             (None, 256)               200960    
                                                                 
 batch_normalization (BatchN  (None, 256)              1024      
 ormalization)                                                   
                                                                 
 activation (Activation)     (None, 256)               0         
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 hidden2 (Dense)             (None, 100)               25700     
                                                                 
 batch_normalization_1 (Batc  (None, 100)              400       
 hNormalization)                                                 
                                                                 
 activation_1 (Activation)   (None, 100)               0         
                                                                 
 dropout_1 (Dropout)         (None, 100)               0         
                                                                 
 hidden3 (Dense)             (None, 100)               10100     
                                                                 
 batch_normalization_2 (Batc  (None, 100)              400       
 hNormalization)                                                 
                                                                 
 activation_2 (Activation)   (None, 100)               0         
                                                                 
 dropout_2 (Dropout)         (None, 100)               0         
                                                                 
 hidden4 (Dense)             (None, 50)                5050      
                                                                 
 batch_normalization_3 (Batc  (None, 50)               200       
 hNormalization)                                                 
                                                                 
 activation_3 (Activation)   (None, 50)                0         
                                                                 
 dropout_3 (Dropout)         (None, 50)                0         
                                                                 
 output (Dense)              (None, 10)                510       
                                                                 
=================================================================
Total params: 244,344
Trainable params: 243,332
Non-trainable params: 1,012
_________________________________________________________________

 

  - Model compilation

  • An alternative way of handling metrics
    • tf.keras.metrics.Mean
    • tf.keras.metrics.SparseCategoricalAccuracy
  • These two objects smooth the reported loss by averaging it over everything seen so far (see the small sketch after the code below)
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name = 'train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name = 'train_accuracy')

test_loss = tf.keras.metrics.Mean(name = 'test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name = 'test_accuracy')
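
  • A small illustration (not from the original post) of how tf.keras.metrics.Mean smooths the loss: it keeps a running average of every value passed in, and that average keeps accumulating until reset_states() is called (the training loop below never resets it, so its printed numbers are cumulative across epochs)
m = tf.keras.metrics.Mean(name = 'demo_loss')
m(2.0)                      # running mean: 2.0
m(4.0)                      # running mean: 3.0
print(m.result().numpy())   # 3.0
m.reset_states()            # clear the accumulated state (e.g. at the start of an epoch)
print(m.result().numpy())   # 0.0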

 

  - Model training

  • tf.function builds a graph once training starts, which makes execution faster
# tf.function enables autograph for better performance
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images)
        loss = loss_object(labels, predictions)
    
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)

@tf.function
def test_step(images, labels):
    predictions = model(images)
    t_loss = loss_object(labels, predictions)

    test_loss(t_loss)
    test_accuracy(labels, predictions)

epochs = 20

for epoch in range(epochs):
    for images, labels in train_ds:
        train_step(images, labels)
    
    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)

    template = "Epochs: {:3d}\tLoss: {:.4f}\tAccuracy: {:.4f}\tTest Loss: {:.4f}\tTest Accuracy: {:.4f}"
    print(template.format(epoch + 1,
                          train_loss.result(),
                          train_accuracy.result() * 100,
                          test_loss.result(),
                          test_accuracy.result() * 100))

# 출력 결과
Epochs:   1	Loss: 0.3975	Accuracy: 85.4906	Test Loss: 0.3896	Test Accuracy: 85.6400
Epochs:   2	Loss: 0.3756	Accuracy: 86.2650	Test Loss: 0.3840	Test Accuracy: 85.9050
Epochs:   3	Loss: 0.3586	Accuracy: 86.8523	Test Loss: 0.3768	Test Accuracy: 86.2340
Epochs:   4	Loss: 0.3450	Accuracy: 87.3364	Test Loss: 0.3706	Test Accuracy: 86.4583
Epochs:   5	Loss: 0.3333	Accuracy: 87.7414	Test Loss: 0.3684	Test Accuracy: 86.6014
Epochs:   6	Loss: 0.3232	Accuracy: 88.0877	Test Loss: 0.3648	Test Accuracy: 86.7925
Epochs:   7	Loss: 0.3144	Accuracy: 88.3983	Test Loss: 0.3639	Test Accuracy: 86.8289
Epochs:   8	Loss: 0.3066	Accuracy: 88.6765	Test Loss: 0.3618	Test Accuracy: 87.0010
Epochs:   9	Loss: 0.2994	Accuracy: 88.9215	Test Loss: 0.3595	Test Accuracy: 87.1400
Epochs:  10	Loss: 0.2927	Accuracy: 89.1588	Test Loss: 0.3595	Test Accuracy: 87.1833
Epochs:  11	Loss: 0.2864	Accuracy: 89.3894	Test Loss: 0.3573	Test Accuracy: 87.3015
Epochs:  12	Loss: 0.2808	Accuracy: 89.5865	Test Loss: 0.3570	Test Accuracy: 87.3336
Epochs:  13	Loss: 0.2753	Accuracy: 89.7777	Test Loss: 0.3583	Test Accuracy: 87.4113
Epochs:  14	Loss: 0.2703	Accuracy: 89.9568	Test Loss: 0.3577	Test Accuracy: 87.4900
Epochs:  15	Loss: 0.2654	Accuracy: 90.1251	Test Loss: 0.3583	Test Accuracy: 87.5524
Epochs:  16	Loss: 0.2609	Accuracy: 90.2880	Test Loss: 0.3615	Test Accuracy: 87.5750
Epochs:  17	Loss: 0.2565	Accuracy: 90.4376	Test Loss: 0.3626	Test Accuracy: 87.6426
Epochs:  18	Loss: 0.2525	Accuracy: 90.5751	Test Loss: 0.3634	Test Accuracy: 87.6910
Epochs:  19	Loss: 0.2484	Accuracy: 90.7171	Test Loss: 0.3651	Test Accuracy: 87.7324
Epochs:  20	Loss: 0.2446	Accuracy: 90.8512	Test Loss: 0.3667	Test Accuracy: 87.7555

 

  - Model training: second approach (Keras fit)

from sklearn.model_selection import train_test_split

(x_train_full, y_train_full), (x_test, y_test) = load_data()

x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full, test_size = 0.3, random_state = 777)

x_train = x_train / 255.
x_val = x_val / 255.
x_test = x_test / 255.

print(x_train.shape)
print(y_train.shape)
print(x_val.shape)
print(y_val.shape)
print(x_test.shape)
print(y_test.shape)

model = build_model()
model.compile(optimizer = 'sgd',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.summary()

# 출력 결과
(42000, 28, 28)
(42000,)
(18000, 28, 28)
(18000,)
(10000, 28, 28)
(10000,)
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input (InputLayer)          [(None, 28, 28)]          0         
                                                                 
 flatten (Flatten)           (None, 784)               0         
                                                                 
 hidden1 (Dense)             (None, 256)               200960    
                                                                 
 batch_normalization_4 (Batc  (None, 256)              1024      
 hNormalization)                                                 
                                                                 
 activation_4 (Activation)   (None, 256)               0         
                                                                 
 dropout_4 (Dropout)         (None, 256)               0         
                                                                 
 hidden2 (Dense)             (None, 100)               25700     
                                                                 
 batch_normalization_5 (Batc  (None, 100)              400       
 hNormalization)                                                 
                                                                 
 activation_5 (Activation)   (None, 100)               0         
                                                                 
 dropout_5 (Dropout)         (None, 100)               0         
                                                                 
 hidden3 (Dense)             (None, 100)               10100     
                                                                 
 batch_normalization_6 (Batc  (None, 100)              400       
 hNormalization)                                                 
                                                                 
 activation_6 (Activation)   (None, 100)               0         
                                                                 
 dropout_6 (Dropout)         (None, 100)               0         
                                                                 
 hidden4 (Dense)             (None, 50)                5050      
                                                                 
 batch_normalization_7 (Batc  (None, 50)               200       
 hNormalization)                                                 
                                                                 
 activation_7 (Activation)   (None, 50)                0         
                                                                 
 dropout_7 (Dropout)         (None, 50)                0         
                                                                 
 output (Dense)              (None, 10)                510       
                                                                 
=================================================================
Total params: 244,344
Trainable params: 243,332
Non-trainable params: 1,012
_________________________________________________________________
from tensorflow.keras.callbacks import EarlyStopping

early_stopping_cb = EarlyStopping(patience = 3, monitor = 'val_loss',
                                  restore_best_weights = True)
history = model.fit(x_train, y_train,
                    batch_size = 256,
                    epochs = 200,
                    shuffle = True,
                    validation_data = (x_val, y_val),
                    callbacks = [early_stopping_cb])

  - Model evaluation

model.evaluate(x_test, y_test, batch_size = 100)

# 출력 결과
loss: 0.4427 - accuracy: 0.8464
[0.44270941615104675, 0.8464000225067139]

 

  - Checking the results

# result for the first test sample
test_img = x_test[0, :, :]
plt.title(class_names[y_test[0]])
plt.imshow(test_img, cmap = 'gray')
plt.show()

pred = model.predict(test_img.reshape(1, 28, 28))
pred.shape

# 출력 결과
(1, 10)


pred

# 출력 결과
array([[8.9198991e-05, 3.5745958e-05, 7.4570953e-06, 1.5882608e-05,
        8.0741156e-06, 3.3398017e-02, 4.0778108e-05, 1.1560775e-01,
        7.1698561e-04, 8.5008013e-01]], dtype=float32)


# output the class with the highest probability as the prediction
class_names[np.argmax(pred)]

# 출력 결과
'Ankle boot'

 

  - Test Batch Dataset

test_batch = x_test[:32, :, :]
test_batch_y = y_test[:32]
print(test_batch.shape)

# 출력 결과
(32, 28, 28)
preds = model.predict(test_batch)
preds.shape

# 출력 결과
(32, 10)
pred_arg = np.argmax(preds, -1)

num_rows = 8
num_cols = 4
num_images = num_rows * num_cols

plt.figure(figsize = (16, 10))

for idx in range(1, 33, 1):
    plt.subplot(num_rows, num_cols, idx)
    plt.title('Predicted: {}, True: {}'.format(class_names[pred_arg[idx - 1]],
                                               class_names[test_batch_y[idx - 1]]))
    plt.imshow(test_batch[idx - 1], cmap = 'gray')

plt.show()
