1. Data API

 

Reference: Module tf.data (TensorFlow v2.12.0), the tf.data.Dataset API for input pipelines (www.tensorflow.org)


 

  - tensorflow_datasets (tfds)

import tensorflow as tf
import tensorflow_datasets as tfds

# list the available datasets
builders = tfds.list_builders()
print(builders)

# Output
['abstract_reasoning',
'accentdb',
'aeslc',
'aflw2k3d',
'ag_news_subset',
...
'yelp_polarity_reviews',
'yes_no',
'youtube_vis']
# load the MNIST dataset
data, info = tfds.load('mnist', with_info = True)
train_data, test_data = data['train'], data['test']

print(info)

# Output
tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='C:\\Users\\YONG\\tensorflow_datasets\\mnist\\3.0.1',
    file_format=tfrecord,
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=uint8),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)

 

  - tf.data

  • Creation
    • from_tensor_slices(): takes one or more NumPy arrays and supports slicing into batches
    • from_tensors(): does not support batching; the whole input becomes a single element (see the sketch after this list)
    • from_generator(): takes its input from a generator function
  • Transformation
    • batch(): splits the dataset sequentially into batches of the given size
    • repeat(): repeats the data
    • shuffle(): shuffles the data randomly
    • map(): applies a function to each element
    • filter(): used to filter out unwanted elements
  • Iteration
    • use next_batch = iterator.get_next()
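The difference between from_tensors() and from_tensor_slices() is easiest to see on a tiny array; the following is a minimal sketch added here for illustration (the 2x3 array arr is made up, not from the original post):

import numpy as np
import tensorflow as tf

arr = np.array([[1, 2, 3], [4, 5, 6]])

# from_tensors(): the whole array becomes one single element of the dataset
ds_whole = tf.data.Dataset.from_tensors(arr)
print(ds_whole.element_spec)   # TensorSpec(shape=(2, 3), ...)

# from_tensor_slices(): the array is sliced along its first axis, one element per row
ds_rows = tf.data.Dataset.from_tensor_slices(arr)
print(ds_rows.element_spec)    # TensorSpec(shape=(3,), ...)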

 

  - from_tensor_slices

import numpy as np

num_items = 20
num_list = np.arange(num_items)

num_list_dataset = tf.data.Dataset.from_tensor_slices(num_list)
num_list_dataset

# Output
# each element is a scalar, so the element shape is ()
<TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.int32, name=None)>
for item in num_list_dataset:
    print(item)

# Output
# 20 tensors are produced
tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)
tf.Tensor(5, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(7, shape=(), dtype=int32)
tf.Tensor(8, shape=(), dtype=int32)
tf.Tensor(9, shape=(), dtype=int32)
tf.Tensor(10, shape=(), dtype=int32)
tf.Tensor(11, shape=(), dtype=int32)
tf.Tensor(12, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
tf.Tensor(14, shape=(), dtype=int32)
tf.Tensor(15, shape=(), dtype=int32)
tf.Tensor(16, shape=(), dtype=int32)
tf.Tensor(17, shape=(), dtype=int32)
tf.Tensor(18, shape=(), dtype=int32)
tf.Tensor(19, shape=(), dtype=int32)

 

  - from_generator()

  • This class method builds a dataset from a generator
  • The output dtype and shape must be specified with the output_types and output_shapes arguments
import itertools

# i increases by 1 each step, and a list of i ones is yielded alongside it
def gen():
    for i in itertools.count(1):
        yield(i, [1] * i)

# the generator defined above
# output dtypes: int64 for both outputs
# output shapes: TensorShape([]) and TensorShape([None])
dataset = tf.data.Dataset.from_generator(
    gen,
    (tf.int64, tf.int64),
    (tf.TensorShape([]), tf.TensorShape([None]))
)
list(dataset.take(3).as_numpy_iterator())

# Output
[(1, array([1], dtype=int64)),
 (2, array([1, 1], dtype=int64)),
 (3, array([1, 1, 1], dtype=int64))]
# run as above without a stop value, gen would iterate forever, so add one
def gen(stop):
    for i in itertools.count(1):
        if i < stop:
            yield(i, [1] * i)

dataset = tf.data.Dataset.from_generator(
    gen, args = [10],
    output_types = (tf.int64,  tf.int64),
    output_shapes = (tf.TensorShape([]), tf.TensorShape([None]))
)

list(dataset.take(5).as_numpy_iterator())

# Output
[(1, array([1], dtype=int64)),
 (2, array([1, 1], dtype=int64)),
 (3, array([1, 1, 1], dtype=int64)),
 (4, array([1, 1, 1, 1], dtype=int64)),
 (5, array([1, 1, 1, 1, 1], dtype=int64))]

 

  - batch, repeat

  • batch(): the batch size
  • repeat(): the number of repetitions
# batch size 7, repeated 3 times
dataset = num_list_dataset.repeat(3).batch(7)
for item in dataset:
    print(item)

# Output
# with a batch size of 7, the data is split into groups of 7,
# and the whole sequence is repeated 3 times
tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int32)
tf.Tensor([ 7  8  9 10 11 12 13], shape=(7,), dtype=int32)
tf.Tensor([14 15 16 17 18 19  0], shape=(7,), dtype=int32)
tf.Tensor([1 2 3 4 5 6 7], shape=(7,), dtype=int32)
tf.Tensor([ 8  9 10 11 12 13 14], shape=(7,), dtype=int32)
tf.Tensor([15 16 17 18 19  0  1], shape=(7,), dtype=int32)
tf.Tensor([2 3 4 5 6 7 8], shape=(7,), dtype=int32)
tf.Tensor([ 9 10 11 12 13 14 15], shape=(7,), dtype=int32)
tf.Tensor([16 17 18 19], shape=(4,), dtype=int32)
# to drop the leftover elements so every batch has exactly the batch size,
# set drop_remainder = True
dataset = num_list_dataset.repeat(3).batch(7, drop_remainder = True)

for item in dataset:
    print(item)

# Output
# the last batch, which had only 4 elements, is gone
tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int32)
tf.Tensor([ 7  8  9 10 11 12 13], shape=(7,), dtype=int32)
tf.Tensor([14 15 16 17 18 19  0], shape=(7,), dtype=int32)
tf.Tensor([1 2 3 4 5 6 7], shape=(7,), dtype=int32)
tf.Tensor([ 8  9 10 11 12 13 14], shape=(7,), dtype=int32)
tf.Tensor([15 16 17 18 19  0  1], shape=(7,), dtype=int32)
tf.Tensor([2 3 4 5 6 7 8], shape=(7,), dtype=int32)
tf.Tensor([ 9 10 11 12 13 14 15], shape=(7,), dtype=int32)

 

  - map, filter

  • Applied at the preprocessing stage, e.g. to transform elements or filter out unwanted data
  • Both operate on tf.Tensor values
# applying a map function
from tensorflow.data import Dataset

# the list [1, 2, 3, 4, 5]
dataset = Dataset.range(1, 6)
# multiply each value by 2 using map
dataset = dataset.map(lambda x: x * 2)
list(dataset.as_numpy_iterator())

# Output
[2, 4, 6, 8, 10]


# printing the dataset itself instead of converting it with as_numpy_iterator()
dataset = Dataset.range(5)
result = dataset.map(lambda x: x + 1)
result

# Output
<MapDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
# map can be used to select and preprocess only the parts of each element you want
elements = [(1, 'one'), (2, 'two'), (3, 'three')]
dataset = Dataset.from_generator(lambda: elements, (tf.int32, tf.string))
result = dataset.map(lambda x_int, y_str: x_int)
list(result.as_numpy_iterator())

# Output
[1, 2, 3]
dataset = Dataset.range(3)

# 1. basic usage
def g(x):
    return tf.constant(10.5), tf.constant(['One', 'Two', 'Three'])

result = dataset.map(g)
# check the spec of each element
result.element_spec

# Output
(TensorSpec(shape=(), dtype=tf.float32, name=None),
 TensorSpec(shape=(3,), dtype=tf.string, name=None))
 
 
 # 2. plain Python/NumPy values are converted to tensors even without wrapping them in tf.constant
 def h(x):
    return 10.5, ['One', 'Two', 'Three'], np.array([1., 2.], dtype = np.float64)

result = dataset.map(h)
result.element_spec

# Output
(TensorSpec(shape=(), dtype=tf.float32, name=None),
 TensorSpec(shape=(3,), dtype=tf.string, name=None),
 TensorSpec(shape=(2,), dtype=tf.float64, name=None))
 
 
 # 3. nested structures (lists inside tuples) are also allowed
 def i(x):
    return (10.5, [12.5, 11.1]), "One", "Two"

result = dataset.map(i)
result.element_spec

# Output
((TensorSpec(shape=(), dtype=tf.float32, name=None),
  TensorSpec(shape=(2,), dtype=tf.float32, name=None)),
 TensorSpec(shape=(), dtype=tf.string, name=None),
 TensorSpec(shape=(), dtype=tf.string, name=None))
# 1. specify the condition with a lambda filter
dataset = Dataset.from_tensor_slices([1, 2, 3])
dataset = dataset.filter(lambda x: x < 3)
list(dataset.as_numpy_iterator())

# Output
[1, 2]


# 2. the filter can also be given as a named function
# keep only the elements equal to 1
def filter_fn(x):
    return tf.math.equal(x, 1)

dataset = dataset.filter(filter_fn)
list(dataset.as_numpy_iterator())

 

  - shuffle, take

# load the data
dataset, info = tfds.load('imdb_reviews', with_info = True, as_supervised = True)

train_dataset = dataset['train']
# batch into groups of 5, shuffle with a buffer of 5, then take 2 batches
train_dataset = train_dataset.batch(5).shuffle(5).take(2)

for data in train_dataset:
    print(data)

# Output (two batches of 5 movie reviews each, taken after shuffling)
(<tf.Tensor: shape=(5,), dtype=string, numpy=
array([b'It was disgusting and painful. What a waste of a cast! I swear, the audience (1/2 full) laughed TWICE in 90 minutes. This is not a lie. Do not even rent it.<br /><br />Zeta Jones was just too mean to be believable.<br /><br />Cusack was OK. Just OK. I felt sorry for him (the actor) in case people remember this mess.<br /><br />Roberts was the same as she always is. Charming and sweet, but with no purpose. The "romance" with John was completely unbelievable.',
       b'This is a straight-to-video movie, so it should go without saying that it\'s not going to rival the first Lion King, but that said, this was downright good.<br /><br />My kids loved this, but that\'s a given, they love anything that\'s a cartoon. The big shock was that *I* liked it too, it was laugh out loud funny at some parts (even the fart jokes*), had lots of rather creative tie-ins with the first movie, and even some jokes that you had to be older to understand (but without being risqu\xc3\xa9 like in Shrek ["do you think he\'s compensating for something?"]).<br /><br />A special note on the fart jokes, I was surprised to find that none of the jokes were just toilet noises (in fact there were almost no noises/imagery at all, the references were actually rather subtle), they actually had a setup/punchline/etc, and were almost in good taste. I\'d like my kids to think that there\'s more to humor than going to the bathroom, and this movie is fine in those regards.<br /><br />Hmm what else? The music was so-so, not nearly as creative as in the first or second movie, but plenty of fun for the kids. No painfully corny moments, which was a blessing for me. A little action but nothing too scary (the Secret of NIMH gave my kids nightmares, not sure a G rating was appropriate for that one...)<br /><br />All in all I\'d say this is a great movie for kids of any age, one that\'s 100% safe to let them watch (I try not to be overly sensitive but I\'ve had to jump up and turn off the TV during a few movies that were less kid-appropriate than expected) - but you\'re safe to leave the room during this one. I\'d say stick around anyway though, you might find that you enjoy it too :)',
       b'Finally, Timon and Pumbaa in their own film...<br /><br />\'The Lion King 1 1/2: Hakuna Matata\' is an irreverent new take on a classic tale. Which classic tale, you ask? Why, \'The Lion King\' of course!<br /><br />Yep, if there\'s one thing that Disney is never short of, it\'s narcissism.<br /><br />But that doesn\'t mean that this isn\'t a good film. It\'s basically the events of \'The Lion King\' as told from Timon and Pumbaa\'s perspective. And it\'s because of this that you\'ll have to know the story of \'The Lion King\' by heart to see where they\'re coming from.<br /><br />Anyway, at one level I was watching this and thinking "Oh my god this is so lame..." and on another level I was having a ball. Much of the humour is predictable - I mean, when Pumbaa makes up two beds, a big one for himself and a small one for Timon, within the first nanosecond we all know that Timon is going to take the big one. But that doesn\'t stop it from being hilarious, which, IMO, is \'Hakuna Matata\' in a nutshell. It\'s not what happens, it\'s how.<br /><br />And a note of warning: there are also some fart jokes. Seriously, did you expect anything else in a film where Pumbaa takes centre stage? But as fart jokes go, these are especially good, and should satisfy even the most particular connoisseur.<br /><br />The returning voice talent is great. I\'m kinda surprised that some of the actors were willing to return, what with most of them only having two or three lines (if they\'re lucky). Whoopi Goldberg is particularly welcome.<br /><br />The music is also great. From \'Digga Tunnah\' at the start to \'That\'s all I need\', an adaption of \'Warthog Rhapsody\' (a song that was cut from \'The Lion King\' and is frankly much improved in this incarnation), the music leaves me with nothing to complain about whatsoever.<br /><br />In the end, Timon and Pumbaa are awesome characters, and while it may be argued that \'Hakuna Matata\' is simply an excuse to see them in various fun and assorted compromising situations then so be it. It\'s rare to find characters that you just want to spend time with.<br /><br />Am I starting to sound creepy?<br /><br />Either way, \'The Lion King 1 1/2\' is great if you\'ve seen \'The Lion King\' far too many times. Especially if you are right now thinking "Don\'t be silly, there\'s no such thing as seeing \'The Lion King\' too many times!"',
       b'Indian Directors have it tough, They have to compete with movies like "Laggan" where 11 henpecked,Castrated males defend their village and half of them are certifiable idiots. "Devdas", a hapless, fedar- festooned foreign return drinking to oblivion, with characters running in endless corridors oblivious to any one\'s feelings or sentiments-alas they live in an ornate squalor of red tapestry and pageantry. But to make a good movie, you have to tight-rope walk to appease the frontbenchers who are the quentessential gapers who are mesmerized with Split skirts and Dishum-Dishum fights preferably involving a nitwit "Bollywood" leading actor who is marginally handsome. So you can connect with a director who wants to tell a tale of Leonine village head who in own words "defending his Village" this is considered a violent movie or too masculine for a male audience. There are very few actors who can convey the anger and pathos like Nana Patekar (Narasimhan). Nana Patekar lets you in his courtyard and watch him beret and mock the Politician when his loyal admirers burst in laughter with every word of satire thrown at him, meanwhile his daughter is bathing his Grandson.This is as authentic a scene you can get in rural India. Nana Patekar is the essential actor who belongs to the old school of acting which is a disappearing breed in Hindi Films. The violence depicted is an intricate part of storytelling with Song&Dances thrown in for the gawkers without whom movies won\'t sell, a sad but true state of affairs. Faster this changes better for "Bollywood". All said and done this is one good Movie.',
       b"Nathan Detroit runs illegal craps games for high rollers in NYC, but the heat is on and he can't find a secure location. He bets chronic gambler Sky Masterson that Sky can't make a prim missionary, Sarah Brown, go out to dinner with him. Sky takes up the challenge, but both men have some surprises in store \xc2\x85<br /><br />This is one of those expensive fifties MGM musicals in splashy colour, with big sets, loud music, larger-than-life roles and performances to match; Broadway photographed for the big screen if you like that sort of thing, which I don't. My main problem with these type of movies is simply the music. I like all kinds of music, from Albinoni to ZZ Top, but Broadway show tunes in swing time with never-ending pah-pah-tah-dah trumpet flourishes at the end of every fourth bar aren't my cup of tea. This was written by the tag team of Frank Loesser, Mankiewicz, Jo Swerling and Abe Burrows (based on a couple of Damon Runyon stories), and while the plot is quite affable the songs are weak. Blaine's two numbers for example are identical, unnecessary, don't advance the plot and grate on the ears (and are also flagrantly misogynistic if that sort of thing bothers you). There are only two memorable tunes, Luck Be A Lady (sung by Brando, not Sinatra as you might expect) and Sit Down, You're Rockin' The Boat (nicely performed by Kaye) but you have to sit through two hours to get to them. The movie's trump card is a young Brando giving a thoughtful, laid-back performance; he also sings quite well and even dances a little, and is evenly matched with the always interesting Simmons. The sequence where the two of them escape to Havana for the night is a welcome respite from all the noise, bustle and vowel-murdering of Noo Yawk. Fans of musicals may dig this, but in my view a musical has to do something more than just film the stage show."],
      dtype=object)>, <tf.Tensor: shape=(5,), dtype=int64, numpy=array([0, 1, 1, 1, 0], dtype=int64)>)
(<tf.Tensor: shape=(5,), dtype=string, numpy=
array([b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.",
       b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. I cant recommend this film at all.',
       b'Mann photographs the Alberta Rocky Mountains in a superb fashion, and Jimmy Stewart and Walter Brennan give enjoyable performances as they always seem to do. <br /><br />But come on Hollywood - a Mountie telling the people of Dawson City, Yukon to elect themselves a marshal (yes a marshal!) and to enforce the law themselves, then gunfighters battling it out on the streets for control of the town? <br /><br />Nothing even remotely resembling that happened on the Canadian side of the border during the Klondike gold rush. Mr. Mann and company appear to have mistaken Dawson City for Deadwood, the Canadian North for the American Wild West.<br /><br />Canadian viewers be prepared for a Reefer Madness type of enjoyable howl with this ludicrous plot, or, to shake your head in disgust.',
       b'This is the kind of film for a snowy Sunday afternoon when the rest of the world can go ahead with its own business as you descend into a big arm-chair and mellow for a couple of hours. Wonderful performances from Cher and Nicolas Cage (as always) gently row the plot along. There are no rapids to cross, no dangerous waters, just a warm and witty paddle through New York life at its best. A family film in every sense and one that deserves the praise it received.',
       b'As others have mentioned, all the women that go nude in this film are mostly absolutely gorgeous. The plot very ably shows the hypocrisy of the female libido. When men are around they want to be pursued, but when no "men" are around, they become the pursuers of a 14 year old boy. And the boy becomes a man really fast (we should all be so lucky at this age!). He then gets up the courage to pursue his true love.'],
      dtype=object)>, <tf.Tensor: shape=(5,), dtype=int64, numpy=array([0, 0, 0, 1, 1], dtype=int64)>)

 

  - get_next()

dataset = Dataset.range(2)
for element in dataset:
    print(element)

# Output
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
dataset = Dataset.range(2)
iterator = iter(dataset)

print(dataset)
# access the next element
print(iterator.get_next())
print(iterator.get_next())

# Output
<RangeDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
a = np.random.randint(0, 10, size = (2, 3))
print(a)

dataset = Dataset.from_tensor_slices(a)
iterator = iter(dataset)

print(iterator.get_next())
print(iterator.get_next())

# Output
# a has 2 rows; each call to get_next() returns the next row
[[0 7 2]
 [6 1 4]]
tf.Tensor([0 7 2], shape=(3,), dtype=int32)
tf.Tensor([6 1 4], shape=(3,), dtype=int32)

 

2. Fashion-MNIST Classification with tf.data

  - modules import

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.data import Dataset
from tensorflow.keras.layers import Dense, Input, Flatten, Dropout, Activation, BatchNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.datasets.fashion_mnist import load_data

 

  - Loading the data

(x_train, y_train), (x_test, y_test) = load_data()

# check the shapes of the data
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

# Output
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)

 

  - Data preprocessing

x_train = x_train / 255.
x_test = x_test / 255.

 

  - Using tf.data

train_ds = Dataset.from_tensor_slices((x_train, y_train))
train_ds = train_ds.shuffle(1000)
train_ds = train_ds.batch(32)

test_ds = Dataset.from_tensor_slices((x_test, y_test))
test_ds = test_ds.batch(32)

 

  - Checking the data

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

for image, label in train_ds.take(2):
    plt.title("{}".format(class_names[label[0]]))
    plt.imshow(image[0, :, :], cmap = 'gray')
    plt.show()

 

  - Building the model

  • an arbitrary model chosen for illustration
def build_model():
    input = Input(shape = (28, 28), name = 'input')
    flatten = Flatten(input_shape = [28, 28], name = 'flatten')(input)
    hidden1 = Dense(256, kernel_initializer = 'he_normal', name = 'hidden1')(flatten)
    hidden1 = BatchNormalization()(hidden1)
    hidden1 = Activation('relu')(hidden1)
    dropout1 = Dropout(0.5)(hidden1)

    hidden2 = Dense(100, kernel_initializer = 'he_normal', name = 'hidden2')(dropout1)
    hidden2 = BatchNormalization()(hidden2)
    hidden2 = Activation('relu')(hidden2)
    dropout2 = Dropout(0.5)(hidden2)

    hidden3 = Dense(100, kernel_initializer = 'he_normal', name = 'hidden3')(dropout2)
    hidden3 = BatchNormalization()(hidden3)
    hidden3 = Activation('relu')(hidden3)
    dropout3 = Dropout(0.5)(hidden3)

    hidden4 = Dense(50, kernel_initializer = 'he_normal', name = 'hidden4')(dropout3)
    hidden4 = BatchNormalization()(hidden4)
    hidden4 = Activation('relu')(hidden4)
    dropout4 = Dropout(0.5)(hidden4)

    output = Dense(10, activation = 'softmax', name = 'output')(dropout4)

    model = Model(inputs = [input], outputs = [output])

    return model
model = build_model()

model.summary()

# Output
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input (InputLayer)          [(None, 28, 28)]          0         
                                                                 
 flatten (Flatten)           (None, 784)               0         
                                                                 
 hidden1 (Dense)             (None, 256)               200960    
                                                                 
 batch_normalization (BatchN  (None, 256)              1024      
 ormalization)                                                   
                                                                 
 activation (Activation)     (None, 256)               0         
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 hidden2 (Dense)             (None, 100)               25700     
                                                                 
 batch_normalization_1 (Batc  (None, 100)              400       
 hNormalization)                                                 
                                                                 
 activation_1 (Activation)   (None, 100)               0         
                                                                 
 dropout_1 (Dropout)         (None, 100)               0         
                                                                 
 hidden3 (Dense)             (None, 100)               10100     
                                                                 
 batch_normalization_2 (Batc  (None, 100)              400       
 hNormalization)                                                 
                                                                 
 activation_2 (Activation)   (None, 100)               0         
                                                                 
 dropout_2 (Dropout)         (None, 100)               0         
                                                                 
 hidden4 (Dense)             (None, 50)                5050      
                                                                 
 batch_normalization_3 (Batc  (None, 50)               200       
 hNormalization)                                                 
                                                                 
 activation_3 (Activation)   (None, 50)                0         
                                                                 
 dropout_3 (Dropout)         (None, 50)                0         
                                                                 
 output (Dense)              (None, 10)                510       
                                                                 
=================================================================
Total params: 244,344
Trainable params: 243,332
Non-trainable params: 1,012
_________________________________________________________________

 

  - Compiling the model

  • An alternative way of tracking the metrics:
    • tf.keras.metrics.Mean
    • tf.keras.metrics.SparseCategoricalAccuracy
  • These two are used to make the reported loss smoother (by averaging it); a short sketch of how Mean behaves follows
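As a rough illustration added here (not part of the original code), tf.keras.metrics.Mean keeps a running average of every value passed into it, which is what smooths the reported loss; reset_state() clears that average:

m = tf.keras.metrics.Mean()
m(2.0)
m(4.0)
print(m.result().numpy())   # 3.0, the average of every value seen so far
m.reset_state()             # clears the accumulator

Note that the training loop below never resets these metrics, so the printed loss and accuracy are averaged over all batches since the very start of training.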
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name = 'train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name = 'train_accuracy')

test_loss = tf.keras.metrics.Mean(name = 'test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name = 'test_accuracy')

 

  - Training the model

  • with tf.function, a computation graph is built when training starts, which makes training faster
# using tf.function builds a graph (autograph), improving performance
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training = True)
        loss = loss_object(labels, predictions)
    
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)

@tf.function
def test_step(images, labels):
    predictions = model(images, training = False)
    t_loss = loss_object(labels, predictions)

    test_loss(t_loss)
    test_accuracy(labels, predictions)

epochs = 20

for epoch in range(epochs):
    for images, labels in train_ds:
        train_step(images, labels)
    
    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)

    template = "Epochs: {:3d}\tLoss: {:.4f}\tAccuracy: {:.4f}\tTest Loss: {:.4f}\tTest Accuracy: {:.4f}"
    print(template.format(epoch + 1,
                          train_loss.result(),
                          train_accuracy.result() * 100,
                          test_loss.result(),
                          test_accuracy.result() * 100))

# Output
Epochs:   1	Loss: 0.3975	Accuracy: 85.4906	Test Loss: 0.3896	Test Accuracy: 85.6400
Epochs:   2	Loss: 0.3756	Accuracy: 86.2650	Test Loss: 0.3840	Test Accuracy: 85.9050
Epochs:   3	Loss: 0.3586	Accuracy: 86.8523	Test Loss: 0.3768	Test Accuracy: 86.2340
Epochs:   4	Loss: 0.3450	Accuracy: 87.3364	Test Loss: 0.3706	Test Accuracy: 86.4583
Epochs:   5	Loss: 0.3333	Accuracy: 87.7414	Test Loss: 0.3684	Test Accuracy: 86.6014
Epochs:   6	Loss: 0.3232	Accuracy: 88.0877	Test Loss: 0.3648	Test Accuracy: 86.7925
Epochs:   7	Loss: 0.3144	Accuracy: 88.3983	Test Loss: 0.3639	Test Accuracy: 86.8289
Epochs:   8	Loss: 0.3066	Accuracy: 88.6765	Test Loss: 0.3618	Test Accuracy: 87.0010
Epochs:   9	Loss: 0.2994	Accuracy: 88.9215	Test Loss: 0.3595	Test Accuracy: 87.1400
Epochs:  10	Loss: 0.2927	Accuracy: 89.1588	Test Loss: 0.3595	Test Accuracy: 87.1833
Epochs:  11	Loss: 0.2864	Accuracy: 89.3894	Test Loss: 0.3573	Test Accuracy: 87.3015
Epochs:  12	Loss: 0.2808	Accuracy: 89.5865	Test Loss: 0.3570	Test Accuracy: 87.3336
Epochs:  13	Loss: 0.2753	Accuracy: 89.7777	Test Loss: 0.3583	Test Accuracy: 87.4113
Epochs:  14	Loss: 0.2703	Accuracy: 89.9568	Test Loss: 0.3577	Test Accuracy: 87.4900
Epochs:  15	Loss: 0.2654	Accuracy: 90.1251	Test Loss: 0.3583	Test Accuracy: 87.5524
Epochs:  16	Loss: 0.2609	Accuracy: 90.2880	Test Loss: 0.3615	Test Accuracy: 87.5750
Epochs:  17	Loss: 0.2565	Accuracy: 90.4376	Test Loss: 0.3626	Test Accuracy: 87.6426
Epochs:  18	Loss: 0.2525	Accuracy: 90.5751	Test Loss: 0.3634	Test Accuracy: 87.6910
Epochs:  19	Loss: 0.2484	Accuracy: 90.7171	Test Loss: 0.3651	Test Accuracy: 87.7324
Epochs:  20	Loss: 0.2446	Accuracy: 90.8512	Test Loss: 0.3667	Test Accuracy: 87.7555

 

  - Training the model: second approach (Keras fit)

from sklearn.model_selection import train_test_split

(x_train_full, y_train_full), (x_test, y_test) = load_data()

x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full, test_size = 0.3, random_state = 777)

x_train = x_train / 255.
x_val = x_val / 255.
x_test = x_test / 255.

print(x_train.shape)
print(y_train.shape)
print(x_val.shape)
print(y_val.shape)
print(x_test.shape)
print(y_test.shape)

model = build_model()
model.compile(optimizer = 'sgd',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.summary()

# Output
(42000, 28, 28)
(42000,)
(18000, 28, 28)
(18000,)
(10000, 28, 28)
(10000,)
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input (InputLayer)          [(None, 28, 28)]          0         
                                                                 
 flatten (Flatten)           (None, 784)               0         
                                                                 
 hidden1 (Dense)             (None, 256)               200960    
                                                                 
 batch_normalization_4 (Batc  (None, 256)              1024      
 hNormalization)                                                 
                                                                 
 activation_4 (Activation)   (None, 256)               0         
                                                                 
 dropout_4 (Dropout)         (None, 256)               0         
                                                                 
 hidden2 (Dense)             (None, 100)               25700     
                                                                 
 batch_normalization_5 (Batc  (None, 100)              400       
 hNormalization)                                                 
                                                                 
 activation_5 (Activation)   (None, 100)               0         
                                                                 
 dropout_5 (Dropout)         (None, 100)               0         
                                                                 
 hidden3 (Dense)             (None, 100)               10100     
                                                                 
 batch_normalization_6 (Batc  (None, 100)              400       
 hNormalization)                                                 
                                                                 
 activation_6 (Activation)   (None, 100)               0         
                                                                 
 dropout_6 (Dropout)         (None, 100)               0         
                                                                 
 hidden4 (Dense)             (None, 50)                5050      
                                                                 
 batch_normalization_7 (Batc  (None, 50)               200       
 hNormalization)                                                 
                                                                 
 activation_7 (Activation)   (None, 50)                0         
                                                                 
 dropout_7 (Dropout)         (None, 50)                0         
                                                                 
 output (Dense)              (None, 10)                510       
                                                                 
=================================================================
Total params: 244,344
Trainable params: 243,332
Non-trainable params: 1,012
_________________________________________________________________
from tensorflow.keras.callbacks import EarlyStopping

early_stopping_cb = EarlyStopping(patience = 3, monitor = 'val_loss',
                                  restore_best_weights = True)
history = model.fit(x_train, y_train,
                    batch_size = 256,
                    epochs = 200,
                    shuffle = True,
                    validation_data = (x_val, y_val),
                    callbacks = [early_stopping_cb])

  - Evaluating the model

model.evaluate(x_test, y_test, batch_size = 100)

# Output
loss: 0.4427 - accuracy: 0.8464
[0.44270941615104675, 0.8464000225067139]

 

  - Checking the results

# result for the first test sample
test_img = x_test[0, :, :]
plt.title(class_names[y_test[0]])
plt.imshow(test_img, cmap = 'gray')
plt.show()

pred = model.predict(test_img.reshape(1, 28, 28))
pred.shape

# Output
(1, 10)


pred

# Output
array([[8.9198991e-05, 3.5745958e-05, 7.4570953e-06, 1.5882608e-05,
        8.0741156e-06, 3.3398017e-02, 4.0778108e-05, 1.1560775e-01,
        7.1698561e-04, 8.5008013e-01]], dtype=float32)


# output the class with the highest probability as the prediction
class_names[np.argmax(pred)]

# Output
'Ankle boot'

 

  - Test Batch Dataset

test_batch = x_test[:32, :, :]
test_batch_y = y_test[:32]
print(test_batch.shape)

# Output
(32, 28, 28)
preds = model.predict(test_batch)
preds.shape

# Output
(32, 10)
pred_arg = np.argmax(preds, -1)

num_rows = 8
num_cols = 4
num_images = num_rows * num_cols

plt.figure(figsize = (16, 10))

for idx in range(1, 33, 1):
    plt.subplot(num_rows, num_cols, idx)
    plt.title('Predicted: {}, True: {}'.format(class_names[pred_arg[idx - 1]],
                                               class_names[test_batch_y[idx - 1]]))
    plt.imshow(test_batch[idx - 1], cmap = 'gray')

plt.show()

● Methods to Prevent Overfitting and Underfitting

  • Reducing model size
  • Weight initialization
  • Optimizers
  • Batch normalization
  • Regularization

 

1. Reducing Model Size

  • The simplest approach
  • Reducing the model size means reducing the number of learnable parameters
# prepare the data
from tensorflow.keras.datasets import imdb
import numpy as np

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = 10000)

def vectorize_seq(seqs, dim = 10000):
    results = np.zeros((len(seqs), dim))
    for i, seq in enumerate(seqs):
        results[i, seq] = 1.
    
    return results

x_train = vectorize_seq(train_data)
x_test = vectorize_seq(test_data)

y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
# model 1
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model_1 = Sequential([Dense(16, activation = 'relu', input_shape = (10000, ), name = 'input'),
                      Dense(16, activation = 'relu', name = 'hidden'),
                      Dense(1, activation = 'sigmoid', name = 'output')])
model_1.summary()

# Output
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input (Dense)               (None, 16)                160016    
                                                                 
 hidden (Dense)              (None, 16)                272       
                                                                 
 output (Dense)              (None, 1)                 17        
                                                                 
=================================================================
Total params: 160,305
Trainable params: 160,305
Non-trainable params: 0
_________________________________________________________________
# model 2
model_2 = Sequential([Dense(7, activation = 'relu', input_shape = (10000, ), name = 'input2'),
                      Dense(7, activation = 'relu', name = 'hidden2'),
                      Dense(1, activation = 'sigmoid', name = 'output2')])
model_2.summary()

# Output
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input2 (Dense)              (None, 7)                 70007     
                                                                 
 hidden2 (Dense)             (None, 7)                 56        
                                                                 
 output2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 70,071
Trainable params: 70,071
Non-trainable params: 0
_________________________________________________________________
  • The only difference between model 1 and model 2 is the model size
# train the models
model_1.compile(optimizer = 'rmsprop',
                loss = 'binary_crossentropy',
                metrics = ['acc'])
model_2.compile(optimizer = 'rmsprop',
                loss = 'binary_crossentropy',
                metrics = ['acc'])

model_1_hist = model_1.fit(x_train, y_train,
                           epochs = 20,
                           batch_size = 512,
                           validation_data = (x_test, y_test))
model_2_hist = model_2.fit(x_train, y_train,
                           epochs = 20,
                           batch_size = 512,
                           validation_data = (x_test, y_test))
# compare the validation losses
epochs = range(1, 21)
model_1_val_loss = model_1_hist.history['val_loss']
model_2_val_loss = model_2_hist.history['val_loss']

import matplotlib.pyplot as plt

plt.plot(epochs, model_1_val_loss, 'r+', label = 'Model_1')
plt.plot(epochs, model_2_val_loss, 'bo', label = 'Model_2')
plt.xlabel('Epochs')
plt.ylabel('Validation Loss')
plt.legend()
plt.grid()
plt.show()

  • model_2 (the smaller model) starts to overfit a little later

 

 

2. Reducing Model Size (2)

# build the model
model_3 = Sequential([Dense(1024, activation = 'relu', input_shape = (10000, ), name = 'input3'),
                      Dense(1024, activation = 'relu', name = 'hidden3'),
                      Dense(1, activation = 'sigmoid', name = 'output3')])

model_3.compile(optimizer = 'rmsprop',
                loss = 'binary_crossentropy',
                metrics = ['acc'])

model_3.summary()

# Output
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input3 (Dense)              (None, 1024)              10241024  
                                                                 
 hidden3 (Dense)             (None, 1024)              1049600   
                                                                 
 output3 (Dense)             (None, 1)                 1025      
                                                                 
=================================================================
Total params: 11,291,649
Trainable params: 11,291,649
Non-trainable params: 0
_________________________________________________________________
# train the model
model_3_hist = model_3.fit(x_train, y_train,
                           epochs = 20,
                           batch_size = 512,
                           validation_data = (x_test, y_test))
# visualize
model_3_val_loss = model_3_hist.history['val_loss']

plt.plot(epochs, model_1_val_loss, 'r+', label = 'Model_1')
plt.plot(epochs, model_2_val_loss, 'bo', label = 'Model_2')
plt.plot(epochs, model_3_val_loss, 'g--', label = 'Model_3')
plt.xlabel('Epochs')
plt.ylabel('Validation Loss')
plt.legend()
plt.grid()
plt.show()

  • The larger the network, the faster it can model the training data (its training loss drops more quickly)
  • But it also becomes more sensitive to overfitting
  • This can be seen by comparing the training and validation losses
# also compare the losses on the training data
model_1_train_loss = model_1_hist.history['loss']
model_2_train_loss = model_2_hist.history['loss']
model_3_train_loss = model_3_hist.history['loss']

plt.plot(epochs, model_1_train_loss, 'r+', label = 'Model_1')
plt.plot(epochs, model_2_train_loss, 'bo', label = 'Model_2')
plt.plot(epochs, model_3_train_loss, 'g--', label = 'Model_3')
plt.xlabel('Epochs')
plt.ylabel('Training Loss')
plt.legend()
plt.grid()
plt.show()

 

 

3. Weight Initialization

  - Initialization strategies

  • Glorot Initialization (Xavier)
    • Activation functions
      • none
      • tanh
      • sigmoid
      • softmax
  • He Initialization
    • Activation functions
      • ReLU
      • LeakyReLU
      • ELU, etc.
from tensorflow.keras.layers import Dense, LeakyReLU, Activation
from tensorflow.keras.models import Sequential

model = Sequential([Dense(30, kernel_initializer = 'he_normal', input_shape = [10, 10]),
                    LeakyReLU(alpha = 0.2),
                    Dense(1, kernel_initializer = 'he_normal'),
                    Activation('softmax')])
model.summary()

# Output
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 10, 30)            330       
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 10, 30)            0         
                                                                 
 dense_1 (Dense)             (None, 10, 1)             31        
                                                                 
 activation (Activation)     (None, 10, 1)             0         
                                                                 
=================================================================
Total params: 361
Trainable params: 361
Non-trainable params: 0
_________________________________________________________________
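For comparison (an added sketch, not in the original post), a Glorot/Xavier-initialized layer is typically paired with a saturating activation such as tanh or sigmoid; 'glorot_uniform' is also the Keras default kernel initializer:

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([Dense(30, kernel_initializer = 'glorot_uniform', activation = 'tanh', input_shape = [10, 10]),
                    Dense(1, kernel_initializer = 'glorot_uniform', activation = 'sigmoid')])
model.summary()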

 

 

4. Fast Optimizers

  - Momentum optimization

$$ v \leftarrow \alpha v - \gamma \frac{\partial L}{\partial W} $$

$$ W \leftarrow W + v $$

  • \(\alpha\): momentum coefficient
  • \(v\): velocity
  • \(\gamma\): learning rate
  • \(\frac{\partial L}{\partial W}\): derivative of the loss function with respect to \(W\)
import tensorflow as tf
from tensorflow.keras.optimizers import SGD

# the momentum argument is the momentum coefficient (alpha in the formula above)
optimizer = SGD(learning_rate = 0.001, momentum = 0.9)
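To make the update rule concrete, here is a minimal NumPy sketch of a single momentum step (my own illustration; W and grad are made-up values):

import numpy as np

alpha, lr = 0.9, 0.001          # momentum coefficient and learning rate
W = np.array([0.5, -0.3])       # hypothetical parameters
v = np.zeros_like(W)            # velocity, initially zero
grad = np.array([0.2, -0.1])    # hypothetical gradient dL/dW

v = alpha * v - lr * grad       # v <- alpha * v - gamma * dL/dW
W = W + v                       # W <- W + v
print(W)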

 

  - Nesterov momentum

  • Computes the gradient of the loss at a point slightly ahead in the direction of the momentum
  • Tends to reach the minimum a little faster as training progresses
    \(m \leftarrow \beta m - \eta \bigtriangledown_{\theta}J(\theta + \beta m)\)
    \(\theta \leftarrow \theta + m\)
  • \(m\): momentum vector
  • \(\beta\): momentum coefficient
  • \(\eta\): learning rate
  • \(\bigtriangledown_{\theta}J(\theta + \beta m)\): gradient of the loss evaluated slightly ahead of \(\theta\)

http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture08.pdf

optimizer = SGD(learning_rate = 0.001, momentum = 0.9, nesterov = True)

 

  - AdaGrad

  • Can work well for simple models, but is generally not used for deep neural networks (it has been found better not to use it there)
    \(h \leftarrow h+\frac{\partial L}{\partial W} \odot \frac{\partial L}{\partial W}\)
    \(W \leftarrow W-\gamma \frac{1}{\sqrt{h}} \frac{\partial L}{\partial W}\)
  • \(h\): accumulated sum of squared gradients
  • \(\gamma\): learning rate
  • \(\frac{\partial L}{\partial W}\): derivative of the loss with respect to \(W\)
from tensorflow.keras.optimizers import Adagrad

optimizer = Adagrad(learning_rate = 0.001)

 

  - RMSprop

$$ s \leftarrow \beta s+(1-\beta)\bigtriangledown_{\theta}J(\theta) \otimes \bigtriangledown_{\theta}J(\theta) $$

$$ \theta \leftarrow \theta - \eta \bigtriangledown_{\theta}J(\theta)\oslash \sqrt{s+\epsilon} $$

  • \(s\): decayed accumulation of the squared gradients (the previous value is scaled by the decay rate before the new squared gradient is added)
  • \(\eta\): learning rate
  • \(\bigtriangledown_{\theta}J(\theta)\): gradient of the loss function
from tensorflow.keras.optimizers import RMSprop

optimizer = RMSprop(learning_rate = 0.001, rho = 0.9)

 

  - Adam

$$ m \leftarrow \beta_{1}m-(1-\beta_{1})\frac{\partial L}{\partial W} $$

$$ s \leftarrow \beta_{2}s+(1-\beta_{2})\frac{\partial L}{\partial W}\odot\frac{\partial L}{\partial W} $$

$$ \hat{m} \leftarrow \frac{m}{1-\beta^{t}_{1}} $$

$$ \hat{s} \leftarrow \frac{s}{1-\beta^{t}_{2}} $$

$$ W \leftarrow W+\gamma \hat{m} \oslash \sqrt{\hat{s}+\epsilon} $$

  • \(\beta_{1}, \beta_{2}\): update coefficients of the exponential averages
  • \(\gamma\): learning rate
  • \(\beta_{1} \approx 0.9, \beta_{2} \approx 0.999 \)
  • \( \frac{\partial L}{\partial W} \): derivative of the loss with respect to \(W\)
from tensorflow.keras.optimizers import Adam

# the values given for beta_1 and beta_2 are the defaults, which have been shown to work well in practice
optimizer = Adam(learning_rate = 0.001, beta_1 = 0.9, beta_2 = 0.999)
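Similarly, a short NumPy sketch of one Adam step including the bias correction (my own illustration, following the same negative-gradient sign convention as the formulas above; grad is a made-up gradient):

import numpy as np

beta1, beta2, lr, eps = 0.9, 0.999, 0.001, 1e-7
W = np.array([0.5, -0.3])
m = np.zeros_like(W)             # first-moment (momentum-like) estimate
s = np.zeros_like(W)             # second-moment (squared-gradient) estimate
grad = np.array([0.2, -0.1])     # hypothetical gradient dL/dW
t = 1                            # step counter, starting at 1

m = beta1 * m - (1 - beta1) * grad
s = beta2 * s + (1 - beta2) * grad**2
m_hat = m / (1 - beta1**t)       # bias correction
s_hat = s / (1 - beta2**t)
W = W + lr * m_hat / np.sqrt(s_hat + eps)
print(W)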

 

 

5. Batch Normalization

  • Normalizes the samples flowing through the model so that their distribution stays uniform
  • Helps the trained model generalize better to new data
  • Normalizing only in the data preprocessing step does not guarantee that the inputs arriving at each layer are still normalized
  • Usually placed after a Dense or Conv2D layer and before the activation function
from tensorflow.keras.layers import BatchNormalization, Dense, Activation
from tensorflow.keras.utils import plot_model

model = Sequential()
model.add(Dense(32, input_shape = (28 * 28, ), kernel_initializer = 'he_normal'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.summary()
plot_model(model, show_shapes = True)

# Output
Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_4 (Dense)             (None, 32)                25120     
                                                                 
 batch_normalization_1 (Batc  (None, 32)               128       
 hNormalization)                                                 
                                                                 
 activation_2 (Activation)   (None, 32)                0         
                                                                 
=================================================================
Total params: 25,248
Trainable params: 25,184
Non-trainable params: 64
_________________________________________________________________

 

 

6. Regularization

  • The more complex the network, the more its capacity should be constrained so that the weights take small values
  • This makes the distribution of weights more uniform
  • A cost associated with large weights is added to the network's loss function
    • L1 regularization: adds a cost proportional to the absolute value of the weights
    • L2 regularization: adds a cost proportional to the square of the weights (often called weight decay)
    • The two can also be combined (a small check of the L2 penalty follows this list)
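As a quick check of what l2(0.001) actually adds to the loss (an example I am adding, not from the original post), the penalty term shows up in layer.losses and equals 0.001 times the sum of squared kernel weights:

import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

layer = Dense(4, kernel_regularizer = l2(0.001))
_ = layer(tf.ones((1, 3)))                               # call the layer once so its kernel gets built
penalty = layer.losses[0]                                # the regularization term added to the model loss
manual = 0.001 * tf.reduce_sum(tf.square(layer.kernel))
print(penalty.numpy(), manual.numpy())                   # the two values match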
# build the L2-regularized model
from tensorflow.keras.regularizers import l1, l2, l1_l2

l2_model = Sequential([Dense(16, kernel_regularizer = l2(0.001), activation = 'relu', input_shape = (10000, )),
                       Dense(16, kernel_regularizer = l2(0.001), activation = 'relu'),
                       Dense(1, activation = 'sigmoid')])
l2_model.compile(optimizer = 'rmsprop',
                 loss = 'binary_crossentropy',
                 metrics = ['acc'])
l2_model.summary()
plot_model(l2_model, show_shapes = True)

# Output
Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_5 (Dense)             (None, 16)                160016    
                                                                 
 dense_6 (Dense)             (None, 16)                272       
                                                                 
 dense_7 (Dense)             (None, 1)                 17        
                                                                 
=================================================================
Total params: 160,305
Trainable params: 160,305
Non-trainable params: 0
_________________________________________________________________

# train the L2 model
l2_model_hist = l2_model.fit(x_train, y_train,
                             epochs = 20,
                             batch_size = 512,
                             validation_data = (x_test, y_test))
# visualize the L2 model
l2_model_val_loss = l2_model_hist.history['val_loss']

epochs = range(1, 21)
plt.plot(epochs, model_1_val_loss, 'r+', label = 'Model_1')
plt.plot(epochs, l2_model_val_loss, 'bo', label = 'Model_L2-regularized')
plt.xlabel('Epochs')
plt.ylabel('Validation Loss')
plt.legend()
plt.grid()
plt.show()

 

# build the L1-regularized model
l1_model = Sequential([Dense(16, kernel_regularizer = l1(0.001), activation = 'relu', input_shape = (10000, )),
                       Dense(16, kernel_regularizer = l1(0.001), activation = 'relu'),
                       Dense(1, activation = 'sigmoid')])
l1_model.compile(optimizer = 'rmsprop',
                 loss = 'binary_crossentropy',
                 metrics = ['acc'])
l1_model.summary()
plot_model(l1_model, show_shapes = True)


# Output
Model: "sequential_17"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_14 (Dense)            (None, 16)                160016    
                                                                 
 dense_15 (Dense)            (None, 16)                272       
                                                                 
 dense_16 (Dense)            (None, 1)                 17        
                                                                 
=================================================================
Total params: 160,305
Trainable params: 160,305
Non-trainable params: 0
_________________________________________________________________

# train the L1 model
l1_model_hist = l1_model.fit(x_train, y_train,
                             epochs = 20,
                             batch_size = 512,
                             validation_data = (x_test, y_test))
# visualize the L1 model
l1_model_val_loss = l1_model_hist.history['val_loss']

epochs = range(1, 21)
plt.plot(epochs, model_1_val_loss, 'r+', label = 'Model_1')
plt.plot(epochs, l1_model_val_loss, 'bo', label = 'Model_L1-regularized')
plt.plot(epochs, l2_model_val_loss, 'g--', label = 'Model_L2-regularized')
plt.xlabel('Epochs')
plt.ylabel('Validation Loss')
plt.legend()
plt.grid()
plt.show()

 

# build the combined L1/L2 model
l1_l2_model = Sequential([Dense(16, kernel_regularizer = l1_l2(l1 = 0.0001, l2 = 0.0001), activation = 'relu', input_shape = (10000, )),
                          Dense(16, kernel_regularizer = l1_l2(l1 = 0.0001, l2 = 0.0001), activation = 'relu'),
                          Dense(1, activation = 'sigmoid')])
l1_l2_model.compile(optimizer = 'rmsprop',
                    loss = 'binary_crossentropy',
                    metrics = ['acc'])
l1_l2_model.summary()
plot_model(l1_l2_model, show_shapes = True)


 

# train the L1/L2 model
l1_l2_model_hist = l1_l2_model.fit(x_train, y_train,
                                   epochs = 20,
                                   batch_size = 512,
                                   validation_data = (x_test, y_test))
# visualize the L1/L2 model
l1_l2_model_val_loss = l1_l2_model_hist.history['val_loss']

epochs = range(1, 21)
plt.plot(epochs, model_1_val_loss, 'r+', label = 'Model_1')
plt.plot(epochs, l1_l2_model_val_loss, 'ko', label = 'Model_L1_L2-regularized')
plt.plot(epochs, l1_model_val_loss, 'bo', label = 'Model_L1-regularized')
plt.plot(epochs, l2_model_val_loss, 'g--', label = 'Model_L2-regularized')
plt.xlabel('Epochs')
plt.ylabel('Validation Loss')
plt.legend()
plt.grid()
plt.show()

 

 

7. Dropout

  • One of the most effective and widely used regularization techniques for neural networks
  • Applying dropout to a layer randomly drops (zeroes out) some of the layer's features (nodes) during training
    • For example, applying dropout to the vector [1.0, 3.2, 0.6, 0.8, 1.1] randomly sets some entries to 0
      (e.g. it becomes [0, 3.2, 0.6, 0.8, 0])
    • The rate is usually set between 0.2 and 0.5
  • At test time no nodes are dropped
    • Instead, the layer's outputs are rescaled to account for the dropout rate (see the small sketch of the Keras behaviour below)
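A small sketch of how the Keras Dropout layer actually behaves (my addition): Keras uses inverted dropout, so the surviving values are scaled up by 1 / (1 - rate) during training, and the layer passes values through unchanged at inference time:

import tensorflow as tf
from tensorflow.keras.layers import Dropout

x = tf.ones((1, 10))
drop = Dropout(0.5)

print(drop(x, training = True).numpy())    # roughly half the entries are 0, the rest are scaled to 2.0
print(drop(x, training = False).numpy())   # all ones: nothing is dropped at inference time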
# build the model
from tensorflow.keras.layers import Dropout

dropout_model = Sequential([Dense(16, activation = 'relu', input_shape = (10000, )),
                             Dropout(0.5),
                             Dense(16, activation = 'relu'),
                             Dropout(0.5),
                             Dense(1, activation = 'sigmoid')])
dropout_model.compile(optimizer = 'rmsprop',
                      loss = 'binary_crossentropy',
                      metrics = ['acc'])
dropout_model.summary()
plot_model(dropout_model, show_shapes = True)

# Output
Model: "sequential_19"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_20 (Dense)            (None, 16)                160016    
                                                                 
 dropout (Dropout)           (None, 16)                0         
                                                                 
 dense_21 (Dense)            (None, 16)                272       
                                                                 
 dropout_1 (Dropout)         (None, 16)                0         
                                                                 
 dense_22 (Dense)            (None, 1)                 17        
                                                                 
=================================================================
Total params: 160,305
Trainable params: 160,305
Non-trainable params: 0
_________________________________________________________________

# train the model
dropout_model_hist = dropout_model.fit(x_train, y_train,
                                       epochs = 20,
                                       batch_size = 512,
                                       validation_data = (x_test, y_test))
# visualize
dropout_model_val_loss = dropout_model_hist.history['val_loss']

epochs = range(1, 21)
plt.plot(epochs, model_1_val_loss, 'r+', label = 'Model_1')
plt.plot(epochs, dropout_model_val_loss, 'co', label = 'Model_Dropout')
plt.xlabel('Epochs')
plt.ylabel('Validation Loss')
plt.legend()
plt.grid()
plt.show()

I have taken part in a number of data analysis projects and competitions because I want to become a data analyst.

Most of those competitions used public data, so I have worked with public datasets a lot.

Public portals hold data on almost every field, and picking datasets that fit the chosen topic made data collection very easy.

However, because the data is released to the public there are privacy constraints, and because it is so large it is hard to maintain, so I often felt the quality was not great. For data that is updated every year, if the standards or the people entering it differ from year to year, the entries become inconsistent and can require an enormous amount of preprocessing.

In such cases it is hard to analyze the data quickly, and the results sometimes suffer as well.

This convinced me that data quality is a problem that has to be solved before data analysis can begin. I realized that as a data analyst I need to know how to improve the quality of public data before using it, and how to manage and use the quality of in-house data.

So the book I chose for studying data quality is '데이터 품질의 비밀', O'REILLY's first book on data quality, newly published by 디코딩, an imprint of 한빛미디어, which covers the most up-to-date material on the subject.

 

Judging from the table of contents, this book teaches the following:

  • Why data quality matters
  • Building data systems with data quality in mind
  • Managing data quality during data collection, cleaning, transformation, and testing
  • Quality management through data pipeline monitoring and anomaly detection
  • Architectures for data reliability
  • Solving data quality problems at scale
  • Building end-to-end data lineage
  • Democratizing data quality
  • Data quality case studies

 

 

The book also includes screenshots of real tools to aid understanding, and through the SQL queries it presents I could learn concretely how to write more useful queries.

 

It also spells out the work a team should carry out together to manage data quality, which is content worth sharing within a company.

 

I received this book through a review event because I felt data quality was really important and wanted to study it, and it turned out to be genuinely helpful. As O'REILLY's first book on data quality it is one I am glad to have on my shelf, and one I can keep returning to whenever I need to manage data quality.

https://www.tensorflow.org/tutorials/keras/classification?hl=ko

1. modules import

import tensorflow as tf
from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras import models
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import plot_model

from sklearn.model_selection import train_test_split

import numpy as np
import matplotlib.pyplot as plt

 

 

2. Loading the Dataset

tf.random.set_seed(111)

(x_train_full, y_train_full), (x_test, y_test) = load_data()

x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full, test_size = 0.3, random_state = 111)

print("학습 데이터: {}\t레이블: {}".format(x_train_full.shape, y_train_full.shape))
print("학습 데이터: {}\t레이블: {}".format(x_train.shape, y_train.shape))
print("검증 데이터: {}\t레이블: {}".format(x_val.shape, y_val.shape))
print("테스트 데이터: {}\t레이블: {}".format(x_test.shape, y_test.shape))

# 출력 결과
학습 데이터: (60000, 28, 28)	레이블: (60000,)
학습 데이터: (42000, 28, 28)	레이블: (42000,)
검증 데이터: (18000, 28, 28)	레이블: (18000,)
테스트 데이터: (10000, 28, 28)	레이블: (10000,)

 

 

3. Inspect the data

# Class names for the labels
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Check the label of the first training example
class_names[y_train[0]]

# Output
'Pullover'
plt.figure()
plt.imshow(x_train[0])
plt.colorbar()
plt.grid()
plt.show()

# Randomly sample four images and display them
num_sample = 4
random_idxs = np.random.randint(60000, size =num_sample)
plt.figure(figsize = (15, 10))
for i, idx in enumerate(random_idxs):
    image = x_train_full[idx, :]
    label = y_train_full[idx]

    plt.subplot(1, len(random_idxs), i+1)
    plt.imshow(image)
    plt.title("Index: {}, Label: {}".format(idx, class_names[label]))

 

 

4. Data preprocessing

  • Normalization
  • Flatten
  • loss = 'sparse_categorical_crossentropy'
# Normalization
x_train = (x_train.reshape(-1, 28*28)) / 255.
x_val = (x_val.reshape(-1, 28*28)) / 255.
x_test = (x_test.reshape(-1, 28*28)) / 255.

 

 

5. Model definition (functional API)

input = Input(shape = (784, ), name = 'input')
hidden1 = Dense(256, activation = 'relu', name = 'hidden1')(input)
hidden2 = Dense(128, activation = 'relu', name = 'hidden2')(hidden1)
hidden3 = Dense(64, activation = 'relu', name = 'hidden3')(hidden2)
hidden4 = Dense(32, activation = 'relu', name = 'hidden4')(hidden3)
output = Dense(10, activation = 'softmax', name = 'output')(hidden4)
model = Model(inputs = [input], outputs = [output])
model.summary()

# Output
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input (InputLayer)          [(None, 784)]             0         
                                                                 
 hidden1 (Dense)             (None, 256)               200960    
                                                                 
 hidden2 (Dense)             (None, 128)               32896     
                                                                 
 hidden3 (Dense)             (None, 64)                8256      
                                                                 
 hidden4 (Dense)             (None, 32)                2080      
                                                                 
 output (Dense)              (None, 10)                330       
                                                                 
=================================================================
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________
plot_model(model)

 

 

6. Compile the model

model.compile(loss = 'sparse_categorical_crossentropy',
              optimizer = RMSprop(learning_rate = 0.01),
              metrics = ['acc'])

 

 

7. Train the model

  • Store the training history in the history variable so it can be visualized later
history = model.fit(x_train, y_train,
                    epochs = 10,
                    batch_size = 128,
                    validation_data = (x_val, y_val))

# Output
Epoch 1/10
329/329 [==============================] - 16s 33ms/step - loss: 0.8969 - acc: 0.6897 - val_loss: 0.5580 - val_acc: 0.7997
Epoch 2/10
329/329 [==============================] - 6s 19ms/step - loss: 0.5179 - acc: 0.8132 - val_loss: 0.5554 - val_acc: 0.8124
Epoch 3/10
329/329 [==============================] - 5s 15ms/step - loss: 0.4643 - acc: 0.8321 - val_loss: 0.7202 - val_acc: 0.7992
Epoch 4/10
329/329 [==============================] - 5s 14ms/step - loss: 0.4484 - acc: 0.8414 - val_loss: 0.5157 - val_acc: 0.7810
Epoch 5/10
329/329 [==============================] - 5s 15ms/step - loss: 0.4242 - acc: 0.8497 - val_loss: 0.5527 - val_acc: 0.8212
Epoch 6/10
329/329 [==============================] - 5s 15ms/step - loss: 0.4175 - acc: 0.8523 - val_loss: 0.6034 - val_acc: 0.8197
Epoch 7/10
329/329 [==============================] - 5s 15ms/step - loss: 0.4107 - acc: 0.8566 - val_loss: 0.6612 - val_acc: 0.8046
Epoch 8/10
329/329 [==============================] - 5s 15ms/step - loss: 0.4029 - acc: 0.8594 - val_loss: 0.6940 - val_acc: 0.7671
Epoch 9/10
329/329 [==============================] - 5s 14ms/step - loss: 0.3955 - acc: 0.8603 - val_loss: 0.5032 - val_acc: 0.8444
Epoch 10/10
329/329 [==============================] - 5s 14ms/step - loss: 0.3969 - acc: 0.8653 - val_loss: 0.5266 - val_acc: 0.8257

 

 

8. Visualize the training results

history_dict = history.history

loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(loss) + 1)
fig = plt.figure(figsize = (10, 5))

ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(epochs, loss, color = 'blue', label = 'train_loss')
ax1.plot(epochs, val_loss, color = 'red', label = 'val_loss')
ax1.set_title('Train and Validation Loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.grid()
ax1.legend()

acc = history_dict['acc']
val_acc = history_dict['val_acc']

ax2 = fig.add_subplot(1, 2, 2)
ax2.plot(epochs, acc, color = 'blue', label = 'train_acc')
ax2.plot(epochs, val_acc, color = 'red', label = 'val_acc')
ax2.set_title('Train and Validation Accuracy')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.grid()
ax2.legend()

  • The validation metrics (val_loss, val_acc) are unstable and bounce around between epochs
  • Re-run with a different optimizer
    • Repeat the dataset loading, preprocessing, and model definition steps
from tensorflow.keras.optimizers import SGD

model.compile(loss = 'sparse_categorical_crossentropy',
              optimizer = SGD(learning_rate = 0.01),
              metrics = ['acc'])

history2 = model.fit(x_train, y_train,
                     epochs = 10,
                     batch_size = 128,
                     validation_data = (x_val, y_val))

# Output
Epoch 1/10
329/329 [==============================] - 13s 32ms/step - loss: 0.3495 - acc: 0.8706 - val_loss: 0.3795 - val_acc: 0.8644
Epoch 2/10
329/329 [==============================] - 9s 27ms/step - loss: 0.3172 - acc: 0.8811 - val_loss: 0.3691 - val_acc: 0.8689
Epoch 3/10
329/329 [==============================] - 6s 19ms/step - loss: 0.3072 - acc: 0.8848 - val_loss: 0.3621 - val_acc: 0.8713
Epoch 4/10
329/329 [==============================] - 8s 25ms/step - loss: 0.3017 - acc: 0.8864 - val_loss: 0.3590 - val_acc: 0.8728
Epoch 5/10
329/329 [==============================] - 7s 23ms/step - loss: 0.2977 - acc: 0.8880 - val_loss: 0.3572 - val_acc: 0.8728
Epoch 6/10
329/329 [==============================] - 7s 21ms/step - loss: 0.2950 - acc: 0.8888 - val_loss: 0.3548 - val_acc: 0.8733
Epoch 7/10
329/329 [==============================] - 4s 12ms/step - loss: 0.2925 - acc: 0.8896 - val_loss: 0.3542 - val_acc: 0.8756
Epoch 8/10
329/329 [==============================] - 3s 11ms/step - loss: 0.2903 - acc: 0.8904 - val_loss: 0.3526 - val_acc: 0.8756
Epoch 9/10
329/329 [==============================] - 3s 10ms/step - loss: 0.2887 - acc: 0.8911 - val_loss: 0.3520 - val_acc: 0.8757
Epoch 10/10
329/329 [==============================] - 3s 11ms/step - loss: 0.2870 - acc: 0.8915 - val_loss: 0.3526 - val_acc: 0.8756
# Visualize again
history_dict = history2.history

loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(loss) + 1)
fig = plt.figure(figsize = (10, 5))

ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(epochs, loss, color = 'blue', label = 'train_loss')
ax1.plot(epochs, val_loss, color = 'red', label = 'val_loss')
ax1.set_title('Train and Validation Loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.grid()
ax1.legend()

acc = history_dict['acc']
val_acc = history_dict['val_acc']

ax2 = fig.add_subplot(1, 2, 2)
ax2.plot(epochs, acc, color = 'blue', label = 'train_acc')
ax2.plot(epochs, val_acc, color = 'red', label = 'val_acc')
ax2.set_title('Train and Validation Accuracy')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.grid()
ax2.legend()

  • The training loss and accuracy look different from the validation loss and accuracy, but in absolute terms the gap is small
  • At its widest the loss gap is about 0.06, and the accuracy gap about 0.025

 

 

9. Model evaluation (1)

  • Model trained with the SGD() optimizer
  • evaluate()
model.evaluate(x_test, y_test)

# Output
313/313 [==============================] - 2s 6ms/step - loss: 0.3862 - acc: 0.8661
[0.38618436455726624, 0.866100013256073]

 

 

10. Predict with the trained model

pred_ys = model.predict(x_test)

print(pred_ys.shape)
np.set_printoptions(precision = 7)
print(pred_ys[0])

# Output
# Probability assigned to each of the 10 classes
(10000, 10)
[4.2854483e-21 1.0930411e-15 1.6151620e-17 3.9182383e-11 2.9266587e-15
 3.3629590e-03 4.9878759e-17 1.0700015e-03 2.2493745e-13 9.9556702e-01]
# Take the class with the highest of the 10 probabilities as the prediction and check the result
arg_pred_y = np.argmax(pred_ys, axis = 1)
plt.imshow(x_test[0].reshape(-1, 28))
plt.title('Predicted Class: {}'.format(class_names[arg_pred_y[0]]))
plt.show()

# Plot an image with its predicted and true labels
def plot_image(i, pred_ys, y_test, img):
    pred_ys, y_test, img = pred_ys[i], y_test[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    
    plt.imshow(img, cmap = plt.cm.binary)

    predicted_label = np.argmax(pred_ys)
    if predicted_label == y_test:
        color = 'blue'
    else:
        color = 'red'
    
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100 * np.max(pred_ys),
                                         class_names[y_test]),
                                         color = color)

# Show the predicted probability for every class for this example
def plot_value_array(i, pred_ys, true_label):
    pred_ys, true_label = pred_ys[i], true_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    thisplot = plt.bar(range(10), pred_ys, color = '#777777')
    plt.ylim([0, 1])
    predicted_label = np.argmax(pred_ys)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
# Check the prediction for the first test example
i = 0
plt.figure(figsize = (8, 4))
plt.subplot(1, 2, 1)
plot_image(i, pred_ys, y_test, x_test.reshape(-1, 28, 28))
plt.subplot(1, 2, 2)
plot_value_array(i, pred_ys, y_test)
plt.show()

# Check predictions for randomly sampled test examples
num_rows = 5
num_cols = 3
num_images = num_rows * num_cols

random_num = np.random.randint(10000, size = num_images)
plt.figure(figsize = (2 * 2 * num_cols, 2 * num_rows))
for idx, num in enumerate(random_num):
    plt.subplot(num_rows, 2 * num_cols, 2 * idx + 1)
    plot_image(num, pred_ys, y_test, x_test.reshape(-1, 28, 28))
    plt.subplot(num_rows, 2 * num_cols, 2 * idx + 2)
    plot_value_array(num, pred_ys, y_test)

plt.show()

 

 

11. Model evaluation (2)

  • Model trained with the SGD() optimizer
  • Confusion matrix
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
from tensorflow.keras.utils import to_categorical

y_test_che = to_categorical(y_test)
plt.figure(figsize = (8, 8))
cm2 = confusion_matrix(np.argmax(y_test_che, axis = 1), np.argmax(pred_ys, axis = -1))
sns.heatmap(cm2, annot = True, fmt = 'd', cmap = 'Blues')
plt.xlabel("Predicted Label")
plt.ylabel("True Label")

 

 

12. Model evaluation (3)

  • Model trained with the SGD() optimizer
  • Classification report
print(classification_report(np.argmax(y_test_che, axis = -1), np.argmax(pred_ys, axis = -1)))

# Output
              precision    recall  f1-score   support

           0       0.78      0.85      0.81      1000
           1       0.99      0.96      0.98      1000
           2       0.75      0.81      0.78      1000
           3       0.86      0.88      0.87      1000
           4       0.77      0.75      0.76      1000
           5       0.97      0.95      0.96      1000
           6       0.68      0.57      0.62      1000
           7       0.93      0.96      0.94      1000
           8       0.96      0.97      0.96      1000
           9       0.96      0.95      0.96      1000

    accuracy                           0.87     10000
   macro avg       0.86      0.87      0.86     10000
weighted avg       0.86      0.87      0.86     10000

1. Analysis period: 2022.11.10 ~ 2022.11.17

 

2. Analysis overview

  - The number of obstetrics/gynecology (OB/GYN) clinics nationwide and the share of OB/GYN doctors among all doctors are both declining

 

  - OB/GYN insurance benefit claim reviews and the maternal mortality ratio have increased, suggesting that the need for OB/GYN care is growing

 

  - In Busan, only 32 of 127 OB/GYN clinics have a delivery room

  - The longest distance from the residence of a woman of childbearing age to a delivery room exceeds about 27 km

  - There is also an apartment complex with a high concentration of women of childbearing age that is about 23 km from the nearest delivery room

 

  - In addition, the Ministry of Health and Welfare has designated 33 areas nationwide as underserved for childbirth

 

  - Emergency rooms are more numerous than delivery rooms, so distance is less of an issue, but they have a different problem

  - OB/GYN doctors are not stationed in general-hospital emergency rooms

  - Even after arriving at an emergency room, patients must wait until an OB/GYN specialist gets there

  - Proposed solution: when a maternal emergency occurs → call a doctor through a mobile platform that connects the patient with an OB/GYN specialist in real time → use a smart mobile maternity unit or transfer to a hospital capable of emergency care

 

3. Platform overview

  - Built a mockup of a mobile platform that connects patients and doctors in real time

  - OB/GYN doctors register themselves in the app, and patients find and match with the doctor they want

  - An emergency matching system is also included for urgent situations

 

4. Data used to select optimal standby locations for the mobile maternity unit

  - Population of women of childbearing age (women in their 20s, 30s, and 40s)

  - The analysis focused on Busan, so the list of OB/GYN clinics in Busan was used

  - Whether each clinic on the list has an emergency room or a delivery room was checked on the National Emergency Medical Center dashboard

 

5. Data preprocessing

  - Locations of delivery rooms, emergency rooms, and OB/GYN clinics

  - Shortest distance from the residences of women of childbearing age to a delivery room, emergency room, and OB/GYN clinic

 

6. Data analysis

  - Compute an OB/GYN vulnerability index (population of childbearing age + minimum distance to a delivery room or emergency room)

  - Then sort the index in descending order to derive a ranking (a rough sketch of this step follows below)
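Below is a minimal pandas sketch (not part of the original project) of how such a vulnerability index could be computed and ranked. The grid DataFrame, its column names, and the min-max scaling used to combine population and distance are all hypothetical assumptions rather than the exact formula used in the analysis.

import pandas as pd

# Hypothetical grid-level inputs: population of childbearing age and the
# shortest distance (km) from each grid cell to a delivery room / emergency room
grid = pd.DataFrame({
    'grid_id': ['A1', 'A2', 'B1', 'B2'],
    'fertile_pop': [1200, 300, 800, 150],
    'dist_delivery_km': [3.2, 27.1, 10.5, 23.4],
    'dist_er_km': [1.1, 15.0, 4.2, 18.9],
})

# Min-max scale so that population and distance contribute on a comparable scale (assumed choice)
def minmax(s):
    return (s - s.min()) / (s.max() - s.min())

# Minimum distance to either a delivery room or an emergency room
grid['min_dist_km'] = grid[['dist_delivery_km', 'dist_er_km']].min(axis = 1)

# Vulnerability index: larger population and longer minimum distance -> higher index
grid['vulnerability'] = minmax(grid['fertile_pop']) + minmax(grid['min_dist_km'])

# Sort in descending order to get the ranking of candidate standby locations
ranking = grid.sort_values('vulnerability', ascending = False).reset_index(drop = True)
print(ranking[['grid_id', 'vulnerability']])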

 

7. Analysis conclusions

  - The higher the vulnerability index, the more suitable the area is as a standby location for the smart mobile delivery unit

  - Within or near grid cells with a high vulnerability index, OB/GYN clinics without an emergency room or delivery room were selected as standby locations for the smart mobile maternity unit

  - OB/GYN clinics are used as the standby locations because the unit can dispatch immediately with an OB/GYN specialist only if it is stationed at a clinic; otherwise it would have to stop by a clinic or hospital to pick up a specialist before driving to the residence, which reduces efficiency

1. modules import

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.models import Model
from tensorflow.keras.utils import get_file, plot_model

 

 

2. Load the data

# Download the data from this URL
dataset_path = get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")

# Specify the column names
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight','Acceleration','Model Year', 'Origin']

# Load the data into a pandas DataFrame using the specified column names
raw_dataset = pd.read_csv(dataset_path, names = column_names,
                          na_values = '?', comment = '\t',
                          sep = ' ', skipinitialspace = True)

 

 

3. Inspect the data

# Work on a copy() instead of using the raw data directly
dataset = raw_dataset.copy()
dataset

 

 

4. Data preprocessing

  • Some values in this dataset are missing
dataset.isna().sum()

# Output
MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64
  • Drop rows with missing values
# Horsepower has 6 missing values, so drop those rows
dataset = dataset.dropna()
  • 'Origin' is a categorical feature
    • Apply one-hot encoding
origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1) * 1.0
dataset['Europe'] = (origin == 2) * 1.0
dataset['Japan'] = (origin == 3) * 1.0
dataset

 

 

4-1. Split off a test dataset

# Sample 80% of the full data as the train set
# The remaining rows (full data minus the train set) become the test set
train_dataset = dataset.sample(frac = 0.8, random_state = 0)
test_dataset = dataset.drop(train_dataset.index)

 

 

4-2. Explore the data

sns.pairplot(train_dataset[['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight']], diag_kind = 'kde')

# Summary statistics of the training data
train_stats = train_dataset.describe()
# MPG is the target, so drop it from the statistics
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats

 

 

4-3. Separate features and labels

train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

 

 

4-4. Normalize the data

# Standardize: take the mean and standard deviation from the statistics, subtract the mean from each value, and divide by the std
def normalization(x):
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = normalization(train_dataset)
normed_test_data = normalization(test_dataset)

 

 

5. Model definition

def build_model():
    input = Input(shape = len(train_dataset.keys()), name = 'input')
    hidden1 = Dense(64,activation = 'relu', name = 'dense1')(input)
    hidden2 = Dense(64, activation = 'relu', name = 'dense2')(hidden1)
    output = Dense(1, name = 'output')(hidden2)

    model = Model(inputs = [input], outputs = [output])

    model.compile(loss = 'mse',
                  optimizer = RMSprop(0.001),
                  metrics = ['mae', 'mse'])
    
    return model
model = build_model()
model.summary()

# Output
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input (InputLayer)          [(None, 9)]               0         
                                                                 
 dense1 (Dense)              (None, 64)                640       
                                                                 
 dense2 (Dense)              (None, 64)                4160      
                                                                 
 output (Dense)              (None, 1)                 65        
                                                                 
=================================================================
Total params: 4,865
Trainable params: 4,865
Non-trainable params: 0
_________________________________________________________________
plot_model(model)

 

 

6. Check a sample batch

sample_batch = normed_train_data[:10]
sample_result = model.predict(sample_batch)
sample_batch

  • The values confirm that normalization worked as intended

 

 

7. Train the model

epochs = 1000
history = model.fit(normed_train_data, train_labels,
                    epochs = epochs, validation_split = 0.2)

 

 

8. Visualize training

# Store each epoch's loss, mae, and mse from training in a DataFrame
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist

# Use the DataFrame above to build the visualization in a single function
def plot_history(history):
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure(figsize = (12, 6))
    
    plt.subplot(1, 2, 1)
    plt.xlabel('Epochs')
    plt.ylabel('MPG Mean Absolute Error')
    plt.plot(hist['epoch'], hist['mae'], label = 'Train Error')
    plt.plot(hist['epoch'], hist['val_mae'], label = 'Val Error')
    plt.ylim([0, 5])
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.xlabel('Epochs')
    plt.ylabel('MPG Mean Squared Error')
    plt.plot(hist['epoch'], hist['mse'], label = 'Train Error')
    plt.plot(hist['epoch'], hist['val_mse'], label = 'Val Error')
    plt.ylim([0, 20])
    plt.legend()

    plt.show()

plot_history(history)

  • The validation error (Val Error) stops falling below a certain level for both MAE and MSE
  • Training longer is pointless if the validation error no longer decreases;
    if only the training error keeps shrinking and the gap between the two widens, the model may instead overfit the training data

 

 

9. Regularization with EarlyStopping

from tensorflow.keras.callbacks import EarlyStopping

model = build_model()

# Stop if the validation loss shows no improvement for 10 consecutive epochs
early_stop = EarlyStopping(monitor = 'val_loss', patience = 10)

history = model.fit(normed_train_data, train_labels, epochs = epochs,
                    validation_split = 0.2, callbacks = [early_stop])

  • Instead of running all 1000 epochs, training stops around epoch 91 once no further improvement is detected
plot_history(history)

 

 

10. Evaluate the model

# Evaluate on the test data and store the resulting loss, mae, and mse
loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose = 2)
print(mae)

# Output
# Predictions are within roughly 1.88 MPG of the true values
3/3 - 0s - loss: 5.7125 - mae: 1.8831 - mse: 5.7125 - 61ms/epoch - 20ms/step
1.8831140995025635

 

 

11. Predict with the trained model

# Predict
test_pred = model.predict(normed_test_data).flatten()

# Scatter plot of true vs. predicted values to check how close they are to the diagonal
plt.scatter(test_labels, test_pred)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.axis('equal')
plt.axis('square')
plt.grid()
plt.xlim([0, plt.xlim()[1]])
plt.ylim([0, plt.ylim()[1]])
plt.plot([-50, 50], [-50, 50])
plt.show()

# Visualize how large the prediction errors are
error = test_pred - test_labels
plt.hist(error, bins = 30)
plt.xlabel('Prediction Error')
plt.grid()
plt.ylabel('Count')
plt.show()

1. modules import

import tensorflow as tf
from tensorflow.keras.datasets.boston_housing import load_data
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import plot_model

from sklearn.model_selection import train_test_split

import numpy as np
import matplotlib.pyplot as plt

 

 

2. Load the data

  • Because the dataset is small, the test split is set to 20%
  • Each sample has 13 features
  • The features are all on different scales, i.e., their units differ
    • Crime rate: values between 0 and 1
    • Number of rooms: values between 3 and 9
  • The target label is the median home price (in units of $1,000)
tf.random.set_seed(111)
(x_train_full, y_train_full), (x_test, y_test) = load_data(path = 'boston_housing.npz',
                                                           test_split = 0.2,
                                                           seed = 111)
# Features of the first training example
print(x_train_full[0])

# Output
[2.8750e-02 2.8000e+01 1.5040e+01 0.0000e+00 4.6400e-01 6.2110e+00
 2.8900e+01 3.6659e+00 4.0000e+00 2.7000e+02 1.8200e+01 3.9633e+02
 6.2100e+00]
# Target of the first training example
print(y_train_full[0])

# Output
25.0

 

 

3. Inspect the data

print('Train data: {}\tLabels: {}'.format(x_train_full.shape, y_train_full.shape))
print('Test data: {}\tLabels: {}'.format(x_test.shape, y_test.shape))

# Output
Train data: (404, 13)	Labels: (404,)
Test data: (102, 13)	Labels: (102,)

 

 

4. Data preprocessing

  • Standardization
  • The features have different units, so rescale them to a comparable range
# Standardize: compute the mean and standard deviation, then subtract the mean from each value and divide by the std
mean = np.mean(x_train_full, axis = 0)
std = np.std(x_train_full, axis = 0)

x_train_preprocessed = (x_train_full - mean) / std
x_test = (x_test - mean) / std

x_train, x_val, y_train, y_val = train_test_split(x_train_preprocessed, y_train_full, test_size = 0.3, random_state = 111)
print("학습 데이터: {}\t레이블: {}".format(x_train_full.shape, y_train_full.shape))
print("학습 데이터: {}\t레이블: {}".format(x_train.shape, y_train.shape))
print("검증 데이터: {}\t레이블: {}".format(x_val.shape, y_val.shape))
print("테스트 데이터: {}\t레이블: {}".format(x_test.shape, y_test.shape))

# 출력 결과
학습 데이터: (404, 13)	레이블: (404,)
학습 데이터: (282, 13)	레이블: (282,)
검증 데이터: (122, 13)	레이블: (122,)
테스트 데이터: (102, 13)	레이블: (102,)

 

 

5. Model definition

  • When training data is scarce, the deeper the model, the more likely it is to overfit
model = Sequential([Dense(100, activation = 'relu', input_shape = (13, ), name = 'dense1'),
                    Dense(64, activation = 'relu', name = 'dense2'),
                    Dense(32, activation = 'relu', name = 'dense3'),
                    Dense(1, name = 'output')])

model.summary()

# Output
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense1 (Dense)              (None, 100)               1400      
                                                                 
 dense2 (Dense)              (None, 64)                6464      
                                                                 
 dense3 (Dense)              (None, 32)                2080      
                                                                 
 output (Dense)              (None, 1)                 33        
                                                                 
=================================================================
Total params: 9,977
Trainable params: 9,977
Non-trainable params: 0
_________________________________________________________________
plot_model(model)

 

 

6. Compile the model

  • For regression problems, mean squared error (MSE) is commonly used as the loss
    and mean absolute error (MAE) as the evaluation metric
model.compile(loss = 'mse',
              optimizer = Adam(learning_rate = 1e-2),
              metrics = ['mae'])

 

 

7. Train the model

history = model.fit(x_train, y_train, epochs = 300,
                    validation_data = (x_val, y_val))

 

 

8. Evaluate the model

  • evaluate()
model.evaluate(x_test, y_test)

# Output
4/4 [==============================] - 0s 5ms/step - loss: 14.7238 - mae: 2.6094
[14.723812103271484, 2.609373092651367]
history_dict = history.history

loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(loss) + 1)
fig = plt.figure(figsize= (12, 6))

ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(epochs, loss, color = 'blue', label = 'train_loss')
ax1.plot(epochs, val_loss, color = 'red', label = 'val_loss')
ax1.set_title('Train and Validation Loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.grid()
ax1.legend()

mae = history_dict['mae']
val_mae = history_dict['val_mae']

ax2 = fig.add_subplot(1, 2, 2)
ax2.plot(epochs, mae, color = 'blue', label = 'train_mae')
ax2.plot(epochs, val_mae, color = 'red', label = 'val_mae')
ax2.set_title('Train and Validation MAE')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('MAE')
ax2.grid()
ax2.legend()

 

 

9. K-fold cross-validation

  • When the dataset is very small, splitting it into [train, validation, test] sets can easily lead to underfitting
  • To address this, run K-fold cross-validation (a tiny illustration of how KFold splits the data follows the link below)

https://scikit-learn.org/stable/modules/cross_validation.html
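Before rebuilding the model, here is a tiny illustration (not part of the original post) of how KFold partitions a dataset into train/validation indices; the toy array and the split settings below are arbitrary choices.

from sklearn.model_selection import KFold
import numpy as np

# Toy data: 10 samples, split into 3 folds (settings are arbitrary)
data = np.arange(10)
kf = KFold(n_splits = 3, shuffle = True, random_state = 111)

# Each iteration yields the indices used for training and validation in that fold
for fold, (train_idx, val_idx) in enumerate(kf.split(data)):
    print("fold", fold, "train:", train_idx, "val:", val_idx)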

 

 

10. Rebuild the model

  • Rebuild the model for K-fold cross-validation
  • Split the training data into k folds (k = 3 below)
from sklearn.model_selection import KFold
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

tf.random.set_seed(111)
(x_train_full, y_train_full), (x_test, y_test) = load_data(path = 'boston_housing.npz',
                                                           test_split = 0.2,
                                                           seed = 111)

mean = np.mean(x_train_full, axis = 0)
std = np.std(x_train_full, axis = 0)

x_train_preprocessed = (x_train_full - mean) / std
x_test = (x_test - mean) / std

# Create a KFold splitter with 3 folds
k = 3
kfold = KFold(n_splits = k, random_state = 111, shuffle = True)

# Model builder
def build_model():
    input = Input(shape = (13, ), name = 'input')
    hidden1 = Dense(100, activation = 'relu', input_shape = (13, ), name = 'dense1')(input)
    hidden2 = Dense(64, activation = 'relu', name = 'dense2')(hidden1)
    hidden3 = Dense(32, activation = 'relu', name = 'dense3')(hidden2)
    output = Dense(1, name = 'output')(hidden3)

    model = Model(inputs = [input], outputs = [output])

    model.compile(loss = 'mse',
                  optimizer = 'adam',
                  metrics = ['mae'])
    return model

# List to store the test MAE of each fold
mae_list = []

# Train on each fold (split the re-standardized full training set so features and labels stay paired)
for train_idx, val_idx in kfold.split(x_train_preprocessed):
    x_train_fold, x_val_fold = x_train_preprocessed[train_idx], x_train_preprocessed[val_idx]
    y_train_fold, y_val_fold = y_train_full[train_idx], y_train_full[val_idx]

    model = build_model()
    model.fit(x_train_fold, y_train_fold, epochs = 300,
              validation_data = (x_val_fold, y_val_fold))
    
    _, test_mae = model.evaluate(x_test, y_test)
    mae_list.append(test_mae)

print(mae_list)
print(np.mean(mae_list))

# Output
# The target unit is $1,000, so this is an average error of roughly $8,900
[9.665495872497559, 8.393745422363281, 8.736763954162598]
8.932001749674479

6. Building a model with the MNIST example

  • A dataset included in keras.datasets

https://www.tensorflow.org/datasets/catalog/mnist?hl=ko

  - modules import

import tensorflow as tf
from tensorflow.keras.datasets.mnist import load_data
from tensorflow.keras.models import Sequential
from tensorflow.keras import models
from tensorflow.keras.layers import Dense, Input, Flatten
from tensorflow.keras.utils import to_categorical, plot_model

from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

 

  - Load the dataset

  • Load the MNIST dataset
  • Use 30% of the training data as validation data
tf.random.set_seed(111)
(x_train_full, y_train_full), (x_test, y_test) = load_data(path = 'mnist.npz')
x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full, test_size = 0.3, random_state = 111)

 

  - Inspect the data

# Check the shapes of the data
num_x_train = (x_train.shape[0])
num_x_val = (x_val.shape[0])
num_x_test = (x_test.shape[0])

print("Train data: {}\tLabels: {}".format(x_train_full.shape, y_train_full.shape))
# Shapes after splitting into train and val sets
print("Train data: {}\tLabels: {}".format(x_train.shape, y_train.shape))
print("Validation data: {}\tLabels: {}".format(x_val.shape, y_val.shape))
print("Test data: {}\tLabels: {}".format(x_test.shape, y_test.shape))

# Output
Train data: (60000, 28, 28)	Labels: (60000,)
Train data: (42000, 28, 28)	Labels: (42000,)
Validation data: (18000, 28, 28)	Labels: (18000,)
Test data: (10000, 28, 28)	Labels: (10000,)
# Randomly sample five examples and inspect them
num_sample = 5
random_idxs = np.random.randint(60000, size = num_sample)

plt.figure(figsize = (14, 8))
for i, idx in enumerate(random_idxs):
    img = x_train_full[idx, :]
    label = y_train_full[idx]

    plt.subplot(1, len(random_idxs), i+1)
    plt.imshow(img)
    plt.title("Indes: {}, Label: {}".format(idx, label))

 

  - Data preprocessing

  • Normalization
# Pixel values go up to 255, so divide by 255 to normalize into [0, 1]
x_train = x_train / 255.
x_val = x_val / 255.
x_test = x_test / 255.

# One-hot encode the integer y labels
y_train = to_categorical(y_train)
y_val = to_categorical(y_val)
y_test = to_categorical(y_test)

 

  - Model definition (Sequential)

model = Sequential([Input(shape = (28, 28), name = 'input'),
                    Flatten(input_shape = [28, 28], name= 'flatten'),
                    Dense(100, activation = 'relu', name = 'dense1'),
                    Dense(64, activation = 'relu', name = 'dense2'),
                    Dense(32, activation = 'relu', name = 'dense3'),
                    Dense(10, activation = 'softmax', name = 'output')])

model.summary()

# Output
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense1 (Dense)              (None, 100)               78500     
                                                                 
 dense2 (Dense)              (None, 64)                6464      
                                                                 
 dense3 (Dense)              (None, 32)                2080      
                                                                 
 output (Dense)              (None, 10)                330       
                                                                 
=================================================================
Total params: 87,374
Trainable params: 87,374
Non-trainable params: 0
_________________________________________________________________
# Also show each layer's output shapes
plot_model(model, show_shapes = True)

 

  - Compile the model

model.compile(loss = 'categorical_crossentropy',
              optimizer = 'sgd',
              metrics = ['accuracy'])

 

  - Train the model

  • Store the training process in a variable called history for later visualization
history = model.fit(x_train, y_train,
                    epochs = 50,
                    batch_size = 128,
                    validation_data = (x_val, y_val))

 

 

  - Visualize the training results

# Check that the training history was recorded
history.history.keys()

# Output
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
history_dict = history.history

loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(loss) + 1)
fig = plt.figure(figsize = (12, 6))

ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(epochs, loss, color = 'blue', label = 'train_loss')
ax1.plot(epochs, val_loss, color = 'red', label = 'val_loss')
ax1.set_title('Train and Validation Loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.grid()
ax1.legend()

accuracy = history_dict['accuracy']
val_accuracy = history_dict['val_accuracy']

ax2 = fig.add_subplot(1, 2, 2)
ax2.plot(epochs, accuracy, color = 'blue', label = 'train_accuracy')
ax2.plot(epochs, val_accuracy, color = 'red', label = 'val_accuracy')
ax2.set_title('Train and Validation Accuracy')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.grid()
ax2.legend()

plt.show()

 

  - Model evaluation (1)

  • evaluate()
model.evaluate(x_test, y_test)

# Output
313/313 [==============================] - 2s 6ms/step - loss: 0.1235 - accuracy: 0.9609
[0.12354850769042969, 0.9609000086784363]

 

  - Predict with the trained model

pred_ys = model.predict(x_test)
print(pred_ys.shape)

np.set_printoptions(precision = 7)
# Print the probability that the first example (index 0) belongs to each of the 10 classes (0-9)
print(pred_ys[0])

# Output
# Class 7 has the highest probability, about 0.999
313/313 [==============================] - 3s 9ms/step
(10000, 10)
[3.5711932e-06 3.6218420e-08 4.6535680e-04 5.5227923e-04 1.9860077e-07
 2.2765586e-07 2.0107932e-12 9.9897194e-01 1.8151616e-06 4.6298537e-06]
arg_pred_y = np.argmax(pred_ys, axis = 1)

plt.imshow(x_test[0])
plt.title("predicted label: {}".format(arg_pred_y[0]))
plt.show()

 

  - Model evaluation (2)

  • Confusion matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns

plt.figure(figsize = (8, 8))
cm = confusion_matrix(np.argmax(y_test, axis = -1), np.argmax(pred_ys, axis = -1))
sns.heatmap(cm, annot = True, fmt = 'd', cmap = 'Blues')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

 

  - Model evaluation (3)

  • Classification report
from sklearn.metrics import classification_report

print(classification_report(np.argmax(y_test, axis = -1), np.argmax(pred_ys, axis = -1)))

 

 

7. Saving and restoring models

  • save()
  • load_model()
  • (Note)
    Models built with the Sequential or functional API can be saved and loaded this way, but subclassed models cannot
  • Subclassed models
    Only the model parameters can be saved and loaded, using the two methods below (a short sketch follows the save/load example)
save_weights()
load_weights()
  • JSON format
    • model.to_json() (save)
    • tf.keras.models.model_from_json(json_string) (restore)
  • YAML serialization
    • model.to_yaml() (save)
    • tf.keras.models.model_from_yaml(yaml_string) (restore)
# Save the model
model.save('mnist_model.h5')
# Restore the model
loaded_model = models.load_model('mnist_model.h5')
loaded_model.summary()

# Output
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense1 (Dense)              (None, 100)               78500     
                                                                 
 dense2 (Dense)              (None, 64)                6464      
                                                                 
 dense3 (Dense)              (None, 32)                2080      
                                                                 
 output (Dense)              (None, 10)                330       
                                                                 
=================================================================
Total params: 87,374
Trainable params: 87,374
Non-trainable params: 0
_________________________________________________________________
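As a minimal sketch of the weights-only route mentioned above (the file name is an arbitrary example, and the model variable is the MNIST model from this section): subclassed models can still persist their parameters with save_weights()/load_weights(), and for a Sequential or functional model the architecture alone can be round-tripped through JSON.

# Save only the weights (this is also the option available to subclassed models)
model.save_weights('mnist_weights.h5')

# For Sequential / functional models, the architecture alone can be serialized to JSON ...
json_config = model.to_json()

# ... and rebuilt later; the weights are loaded separately
restored_model = tf.keras.models.model_from_json(json_config)
restored_model.load_weights('mnist_weights.h5')
restored_model.summary()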

 

 

8. Callbacks

  • The callbacks argument of fit() takes a list of objects that Keras will call at the start and end of training
  • Several callbacks can be used at once (a combined example is sketched at the end of this section)
  • ModelCheckpoint
    • tf.keras.callbacks.ModelCheckpoint
    • Saves checkpoints of the model at regular intervals; used to recover when something goes wrong
  • EarlyStopping
    • tf.keras.callbacks.EarlyStopping
    • Used to stop training when validation performance has not improved for a while
  • LearningRateScheduler
    • tf.keras.callbacks.LearningRateScheduler
    • Used to change the learning rate dynamically during optimization
  • TensorBoard
    • tf.keras.callbacks.TensorBoard
    • Used to monitor training progress
# Reuse the MNIST data and model from above (wrapped in a build function)
(x_train_full, y_train_full), (x_test, y_test) = load_data(path = 'mnist.npz')
x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full, test_size = 0.3, random_state = 111)

print("학습 데이터: {}\t레이블: {}".format(x_train_full.shape, y_train_full.shape))
print("학습 데이터: {}\t레이블: {}".format(x_train.shape, y_train.shape))
print("검증데이터: {}\t레이블: {}".format(x_val.shape, y_val.shape))
print("테스트 데이터: {}\t레이블: {}".format(x_test.shape, y_test.shape))

x_train = x_train / 255.
x_val = x_val / 255.
x_test = x_test / 255.

y_train = to_categorical(y_train)
y_val = to_categorical(y_val)
y_test = to_categorical(y_test)

def build_model():
    model = Sequential([Input(shape = (28, 28), name = 'input'),
                        Flatten(input_shape = [28, 28], name= 'flatten'),
                        Dense(100, activation = 'relu', name = 'dense1'),
                        Dense(64, activation = 'relu', name = 'dense2'),
                        Dense(32, activation = 'relu', name = 'dense3'),
                        Dense(10, activation = 'softmax', name = 'output')])
    
    model.compile(loss = 'categorical_crossentropy',
                  optimizer = 'sgd',
                  metrics = ['accuracy'])

    return model

model = build_model()
model.summary()

# Output
Train data: (60000, 28, 28)	Labels: (60000,)
Train data: (42000, 28, 28)	Labels: (42000,)
Validation data: (18000, 28, 28)	Labels: (18000,)
Test data: (10000, 28, 28)	Labels: (10000,)
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense1 (Dense)              (None, 100)               78500     
                                                                 
 dense2 (Dense)              (None, 64)                6464      
                                                                 
 dense3 (Dense)              (None, 32)                2080      
                                                                 
 output (Dense)              (None, 10)                330       
                                                                 
=================================================================
Total params: 87,374
Trainable params: 87,374
Non-trainable params: 0
_________________________________________________________________
# Import the callback classes
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, LearningRateScheduler, TensorBoard

 

  - ModelCheckpoint

check_point_cb = ModelCheckpoint('keras_mnist_model.h5')
history = model.fit(x_train, y_train, epochs = 10, callbacks = [check_point_cb])

# Output
Epoch 1/10
1313/1313 [==============================] - 8s 5ms/step - loss: 0.8994 - accuracy: 0.7532
Epoch 2/10
1313/1313 [==============================] - 6s 5ms/step - loss: 0.3294 - accuracy: 0.9059
Epoch 3/10
1313/1313 [==============================] - 4s 3ms/step - loss: 0.2614 - accuracy: 0.9247
Epoch 4/10
1313/1313 [==============================] - 4s 3ms/step - loss: 0.2208 - accuracy: 0.9360
Epoch 5/10
1313/1313 [==============================] - 4s 3ms/step - loss: 0.1919 - accuracy: 0.9437
Epoch 6/10
1313/1313 [==============================] - 4s 3ms/step - loss: 0.1696 - accuracy: 0.9507
Epoch 7/10
1313/1313 [==============================] - 4s 3ms/step - loss: 0.1511 - accuracy: 0.9565
Epoch 8/10
1313/1313 [==============================] - 4s 3ms/step - loss: 0.1356 - accuracy: 0.9611
Epoch 9/10
1313/1313 [==============================] - 4s 3ms/step - loss: 0.1229 - accuracy: 0.9646
Epoch 10/10
1313/1313 [==============================] - 4s 3ms/step - loss: 0.1122 - accuracy: 0.9674
loaded_model = models.load_model('keras_mnist_model.h5')
loaded_model.summary()

# Output
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense1 (Dense)              (None, 100)               78500     
                                                                 
 dense2 (Dense)              (None, 64)                6464      
                                                                 
 dense3 (Dense)              (None, 32)                2080      
                                                                 
 output (Dense)              (None, 10)                330       
                                                                 
=================================================================
Total params: 87,374
Trainable params: 87,374
Non-trainable params: 0
_________________________________________________________________
  • Save only the best model
    • save_best_only = True
model = build_model()

cp = ModelCheckpoint('keras_best_model.h5', save_best_only = True)

history = model.fit(x_train, y_train, epochs = 10,
                    validation_data = (x_val, y_val), callbacks = [cp])

# Output
Epoch 1/10
1313/1313 [==============================] - 12s 8ms/step - loss: 0.8529 - accuracy: 0.7606 - val_loss: 0.3651 - val_accuracy: 0.8924
Epoch 2/10
1313/1313 [==============================] - 8s 6ms/step - loss: 0.3209 - accuracy: 0.9075 - val_loss: 0.2884 - val_accuracy: 0.9141
Epoch 3/10
1313/1313 [==============================] - 11s 8ms/step - loss: 0.2551 - accuracy: 0.9254 - val_loss: 0.2353 - val_accuracy: 0.9296
Epoch 4/10
1313/1313 [==============================] - 13s 10ms/step - loss: 0.2147 - accuracy: 0.9371 - val_loss: 0.2123 - val_accuracy: 0.9366
Epoch 5/10
1313/1313 [==============================] - 10s 8ms/step - loss: 0.1869 - accuracy: 0.9451 - val_loss: 0.1972 - val_accuracy: 0.9410
Epoch 6/10
1313/1313 [==============================] - 12s 9ms/step - loss: 0.1661 - accuracy: 0.9513 - val_loss: 0.1818 - val_accuracy: 0.9450
Epoch 7/10
1313/1313 [==============================] - 10s 8ms/step - loss: 0.1491 - accuracy: 0.9566 - val_loss: 0.1700 - val_accuracy: 0.9488
Epoch 8/10
1313/1313 [==============================] - 5s 4ms/step - loss: 0.1350 - accuracy: 0.9608 - val_loss: 0.1476 - val_accuracy: 0.9552
Epoch 9/10
1313/1313 [==============================] - 5s 4ms/step - loss: 0.1229 - accuracy: 0.9637 - val_loss: 0.1414 - val_accuracy: 0.9572
Epoch 10/10
1313/1313 [==============================] - 6s 4ms/step - loss: 0.1128 - accuracy: 0.9663 - val_loss: 0.1337 - val_accuracy: 0.9595

 

  - EarlyStopping

  • Stops training when the validation score has not improved for a set number of epochs (patience)
  • Because training stops automatically once the model stops improving, the epochs value can safely be set large
  • Since the best weights found during training are restored, there is no need to restore the model separately
model = build_model()

cp = ModelCheckpoint('keras_best_model2.h5', save_best_only = True)
early_stopping_cb = EarlyStopping(patience = 3, monitor = 'val_loss',
                                  restore_best_weights = True)

# Train for up to 50 epochs, saving the best-performing model along the way
# Watch the validation loss and stop training once no further improvement is expected
history = model.fit(x_train, y_train, epochs = 50,
                    validation_data = (x_val, y_val), callbacks = [cp, early_stopping_cb])

  • Training stops after about 30 epochs, at the point judged to be optimal

 

  - LearningRateScheduler

def scheduler(epoch, learning_rate):
    if epoch < 10:
        return learning_rate
    else:
        return learning_rate * tf.math.exp(-0.1)

# Learning rate before applying the scheduler
model = build_model()
round(model.optimizer.lr.numpy(), 5)

# Output
0.01


# Learning rate after training with the scheduler
lr_scheduler_cb = LearningRateScheduler(scheduler)

history = model.fit(x_train, y_train, epochs = 15,
                    callbacks = [lr_scheduler_cb], verbose = 0)
round(model.optimizer.lr.numpy(), 5)

# Output
0.00607

 

  - TensorBoard

  • Monitor the training process with TensorBoard
  • To use TensorBoard, create a logs directory and write log files there while training runs
TensorBoard(log_dir = '.logs', histogram_freq = 0, write_graph = True, write_images = True)

# Output
<keras.callbacks.TensorBoard at 0x1c034d086a0>
log_dir = '.logs'

tensor_board_cb = [TensorBoard(log_dir = log_dir, histogram_freq = 1, write_graph = True, write_images = True)]

model = build_model()
model.fit(x_train, y_train, batch_size = 32, validation_data = (x_val, y_val),
          epochs = 30, callbacks = [tensor_board_cb])

# Output (training log omitted)

 

  • Loading can take a while
    • If it does not load, try running it again on a different port
    • e.g. %tensorboard --logdir {log_dir} --port 8000
%load_ext tensorboard
%tensorboard --logdir {log_dir}

  • Various metrics can be viewed as graphs in the browser
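As a closing sketch (not from the original post), the callbacks covered above can also be passed together in a single fit() call. The checkpoint file name below is an arbitrary example; everything else reuses objects already defined in this section (build_model, scheduler, log_dir, and the MNIST data).

model = build_model()

callbacks = [
    ModelCheckpoint('keras_callbacks_model.h5', save_best_only = True),   # example file name
    EarlyStopping(monitor = 'val_loss', patience = 3, restore_best_weights = True),
    LearningRateScheduler(scheduler),
    TensorBoard(log_dir = log_dir, histogram_freq = 1),
]

history = model.fit(x_train, y_train, epochs = 50, batch_size = 128,
                    validation_data = (x_val, y_val), callbacks = callbacks)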
