● Clustering

  • A representative unsupervised learning algorithm
  • Groups data that has no labels (ground truth)

https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html#sphx-glr-auto-examples-cluster-plot-cluster-comparison-py

 

  • Required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import cluster
from sklearn import mixture
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
  • Generating example data
# Helper function for plotting a dataset
def plot_data(dataset, position, title):
    X, y = dataset
    plt.subplot(position)
    plt.title(title)
    plt.scatter(X[:, 0], X[:, 1])
    
# Parameters for generating random example data
np.random.seed(0)
n_samples = 1500
random_state = 0
noise = 0.05

# Create example datasets with different structures
circles = datasets.make_circles(n_samples = n_samples, factor = 0.5, noise = noise, random_state = random_state)
moons = datasets.make_moons(n_samples = n_samples, noise = noise, random_state = random_state)
blobs = datasets.make_blobs(n_samples = n_samples, random_state = random_state)
no_structures = np.random.rand(n_samples, 2), None

# Visualize the example data with the plotting helper
plt.figure(figsize = (12, 12))
plot_data(circles, 221, 'Circles')
plot_data(moons, 222, 'Moons')
plot_data(blobs, 223, 'Blobs')
plot_data(no_structures, 224, 'No structures')

 

  • Write a single function that fits, predicts, and visualizes for clustering
# Function that fits a model, predicts cluster labels, and plots them
def fit_predict_plot(model, dataset, position, title):
    X, y = dataset
    model.fit(X)
    if hasattr(model, 'labels_'):
        labels = model.labels_.astype(int)
    else:
        labels = model.predict(X)

    colors = np.array(['#30A9DE', '#E53A40', '#090707', '#A593E0', '#F6B352', '#519D9E', '#D81159', '#8CD790', '#353866'])
    ax = plt.subplot(position)
    ax.set_title(title)
    ax.scatter(X[:, 0], X[:, 1], color = colors[labels])

 

 

1. K-Means

  • Clusters the data into n groups of equal variance
  • Minimizes a criterion known as the within-cluster sum of squares
  • The number of clusters must be specified in advance
  • The mean \(\mu_{j}\) of each cluster \(C\) is called its centroid
  • The goal is to find centroids that satisfy the following

$$ \sum_{i=0}^{n}\min_{\mu_{j}\in C}\left(\left\| x_{i}-\mu_{j}\right\|^{2}\right) $$

  • Computed based on distance (see the sketch below)
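
The sketch below (an addition, not in the original) recomputes this within-cluster sum of squares by hand on the blobs data generated earlier and compares it with the fitted model's inertia_ attribute; the two values should agree up to floating-point error.

# Sketch: verify the K-Means objective on the blobs data (assumes `blobs` from above)
X_blobs, _ = blobs
km = cluster.KMeans(n_clusters = 3, random_state = random_state)
km.fit(X_blobs)

# Squared distance from every point to every centroid, keeping only the nearest one
sq_dists = ((X_blobs[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(axis = 2)
wcss = sq_dists.min(axis = 1).sum()
print(wcss, km.inertia_)  # should be (nearly) identical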

 

  - Cluster and visualize using the example data and functions created above

fig = plt.figure(figsize = (12, 12))
fig.suptitle('K-Means')
fit_predict_plot(cluster.KMeans(n_clusters = 2, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.KMeans(n_clusters = 2, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.KMeans(n_clusters = 2, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.KMeans(n_clusters = 2, random_state = random_state), no_structures, 224, 'No structures')

 

  - With 3 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('K-Means')
fit_predict_plot(cluster.KMeans(n_clusters = 3, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.KMeans(n_clusters = 3, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.KMeans(n_clusters = 3, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.KMeans(n_clusters = 3, random_state = random_state), no_structures, 224, 'No structures')

 

  - With 4 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('K-Means')
fit_predict_plot(cluster.KMeans(n_clusters = 4, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.KMeans(n_clusters = 4, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.KMeans(n_clusters = 4, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.KMeans(n_clusters = 4, random_state = random_state), no_structures, 224, 'No structures')

 

 

  - Clustering the iris data

from sklearn.datasets import load_iris
iris = load_iris()

model = cluster.KMeans(n_clusters = 3)
model.fit(iris.data)
predict = model.predict(iris.data)

# Indices of samples predicted as cluster 0
idx = np.where(predict == 0)
iris.target[idx]

# Indices of samples predicted as cluster 1
idx = np.where(predict == 1)
iris.target[idx]

# Indices of samples predicted as cluster 2
idx = np.where(predict == 2)
iris.target[idx]
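
Instead of reading the labels cluster by cluster, the agreement with the true species can be summarized with a single score; a brief sketch (an addition, not in the original) using the adjusted Rand index from sklearn.metrics:

# Sketch: quantify how well the clusters match the true iris species
from sklearn.metrics import adjusted_rand_score
print(adjusted_rand_score(iris.target, predict))  # 1.0 would mean a perfect match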

 

2. Mini-Batch K-Means

  • K-Means with reduced computation time through mini-batch processing (see the timing sketch below)
  • Results may differ from standard K-Means
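
A rough timing sketch (an addition, using the blobs data from above); MiniBatchKMeans updates the centroids from small random batches, so it usually fits faster at the cost of a slightly noisier result.

# Sketch: compare fit time and inertia of K-Means vs. MiniBatch K-Means
import time

X_blobs, _ = blobs
for Model in (cluster.KMeans, cluster.MiniBatchKMeans):
    start = time.time()
    m = Model(n_clusters = 3, random_state = random_state).fit(X_blobs)
    print('{}: {:.4f} sec, inertia = {:.1f}'.format(Model.__name__, time.time() - start, m.inertia_))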

 

  - Practice with the example data above

  - Clustering with 2 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('MiniBatch K-Means')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 2, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 2, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 2, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 2, random_state = random_state), no_structures, 224, 'No structures')

  - Clustering with 3 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('MiniBatch K-Means')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 3, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 3, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 3, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 3, random_state = random_state), no_structures, 224, 'No structures')

  - Clustering with 4 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('MiniBatch K-Means')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 4, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 4, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 4, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.MiniBatchKMeans(n_clusters = 4, random_state = random_state), no_structures, 224, 'No structures')

 

 

3. Affinity Propagation

  • Creates clusters by sending messages between pairs of samples
  • Iterates until a suitable set of exemplars that represent the samples is found
  • Determines the number of clusters automatically (see the sketch below)

https://datascienceschool.net/03%20machine%20learning/16.05%20Affinity%20Propagation.html
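
A small sketch (an addition, assuming the blobs data generated earlier) showing how the preference value changes the number of exemplars, and therefore clusters, that the algorithm settles on:

# Sketch: lower (more negative) preference values generally yield fewer exemplars/clusters
X_blobs, _ = blobs
for pref in (-50, -200, -1000):
    ap = cluster.AffinityPropagation(damping = .9, preference = pref).fit(X_blobs)
    print('preference = {} -> clusters found: {}'.format(pref, len(ap.cluster_centers_indices_)))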

 

  - Practice with the example data above

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Affinity Propagation')

# No need to specify the number of clusters; instead, set the damping and preference parameters
# damping: applies an exponential weighted average when updating the responsibility and availability matrices at each iteration
# preference: controls how likely each data point is to be chosen as an exemplar
fit_predict_plot(cluster.AffinityPropagation(damping = .9, preference = -200), circles, 221, 'Circles')
fit_predict_plot(cluster.AffinityPropagation(damping = .9, preference = -200), moons, 222, 'Moons')
fit_predict_plot(cluster.AffinityPropagation(damping = .9, preference = -200), blobs, 223, 'Blobs')
fit_predict_plot(cluster.AffinityPropagation(damping = .9, preference = -200), no_structures, 224, 'No structures')

For each structure, the algorithm determines an appropriate number of clusters on its own and clusters accordingly.

 

 

4. Mean Shift

  • Updates centroid candidates to the mean of the points within a given region
fig = plt.figure(figsize = (12, 12))
fig.suptitle('Mean Shift')

# No parameters specified; the defaults are used
fit_predict_plot(cluster.MeanShift(), circles, 221, 'Circles')
fit_predict_plot(cluster.MeanShift(), moons, 222, 'Moons')
fit_predict_plot(cluster.MeanShift(), blobs, 223, 'Blobs')
fit_predict_plot(cluster.MeanShift(), no_structures, 224, 'No structures')
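
The size of that region is controlled by the bandwidth parameter; a brief sketch (an addition) estimates it from the data with scikit-learn's estimate_bandwidth helper and reports how many clusters result on the blobs data.

# Sketch: estimate a bandwidth from the data, then fit Mean Shift with it
from sklearn.cluster import estimate_bandwidth

X_blobs, _ = blobs
bw = estimate_bandwidth(X_blobs, quantile = 0.2)
ms = cluster.MeanShift(bandwidth = bw).fit(X_blobs)
print('estimated bandwidth: {:.3f}, clusters found: {}'.format(bw, len(ms.cluster_centers_)))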

 

 

5. Spectral Clustering

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Spectral Clustering')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 2, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 2, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 2, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 2, random_state = random_state), no_structures, 224, 'No structures')
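
Spectral clustering operates on a similarity graph rather than on the raw coordinates, so a nearest-neighbors affinity often separates the non-convex circles and moons much better than the default RBF affinity; a brief sketch of that variant (an addition, not in the original):

fig = plt.figure(figsize = (12, 6))
fig.suptitle('Spectral Clustering (nearest_neighbors affinity)')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 2, affinity = 'nearest_neighbors', random_state = random_state), circles, 121, 'Circles')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 2, affinity = 'nearest_neighbors', random_state = random_state), moons, 122, 'Moons')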

  - 3 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Spectral Clustering')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 3, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 3, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 3, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 3, random_state = random_state), no_structures, 224, 'No structures')

  - 4 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Spectral Clustering')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 4, random_state = random_state), circles, 221, 'Circles')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 4, random_state = random_state), moons, 222, 'Moons')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 4, random_state = random_state), blobs, 223, 'Blobs')
fit_predict_plot(cluster.SpectralClustering(n_clusters = 4, random_state = random_state), no_structures, 224, 'No structures')

 

 

  - Clustering the breast cancer data

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

model = cluster.SpectralClustering(n_clusters = 2, eigen_solver = 'arpack', affinity = 'nearest_neighbors')
model.fit(cancer.data)
predict = model.labels_
# Show the true targets of samples predicted as cluster 0
idx = np.where(predict == 0)
cancer.target[idx]

# Output
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
       1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1,
       1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
       0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1,
       1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1,
       1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
       1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
       
       

# Show the true targets of samples predicted as cluster 1
idx = np.where(predict == 1)
cancer.target[idx]

# Output
array([0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1])
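
A more compact way (an addition, not in the original) to summarize the same comparison is to count how many of each true label fall into each cluster:

# Sketch: count the true labels (0 = malignant, 1 = benign) in each predicted cluster
for c in range(2):
    counts = np.bincount(cancer.target[predict == c], minlength = 2)
    print('Cluster {}: malignant = {}, benign = {}'.format(c, counts[0], counts[1]))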

 

 

6. Hierarchical Clustering

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Hierarchical Clustering')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 2, linkage = 'ward'), circles, 221, 'Circles')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 2, linkage = 'ward'), moons, 222, 'Moons')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 2, linkage = 'ward'), blobs, 223, 'Blobs')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 2, linkage = 'ward'), no_structures, 224, 'No structures')

  - 3 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Hierarchical Clustering')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 3, linkage = 'ward'), circles, 221, 'Circles')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 3, linkage = 'ward'), moons, 222, 'Moons')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 3, linkage = 'ward'), blobs, 223, 'Blobs')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 3, linkage = 'ward'), no_structures, 224, 'No structures')

  - 4 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Hierarchical Clustering')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 4, linkage = 'ward'), circles, 221, 'Circles')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 4, linkage = 'ward'), moons, 222, 'Moons')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 4, linkage = 'ward'), blobs, 223, 'Blobs')
fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 4, linkage = 'ward'), no_structures, 224, 'No structures')
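
Agglomerative clustering merges the two closest clusters at each step, and the linkage criterion defines how that distance between clusters is measured; a short sketch (an addition) comparing a few linkage options on the moons data:

# Sketch: the linkage choice can change the result considerably on the same data
fig = plt.figure(figsize = (12, 4))
fig.suptitle('Agglomerative Clustering linkage comparison (Moons)')
for pos, link in zip((131, 132, 133), ('ward', 'average', 'single')):
    fit_predict_plot(cluster.AgglomerativeClustering(n_clusters = 2, linkage = link), moons, pos, link)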

 

  - Clustering the wine data

from sklearn.datasets import load_wine
wine = load_wine()

model = cluster.AgglomerativeClustering(n_clusters = 3)
model.fit(wine.data)
predict = model.labels_

idx = np.where(predict == 0)
wine.target[idx]

# Output
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1])


idx = np.where(predict == 1)
wine.target[idx]

# Output
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2])


idx = np.where(predict == 2)
wine.target[idx]

# Output
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

 

 

7. DBSCAN(Density-Based Spatial Clustering of Applications with Noise)

fig = plt.figure(figsize = (12, 12))
fig.suptitle('DBSCAN')
fit_predict_plot(cluster.DBSCAN(eps = .3), circles, 221, 'Circles')
fit_predict_plot(cluster.DBSCAN(eps = .3), moons, 222, 'Moons')
fit_predict_plot(cluster.DBSCAN(eps = .3), blobs, 223, 'Blobs')
fit_predict_plot(cluster.DBSCAN(eps = .3), no_structures, 224, 'No structures')
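
DBSCAN groups points that lie in dense regions and labels low-density points as noise (-1), so the outcome depends strongly on eps (the neighborhood radius) and min_samples; a brief sketch (an addition) on the moons data:

# Sketch: count clusters and noise points found by DBSCAN
X_moons, _ = moons
db = cluster.DBSCAN(eps = .3, min_samples = 5).fit(X_moons)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print('clusters: {}, noise points: {}'.format(n_clusters, np.sum(db.labels_ == -1)))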

 

 

8. OPTICS(Ordering Points To Identify the Clustering Structure)

fig = plt.figure(figsize = (12, 12))
fig.suptitle('OPTICS')
fit_predict_plot(cluster.OPTICS(min_samples = 20, xi = 0.05, min_cluster_size = 0.1), circles, 221, 'Circles')
fit_predict_plot(cluster.OPTICS(min_samples = 20, xi = 0.05, min_cluster_size = 0.1), moons, 222, 'Moons')
fit_predict_plot(cluster.OPTICS(min_samples = 20, xi = 0.05, min_cluster_size = 0.1), blobs, 223, 'Blobs')
fit_predict_plot(cluster.OPTICS(min_samples = 20, xi = 0.05, min_cluster_size = 0.1), no_structures, 224, 'No structures')
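
OPTICS orders the points by reachability distance and then extracts clusters from that ordering; min_samples, xi, and min_cluster_size control the extraction, and as with DBSCAN some points may be left as noise (-1). A brief sketch (an addition) on the moons data:

# Sketch: count clusters and noise points found by OPTICS
X_moons, _ = moons
opt = cluster.OPTICS(min_samples = 20, xi = 0.05, min_cluster_size = 0.1).fit(X_moons)
n_clusters = len(set(opt.labels_)) - (1 if -1 in opt.labels_ else 0)
print('clusters: {}, noise points: {}'.format(n_clusters, np.sum(opt.labels_ == -1)))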

 

 

9. Birch(Balanced Iterative Reducing and Clustering using Hierarchies)

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Birch')
fit_predict_plot(cluster.Birch(n_clusters = 2, threshold = .3), circles, 221, 'Circles')
fit_predict_plot(cluster.Birch(n_clusters = 2, threshold = .3), moons, 222, 'Moons')
fit_predict_plot(cluster.Birch(n_clusters = 2, threshold = .3), blobs, 223, 'Blobs')
fit_predict_plot(cluster.Birch(n_clusters = 2, threshold = .3), no_structures, 224, 'No structures')

  - 3 clusters

fig = plt.figure(figsize = (12, 12))
fig.suptitle('Birch')
fit_predict_plot(cluster.Birch(n_clusters = 3, threshold = .3), circles, 221, 'Circles')
fit_predict_plot(cluster.Birch(n_clusters = 3, threshold = .3), moons, 222, 'Moons')
fit_predict_plot(cluster.Birch(n_clusters = 3, threshold = .3), blobs, 223, 'Blobs')
fit_predict_plot(cluster.Birch(n_clusters = 3, threshold = .3), no_structures, 224, 'No structures')
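
In Birch, threshold is the maximum radius a CF-subcluster may reach before it is split, so a smaller threshold produces more subclusters before the final step merges them into n_clusters groups. A short sketch (an addition, using the blobs data) that disables the global step to show this:

# Sketch: with n_clusters = None, Birch returns the raw subclusters directly
X_blobs, _ = blobs
for th in (0.1, 0.3, 0.7):
    b = cluster.Birch(n_clusters = None, threshold = th).fit(X_blobs)
    print('threshold = {} -> subclusters: {}'.format(th, len(b.subcluster_centers_)))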

 

 

10. Clustering the handwritten digits data

  - Inspecting the data

 
from sklearn.datasets import load_digits

digits = load_digits()

X = digits.data.reshape(-1, 8, 8)
y = digits.target

plt.figure(figsize = (16, 8))
for i in range(10):
    plt.subplot(2, 5, i+1)
    plt.imshow(X[i])

 

  - K-Means

kmeans = cluster.KMeans(n_clusters = 10)
kmeans.fit(digits.data)
predict = kmeans.predict(digits.data)

# Check the predictions as text
for i in range(10):
    idx = np.where(predict == i)
    real_class = digits.target[idx]
    print('Cluster {}: {}'.format(i+1, real_class))
    
# Check the predictions as images
for i in range(10):
    idx = np.where(predict == i)[0]
    choice_idx = np.random.choice(idx, size = 5)
    choice_image = X[choice_idx]

    k = 1

    print('Cluster: {}'.format(i+1))
    for image in choice_image:
        plt.subplot(1, 5, k)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(image)
        k += 1
    
    plt.show()
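
Since the true digit labels are available, one way (an addition, not part of the original) to score the clustering is to map every cluster to its most frequent true digit and measure how often that majority label is correct:

# Sketch: majority-vote accuracy of the K-Means clusters against the true digits
mapped = np.zeros_like(predict)
for i in range(10):
    idx = np.where(predict == i)
    mapped[idx] = np.bincount(digits.target[idx]).argmax()
print('majority-vote accuracy: {:.3f}'.format(np.mean(mapped == digits.target)))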

 

  - Spectral Clustering

spectral = cluster.SpectralClustering(n_clusters = 10, eigen_solver = 'arpack', affinity = 'nearest_neighbors')
spectral.fit(digits.data)
predict = spectral.labels_

# Check the predictions as text
for i in range(10):
    idx = np.where(predict == i)
    real_class = digits.target[idx]
    print('Cluster {}: {}'.format(i+1, real_class))

# Check the predictions as images
for i in range(10):
    idx = np.where(predict == i)[0]
    choice_idx = np.random.choice(idx, size = 5)
    choice_image = X[choice_idx]

    k = 1

    print('Cluster: {}'.format(i+1))
    for image in choice_image:
        plt.subplot(1, 5, k)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(image)
        k += 1
    
    plt.show()

 

  - Hierarchical Clustering

hierarchical = cluster.AgglomerativeClustering(n_clusters = 10, linkage = 'ward')
hierarchical.fit(digits.data)
predict = hierarchical.labels_

# Check the predictions as text
for i in range(10):
    idx = np.where(predict == i)
    real_class = digits.target[idx]
    print('Cluster {}: {}'.format(i+1, real_class))

# Check the predictions as images
for i in range(10):
    idx = np.where(predict == i)[0]
    choice_idx = np.random.choice(idx, size = 5)
    choice_image = X[choice_idx]

    k = 1

    print('Cluster: {}'.format(i+1))
    for image in choice_image:
        plt.subplot(1, 5, k)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(image)
        k += 1
    
    plt.show()

 

  - Birch

birch = cluster.Birch(n_clusters = 10, threshold = .3)
birch.fit(digits.data)
predict = birch.labels_

# Check the predictions as text
for i in range(10):
    idx = np.where(predict == i)
    real_class = digits.target[idx]
    print('Cluster {}: {}'.format(i+1, real_class))

# Check the predictions as images
for i in range(10):
    idx = np.where(predict == i)[0]
    choice_idx = np.random.choice(idx, size = 5)
    choice_image = X[choice_idx]

    k = 1

    print('Cluster: {}'.format(i+1))
    for image in choice_image:
        plt.subplot(1, 5, k)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(image)
        k += 1
    
    plt.show()
