machine_learning
Basic Supervised Learning Algorithms (2): Classification
hayleyhell
2022. 11. 18. 19:02
Decision Trees¶
- Classification
In [ ]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
# Load the data
iris = load_iris()
In [ ]:
X = pd.DataFrame(iris.data, columns=iris.feature_names)
print(X.shape)
X.head()
(150, 4)
Out[ ]:
 | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm)
---|---|---|---|---
0 | 5.1 | 3.5 | 1.4 | 0.2
1 | 4.9 | 3.0 | 1.4 | 0.2
2 | 4.7 | 3.2 | 1.3 | 0.2
3 | 4.6 | 3.1 | 1.5 | 0.2
4 | 5.0 | 3.6 | 1.4 | 0.2
In [ ]:
y = pd.DataFrame(iris.target, columns=['class'])
print(y.shape)
y.head()
(150, 1)
Out[ ]:
 | class
---|---
0 | 0
1 | 0
2 | 0
3 | 0
4 | 0
In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
# Flatten y to a 1-D array so that fitting the model does not raise a warning
y_train = y_train.values.ravel()
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(120, 4)
(30, 4)
(120,)
(30, 1)
In [ ]:
# Train the model
model = DecisionTreeClassifier(max_depth=4)
model.fit(X_train, y_train)
Out[ ]:
DecisionTreeClassifier(max_depth=4)
In [ ]:
# Predict the test set with the model
y_test_predict = model.predict(X_test)
y_test_predict
Out[ ]:
array([1, 2, 2, 0, 2, 2, 0, 2, 0, 1, 1, 1, 2, 2, 0, 0, 2, 2, 0, 0, 1, 2, 0, 1, 1, 2, 1, 1, 1, 2])
In [ ]:
# Evaluate the model's performance
model.score(X_test, y_test)
Out[ ]:
0.9
Feature Importance (Mean Gini Decrease)
- Measures how much the impurity drops when moving from a parent node down to its child nodes.
- The more a feature reduces impurity, the more important that feature is (see the sketch below).
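To make "mean Gini decrease" concrete, here is a minimal numpy sketch (the helpers `gini` and `impurity_decrease` are illustrative names, not scikit-learn API): it computes a node's Gini impurity and the weighted impurity decrease produced by one split. Roughly speaking, `feature_importances_` averages such decreases over every node that splits on a given feature.
In [ ]:
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def impurity_decrease(parent, left, right):
    # How much a split lowers impurity, weighting each child node by its size
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# A perfect split of a 50/50 node removes all of its 0.5 impurity
parent = np.array([0, 0, 1, 1])
print(impurity_decrease(parent, parent[:2], parent[2:]))  # 0.5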
In [ ]:
# Store each feature's importance, in feature order, in a numpy array
importances = model.feature_importances_
importances
Out[ ]:
array([0.02378049, 0. , 0.54764808, 0.42857143])
In [ ]:
# argsort() returns the indices of the original array that would sort it, as an ndarray
indices_sorted = np.argsort(importances)
indices_sorted
Out[ ]:
array([1, 0, 3, 2])
In [ ]:
plt.figure()
plt.title('Feature importances')
plt.bar(range(len(importances)), importances[indices_sorted])
plt.xticks(range(len(importances)), X.columns[indices_sorted], rotation=90)
plt.show()
Logistic Regression¶
Linear regression
Y = Wx + b
Logistic regression
P(Y=1) = Wx + b
- Here P(Y=1) is a probability, so it must fall between 0 and 1, while Wx + b can be any real number.
P(Y=1) = sigmoid(Wx + b)
- The sigmoid function turns the straight line into a curve that predicts values in (0, 1).
Log Loss
- Logistic regression's loss function is log loss (log-loss / cross entropy).
- It is called log loss because the size of the loss is determined by a log function (see the sketch below).
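As a quick illustration (a minimal numpy sketch with made-up scores), the sigmoid squashes the scores Wx + b into probabilities, and the log loss penalizes a confident wrong answer far more than a confident right one:
In [ ]:
import numpy as np

def sigmoid(z):
    # Maps any real score Wx + b into the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p):
    # Binary cross entropy: -[y*log(p) + (1-y)*log(1-p)], averaged
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0])
print(log_loss(y, sigmoid(np.array([2.0, -2.0]))))  # confident and right -> ~0.13
print(log_loss(y, sigmoid(np.array([-2.0, 2.0]))))  # confident and wrong -> ~2.13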
In [ ]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd
iris_data = load_iris()
X = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
y = pd.DataFrame(iris_data.target, columns=['class'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
In [ ]:
# In logistic regression, adding this line prevents the warning.
y_train = y_train.values.ravel()
In [ ]:
# Logistic regression takes a few more optional parameters
model = LogisticRegression(solver='saga',  # which algorithm to use for optimization
                           max_iter=2000)  # maximum number of optimization iterations
In [ ]:
model.fit(X_train, y_train)
Out[ ]:
LogisticRegression(max_iter=2000, solver='saga')
In [ ]:
# A classification model's predictions, so only the values 0, 1, and 2 appear
model.predict(X_test)
Out[ ]:
array([1, 2, 2, 0, 2, 1, 0, 2, 0, 1, 1, 2, 2, 2, 0, 0, 2, 2, 0, 0, 1, 2, 0, 1, 1, 2, 1, 1, 1, 2])
In [ ]:
model.score(X_test, y_test)
Out[ ]:
0.9666666666666667
Ensembles¶
- Bagging
- Reduces variance -> less overfitting
- Random forest
- Builds many decision trees -> averages them for continuous targets, takes a majority vote for categorical targets
- Boosting
- Builds many decision trees -> weights them
- Builds models iteratively -> reduces both bias and variance
- Adds prediction models sequentially, then predicts with the final combined model
- The focus is on how the weights are assigned
- Gradient boosting
- Builds many decision trees -> weights them
Random Forest¶
- Works for both classification and regression
- A collection of decision tree learners
- Builds many randomized decision trees and predicts the result by majority vote
Randomly generated datasets
- Bagging
- Each model is trained on a randomly drawn Bootstrap dataset.
- The models' decisions are then combined (aggregating): Bootstrap + aggregating = Bagging (see the sketch below).
Randomly selected features
- Because the trees are randomized, an enormous number of distinct decision trees can be built.
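To show what bootstrap-and-aggregate looks like in code, here is a minimal hand-rolled sketch (the helper `bagging_predict` is illustrative, not scikit-learn API; `RandomForestClassifier` below does all of this, plus random feature selection, internally):
In [ ]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_tr, y_tr, X_te, n_trees=10, seed=0):
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_trees):
        # Bootstrap: sample the training rows with replacement
        idx = rng.integers(0, len(X_tr), size=len(X_tr))
        tree = DecisionTreeClassifier().fit(np.asarray(X_tr)[idx], np.asarray(y_tr)[idx])
        votes.append(tree.predict(X_te))
    # Aggregate: majority vote across the trees, per test sample
    votes = np.array(votes)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)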
In [ ]:
from sklearn.ensemble import RandomForestClassifier
In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
y_train = y_train.values.ravel()
In [ ]:
# Use 100 decision trees
model = RandomForestClassifier(n_estimators=100,
                               max_depth=4)
In [ ]:
model.fit(X_train, y_train)
Out[ ]:
RandomForestClassifier(max_depth=4)
In [ ]:
model.predict(X_test)
Out[ ]:
array([1, 1, 2, 0, 2, 1, 0, 2, 0, 1, 1, 1, 2, 2, 0, 0, 2, 2, 0, 0, 1, 2, 0, 1, 1, 2, 1, 1, 1, 2])
In [ ]:
model.score(X_test, y_test)
Out[ ]:
0.9
The model classifies about 90% of the test samples correctly.
In [ ]:
# Random forests are also built from decision trees, so feature importance can again be computed from the mean Gini decrease
model.feature_importances_
Out[ ]:
array([0.09186873, 0.01898129, 0.39892432, 0.49022566])
In [ ]:
# Visualize the random forest's feature importances
# (recompute importances and the sort order from this model,
#  rather than reusing the decision tree's values from above)
importances = model.feature_importances_
indices_sorted = np.argsort(importances)
plt.figure()
plt.title('Feature importances')
plt.bar(range(len(importances)), importances[indices_sorted])
plt.xticks(range(len(importances)), X.columns[indices_sorted], rotation=90)
plt.show()
Random forests can also be used for regression problems.
In [ ]:
from sklearn.ensemble import RandomForestRegressor
In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
y_train = y_train.values.ravel()
In [ ]:
model = RandomForestRegressor(n_estimators=100, max_depth=4)
In [ ]:
model.fit(X_train, y_train)
Out[ ]:
RandomForestRegressor(max_depth=4)
In [ ]:
model.score(X_test, y_test)
Out[ ]:
0.9164586562547626
Gradient Boosting (GBM)¶
- Works for both classification and regression
- Each new tree compensates for the errors of the previous trees
- The trees are trained sequentially, with each step minimizing the loss function
- Boosting focuses on how to combine many weak learners into one final model
- Repeatedly trains on the data the previous tree got wrong (see the sketch below)
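For squared-error regression, "correcting the previous tree's errors" literally means fitting each new tree to the current residuals. Here is a minimal sketch of that loop (the function `gbm_fit_predict` is illustrative, not the real implementation; `GradientBoostingRegressor` below is):
In [ ]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit_predict(X, y, X_new, n_trees=100, learning_rate=0.1):
    # Start from a constant prediction: the mean of y
    pred = np.full(len(y), y.mean(), dtype=float)
    out = np.full(len(X_new), y.mean(), dtype=float)
    for _ in range(n_trees):
        # Fit the next tree to what is still unexplained
        residual = y - pred
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        # The learning rate shrinks each correction step
        pred += learning_rate * tree.predict(X)
        out += learning_rate * tree.predict(X_new)
    return out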
In [ ]:
from sklearn.ensemble import GradientBoostingRegressor
In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
y_train = y_train.values.ravel()
In [ ]:
# For random forests, the number of trees is the most important parameter;
# for gradient boosting, both the number of trees and the learning rate matter
model = GradientBoostingRegressor(n_estimators=2000,   # number of trees
                                  learning_rate=0.05,  # learning rate
                                  max_depth=5)         # tree depth
In [ ]:
model.fit(X_train, y_train)
Out[ ]:
GradientBoostingRegressor(learning_rate=0.05, max_depth=5, n_estimators=2000)
In [ ]:
model.predict(X_train)
Out[ ]:
array([1.35950349e-08, 9.99999999e-01, 9.99999991e-01, 1.35950349e-08, 9.99999999e-01, 1.35950349e-08, 1.35950349e-08, 1.99999998e+00, 1.35950349e-08, 1.99999998e+00, 1.99999998e+00, 9.99999999e-01, 1.35950349e-08, 1.35950349e-08, 9.99999999e-01, 1.99999998e+00, 9.99999999e-01, 1.99999998e+00, 1.99999998e+00, 1.35950349e-08, 1.99999998e+00, 1.35950349e-08, 1.35950349e-08, 1.35950349e-08, 1.35950349e-08, 9.99999999e-01, 9.99999999e-01, 9.99999991e-01, 9.99999999e-01, 1.99999998e+00, 1.00000002e+00, 1.99999998e+00, 1.99999998e+00, 1.00000000e+00, 1.35950349e-08, 9.99999999e-01, 1.35950349e-08, 1.00000002e+00, 1.99999998e+00, 9.99999999e-01, 1.99999998e+00, 9.99999999e-01, 1.99999998e+00, 9.99999999e-01, 9.99999999e-01, 9.99999999e-01, 1.99999998e+00, 1.00000000e+00, 1.35950349e-08, 1.99999998e+00, 1.35950349e-08, 1.35950349e-08, 9.99999999e-01, 1.35950349e-08, 1.35950349e-08, 1.99999998e+00, 9.99999991e-01, 1.35950349e-08, 1.99999998e+00, 1.35950349e-08, 1.99999995e+00, 9.99999999e-01, 1.35950349e-08, 1.35950349e-08, 1.99999998e+00, 9.99999999e-01, 9.99999999e-01, 1.35950349e-08, 1.35950349e-08, 1.99999998e+00, 9.99999999e-01, 9.99999999e-01, 1.35950349e-08, 9.99999999e-01, 1.99999998e+00, 9.99999991e-01, 1.35950349e-08, 1.00000004e+00, 1.99999998e+00, 1.99999998e+00, 1.99999998e+00, 1.99999998e+00, 1.35950349e-08, 1.35950349e-08, 1.00000002e+00, 1.99999998e+00, 1.35950349e-08, 1.00000004e+00, 1.35950349e-08, 1.35950349e-08, 1.99999998e+00, 9.99999999e-01, 1.99999998e+00, 1.35950349e-08, 1.99999998e+00, 1.35950349e-08, 1.99999998e+00, 1.35950349e-08, 9.99999999e-01, 1.35950349e-08, 1.99999998e+00, 1.99999998e+00, 1.35950349e-08, 1.99999997e+00, 1.99999998e+00, 1.99999998e+00, 9.99999999e-01, 1.35950349e-08, 9.99999999e-01, 1.35950349e-08, 1.35950349e-08, 9.99999999e-01, 1.99999998e+00, 1.35950349e-08, 1.99999998e+00, 1.35950349e-08, 9.99999999e-01, 1.99999998e+00, 1.99999998e+00, 9.99999999e-01])
In [ ]:
model.score(X_test, y_test)
Out[ ]:
0.9054883915557276
AdaBoost¶
- Works for both classification and regression
- A type of boosting that learns from its mistakes: samples misclassified by the previous weak learner get larger weights in the next round (see the sketch below)
- Reduces bias compared to a single learner.
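Here is a minimal sketch of the binary AdaBoost weight update (the function `adaboost_weights` is an illustrative simplification, not scikit-learn's internals; it assumes two classes and clamps the error away from zero):
In [ ]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_weights(X, y, n_rounds=3):
    # Every sample starts with equal weight
    w = np.full(len(y), 1.0 / len(y))
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = max(w[miss].sum(), 1e-10)        # weighted error of this stump
        alpha = 0.5 * np.log((1 - err) / err)  # the stump's say in the final vote
        # Misclassified samples get heavier, correctly classified ones lighter
        w *= np.exp(np.where(miss, alpha, -alpha))
        w /= w.sum()
    return w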
In [ ]:
from sklearn.ensemble import AdaBoostClassifier
In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
y_train = y_train.values.ravel()
In [ ]:
# Build 100 decision stumps
model = AdaBoostClassifier(n_estimators=100,
                           learning_rate=0.05) # learning rate
In [ ]:
model.fit(X_train, y_train)
Out[ ]:
AdaBoostClassifier(learning_rate=0.05, n_estimators=100)
In [ ]:
model.predict(X_train)
Out[ ]:
array([0, 1, 1, 0, 1, 0, 0, 2, 0, 2, 2, 1, 0, 0, 1, 1, 1, 2, 2, 0, 2, 0, 0, 0, 0, 1, 1, 1, 1, 2, 1, 2, 2, 1, 0, 1, 0, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 0, 2, 0, 0, 1, 0, 0, 2, 1, 0, 2, 0, 1, 1, 0, 0, 2, 1, 1, 0, 0, 2, 1, 1, 0, 1, 1, 1, 0, 1, 2, 2, 2, 2, 0, 0, 1, 2, 0, 1, 0, 0, 2, 1, 2, 0, 2, 0, 2, 0, 1, 0, 2, 1, 0, 1, 1, 2, 1, 0, 1, 0, 0, 1, 2, 0, 2, 0, 1, 2, 2, 1])
In [ ]:
model.score(X_test, y_test)
Out[ ]:
0.9
XGBoost¶
In [ ]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
In [ ]:
cancer = load_breast_cancer()
cancer_df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
cancer_df['target'] = cancer.target
cancer_df.columns
Out[ ]:
Index(['mean radius', 'mean texture', 'mean perimeter', 'mean area', 'mean smoothness', 'mean compactness', 'mean concavity', 'mean concave points', 'mean symmetry', 'mean fractal dimension', 'radius error', 'texture error', 'perimeter error', 'area error', 'smoothness error', 'compactness error', 'concavity error', 'concave points error', 'symmetry error', 'fractal dimension error', 'worst radius', 'worst texture', 'worst perimeter', 'worst area', 'worst smoothness', 'worst compactness', 'worst concavity', 'worst concave points', 'worst symmetry', 'worst fractal dimension', 'target'], dtype='object')
In [ ]:
y = cancer_df['target']
X = cancer_df.drop('target', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=156)
# X_train, y_train을 다시 쪼개서 학습과 검증용 데이터로 분리
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=156)
In [ ]:
from xgboost import XGBClassifier
# Pass eval_metric to the XGBClassifier constructor to suppress the warning message
xgb_clf = XGBClassifier(n_estimators=400,
                        learning_rate=0.05,
                        max_depth=3,
                        eval_metric='logloss')
evals = [(X_tr, y_tr), (X_val, y_val)]
xgb_clf.fit(X_tr, y_tr, early_stopping_rounds=50, eval_metric='logloss',
            eval_set=evals, verbose=True) # verbose prints the evaluation metric at every round
pred = xgb_clf.predict(X_test)
pred_proba = xgb_clf.predict_proba(X_test)[:, 1]
[0] validation_0-logloss:0.650162 validation_1-logloss:0.661831
Multiple eval metrics have been passed: 'validation_1-logloss' will be used for early stopping.
Will train until validation_1-logloss hasn't improved in 50 rounds.
[1] validation_0-logloss:0.611314 validation_1-logloss:0.636086
(... per-round logloss lines trimmed ...)
[176] validation_0-logloss:0.012581 validation_1-logloss:0.261032
Stopping. Best iteration:
[126] validation_0-logloss:0.019728 validation_1-logloss:0.255869
In [ ]:
# Classification evaluation metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
from sklearn.metrics import f1_score, roc_auc_score

def get_clf_eval(y_test, predict=None, predict_proba=None):
    confusion = confusion_matrix(y_test, predict)   # confusion matrix
    accuracy = accuracy_score(y_test, predict)      # accuracy
    precision = precision_score(y_test, predict)    # precision
    recall = recall_score(y_test, predict)          # recall
    f1 = f1_score(y_test, predict)                  # F1 score
    roc_auc = roc_auc_score(y_test, predict_proba)  # ROC-AUC score
    print('Confusion matrix')
    print(confusion)
    print('Accuracy: {0:.4f}, Precision: {1:.4f}, Recall: {2:.4f}, F1: {3:.4f}, AUC: {4:.4f}'.format(
        accuracy, precision, recall, f1, roc_auc))
In [ ]:
get_clf_eval(y_test, pred, pred_proba)
Confusion matrix
[[34  3]
 [ 2 75]]
Accuracy: 0.9561, Precision: 0.9615, Recall: 0.9740, F1: 0.9677, AUC: 0.9933
In [ ]:
# Retrain with early_stopping_rounds set to 10
xgb_clf.fit(X_tr, y_tr, early_stopping_rounds=10, eval_metric='logloss',
            eval_set=evals, verbose=True) # verbose prints the evaluation metric at every round
pred = xgb_clf.predict(X_test)
pred_proba = xgb_clf.predict_proba(X_test)[:, 1]
get_clf_eval(y_test, pred, pred_proba)
[0] validation_0-logloss:0.650162 validation_1-logloss:0.661831
Multiple eval metrics have been passed: 'validation_1-logloss' will be used for early stopping.
Will train until validation_1-logloss hasn't improved in 10 rounds.
[1] validation_0-logloss:0.611314 validation_1-logloss:0.636086
(... per-round logloss lines trimmed ...)
[103] validation_0-logloss:0.026676 validation_1-logloss:0.259906
Stopping. Best iteration:
[93] validation_0-logloss:0.031067 validation_1-logloss:0.258645
Confusion matrix
[[34  3]
 [ 3 74]]
Accuracy: 0.9474, Precision: 0.9610, Recall: 0.9610, F1: 0.9610, AUC: 0.9933
LightGBM¶
- Faster training and prediction
- Smaller memory footprint
- Automatic conversion and optimal splitting of categorical features (finds optimal node splits on categorical features without one-hot encoding; see the sketch below)
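As a quick illustration of the categorical handling (a toy sketch with made-up data; the column names are hypothetical): a pandas 'category' column can be passed to LGBMClassifier as-is, with no one-hot encoding.
In [ ]:
import pandas as pd
from lightgbm import LGBMClassifier

# Toy frame: 'color' stays a single categorical column
toy = pd.DataFrame({
    'color': pd.Categorical(['red', 'blue', 'green', 'red'] * 25),
    'size': [1, 2, 3, 4] * 25,
})
toy_y = [0, 1, 1, 0] * 25
# LightGBM detects the 'category' dtype and splits on it natively
LGBMClassifier(n_estimators=10).fit(toy, toy_y)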
In [ ]:
from lightgbm import LGBMClassifier
lgbm_clf = LGBMClassifier(n_estimators=400,
                          learning_rate=0.05)
# LightGBM supports early stopping, just like XGBoost
evals = [(X_tr, y_tr), (X_val, y_val)]
lgbm_clf.fit(X_tr, y_tr, early_stopping_rounds=50, eval_metric='logloss',
             eval_set=evals, verbose=True) # verbose prints the evaluation metric at every round
pred = lgbm_clf.predict(X_test)
pred_proba = lgbm_clf.predict_proba(X_test)[:, 1]
get_clf_eval(y_test, pred, pred_proba)
[1] training's binary_logloss: 0.625671 valid_1's binary_logloss: 0.628248
Training until validation scores don't improve for 50 rounds.
[2] training's binary_logloss: 0.588173 valid_1's binary_logloss: 0.601106
(... per-round logloss lines trimmed; LightGBM printed each metric twice per round ...)
[110] training's binary_logloss: 0.0108692 valid_1's binary_logloss: 0.273132
Early stopping, best iteration is:
[60] training's binary_logloss: 0.0572088 valid_1's binary_logloss: 0.257168
Confusion matrix
[[34  3]
 [ 2 75]]
Accuracy: 0.9561, Precision: 0.9615, Recall: 0.9740, F1: 0.9677, AUC: 0.9888
In [ ]:
# Visualize feature importance with plot_importance()
from lightgbm import plot_importance
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 12))
plot_importance(lgbm_clf, ax=ax)
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7757c96910>