분류 자의 성능을 시각화하기 위해 혼동 행렬을 플로팅하고 싶지만 레이블 자체가 아닌 레이블 수만 표시합니다.
from sklearn.metrics import confusion_matrix
import pylab as pl
y_test=['business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business', 'business']
pred=array(['health', 'business', 'business', 'business', 'business',
'business', 'health', 'health', 'business', 'business', 'business',
'business', 'business', 'business', 'business', 'business',
'health', 'health', 'business', 'health'],
dtype='|S8')
cm = confusion_matrix(y_test, pred)
pl.matshow(cm)
pl.title('Confusion matrix of the classifier')
pl.colorbar()
pl.show()
혼동 매트릭스에 레이블 (건강, 비즈니스 .. 등)을 추가하려면 어떻게해야합니까?
답변
에서 암시 된 바와 같이 이 질문 , 당신은 “열기”를에있는 낮은 수준의 아티스트 API 그림을 저장하여, 및 축이합니다 (호출하기 matplotlib 기능에 의해 전달 객체 fig
, ax
그리고 cax
아래 변수). 그런 다음 set_xticklabels
/를 사용하여 기본 x 축 및 y 축 눈금을 바꿀 수 있습니다 set_yticklabels
.
from sklearn.metrics import confusion_matrix
labels = ['business', 'health']
cm = confusion_matrix(y_test, pred, labels)
print(cm)
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(cm)
plt.title('Confusion matrix of the classifier')
fig.colorbar(cax)
ax.set_xticklabels([''] + labels)
ax.set_yticklabels([''] + labels)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
labels
목록을 confusion_matrix
함수에 전달 하여 틱과 일치하도록 올바르게 정렬되었는지 확인했습니다.
결과는 다음과 같습니다.
답변
최신 정보:
scikit-learn 0.22에는 혼동 행렬을 직접 플로팅하는 새로운 기능이 있습니다.
문서 참조 : sklearn.metrics.plot_confusion_matrix
이전 답변 :
seaborn.heatmap
여기서 의 사용을 언급 할 가치가 있다고 생각합니다 .
import seaborn as sns
import matplotlib.pyplot as plt
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax); #annot=True to annotate cells
# labels, title and ticks
ax.set_xlabel('Predicted labels');ax.set_ylabel('True labels');
ax.set_title('Confusion Matrix');
ax.xaxis.set_ticklabels(['business', 'health']); ax.yaxis.set_ticklabels(['health', 'business']);
답변
에서 생성 된 혼동 행렬을 그릴 수있는 함수를 찾았습니다 sklearn
.
import numpy as np
def plot_confusion_matrix(cm,
target_names,
title='Confusion matrix',
cmap=None,
normalize=True):
"""
given a sklearn confusion matrix (cm), make a nice plot
Arguments
---------
cm: confusion matrix from sklearn.metrics.confusion_matrix
target_names: given classification classes such as [0, 1, 2]
the class names, for example: ['high', 'medium', 'low']
title: the text to display at the top of the matrix
cmap: the gradient of the values displayed from matplotlib.pyplot.cm
see http://matplotlib.org/examples/color/colormaps_reference.html
plt.get_cmap('jet') or plt.cm.Blues
normalize: If False, plot the raw numbers
If True, plot the proportions
Usage
-----
plot_confusion_matrix(cm = cm, # confusion matrix created by
# sklearn.metrics.confusion_matrix
normalize = True, # show proportions
target_names = y_labels_vals, # list of names of the classes
title = best_estimator_name) # title of graph
Citiation
---------
http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
"""
import matplotlib.pyplot as plt
import numpy as np
import itertools
accuracy = np.trace(cm) / np.sum(cm).astype('float')
misclass = 1 - accuracy
if cmap is None:
cmap = plt.get_cmap('Blues')
plt.figure(figsize=(8, 6))
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
if target_names is not None:
tick_marks = np.arange(len(target_names))
plt.xticks(tick_marks, target_names, rotation=45)
plt.yticks(tick_marks, target_names)
if normalize:
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
thresh = cm.max() / 1.5 if normalize else cm.max() / 2
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
if normalize:
plt.text(j, i, "{:0.4f}".format(cm[i, j]),
horizontalalignment="center",
color="white" if cm[i, j] > thresh else "black")
else:
plt.text(j, i, "{:,}".format(cm[i, j]),
horizontalalignment="center",
color="white" if cm[i, j] > thresh else "black")
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
plt.show()
답변
https://github.com/pandas-ml/pandas-ml/에 관심이있을 수 있습니다.
Confusion Matrix의 Python Pandas 구현을 구현합니다.
일부 기능 :
- 혼동 행렬 플로팅
- 정규화 된 정오 분류 표 플로팅
- 수업 통계
- 전반적인 통계
다음은 그 예입니다.
In [1]: from pandas_ml import ConfusionMatrix
In [2]: import matplotlib.pyplot as plt
In [3]: y_test = ['business', 'business', 'business', 'business', 'business',
'business', 'business', 'business', 'business', 'business',
'business', 'business', 'business', 'business', 'business',
'business', 'business', 'business', 'business', 'business']
In [4]: y_pred = ['health', 'business', 'business', 'business', 'business',
'business', 'health', 'health', 'business', 'business', 'business',
'business', 'business', 'business', 'business', 'business',
'health', 'health', 'business', 'health']
In [5]: cm = ConfusionMatrix(y_test, y_pred)
In [6]: cm
Out[6]:
Predicted business health __all__
Actual
business 14 6 20
health 0 0 0
__all__ 14 6 20
In [7]: cm.plot()
Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x1093cf9b0>
In [8]: plt.show()
In [9]: cm.print_stats()
Confusion Matrix:
Predicted business health __all__
Actual
business 14 6 20
health 0 0 0
__all__ 14 6 20
Overall Statistics:
Accuracy: 0.7
95% CI: (0.45721081772371086, 0.88106840959427235)
No Information Rate: ToDo
P-Value [Acc > NIR]: 0.608009812201
Kappa: 0.0
Mcnemar's Test P-Value: ToDo
Class Statistics:
Classes business health
Population 20 20
P: Condition positive 20 0
N: Condition negative 0 20
Test outcome positive 14 6
Test outcome negative 6 14
TP: True Positive 14 0
TN: True Negative 0 14
FP: False Positive 0 6
FN: False Negative 6 0
TPR: (Sensitivity, hit rate, recall) 0.7 NaN
TNR=SPC: (Specificity) NaN 0.7
PPV: Pos Pred Value (Precision) 1 0
NPV: Neg Pred Value 0 1
FPR: False-out NaN 0.3
FDR: False Discovery Rate 0 1
FNR: Miss Rate 0.3 NaN
ACC: Accuracy 0.7 0.7
F1 score 0.8235294 0
MCC: Matthews correlation coefficient NaN NaN
Informedness NaN NaN
Markedness 0 0
Prevalence 1 0
LR+: Positive likelihood ratio NaN NaN
LR-: Negative likelihood ratio NaN NaN
DOR: Diagnostic odds ratio NaN NaN
FOR: False omission rate 1 0
답변
from sklearn import model_selection
test_size = 0.33
seed = 7
X_train, X_test, y_train, y_test = model_selection.train_test_split(feature_vectors, y, test_size=test_size, random_state=seed)
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
model = LogisticRegression()
model.fit(X_train, y_train)
result = model.score(X_test, y_test)
print("Accuracy: %.3f%%" % (result*100.0))
y_pred = model.predict(X_test)
print("F1 Score: ", f1_score(y_test, y_pred, average="macro"))
print("Precision Score: ", precision_score(y_test, y_pred, average="macro"))
print("Recall Score: ", recall_score(y_test, y_pred, average="macro"))
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
def cm_analysis(y_true, y_pred, labels, ymap=None, figsize=(10,10)):
"""
Generate matrix plot of confusion matrix with pretty annotations.
The plot image is saved to disk.
args:
y_true: true label of the data, with shape (nsamples,)
y_pred: prediction of the data, with shape (nsamples,)
filename: filename of figure file to save
labels: string array, name the order of class labels in the confusion matrix.
use `clf.classes_` if using scikit-learn models.
with shape (nclass,).
ymap: dict: any -> string, length == nclass.
if not None, map the labels & ys to more understandable strings.
Caution: original y_true, y_pred and labels must align.
figsize: the size of the figure plotted.
"""
if ymap is not None:
y_pred = [ymap[yi] for yi in y_pred]
y_true = [ymap[yi] for yi in y_true]
labels = [ymap[yi] for yi in labels]
cm = confusion_matrix(y_true, y_pred, labels=labels)
cm_sum = np.sum(cm, axis=1, keepdims=True)
cm_perc = cm / cm_sum.astype(float) * 100
annot = np.empty_like(cm).astype(str)
nrows, ncols = cm.shape
for i in range(nrows):
for j in range(ncols):
c = cm[i, j]
p = cm_perc[i, j]
if i == j:
s = cm_sum[i]
annot[i, j] = '%.1f%%\n%d/%d' % (p, c, s)
elif c == 0:
annot[i, j] = ''
else:
annot[i, j] = '%.1f%%\n%d' % (p, c)
cm = pd.DataFrame(cm, index=labels, columns=labels)
cm.index.name = 'Actual'
cm.columns.name = 'Predicted'
fig, ax = plt.subplots(figsize=figsize)
sns.heatmap(cm, annot=annot, fmt='', ax=ax)
#plt.savefig(filename)
plt.show()
cm_analysis(y_test, y_pred, model.classes_, ymap=None, figsize=(10,10))
https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7 사용
답변
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
model.fit(train_x, train_y,validation_split = 0.1, epochs=50, batch_size=4)
y_pred=model.predict(test_x,batch_size=15)
cm =confusion_matrix(test_y.argmax(axis=1), y_pred.argmax(axis=1))
index = ['neutral','happy','sad']
columns = ['neutral','happy','sad']
cm_df = pd.DataFrame(cm,columns,index)
plt.figure(figsize=(10,6))
sns.heatmap(cm_df, annot=True)
답변
에 대한 @ akilat90의 업데이트에 추가하려면 sklearn.metrics.plot_confusion_matrix
:
ConfusionMatrixDisplay
내에서 sklearn.metrics
직접 클래스 를 사용할 수 있으며 분류자를 plot_confusion_matrix
. 또한 display_labels
원하는대로 플롯에 표시되는 레이블을 지정할 수 있는 인수가 있습니다.
의 생성자 ConfusionMatrixDisplay
는 플롯을 추가로 사용자 정의하는 방법을 제공하지 않지만 ax_
해당 plot()
메서드를 호출 한 후 속성을 통해 matplotlib 축 obect에 액세스 할 수 있습니다 . 이것을 보여주는 두 번째 예를 추가했습니다.
.NET을 사용하여 플롯을 생성하기 위해 대량의 데이터에 대해 분류기를 다시 실행해야하는 것이 성가신 것을 알았습니다 plot_confusion_matrix
. 예측 된 데이터에서 다른 플롯을 생성하고 있으므로 매번 재 예측하는 데 시간을 낭비하고 싶지 않습니다. 이것은 또한 그 문제에 대한 쉬운 해결책이었습니다.
예:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
cm = confusion_matrix(y_true, y_preds, normalize='all')
cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
cmd.plot()
사용 예 ax_
:
cm = confusion_matrix(y_true, y_preds, normalize='all')
cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
cmd.plot()
cmd.ax_.set(xlabel='Predicted', ylabel='True')