sklearn plot confusion matrix with labels

newnotes 2023. 9. 7. 22:02

I want to plot a confusion matrix to visualize the classifier's performance, but the plot shows only the numeric indices of the labels, not the labels themselves.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# 20 test samples, all with the true label 'business'
y_test = ['business'] * 20

# Predicted labels (plain str, so they compare equal to y_test in Python 3)
pred = np.array(['health', 'business', 'business', 'business', 'business',
       'business', 'health', 'health', 'business', 'business', 'business',
       'business', 'business', 'business', 'business', 'business',
       'health', 'health', 'business', 'health'])

cm = confusion_matrix(y_test, pred)
plt.matshow(cm)
plt.title('Confusion matrix of the classifier')
plt.colorbar()
plt.show()

How can I add the labels (health, business, etc.) to the confusion matrix?

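Update:

Check the plotting helpers in sklearn.metrics, which are discussed further below.

Old answer:

I think it's worth mentioning the use of seaborn.heatmap here.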

import seaborn as sns
import matplotlib.pyplot as plt

ax = plt.subplot()
# annot=True annotates the cells; fmt='g' disables scientific notation
sns.heatmap(cm, annot=True, fmt='g', ax=ax)

# labels, title and ticks
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')
# confusion_matrix sorts the labels, so both axes use the same order
ax.xaxis.set_ticklabels(['business', 'health'])
ax.yaxis.set_ticklabels(['business', 'health'])

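As hinted in this question, you have to "open" the lower-level artist API by storing the figure and axes objects returned by the matplotlib functions you call (the fig, ax and cax variables below). You can then replace the default x- and y-axis ticks using set_xticklabels/set_yticklabels: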

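import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

labels = ['business', 'health']
# labels must be passed by keyword in current scikit-learn versions
cm = confusion_matrix(y_test, pred, labels=labels)
print(cm)
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(cm)
plt.title('Confusion matrix of the classifier')
fig.colorbar(cax)
# fix the tick positions first so each tick label lines up with its row/column
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()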

Note that I passed the labels list to the confusion_matrix function to make sure the matrix is ordered the same way as the ticks.

The resulting figure shows the confusion matrix with the class names as tick labels on both axes.
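To see why the labels argument matters, here is a minimal sketch using the data from the question (the name cm_reversed is only illustrative). The rows and columns of the matrix follow the order given in labels, so the tick labels should come from the same list.

```python
from sklearn.metrics import confusion_matrix

# Data from the question: 20 true 'business' samples, 6 of them predicted as 'health'
y_test = ['business'] * 20
pred = ['health', 'business', 'business', 'business', 'business',
        'business', 'health', 'health', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'health', 'health', 'business', 'health']

labels = ['business', 'health']

# Row/column 0 is 'business', row/column 1 is 'health'
cm = confusion_matrix(y_test, pred, labels=labels)

# Reversing the label list reorders both the rows and the columns
cm_reversed = confusion_matrix(y_test, pred, labels=labels[::-1])

print(cm)
print(cm_reversed)
```

If the tick labels were written by hand in a different order than the one used to build the matrix, the plot would silently show the wrong class names.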

I found a function that can plot the confusion matrix generated by sklearn.

import numpy as np


def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    ---------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """
    import matplotlib.pyplot as plt
    import numpy as np
    import itertools

    accuracy = np.trace(cm) / np.sum(cm).astype('float')
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]


    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")


    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.show()

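The plot annotates each cell with its value and shows the accuracy and misclassification rate under the x-axis label.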
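The normalize=True branch of the function above divides each row by its sum, so every cell becomes the fraction of that true class. The sketch below, on small made-up data, shows the same operation in isolation and compares it with the normalize='true' option of confusion_matrix, which should give the same result.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical three-class data, chosen so that every class occurs in y_true
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)

# Same operation as the normalize branch of plot_confusion_matrix above:
# divide every row by the number of samples of that true class
row_normalized = cm.astype(float) / cm.sum(axis=1)[:, np.newaxis]

# scikit-learn can apply the same row normalization directly
builtin = confusion_matrix(y_true, y_pred, normalize='true')

print(row_normalized)
```

Note that a class with no true samples has a row sum of zero, in which case the manual division produces NaN for that row.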

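To add to @akilat90's update about sklearn.metrics.plot_confusion_matrix:

You can use the ConfusionMatrixDisplay class in sklearn.metrics directly and bypass the need to pass a classifier to plot_confusion_matrix. It also has a display_labels argument, which lets you specify the labels shown in the plot.

The constructor of ConfusionMatrixDisplay doesn't provide a way to customize the plot much further, but you can access the matplotlib axes object through the ax_ attribute after calling its plot() method. I've added a second example showing this.

I found it annoying to have to rerun a classifier over a large amount of data just to produce the plot with plot_confusion_matrix. I am producing other plots from the predicted data, so I don't want to waste time re-predicting every time. This was an easy solution to that problem as well.

Example: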

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_true, y_preds, normalize='all')
cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
cmd.plot()

Example using ax_:

cm = confusion_matrix(y_true, y_preds, normalize='all')
cmd = ConfusionMatrixDisplay(cm, display_labels=['business','health'])
cmd.plot()
cmd.ax_.set(xlabel='Predicted', ylabel='True')
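The two snippets above assume y_true and y_preds already exist. As a self-contained sketch with made-up labels and predictions (the data and the Agg backend choice are assumptions for illustration), the whole flow could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hypothetical, precomputed labels and predictions (no classifier is needed)
y_true = ['business', 'business', 'health', 'health', 'business', 'health']
y_preds = ['business', 'health', 'health', 'health', 'business', 'business']
labels = ['business', 'health']

# Use the same label list for the matrix and for the display
cm = confusion_matrix(y_true, y_preds, labels=labels)
cmd = ConfusionMatrixDisplay(cm, display_labels=labels)
cmd.plot(cmap='Blues')

# After plot(), ax_ is an ordinary matplotlib Axes and can be customized freely
cmd.ax_.set(xlabel='Predicted', ylabel='True', title='Confusion matrix')
```

Because the predictions are computed once and reused, the same y_preds can feed any number of further plots.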

from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

# feature_vectors and y are the feature matrix and labels prepared beforehand
test_size = 0.33
seed = 7
X_train, X_test, y_train, y_test = model_selection.train_test_split(feature_vectors, y, test_size=test_size, random_state=seed)

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

model = LogisticRegression()
model.fit(X_train, y_train)
result = model.score(X_test, y_test)
print("Accuracy: %.3f%%" % (result*100.0))
y_pred = model.predict(X_test)
print("F1 Score: ", f1_score(y_test, y_pred, average="macro"))
print("Precision Score: ", precision_score(y_test, y_pred, average="macro"))
print("Recall Score: ", recall_score(y_test, y_pred, average="macro"))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

def cm_analysis(y_true, y_pred, labels, ymap=None, figsize=(10,10)):
    """
    Generate matrix plot of confusion matrix with pretty annotations.
    The plot is shown on screen (the plt.savefig call is commented out).
    args: 
      y_true:    true label of the data, with shape (nsamples,)
      y_pred:    prediction of the data, with shape (nsamples,)
      labels:    string array, name the order of class labels in the confusion matrix.
                 use `clf.classes_` if using scikit-learn models.
                 with shape (nclass,).
      ymap:      dict: any -> string, length == nclass.
                 if not None, map the labels & ys to more understandable strings.
                 Caution: original y_true, y_pred and labels must align.
      figsize:   the size of the figure plotted.
    """
    if ymap is not None:
        # change category codes or labels to new labels 
        y_pred = [ymap[yi] for yi in y_pred]
        y_true = [ymap[yi] for yi in y_true]
        labels = [ymap[yi] for yi in labels]
    # calculate a confusion matrix with the new labels
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    # calculate row sums (for calculating % & plot annotations)
    cm_sum = np.sum(cm, axis=1, keepdims=True)
    # calculate proportions
    cm_perc = cm / cm_sum.astype(float) * 100
    # empty array for holding annotations for each cell in the heatmap
    annot = np.empty_like(cm).astype(str)
    # get the dimensions
    nrows, ncols = cm.shape
    # cycle over cells and create annotations for each cell
    for i in range(nrows):
        for j in range(ncols):
            # get the count for the cell
            c = cm[i, j]
            # get the percentage for the cell
            p = cm_perc[i, j]
            if i == j:
                s = cm_sum[i, 0]
                # convert the proportion, count, and row sum to a string with pretty formatting
                annot[i, j] = '%.1f%%\n%d/%d' % (p, c, s)
            elif c == 0:
                annot[i, j] = ''
            else:
                annot[i, j] = '%.1f%%\n%d' % (p, c)
    # convert the array to a dataframe. To plot by proportion instead of number, use cm_perc in the DataFrame instead of cm
    cm = pd.DataFrame(cm, index=labels, columns=labels)
    cm.index.name = 'Actual'
    cm.columns.name = 'Predicted'
    # create empty figure with a specified size
    fig, ax = plt.subplots(figsize=figsize)
    # plot the data using the Pandas dataframe. To change the color map, add cmap=..., e.g. cmap = 'rocket_r'
    sns.heatmap(cm, annot=annot, fmt='', ax=ax)
    #plt.savefig(filename)
    plt.show()

cm_analysis(y_test, y_pred, model.classes_, ymap=None, figsize=(10,10))

This uses https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7

If you use the rocket_r colormap, the colors are reversed, which somehow looks more natural and better.
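One caveat with the row percentages in cm_analysis: a class with no true samples gives a row sum of zero, and the division then yields NaN. The data in the original question has exactly this shape, since no sample is truly 'health'. The sketch below shows one possible guard using np.divide with a where mask; it is an assumption about how this could be handled, not part of the gist.

```python
import numpy as np

# Confusion matrix for the data in the original question:
# rows are actual classes, columns are predicted classes
cm = np.array([[14, 6],
               [0, 0]])

# Row sums, kept as a column vector so they broadcast across each row
cm_sum = cm.sum(axis=1, keepdims=True)

# Divide only where the row sum is non-zero; other cells keep the 0.0 from `out`
cm_perc = np.divide(cm, cm_sum,
                    out=np.zeros(cm.shape, dtype=float),
                    where=cm_sum != 0) * 100

print(cm_perc)
```

Whether an empty row should display 0%, an empty string, or NaN is a presentation choice; the mask only makes that choice explicit.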

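You might also be interested in https://github.com/pandas-ml/pandas-ml/

It provides a pandas-based implementation of a confusion matrix.

Some features:

plot confusion matrix
plot normalized confusion matrix
class statistics
overall statistics

Here is an example: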

In [1]: from pandas_ml import ConfusionMatrix
In [2]: import matplotlib.pyplot as plt

In [3]: y_test = ['business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business',
        'business', 'business', 'business', 'business', 'business']

In [4]: y_pred = ['health', 'business', 'business', 'business', 'business',
       'business', 'health', 'health', 'business', 'business', 'business',
       'business', 'business', 'business', 'business', 'business',
       'health', 'health', 'business', 'health']

In [5]: cm = ConfusionMatrix(y_test, y_pred)

In [6]: cm
Out[6]:
Predicted  business  health  __all__
Actual
business         14       6       20
health            0       0        0
__all__          14       6       20

In [7]: cm.plot()
Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x1093cf9b0>

In [8]: plt.show()

In [9]: cm.print_stats()
Confusion Matrix:

Predicted  business  health  __all__
Actual
business         14       6       20
health            0       0        0
__all__          14       6       20


Overall Statistics:

Accuracy: 0.7
95% CI: (0.45721081772371086, 0.88106840959427235)
No Information Rate: ToDo
P-Value [Acc > NIR]: 0.608009812201
Kappa: 0.0
Mcnemar's Test P-Value: ToDo


Class Statistics:

Classes                                 business health
Population                                    20     20
P: Condition positive                         20      0
N: Condition negative                          0     20
Test outcome positive                         14      6
Test outcome negative                          6     14
TP: True Positive                             14      0
TN: True Negative                              0     14
FP: False Positive                             0      6
FN: False Negative                             6      0
TPR: (Sensitivity, hit rate, recall)         0.7    NaN
TNR=SPC: (Specificity)                       NaN    0.7
PPV: Pos Pred Value (Precision)                1      0
NPV: Neg Pred Value                            0      1
FPR: False-out                               NaN    0.3
FDR: False Discovery Rate                      0      1
FNR: Miss Rate                               0.3    NaN
ACC: Accuracy                                0.7    0.7
F1 score                               0.8235294      0
MCC: Matthews correlation coefficient        NaN    NaN
Informedness                                 NaN    NaN
Markedness                                     0      0
Prevalence                                     1      0
LR+: Positive likelihood ratio               NaN    NaN
LR-: Negative likelihood ratio               NaN    NaN
DOR: Diagnostic odds ratio                   NaN    NaN
FOR: False omission rate                       1      0
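A few of the figures in the report above can be checked by hand from the 2x2 matrix. The sketch below recomputes them with plain NumPy for the 'business' class; the variable names are illustrative and it does not use pandas_ml.

```python
import numpy as np

# Rows are actual classes, columns are predicted classes: [business, health]
cm = np.array([[14, 6],
               [0, 0]])

# Treat 'business' as the positive class
tp = cm[0, 0]  # actual business, predicted business
fn = cm[0, 1]  # actual business, predicted health
fp = cm[1, 0]  # actual health, predicted business
tn = cm[1, 1]  # actual health, predicted health

precision = tp / (tp + fp)                           # PPV in the report
recall = tp / (tp + fn)                              # TPR in the report
f1 = 2 * precision * recall / (precision + recall)   # F1 score in the report
accuracy = np.trace(cm) / cm.sum()                   # overall accuracy

print(precision, recall, f1, accuracy)
```

The same formulas applied to the 'health' class divide by zero in several places, which is where the NaN entries in the report come from.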

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# model, train_x, train_y, test_x and test_y come from an existing (presumably Keras-style) setup
model.fit(train_x, train_y, validation_split=0.1, epochs=50, batch_size=4)
y_pred = model.predict(test_x, batch_size=15)
# convert one-hot labels and predicted probabilities to class indices
cm = confusion_matrix(test_y.argmax(axis=1), y_pred.argmax(axis=1))
index = ['neutral', 'happy', 'sad']
columns = ['neutral', 'happy', 'sad']
cm_df = pd.DataFrame(cm, index=index, columns=columns)
plt.figure(figsize=(10, 6))
sns.heatmap(cm_df, annot=True)
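The argmax(axis=1) calls in the snippet above turn one-hot labels and per-class scores into class indices before the matrix is computed. A small sketch with made-up arrays (no trained model; all values are hypothetical) shows that step on its own:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ['neutral', 'happy', 'sad']

# Hypothetical one-hot encoded true labels, one row per sample
test_y = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

# Hypothetical per-class scores, shaped like the output of a softmax layer
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])

# The index of the largest value in each row is the class index
true_idx = test_y.argmax(axis=1)
pred_idx = y_pred.argmax(axis=1)

# Passing labels explicitly keeps the matrix 3x3 even if a class is never predicted
cm = confusion_matrix(true_idx, pred_idx, labels=[0, 1, 2])
print(cm)
```

Row i and column i of cm then correspond to classes[i], which is the order to use for the DataFrame index and columns.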

There is a very easy way to do this with ConfusionMatrixDisplay. It supports display_labels, which can be used to set the labels shown on the plot.

import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
np.random.seed(0)
y_true = np.random.randint(0,3, 100)
y_pred = np.random.randint(0,3, 100)

labels = ['cat', 'dog', 'rat']

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=labels).plot()
#plt.savefig("Confusion_Matrix.png")

Reference: ConfusionMatrixDisplay

Edit 1:

How to rotate the x-axis labels to a vertical orientation (needed when the class labels overlap in the plot) and how to plot directly from predictions.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
np.random.seed(0)

n = 10
y_true = np.random.randint(0,n, 100)
y_pred = np.random.randint(0,n, 100)

labels = [f'class_{i+1}' for i in range(n)]

fig, ax = plt.subplots(figsize=(15, 15))
ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred, display_labels=labels, xticks_rotation="vertical",
    ax=ax, colorbar=False, cmap="plasma")

Given model, validx and validy. With a lot of help from the other answers, this is what fits my needs.

sklearn.metrics.plot_confusion_matrix

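import matplotlib.pyplot as plt
import sklearn.metrics

# Note: plot_confusion_matrix was removed in scikit-learn 1.2;
# newer versions use ConfusionMatrixDisplay.from_estimator instead.
fig, ax = plt.subplots(figsize=(26,26))
sklearn.metrics.plot_confusion_matrix(model, validx, validy, ax=ax, cmap=plt.cm.Blues)
ax.set(xlabel='Predicted', ylabel='Actual', title='Confusion Matrix Actual vs Predicted')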

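import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.metrics import ConfusionMatrixDisplay

# X_train, y_train, X_test, y_test, class_names, normalize and title are defined elsewhere
classifier = svm.SVC(kernel="linear", C=0.01).fit(X_train, y_train)
disp = ConfusionMatrixDisplay.from_estimator(
    classifier,
    X_test,
    y_test,
    display_labels=class_names,
    cmap=plt.cm.Blues,
    normalize=normalize,
)

disp.ax_.set_title(title) # this line is your answer

plt.show()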

Source URL: https://stackoverflow.com/questions/19233771/sklearn-plot-confusion-matrix-with-labels
