A case that is sometimes considered a problem with Cohen's kappa occurs when the kappa calculated for two pairs of raters is compared, with the two raters in each pair having the same percentage agreement, but one pair giving a similar number of ratings in each class while the other pair gives very different numbers of ratings in each class. [7] For example, in the following two cases there is equal agreement between A and B (60 out of 100 in both cases), so we would expect the relative values of Cohen's kappa to reflect this. (Note that rater B gives 70 favourable and 30 unfavourable ratings in the first case, and that these figures are reversed in the second case.) Calculating Cohen's kappa for each case nevertheless gives different values, because the expected chance agreement depends on each pair's marginal distributions; a sketch of the calculation follows below.

A number of statistics have been used to measure inter-rater and intra-rater reliability. An incomplete list includes percentage agreement, Cohen's kappa (for two raters), Fleiss' kappa (an adaptation of Cohen's kappa for three or more raters), the contingency coefficient, Pearson's r and Spearman's rho, the intraclass correlation coefficient, the concordance correlation coefficient, and Krippendorff's alpha (useful when there are several raters and several possible ratings).
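To make the two cases above concrete, here is a minimal Python sketch of the calculation. The individual cell counts are illustrative values chosen only to match the marginals described in the text (60% agreement; B gives 70 favourable and 30 unfavourable ratings in the first case and the reverse in the second); they are not taken from any cited source.

def cohen_kappa(table):
    # table[i][j] = number of items rater A placed in class i and rater B in class j
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

case_1 = [[45, 15],   # rows: rater A (yes, no); columns: rater B (yes, no)
          [25, 15]]
case_2 = [[25, 35],
          [5, 35]]

print(cohen_kappa(case_1))  # about 0.13
print(cohen_kappa(case_2))  # about 0.26 (same 60% raw agreement, different kappa)

Although both tables show 60% raw agreement, kappa differs because the expected chance agreement differs with the marginal distributions.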

The use of correlation coefficients such as Pearson's r can be a poor reflection of the extent of agreement between raters, resulting in extreme over- or underestimation of the true level of rater agreement (6). In this paper we will consider just two of the most common measures: percentage agreement and Cohen's kappa.

Fleiss' kappa (named after Joseph L. Fleiss) is a statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to, or classifying, a number of items. This contrasts with other kappas, such as Cohen's kappa, which only work when assessing agreement between no more than two raters, or the intra-rater reliability of a single rater compared with themselves. The measure calculates the degree of agreement in classification over and above what would be expected by chance.

Kappa reaches its theoretical maximum value of 1 only when both observers distribute their codes in the same way, that is, when the corresponding row and column totals are identical; anything less is less than perfect agreement. Still, the maximum value kappa could achieve given unequal marginal distributions helps in interpreting the value of kappa actually obtained. The equation for the maximum κ is [16]

κ_max = (P_max - P_exp) / (1 - P_exp),

where P_exp = Σ_i P_i+ P_+i is the expected chance agreement and P_max = Σ_i min(P_i+, P_+i), with P_i+ and P_+i denoting the row and column marginal proportions for category i.
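As a rough illustration of the bound just given, the following Python sketch computes the maximum attainable kappa from the marginal proportions of a contingency table. The 2x2 table is the same invented example used in the sketch above, not data from the cited sources.

def kappa_max(table):
    # upper bound on Cohen's kappa given the table's row and column marginals
    n = sum(sum(row) for row in table)
    k = len(table)
    row_p = [sum(row) / n for row in table]                              # P_i+
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]   # P_+i
    p_exp = sum(row_p[i] * col_p[i] for i in range(k))      # expected chance agreement
    p_max = sum(min(row_p[i], col_p[i]) for i in range(k))  # best achievable observed agreement
    return (p_max - p_exp) / (1 - p_exp)

table = [[45, 15],
         [25, 15]]
print(kappa_max(table))  # about 0.78; below 1 because the row and column totals differ

Comparing the kappa actually obtained with this maximum helps interpret the obtained value, as noted above.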

As Marusteri and Bacarea (9) have noted, there is never 100% certainty about research results, even when statistical significance is reached. Results of statistical tests of hypotheses about the relationship between independent and dependent variables can become non-significant when raters are inconsistent in how they evaluate the variables. …