The concept of agreement (or consensus) refers to the degree to which ratings are identical (see the detailed tables in de Vet et al., 2006; Shoukri, 2010) and is often described as the proportion of rating pairs that are identical versus divergent (Kottner et al., 2011). When determining whether two ratings differ statistically from each other, however, the psychometric properties of the instrument used must be taken into account, in particular its reliability (e.g., retest reliability or intraclass correlations as a measure of interrater reliability; Shrout and Fleiss, 1979).
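To make the two notions concrete, the following sketch (not taken from this report; the ratings, rater count, and variable names are invented for illustration) computes both quantities for a small two-rater data set: the proportion of identical rating pairs as a simple index of agreement, and ICC(2,1) following the Shrout and Fleiss (1979) formulas as a measure of interrater reliability.

```python
# Minimal sketch with invented data: percent agreement and ICC(2,1)
# (two-way random effects, absolute agreement, single rater).
import numpy as np

ratings = np.array([   # rows = rated subjects, columns = two raters
    [4, 4],
    [3, 2],
    [5, 5],
    [2, 2],
    [4, 3],
], dtype=float)
n, k = ratings.shape

# Agreement: proportion of rating pairs that are identical
agreement = np.mean(ratings[:, 0] == ratings[:, 1])

# ICC(2,1) from the usual two-way ANOVA mean squares
grand = ratings.mean()
ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)
ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)
ss_total = np.sum((ratings - grand) ** 2)
ss_err = ss_total - ss_rows - ss_cols
ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_err = ss_err / ((n - 1) * (k - 1))
icc_2_1 = (ms_rows - ms_err) / (
    ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
)
print(f"percent agreement: {agreement:.2f}, ICC(2,1): {icc_2_1:.2f}")
```

With these invented ratings, agreement is 0.60 while ICC(2,1) is about 0.87, illustrating that the two indices capture different aspects of rating quality.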

General characteristics of the rating scale, such as the presence or absence of well-defined rating categories (Jonsson and Svingby, 2007) and the number of items (and thus decisions) that make up a score, have a direct impact on the likelihood of absolute agreement. For example, the more items a scale contributing to a raw score contains, the less likely absolute agreement on that score becomes. Two raw scores or two standardized scores (e.g., T-scores) that differ in absolute terms are therefore not necessarily statistically different: an (absolute) difference may simply be too small to reflect a systematic discrepancy rather than unsystematic measurement error in the score distributions. The magnitude of unsystematic error should therefore be taken into account before drawing conclusions about the degree of agreement. Unfortunately, many studies that set out to assess interrater agreement ignore the distinction between absolute and statistically reliable differences and do not use standardized scores (e.g., Bishop and Baird, 2001; Bishop et al., 2006; Gudmundsson and Gretarsson, 2009). In the field of language acquisition, for example, the direct comparison of raw scores appears to be the rule rather than the exception, despite the long word lists that many vocabulary assessment instruments comprise (e.g., Marchman and Martinez-Sussmann, 2002; Norbury et al., 2004).
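One common way to operationalize a "statistically reliable" difference is via the standard error of measurement; the sketch below (all reliability and T-score values are invented and not drawn from the studies cited above) derives a critical difference for two T-scores under this assumption. An observed gap smaller than the threshold is compatible with unsystematic measurement error rather than a systematic discrepancy.

```python
# Minimal sketch with illustrative values: is an absolute difference between
# two standardized scores statistically reliable, given the scale's reliability?
import math

reliability = 0.85   # assumed reliability of the scale (invented value)
sd = 10.0            # SD of the T-score metric
sem = sd * math.sqrt(1 - reliability)     # standard error of measurement
crit_diff = 1.96 * sem * math.sqrt(2)     # 95% critical difference for two scores

t_rater_a, t_rater_b = 52.0, 47.0         # two hypothetical T-scores
diff = abs(t_rater_a - t_rater_b)
print(f"SEM = {sem:.2f}, critical difference = {crit_diff:.2f}, observed = {diff:.2f}")
print("statistically reliable difference" if diff > crit_diff
      else "difference within unsystematic measurement error")
```

Here the observed 5-point gap falls well below the critical difference of roughly 10.7 points, so it should not be interpreted as disagreement between the raters.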

This report uses a concrete data set to demonstrate how a comprehensive assessment of interrater reliability, interrater agreement (consensus), and the linear correlation between ratings can be carried out and reported. On the basis of this example, we aim to disentangle aspects of rating that are frequently confounded and thereby help improve the comparability of future rating analyses.