Table 3

Inter-rater agreement (intraclass correlation coefficients) on overall mean score and individual item scores

Item	Intraclass correlation coefficient	95% CI	Mean item score across 6 A3s* (item score on an A3 = mean of 12 raters)	SD of item score across 6 A3s
Overall assessment of A3s (mean of 23 item scores†)	0.89	0.75 to 0.98	2.1	0.51
Individual items
Background: Why is the problem important?
1. Negative consequences (eg, harm, frustration, waste): how specific is the clearest statement of a negative consequence of the problem?	0.32	0.11 to 0.77	2.7	0.37
2. Individuals/Groups impacted by the negative consequences (eg, harm, frustration, waste): how specific is the clearest statement identifying an impacted individual, group/unit or organisation?	0.44	0.19 to 0.84	2.5	0.61
3. Severity of the negative consequences (eg, harm, frustration, waste): how specific is the clearest statement of the severity (eg, extent/amount) of at least one negative consequence?	0.71	0.45 to 0.94	2.3	0.82
4. Frequency of the negative consequences (eg, harm, frustration, waste): how specific is the clearest statement of the frequency (# events/unit of time) of at least one negative consequence?	0.68	0.41 to 0.93	1.8	1.01
Current situation: What is actually happening?
5. Current level of performance	0.71	0.46 to 0.94	1.8	0.90
6. How is work done (process/workflow)?	0.72	0.47 to 0.94	1.8	1.07
7. Clear identification of who is involved in performing the work?	0.71	0.45 to 0.94	1.5	1.01
8. Performance problem/gap?	0.58	0.31 to 0.90	1.8	0.90
Goal: What target condition or specific performance is desired? By when?
9. How specific is the goal?	0.79	0.57 to 0.96	2.0	0.83
10. Is the goal measurable?	0.60	0.33 to 0.91	2.3	0.68
11. How relevant is the goal to addressing the problem?	0.10	0.0 to 0.52	2.7	0.28
12. How time-bound (clear timeframe for accomplishment) is the goal?	0.96	0.90 to 0.99	1.9	1.49
Analysis: What is contributing to the problem? What are its root causes?
13. Is the display of method(s) for analysing root causes easy to understand? (eg, fishbone diagram, ‘5-whys’/root cause tree diagram, Pareto chart)	0.65	0.38 to 0.92	2.1	0.91
14. How clear are the identified root causes?	0.39	0.15 to 0.81	2.3	0.55
Countermeasures: What options/alternatives were considered? What countermeasures/strategies are proposed?
15. How many options for countermeasures were considered?	0.78	0.55 to 0.96	2.7	0.60
16. Identify the strongest countermeasure considered. How strong is it?	0.41	0.17 to 0.82	2.1	0.55
17. How many of the proposed countermeasures are linked to identified root causes?	0.46	0.21 to 0.85	2.0	0.85
Action plan: To pilot and implement the selected countermeasures: what, who, when?
18. For the action plan on the A3, how clearly are activities described (ie, ‘what’ is to be done)?	0.60	0.33 to 0.91	2.3	0.68
19. Are individuals identified to be responsible for each action item to be carried out (ie, ‘who’)?	0.90	0.77 to 0.98	2.4	1.14
20. Are estimated completion dates identified for each action item (ie, ‘when’)?	0.97	0.93 to 1.0	2.5	1.18
21. Is monitoring planned for the implementation of actions (what will be monitored, by whom, when)?	0.57	0.30 to 0.89	1.3	1.06
Follow-up plans: Checking whether desired goal(s) was achieved?
22. Is follow-up planned to measure achievement of the desired goal(s) (what will be measured, by whom, when)?	0.83	0.63 to 0.97	1.7	1.00
Across A3 sections
23. How clearly does the title identify the problem to be addressed?	0.56	0.29 to 0.89	2.3	0.60

Each item has response options that range from 0 to 3 on a 4-point scale. Each response option has verbal anchors appropriate for the item, for example, 0=not addressed, 1=vague, 2=somewhat specific and 3=very specific. The response anchors for each item and their illustrative descriptions and comparisons are presented in the ‘Description of Ratings’ in the online supplemental digital content.
For each of 6 problem-solving A3s, 12 raters assessed each of 23 items. This produced a total of 1656 ratings, including 12 ratings for each item on each A3, 72 ratings per item across the 6 A3s and 276 ratings per A3 across items.
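The rating counts in this note follow directly from the study design. A minimal arithmetic check, using only the numbers stated above (the variable names are illustrative, not from the source):

```python
# Rating-design counts taken from the table note.
N_A3S = 6      # problem-solving A3s
N_RATERS = 12  # raters assessing each A3
N_ITEMS = 23   # items assessed on each A3

total_ratings = N_A3S * N_RATERS * N_ITEMS   # every rater scores every item on every A3
ratings_per_item_per_a3 = N_RATERS           # one rating per rater
ratings_per_item = N_A3S * N_RATERS          # pooled across the 6 A3s
ratings_per_a3 = N_RATERS * N_ITEMS          # pooled across the 23 items

print(total_ratings, ratings_per_item_per_a3, ratings_per_item, ratings_per_a3)
# prints: 1656 12 72 276
```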
*The six A3s used to assess inter-rater agreement were modified to increase the range of scores across A3s on several items. The mean scores, together with their SDs, help indicate the extent of variation across A3s for each item. The mean scores do not necessarily reflect a representative sample of students’ scores.
†The overall assessment of an A3 is the mean of the 12 raters’ assessments for each of the 23 items on an A3 (276 ratings).
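As a sketch of how the overall assessment in the † note could be computed for a single A3, the example below averages all 276 ratings. The ratings are hypothetical random values on the 0 to 3 scale, not study data, and the variable names are assumptions made for illustration. Because every rater scores every item, averaging all ratings at once gives the same result as averaging the 23 item means.

```python
import random
from statistics import fmean

N_RATERS, N_ITEMS = 12, 23

# Hypothetical ratings for one A3 (NOT study data): ratings[r][i] is rater r's
# score on item i, drawn at random from the 0-3 response scale.
rng = random.Random(0)
ratings = [[rng.randint(0, 3) for _ in range(N_ITEMS)] for _ in range(N_RATERS)]

# Overall assessment: mean of all 12 x 23 = 276 ratings on the A3.
all_ratings = [score for row in ratings for score in row]
overall = fmean(all_ratings)

# Equivalent route for a fully crossed design: average each item over the
# 12 raters, then average the 23 item means.
item_means = [fmean([row[i] for row in ratings]) for i in range(N_ITEMS)]
overall_via_items = fmean(item_means)

print(len(all_ratings), round(overall, 2), round(overall_via_items, 2))
```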