Establish a mechanism for ensuring trustworthiness among multiple coders
When multiple researchers work together on qualitative data analysis, establishing consistency in how they interpret and code data becomes essential. This process, known as inter-coder reliability (ICR), strengthens the credibility and trustworthiness of research findings. Let's explore key approaches to building this reliability within research teams using Dedoose, a popular qualitative analysis platform.
At its core, establishing a mechanism for ensuring trustworthiness among multiple coders, what Haslerig and Grummert (2024, p. 191) refer to as “coding fidelity,” creates a foundation of methodological rigor in your data analysis process. Ravitch and Carl (2021) note that this process provides valuable opportunities for research teams to:
Your method should always guide your processes and decisions when using a QDAS (qualitative data analysis software) tool. When the digital tool you are using to conduct your analysis heavily influences or controls your method, you risk the old idiom of “the tail wagging the dog.” Below are three of the most common methods for establishing and evaluating reliability between coders, each rooted in a common methodological approach to employing multiple coders in a qualitative project:
Best used: Early in the coding process and throughout your project, when your project aligns with qualitative methods and values the researcher as instrument and multiple perspectives.
Approach: Team members independently code the exact same file, then compare their code applications and interpretations until a satisfactory level of agreement is reached. Moving forward, team members ‘audit’ each other’s work via a ‘disagree’ code to prevent coding drift as the project progresses.
Benefits: Rooted in traditional qualitative methodology, this approach encourages rich discussion about how different researchers perceive the data, while also systematically documenting any disagreements about code applications or excerpt length (a simple way to log such disagreements is sketched after this method’s description). These conversations often lead to important refinements in the codebook and coding criteria, and they produce analytical insights that are invaluable to building your findings.
For more approaches to implementing this method in a systematic and rigorous way, explore one of the Institute for Mixed Methods Research’s courses or arrange a capacity-building session for your team.
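One way a team might keep such a record of disagreements outside their QDAS is sketched below. The excerpt IDs, code names, and coder applications are hypothetical, and this is only an illustration of the documentation habit described above, not a Dedoose feature:

```python
# Minimal sketch (hypothetical data): log where two coders' code applications
# diverge on the same excerpts, so each case can be discussed and the codebook
# refined. Not a Dedoose feature; just one way to document disagreements.
coder_a = {
    "excerpt_01": {"belonging", "campus climate"},
    "excerpt_02": {"mentorship"},
    "excerpt_03": {"campus climate"},
}
coder_b = {
    "excerpt_01": {"belonging"},
    "excerpt_02": {"mentorship", "family support"},
    "excerpt_03": {"campus climate"},
}

disagreement_log = []
for excerpt in sorted(set(coder_a) | set(coder_b)):
    codes_a = coder_a.get(excerpt, set())
    codes_b = coder_b.get(excerpt, set())
    if codes_a != codes_b:
        disagreement_log.append(
            {"excerpt": excerpt, "only_coder_a": codes_a - codes_b, "only_coder_b": codes_b - codes_a}
        )

for entry in disagreement_log:
    print(entry)  # entries become the agenda for the next consensus meeting
```

Each logged entry can then be resolved in a team meeting and, where relevant, noted as a refinement to the codebook or coding criteria.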
Best used: As a first step in the coding process, when there may or may not yet be a clearly defined research purpose and focus.
Approach: Researchers independently code their designated file "clones," then review code applications and interpretations.
Benefits: Provides more structured "apples-to-apples" comparisons while maintaining the qualitative focus on interpretation and meaning-making.
Best used: When you want to train someone up on an existing codebook, or when a publishing outlet requires a Cohen’s kappa statistic.
Approach: Create a code application or code weight test and have someone else on your team take the test. The Training Center uses kappa (a quantitative statistic) to calculate the level of agreement and therefore cannot properly account for qualitative data (e.g., context, disagreements about excerpt length rather than code application, memos on excerpts); a short sketch of how kappa is calculated appears below.
Benefits: Offers quantitative metrics that serve as a valuable tool for training new team members on code definitions and their application.
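To make concrete what the kappa statistic does, here is a minimal sketch that computes Cohen’s kappa by hand for a single code that each coder either applied (1) or did not apply (0) to the same ten excerpts. The coder decisions are invented for illustration, and Dedoose’s Training Center performs its own calculation, so this is not its implementation:

```python
# Minimal sketch (invented coder decisions): Cohen's kappa for a single code
# applied (1) or not applied (0) by two coders to the same ten excerpts.
# kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is
# the agreement expected by chance from each coder's marginal rates.
from collections import Counter

coder_a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # hypothetical code applications
coder_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

def cohens_kappa(a, b):
    n = len(a)
    # Observed agreement: share of excerpts where both coders made the same call.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each coder's rate for every label, summed.
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

print(f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")  # prints kappa = 0.58
```

Notice that the statistic sees only matching labels: it cannot register why two coders disagreed, whether they disputed excerpt length rather than the code itself, or what either wrote in a memo.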
While quantitative assessments of reliability (like those provided by Dedoose's Training Center) offer valuable metrics, they should complement rather than replace qualitative approaches to establishing agreement and trustworthiness among multiple coders. An excerpt from a book chapter my co-author and I wrote (Grummert & Haslerig, 2024, pp. 190-191), expressing the value added by a more qualitative approach, is nested below:
“Fundamentally, the term “interrater reliability” is reflective of the quantitative underpinnings of the concept and its origin in developing assessment or survey items that are reliable across different raters, rather than in qualitative coding. An interrater reliability test via Cohen’s kappa is inherently limiting because it is a quantitative statistic attempting to measure a qualitative process. Many QDAS programs offer this statistical test, yet by extracting coding from its original dataset, these tests remove the data from the context in which it was coded. Further, because the tests are often run with only a limited selection of codes, this process obscures meaningful discussions related to code co-occurrence and how much context should be included in excerpts. Many of our most generative conversations during the process of establishing reliability were about code co-occurrences: Whether those co-occurrences accurately captured all relevant aspects of the excerpt or if an additional, more specified, code was needed. In the absence of the full codebook, the benefit of undertaking a collaborative, qualitative approach to intercoder agreement is potentially missed.”
Ideally, research teams should document how they will maintain trustworthiness and consistency among multiple coders in their methods section and/or research design. A few things to consider are:
The ultimate goal isn't simply achieving high reliability scores but developing a coding system that team members can apply consistently while capturing the complexity of the data. This requires an iterative process involving:
This methodological transparency strengthens the credibility of findings and provides important context for readers evaluating the research.
Establishing reliability among multiple coders represents more than a methodological checkbox—it's a process of building shared understanding among researchers that strengthens analysis. By employing (and documenting!) your approach, teams can develop coding systems that balance consistency with interpretive depth, ultimately producing more trustworthy and nuanced findings.