A Unified Design Method for the Simplest Complete Q-Matrix in Cognitive Diagnostic Testing
Tang Xiaojuan¹, Mao Mengmeng², Li Yu³, Ding Shuliang⁴, Peng Zhixia⁵
(¹ School of Education, Jiangxi Normal University, Nanchang 330022, China)
(² School of Public Policy and Administration, Nanchang University, Nanchang 330036, China)
(³ Mental Health Education Center, School of Marxism, Zhejiang Gongshang University, Hangzhou 310018, China)
(⁴ College of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China)
(⁵ School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou 310018, China)
Abstract
Attribute level (dichotomous vs. polytomous attributes) and item scoring method (0-1 scoring vs. polytomous scoring) represent two critical dimensions in cognitive diagnostic test design. Polytomous attribute tests provide more detailed diagnostic information, while polytomously scored tests yield higher classification accuracy. However, existing cognitive diagnostic tests lack integrated designs that combine polytomous attributes with polytomous scoring. Drawing upon concepts of structured/unstructured simplest complete Q-matrices (SSCQM/USCQM) for dichotomous attributes and polytomous scoring, this paper proposes a unified design method for the simplest complete Q-matrix in cognitive diagnostic testing. This approach addresses test design challenges across various combinations of attribute levels and scoring methods. Using (quasi-)reachability matrices as benchmarks, simulation studies compare the classification accuracy of various SSCQM and USCQM configurations under both long and short test conditions. Results demonstrate that SSCQM and USCQM generally achieve higher classification accuracy, with empirical data further validating the advantages of these test designs.
Keywords: cognitive diagnostic testing, test design, simplest complete Q-matrix, unified design method
1. Introduction
Educational assessment fundamentally shapes the direction of educational development, serving as a guiding compass for instructional practices. The Overall Plan for Deepening Education Evaluation Reform in the New Era issued by the Central Committee of the Communist Party of China and the State Council reflects the government's emphasis on formative assessment while imposing higher demands on the diagnostic functions of educational evaluation. Multiple intelligences theory similarly advocates for strengthening diagnostic capabilities to provide empirical foundations for teachers to guide students. Grounded in cognitive psychology theory, cognitive diagnostic testing (CDT) offers more granular diagnostic information than standardized assessments (Liu et al., 2016). Since the quality of cognitive diagnostic tests directly impacts the precision of diagnostic information and subsequently their remedial functions, test design plays a pivotal role in the cognitive diagnostic process. Gorin (2007) proposed that an effective test should elicit specific behaviors from examinees while remaining diagnostically tractable. Consequently, ideal test designs can trigger differentiated response patterns across different knowledge states (KS), thereby better distinguishing among examinees by establishing one-to-one correspondences between knowledge states and ideal response patterns (IRP) or observed response patterns (ORP). Research indicates that under certain conditions, (quasi-)reachability matrices facilitate establishing these one-to-one mappings (丁树良等, 2011; 丁树良, 罗芬等, 2014; 丁树良, 汪文义等, 2014). However, when the number of attributes (or attribute levels) is large, (quasi-)reachability matrices contain numerous columns, making them impractical for short-test scenarios. A central challenge in test design is how to effectively differentiate examinees through minimal item counts in formative assessment contexts. 
In practice, cognitive diagnostic testing involves complex scenarios encompassing various combinations of attribute levels and scoring methods, yet relevant design research remains fragmented and lacks integrated methodologies, hindering both implementation and theoretical advancement.
Attribute levels and item scoring methods constitute two essential dimensions of cognitive diagnostic test design. A cognitive diagnostic test is represented by its Q-matrix, in which each row corresponds to an attribute and each column $\mathbf{q}_j$ to an item. Attributes describe knowledge or skills, and each item is represented by an attribute vector. Most existing research addresses dichotomous attributes (levels 0 or 1), where 0 indicates that an item does not assess an attribute and 1 indicates that it does, yielding a binary (Boolean) Q-matrix. As cognitive diagnostic theory and applications have evolved, dichotomous attribute items have proven limited in describing examinees' attribute mastery levels precisely. Polytomous attribute items can provide more detailed diagnostic information about examinee proficiency, which led to the development of polytomous (multi-valued) Q-matrices. For $K$ attributes, dichotomous attribute knowledge states (also represented as attribute vectors) yield $2^K$ possible states, whereas polytomous attribute knowledge states yield $\prod_{k=1}^{K} M_k$ possible states, where $M_k$ is the number of levels of attribute $k$ ($k = 1, 2, \ldots, K$) and $M_k > 2$ for at least one attribute. Polytomous attributes therefore enable finer-grained classification. Researchers have developed numerous polytomous attribute cognitive diagnostic models (CDMs), including the five models mentioned by Chen and de la Torre (2013), as well as GDD-P (Sun et al., 2013), PA-rRUM and PA-DINA (蔡艳, 涂冬波, 2015), RPa-DINA, RPa-DINO and RPa-LLM (詹沛达等, 2016), and GRPa-DINA (王立君等, 2022), among others.
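As a small arithmetic illustration of these counts, the sketch below compares the two cases for $K = 3$, borrowing the level counts 2, 3, and 4 used later in Example 1. The function name `count_knowledge_states` is hypothetical, level 0 is assumed to be counted among the $M_k$ levels, and the count ignores any attribute hierarchy, which would rule out some states.

```python
from math import prod

def count_knowledge_states(levels_per_attribute):
    """Number of attribute vectors when attribute k has M_k levels (assumed to include level 0)."""
    return prod(levels_per_attribute)

# Three dichotomous attributes: every M_k equals 2.
dichotomous = count_knowledge_states([2, 2, 2])
# Three polytomous attributes with 2, 3 and 4 levels.
polytomous = count_knowledge_states([2, 3, 4])
print(dichotomous, polytomous)
```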
Cognitive diagnostic test scoring methods can be refined from 0-1 scoring to polytomous scoring. Polytomously scored tests provide more detailed diagnostic information (van der Ark, 2001; Ma & de la Torre, 2016). Most polytomous cognitive diagnostic models extend from 0-1 scoring models (高旭亮等, 2021).
Based on different combinations of attribute levels and scoring methods, cognitive diagnostic tests can be categorized into four types: dichotomous attribute 0-1 scoring, polytomous attribute 0-1 scoring, dichotomous attribute polytomous scoring, and polytomous attribute polytomous scoring. These four test types exhibit substantial differences in classification accuracy. Although polytomous attribute 0-1 scoring tests provide richer diagnostic information, their classification accuracy is lower than that of dichotomous attribute 0-1 scoring tests (詹沛达等, 2016). Dichotomous attribute polytomous scoring tests, which can probe examinees' cognitive processes more deeply, achieve higher classification accuracy than their 0-1 scoring counterparts, particularly when attribute counts are large (吴方文, 2015). However, dichotomous attributes provide less diagnostic information than polytomous attribute tests. Polytomous attribute polytomous scoring tests can encompass various educational assessment scenarios while yielding rich and precise diagnostic results, thus offering broad application prospects (王立君等, 2022). Existing research primarily focuses on dichotomous attribute 0-1 scoring tests, constructing such tests from two perspectives: item structure (Tatsuoka, 1995; Leighton et al., 2004; 丁树良等, 2010; 2012; 2019) and test assembly indices (Cheng, 2010; Henson & Douglas, 2005; Henson et al., 2008; Kuo et al., 2016; 唐小娟等, 2013). Research on other test types is relatively scarce or even nonexistent, let alone unified methods applicable to all four cognitive diagnostic test types. Specifically, the multiple attribute levels in polytomous tests and diverse scoring methods in polytomously scored tests complicate test design. Relevant research is extremely limited, yet given the importance of test design and practical testing needs, addressing these challenges is essential.
A complete Q-matrix can identify all examinees, whereas an incomplete Q-matrix misclassifies them into incorrect categories (Chiu et al., 2009). Therefore, Q-matrix completeness is crucial in cognitive diagnostic test design (Chiu, 2013; De Carlo, 2012; Groß & George, 2014; Liu et al., 2012; 2013). Building upon complete Q-matrices, this study proposes a unified design method for the simplest complete Q-matrix applicable to various combinations of attribute levels (dichotomous and polytomous) and scoring methods (0-1 and polytomous), providing theoretical support for addressing test design challenges.
2. Complete Q-Matrix
As a high-quality test Q-matrix capable of identifying all examinees, complete Q-matrices can be applied to various test types, with (quasi-)reachability matrices being widely used (丁树良等, 2010; 2015). For dichotomous attributes, when the number of attributes is $K$, the reachability matrix (generally denoted as $\mathbf{R}$) is a $K$-order matrix. For polytomous attributes, matrices with this property are called quasi-reachability matrices (generally denoted as $\mathbf{R}_P$). Consider an example with three attributes (Figure 1 [FIGURE:1]):
Example 1: The reachability matrix for Figure 1(a) is $\mathbf{R} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. For Figure 1(b), starting from the dichotomous attribute reachability matrix $\mathbf{R}$, columns are added on the main diagonal for each attribute $k$ that has levels above 1, yielding the polytomous attribute quasi-reachability matrix. For instance, if attributes $A_1$ to $A_3$ have 2, 3, and 4 levels respectively, the quasi-reachability matrix becomes $\mathbf{R}_P = ()$.
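A reachability matrix can be computed from the direct prerequisite relations among attributes. The following is a minimal sketch, assuming the divergent structure is $A_1 \to A_2$ and $A_1 \to A_3$ (our reading of the examples in this paper) and that entry $(k, j)$ equals 1 when attribute $k$ is attribute $j$ itself or one of its prerequisites. The function name `reachability_matrix` is hypothetical, and the quasi-reachability extension is not covered.

```python
def reachability_matrix(adjacency):
    """Boolean transitive closure of (adjacency + identity), via Warshall's algorithm.

    adjacency[k][j] == 1 means attribute k is a direct prerequisite of attribute j.
    """
    n = len(adjacency)
    r = [[1 if (k == j or adjacency[k][j]) else 0 for j in range(n)] for k in range(n)]
    for mid in range(n):
        for k in range(n):
            for j in range(n):
                if r[k][mid] and r[mid][j]:
                    r[k][j] = 1
    return r

# Assumed divergent structure: A1 -> A2 and A1 -> A3.
divergent = [[0, 1, 1],
             [0, 0, 0],
             [0, 0, 0]]
# Linear structure for comparison: A1 -> A2 -> A3.
linear = [[0, 1, 0],
          [0, 0, 1],
          [0, 0, 0]]
for row in reachability_matrix(divergent):
    print(row)
```

Each column of the result lists an attribute together with all of its prerequisites, which is why every column can serve as an item that conforms to the hierarchy.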
Current research on complete Q-matrices primarily focuses on two foundational perspectives and item structure. The two main perspectives for complete Q-matrix design are: (1) Ideal Response Patterns. 丁树良等 (2012) argue that the relationship among knowledge states, ideal response patterns, and observed response patterns forms the core of cognitive diagnosis. Response patterns unaffected by item characteristics, motivation, or random factors are ideal response patterns; otherwise, they are observed response patterns. Under equivalent conditions, complete Q-matrices diagnosing ideal (or observed) response patterns achieve higher knowledge state identification accuracy than incomplete Q-matrices. Based on ideal response patterns, 丁树良, 罗芬等 (2015) proposed that if attributes are non-compensatory, a 0-1 scoring test Q-matrix containing the reachability matrix as a submatrix is complete. 丁树良, 罗芬等 (2014) and 丁树良, 汪文义等 (2014) developed construction methods and proofs for polytomous scoring test complete Q-matrices for several basic attribute hierarchy structures. 唐小娟等 (2024) further proposed a construction method for the simplest complete Q-matrix (SCQM) applicable to various attribute hierarchy structures in polytomous scoring tests. (2) Cognitive Diagnosis Models. Some scholars argue that completeness is model-dependent—a Q-matrix complete for one CDM may not be complete for another (Chiu, 2013). For example, Köhn and Chiu (2019, 2021) proposed that Q-matrices complete for the DINA model are matrices between the reachability matrix and identity matrix. For DINA, DINO, GPDINA, sequential DINA, RegLCMs, and other models, a necessary and/or sufficient condition for a cognitive diagnostic test Q-matrix to be complete is that the test includes an identity matrix (Chen et al., 2015; Culpepper, 2019; Fang et al., 2019; Gu & Xu, 2019, 2021; Lin & Xu, 2023; Jing & Xu, 2022).
The distinction and connection between these two perspectives: In fact, examining test completeness based on ideal response patterns forms the foundation for examining completeness based on CDMs, and allows evaluation of test blueprints before administration. The former only requires one-to-one correspondence between knowledge states and ideal response patterns without considering random error, while the latter incorporates random error (requiring correspondence between knowledge states and observed response patterns). When item scoring methods and maximum item scores are identical and no random error exists, both approaches yield identical results. When random error is small—for instance, when slip ($s$) and guess ($g$) parameters follow a $U(0, t)$ distribution ($0 < t < 0.1$)—the resulting knowledge state estimates differ minimally. This naturally requires meticulously crafted test items, high motivation, and ideal testing conditions.
Based on whether all items in the Q-matrix fully conform to attribute hierarchy structures, complete Q-matrices are classified as structured complete Q-matrices or unstructured complete Q-matrices. Existing research is scattered across different combinations of attribute levels and scoring methods, employing disparate design approaches for different test types without an integrated framework. First, for dichotomous attribute 0-1 scoring tests, the reachability matrix serves as the structured complete Q-matrix, while the identity matrix is the unstructured complete Q-matrix (except for independent structures), with matrices between them remaining unstructured complete Q-matrices (Köhn & Chiu, 2021). Furthermore, 丁树良等 (2022) demonstrated that test Q-matrices containing the unstructured complete Q-matrix constructed by Köhn & Chiu (2021) as a submatrix retain completeness. Second, for dichotomous attribute polytomous scoring tests, 丁树良, 罗芬等 (2014) and 丁树良, 汪文义等 (2014) proposed structured complete Q-matrices for several basic attribute hierarchy structures. For various attribute hierarchy structures, 唐小娟等 (2024) extracted structured simplest complete Q-matrices (SSCQM) from reachability matrices and extended Köhn and Chiu's (2021) method for constructing unstructured complete Q-matrices to obtain polytomous scoring unstructured simplest complete Q-matrices (USCQM). Third, for polytomous attribute 0-1 scoring tests, the structured complete Q-matrix is the quasi-reachability matrix (蔡艳, 涂冬波, 2015; 丁树良, 汪文义等, 2015), though research on other structured and unstructured complete Q-matrices remains scarce. Fourth, for polytomous attribute polytomous scoring tests, the structured complete Q-matrix is the quasi-reachability matrix or matrices formed by merging its columns (Sun et al., 2013). 
While detailed research on unstructured complete Q-matrices is lacking, some scholars mention that unstructured complete Q-matrices can consist of items satisfying T-matrix conditions (Culpepper, 2019; Fang et al., 2019). The T-matrix, an alternative Q-matrix representation, describes the relationship between observed response distributions and model structure, particularly constructing linear dependencies between attribute distributions and observed response distributions (see Liu et al., 2012).
In summary, Q-matrix completeness design methods in cognitive diagnostic testing are ad hoc and fragmented, lacking a unified perspective. Notably, the simplest complete Q-matrix (唐小娟等, 2024) is the complete Q-matrix with the fewest items drawn from the reachability matrix. It combines a minimal item count with strong classification power, which is particularly advantageous in short tests. This has significant practical implications under China's "Double Reduction" policy and warrants investigation. In fact, the reachability matrix is the simplest complete Q-matrix for dichotomous attribute 0-1 scoring tests, while the quasi-reachability matrix serves this role for polytomous attribute 0-1 scoring tests. However, only one such simplest complete Q-matrix exists for each of these test types, which compromises test security. For more complex scenarios, such as the dichotomous attribute polytomous scoring tests addressed by 唐小娟等 (2024), the proposed simplest complete Q-matrix design methods (SSCQM and USCQM) generate non-unique matrices, enhancing test security. For the most complex scenario, polytomous attribute polytomous scoring tests, corresponding simplest complete Q-matrix research remains absent. The other three test types can all be viewed as special cases of polytomous attribute polytomous scoring tests. Therefore, from the perspectives of both classification capability and test security, research on simplest complete Q-matrix designs for polytomous attribute polytomous scoring tests has general theoretical significance. Based on the perspective of item attribute total scores, this study proposes a unified design method for the simplest complete Q-matrix applicable to various cognitive diagnostic test types.
Grounded in the two foundational perspectives of complete Q-matrix design and item structure, this research comprises the following components: First, based on ideal response patterns, we propose unified methods for designing structured and unstructured simplest complete Q-matrices. Second, we examine whether completeness is maintained when integrating structured and unstructured simplest complete Q-matrices with certain cognitive diagnosis models. Finally, we validate the classification capabilities of simplest complete Q-matrices through simulation and empirical studies.
3.1 Elements of Cognitive Diagnostic Test Design
Cognitive diagnostic test design involves elements including attributes, attribute hierarchy structures, item scoring methods, and maximum item scores. This study examines five basic attribute hierarchy structures: linear, convergent, divergent, unstructured, and independent structures. Other hierarchy structures can be formed by combining these basic types. Regarding attributes, we primarily investigate the impact of attribute level count, examining both dichotomous and polytomous attributes. Item scoring methods consider both 0-1 scoring and polytomous scoring types. The proposed item scoring method satisfies the attribute-score correspondence assumption (詹沛达等, 2016) with equal attribute weights (referring to the relative importance of each attribute in overall evaluation).
(1) Item Scoring Method $\eta_{ij}$
The item scoring method is defined as:
$$\eta_{ij} = \begin{cases}
\prod_{k=1}^{K} I\{\alpha_{ik} \geq q_{kj}\}, & \text{when } \gamma = 1 \\
\sum_{k=1}^{K} \beta_k\, I\{\alpha_{ik} \geq q_{kj}\}\, q_{kj}, & \text{when } \gamma = 0
\end{cases}$$
where examinee $i$ has attribute pattern $\alpha_i = (\alpha_{i1}, \alpha_{i2}, \cdots, \alpha_{iK})$, item $j$ has attribute vector $q_j = (q_{1j}, q_{2j}, \cdots, q_{Kj})$, $I\{\cdot\}$ is the indicator function, and $\beta_k$ is the weight of attribute $k$.
When $\gamma = 1$, the scoring method is 0-1 scoring: if $\forall k \leq K, \alpha_{ik} \geq q_{kj}$, the score is 1; otherwise, 0. Each item's maximum score is $m_j = 1$. When $\gamma = 0$, the scoring method is polytomous: if $\alpha_{ik} \geq q_{kj}$, examinee $\alpha_i$ gains additional $\beta_k q_{kj}$ points on item $j$ (applicable to both dichotomous and polytomous attributes). Each item's maximum score is:
$$m_j = \sum_{k=1}^{K} \beta_k q_{kj}$$
Specifically, when attribute weight $\beta_k = 1$, the examinee's ideal score increases by $q_{kj}$ for each additional mastered attribute $k$ in item $j$. Each item's maximum score is then $m_j = \sum_{k=1}^{K} q_{kj}$.
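The scoring rule can be sketched in a few lines. This is a minimal illustration that assumes the reading given in the text (a product of indicators for $\gamma = 1$, a weighted sum for $\gamma = 0$); the function names `ideal_score` and `max_score` are hypothetical, and unit weights are used when `beta` is omitted.

```python
def ideal_score(alpha, q, gamma, beta=None):
    """Ideal score of an examinee with attribute pattern alpha on an item with vector q.

    gamma == 1: 0-1 scoring, score 1 only if every required level is reached.
    gamma == 0: polytomous scoring, beta_k * q_k points for each attribute k reached.
    """
    if beta is None:
        beta = [1] * len(q)
    reached = [a >= qk for a, qk in zip(alpha, q)]
    if gamma == 1:
        return int(all(reached))
    if gamma == 0:
        return sum(b * qk for b, qk, ok in zip(beta, q, reached) if ok)
    raise ValueError("gamma must be 0 or 1")

def max_score(q, gamma, beta=None):
    """Maximum item score: 1 under 0-1 scoring, sum of beta_k * q_k otherwise."""
    if beta is None:
        beta = [1] * len(q)
    return 1 if gamma == 1 else sum(b * qk for b, qk in zip(beta, q))

# An examinee who has mastered only the first attribute, on an item requiring the first two.
print(ideal_score((1, 0, 0), (1, 1, 0), gamma=1))
print(ideal_score((1, 0, 0), (1, 1, 0), gamma=0))
```

Under 0-1 scoring the partial mastery earns nothing, whereas polytomous scoring credits the one attribute that was reached; this difference is what later allows fewer items to separate the same knowledge states.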
(2) Item Attribute Total Score (IATS)
The unified design method proposed in this study is primarily based on the concept of Item Attribute Total Score (IATS), defined as:
$$IATS_j = \frac{m_j}{\sum_{k=1}^{K} \beta_k q_{kj}}$$
For 0-1 scoring tests with attribute weight $\beta_k = 1$, $IATS_j$ equals 1 divided by the number of attributes assessed by the item. For polytomous scoring tests using the proposed scoring method and maximum scores, the item attribute total score is always $IATS_j = 1$. In practical testing, the average examinee score on an item's attributes divided by the item attribute total score reflects the difficulty of that item's attributes to some extent.
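The two cases described above can be checked numerically. The sketch below assumes that $IATS_j$ is the item's maximum score $m_j$ divided by $\sum_k \beta_k q_{kj}$, which is how the worked example in Section 3.2.2 computes it; the function name `iats` is hypothetical, and weights are assumed to be integers so that exact fractions can be used.

```python
from fractions import Fraction

def iats(q, gamma, beta=None):
    """Item attribute total score: maximum item score divided by sum_k beta_k * q_k.

    gamma == 1 means 0-1 scoring (maximum score 1);
    gamma == 0 means polytomous scoring (maximum score equals the weighted sum).
    Weights are assumed to be integers.
    """
    if beta is None:
        beta = [1] * len(q)
    weighted_sum = sum(b * qk for b, qk in zip(beta, q))
    if weighted_sum == 0:
        raise ValueError("the item must assess at least one attribute")
    maximum = 1 if gamma == 1 else weighted_sum
    return Fraction(maximum, weighted_sum)

print(iats((1, 0, 0), gamma=1))  # item assessing one attribute, 0-1 scoring
print(iats((1, 1, 0), gamma=1))  # item assessing two attributes, 0-1 scoring
print(iats((1, 1, 0), gamma=0))  # same item, polytomous scoring
```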
(3) Vector-Related Definitions
When selecting columns (vectors) from (quasi-)reachability matrices, comparisons among columns require the following concepts:
Definition 1: For two $K$-dimensional vectors $\mathbf{X} = (x_1, x_2, \cdots, x_K)^T$ and $\mathbf{Y} = (y_1, y_2, \cdots, y_K)^T$, if $x_k \leq y_k$ ($\forall k \leq K$), then $\mathbf{X} \leq \mathbf{Y}$; $\mathbf{X}$ and $\mathbf{Y}$ are said to be comparable, with $\mathbf{Y}$ greater than or equal to $\mathbf{X}$. $\mathbf{X} = \mathbf{Y}$ if and only if $x_k = y_k$ ($\forall k \leq K$). Here "$\leq$" is a partial order relation. If neither $\mathbf{X} \leq \mathbf{Y}$ nor $\mathbf{Y} \leq \mathbf{X}$ holds, $\mathbf{X}$ and $\mathbf{Y}$ are incomparable.
Definition 2: Within a group of mutually comparable, distinct vectors, if there exists a vector $\mathbf{Y}$ in the group such that $\mathbf{X} \leq \mathbf{Y}$ for every vector $\mathbf{X}$ in the group, then $\mathbf{Y}$ is called the maximum column of the group.
Example 2: Given matrix $\mathbf{R} = (q_1, q_2, q_3) = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, since $q_{k1} \leq q_{k2}$ ($\forall k \leq 3$), $q_1$ and $q_2$ are comparable, with $q_2$ greater than $q_1$. Similarly, $q_1 \leq q_3$, with $q_3$ greater than $q_1$. Since $q_{22} > q_{23}$ but $q_{32} < q_{33}$, $q_2$ and $q_3$ are incomparable. If we partition $\mathbf{R}$ into groups based on comparability among $q_j$ ($j = 1, 2, 3$), we obtain two groups in which the vectors within each group are comparable and the vectors across groups contain at least one incomparable pair. One such partition is $(q_1, q_2)$ and $q_3$, where $q_2$ is the maximum column of $(q_1, q_2)$. Alternatively, $\mathbf{R}$ can be partitioned into $(q_1, q_3)$ and $q_2$, where the maximum column of $(q_1, q_3)$ is $q_3$.
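Definitions 1 and 2 translate directly into code. The sketch below assumes the three columns implied by Example 2 for the divergent structure, $q_1 = (1,0,0)^T$, $q_2 = (1,1,0)^T$, and $q_3 = (1,0,1)^T$; the helper names `leq`, `comparable`, and `maximum_column` are hypothetical.

```python
def leq(x, y):
    """Definition 1: x <= y when every component of x is at most that of y."""
    return all(a <= b for a, b in zip(x, y))

def comparable(x, y):
    """Two vectors are comparable when one is <= the other."""
    return leq(x, y) or leq(y, x)

def maximum_column(group):
    """Definition 2: the column that is >= every column of the group, or None if absent."""
    for candidate in group:
        if all(leq(other, candidate) for other in group):
            return candidate
    return None

q1, q2, q3 = (1, 0, 0), (1, 1, 0), (1, 0, 1)
print(comparable(q1, q2), comparable(q1, q3), comparable(q2, q3))
print(maximum_column([q1, q2]), maximum_column([q1, q3]))
```

A group that contains an incomparable pair, such as $(q_2, q_3)$, has no maximum column, which is why the partition step keeps such columns in separate groups.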
Below, we present unified design methods for structured simplest complete Q-matrices (SSCQM) and unstructured simplest complete Q-matrices (USCQM) for various types of cognitive diagnostic tests.
3.2 Unified Design Method for Structured Simplest Complete Q-Matrix (SSCQM)
Unlike previous SSCQM research design methods (唐小娟等, 2024), this study proposes a unified SSCQM design method based on item attribute total scores.
3.2.1 SSCQM Unified Design Procedure
The unified design method for cognitive diagnostic test SSCQM follows these specific steps:
Step 1: Partition comparable column vectors in the dichotomous attribute reachability matrix into one or multiple groups based on partial order relations.
Step 2: Retain the maximum column from each group to facilitate Q-matrix minimization while maintaining completeness.
Step 3: Within each group, retain columns with item attribute total scores different from the maximum column, then combine these columns with the retained maximum columns from Step 2 to form the dichotomous attribute SSCQM.
Step 4: If the test involves polytomous attributes, add all columns where attribute levels exceed 1 to the dichotomous attribute SSCQM.
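Steps 1 to 3 can be sketched as follows for dichotomous attributes with unit weights. This is an illustration under stated assumptions rather than the authors' implementation: the greedy grouping in `partition_into_groups` is only one of several valid partitions and depends on column order, the item attribute total score is taken as the maximum item score divided by $\sum_k q_{kj}$, all function names are hypothetical, and Step 4 is omitted.

```python
from fractions import Fraction

def leq(x, y):
    """Componentwise partial order on attribute vectors."""
    return all(a <= b for a, b in zip(x, y))

def iats(q, gamma):
    """Item attribute total score with unit weights (gamma=1: 0-1, gamma=0: polytomous)."""
    total = sum(q)
    return Fraction(1 if gamma == 1 else total, total)

def partition_into_groups(columns):
    """Step 1: place each column in the first group whose members are all comparable with it."""
    groups = []
    for col in columns:
        for group in groups:
            if all(leq(col, m) or leq(m, col) for m in group):
                group.append(col)
                break
        else:
            groups.append([col])
    return groups

def sscqm_dichotomous(columns, gamma):
    """Steps 2 and 3: keep each group's maximum column plus columns whose IATS differs from it."""
    kept = set()
    for group in partition_into_groups(columns):
        # In a group of mutually comparable columns the maximum column has the largest sum.
        top = max(group, key=sum)
        kept.add(top)
        kept.update(c for c in group if iats(c, gamma) != iats(top, gamma))
    return [c for c in columns if c in kept]  # preserve the original column order

# Columns of the divergent-structure reachability matrix assumed in Example 2.
R = [(1, 0, 0), (1, 1, 0), (1, 0, 1)]
print(sscqm_dichotomous(R, gamma=1))
print(sscqm_dichotomous(R, gamma=0))
```

Reordering the input columns, for example `[(1, 0, 0), (1, 0, 1), (1, 1, 0)]`, produces the alternative partition, which illustrates why the SSCQM need not be unique in larger structures.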
3.2.2 SSCQM Design Example
SSCQM design applies to various attribute hierarchy structures. We illustrate the unified SSCQM design method using a three-attribute divergent structure (Figure 1) with attribute weights $\beta_k$ set to 1. For polytomous attributes, $A_1$ to $A_3$ have 2, 3, and 4 levels respectively.
Example 3 (continuing Examples 1 and 2): The divergent structure's reachability matrix is $\mathbf{R} = (q_1, q_2, q_3)$. In the first partition, $(q_1, q_2)$ and $q_3$, $q_2$ is the maximum column of $(q_1, q_2)$. Since $q_3$ contains only one column, it is designated as that group's maximum column. Similarly, in the alternative partition $(q_1, q_3)$ and $q_2$, the maximum columns are $q_3$ and $q_2$ respectively.
(1) 0-1 Scoring Tests
① Dichotomous Attribute 0-1 Scoring Test
For dichotomous attribute 0-1 scoring tests, $\gamma = 1$ and the item maximum score is $m_j = 1$. Each attribute level is $q_{kj} = 0$ or 1. For the first partition, $(q_1, q_2)$ and $q_3$, we first retain the maximum columns $q_2$ and $q_3$. Item $q_1$ and the maximum column $q_2$ assess 1 and 2 attributes respectively. The maximum score for $q_1$ is $m_1 = 1$, and it assesses 1 attribute ($q_{11} = 1, q_{21} = q_{31} = 0$) with $\beta_1 q_{11} = 1$, so $q_1$'s item attribute total score is $m_1 / (\beta_1 q_{11}) = 1$. For $q_2$, the maximum score is $m_2 = 1$, and it assesses 2 attributes ($q_{12} = q_{22} = 1, q_{32} = 0$) with $\beta_1 q_{12} + \beta_2 q_{22} = 2$, so $q_2$'s item attribute total score is $m_2 / 2 = 1/2$. Since $q_1$ and the maximum column $q_2$ have different item attribute total scores, we retain $q_1$ and combine it with the maximum columns $q_2$ and $q_3$ retained in Step 2. Thus, the SSCQM for dichotomous attribute 0-1 scoring tests is the reachability matrix $\mathbf{R} = (q_1, q_2, q_3)$. Similarly, for the second partition, $(q_1, q_3)$ and $q_2$, the maximum columns are $q_3$ and $q_2$. The item attribute total scores for $q_1$ and $q_3$ are 1 and $1/2$ respectively, because $q_3$ assesses two attributes ($q_{13} = q_{33} = 1$). Since $q_1$'s item attribute total score differs from that of the maximum column $q_3$, we retain $q_1$ and combine it with the maximum columns $q_3$ and $q_2$ from Step 2, yielding the same SSCQM: the reachability matrix.
② Polytomous Attribute 0-1 Scoring Test
For polytomous attribute item $j$, attribute $k$ has level $q_{kj}$. Building upon the SSCQM obtained for dichotomous attribute 0-1 scoring tests, we add columns on the main diagonal for each attribute $k$ that has levels above 1. Since attribute $A_2$ has an additional level 2 and $A_3$ has additional levels 2 and 3, we add the corresponding column vectors to obtain the polytomous attribute 0-1 scoring SSCQM, which is the quasi-reachability matrix $\mathbf{R}_P$ in Figure 1(b).
(2) Polytomous Scoring Tests
① Dichotomous Attribute Polytomous Scoring Test
For polytomous scoring tests, $\gamma = 0$ and the item maximum score is $m_j = \sum_{k=1}^{K} \beta_k q_{kj}$. For the first partition, $(q_1, q_2)$ and $q_3$, of reachability matrix $\mathbf{R}$, we first retain the maximum columns $q_2$ and $q_3$. Under the proposed item scoring method and maximum scores, all item attribute total scores equal 1. Since $q_1$ and the maximum column $q_2$ have identical item attribute total scores, we do not retain $q_1$. Combining the maximum columns $q_2$ and $q_3$ from Step 2 yields the dichotomous attribute polytomous scoring SSCQM (denoted $\mathbf{Q}_{RS}$) as $\mathbf{Q}_{RS} = (q_2, q_3) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$. The alternative partition follows similarly.
② Polytomous Attribute Polytomous Scoring Test
Building upon the dichotomous attribute polytomous scoring SSCQM, we add all columns where attribute levels exceed 1 (as in (1)②) to obtain the polytomous attribute polytomous scoring SSCQM (denoted $\mathbf{Q}_{RPS}$) as $\mathbf{Q}_{RPS} = ()$.
If the item scoring method or maximum scores change, the columns retained in Step 3 vary across groups, altering the columns that make up the SSCQM. Adding the columns where attribute levels exceed 1 to a dichotomous attribute SSCQM yields the corresponding polytomous attribute SSCQM. The design method based on item attribute total scores therefore covers the various cognitive diagnostic test types across all attribute hierarchy structures, including dichotomous attribute 0-1 scoring, dichotomous attribute polytomous scoring, polytomous attribute 0-1 scoring, and polytomous attribute polytomous scoring tests, demonstrating broad applicability. Theoretical proofs for SSCQM are provided in the appendix. Since multiple groupings are possible, each cognitive diagnostic test may have multiple SSCQMs (particularly when attribute counts and level numbers are large). Selecting different SSCQMs in practical applications can therefore reduce item exposure rates and enhance test security.
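Completeness in the ideal-response sense can also be checked by brute force on small examples: a Q-matrix is complete when distinct permissible knowledge states produce distinct ideal response patterns. The sketch below is a numerical check under stated assumptions, not a substitute for the proofs in the appendix. It assumes dichotomous attributes, unit weights, and the divergent-structure columns $q_1 = (1,0,0)^T$, $q_2 = (1,1,0)^T$, $q_3 = (1,0,1)^T$; all function names are hypothetical.

```python
from itertools import product

def ideal_score(alpha, q, gamma):
    """Ideal score with unit weights (gamma=1: 0-1 scoring, gamma=0: polytomous scoring)."""
    reached = [a >= qk for a, qk in zip(alpha, q)]
    if gamma == 1:
        return int(all(reached))
    return sum(qk for ok, qk in zip(reached, q) if ok)

def permissible_states(r_columns):
    """Knowledge states in which every mastered attribute has all its prerequisites mastered."""
    k_total = len(r_columns)
    states = []
    for alpha in product((0, 1), repeat=k_total):
        consistent = all(
            all(alpha[k] >= r_columns[j][k] for k in range(k_total))
            for j in range(k_total) if alpha[j] == 1
        )
        if consistent:
            states.append(alpha)
    return states

def is_complete(q_columns, states, gamma):
    """True when the ideal response patterns of all states are pairwise distinct."""
    patterns = [tuple(ideal_score(a, q, gamma) for q in q_columns) for a in states]
    return len(set(patterns)) == len(patterns)

R = [(1, 0, 0), (1, 1, 0), (1, 0, 1)]   # assumed divergent-structure reachability columns
Q_RS = [(1, 1, 0), (1, 0, 1)]           # the two maximum columns q2 and q3
states = permissible_states(R)
print(len(states))
print(is_complete(Q_RS, states, gamma=0), is_complete(Q_RS, states, gamma=1))
```

In this small case the two-item matrix separates all permissible states only under polytomous scoring, while 0-1 scoring needs the full reachability matrix, in line with the worked example.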
3.3 Unified Design Method for Unstructured Simplest Complete Q-Matrix (USCQM)
Köhn and Chiu (2021) demonstrated the construction method for unstructured complete Q-matrices in dichotomous attribute 0-1 scoring tests, where unstructured complete Q-matrices lie between the reachability matrix (upper bound) and the identity matrix (lower bound). 唐小娟等 (2024) extended this method to construct unstructured simplest complete Q-matrices (USCQM) for dichotomous attribute polytomous scoring tests. The method uses the structured simplest complete Q-matrix (SSCQM) as the USCQM's upper bound. If the SSCQM's columns correspond to column indices $(j_1, j_2, \cdots, j_t)$ of reachability matrix $\mathbf{R}$, we select the columns with the same indices from identity matrix $\mathbf{E}$ to form the USCQM's lower bound $\mathbf{E}'$, while ensuring that each row of the USCQM contains at least one 1. Since rows represent attributes, this ensures that all attributes are assessed. Using the divergent structure in Figure 1 as an example, the dichotomous attribute polytomous scoring SSCQM from Example 3 is $\mathbf{Q}_{RS} = (q_2, q_3)$, corresponding to column indices $(2, 3)$ of reachability matrix $\mathbf{R}$. Selecting the same columns from the same-order identity matrix yields $\mathbf{E}' = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$. The dichotomous attribute polytomous scoring USCQM (denoted $\mathbf{Q}_{RU1}$) satisfies $\mathbf{E}' < \mathbf{Q}_{RU1} < \mathbf{Q}_{RS}$, with each row containing at least one 1, such as $\mathbf{Q}_{RU1} = ()$.
Figure 2 [FIGURE:2] illustrates the unified design method for cognitive diagnostic test USCQM: From Section 3.2, we obtain USCQM's upper bound SSCQM (i.e., (quasi-)reachability matrix or its complete submatrix) with columns corresponding to (quasi-)reachability matrix column indices $(j_1, j_2, \cdots, j_t)$. We select columns with the same indices from the same-order (quasi-)identity matrix $\mathbf{E}$ ($\mathbf{E}_P$) to form the lower bound $\mathbf{E}'$ ($\mathbf{E}_P'$). USCQM lies between $\mathbf{E}'$ ($\mathbf{E}_P'$) and SSCQM, with each USCQM row containing at least one 1. Using the divergent structure in Figure 1(b) as an example:
(1) 0-1 Scoring Tests
① Dichotomous Attribute 0-1 Scoring Test
From Section 3.2(1)①, the upper bound SSCQM is the reachability matrix $\mathbf{R}$, and the lower bound is the same-order identity matrix $\mathbf{E}$. Thus, the USCQM (denoted $\mathbf{Q}_{RU2}$) satisfies $\mathbf{E} < \mathbf{Q}_{RU2} < \mathbf{R}$, such as $\mathbf{Q}_{RU2} = ()$, which is precisely Köhn and Chiu's (2021) method.
② Polytomous Attribute 0-1 Scoring Test
From Section 3.2(1)②, the upper bound SSCQM is the quasi-reachability matrix $\mathbf{R}_P$, and the lower bound is the same-order quasi-identity matrix $\mathbf{E}_P = ()$. Thus, the USCQM $\mathbf{Q}$ satisfies $\mathbf{E}_P < \mathbf{Q} < \mathbf{R}_P$, such as $\mathbf{Q} = ()$.
(2) Polytomous Scoring Tests
① Dichotomous Attribute Polytomous Scoring Test: As described in the first paragraph of Section 3.3.
② Polytomous Attribute Polytomous Scoring Test
From Section 3.2(2)②, the upper bound SSCQM is $\mathbf{Q}_{RPS}$, corresponding to quasi-reachability matrix column indices $(2,3,4,5,6)$. Selecting the same columns from the quasi-identity matrix yields $\mathbf{E}_P' = ()$. Thus, the USCQM $\mathbf{Q}$ satisfies $\mathbf{E}_P' < \mathbf{Q} < \mathbf{Q}_{RPS}$, with each row containing at least one 1, such as $\mathbf{Q} = ()$.
Notably, for independent structures, SSCQM is the (quasi-)identity matrix, where upper and lower bounds coincide, so no USCQM exists. Both upper and lower bounds of 0-1 scoring USCQM are complete, so the resulting USCQM is complete. However, the lower bound of polytomous scoring USCQM is a submatrix of the identity or quasi-identity matrix and lacks completeness, so the resulting unstructured Q-matrix may not be complete and requires completeness verification. The numerous USCQMs generated by this method enrich test diversity.
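For small dichotomous cases, the candidates between the two bounds can be enumerated directly. The sketch below is a minimal illustration, assuming dichotomous attributes, strict exclusion of both bounds, and the divergent-structure columns $q_2 = (1,1,0)^T$ and $q_3 = (1,0,1)^T$ as the upper bound; the function name `uscqm_candidates` is hypothetical. As noted above, candidates for polytomous scoring tests still require a separate completeness check.

```python
from itertools import product

def uscqm_candidates(lower_cols, upper_cols):
    """Enumerate 0/1 matrices strictly between the bounds in which every attribute is assessed.

    Matrices are given as lists of columns (one tuple per item).
    """
    lower = [tuple(c) for c in lower_cols]
    upper = [tuple(c) for c in upper_cols]
    n_attr = len(upper[0])
    # Entries that are 0 in the lower bound and 1 in the upper bound may take either value.
    free = [(j, k) for j in range(len(upper)) for k in range(n_attr)
            if lower[j][k] < upper[j][k]]
    candidates = []
    for bits in product((0, 1), repeat=len(free)):
        cols = [list(c) for c in lower]
        for (j, k), bit in zip(free, bits):
            if bit:
                cols[j][k] = 1
        cand = [tuple(c) for c in cols]
        if cand == lower or cand == upper:
            continue  # keep only matrices strictly between the bounds
        if all(any(c[k] == 1 for c in cand) for k in range(n_attr)):
            candidates.append(cand)
    return candidates

Q_RS = [(1, 1, 0), (1, 0, 1)]       # upper bound: maximum columns q2 and q3
E_prime = [(0, 1, 0), (0, 0, 1)]    # lower bound: columns 2 and 3 of the identity matrix
for cand in uscqm_candidates(E_prime, Q_RS):
    print(cand)
```

The number of free entries grows with the number of attributes and items, so the enumeration is exponential and is meant only for inspecting small designs.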
4. Integration of Simplest Complete Q-Matrices with Cognitive Diagnosis Models
The preceding sections designed tests along the two foundational dimensions of cognitive diagnostic test design. To achieve high classification accuracy in practical applications, we further integrate SSCQM and USCQM with cognitive diagnosis models and examine their classification capabilities. For dichotomous attribute 0-1 scoring tests, Köhn and Chiu (2021) used the DINA model for completeness verification. For dichotomous attribute polytomous scoring tests, 唐小娟 (2024) used a modified RP-DINA model (蔡艳等, 2017) for completeness verification, which we do not repeat here. Below, we apply the SSCQM and USCQM for polytomous attribute 0-1 scoring and polytomous attribute polytomous scoring tests from Section 3 to their respective cognitive diagnosis models.
4.1 Polytomous Attribute 0-1 Scoring Cognitive Diagnosis Model
Chen and de la Torre (2013) extended the DINA model to the Pa-DINA model, also known as the RPa-DINA model (詹沛达等, 2016), with the probability model:
$$P_{ij} = P(Y_{ij} = 1|\alpha_i) = (1 - s_j - g_j)\eta_{ij} + g_j \tag{4}$$
$$\eta_{ij} = \prod_{k=1}^{K} \omega_{ijk}^{q_{kj}^*} \tag{5}$$
$$\omega_{ijk} = I(\alpha_{ik} \geq q_{kj}) \tag{6}$$
$$q_{kj}^* = I(q_{kj} > 0) \tag{7}$$
where $P_{ij}$ represents the probability of examinee $i$ correctly answering item $j$; $s_j$ is the slip parameter; $g_j$ is the guess parameter; $K$ is the total number of attributes; $q_{kj}$ is an element of the Q-matrix; $\alpha_{ik}$ is examinee $i$'s mastery of attribute $k$; $\eta_{ij}$ is examinee $i$'s ideal response to item $j$; $q_{kj}^*$ is an element of the collapsed Q-matrix (where elements greater than zero in the Q-matrix are collapsed to 1, indicating whether an item assesses the attribute); and $\omega_{ijk}$ can be viewed as examinee $i$'s latent response to attribute $k$ in item $j$ (Maris, 1995, 1999; Whitely, 1980; Embretson, 1984).
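As an illustration of how the item response function above could be evaluated, the sketch below computes the correct-response probabilities of one examinee on all items. The function name `rpa_dina_prob`, the array layout (attributes in rows, items in columns), and the numerical values are assumptions made for this example only.

```python
import numpy as np


def rpa_dina_prob(alpha, q, s, g):
    """Correct-response probabilities of one examinee on all J items.

    alpha : (K,) attribute levels of the examinee
    q     : (K, J) polytomous Q-matrix; q[k, j] is the level of
            attribute k required by item j (0 = not assessed)
    s, g  : (J,) slip and guess parameters
    """
    alpha = np.asarray(alpha)
    q = np.asarray(q)
    s = np.asarray(s, dtype=float)
    g = np.asarray(g, dtype=float)
    omega = alpha[:, None] >= q        # latent response to each attribute
    q_star = q > 0                     # collapsed Q-matrix
    # An attribute that the item does not assess never blocks the ideal response.
    eta = np.all(omega | ~q_star, axis=0).astype(float)
    return (1 - s - g) * eta + g


# Hypothetical two-attribute, two-item example.
q = np.array([[1, 2],
              [0, 2]])
p = rpa_dina_prob(alpha=[2, 1], q=q, s=[0.1, 0.2], g=[0.2, 0.1])
print(p)  # item 1 is answered at about 1 - s = 0.9, item 2 at g = 0.1
```

The examinee reaches the required level on item 1 but not on item 2, so the first probability is governed by the slip parameter and the second by the guess parameter.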
4.2 Polytomous Attribute Polytomous Scoring Cognitive Diagnosis Model
Current research on polytomous attribute polytomous scoring cognitive diagnosis models is limited, primarily including GDD-P (Sun et al., 2013) and GRPa-DINA (王立君等, 2022). Since the DINA model is widely used and has been extended to polytomous attribute and polytomous scoring models in numerous studies, this research focuses on the extended polytomous attribute polytomous scoring DINA model, namely GRPa-DINA.
4.2.1 RP-DINA Model
The P-DINA model, extended from the DINA model, is a polytomous scoring model (涂冬波等, 2010; Chen & de la Torre, 2018) with the probability model:
$$P(Y_{ij} = t|\alpha_i) = P^*(Y_{ij} \geq t|\alpha_i) - P^*(Y_{ij} \geq t + 1|\alpha_i) \tag{8}$$
$$P^*(Y_{ij} \geq t|\alpha_i) = (1 - s_{jt})^{\eta_{ij}} g_{jt}^{1 - \eta_{ij}} \tag{9}$$
$$\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{q_{kj}} \tag{10}$$
where $P(Y_{ij} = t|\alpha_i)$ is the probability that an examinee with mastery pattern $\alpha_i$ receives exactly $t$ points on item $j$, and $P^*(Y_{ij} \geq t|\alpha_i)$ is the probability of receiving $t$ or more points.
Since P-DINA's ideal scoring only includes 0 and 1, affecting classification precision, 蔡艳等 (2017) modified P-DINA's item scoring method, proposing the RP-DINA model:
$$P^*(Y_{ij} \geq t|\alpha_i) = (1 - s_{jt})^{\delta_{ijt}} g_{jt}^{1 - \delta_{ijt}} \tag{11}$$
$$\delta_{ijt} = \begin{cases}
1, & \text{if } \eta_{ij} \geq t \\
0, & \text{if } \eta_{ij} < t
\end{cases} \tag{12}$$
where $\eta_{ij} = f_{ix}[' \times m_j]$, $f_{ix}$ is an integer function, and $m_j$ is item $j$'s maximum score.
唐小娟等 (2024) noted that this scoring method might alter Q-matrix completeness and therefore replaced $\eta_{ij}$ with the ideal score, i.e., formula (1).
4.2.2 GRPa-DINA Model
王立君等 (2022) extended the RPa-DINA model to polytomous scoring by adapting P-DINA's construction method based on cumulative category response functions. The proposed GRPa-DINA model is:
$$P(Y_{ij} = t|\alpha_i) = P^*(Y_{ij} \geq t|\alpha_i) - P^*(Y_{ij} \geq t + 1|\alpha_i) \tag{13}$$
$$P^*(Y_{ij} \geq t|\alpha_i) = (1 - s_{jt} - g_{jt})\eta_{ij} + g_{jt} \tag{14}$$
where $\eta_{ij}$, $\omega_{ijk}$, and $q_{kj}^*$ are the same as in equations (5), (6), and (7).
Since $\eta_{ij}$ in equation (14) may prevent the reachability matrix from being a complete Q-matrix (唐小娟, 2024), this study modifies $\eta_{ij}$ in the GRPa-DINA model to $\delta_{ijt}$ from equation (12).
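To make the modified model concrete, the sketch below turns cumulative probabilities into category probabilities for a single item. It is a minimal illustration, not the authors' implementation: the ideal score `eta` is passed in directly (its definition is given earlier in the paper), the name `category_probs` and the parameter values are hypothetical, and the cumulative probabilities are assumed to be non-increasing in $t$ so that every category probability is non-negative.

```python
import numpy as np


def category_probs(eta, s, g):
    """Return P(Y = t) for t = 0..m on one item.

    eta  : ideal score of the examinee on the item (integer in 0..m)
    s, g : length-m sequences of step-wise slip and guess parameters,
           where entry t-1 belongs to score category t
    """
    s = np.asarray(s, dtype=float)
    g = np.asarray(g, dtype=float)
    m = len(s)
    t = np.arange(1, m + 1)
    delta = (eta >= t).astype(float)            # 1 if the ideal score reaches t
    cum = (1 - s - g) * delta + g               # P*(Y >= t) for t = 1..m
    cum = np.concatenate(([1.0], cum, [0.0]))   # P*(Y >= 0) = 1, P*(Y >= m+1) = 0
    return cum[:-1] - cum[1:]


# Hypothetical item with maximum score 2 and an examinee whose ideal score is 1.
probs = category_probs(eta=1, s=[0.1, 0.1], g=[0.2, 0.1])
print(probs)  # approximately [0.1, 0.8, 0.1]
```

Most of the probability mass falls on the ideal score, with slipping below it and guessing above it each accounting for about 0.1 in this example.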
4.3 Integration of SSCQM and USCQM with Cognitive Diagnosis Models
Köhn and Chiu (2021) proposed that a Q-matrix is complete if $S(\alpha) = S(\alpha') \Rightarrow \alpha = \alpha'$, where $S(\alpha) = E(Y|\alpha)$ represents the expected observed response pattern $Y = (Y_1, Y_2, \cdots, Y_J)$ across all items for an examinee with knowledge state $\alpha$ ($\alpha \in KS$), and $S_j(\alpha) = E(Y_j|\alpha) = \sum_{t=0}^{m_j} t P(Y_j = t|\alpha)$ ($j \in \{1,2, \cdots, J\}$) represents the expected observed response of that examinee on item $j$, with $m_j$ as the item maximum score from equation (2). The $S(\alpha)$ values for polytomous attribute 0-1 scoring SSCQM and USCQM applied to the RPa-DINA model are shown in Appendix Table 1 [TABLE:1], while those for polytomous attribute polytomous scoring SSCQM and USCQM applied to the modified GRPa-DINA model are shown in Appendix Tables 2 [TABLE:2] and 3. These tables demonstrate that for both SSCQM and USCQM, all $S(\alpha)$ values are distinct, indicating that completeness is preserved when integrated with cognitive diagnosis models.
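The distinctness criterion can be checked numerically for small cases. The sketch below does so for 0-1 scoring under the RPa-DINA response function; the function names, the two example Q-matrices, and the assumption of an independent structure (so that every combination of attribute levels is a permissible knowledge state) are illustrative choices, not part of the paper.

```python
from itertools import product

import numpy as np


def expected_pattern(alpha, q, s, g):
    """S(alpha) for 0-1 scoring: the correct-response probability on each item."""
    alpha = np.asarray(alpha)
    eta = np.all((alpha[:, None] >= q) | (q == 0), axis=0)
    return (1 - s - g) * eta + g


def is_complete(q, states, s, g):
    """True if every knowledge state has its own expected response pattern."""
    q = np.asarray(q)
    patterns = {tuple(np.round(expected_pattern(a, q, s, g), 10)) for a in states}
    return len(patterns) == len(states)


# Hypothetical setting: attribute 1 has levels 0-2, attribute 2 has levels 0-1.
states = list(product(range(3), range(2)))
one_column_per_level = [[1, 2, 0],
                        [0, 0, 1]]
level_one_dropped = [[2, 0],
                     [0, 1]]

print(is_complete(one_column_per_level, states, 0.1, 0.1))  # True
print(is_complete(level_one_dropped, states, 0.1, 0.1))     # False
```

Dropping the column that targets level 1 of the first attribute makes levels 0 and 1 indistinguishable, which is why the second matrix fails the check.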
5. Simulation Studies
Generally, summative evaluation tests are longer, while formative evaluation tests are shorter, imposing higher demands on cognitive diagnostic tests. Theoretically, SSCQM and USCQM offer the advantage of being minimal-item complete matrices. Long tests using them as submatrices also maintain completeness. Their validity for both formative and summative evaluation merits investigation. As a key validity indicator for cognitive diagnostic tests, classification accuracy serves as a crucial metric (汪文义等, 2014). Simulation Studies 1-4 examine how attribute hierarchy structure, attribute count, attribute level count, and number of complete Q-matrices affect SSCQM and USCQM classification accuracy under both long and short test conditions. The studies also compare SSCQM and USCQM classification accuracy against (quasi-)reachability matrices and incomplete Q-matrices. Since dichotomous attribute tests can be viewed as polytomous attribute tests with attribute level counts of 2, simulation studies focus primarily on polytomous attribute 0-1 scoring and polytomous attribute polytomous scoring tests.
We denote attribute hierarchy structure, attribute count, attribute level count, and number of complete Q-matrices in a test as $H$, $K$, $L$, and $M$ respectively.
5.1 Monte Carlo Simulation
(1) Simulation Conditions (Table 1, where UC represents incomplete Q-matrix):
Table 1. Simulation Conditions
Simulation object: settings and generation method
Examinees: for polytomous attribute polytomous scoring, examinees follow a normal distribution; for polytomous attribute 0-1 scoring, examinees follow a uniform distribution. All knowledge states are obtained through the expansion algorithm (Ding et al., 2008) and the generalized expansion algorithm (丁树良等, 2015).
Tests, 0-1 scoring, short test: length equals the quasi-reachability matrix (denoted $\mathbf{R}_P$) column count; tests are SSCQM (i.e., quasi-reachability matrix), USCQM, and UC; item parameters $s_j$ and $g_j \sim U(0,0.25)$.
Tests, 0-1 scoring, long test: length = 50; tests contain SSCQM, USCQM, and incomplete Q-matrix; item parameters $s_j$ and $g_j \sim U(0,0.25)$.
Tests, polytomous scoring, short test ($N^*$): length = $\mathbf{R}_P$ column count ($N^*$); tests are $\mathbf{R}_P$, UC, and tests containing 1 each of SSCQM and USCQM; item parameters $s_j$ and $g_j \sim U(0,0.35)$.
Tests, polytomous scoring, short test ($n^*$): length = SSCQM column count ($n^*$); tests are SSCQM, USCQM, and UC; item parameters $s_j$ and $g_j \sim U(0,0.35)$.
Tests, polytomous scoring, long test: length = 35; tests contain quasi-reachability matrix, SSCQM, USCQM, and UC; item parameters $s_j$ and $g_j \sim U(0,0.35)$.
Test generation: SSCQM and USCQM are obtained through their respective construction methods; the incomplete Q-matrix is randomly selected from the quasi-reachability matrix; other test items are selected and fixed from non-zero knowledge states.
Cognitive diagnosis model: polytomous attribute 0-1 scoring uses the RPa-DINA model; polytomous attribute polytomous scoring uses the modified GRPa-DINA model (see Sections 4.1 and 4.2).
Examinees' observed response patterns: based on examinee true values, test Q-matrix, and CDM, simulate examinee responses.
Estimate examinee knowledge states: using simulated response data and CDM, estimate via the Maximum A Posteriori (MAP) method.
The purpose of setting the short test length to $N^*$ is to examine classification accuracy when tests containing SSCQM and USCQM reach the quasi-reachability matrix column count, comparing their accuracy with quasi-reachability matrices and incomplete Q-matrices.
When the test length is $n^*$, the quasi-reachability matrix, reduced to the SSCQM's column count, becomes an incomplete Q-matrix (UC), which allows its classification accuracy to be compared with that of SSCQM and USCQM.
Theoretically, since polytomous scoring tests yield higher classification accuracy, we set polytomous scoring item quality slightly lower and long test length shorter than 0-1 scoring tests to explore whether accuracy remains superior. Specifically, 0-1 scoring item parameters $s_j$ and $g_j \sim U(0,0.25)$ with test length 50; polytomous scoring item parameters $s_j$ and $g_j \sim U(0,0.35)$ with test length 35.
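The data-generation and estimation steps for the 0-1 scoring condition can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the paper: the small Q-matrix, the independent attribute structure, the sample size of 500, and the function names are hypothetical, and a uniform prior over knowledge states is used, so the MAP estimate coincides with the maximum-likelihood state.

```python
from itertools import product

import numpy as np


def ideal_responses(states, q):
    """0/1 ideal responses, shape (C, J), for C knowledge states and J items."""
    states = np.asarray(states)[:, :, None]     # (C, K, 1)
    q = np.asarray(q)[None, :, :]               # (1, K, J)
    return np.all((states >= q) | (q == 0), axis=1).astype(float)


def simulate_and_classify(q, states, true_idx, s, g, rng):
    """Simulate 0-1 responses, then return the MAP state index per examinee."""
    eta = ideal_responses(states, q)
    p = (1 - s - g) * eta + g                   # (C, J) success probabilities
    y = (rng.random((len(true_idx), p.shape[1])) < p[true_idx]).astype(float)
    p = np.clip(p, 1e-10, 1 - 1e-10)            # guard the logarithms
    # Log-likelihood of each response vector under each state, shape (N, C).
    loglik = y @ np.log(p).T + (1 - y) @ np.log(1 - p).T
    return loglik.argmax(axis=1)                # uniform prior: MAP = max likelihood


rng = np.random.default_rng(0)
states = list(product(range(3), range(2)))      # levels 0-2 and 0-1
q = np.array([[1, 2, 0, 2, 1],
              [0, 0, 1, 1, 1]])                 # hypothetical 5-item test
s = rng.uniform(0, 0.25, size=q.shape[1])
g = rng.uniform(0, 0.25, size=q.shape[1])
true_idx = rng.integers(0, len(states), size=500)
est_idx = simulate_and_classify(q, states, true_idx, s, g, rng)
print("share of exactly recovered states:", np.mean(est_idx == true_idx))
```

Replacing the uniform prior with empirical state proportions would only add a log-prior term to `loglik` before taking the maximum.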
(2) Evaluation Metrics
Classification accuracy primarily includes Pattern Match Ratio (PMR) and Marginal Match Ratio (MMR):
$$PMR = \frac{\sum_{i=1}^{N} N_{i-correct}}{N}$$
$$MMR = \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} N_{ik-correct}}{N \times K}$$
where $N$ is the total examinee count and $N_{i-correct}$ indicates whether examinee $i$'s attribute mastery pattern is correctly classified (1 if correct, 0 otherwise); $K$ is the attribute count and $N_{ik-correct}$ indicates whether examinee $i$'s attribute $k$ is correctly classified (1 if correct, 0 otherwise).
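A short sketch of how the two indices could be computed from true and estimated knowledge states is given below, reading PMR as the share of examinees whose whole pattern is recovered and MMR as the share of correctly recovered examinee-attribute pairs. The function name `pmr_mmr` and the three-examinee data are hypothetical.

```python
import numpy as np


def pmr_mmr(true_states, est_states):
    """Return (PMR, MMR) for N examinees and K attributes.

    Both inputs have shape (N, K) and hold attribute levels.
    """
    true_states = np.asarray(true_states)
    est_states = np.asarray(est_states)
    match = true_states == est_states    # (N, K) attribute-level agreement
    pmr = match.all(axis=1).mean()       # whole pattern recovered
    mmr = match.mean()                   # averaged over examinees and attributes
    return pmr, mmr


# Hypothetical data: three examinees, two polytomous attributes.
true_states = [[1, 2], [0, 1], [2, 2]]
est_states = [[1, 2], [0, 0], [2, 1]]
pmr, mmr = pmr_mmr(true_states, est_states)
print(pmr, mmr)  # one of three patterns and four of six attributes match
```

Because a pattern counts as correct only when all of its attributes match, PMR can never exceed MMR.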
Simulation experiments were repeated 100 times (since numerous USCQMs are available, they were randomly selected), with average PMR and MMR values reported.
5.2.1 Experimental Conditions
Attribute count was fixed at 5. Since the quasi-reachability matrix for independent structures is also a quasi-identity matrix with no USCQM, this study only examined four attribute hierarchy structures: linear (L), convergent (C), divergent (D), and unstructured (U). The following discussions of "changes with attribute hierarchy structure" refer to the sequence: linear → convergent → divergent → unstructured. Attribute level counts are shown in Table 2:
Table 2. Attribute Level Counts
Attribute Level Count Levels 0,1,2 0,1,2,3 0,1,2,3,4 0,1,2 0,1,2
Note: 5 attributes were used in Study 1, 7 attributes in Study 2.
5.2.2 Experimental Results
Long and short test results are presented in Tables 3 [TABLE:3] and 4 [TABLE:4]:
(1) In polytomous attribute 0-1 scoring tests, long tests show higher classification accuracy than short tests. Both long and short test accuracy decreases with changes in attribute hierarchy structure. In short tests, the quasi-reachability matrix (i.e., SSCQM) achieves the highest accuracy, USCQM ranks second with accuracy close to SSCQM, and incomplete Q-matrices show the lowest accuracy. In long tests, accuracy ranks from highest to lowest as: containing USCQM, containing quasi-reachability matrix (i.e., SSCQM), and incomplete Q-matrix.
(2) For polytomous attribute polytomous scoring tests, short test accuracy decreases sequentially with attribute hierarchy changes, with MMR reductions smaller than PMR reductions. Long test accuracy generally increases, with MMR increasing by approximately 4% and PMR increasing by no more than 6%. Polytomous attribute polytomous scoring test accuracy (especially PMR) exceeds that of polytomous attribute 0-1 scoring tests, even with lower item quality and shorter long test length. Test length remains an important factor affecting accuracy, with long tests outperforming short tests. When short test length equals quasi-reachability matrix column count ($N^*$), tests containing SSCQM and USCQM outperform the quasi-reachability matrix and incomplete Q-matrix, with USCQM-containing tests achieving the highest accuracy and incomplete Q-matrix (UC) tests the lowest. When short test length equals SSCQM column count ($n^*$) (i.e., fewer columns than reachability matrix), accuracy ranks from highest to lowest as: USCQM, SSCQM, and incomplete Q-matrix UC.
Table 3. Classification Accuracy of Polytomous Attribute 0-1 Scoring Tests Across Attribute Hierarchy Structures
Structure R(SSCQM) USCQM R(SSCQM) USCQM L C D U
Table 4. Classification Accuracy of Polytomous Attribute Polytomous Scoring Tests Across Attribute Hierarchy Structures
$N^*/n^*$ SSCQM USCQM SSCQM USCQM 11/10 0.9730/ 0.9639/ 0.9589/ 0.9477/ 0.9157/ 0.9347/ 0.8784/ 0.8355/ 0.8216/ 0.7855/ 0.6713/ 0.7371/ U 0.9399/ 0.9130/ 0.9532/ 0.9152/ 0.9319/ 0.8905/ 0.7497/ 0.6641/ 0.8001/ 0.6641/ 0.7079/ 0.5742/
5.3.1 Experimental Conditions
This study examined divergent structures with attribute counts $K$ set from 4 to 7 attributes. Attribute level counts are shown in Table 2.
5.3.2 Experimental Results
Results are presented in Tables 5 [TABLE:5] and 6 [TABLE:6]:
(1) In polytomous attribute 0-1 scoring tests, classification accuracy for both long and short tests decreases as attribute count increases, with short test accuracy declining less than long tests. In short tests, quasi-reachability matrix (i.e., SSCQM) accuracy is highest, USCQM ranks second, and incomplete Q-matrix (UC) is lowest. In long tests, accuracy ranks from highest to lowest as: containing USCQM, containing SSCQM, and incomplete Q-matrix. MMR differences between SSCQM and USCQM do not exceed 0.02, and PMR differences do not exceed 0.04.
(2) As attribute count increases, polytomous attribute polytomous scoring test accuracy decreases, with short test accuracy declining more than long tests. Long test accuracy exceeds short test accuracy. Polytomous attribute polytomous scoring test accuracy remains higher than polytomous attribute 0-1 scoring test accuracy. In long tests, complete Q-matrix accuracy exceeds incomplete Q-matrix accuracy. In short tests, when test length is $N^*$, accuracy ranks from highest to lowest as: containing USCQM, containing SSCQM, quasi-reachability matrix, and incomplete Q-matrix UC. When test length is $n^*$, USCQM accuracy is highest, SSCQM second, and incomplete Q-matrix UC lowest.
Table 5. Classification Accuracy of Polytomous Attribute 0-1 Scoring Tests Across Attribute Counts
K R(SSCQM) USCQM R(SSCQM) USCQM 4 5 6 7
Table 6. Classification Accuracy of Polytomous Attribute Polytomous Scoring Tests Across Attribute Counts
$N^*/n^*$ SSCQM USCQM SSCQM USCQM 13/10 0.9465/ 0.9189/ 0.9532/ 0.9399/ 0.9430/ 0.9310/ 0.9364/ 0.9272/ 15/12 0.8196/ 0.7450/ 0.8001/ 0.7497/ 0.7281/ 0.6864/ 0.6670/ 0.6288/ 0.7371/ 0.7079/ 0.4935/ 0.4622/ 0.9275/ 0.9319/ 0.8810/ 0.8895/
5.4.1 Experimental Conditions
This study examined divergent structures with attribute count fixed at 4 and all attributes having 2-5 levels.
5.4.2 Experimental Results
Long and short test results across attribute level counts are shown in Tables 7 [TABLE:7] and 8 [TABLE:8]. These results replicate findings from Studies 1 and 2 regarding completeness effects on accuracy: SSCQM and USCQM accuracy far exceeds that of incomplete Q-matrix UC. Attribute level count primarily affects accuracy as follows:
(1) In polytomous attribute 0-1 scoring tests, accuracy is highest when attribute level count is 2. Generally, accuracy decreases as attribute level count increases for both long and short tests, with long test decreases exceeding short test decreases. In long tests, USCQM-containing tests achieve the highest accuracy.
(2) In polytomous attribute polytomous scoring tests, short test accuracy decreases as attribute level count increases. When test length is $N^*$ and level count is 2-4, SSCQM-containing tests show highest accuracy, incomplete Q-matrix UC lowest, with USCQM-containing and quasi-reachability matrix tests in between. When level count is 5, USCQM-containing tests show highest accuracy, incomplete Q-matrix UC lowest, with other tests in between. When test length is $n^*$, SSCQM and USCQM accuracy exceeds incomplete Q-matrix accuracy. In long tests, accuracy first increases then decreases with attribute level count, peaking when level count is 3 or 4.
Table 7. Classification Accuracy of Polytomous Attribute 0-1 Scoring Tests Across Attribute Level Counts
L R(SSCQM) USCQM R(SSCQM) USCQM 2 3 4 5
Table 8. Classification Accuracy of Polytomous Attribute Polytomous Scoring Tests Across Attribute Level Counts
$N^*/n^*$ SSCQM USCQM SSCQM USCQM 12/10 0.9575/ 0.9598/ 0.9255/ 0.9454/ 0.9183/ 0.9228/ 0.9118/ 0.8797/ 16/14 0.8431/ 0.8511/ 0.7640/ 0.8125/ 0.7328/ 0.7412/ 0.7101/ 0.6101/ 0.4718/ 0.5925/ 0.6327/ 0.6137/ 0.8569/ 0.8817/ 0.8940/ 0.8779/
5.5.1 Experimental Conditions
Attribute hierarchy structure was divergent with attribute count of 5. This study only examined long tests containing $M$ complete Q-matrices to explore accuracy changes.
5.5.2 Experimental Results
Tables 9 [TABLE:9] and 10 [TABLE:10] present classification accuracy for tests containing $M$ complete Q-matrices and incomplete Q-matrix UC. As matrix count increases:
(1) Test accuracy increases for all conditions. Polytomous attribute 0-1 scoring tests show lower initial accuracy but greater improvement than polytomous attribute polytomous scoring tests, though still not reaching the upper bound of polytomous scoring test accuracy.
(2) In polytomous attribute 0-1 scoring tests, SSCQM-containing test accuracy is most affected by complete Q-matrix count, benefiting the most. Complete Q-matrix accuracy consistently exceeds incomplete Q-matrix accuracy.
(3) In polytomous attribute polytomous scoring tests, SSCQM- and USCQM-containing test accuracy outperforms quasi-reachability matrix tests, with incomplete Q-matrix tests showing the lowest accuracy.
Table 9. Classification Accuracy of Polytomous Attribute 0-1 Scoring Tests Across Complete Q-Matrix Counts
M R(SSCQM) USCQM R(SSCQM) USCQM 1 2 3
Table 10. Classification Accuracy of Polytomous Attribute Polytomous Scoring Tests Across Complete Q-Matrix Counts
SSCQM USCQM SSCQM USCQM
6. Empirical Study
Using real data, we further investigated SSCQM and USCQM classification capabilities for knowledge states. Unlike simulation studies, this empirical dataset lacks true knowledge state information. Under the proposed item scoring method and maximum scores, and considering that reachability and quasi-reachability matrices are complete Q-matrices for dichotomous and polytomous attribute tests respectively with relatively high accuracy, we used reachability or quasi-reachability matrix-based examinee attribute and pattern estimates as benchmarks to calculate attribute accuracy rate (AR) and pattern accuracy rate (PR) from SSCQM and USCQM.
$$AR = \frac{N_{i-Rkcorrect}}{N_{i-Rk}}$$
$$PR = \frac{N_{i-Rcorrect}}{N_{i-R}}$$
where $N_{i-Rk}$ represents attribute $k$ for examinee $i$ estimated by the (quasi-)reachability matrix; $N_{i-Rkcorrect}$ equals 1 when other test estimates match the (quasi-)reachability matrix estimate, otherwise 0. $N_{i-R}$ represents examinee $i$'s attribute pattern estimated by the (quasi-)reachability matrix; $N_{i-Rcorrect}$ equals 1 when other test estimates match the (quasi-)reachability matrix pattern, otherwise 0.
6.1 Experimental Conditions
Data consisted of cognitive diagnostic test results for numeral system conversion (祝玉芳, 2015), with 750 students participating and 705 valid response patterns. Item scoring methods and maximum scores followed equations (1) and (2). The test involved five dichotomous attributes: (1) concept of numeral systems ($A_1$), (2) decimal to other base conversion ($A_2$), (3) other base to decimal conversion ($A_3$), (4) binary to octal or hexadecimal conversion ($A_4$), and (5) octal or hexadecimal to binary conversion ($A_5$). The attribute hierarchy structure was unstructured, with $A_1$ as the prerequisite attribute for all others. Based on expert judgment, these five attributes could be compressed into three polytomous attributes: $A_1'$ (numeral system concept) with 2 levels (0,1) indicating whether the concept is assessed; $A_2'$ (combining $A_2$ and $A_3$) with 3 levels (0,1,2) representing no conversion assessment, other-to-decimal conversion, or decimal-to-other conversion; and $A_3'$ (combining $A_4$ and $A_5$) with 3 levels representing no inter-base conversion assessment, binary-to-octal/hexadecimal conversion, or octal/hexadecimal-to-binary conversion. The hierarchy remained unstructured with $A_1$ as prerequisite. Table 11 [TABLE:11] presents test items in both dichotomous and polytomous attribute representations.
Table 11. Test Items
To further explore USCQM classification capability, we modified structured items to unstructured items corresponding to independent structures, setting all attribute level values for $A_1$ to 0 except for items 2-4 in Table 11.
From the unified SSCQM and USCQM design methods in Section 3, the dichotomous attribute 0-1 scoring SSCQM is the reachability matrix. Since USCQM lies between identity matrix $\mathbf{E}$ and SSCQM, we randomly selected USCQM = . The dichotomous attribute polytomous scoring SSCQM = , and since USCQM lies between identity submatrix $\mathbf{E}'$ and SSCQM, we randomly selected USCQM = . The polytomous attribute 0-1 scoring SSCQM is the quasi-reachability matrix $\mathbf{R}_P = ()$, and since USCQM lies between quasi-identity matrix $\mathbf{E}_P$ and SSCQM, we randomly selected USCQM = (). The polytomous attribute polytomous scoring SSCQM = (), and since USCQM lies between quasi-identity submatrix $\mathbf{E}_P'$ and SSCQM, we randomly selected USCQM = (). Each test type used one SSCQM, one USCQM, and one incomplete Q-matrix to examine classification capability. Theoretically, reachability matrix $\mathbf{R}$ and quasi-reachability matrix $\mathbf{R}_P$ yield relatively high accuracy, so we selected two $\mathbf{R}$ and $\mathbf{R}_P$ accuracy rates for comparison.
Using MCMC algorithms, we estimated item parameters from examinee response data (observed response patterns) and CDMs (from Section 5.1(3)), then estimated examinee knowledge states and calculated attribute accuracy rates (AR) and pattern accuracy rates (PR). Since multiple SSCQMs and USCQMs can be constructed from test items, we averaged the accuracy rates obtained from SSCQM and USCQM during the experiment to evaluate their classification capabilities.
6.2 Experimental Results
Table 12 [TABLE:12] shows $\mathbf{R}_1$ and $\mathbf{R}_2$ as reachability matrices in dichotomous attribute tests; Table 13 [TABLE:13] shows $\mathbf{R}_{P1}$ and $\mathbf{R}_{P2}$ as quasi-reachability matrices in polytomous attribute tests. We set the attribute accuracy rates (AR) and pattern accuracy rates (PR) of $\mathbf{R}_1$ and $\mathbf{R}_{P1}$ to 1. The accuracy rates of $\mathbf{R}_2$ and $\mathbf{R}_{P2}$ serve as benchmarks for other tests.
Table 12 presents attribute and pattern accuracy rates for dichotomous attribute tests. In 0-1 scoring tests, USCQM attribute accuracy exceeds $\mathbf{R}_2$ (i.e., SSCQM), all above 0.90, while incomplete Q-matrix UC attribute accuracy is lower at 0.8377. In polytomous scoring tests, complete Q-matrix attribute accuracy exceeds 0.84, with SSCQM and USCQM both above 0.87, higher than reachability matrix $\mathbf{R}_2$, and incomplete Q-matrix UC lowest at 0.7401. For 0-1 scoring tests, complete Q-matrix pattern accuracy exceeds 0.61, with USCQM pattern accuracy higher than $\mathbf{R}_2$ (i.e., SSCQM), while incomplete Q-matrix UC pattern accuracy is 0.3844. For polytomous scoring tests, pattern accuracy ranks from highest to lowest as: SSCQM, USCQM, reachability matrix $\mathbf{R}_2$, and incomplete Q-matrix, with incomplete Q-matrix pattern accuracy substantially lower than complete Q-matrices.
Table 13 presents attribute and pattern accuracy rates for polytomous attribute tests, with conclusions similar to dichotomous attribute tests: complete Q-matrix accuracy exceeds incomplete Q-matrix accuracy for both 0-1 and polytomous scoring, with SSCQM and USCQM accuracy nearly always exceeding quasi-reachability matrix $\mathbf{R}_{P2}$.
Table 12. Attribute Accuracy/Pattern Accuracy Rates for Dichotomous Attribute Tests
Test Type USCQM SSCQM USCQM
Dichotomous attribute 0-1 scoring
Dichotomous attribute polytomous scoring
Table 13. Attribute Accuracy/Pattern Accuracy Rates for Polytomous Attribute Tests
Test Type USCQM SSCQM USCQM
Polytomous attribute 0-1 scoring
Polytomous attribute polytomous scoring
7. Discussion and Conclusions
Achieving precise cognitive diagnostic assessment with minimal items represents a persistent goal. Toward this end, 丁树良, 罗芬等 (2014) and 丁树良, 汪文义等 (2014) proposed basic complete Q-matrix design concepts for several fundamental attribute hierarchy structures in dichotomous attribute polytomous scoring tests. Building on this, 唐小娟等 (2024) developed structured and unstructured simplest complete Q-matrices for dichotomous attribute polytomous scoring tests applicable to all attribute hierarchy structures. Theoretically, polytomous attribute tests provide richer examinee information, while polytomous scoring tests yield higher classification accuracy, making research on these test designs significant.
This paper proposes unified design methods for structured and unstructured simplest complete Q-matrices applicable to all combinations of attribute levels (dichotomous and polytomous) and scoring methods (0-1 and polytomous) across various attribute hierarchy structures. Through four simulation studies and one empirical study, we systematically investigated influencing factors (attribute hierarchy structure, attribute count, attribute level count, and complete Q-matrix count) and validity (marginal classification accuracy, pattern classification accuracy, attribute accuracy, and pattern accuracy) of unified designs (SSCQM and USCQM). We now discuss the findings.
7.1 Discussion and Future Directions
(1) Simplest complete Q-matrix is closely related to item scoring method and maximum scores. Generally, reachability matrices are considered complete Q-matrices. However, reachability matrices are only complete under specific item scoring methods and maximum score conditions (唐小娟, 2024). The polytomous scoring method proposed in this study is widely applicable, satisfies the attribute-score correspondence assumption (詹沛达等, 2016), and uses equal attribute weights. If attribute weights differ, this method cannot be applied. When attribute weights are set to 1, item maximum scores equal the sum of assessed attribute level counts. If item maximum scores exceed the sum of attribute level counts, the proposed unified design method for simplest complete Q-matrices remains applicable. In cognitive diagnostic tests where attributes and scores are independent, the unified design method can also accommodate 0-1 scoring. Future test design research should further investigate other ideal scoring methods (e.g., attribute manifestation assumptions, attribute conjunction condensation rules (詹沛达等, 2016)) and maximum score variations (e.g., $m_j \leq \sum$ and $m_j \geq \sum$ with unequal attribute weights). Cognitive diagnosis models involve item scoring methods and maximum scores; future research should examine simplest complete Q-matrix designs for models like DINO, A-CDM, and G-DINA.
(2) Exploring SSCQM and USCQM construction theories and methods from different item types. This study primarily based SSCQM and USCQM construction on reachability and quasi-reachability matrices. Future research should investigate finding SSCQM and USCQM beyond these matrices or from all item types. For all item types, SSCQM is a submatrix of (quasi-)reachability matrices. Is it the complete Q-matrix with the fewest columns? Not necessarily. For example, with 6 attributes in an unstructured hierarchy, the dichotomous attribute polytomous scoring SSCQM obtained from the reachability matrix is $\mathbf{Q}_{21} = $, while SSCQM obtained from items outside the reachability matrix is $\mathbf{Q} = $. Investigating classification accuracy and influencing factors of simplest complete Q-matrices constructed from different item types holds significant importance for test design.
(3) USCQMs generated by unified design methods require completeness verification. As noted, since the lower bound for generating USCQM is incomplete, matrices between upper and lower bounds require completeness verification to confirm whether they establish one-to-one correspondence between knowledge states and ideal response patterns. When tests assess many attributes and levels, the number of matrices between bounds grows substantially. Constructing complete lower bound matrices could reduce verification time. For example, in inverted pyramid structures (Figure 4 [FIGURE:4]), lower bound completeness construction is closely related to dichotomous attribute polytomous scoring USCQM lower bounds: prerequisite attributes cannot be assessed in the same column in dichotomous attribute polytomous scoring USCQM lower bounds (see appendix example). Whether other special attribute hierarchy structures exist requires further investigation.
(4) Study results inform short test type selection. Although theoretically SSCQM and USCQM have equivalent knowledge state differentiation capabilities, various experimental conditions reveal respective advantages. If the attribute hierarchy structure is clear, short tests may consider using SSCQM. If attribute hierarchy structures are difficult to ascertain, USCQM may be preferable.
(5) Regarding cognitive diagnostic test validity. Test validity refers to the degree to which test content aligns with the characteristics of the construct being measured. Cognitive diagnostic test validity indicators primarily include attribute/pattern classification accuracy, classification consistency, and theoretical construct validity (汪文义等, 2014). Attribute/pattern classification accuracy and classification consistency describe the degree of consistency between estimated and true examinee attributes and patterns derived from observed scores. Theoretical construct validity (TCV) primarily examines the extent to which test Q-matrices represent theoretical attributes and hierarchy structures (丁树良等, 2012), assessing consistency between ideal response patterns and their corresponding true latent classes (汪文义等, 2014). Currently, TCV is mainly applied to dichotomous attribute test design, with no relevant indicators for polytomous attribute test design. Therefore, this study primarily used attribute/pattern classification accuracy as validity indicators, namely MMR and PMR in simulation studies, and AR and PR in empirical studies. Cognitive diagnostic test reliability faces similar issues, warranting future research on validity and reliability indicators for polytomous attribute tests.
7.2 Research Conclusions
Theoretically, multiple grouping possibilities based on comparable items generate more SSCQMs and USCQMs than quasi-reachability matrices, helping address test design uniformity, reduce item exposure, and enhance test security. Findings indicate: (1) Complete Q-matrix classification accuracy exceeds incomplete Q-matrix accuracy. In various long and short tests containing SSCQM, USCQM, and (quasi-)reachability matrices, accuracy decreases with changes in attribute hierarchy structure and increases in attribute count and level count (except for polytomous attribute polytomous scoring long tests), while increasing with complete Q-matrix count. (2) For short tests, when test length equals (quasi-)reachability matrix column count, tests containing SSCQM and USCQM almost always outperform (quasi-)reachability matrices. When test length equals SSCQM column count (i.e., fewer columns than reachability matrix), SSCQM and USCQM are complete Q-matrices with minimal columns and outperform incomplete Q-matrices. Simulation studies demonstrate SSCQM and USCQM superiority over (quasi-)reachability matrices in short tests. (3) For long tests, tests containing SSCQM and USCQM demonstrate classification capabilities comparable to (quasi-)reachability matrices. (4) Empirical research shows that, relative to (quasi-)reachability matrices, SSCQM and USCQM achieve relatively high classification accuracy in short tests.
References
References are preserved exactly as in the original Chinese manuscript, including both English and Chinese entries with proper formatting.
Appendices
I. Theoretical Proof of SSCQM Completeness
丁树良, 罗芬 et al. (2014) and 丁树良, 汪文义 et al. (2014) proved that the Q-matrices designed for dichotomous attribute polytomous scoring tests are complete (consistent with the design results of this study). To prove the completeness of the simplest complete Q-matrix under polytomous attributes and polytomous scoring, we convert polytomous Q-matrices (polytomous attribute matrices) into 0-1 matrices (Boolean matrices, i.e., dichotomous attribute matrices) and then apply the completeness results for Boolean matrices under polytomous scoring to establish the completeness of the polytomous attribute SSCQM.
The proof primarily relies on the following facts:
1) A polytomous Q-matrix can be transformed into a Boolean matrix through an expansion algorithm (P-to-D conversion), which yields a specially constructed Boolean matrix containing all the information in the polytomous Q-matrix. This Boolean matrix can be converted back to the polytomous Q-matrix through a compression algorithm (D-to-P conversion) (see 丁树良 et al., 2015). Thus, polytomous Q-matrices and these Boolean matrices are in one-to-one correspondence.
2) Using the item scoring method (1) and the maximum scores (2) proposed in this study, a computer program searches for the polytomous simplest complete Q-matrix (denoted B) with the minimum number of columns, which ensures simplicity.
3) Completeness proof steps: First, transform the polytomous Q-matrix of the test into a Boolean matrix via the expansion algorithm. Second, apply the theoretical proofs for Boolean complete Q-matrices under polytomous scoring (丁树良, 罗芬 et al., 2014; 丁树良, 汪文义 et al., 2014) to obtain the Boolean simplest complete Q-matrix for the same attribute hierarchy with the given attribute count and level counts. Third, apply the compression algorithm to obtain the polytomous simplest complete Q-matrix, i.e., B, which is consistent with the results of this study.
4) Computer programs verified that the simplest complete Q-matrices obtained under the various attribute hierarchies, attribute counts, and level counts do establish a one-to-one correspondence between the set of knowledge states and the set of ideal response patterns.
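Facts 1) and 4) can be sketched in a few lines of Python. The exact expansion and compression algorithms of 丁树良 et al. (2015) are not reproduced in this appendix, so the sketch assumes a simple cumulative ("thermometer") coding in which level v of an attribute with maximum level m becomes v ones followed by m - v zeros. For brevity it also uses a conjunctive 0-1 ideal response rule rather than the polytomous scoring method of this study. The example takes the knowledge states and SSCQM item vectors of the 3-attribute divergent structure listed in Appendix Table 1. All function names are hypothetical.

```python
def p_to_d(vector, max_levels):
    """Expand a polytomous vector into a 0-1 vector (assumed thermometer coding)."""
    bits = []
    for value, max_level in zip(vector, max_levels):
        bits.extend(1 if k < value else 0 for k in range(max_level))
    return tuple(bits)


def d_to_p(bits, max_levels):
    """Compress a thermometer-coded 0-1 vector back into a polytomous vector."""
    levels, pos = [], 0
    for max_level in max_levels:
        levels.append(sum(bits[pos:pos + max_level]))
        pos += max_level
    return tuple(levels)


def ideal_response(alpha, q):
    """Conjunctive 0-1 rule: 1 only if every attribute level meets the requirement."""
    return int(all(a >= r for a, r in zip(alpha, q)))


def is_complete(states, items):
    """True if distinct knowledge states always give distinct ideal response patterns."""
    patterns = {tuple(ideal_response(a, q) for q in items) for a in states}
    return len(patterns) == len(states)


# Maximum levels of attributes 1-3 in the Appendix Table 1 example.
MAX_LEVELS = (1, 2, 3)
# Divergent structure: attribute 1 is prerequisite to attributes 2 and 3.
STATES = [(0, 0, 0)] + [(1, a2, a3) for a2 in range(3) for a3 in range(4)]
# SSCQM item vectors as listed in Appendix Table 1.
SSCQM = [(1, 0, 0), (1, 1, 0), (1, 2, 0), (1, 0, 1), (1, 0, 2), (1, 0, 3)]

if __name__ == "__main__":
    print(p_to_d((1, 2, 0), MAX_LEVELS))    # (1, 1, 1, 0, 0, 0)
    print(is_complete(STATES, SSCQM))       # True: 13 states, 13 distinct patterns
    print(is_complete(STATES, SSCQM[:-1]))  # False: (102) and (103) collide
```

Under this coding, the comparison of a knowledge state with an item vector gives the same result before and after expansion, which is why a completeness result for the Boolean matrix carries over to the polytomous one in this simplified setting.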
II. Application of Simplest Complete Q-Matrices to Cognitive Diagnosis Models
(1) Integration of Polytomous Attribute 0-1 Scoring SSCQM and USCQM with CDMs (see Appendix Table 1)
Appendix Table 1. $S(\alpha)$ for 3-Attribute Divergent Structure SSCQM/USCQM Applied to RPa-DINA Model
| Item (SSCQM $\mathbf{R}_P$ / USCQM $\mathbf{Q}$) | $S_j(\alpha)$ | (000) | (100) | (110) | (120) | (101) | (102) | (103) | (111) | (112) | (113) | (121) | (122) | (123) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (100)/(100) | $S_1(\alpha)$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ | $1 - s_1$ |
| (110)/(010) | $S_2(\alpha)$ | | | $1 - s_2$ | $1 - s_2$ | | | | $1 - s_2$ | $1 - s_2$ | $1 - s_2$ | $1 - s_2$ | $1 - s_2$ | $1 - s_2$ |
| (120)/(020) | $S_3(\alpha)$ | | | | $1 - s_3$ | | | | | | | $1 - s_3$ | $1 - s_3$ | $1 - s_3$ |
| (101)/(101) | $S_4(\alpha)$ | | | | | $1 - s_4$ | $1 - s_4$ | $1 - s_4$ | $1 - s_4$ | $1 - s_4$ | $1 - s_4$ | $1 - s_4$ | $1 - s_4$ | $1 - s_4$ |
| (102)/(002) | $S_5(\alpha)$ | | | | | | $1 - s_5$ | $1 - s_5$ | | $1 - s_5$ | $1 - s_5$ | | $1 - s_5$ | $1 - s_5$ |
| (103)/(003) | $S_6(\alpha)$ | | | | | | | $1 - s_6$ | | | $1 - s_6$ | | | $1 - s_6$ |

(2) Integration of Polytomous Attribute Polytomous Scoring SSCQM and USCQM with CDMs (see Appendix Tables 2 and 3)
Appendix Table 2. $S(\alpha)$ for 3-Attribute Divergent Structure SSCQM Applied to Modified GRPa-DINA Model
| Item (SSCQM $\mathbf{Q}_{RPS}$) | $S_j(\alpha)$ | (000) | (100) | (110) | (120) | (101) | (102) | (103) |
|---|---|---|---|---|---|---|---|---|
| (110) | $S_1(\alpha)$ | $1 \times (g_{11} - g_{12}) + 2 \times g_{12}$ | $1 \times (1 - s_{11} - g_{12}) + 2 \times g_{12}$ | $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$ | $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$ | $1 \times (1 - s_{11} - g_{12}) + 2 \times g_{12}$ | $1 \times (1 - s_{11} - g_{12}) + 2 \times g_{12}$ | $1 \times (1 - s_{11} - g_{12})$ |
| (120) | $S_2(\alpha)$ | $1 \times (g_{21} - g_{23}) + 3 \times g_{23}$ | $1 \times (1 - s_{21} - g_{23}) + 3 \times g_{23}$ | $1 \times (1 - s_{21} - g_{23}) + 3 \times g_{23}$ | $1 \times (s_{23} - s_{21}) + 3 \times (1 - s_{23})$ | $1 \times (1 - s_{21} - g_{23}) + 3 \times g_{23}$ | $1 \times (1 - s_{21} - g_{23}) + 3 \times g_{23}$ | $1 \times (1 - s_{21} - g_{23})$ |
| (101) | $S_3(\alpha)$ | $1 \times (g_{31} - g_{32}) + 2 \times g_{32}$ | $1 \times (1 - s_{31} - g_{32}) + 2 \times g_{32}$ | $1 \times (1 - s_{31} - g_{32}) + 2 \times g_{32}$ | $1 \times (1 - s_{31} - g_{32}) + 2 \times g_{32}$ | $1 \times (s_{32} - s_{31}) + 2 \times (1 - s_{32})$ | $1 \times (s_{32} - s_{31}) + 2 \times (1 - s_{32})$ | $1 \times (s_{32} - s_{31})$ |
| (102) | $S_4(\alpha)$ | $1 \times (g_{41} - g_{43}) + 3 \times g_{43}$ | $1 \times (1 - s_{41} - g_{43}) + 3 \times g_{43}$ | $1 \times (1 - s_{41} - g_{43}) + 3 \times g_{43}$ | $1 \times (1 - s_{41} - g_{43}) + 3 \times g_{43}$ | $1 \times (1 - s_{41} - g_{43}) + 3 \times g_{43}$ | $1 \times (s_{43} - s_{41}) + 3 \times (1 - s_{43})$ | $1 \times (s_{43} - s_{41})$ |
| (103) | $S_5(\alpha)$ | $1 \times (g_{51} - g_{54}) + 4 \times g_{54}$ | $1 \times (1 - s_{51} - g_{54}) + 4 \times g_{54}$ | $1 \times (1 - s_{51} - g_{54}) + 4 \times g_{54}$ | $1 \times (1 - s_{51} - g_{54}) + 4 \times g_{54}$ | $1 \times (1 - s_{51} - g_{54}) + 4 \times g_{54}$ | $1 \times (1 - s_{51} - g_{54}) + 4 \times g_{54}$ | $1 \times (s_{54} - s_{51})$ |

Appendix Table 3. $S(\alpha)$ for 3-Attribute Divergent Structure USCQM Applied to Modified GRPa-DINA Model
Knowledge states: (000) (100) (110) (120) (101) (102) (103) (111) (112) (113) (121) (122) (123)
Items: USCQM $\mathbf{Q}_{RPU2}$
(110) $S_1(\alpha)$: $1 \times (g_{11} - g_{12}) + 2 \times g_{12}$, $1 \times (1 - s_{11} - g_{12}) + 2 \times g_{12}$, $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$, $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$, $1 \times (1 - s_{11} - g_{12}) + 2 \times g_{12}$, $1 \times (1 - s_{11} - g_{12}) + 2 \times g_{12}$, $1 \times (1 - s_{11} - g_{12}) + 2 \times g_{12}$, $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$, $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$, $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$, $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$, $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$, $1 \times (s_{12} - s_{11}) + 2 \times (1 - s_{12})$
(001) $S_3(\alpha)$: $2 \times g_{22}$, $2 \times g_{22}$, $2 \times g_{22}$, $2 \times (1 - s_{22})$, $2 \times g_{22}$, $2 \times g_{22}$, $2 \times g_{22}$, $2 \times (1 - s_{22})$, $2 \times (1 - s_{22})$, $2 \times (1 - s_{22})$
(020) $S_2(\alpha)$: $1 \times g_{31}$, $1 \times g_{31}$, $1 \times g_{31}$, $1 \times g_{31}$, $1 \times (1 - s_{31})$, $1 \times (1 - s_{31})$, $1 \times (1 - s_{31})$, $1 \times (1 - s_{31})$, $1 \times (1 - s_{31})$, $1 \times (1 - s_{31})$
(002) $S_4(\alpha)$: $2 \times g_{42}$, $2 \times g_{42}$, $2 \times g_{42}$, $2 \times g_{42}$, $2 \times (1 - s_{42})$, $2 \times (1 - s_{42})$, $2 \times g_{42}$, $2 \times (1 - s_{42})$, $2 \times (1 - s_{42})$
(003) $S_5(\alpha)$: $3 \times g_{53}$, $3 \times g_{53}$, $3 \times g_{53}$, $3 \times g_{53}$, $3 \times g_{53}$, $3 \times (1 - s_{53})$, $3 \times g_{53}$, $3 \times g_{53}$, $3 \times (1 - s_{53})$

III. Example from Discussion and Conclusions (Section 7.1(3))
Using Figure 1 as an example, attributes 1-5 have 2, 2, 3, 4, and 5 levels, respectively. The lower bound for the dichotomous attribute polytomous scoring USCQM can be $\mathbf{Q}_{11}$, $\mathbf{Q}_{12}$, or $\mathbf{Q}_{13}$, but cannot be column 2 of $\mathbf{Q}_1$. Therefore, the complete lower bound for the polytomous attribute polytomous scoring USCQM requires that both $\star$ and $\clubsuit$ contain at least one 1. In the lower bounds of the polytomous attribute polytomous scoring USCQM, the first two attributes cannot appear simultaneously in columns 3-5 of the quasi-identity submatrix $\mathbf{E}_1$, which is expanded from the dichotomous attribute identity submatrix $\mathbf{E}$.
Figure 4 [FIGURE:4]. Inverted Pyramid Structure