Abstract
Neural networks, among the most important machine learning methods, have been widely used in cognitive diagnosis; however, there is currently no simple and general neural network method for cognitive diagnosis. We therefore propose a matrix-constrained neural network cognitive diagnosis method (MCNN-CD) trained via transfer learning. The new model has three advantages: (1) users do not need to design the network structure by hand, because the model adapts to any dataset through the Q-matrix and the interactive Q-matrix; (2) the network structure is derived from the G-DINA model, so it can effectively express both the main effects and the interaction effects of attributes; (3) the transfer-learning-based training scheme alleviates the scarcity of labeled data, improving the model's usability and scope of application. Experimental results show that the prediction error of MCNN-CD on simulated datasets is generally lower than that of the parametric G-DINA and DINA methods. Within a certain range, the model is relatively insensitive to the number of attributes and largely maintains good classification accuracy as that number increases. Trained via transfer learning, MCNN-CD adapts well to datasets of different sample sizes and maintains its lead over other models under various simulated and empirical conditions. Further performance gains are limited by the use of simulated data generated from parametric models, and the method remains somewhat sensitive to item quality.
A Neural Cognitive Diagnosis Method Based on Transfer Learning and Matrix Constraints
School of Information Science and Technology, Northeast Normal University, Changchun
School of Education, Hebei Normal University, Shijiazhuang
Abstract
Cognitive Diagnosis Models (CDMs) are essential tools in intelligent education, designed to analyze learners' mastery of specific knowledge components. However, traditional CDMs often struggle with data sparsity and lack the flexibility to capture complex non-linear interactions between learners and items. This paper proposes a novel neural cognitive diagnosis method that integrates transfer learning and matrix constraints to address these challenges. By leveraging transfer learning, our approach utilizes information from source domains to enhance diagnostic accuracy in target domains with limited data. Furthermore, we introduce matrix constraints into the neural network architecture to ensure that the learned parameters maintain interpretability and adhere to the fundamental principles of cognitive science. Experimental results on several real-world datasets demonstrate that the proposed method significantly outperforms existing state-of-the-art models in terms of both predictive accuracy and diagnostic consistency.
1 Introduction
With the rapid development of online learning platforms, personalized education has become a focal point of research in the field of intelligent tutoring systems. Cognitive Diagnosis (CD) serves as the foundation for personalized learning by identifying the specific strengths and weaknesses of learners across various knowledge concepts. Traditional cognitive diagnosis models, such as the Deterministic Inputs, Noisy "And" gate (DINA) model and the Item Response Theory (IRT) models, rely on predefined mathematical functions to map learner traits to performance. While these models offer high interpretability, their rigid structures often fail to capture the complex, high-dimensional interactions inherent in the learning process.
In recent years, deep learning has been introduced to cognitive diagnosis to overcome the limitations of traditional models. Neural Cognitive Diagnosis (NCD) models utilize neural networks to learn the interaction functions between learners and items directly from data. Despite their success, NCD models face two primary issues: first, they require large amounts of labeled data to train effectively, which is often unavailable in specialized or new educational contexts (the data sparsity problem); second, the "black-box" nature of neural networks can lead to diagnostic results that lack the theoretical constraints required by educational psychology.
To address these issues, this paper proposes a neural cognitive diagnosis framework based on transfer learning and matrix constraints. Our contributions are as follows:
1. We implement a transfer learning strategy to migrate knowledge from data-rich domains to data-sparse domains, effectively mitigating the cold-start problem.
2. We introduce a Matrix-Constrained Neural Network Cognitive Diagnosis (MCNN-CD) method in which users do not need to design the network structure by hand: given the Q-matrix and the interaction matrix derived from it, the model adapts automatically to any dataset.
3. The network structure effectively captures both the main effects and interaction effects of attributes, improving the model's usability and scope of application.
2 Theoretical Foundation
2.1 The Q-Matrix and Attribute Relationships
The Q-matrix is a binary matrix that represents the associations between test items and attributes \cite{Tatsuoka1995}. Specifically, each row represents a test item, and each column represents an underlying attribute. Its formal definition is given by:
$$\mathbf{Q} = (q_{jk})_{J \times K}, \quad q_{jk} \in \{0, 1\}$$
where $q_{jk} = 1$ indicates that item $j$ requires attribute $k$ for a correct response, and $q_{jk} = 0$ otherwise. Here, $J$ denotes the total number of items and $K$ denotes the total number of attributes.
Relationships among attributes can be independent or interrelated. These structural relationships are commonly represented using a reachability matrix $\mathbf{R}$:
$$\mathbf{R} = (r_{ij})_{K \times K}$$
If $r_{ij} = 1$, attribute $j$ is reachable from attribute $i$, i.e., attribute $i$ is a direct or indirect prerequisite of attribute $j$.
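As a concrete illustration, the following Python/NumPy sketch builds a toy Q-matrix and a reachability matrix; the values are illustrative only and do not come from the paper's experiments.

```python
import numpy as np

# Toy Q-matrix for J = 4 items and K = 3 attributes:
# Q[j, k] = 1 means item j requires attribute k.
Q = np.array([
    [1, 0, 0],   # item 1 measures attribute 1 only
    [0, 1, 1],   # item 2 measures attributes 2 and 3
    [1, 1, 0],
    [1, 1, 1],
])

# Reachability matrix R over the K attributes, assuming a simple linear
# hierarchy 1 -> 2 -> 3 (diagonal entries set to 1 by convention):
# R[i, j] = 1 means attribute j is reachable from attribute i.
R = np.array([
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
])

J, K = Q.shape   # J = 4, K = 3
```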
2.2 Cognitive Diagnosis Models
The design of the neural network in this study is inspired by traditional CDMs. In the DINA model, the ideal response indicates whether learner $i$ has mastered every attribute required by item $j$:
$$\eta_{ij} = \prod_{k=1}^K \alpha_{ik}^{q_{jk}}$$
and the probability of a correct response incorporates the item's slip parameter $s_j$ and guessing parameter $g_j$:
$$P(y_{ij} = 1 \mid \alpha_i) = (1 - s_j)^{\eta_{ij}} \, g_j^{1 - \eta_{ij}}$$
In the Generalized DINA (G-DINA) framework, the model decomposes the contribution of each attribute into a baseline probability $\beta_{j0}$, main effects $\beta_{jk}$, and interaction effects $\beta_{jk,k'}$. The actual number of attributes required by item $j$ is $K_j^* = \sum_{k=1}^K q_{jk}$.
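To make the response function concrete, here is a minimal NumPy sketch of the DINA probability above; the array shapes and parameter names are our own conventions, not code from the paper.

```python
import numpy as np

def dina_prob(alpha, Q, slip, guess):
    """DINA probability of a correct response for every learner-item pair.

    alpha: (N, K) binary mastery matrix; Q: (J, K) binary Q-matrix;
    slip, guess: (J,) item parameters.
    """
    # Ideal response eta[i, j] = prod_k alpha[i, k] ** Q[j, k]:
    # 1 iff learner i masters every attribute item j requires.
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(float)
    # P(y = 1) = (1 - s_j)^eta * g_j^(1 - eta)
    return (1 - slip) ** eta * guess ** (1 - eta)
```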
3 The Proposed MCNN-CD Model
3.1 Interaction Relationship Matrix
When no hierarchical dependencies between attributes are specified, there exist $\binom{K}{2}$ potential pairwise interaction relationships among $K$ attributes. We employ an interaction relationship matrix $\mathbf{Q}^{\#}$ to denote these mutual relationships:
$$\mathbf{Q}^{\#} = (q_{mk}^{\#})_{M \times K}$$
where $M = \binom{K}{2}$ and $q_{mk}^{\#} = 1$ indicates that attribute $k$ participates in interaction $m$. The specific interactions for a given item are determined by combining the Q-matrix with the interaction relationship matrix:
$$q_{jm}^{*} = \prod_{k=1}^K q_{jk}^{\,q_{mk}^{\#}}$$
so $q_{jm}^{*} = 1$ exactly when item $j$ requires every attribute involved in interaction $m$. The resulting interactive Q-matrix is $\mathbf{Q}^{*} = (q_{jm}^{*})_{J \times M}$.
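A minimal sketch of this construction, under our reading of the formula above (pairwise interactions, with $q_{mk}^{\#}$ acting as an exponent/mask):

```python
import numpy as np
from itertools import combinations

def interaction_matrices(Q):
    """Derive Q# (interaction relationship matrix) and Q* (interactive
    Q-matrix) from a binary Q-matrix, restricted to pairwise interactions."""
    J, K = Q.shape
    pairs = list(combinations(range(K), 2))        # M = C(K, 2) pairs
    Qh = np.zeros((len(pairs), K), dtype=int)      # Q#: (M, K)
    for m, (k1, k2) in enumerate(pairs):
        Qh[m, k1] = Qh[m, k2] = 1
    # q*_jm = 1 iff item j requires both attributes of interaction m
    Qstar = np.all(Q[:, None, :] >= Qh[None, :, :], axis=2).astype(int)
    return Qh, Qstar                               # shapes (M, K), (J, M)
```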
3.2 Matrix-Constrained Neural Architecture
The proposed model decomposes the probability of a correct response into a baseline intercept, main effects, and interaction effects. The network consists of two computational streams:
Main Effects Stream:
The hidden layer $\mathbf{M}$ is calculated as:
$$\mathbf{M} = \text{ReLU}(\mathbf{X}(\mathbf{Q} \odot \mathbf{W}_m) + \mathbf{b}_m)$$
where $\mathbf{X}$ is the $N \times J$ response matrix and $\odot$ denotes the Hadamard product; masking the weights with $\mathbf{Q}$ ensures the network structure is constrained by the Q-matrix.
Interaction Effects Stream:
To avoid the exponential parameter growth of the G-DINA model, we use the interactive Q-matrix $\mathbf{Q}^{*}$:
$$\mathbf{I}_1 = \tanh(\mathbf{X}(\mathbf{Q}^{*} \odot \mathbf{W}_{i1}) + \mathbf{b}_{i1})$$
$$\mathbf{I}_2 = \tanh(\mathbf{I}_1 \mathbf{W}_{i2} + \mathbf{b}_{i2})$$
The final predicted mastery $\hat{\mathbf{A}}$ is obtained via:
$$\hat{\mathbf{A}} = \sigma((\mathbf{M} + \mathbf{I}_2) \mathbf{W}_o + \mathbf{b}_o)$$
The loss function is defined as the mean squared error (MSE) between the predicted mastery $\hat{\mathbf{A}}$ and the true attribute mastery labels $\mathbf{A}$:
$$\mathcal{L}(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \sum_{k=1}^{K} (\hat{a}_{ik} - a_{ik})^2$$
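Putting the two streams together, the following PyTorch sketch implements the masked architecture defined by the equations above; the weight names mirror the notation, while initialization and other details are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class MCNNCD(nn.Module):
    """Dual-stream network of Sec. 3.2: Q-masked main effects plus
    Q*-masked interaction effects (a sketch, not the authors' code)."""

    def __init__(self, Q, Qstar):
        super().__init__()
        J, K = Q.shape
        M = Qstar.shape[1]
        # Fixed binary masks derived from the (interactive) Q-matrix
        self.register_buffer("Q", torch.as_tensor(Q, dtype=torch.float32))
        self.register_buffer("Qstar", torch.as_tensor(Qstar, dtype=torch.float32))
        # Main-effects stream: J -> K, weights masked by Q
        self.Wm = nn.Parameter(0.1 * torch.randn(J, K))
        self.bm = nn.Parameter(torch.zeros(K))
        # Interaction stream: J -> M -> K, first layer masked by Q*
        self.Wi1 = nn.Parameter(0.1 * torch.randn(J, M))
        self.bi1 = nn.Parameter(torch.zeros(M))
        self.Wi2 = nn.Parameter(0.1 * torch.randn(M, K))
        self.bi2 = nn.Parameter(torch.zeros(K))
        # Output layer producing the K mastery probabilities
        self.Wo = nn.Parameter(0.1 * torch.randn(K, K))
        self.bo = nn.Parameter(torch.zeros(K))

    def forward(self, X):                                        # X: (N, J)
        Mh = torch.relu(X @ (self.Q * self.Wm) + self.bm)        # (N, K)
        I1 = torch.tanh(X @ (self.Qstar * self.Wi1) + self.bi1)  # (N, M)
        I2 = torch.tanh(I1 @ self.Wi2 + self.bi2)                # (N, K)
        return torch.sigmoid((Mh + I2) @ self.Wo + self.bo)      # A_hat

# MSE loss of the equation above, given mastery labels A of shape (N, K):
# loss = ((model(X) - A) ** 2).sum() / (2 * X.shape[0])
```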
4 Transfer Learning Strategy
4.1 Domain Adaptation
Transfer learning addresses the scarcity of labeled data by transferring knowledge from a source domain $\mathcal{D}_s$ to a target domain $\mathcal{D}_t$. We define a domain as $\mathcal{D} = \{\mathcal{X}, P(X)\}$. In cognitive diagnosis, we often face scenarios where $\mathcal{X}_s \neq \mathcal{X}_t$ or the distributions $P_s(X) \neq P_t(X)$.
4.2 Pre-training and Fine-tuning
We employ a model-based transfer learning approach:
1. Pre-training: Train the network on large-scale simulated datasets generated by parametric CDMs (e.g., G-DINA) to learn general feature mappings.
2. Fine-tuning: Transfer the parameters of the main and interaction effect layers to the target domain network and refine them using limited real-world data.
$$\theta_t^{*} = \arg\min_{\theta} \mathcal{L}(\mathcal{D}_t, \theta), \qquad \theta^{(0)} = \theta_s$$
where the target-domain parameters are initialized from the source-domain parameters $\theta_s$ before optimization.
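A compact sketch of the two-stage scheme, reusing the MCNNCD module above. The tensors below are random placeholders standing in for G-DINA-simulated source data and scarce real target data; all sizes and hyperparameters are illustrative assumptions.

```python
import torch

# Placeholder data (random stand-ins, not G-DINA simulations)
N_sim, N_real, J, K, M = 5000, 300, 30, 5, 10
Q = torch.randint(0, 2, (J, K)).float()
Qstar = torch.randint(0, 2, (J, M)).float()
X_sim, A_sim = torch.rand(N_sim, J).round(), torch.rand(N_sim, K).round()
X_real, A_real = torch.rand(N_real, J).round(), torch.rand(N_real, K).round()

# 1. Pre-training on the large simulated source domain
model = MCNNCD(Q, Qstar)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ((model(X_sim) - A_sim) ** 2).sum() / (2 * N_sim)
    loss.backward()
    opt.step()

# 2. Fine-tuning: initialize the target network from the pre-trained
# parameters (theta_s) and refine with a smaller learning rate
target = MCNNCD(Q, Qstar)
target.load_state_dict(model.state_dict())
opt_t = torch.optim.Adam(target.parameters(), lr=1e-4)
for _ in range(50):
    opt_t.zero_grad()
    loss = ((target(X_real) - A_real) ** 2).sum() / (2 * N_real)
    loss.backward()
    opt_t.step()
```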
5 Simulation Study
5.1 Design
We evaluated the model across three Q-matrix specifications $(K, J)$: $(3, 15)$, $(5, 30)$, and $(8, 40)$. Item quality was determined by the slip and guessing parameters, drawn from $U(0.05, 0.15)$ for high-quality items and $U(0.15, 0.25)$ for low-quality items. Attribute mastery patterns were generated from a multivariate normal distribution with pairwise correlations set to 0.5.
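A sketch of the data-generating step, assuming the common convention of dichotomizing the latent normal draws at 0 (the paper does not state its exact thresholds):

```python
import numpy as np

def simulate_mastery(N, K, rho=0.5, seed=0):
    """Draw binary attribute patterns from a multivariate normal with
    pairwise correlation rho, dichotomized at 0 (threshold is an assumption)."""
    rng = np.random.default_rng(seed)
    cov = np.full((K, K), rho) + (1 - rho) * np.eye(K)   # unit variance, corr rho
    z = rng.multivariate_normal(np.zeros(K), cov, size=N)
    return (z > 0).astype(int)                           # (N, K)

alpha = simulate_mastery(N=1000, K=5)
# High-quality items: slip and guess ~ U(0.05, 0.15); low quality: U(0.15, 0.25)
rng = np.random.default_rng(1)
slip = rng.uniform(0.05, 0.15, size=30)
guess = rng.uniform(0.05, 0.15, size=30)
```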
5.2 Evaluation Metrics
We utilized the following metrics:
- RMSE: Root Mean Square Error of predicted probabilities.
- AMR (Attribute Match Ratio): $\frac{1}{NK} \sum_{i=1}^{N} \sum_{k=1}^{K} I(\hat{\alpha}_{ik} = \alpha_{ik})$
- PMR (Pattern Match Ratio): $\frac{1}{N} \sum_{i=1}^{N} I(\hat{\boldsymbol{\alpha}}_i = \boldsymbol{\alpha}_i)$
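Minimal implementations of the three metrics (a sketch; aggregation conventions may differ slightly from the paper's):

```python
import numpy as np

def rmse(p_hat, y):
    """Root mean square error between predicted probabilities and outcomes."""
    return float(np.sqrt(np.mean((p_hat - y) ** 2)))

def amr(alpha_hat, alpha):
    """Attribute Match Ratio: element-wise agreement over all N*K entries."""
    return float(np.mean(alpha_hat == alpha))

def pmr(alpha_hat, alpha):
    """Pattern Match Ratio: share of learners whose full pattern matches."""
    return float(np.mean(np.all(alpha_hat == alpha, axis=1)))
```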
5.3 Results
- Prediction Error: The proposed model consistently achieved lower RMSE scores than traditional parametric and non-parametric models. It demonstrated superior robustness to low-quality data and small sample sizes.
- Classification Accuracy: Neural network models exhibited stronger robustness to item quality. While parametric models were heavily influenced by the number of attributes, the proposed model maintained stable performance.
- Pattern Consistency: The proposed model led other models by approximately 0.15 on the PMR metric in low-quality datasets, highlighting its ability to capture latent cognitive patterns in noisy environments.
6 Empirical Analysis
We applied the model to the Fraction Subtraction dataset (536 samples, 8 attributes). Using the G-DINA model results as a baseline, we tested the model's consistency.
The proposed model outperformed all other neural network-based methods and parametric models across various sub-sample sizes. On the PMR(K) metric (allowing one attribute error), the model maintained a significant advantage, particularly when the sample size was small ($N=500$).
7 Discussion and Conclusion
7.1 Advantages
- Interpretability and Adaptability: The dual-stream architecture (main and interaction effects) provides semantic clarity. The use of matrix constraints allows the network to adapt to any Q-matrix without manual hyperparameter tuning.
- Data Efficiency: The transfer learning scheme effectively addresses the "cold start" problem in cognitive diagnosis, allowing high-precision diagnosis even with limited labeled data.
- Robustness: The model is significantly less sensitive to item quality and attribute scale compared to traditional psychometric models.
7.2 Limitations and Future Work
The pre-training phase introduces computational overhead, though the fine-tuning and prediction phases are highly efficient. Future research will explore:
- Extending the framework to polytomous scoring models.
- Incorporating process data (e.g., response times or sequences) to further reduce sensitivity to guessing and slipping.
- Integrating knowledge graphs to refine the interaction relationship matrix.
In conclusion, the Matrix-Constrained Neural Network Cognitive Diagnosis (MCNN-CD) method combined with transfer learning provides a robust, interpretable, and highly accurate framework for personalized educational assessment.