Abstract

Objective: This study integrates the idea of the Qc matrix with the generalized nonparametric classification method (GNPC) and extends GNPC to a generalized sequential nonparametric classification method (seq-GNPC) that can be applied to graded scoring items, focusing on two problems: saturated models discriminate poorly under small samples, and parametric models cannot be estimated in small samples.

Methods: The study includes simulation studies and empirical research. The simulation conditions are as follows: the number of attributes K is set to 3 or 5, and the number of items J is set to 20 or 40. The proportion of graded scoring items is fixed at 50%. Two types of Qc matrices were considered: restricted and unrestricted.

Results: The simulation results show that seq-GNPC outperforms the parametric method in small samples when the data pattern conforms to the saturated model. When the data pattern conforms to the reduced model, the parametric model frequently fails to converge in small samples, whereas seq-GNPC guarantees a 100% estimation rate. The empirical results also show that seq-GNPC is more stable than seq-GDINA in the small-sample case, with higher attribute-pattern retest rates and smaller standard deviations.

Limitations: This study counted the number of replicates for which the parametric model could be estimated under each condition, but it did not examine in depth the specific circumstances under which the parametric model fails to produce a diagnostic classification at each sample size; this needs further exploration.

Conclusions: The seq-GNPC method proposed in this paper has good applicability in small samples and can effectively solve the problem of diagnostic evaluation of graded or mixed scoring items in small samples.
A generalized nonparametric classification method for small samples using the Qc matrix

Xingyu CHEN, Yakun YAN, Chunhua KANG

Zhejiang Philosophy and Social Science Laboratory for the Mental Health and Crisis Intervention of Children and Adolescents, Zhejiang Normal University, Jinhua, 321004, China; Jinhua Education College, Jinhua, 321004

Author Contribution Statement: Xingyu CHEN was responsible for generating simulated data, collecting empirical data, analyzing the data, and drafting the paper. Yakun YAN was responsible for revising the final version. Chunhua KANG proposed the research ideas and designed the research schemes. This research was supported by the Humanities and Social Sciences Fund of the Ministry of Education (No. 22YJA190005) and the key open fund from the Zhejiang Philosophy and Social Science Laboratory for the Mental Health and Crisis Intervention of Children and Adolescents, PR China (No. 23MHCICAZD04).
Keywords

generalized nonparametric classification method; nonparametric classification method; graded scoring; Qc matrix; GDINA
1. Introduction

In recent years, educational assessment has become an increasingly significant area of research, and diagnostic educational assessment modes, such as cognitive diagnostic assessment (CDA), play a vital role in this process. CDA is a key component of the new generation of psychometric theories, which focuses on understanding the cognitive processing of an individual (Wang & Gierl, 2011). Through CDA, it is possible to gain a deep understanding of an individual's cognitive structure and insight into their strengths and weaknesses in cognitive attributes or skill mastery; this approach is therefore particularly well suited for diagnostic educational assessment (Guo et al., 2024). Early cognitive diagnostic assessment methods were primarily parametric, with parametric models containing item and structural parameters that follow a particular distribution and require parameter estimation methods. For example, for dichotomous scoring items, there are the deterministic inputs, noisy "and" gate (DINA) model (Junker & Sijtsma, 2001), the deterministic inputs, noisy "or" gate (DINO) model (Templin & Henson, 2006), and the generalized DINA (GDINA) model (de la Torre, 2011). Because dichotomous items are only scored (1) or not scored (0), they provide limited information; to meet the practical needs of assessment, the importance of graded scoring items in testing is gradually increasing. Graded scoring items refer to items with three or more scoring levels (e.g., items scored 0, 1, and 2). For graded scoring items, there are the General Diagnostic Model (GDM; Davier, 2008), the sequential generalized DINA model (seq-GDINA; Ma & de la Torre, 2016), the DINA model for graded data (Tu et al., 2018), and polytomous models (Gao et al., 2021), among others.
In recent studies, some researchers have proposed cognitive diagnostic models for individual multiple-strategy problem solving (Liao & Jiao, 2023; Ma, 2019; Ma & Guo, 2019; Wei et al., 2023). In addition, other researchers have applied machine learning to cognitive diagnosis to construct machine-learning-based cognitive diagnostic assessment methods (Cuerda et al., 2022; Gao et al., 2022; Li et al., 2022; Zhang et al., 2023). Most of the graded scoring cognitive diagnostic assessment methods above are based on the Q matrix: only the relationship between items and attributes is considered, while the relationship between each category and the attributes is not taken into account. The seq-GDINA model was proposed to solve the problem of discriminating sequential response data; it introduces a Qc matrix that indicates the sequential relationship between attributes and item categories and can specify the relationship between each category and the attributes under graded scoring. By configuring the Qc matrix, the model can distinguish between sequential and non-sequential graded scoring data.
As a result, the model has been applied in many contexts, including the construction of second language assessments (Yuan et al., 2022), the diagnosis of English writing proficiency (Shi et al., 2024), and the diagnosis of mathematical arithmetic examinations (Saso et al., 2023). Responses in all categories of all graded scoring items must be present for the seq-GDINA model to be used in practice (Ma & de la Torre, 2020), a condition that is easily met in large samples but not necessarily in small samples.
It is well known that the purpose of cognitive diagnostic assessment is to identify or diagnose the strengths or deficiencies of a student's (individual's) knowledge structure, providing information and a basis for teachers' teaching or students' self-study; thus, daily educational assessment mainly occurs in small-sample contexts such as schools and classes (Chiu et al., 2018). However, in small-sample situations, commonly used parametric estimation methods such as Marginal Maximum Likelihood Estimation with Expectation Maximization (MMLE-EM) may encounter boundary problems (Sorrel et al., 2023), which can reduce estimation effectiveness. The classification accuracy of nonparametric cognitive diagnostic methods is less affected by sample size and remains good in small samples; methods with these advantages include the nonparametric classification method (NPC) for 0-1 scoring proposed by Chiu and Douglas (2013) and the subsequently proposed generalized nonparametric classification method (GNPC; Chiu et al., 2018).
However, these methods do not consider sequential response data or the discrimination of generalized data under graded scoring. Therefore, this study applies the Qc matrix of seq-GDINA to GNPC to construct a generalized sequential nonparametric classification method (seq-GNPC) that can be applied to small-sample sequential response data, in order to meet the practical needs of diversified data types and scoring modes in small-sample assessment contexts.
Some of the methodological details involved in this study are presented in the Technical Background and Methodology Presentation sections, followed by the Simulation and Empirical Studies sections, where we use simulated and empirical data to test and compare these models.
Finally, in the Discussion and Conclusion section, we summarize the results of the study and their implications in practice.
2. Technical Background
2.1 Cognitive Diagnosis Models
The GDINA model is a generalized dichotomous scoring model proposed by de la Torre (2011), which takes into account the main effects of attributes as well as the interaction effects between attributes:
$$P(X_j = 1 \mid \boldsymbol{\alpha}_{lj}^{*}) = \delta_{j0} + \sum_{k=1}^{K_j^{*}} \delta_{jk}\alpha_{lk} + \sum_{k=1}^{K_j^{*}-1}\sum_{k'=k+1}^{K_j^{*}} \delta_{jkk'}\alpha_{lk}\alpha_{lk'} + \cdots + \delta_{j12\cdots K_j^{*}} \prod_{k=1}^{K_j^{*}} \alpha_{lk} \quad \text{(Equation 1)}$$
$\delta_{j0}$ is the intercept term of item $j$, that is, the guessing probability of item $j$; $\delta_{jk}$ is the main effect of attribute $k$ on item $j$, which is the change in the probability of answering correctly after mastering this attribute alone; $\delta_{jkk'}$ is the interaction between attribute $k$ and attribute $k'$, representing the change in the probability of answering correctly when both attributes are mastered simultaneously; and $\delta_{j12\cdots K_j^{*}}$ is the interaction among attributes 1 through $K_j^{*}$, that is, the change in the probability of answering correctly when all the attributes examined by item $j$ are mastered. Generalized models are also known as saturated models; saturated models take into account the full set of effects between attributes. Corresponding to them are the reduced models.
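As an illustration of Equation 1, the following standalone Python sketch evaluates a GDINA-type success probability for a hypothetical two-attribute item. The function name, the dictionary encoding of the $\delta$ parameters, and the numeric values are our own illustrative choices, not part of the original study (which used R):

```python
from itertools import product

def gdina_irf(delta, alpha_star):
    """Success probability for one item under the GDINA model (Equation 1).

    delta      -- dict mapping attribute subsets (tuples of 0-based indices)
                  to effect sizes, e.g. {(): intercept, (0,): main effect of
                  attribute 1, (0, 1): two-way interaction, ...}
    alpha_star -- tuple of 0/1 mastery indicators for the attributes
                  the item measures.
    """
    p = 0.0
    for subset, effect in delta.items():
        # A term contributes only if every attribute in the subset is mastered.
        if all(alpha_star[k] == 1 for k in subset):
            p += effect
    return p

# Hypothetical item measuring two attributes: intercept .1,
# main effects .2 and .3, two-way interaction .3.
delta = {(): 0.1, (0,): 0.2, (1,): 0.3, (0, 1): 0.3}
for alpha in product([0, 1], repeat=2):
    print(alpha, gdina_irf(delta, alpha))
```

Mastering neither attribute yields only the intercept (the guessing probability), while mastering both accumulates every effect, mirroring the saturated structure of the model.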
The reduced model constrains some of the effects and is simpler; the DINA model is the classical reduced model, obtainable by setting all parameters in the GDINA model to 0 except the intercept and the highest-order interaction. The DINA model is a conjunctive model: an individual needs to master all the attributes examined by the item in order to score. Its item response function (IRF) is:
$$P(X_{ij} = 1 \mid \eta_{ij}) = (1 - s_j)^{\eta_{ij}}\, g_j^{\,1-\eta_{ij}} \quad \text{(Equation 2)}$$

where $\eta_{ij}$ is the latent response variable, $q_{jk}$ is the entry for the $k$th attribute of item $j$ (i.e., row $j$, column $k$ of the Q matrix: 1 if examined, 0 if not), and $\alpha_{ik}$ is the $k$th attribute in the attribute mastery pattern of individual $i$ (1 if mastered, 0 if not). $\eta_{ij}$ is defined as follows:

$$\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{\,q_{jk}} \quad \text{(Equation 3)}$$
$s_j$ is the slip probability of item $j$, and $g_j$ is the guessing probability of item $j$; they are defined by combining the individual's latent response variable on item $j$ with the observed response values:

$$s_j = P(X_{ij} = 0 \mid \eta_{ij} = 1) \quad \text{(Equation 4)}$$

$$g_j = P(X_{ij} = 1 \mid \eta_{ij} = 0) \quad \text{(Equation 5)}$$
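The DINA IRF in Equations 2-3 can be sketched in a few lines of Python; the function names and parameter values below are illustrative only:

```python
def dina_eta(alpha, q_row):
    # Latent response (Equation 3): eta = 1 iff the individual masters
    # every attribute the item requires (q entries of 1).
    return int(all(a >= q for a, q in zip(alpha, q_row)))

def dina_irf(alpha, q_row, s, g):
    # IRF (Equation 2): P(X=1) = (1-s)^eta * g^(1-eta),
    # i.e. 1-s for masters of all required attributes, g otherwise.
    eta = dina_eta(alpha, q_row)
    return (1 - s) if eta == 1 else g

# Hypothetical item requiring attributes 1 and 2, slip .1, guess .2
print(dina_irf((1, 1, 0), (1, 1, 0), 0.1, 0.2))   # individual masters both
print(dina_irf((1, 0, 0), (1, 1, 0), 0.1, 0.2))   # one attribute missing
```

Because the DINA model is conjunctive, missing even one required attribute collapses the success probability to the guessing level $g$.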
In contrast to the conjunctive DINA model, the DINO model is a typical disjunctive model, in which an individual can score by mastering at least one of the attributes examined by the item. Its latent response variable is defined as:

$$\eta_{ij} = 1 - \prod_{k=1}^{K} (1 - \alpha_{ik})^{\,q_{jk}} \quad \text{(Equation 6)}$$
The computation of $s_j$ and $g_j$ for the DINO model is the same as for the DINA model; only the computation of the latent response variable differs. The DINO model can also be obtained by constraining the parameters of the GDINA model.
Ma and de la Torre (2016) proposed the sequential generalized deterministic inputs, noisy "and" gate (seq-GDINA) model based on GDINA to account for sequential responses in graded scoring data. seq-GDINA modifies the traditional Q matrix. The traditional Q matrix indicates which attributes are examined by each item (1 for examined, 0 for not examined). For graded responses, a category-level Q matrix, called the Qc matrix, is proposed; it has one row per item category, with $H_j$ being the highest category of item $j$, and each row has K elements indicating which attributes are examined by that category. When the relationship between the attributes and the categories is clear, a restricted Qc matrix can be set up, in which the attributes examined by each category of an item are explicitly labeled (see the example table). In practical applications, the relationship between categories and attributes cannot always be determined; when this relationship is unclear, an unrestricted Qc matrix can be used, in which all categories of the same item examine the same attributes (see Item 2 and Item 3 in the example table). If all items are 0-1 scoring items, the Qc matrix is equivalent to the traditional Q matrix. Although response categories are assumed to be obtained sequentially, different categories need not measure different attributes, nor must the attributes follow any particular structure.
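The distinction between the two Qc matrix types can be made concrete with a small sketch. The item labels and attribute assignments below are hypothetical, chosen only to mirror the restricted/unrestricted contrast described above:

```python
# Qc matrix rows are (item, category) pairs; columns are K = 3 attributes.
# Restricted: each category of a graded item is tagged with its own
# attributes (here, different attributes per category).
restricted_qc = {
    ("item1", 1): [1, 0, 0],   # category 1 examines attribute 1
    ("item1", 2): [0, 1, 0],   # category 2 examines attribute 2
    ("item1", 3): [0, 0, 1],   # category 3 examines attribute 3
}

# Unrestricted: every category of an item shares the item's attribute vector.
unrestricted_qc = {
    ("item2", 1): [1, 1, 0],
    ("item2", 2): [1, 1, 0],
    ("item2", 3): [1, 1, 0],
}

# A 0-1 item contributes a single row, i.e., an ordinary Q-matrix row.
dichotomous_row = {("item3", 1): [0, 1, 1]}
```

With only dichotomous items, the category index is always 1 and the Qc matrix collapses to the traditional Q matrix.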
Based on the Qc matrix, seq-GDINA splits a graded scoring item into category-level processing functions:

$$S_j(b \mid \boldsymbol{\alpha}_{ljb}^{*}) = \delta_{jb0} + \sum_{k=1}^{K_{jb}^{*}} \delta_{jbk}\alpha_{lk} + \sum_{k=1}^{K_{jb}^{*}-1}\sum_{k'=k+1}^{K_{jb}^{*}} \delta_{jbkk'}\alpha_{lk}\alpha_{lk'} + \cdots + \delta_{jb12\cdots K_{jb}^{*}} \prod_{k=1}^{K_{jb}^{*}} \alpha_{lk} \quad \text{(Equation 7)}$$
The processing function $S_j(b \mid \boldsymbol{\alpha})$ is the key to the seq-GDINA model. The following assumptions are included:

$$S_j(0 \mid \boldsymbol{\alpha}) = 1, \qquad S_j(H_j + 1 \mid \boldsymbol{\alpha}) = 0 \quad \text{(Equation 8)}$$

where $H_j$ is the maximum grade of item $j$. Based on the processing of category $b$, the IRF of item $j$ is:

$$P(X_j = h \mid \boldsymbol{\alpha}_c) = \left[1 - S_j(h+1 \mid \boldsymbol{\alpha}_c)\right] \prod_{x=0}^{h} S_j(x \mid \boldsymbol{\alpha}_c) \quad \text{(Equation 9)}$$

At the same time, it is constrained as follows:

$$\sum_{h=0}^{H_j} P(X_j = h \mid \boldsymbol{\alpha}_c) = 1 \quad \text{(Equation 10)}$$

$P(X_j = h \mid \boldsymbol{\alpha}_c)$ is the probability that an individual whose knowledge state (KS) is $\boldsymbol{\alpha}_c$ scores $h$ on item $j$, and the probabilities of scoring in categories 0 through $H_j$ sum to 1. Because the seq-GDINA model allows different cognitive processes to be modeled in different categories within a single item, it is able to parameterize each category separately.
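Equations 9-10 can be checked numerically. The sketch below (our own Python illustration, with hypothetical processing-step probabilities) converts processing functions into category probabilities and shows that they sum to one:

```python
def category_probs(S):
    """Category probabilities from processing functions (Equations 9-10).

    S -- list [S(1), ..., S(H)] of processing-step success probabilities
         for one item and one attribute pattern; S(0) = 1 and S(H+1) = 0
         are implied by the model's constraints (Equation 8).
    """
    H = len(S)
    S_full = [1.0] + list(S) + [0.0]    # prepend S(0)=1, append S(H+1)=0
    probs = []
    for h in range(H + 1):
        prod = 1.0
        for x in range(h + 1):          # product of S(0) .. S(h)
            prod *= S_full[x]
        probs.append((1.0 - S_full[h + 1]) * prod)
    return probs

# Hypothetical item with H = 2: S(1) = .8, S(2) = .5
p = category_probs([0.8, 0.5])
print(p)            # P(X=0), P(X=1), P(X=2); sums to 1
```

An individual reaches category $h$ only by succeeding at every earlier step, and stops there by failing step $h+1$, which is exactly what the product form of Equation 9 encodes.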
As with the GDINA model, the seq-DINA and seq-DINO models can be obtained by constraining the parameters; the seq-DINA model is obtained by setting all parameters of the processing functions to zero except the intercept and the highest-order interaction.

2.2 Nonparametric Methods

Chiu and Douglas (2013) proposed a nonparametric discriminant method based on the Hamming distance (HD), which uses the distance between the observed response pattern (ORP) and the ideal response pattern to determine an individual's attribute mastery pattern for 0-1 scoring items.
First, define the ideal response $\eta_{ij}$ of individual $i$ on item $j$. Since there are $2^K$ attribute mastery patterns in total, there are correspondingly $2^K$ ideal responses on each item. Let $d(\boldsymbol{y}_i, \boldsymbol{\eta}_m)$ be the distance between individual $i$'s observed response pattern across all items and the ideal response pattern under the $m$th attribute mastery pattern. For 0-1 scoring items, a widely used and natural distance metric for clustering is the Hamming distance, which simply counts the number of positions at which the two vectors disagree:

$$d_H(\boldsymbol{y}, \boldsymbol{\eta}) = \sum_{j=1}^{J} \lvert y_j - \eta_j \rvert \quad \text{(Equation 11)}$$

The individual's attribute mastery pattern is taken to be the pattern whose ideal response pattern minimizes $d_H(\boldsymbol{y}, \boldsymbol{\eta})$ for the individual's observed response pattern.
However, since the variation produced by each item response is not the same, a weighted Hamming distance can be used:

$$d_{wH}(\boldsymbol{y}, \boldsymbol{\eta}) = \sum_{j=1}^{J} \frac{1}{\bar{p}_j (1 - \bar{p}_j)} \lvert y_j - \eta_j \rvert \quad \text{(Equation 12)}$$

where $\bar{p}_j$ is the score rate for item $j$. The penalized Hamming distance method is constructed by adding penalty terms to the original Hamming distance:

$$d_{gs}(\boldsymbol{y}, \boldsymbol{\eta}) = \sum_{j=1}^{J} \left[ g\, I(y_j = 1, \eta_j = 0) + s\, I(y_j = 0, \eta_j = 1) \right] \quad \text{(Equation 13)}$$

where $g$ is the assigned guessing weight and $s$ is the assigned slip weight. Considering differences between items, $g$ and $s$ can also be replaced by $g_j$ and $s_j$, so that each item has its own guessing and slip weights. When $g = s$, the penalized Hamming distance method is equivalent to the Hamming distance method; more weight is given to guessing when $g > s$, and more weight is given to slipping when $s > g$. NPC can be applied to data conforming to conjunctive or disjunctive models such as DINA and DINO.
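A minimal Python sketch of NPC classification under the plain Hamming distance (Equation 11) follows; it assumes conjunctive (DINA-type) ideal responses, and the Q matrix and response vector are hypothetical:

```python
from itertools import product

def ideal_response(alpha, Q):
    # Conjunctive (DINA-type) ideal response pattern for attribute
    # pattern alpha: 1 iff all required attributes are mastered.
    return [int(all(a >= q for a, q in zip(alpha, row))) for row in Q]

def npc_classify(y, Q, K):
    """Return the attribute pattern whose ideal response pattern is
    closest to y in Hamming distance (Equation 11); ties keep the
    first pattern encountered."""
    best, best_d = None, None
    for alpha in product([0, 1], repeat=K):
        eta = ideal_response(alpha, Q)
        d = sum(abs(yi - ei) for yi, ei in zip(y, eta))
        if best_d is None or d < best_d:
            best, best_d = alpha, d
    return best

# Hypothetical 4-item test measuring K = 2 attributes
Q = [[1, 0], [0, 1], [1, 1], [1, 0]]
print(npc_classify([1, 0, 0, 1], Q, K=2))   # matches pattern (1, 0) exactly
```

The weighted and penalized variants (Equations 12-13) would only change the distance expression inside the loop; the search over the $2^K$ patterns is identical.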
With the development of generalized models, Chiu et al. (2018) extended NPC to GNPC, aiming to solve the problem of poor classification by generalized parametric models under small samples. The method maintains good discrimination in small samples, such as in the classroom, and can accommodate both conjunctive and disjunctive response data:
$$\eta_{jc}^{(w)} = w_{jc}\,\eta_{jc}^{(c)} + (1 - w_{jc})\,\eta_{jc}^{(d)} = \eta_{jc}^{(d)} + w_{jc}\left(\eta_{jc}^{(c)} - \eta_{jc}^{(d)}\right) \quad \text{(Equation 14)}$$
If item $j$ examines $K_j^{*}$ attributes ($K_j^{*} \le K$), then item $j$ can only distinguish $2^{K_j^{*}}$ latent classes rather than $2^K$, because attributes other than those examined by the item provide no information; the latent classes involved in the item can therefore be collapsed. Suppose $K = 3$ and item $j$ examines attributes 1 and 2, so $K_j^{*} = 2$. There are $2^3 = 8$ attribute mastery patterns, but since item $j$ does not examine attribute 3, there are $2^2 = 4$ collapsed classes: (1,0,0) and (1,0,1) belong to the same class; (0,1,0) and (0,1,1) belong to the same class; (1,1,0) and (1,1,1) belong to the same class; and (0,0,0) and (0,0,1) belong to the same class. Attribute mastery patterns in the same collapsed class have the same ideal response on item $j$. $\eta_{jc}^{(w)}$ is the weighted ideal response on item $j$ for an individual with collapsed latent class $c$; $\eta_{jc}^{(c)}$ denotes the ideal response under a conjunctive model (e.g., DINA), $\eta_{jc}^{(d)}$ denotes the ideal response under a disjunctive model (e.g., DINO), and $w_{jc}$ is the weight for item $j$ and collapsed latent class $c$, given by:
$$w_{jc} = \frac{\sum_{i \in C_c} \left(y_{ij} - \eta_{jc}^{(d)}\right)}{\lVert C_c \rVert \left(\eta_{jc}^{(c)} - \eta_{jc}^{(d)}\right)} \quad \text{(Equation 15)}$$
$\lVert C_c \rVert$ stands for the number of individuals whose attribute mastery pattern belongs to class $c$. A loss function is used in GNPC to represent the total distance between an individual's observed responses and the item-weighted ideal responses, and the attribute mastery pattern that minimizes the sum of the loss-function values over all items is taken as the individual's KS:

$$d_{jc} = \sum_{i \in C_c} \left(y_{ij} - \eta_{jc}^{(w)}\right)^2 = \sum_{i \in C_c} \left(y_{ij} - \eta_{jc}^{(d)} - w_{jc}\left(\eta_{jc}^{(c)} - \eta_{jc}^{(d)}\right)\right)^2 \quad \text{(Equation 16)}$$

$$d_{ic} = \sum_{j=1}^{J} \left(y_{ij} - \eta_{jc}^{(w)}\right)^2 \quad \text{(Equation 17)}$$

$$\hat{\boldsymbol{\alpha}}_i = \arg\min_{c \in \{1, \ldots, 2^K\}} d_{ic}$$

over all attribute mastery patterns.
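The weight of Equation 15 and the weighted ideal response of Equation 14 are easy to sketch; the toy numbers below are hypothetical:

```python
def gnpc_weight(y_col, eta_c, eta_d):
    """Least-squares weight for one item and one collapsed class
    (Equation 15).

    y_col -- observed 0/1 responses of the individuals currently assigned
             to the class; eta_c / eta_d -- conjunctive / disjunctive
             ideal responses (assumed to differ, otherwise the weight
             is undefined).
    """
    return sum(y - eta_d for y in y_col) / (len(y_col) * (eta_c - eta_d))

def weighted_ideal(w, eta_c, eta_d):
    # Equation 14: convex combination of the two ideal responses.
    return eta_d + w * (eta_c - eta_d)

# Class with eta_c = 1, eta_d = 0; three assigned individuals scored 1, 1, 0
w = gnpc_weight([1, 1, 0], 1, 0)
print(w, weighted_ideal(w, 1, 0))   # both equal 2/3
```

With $\eta^{(c)} = 1$ and $\eta^{(d)} = 0$, the weight reduces to the class mean of the observed responses, which is what makes the weighted ideal response minimize the squared loss in Equation 16.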
3.1 Methodology Presentation
GNPC is a cognitive diagnostic method for 0-1 scoring based on the Q matrix. In this paper, we apply the Qc matrix of seq-GDINA and its splitting idea to GNPC and extend GNPC to a generalized sequential nonparametric classification method (seq-GNPC) suitable for graded scoring. Certain restrictions are imposed in seq-GNPC: the conjunctive and disjunctive processing ideal responses $\eta_{jbc}^{(c)}$ and $\eta_{jbc}^{(d)}$ are defined piecewise for each category $b$ of item $j$ (Equations 18-21). The idea behind calculating the total seq-GNPC distance is to first calculate the processing distance for each category under each item separately and then sum them up (Equations 22-23). $y_{ijb}$ is the 0-1 pseudo-response of individual $i$ in category $b$ of item $j$. Assuming the highest category of item $j$ is 3 and individual $i$ scores 2 on item $j$, the individual's responses on the three categories of the item are $y_{ij1} = 1$, $y_{ij2} = 1$, and $y_{ij3} = 0$; if the individual scores 1 on item $j$, then $y_{ij1} = 1$ and both $y_{ij2}$ and $y_{ij3}$ are 0. The weight $w_{jbc}$ is derived by the following calculation:
$$w_{jbc} = \frac{\sum_{i \in C_{jc}} \left(y_{ijb} - \eta_{jbc}^{(d)}\right)}{\lVert C_{jc} \rVert \left(\eta_{jbc}^{(c)} - \eta_{jbc}^{(d)}\right)} \quad \text{(Equation 24)}$$
$$\eta_{jbc}^{(w)} = w_{jbc}\,\eta_{jbc}^{(c)} + (1 - w_{jbc})\,\eta_{jbc}^{(d)} = \eta_{jbc}^{(d)} + w_{jbc}\left(\eta_{jbc}^{(c)} - \eta_{jbc}^{(d)}\right) \quad \text{(Equation 25)}$$
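The category-splitting step that produces the pseudo-responses $y_{ijb}$ can be sketched directly from the example in the text (the function name is our own):

```python
def split_graded(score, H):
    """Split a graded score on an item with highest category H into H
    sequential 0/1 pseudo-responses: category b is 1 iff score >= b,
    following the paper's convention that all higher categories are 0."""
    return [1 if b <= score else 0 for b in range(1, H + 1)]

print(split_graded(2, 3))   # [1, 1, 0] -- scored 2 on a 0-3 item
print(split_graded(1, 3))   # [1, 0, 0]
```

Each pseudo-response column can then be treated as an ordinary 0-1 item, which is what allows the GNPC machinery (Equations 24-25) to be applied category by category.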
In GNPC, the classification results of NPC are used as the initial input. Therefore, this study likewise extends NPC to a sequential nonparametric classification method (seq-NPC) to serve as the input for the initial attribute mastery patterns of seq-GNPC in the case of graded scoring items. seq-NPC has the same constraints as seq-GNPC: $\eta_{jbc}$ is the processing ideal response of category $b$ (Equation 26). Using the idea of sequence splitting, the method splits the graded scoring categories into multiple 0-1 scoring categories, constrains them with the conditions above, uses NPC to calculate the distance between the individual's observed responses and the ideal responses, and finally sums them up:
The distance for each category, named the processing distance with reference to the processing function in seq-GDINA, can be computed in the three ways proposed for NPC (Equation 27): $d_{jb}(y_{ijb}, \eta_{jb})$ takes the form of the Hamming distance, the weighted Hamming distance, or the penalized Hamming distance given in Equations 11-13.
3.2.1 Simulation Study Design
The simulation study was divided into two parts: one set the data pattern to conform to the reduced model (seq-DINA), and the other set the data pattern to conform to the saturated model (seq-GDINA). The probability of a correct answer is more dispersed under the reduced model (e.g., in the DINA model it is only $g$ or $1 - s$), so it can be expected that the parametric model will fail to converge more often when estimating small samples whose data pattern follows the reduced model.
The number of attributes K is set to 3 or 5, and the number of items J is set to 20 or 40. The proportion of graded scoring items is fixed at 50%: when J = 20, there are 10 0-1 scoring items and 10 graded scoring items, and the maximum number of categories for a graded scoring item is set to 4 (0, 1, 2, and 3).
Two types of Qc matrices were considered: restricted and unrestricted. The Qc matrices were randomly generated using an R program. Under the restricted Qc matrix condition, the categories of a graded scoring item do not examine the same attributes, while under the unrestricted Qc matrix, the attributes examined by each category of an item are identical. The Qc matrices were tested for completeness (Köhn & Chiu, 2017). The sample size N was set to 30, 50, 100, and 500.
For data conforming to the seq-DINA model, three item qualities were set: high, medium, and low. Following the parameter settings of Ma and de la Torre (2016) and Chiu et al. (2018), when item quality is high, the slip and guessing parameters of the items are randomly drawn from a U(0, 0.1) distribution; when item quality is medium, from U(0, 0.2); and when item quality is low, from U(0, 0.3).
For data conforming to the seq-GDINA model, the item parameters were designed with reference to Chiu et al. (2018) and are described in the Appendix.
Two latent distributions of attribute mastery patterns were considered. Under the uniform distribution, each attribute mastery pattern has the same probability of being selected ($p = 1/2^K$). Under the multivariate normal distribution, the latent distribution of the attribute mastery patterns is set to follow $\mathrm{MVN}(\boldsymbol{0}, \boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma}$ is a covariance matrix with unit variances and common correlation $\rho$, and $\rho$ was set to 0.5. Latent continuous scores conforming to $\mathrm{MVN}(\boldsymbol{0}, \boldsymbol{\Sigma})$ are randomly generated, and each attribute $\alpha_{ik}$ is determined by thresholding the corresponding latent score. In this study, the GDINA package in R is used to fit the seq-GDINA model (GDINA function), and the seq-GNPC method is implemented in a self-programmed R routine.
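For illustration, correlated attribute patterns of this kind can be generated with a one-factor construction, which yields unit-variance scores with common correlation $\rho$ ($\theta_k = \sqrt{\rho}\,z_0 + \sqrt{1-\rho}\,z_k$). This is a Python sketch, not the study's R code, and the zero threshold is an assumption since the exact cut-offs are not given here:

```python
import math
import random

def mvn_attributes(N, K, rho, rng):
    """Draw N attribute patterns by thresholding equicorrelated
    standard-normal latent scores at zero (assumed threshold)."""
    patterns = []
    for _ in range(N):
        z0 = rng.gauss(0, 1)                 # shared factor
        theta = [math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0, 1)
                 for _ in range(K)]
        patterns.append([1 if t >= 0 else 0 for t in theta])
    return patterns

rng = random.Random(1)
alphas = mvn_attributes(1000, 3, 0.5, rng)
print(len(alphas), len(alphas[0]))
```

Because the latent scores are positively correlated, mastery of one attribute makes mastery of the others more likely, unlike the uniform condition where all $2^K$ patterns are equally probable.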
The pattern match rate (PMR) and average attribute match rate (AAMR) are used as evaluation metrics, calculated as follows:

$$\mathrm{PMR} = \frac{\sum_{i=1}^{N} I(\hat{\boldsymbol{\alpha}}_i = \boldsymbol{\alpha}_i)}{N}$$

$$\mathrm{AAMR} = \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} I(\hat{\alpha}_{ik} = \alpha_{ik})}{NK}$$
PMR measures the classification accuracy of a cognitive diagnostic method for whole attribute mastery patterns on each dataset, and AAMR measures its classification accuracy at the level of individual attributes.
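The two metrics can be computed as follows (a Python sketch with hypothetical patterns; the study's tooling was in R):

```python
def pmr(est, true):
    """Pattern match rate: proportion of individuals whose whole
    estimated attribute pattern equals the true pattern."""
    return sum(e == t for e, t in zip(est, true)) / len(true)

def aamr(est, true):
    """Average attribute match rate: proportion of single attributes
    recovered correctly, averaged over individuals and attributes."""
    K = len(true[0])
    hits = sum(ek == tk for e, t in zip(est, true)
               for ek, tk in zip(e, t))
    return hits / (len(true) * K)

# Three individuals, K = 3; one pattern has a single wrong attribute
true = [(1, 1, 0), (0, 1, 0), (1, 0, 1)]
est  = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
print(pmr(est, true), aamr(est, true))   # 2/3 and 8/9
```

Note that AAMR is always at least as large as PMR, since a single misclassified attribute loses the whole pattern for PMR but only one of $NK$ comparisons for AAMR.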
In addition, the number of convergences of the different cognitive diagnostic assessment methods over 100 repetitions was counted in this study. This indicates the presence of non-estimable conditions (e.g., failure to converge, or a category of a graded scoring item receiving no responses), which are not included in the calculation of the PMR and AAMR of the parametric models.
3.2.2 Empirical Design

In this study, the TIMSS 2007 Mathematics Assessment items numbered M041052, M041281, M041275, M031303, M031309, M031245, M031242A, M031242B, M031242C, M031247, M031173, and M031172, totaling 12 questions, were used as empirical items. These items examine a total of eight attributes; the specific Qc matrix and attribute meanings can be found in Ma and de la Torre (2016). Response data from 10 regions (Singapore, England, Hungary, Sweden, Armenia, Norway, Colombia, Morocco, Qatar, and Yemen) were selected, retaining data from respondents who answered all items, yielding a total of 1,399 responses.
4. Results
Simulation Study Results
The main findings of the simulation study are presented using double-Y-axis bar charts with line graphs. The line graphs represent the number of replicates that could be analyzed by each of the four methods under the corresponding conditions (the convergence count, corresponding to the right Y axis). The bar charts display the mean PMR values for the estimated replicates, corresponding to the left Y axis. "uni" indicates that the students' attribute mastery patterns are uniformly distributed, and "mvn" indicates that they follow a multivariate normal distribution. The full results of the study are presented in the tables in the Appendix.
Data patterns conforming to the seq-GDINA model. Overall, when the data pattern follows the saturated model (seq-GDINA), the PMR and AAMR of seq-GNPC are generally better than those of seq-GDINA in small samples, and seq-GNPC is comparatively more advantageous. Under the simulation conditions set in this study, seq-GDINA was able to estimate all 100 replications in only 55 of 64 conditions, failing to estimate all replications in the remaining 9 conditions. In terms of PMR, the difference between the seq-GNPC method and the seq-GDINA model was greater when the Qc matrix was unrestricted than when it was restricted, and greater when K = 5 than when K = 3; the same was true for AAMR (see the figures). The PMR of the seq-GNPC method ranged from 0.44 to 0.99 and its standard deviation from 0.003 to 0.101 under all simulation conditions.
The AAMR ranged from 0.82 to 1.00 and its standard deviation ranged from 0.002 to 0.049.
The PMR of the seq-GDINA model ranged from 0.13 to 1.00 and its standard deviation from 0.003 to 0.146; its AAMR ranged from 0.69 to 1.00 and its standard deviation from 0.001 to 0.044. The standard deviations of both methods decrease with increasing sample size, with the seq-GNPC method decreasing more than the seq-GDINA model. The number of items and the type of Qc matrix had a greater impact on seq-GDINA than on seq-GNPC, and the distribution of individual attribute patterns had a small impact on both methods.
To compare the classification of the seq-GNPC method and the seq-GDINA model in small samples, the PMR and AAMR means were compared using ANOVA for the N = 30, 50, and 100 conditions. When using the unrestricted Qc matrix, seq-GNPC is significantly better than seq-GDINA.
Data patterns conforming to the seq-DINA model. When the data pattern followed the reduced model (seq-DINA), the two parametric models (seq-DINA, seq-GDINA) frequently failed to estimate (converge), with 77 of 192 conditions failing to estimate all 100 repetitions. The two nonparametric methods proposed in this paper (seq-NPC, seq-GNPC) were able to estimate all of them. The two parametric methods had between 64 and 100 converged replications across conditions and converged fully only when N = 500. Overall, the two nonparametric methods estimated all replicates and maintained classification accuracy comparable to the parametric methods under small-sample conditions. Under this data pattern, the number of items J had a relatively small effect on the two nonparametric methods, while item quality had a larger effect on all four methods.
The PMR of the seq-GNPC method ranged from 0.57 to 1.00 with a standard deviation range of 0 to 0.108; the PMR of the seq-GDINA model ranged from 0.19 to 1.00 with a standard deviation range of 0 to 0.187; the PMR of the seq-NPC method ranged from 0.59 to 1.00 with a standard deviation range of 0 to 0.103; and the PMR of the seq-DINA model ranged from 0.60 to 1.00 with a standard deviation range of 0 to 0.144. The estimated standard deviations of the two nonparametric methods are smaller than those of the two parametric methods in small samples, making them relatively more stable.
Partial results for these conditions are presented in the figures and the appendix table.

Empirical Results
These data were analyzed using seq-GNPC and seq-GDINA, and the proportions of mastery for each attribute under both methods are shown in the figure. Of the eight attributes, four (A3, A5, A7, and A8) had essentially the same proportions, while three (A1, A2, and A4) differed widely. To better assess the effectiveness of seq-GNPC in practical applications, its sampling retest consistency was evaluated: samples of sizes N = 30, 50, and 100 were randomly drawn from the full data, the attribute mastery patterns were estimated, and their consistency with the results obtained from the full data was calculated. Each sample size was repeated 100 times; the results are shown in the table. In small samples, seq-GNPC is relatively more stable than seq-GDINA, with a higher attribute-pattern retest rate and a smaller standard deviation. As for the attribute-level retest rate, seq-GNPC and seq-GDINA are basically consistent, with seq-GNPC having the lower standard deviation.
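The resampling check just described can be sketched as follows. This is our own Python illustration: `classify` stands in for a hypothetical classifier (seq-GNPC or seq-GDINA in the paper), and the toy data below use a perfectly stable classifier so the rate is exactly 1:

```python
import random

def retest_rate(full_patterns, classify, responses, n, reps, rng):
    """Average agreement between subsample-based classifications and
    the classifications obtained from the full sample."""
    rates = []
    for _ in range(reps):
        idx = rng.sample(range(len(responses)), n)
        sub = classify([responses[i] for i in idx])
        agree = sum(sub[j] == full_patterns[idx[j]]
                    for j in range(n)) / n
        rates.append(agree)
    return sum(rates) / len(rates)

# Toy data: a classifier that always maps response r to pattern "p<r>"
responses = list(range(20))
full = [f"p{r}" for r in responses]
rate = retest_rate(full, lambda rs: [f"p{r}" for r in rs],
                   responses, n=10, reps=5, rng=random.Random(0))
print(rate)   # 1.0 for a perfectly stable classifier
```

A real classifier's rate would fall below 1 whenever subsample-based estimates disagree with the full-sample estimates, which is exactly the instability the paper measures.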
5. Discussion
In this paper, we extend the item scope of GNPC based on the Qc matrix, obtaining the seq-GNPC method, and we also extend NPC, the pre-input method of GNPC, to the seq-NPC method. The results of the simulation studies show that both seq-GNPC and seq-NPC maintain good results under small-sample conditions.
When the data conform to the saturated model, the PMR of the seq-GNPC method outperforms the seq-GDINA model in small samples, which is consistent with the findings for the GNPC method (Chiu et al., 2018). This addresses the main research purpose of proposing seq-GNPC: to solve the problem of the saturated model's poor discriminative classification in small samples. It is worth noting that the PMR of the seq-GDINA model is very low when K = 5 and the Qc matrix is unrestricted. This is because the model is parameterized by attribute parameters (Yamaguchi & Okada, 2020), so the number of parameters to be estimated grows with the number of attributes; according to Equation 7, under otherwise identical conditions, more parameters must be estimated with an unrestricted Qc matrix than with a restricted one.
When the restricted matrix is used, the seq-GDINA model should be considered for small samples only when few attributes are examined and the number of items is large.
In all other cases, the seq-GNPC method is preferred for small samples.
When the unrestricted matrix is used, seq-GNPC performs best in small samples. It should be noted that when more attributes and fewer items are examined, seq-GDINA remains worse than seq-GNPC even at N = 500, so an even larger sample is needed if one wants to use the seq-GDINA method in this case.
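The practical guidance in the preceding paragraphs can be condensed into a deliberately coarse decision rule. The thresholds below are illustrative placeholders matching the simulation conditions (K = 3 or 5, J = 20 or 40, N up to 100 for "small"), not values derived in this study:

```python
def recommend_method(n, n_attributes, n_items, qc_type):
    """Coarse heuristic reflecting the guidance above: prefer seq-GNPC
    in small samples unless the restricted Qc matrix is used with few
    attributes and many items. All thresholds are assumptions."""
    small_sample = n <= 100  # "small sample" cutoff (assumption)
    if not small_sample:
        return "seq-GDINA"   # parametric methods preferred with ample data
    if qc_type == "restricted" and n_attributes <= 3 and n_items >= 40:
        return "seq-GDINA"   # the one small-sample case where GDINA holds up
    return "seq-GNPC"

print(recommend_method(50, 5, 20, "unrestricted"))  # -> seq-GNPC
print(recommend_method(50, 3, 40, "restricted"))    # -> seq-GDINA
```

A real application would also weigh item quality and whether the data pattern suggests a reduced rather than a saturated model.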
When the data conform to a reduced model (conjunctive or disjunctive), the PMRs of the two nonparametric methods proposed in this paper are essentially similar to those of the parametric model in small samples, which is consistent with the results of Sorrel et al. (2023). However, the parametric model frequently fails to produce a diagnostic classification in small samples: even when item quality is high, nearly half of the replications could not be classified, whereas the nonparametric methods not only always produce a classification but also maintain good classification accuracy.
This addresses the secondary objective of this study, namely the situation in which parametric models cannot be estimated at small sample sizes.
In conclusion, the seq-GNPC method proposed in this paper has good applicability in small samples and can effectively solve the problem of diagnostic evaluation for graded or mixed scoring programs with few examinees. Unlike clustering methods, it does not suffer from the labeling problem of determining which knowledge state corresponds to each cluster (Guo & Zhou, 2022), making it more applicable in most small-sample situations.
In an actual educational assessment, the attribute mastery patterns of students in the same class may be more similar than the theoretical distributions assume, their ORPs will be more uniform, and the probability of scoring in each category of a graded scoring item will be smaller than under the simulation conditions.
Additionally, the number of attributes evaluated in an actual assessment may not be limited to 3 or 5; sometimes more attributes are examined.
In these cases, a saturated parametric model would not assess the class satisfactorily, whereas the seq-GNPC approach would consistently produce good results.
Furthermore, seq-GNPC can also be applied to small samples in psychological assessment, where items are mostly graded on a Likert scale and the relationships between the cognitive attributes examined and the item categories are mostly uncertain (calling for an unrestricted matrix); under these conditions the seq-GNPC approach is superior to parametric modeling. Nonparametric cognitive diagnostic methods for small samples have a wide range of practical applications, and research on them needs to be advanced further (Sessoms & Henson, 2018). Nonparametric methods also have shortcomings, however: they cannot estimate item parameters or structural parameters (Chiu et al., 2018). When conditions allow large samples to be collected, the use of parametric methods is still preferred, as they are more stable and provide more information in large samples.
This study counted, under each condition, the number of replications for which the parametric model could be estimated, but it did not examine in depth the specific circumstances (number of attributes, number of graded scoring items, etc.) under which the parametric model fails to produce a diagnostic classification at each sample size; this may need further exploration.
Furthermore, seq-GNPC is a single-strategy method, whereas Wang et al. (2024) proposed a multi-strategy GNPC method based on GNPC, so extending seq-GNPC to multi-strategy evaluation might also be a worthwhile direction for future research.
Currently, cognitive diagnostic methods are developing rapidly, and how to improve the application rate and applicability of cognitive diagnostic methods in practice (Zhang et al., 2023) is a topic that deserves further exploration.
ACKNOWLEDGEMENTS This research was supported by the Humanities and Social Sciences Fund of the Ministry of Education (No. 22YJA190005) and the key open fund of the Zhejiang Philosophy and Social Science Laboratory for the Mental Health and Crisis Intervention of Children and Adolescents, PR China (No. 23MHCICAZD04).
References
Chiu, C. Y., & Douglas, J. (2013). A Nonparametric Approach to Cognitive Diagnosis by Proximity to Ideal Response Patterns. Journal of Classification, 30(2), 225–250.
Chiu, C. Y., Sun, Y., & Bian, Y. (2018). Cognitive Diagnosis for Small Educational Programs: The General Nonparametric Classification Method. Psychometrika, 83(2), 355.
Cuerda, C., Zornoza, A., Gallud, J. A., Tesoriero, R., & Ayuso, D. R. (2022). Deep learning assisted cognitive diagnosis for the D-Riska application. Soft Computing, (2), 665.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
de la Torre, J. (2011). The Generalized DINA Model Framework. Psychometrika, 76(2), 179–199.
Gao, L., Zhao, Z., Li, C., Zhao, J., & Zeng, Q. (2022). Deep cognitive diagnosis model for predicting students' performance. Future Generation Computer Systems, 252.
Gao, X., Ma, W., Wang, D., Cai, Y., & Tu, D. (2021). A Class of Cognitive Diagnosis Models for Polytomous Data. Journal of Educational and Behavioral Statistics, 46(3), 297.
Guo, L., & Zhou, W. (2022). Nonparametric methods for cognitive diagnosis of multiple-choice test items. Acta Psychologica Sinica, 53(9), 1032–1043.
Guo, L., Zhou, W., & Li, X. (2024). Cognitive Diagnosis Testlet Model for Multiple Choice Items. Journal of Educational and Behavioral Statistics, (1), 32.
Junker, B. W., & Sijtsma, K. (2001). Cognitive Assessment Models with Few Assumptions, and Connections with Nonparametric Item Response Theory. Applied Psychological Measurement, 25(3), 258–272.
Köhn, H. F., & Chiu, C. Y. (2017). A Procedure for Assessing the Completeness of the Q-Matrices of Cognitively Diagnostic Tests. Psychometrika, 82(1), 112–132.
Li, G., Hu, Y., Shuai, J., Yang, T., Zhang, Y., Dai, S., & Xiong, N. (2022). NeuralNCD: A Neural Network Cognitive Diagnosis Model Based on Multi-Dimensional Features. Applied Sciences.
Liao, M., & Jiao, H. (2023). Modelling multiple problem-solving strategies and strategy shift in cognitive diagnosis for growth. British Journal of Mathematical and Statistical Psychology, (1), 20.
Ma, W. (2019). A diagnostic tree model for polytomous responses with multiple strategies. British Journal of Mathematical and Statistical Psychology, 72(1), 61.
Ma, W., & Guo, W. (2019). Cognitive diagnosis models for multiple strategies. British Journal of Mathematical and Statistical Psychology, 72(2), 370.
Ma, W., & de la Torre, J. (2020). GDINA: An R Package for Cognitive Diagnosis Modeling. Journal of Statistical Software, 93(14).
Ma, W., & de la Torre, J. (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69(3), 253.
Saso, S., Oka, M., & Uesaka, Y. (2023). Development of Assessment Tools for Depth Understanding Quantitatively with Cognitive Diagnostic Models. In K. Arai (Ed.), Advances in Information and Communication (pp. 766–774). Springer Nature Switzerland.
Sessoms, J., & Henson, R. (2018). Applications of Diagnostic Classification Models: A Literature Review and Critical Commentary. Measurement: Interdisciplinary Research and Perspectives.
Shi, X., Ma, X., Du, W., & Gao, X. (2024). Diagnosing Chinese EFL learners' writing ability using polytomous cognitive diagnostic models. Language Testing, 41(1), 109–134.
Sorrel, M. A., Escudero, S., Nájera, P., Kreitchmann, R. S., & Vázquez-Lira, R. (2023). Exploring Approaches for Estimating Parameters in Cognitive Diagnosis Models with Small Sample Sizes. Psych.
Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287–305.
Tu, D., Zheng, C., Cai, Y., Gao, X., & Wang, D. (2018). A Polytomous Model of Cognitive Diagnostic Assessment for Graded Data. International Journal of Testing, 18(3), 231.
Wang, C., & Gierl, M. J. (2011). Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees' Cognitive Skills in Critical Reading. Journal of Educational Measurement, 48(2), 165–187.
Wang, D., Ma, W., Cai, Y., & Tu, D. (2024). A general nonparametric classification method for multiple strategies in cognitive diagnostic assessment. Behavior Research Methods.
Wei, J., Luo, L., Cai, Y., & Tu, D. (2023). A Multistrategy Cognitive Diagnosis Model Incorporating Item Response Times Based on Strategy Selection Theories. Journal of Educational and Behavioral Statistics.
Yamaguchi, K., & Okada, K. (2020). Variational Bayes Inference Algorithm for the Saturated Diagnostic Classification Model. Psychometrika, 85(4), 973.
Yuan, L., Liu, Y., Chen, P., & Xin, T. (2022). Development of a New Learning Progression Verification Method based on the Hierarchical Diagnostic Classification Model: Taking Grade 5 Students' Fractional Operations as an Example. Educational Measurement: Issues and Practice, 41(3), 69.
Zhang, S., Liu, J., & Ying, Z. (2023). Statistical Applications to Cognitive Diagnostic Testing. Annual Review of Statistics and Its Application, 10, 651.
Zhang, W., Meng, L., & Liang, B. (2023). EW-KNN: Evaluating information technology courses in high school with a non-parametric cognitive diagnosis method. Interactive Learning Environments, (10), 6783.
Figures
[Figure: PMR under different Qc matrices when data conform to the seq-GDINA model]
[Figure: PMR at different values of K when the data conform to the seq-GDINA model]
[Figure: partial results when data conform to the seq-DINA model]
Tables
[Table: PMR and AAMR for each method by item quality under the restricted and unrestricted Qc matrices]
[Table: attribute mastery and attribute pattern consistency by method]