Human vs. AI: Assessing Scale Development for Perceived Risks of ChatGPT in Academic Settings

Authors

  • Abdelouahd Bouzar Sidi Mohammed Ben Abdellah University, Morocco
  • Khaoula El Idrissi Sidi Mohammed Ben Abdellah University, Morocco
  • Samia Moustaghfir Ibn Tofail University, Morocco

DOI:

https://doi.org/10.53103/cjess.v6i2.469

Keywords:

Psychometric Scales, Reliability Assessment, Factor Analysis, Applications of Artificial Intelligence, Educational Technology

Abstract

In this study, we compared psychometric instrument development by human subject-matter experts and by artificial intelligence systems, focusing on the measurement of perceived risks of ChatGPT in academic settings. Two professor-level experts in research methodology and two AI systems, ChatGPT and Claude, each produced an eighteen-item scale covering three risk dimensions: psychological, ethical, and practical. Twenty experienced evaluators rated item quality against established criteria, after which the four scales were administered to a sample of 120 respondents drawn from a wide range of academic specialties. Statistical analyses comprised Exploratory Factor Analysis (EFA), reliability assessment via Cronbach's alpha, and calculation of Average Variance Extracted (AVE) and composite reliability. The AI-generated scales showed higher psychometric quality; in particular, Claude's scale achieved the highest reliability, with α = 0.942 and composite reliability CR = 0.930. An ANOVA revealed no statistically significant differences between the scales (F = 2.183, p = 0.168), but effect-size analysis highlighted some notable contrasts; in particular, the effect size between Claude and Professor 1 was large (d = -1.578). Factor loading analyses supported the construct validity of all measures, although the AI-generated items showed a slightly cleaner factor structure. The AI-generated scales, and particularly Claude's, outperformed the human-created scales on clarity, specificity, and overall quality, while the human-created scales remained competitive on relevance.
These findings indicate that AI systems can produce psychometric instruments that equal or surpass conventional human-developed scales while meeting high psychometric standards, with the potential to transform the efficiency of instrument development in psychological research.
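The reliability and effect-size statistics reported in the abstract (Cronbach's alpha, composite reliability, AVE, Cohen's d) follow standard formulas. The sketch below illustrates those formulas on synthetic data; it is not the authors' analysis code, and the loading values are hypothetical:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

def composite_reliability(loadings) -> float:
    """Composite reliability (CR) from standardized factor loadings."""
    lam = np.asarray(loadings, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())

def average_variance_extracted(loadings) -> float:
    """AVE: mean squared standardized loading."""
    lam = np.asarray(loadings, dtype=float)
    return float((lam ** 2).mean())

def cohens_d(a, b) -> float:
    """Cohen's d between two groups, using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

# Hypothetical loadings for an 18-item scale (illustrative only)
loadings = np.full(18, 0.75)
print(composite_reliability(loadings))       # high CR for uniformly strong loadings
print(average_variance_extracted(loadings))  # 0.5625
```

By convention, CR and AVE above the 0.70 and 0.50 thresholds (as with Claude's reported CR = 0.930) indicate acceptable convergent validity, and |d| > 0.8 is typically read as a large effect.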


References

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.

Cotton, D. R., Cotton, P. A., & Shipway, J. R. (2023). Chatting and cheating: Ensuring academic integrity in the era of ChatGPT. Innovations in Education and Teaching International, 61(2), 228-241. https://doi.org/10.1080/14703297.2023.2190148

Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1(1), 104-121. https://doi.org/10.1177/109442819800100106

Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). Guilford Publications.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.

Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. Routledge/Taylor & Francis Group.

Revelle, W. (2021). psych: Procedures for psychological, psychometric, and personality research [R package version 2.1.9]. Northwestern University. https://doi.org/10.32614/CRAN.package.psych

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1-36. https://doi.org/10.18637/jss.v048.i02

Tavakol, M., & Dennick, R. (2011). Making sense of Cronbach's alpha. International Journal of Medical Education, 2, 53-55. https://doi.org/10.5116/ijme.4dfb.8dfd

Worthington, R. L., & Whittaker, T. A. (2006). Scale development research: A content analysis and recommendations for best practices. The Counseling Psychologist, 34(6), 806-838. https://doi.org/10.1177/0011000006288127

Published

2026-03-06

How to Cite

Bouzar, A., El Idrissi, K., & Moustaghfir, S. (2026). Human vs. AI: Assessing Scale Development for Perceived Risks of ChatGPT in Academic Settings. Canadian Journal of Educational and Social Studies, 6(2), 59–71. https://doi.org/10.53103/cjess.v6i2.469

Issue

Section

Articles