Vol. 11 No. 12 (2025): December
Open Access
Peer Reviewed

Student Flowchart Automated Evaluation for Scalable Assessment in Introductory Programming

Authors

Usman Nurhasan, Didik Dwi Prasetya

DOI:

10.29303/jppipa.v11i12.13594

Published:

2025-12-31

Abstract

This study evaluates the Automated Flowchart Assessment Tool (AFAT), which is designed to overcome the limited semantic sensitivity and layout robustness of existing assessment tools. In a quantitative analysis of 312 student submissions, AFAT achieved a Micro-F1 score of 0.92 and high inter-rater agreement (Fleiss' Kappa = 0.88), supporting the hypothesis that it reaches expert-level accuracy. AFAT also improved operational efficiency: it reduced evaluation time by 61.2% (an average of 1.87 minutes per flowchart) and decreased inter-rater variability by 28%. A Generalized Linear Model (GLM) analysis confirmed that the time savings were significant, particularly in high-complexity sessions (Wald χ² = 87.44, p < 0.001). Beyond technical efficiency, this research contributes to applied science education by providing a scalable framework for computational science literacy that supports rigorous assessment of algorithmic thinking within integrated STEM curricula. These results substantiate AFAT’s potential for large-scale deployment as a robust automated scoring tool in formal educational settings.
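The headline figures in the abstract rest on standard metrics and simple arithmetic. The Python sketch below shows how Fleiss' Kappa and micro-averaged F1 are conventionally computed. It is not the authors' AFAT code. The rating matrix, the error-label names (such as `missing_loop_exit`) and the assumption that each flowchart receives a set of diagnostic labels are all hypothetical toy inputs, not the study's data. The helper `implied_baseline` also assumes that the 61.2% reduction is measured against the mean manual grading time per flowchart. The abstract does not state this explicitly.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories matrix of rating counts.

    counts[i][j] is the number of raters who put subject i in category j;
    every subject must be rated by the same number of raters (at least 2).
    """
    N = len(counts)
    n = sum(counts[0])
    k = len(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    # Share of all rating assignments that fall into each category.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Observed agreement per subject: agreeing rater pairs / all rater pairs.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Agreement expected by chance, from the overall category shares.
    P_e = sum(p * p for p in p_j)
    if P_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (P_bar - P_e) / (1 - P_e)


def micro_f1(gold: Sequence[set[str]], predicted: Sequence[set[str]]) -> float:
    """Micro-averaged F1 over per-submission label sets (assumed format).

    True positives, false positives and false negatives are pooled across
    all submissions and labels before precision and recall are computed.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels found by both
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_baseline(automated_minutes: float, reduction: float) -> float:
    """Manual time implied by an automated time and a fractional reduction.

    Assumes reduction = 1 - automated / manual (hypothetical interpretation).
    """
    return automated_minutes / (1 - reduction)


if __name__ == "__main__":
    # Toy data: 4 flowcharts, 3 raters, categories [incorrect, partial, correct].
    ratings = [[0, 0, 3], [0, 1, 2], [3, 0, 0], [0, 3, 0]]
    print(round(fleiss_kappa(ratings), 3))  # 0.745

    # Hypothetical diagnostic labels for 3 flowcharts.
    gold = [{"missing_loop_exit"}, {"wrong_condition", "missing_output"}, set()]
    pred = [{"missing_loop_exit"}, {"wrong_condition"}, {"missing_output"}]
    print(round(micro_f1(gold, pred), 3))  # 0.667

    # 1.87 min after a 61.2% cut implies roughly 4.82 min of manual grading.
    print(round(implied_baseline(1.87, 0.612), 2))  # 4.82
```

Micro-averaging pools counts over all labels. Frequent error types therefore weigh more heavily than rare ones. This is one reason a micro-F1 value is usually reported alongside a per-label breakdown.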

Keywords:

Diagnostic Accuracy; Evaluation Efficiency; Scoring Reliability; Flowchart Assessment; Semantic Robustness

Author Biographies

Usman Nurhasan, Universitas Negeri Malang

Author Origin: Indonesia

Didik Dwi Prasetya, Universitas Negeri Malang

Author Origin: Indonesia

How to Cite

Nurhasan, U., & Prasetya, D. D. (2025). Student Flowchart Automated Evaluation for Scalable Assessment in Introductory Programming. Jurnal Penelitian Pendidikan IPA, 11(12), 1230–1240. https://doi.org/10.29303/jppipa.v11i12.13594