Thesis (Open Access)

Assessing validity, reliability, and fairness of Quranic recitation assessment rubrics instrument in the Musabaqah Tilawatil Quran (MTQ) : leveraging Many-Facets Rasch Measurement (MFRM)

Publisher

Universitas Islam Internasional Indonesia

Abstract

Testing the psychometric attributes of an assessment instrument is a crucial step that must be undertaken before the instrument is widely implemented. Skipping this stage increases the risk of discriminatory assessment, leading to unfair outcomes and a loss of credibility in the evaluation results. One important instrument that has yet to be empirically validated is the Quranic recitation assessment rubric used in the Musabaqah Tilawatil Quran (MTQ) competition. Although some normative evaluations have been conducted, quantitative evidence regarding its validity, reliability, and fairness remains limited. This gap helps explain the recurrence of injustice and participant dissatisfaction with assessment results in the MTQ competition in recent years. Moreover, the complexity of the assessment system, characterized by multiple raters and performance-based evaluation, has not been fully accommodated by current analytical approaches. Therefore, this study aims to empirically examine the quality of the Quranic recitation assessment rubric using the Many-Facet Rasch Measurement (MFRM) approach. Adopting a quantitative, non-experimental design, the study involved 50 students as ratees and 16 judges as raters. The assessment used the official rubric from the Tilawatil Quran Development Institute (LPTQ), which consists of 19 items across four dimensions: Tajwed, Fashahah, Lagu, and Suara. To promote assessment consistency, raters received a workshop and a guidebook before the assessment process. Data collection produced 3,760 quantitative responses and 674 qualitative responses, with 40 responses identified as invalid. The results indicate that the rubric possesses good construct validity, with Infit MnSq values ranging from 0.98 to 1.27 and Outfit MnSq values from 0.78 to 0.98. Infit ZSTD values ranged from -0.3 to +1.3 and Outfit ZSTD values from -0.5 to -0.3. Point-measure correlation values ranged from 0.35 to 0.56.
Reliability was also found to be satisfactory, with values ranging from 0.52 to 0.88 for ratees, 0.92 to 0.99 for items, and 0.87 to 0.92 for raters. Fairness was found to be relatively high, as both ratee and rater data fit the model, with only 0.02% significant bias and 4.06% unexpected responses. However, one invalid item was identified in the Fashahah dimension, along with several disordered thresholds in the middle scale categories. Additionally, the reliability and separation index for ratees in the Fashahah dimension were low. These findings highlight the need to revise certain item descriptors and restructure the rating scale categories. Further rater training may help align raters' interpretations of the rubric and promote more standardized and fair assessments.

License

Except where otherwise noted, this item's license is described as All Rights Reserved.