Reliability and Validity Analysis of Statistical Reasoning Test Survey Instrument using the Rasch Measurement Model

This study assesses the reliability and validity of the Statistical Reasoning Test Survey (SRTS) instrument using the Rasch Measurement Model. The SRTS instrument was developed by the researchers to assess statistical reasoning in descriptive statistics among Tenth Grade science-stream students in rural schools. The SRTS combines a subjective test and an open-ended questionnaire and contains 12 items. The respondents' statistical reasoning was assessed on four constructs: Describing Data, Organizing Data, Representing Data, and Analyzing and Interpreting Data. The sample comprised 115 (76%) girls and 36 (24%) boys aged 15 to 16 years from a rural district in Sabah, Malaysia. Overall, the SRTS instrument was found to have high reliability, with a Cronbach's alpha (KR-20) value of 0.81. The results also showed that the SRTS has excellent item reliability (0.99) and a high item separation value (9.57), as well as good person reliability (0.81) and person separation (2.04). The validity of the SRTS instrument was established through item fit, person fit, the variable map, and unidimensionality. In conclusion, this study indicates that the SRTS is a reliable and valid instrument for measuring the statistical reasoning of science-stream students from rural secondary schools.


INTRODUCTION
Statistical reasoning, along with statistical literacy and statistical thinking, is a focus of interest and one of the pertinent learning outcomes in statistics education. Garfield and Chance (2000) defined statistical reasoning as the way people reason with statistical ideas and make sense of statistical information. Studies related to statistical reasoning have been carried out extensively in other countries (Karatoprak, Karagöz & Börkan, 2014; Martin, 2013; Tempelaar, 2004; Ulusoy & Altay, 2017; Wang, Wang, & Chen, 2009), but in Malaysia, studies in this field are relatively new, having emerged in this decade. The current literature reveals that Malaysian students' level of statistical reasoning is still poor and unsatisfactory (Chan & Ismail, 2013; Foo, Idris, Mohamed, & Foo, 2014; Ismail & Chan, 2015; "Misconceptions in Inferential Statistics", 2018; Zaidan et al., 2012).
At the secondary school level,  constructed an instrument modelled on the technology-based GeoGebra software to assess the level of students' statistical reasoning in descriptive statistics. This instrument was developed based on the statistical reasoning constructs proposed by Jones, Thornton, Langrall, Mooney, Perry and Putt (2000) and Mooney (2002), while the model of statistical reasoning by Garfield and Chance (2000), which comprises the Idiosyncratic, Verbal, Transitional, Procedural and Integrated Process levels, was used to determine the level of students' statistical reasoning. The instrument is useful for assessing students' statistical reasoning in task-based interviews with a small sample; however, it is not suitable for large samples, particularly in studies that use a survey research method.
According to Garfield (1998), although one-to-one communication such as interviews or observations, or the examination of students' work such as statistical projects, may be the best way to assess students' statistical reasoning, a carefully designed paper-and-pencil instrument can also be employed to obtain information about it. Meanwhile, Karatoprak, Karagöz and Börkan (2015) asserted that qualitative methods are not practical for large groups of people. A survey, on the other hand, is a more practical and systematic way to collect data and is easier to administer and score. It also allows researchers to gain wider and more comprehensive feedback from the respondents. For these reasons, this study develops a survey research instrument using the model of statistical reasoning proposed by Jones, Langrall, Mooney and Thornton (2004). The instrument, known as the Statistical Reasoning Test Survey (SRTS), was developed by the researchers specifically to assess the statistical reasoning of Malaysian Tenth Grade science-stream students in rural areas (Saidi & Siew, 2019).
With regard to Malaysian national mathematics achievement, the World Bank reported in 2010 that there was a gap in mathematics achievement between students in urban and rural schools, predominantly in poorer states like Sabah, where urban school students achieved better results in mathematics than those in rural areas (Marwan, Sumintono & Mislan, 2012). Meanwhile, in an assessment of statistical learning conducted by Saidi and Siew (2019), rural secondary school students in one of the districts in Sabah were found to have a low level of understanding of the properties of measures of central tendency: the students were unable to understand the concept of outliers in the data and failed to understand which measures of central tendency could be used quantitatively and qualitatively. The students also had difficulty understanding the concept of representativeness in the measures of central tendency, since the majority of them were unable to state which type of average (mean, median, or mode) best represents the data, whether or not the data contained outliers. Because the students had a very poor understanding of representativeness and outliers, the majority of them failed to give correct reasoning or justification for choosing a particular type of average to best represent the data. These findings provided an early indication of a poor level of statistical reasoning, particularly among rural secondary school students in Sabah, Malaysia.
The Rasch Model is a psychometric technique that was developed to improve the precision of a constructed instrument, to monitor the quality of an instrument, and to compute the performance of respondents (Boone, 2016). It is the simplest model in Item Response Theory (IRT): a probabilistic model that assesses an item's difficulty and a person's ability in such a way that they can be placed on the same continuous scale (Deane et al., 2016). The Rasch Model estimates the probability of a person choosing a particular item or category (Mahmud & Porter, 2015). Item difficulty and person ability in the Rasch Model are measured on a logit scale (Runnels, 2012).
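As an illustration of how ability and difficulty share one logit scale, the dichotomous form of the Rasch Model can be sketched as follows. This is a minimal sketch and not the WINSTEPS estimation procedure; the function name and the example logit values are hypothetical and are not taken from the SRTS data.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are expressed in logits on the same scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical logit values, chosen only to illustrate the behaviour.
p_matched = rasch_probability(ability=0.0, difficulty=0.0)   # person matches item
p_easy = rasch_probability(ability=1.0, difficulty=-1.0)     # item well below ability
p_hard = rasch_probability(ability=-1.0, difficulty=1.0)     # item well above ability

print(round(p_matched, 3), round(p_easy, 3), round(p_hard, 3))
```

When a person's ability equals the item's difficulty, the probability of success is 0.5; it rises as ability exceeds difficulty and falls in the opposite case.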
The analysis from the Rasch Model can inform the researcher about person and item reliability, item and person separation, and the Cronbach's alpha value. Meanwhile, the construct validity of an instrument can be assessed through item and person fit, the variable map, and unidimensionality. These key concepts are used by the researchers to establish the reliability and validity evidence of the SRTS instrument using the Rasch analysis.
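For readers unfamiliar with Cronbach's alpha, the standard raw-score formula can be sketched as below. The score matrix is hypothetical (four respondents, three items scored 1 to 4) and is not the SRTS data; the function name is likewise an illustrative assumption, and the value reported by WINSTEPS may be computed with additional conventions.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    # Variance of each item across respondents.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the respondents' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: rows are respondents, columns are items.
hypothetical_scores = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 3, 4],
]
alpha = cronbach_alpha(hypothetical_scores)
print(round(alpha, 3))
```

Alpha approaches 1 when the items rank respondents consistently, which is why it is read as an index of internal consistency.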

RESEARCH METHODOLOGY

Instrumentation
The SRTS instrument is a combination of a subjective test and an open-ended format questionnaire and contains 12 items. It was developed by the researchers based on the constructs of the cognitive model of development proposed by Jones et al. (2004) to assess students' statistical reasoning. According to Jones et al. (2004), students' statistical reasoning can be assessed on four constructs: Describing Data, Organizing Data, Representing Data, and Analyzing and Interpreting Data. Describing Data is related to the explicit reading of raw data or data presented in tables, charts, or graphical representations, while Organizing Data is related to arranging, categorizing, or consolidating data into a summary form. Representing Data is related to displaying data in a graphical form, while Analyzing and Interpreting Data is related to recognizing patterns and trends in the data and making inferences and predictions from the data. These four constructs contain several sub-processes which guide educators and researchers in assessing students' statistical reasoning.
The SRTS instrument aims to assess statistical reasoning among Tenth Grade science-stream students. The researchers adapted some of the items from Mooney (2002) and Chan and Ismail (2014) and also constructed new items. Items were adapted from these sources because of their suitability for the context of Malaysian Tenth Grade secondary school students. Mooney's (2002) study assessed middle school students' statistical reasoning; hence, some of the items in that study would not be suitable for the upper secondary school level. Chan and Ismail (2014) provided more suitable items for the context of Malaysian upper secondary school students. However, the items in their study are technology-based, which is not compatible with survey research. In spite of this, some of the items in the Representing Data and Analyzing and Interpreting Data constructs in Chan and Ismail (2014) could be applied in the current study, which used a survey research method. Table 1 shows the distribution of items in the SRTS instrument. The instrument has evidence of content validity, as verified by an expert from a university.
There were three tasks in the SRTS instrument: Task 1, Task 2, and Task 3. Task 1 required the students to organize or group the raw data given (Item 1a) and to construct data displays from the grouped data (Item 1b). The students' reasoning about which data display best represented the data was also assessed (Item 1c). The raw data in Task 1 were obtained from  instrument. Item 1a was a new item created by the researcher to assess statistical reasoning in the sub-process 'grouping or organizing data' in the Organizing Data construct. In order to identify whether the students' statistical reasoning in 'grouping or organizing data' could extend to the Analytical level, the students were also asked whether they could organize the data in different ways. This item was familiar to the students and suitable for the context and level of Malaysian upper secondary school students. Item 1b was also created by the researcher for the same reason (suitability of the context), as the Tenth Grade secondary school students had prior knowledge of constructing a histogram and a frequency polygon on graph paper. Meanwhile, Item 1c was adapted from Chan and Ismail's (2014) study (e.g. Which graph do you think represents the data better, the histogram or the boxplot? Explain why).

Table 1 (excerpt). Distribution of items in the SRTS instrument
Construct | Sub-process | Code | Item | Task | Question
Representing Data | Constructing a data display for a given data set | RD1 | 1b*** | 1 | Based on the table in 1a, construct a histogram and frequency polygon graph in the graph paper provided at the last page using a scale of 2 cm to 8 gram amount of protein on the horizontal axis and 2 cm to 2 fast food sandwiches in the vertical axis. Explain how.
Representing Data | Evaluating the effectiveness of data displays in representing data | RD2 | 1c** | 1 | In your opinion, which graph do you think represents the data better, the histogram or frequency polygon? Explain why.
Analyzing and Interpreting Data | Reading between data | AI1 | 3d** | 3 | Compare the distribution of the two graphs. Explain your answer(s).
Analyzing and Interpreting Data | Reading beyond data | AI2 | 2e** | 2 | In your opinion, which type of average (mean, median, and mode) is the most suitable to be used to represent both sets of data? Explain why.
* Item adapted from Mooney (2002); ** Item adapted from Chan and Ismail (2014); *** New item
Task 2 required the students to reduce the data using measures of central tendency (mean, median, and mode) for two groups of data, one of which contained a significant outlier (Items 2b, 2c, and 2d). It also required the students to identify the unit of data value based on the data given (Item 2a). The students' reasoning about which type of average best represented both data sets was also assessed in Task 2 (Item 2e). Item 2a was adapted from Mooney's (2002) study (e.g. Which country won the most gold medals? How can you tell?). Items 2b, 2c and 2d were similarly adapted from Mooney (2002) and assessed reasoning in the sub-process 'summarizing data in terms of measures of central tendency' in the Organizing Data construct (e.g. What is the typical salary for the actress? How did you determine the typical salary?). Mooney (2002) used the word 'typical' in the item because middle school students might not be familiar with the word 'average'. Since the Tenth Grade science-stream students in this study already knew the term 'average', the terms mean, median and mode were used for Items 2b, 2c, and 2d respectively. Meanwhile, Item 2e was adapted from Chan and Ismail (2014) (e.g. Which measures of center is the most suitable to represent the score obtained by students? Explain why).
Task 3 required the students to reduce the data using measures of spread from a data display (Items 3b and 3c). The students were also required to compare the distributions of the two data displays (Item 3d), and their awareness of display features was assessed (Item 3a). The data, presented in bar graphs constructed by the researcher, had different distributions: one was normal while the other was not (skewed to the right). Items 3a and 3b were adapted from Mooney (2002) (e.g. Examine the bar graph. What information did you get from the graph? How can you tell? What is the range of pets sold? How can you tell?). Meanwhile, Item 3d was adapted from Chan and Ismail's (2014) study (e.g. Compare the distribution of both box plots with respect to shape, center, and variability). Item 3c is a new item created by the researcher to measure reasoning in the sub-process 'summarizing the data in terms of spread' in the Organizing Data construct, since the concept of standard deviation is taught to Tenth Grade science-stream students in the Additional Mathematics subject.
The four levels of cognitive thinking identified in the SOLO taxonomy model are the Prestructural, Unistructural, Multistructural, and Relational levels. Corresponding to these, Jones et al. (2004) formulated four levels of students' statistical reasoning, namely Idiosyncratic (Level 1), Transitional (Level 2), Quantitative (Level 3), and Analytical (Level 4). The previous work by Jones et al. (2000) and Mooney (2002) used the same levels to characterize the statistical thinking of children and middle school students respectively. The Idiosyncratic level corresponds to the Prestructural level, where students are engaged in the task but could be distracted or misled by irrelevant aspects. The Transitional level corresponds to the Unistructural level, where students focus on only a single relevant aspect. Next, the Quantitative level corresponds to the Multistructural level, where students focus on more than one relevant aspect of the task. Lastly, the Analytical level corresponds to the Relational level, where students can make links between relevant parts of the domain. Based on these features, this study formulated an initial framework to assess the level of students' statistical reasoning for each of the items in the SRTS instrument (Table 2).

Sample
The Rasch analysis was conducted on data collected in a pilot study with a total of 151 Tenth Grade science-stream students from eight secondary schools in a rural district of Sabah, Malaysia. The students comprised 115 (76%) girls and 36 (24%) boys aged 15 to 16 years. In the Malaysian schooling system, upper secondary school students who are academically inclined can choose between two main streams, Science or Arts. Science-stream students are more exposed to statistical content and mathematics-related subjects.

Procedure for Analyzing the Data
The items were analyzed using WINSTEPS version 3.73. A polytomous Rasch Model was used because the data from the SRTS instrument are polytomous: there are four possible scores for every item measuring the constructs in the SRTS instrument, namely "1" for Idiosyncratic, "2" for Transitional, "3" for Quantitative, and "4" for Analytical. Sumintono and Widhiarso (2015) stated that there are three criteria (Table 3) for establishing reliability from the Rasch Model: Cronbach's alpha, item and person reliability, and item and person separation. Meanwhile, the validity of the SRTS instrument can be established using the Rasch Model based on the analysis of the misfit order of the items. The logit produced from the Rasch analysis indicates the ability of a respondent to answer the items relative to the items' difficulty (Olsen, 2003). According to Sumintono and Widhiarso (2015), item fit informs the researcher whether an item is functioning normally in performing the intended measurement and whether the item is suitable. Moreover, a misfitting item may indicate that the respondents had a misconception regarding the item. Boone, Staver and Yale (2014) and Bond and Fox (2015) suggested three criteria for assessing item fit: the Outfit Mean Square (MNSQ), the Outfit Z-Standardized value (ZSTD), and the Point Measure Correlation (PTMEA-CORR).
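To make the four-category scoring concrete, the sketch below computes category probabilities under the Andrich rating scale form of the polytomous Rasch Model. The source does not state which polytomous variant was fitted in WINSTEPS, so the choice of the rating scale form is an assumption, and the threshold values are hypothetical rather than estimates from the SRTS data.

```python
import math

# Score labels used in the SRTS instrument (scores "1" to "4").
LEVELS = ["Idiosyncratic", "Transitional", "Quantitative", "Analytical"]

# Hypothetical step thresholds in logits (three steps separate four categories).
HYPOTHETICAL_THRESHOLDS = [-1.0, 0.0, 1.0]


def category_probabilities(ability, difficulty, thresholds):
    """Probability of each score category under the rating scale model."""
    # The lowest category has exponent 0; each further category adds
    # (ability - difficulty - threshold) for the step just passed.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (ability - difficulty - tau))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


probs = category_probabilities(0.0, 0.0, HYPOTHETICAL_THRESHOLDS)
for score, (level, p) in enumerate(zip(LEVELS, probs), start=1):
    print(score, level, round(p, 3))
```

With a higher ability relative to the item difficulty, the probability mass shifts toward the Analytical category, which is the behaviour the logit scale is meant to capture.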
According to Bond and Fox (2007), the Outfit MNSQ informs the researcher about the suitability of the item for measurement, while the PTMEA-CORR indicates the extent to which the development of the constructs has achieved its goals. A positive PTMEA-CORR value indicates that the item measures the intended construct, while a negative value indicates otherwise. The ZSTD values, on the other hand, are t-tests of the hypothesis that the data fit the model perfectly. Any item that fails to fulfill these three criteria (Table 4) needs to be improved or modified to ensure its quality and suitability (Sumintono & Widhiarso, 2015).
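The three-criteria screening described above can be sketched as a simple check. The cut-off values of Table 4 are not reproduced in the text, so the default ranges below (Outfit MNSQ between 0.5 and 1.5, Outfit ZSTD between -2.0 and +2.0, PTMEA-CORR between 0.4 and 0.85) are assumptions based on ranges commonly attributed to Boone et al. (2014); the function names are hypothetical. The retention rule follows the one reported later in this paper, where an item meeting at least one criterion is retained.

```python
def criteria_met(outfit_mnsq, outfit_zstd, ptmea_corr,
                 mnsq_range=(0.5, 1.5),     # assumed cut-offs, see lead-in
                 zstd_range=(-2.0, 2.0),    # assumed cut-offs, see lead-in
                 ptmea_range=(0.4, 0.85)):  # assumed cut-offs, see lead-in
    """Count how many of the three item-fit criteria an item fulfills."""
    checks = [
        mnsq_range[0] < outfit_mnsq < mnsq_range[1],
        zstd_range[0] < outfit_zstd < zstd_range[1],
        ptmea_range[0] < ptmea_corr < ptmea_range[1],
    ]
    return sum(checks)


def fit_decision(n_met):
    """Retain an item that meets at least one criterion; otherwise flag it."""
    return "retain" if n_met >= 1 else "revise or remove"


# Values reported in this paper for item OD2c.
od2c_met = criteria_met(outfit_mnsq=1.31, outfit_zstd=1.4, ptmea_corr=0.48)
print(od2c_met, fit_decision(od2c_met))
```

Under these assumed ranges, item OD2c meets all three criteria, which agrees with the decision reported in the results to retain it unchanged.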
The Rasch analysis also provides the researcher with information on person fit. Boone (2016) stated that the Rasch Model can identify person misfit based on unusual response patterns. For instance, an unusual pattern detected by the Rasch analysis suggests that the student may have guessed wildly, cheated, or been careless when answering the items. The criteria for assessing person misfit are based on the 'MEASURE', Outfit MNSQ, and Outfit ZSTD values (Edwards & Alcock, 2019; Nevin et al., 2015). According to Nevin et al. (2015), a high Outfit ZSTD value (> 2.0) coupled with a high MEASURE may indicate that a student with high ability answered an 'easy' item incorrectly. Meanwhile, a high Outfit ZSTD value (> 2.0) coupled with a low MEASURE may indicate that a student with low ability answered a 'difficult' item correctly but answered the rest of the items incorrectly. According to Mohd Rahim and Norliza (2015), removing misfitting persons from the Rasch analysis may improve properties of the Rasch measurement scale, such as its reliability.
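The interpretation rules for person misfit can be sketched as a small decision function. The source does not define what counts as a 'high' or 'low' MEASURE, so comparing a student's measure with the mean person measure is an assumption made here for illustration; the function name and the example values are hypothetical.

```python
def interpret_person_fit(measure, outfit_zstd, mean_measure, zstd_limit=2.0):
    """Give a tentative reading of a person's fit from MEASURE and Outfit ZSTD.

    'High' and 'low' MEASURE are judged against the mean person measure,
    which is an illustrative assumption rather than a rule from the source.
    """
    if outfit_zstd > zstd_limit:
        if measure > mean_measure:
            return "high ability, unexpected wrong answer on an easy item"
        return "low ability, unexpected right answer on a difficult item"
    if outfit_zstd < -zstd_limit:
        return "too predictable"
    return "acceptable fit"


# Hypothetical students, with a hypothetical mean person measure of -0.90 logit.
print(interpret_person_fit(measure=1.2, outfit_zstd=2.5, mean_measure=-0.90))
print(interpret_person_fit(measure=-2.0, outfit_zstd=2.3, mean_measure=-0.90))
print(interpret_person_fit(measure=0.1, outfit_zstd=-2.2, mean_measure=-0.90))
print(interpret_person_fit(measure=0.1, outfit_zstd=0.4, mean_measure=-0.90))
```

Such a reading is only a starting point; the flagged response strings still need to be inspected item by item before a student is treated as misfitting.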
In addition to item fit and person fit, the Variable Map (also called the Wright Map or Item-Person Map), which shows the distribution of students' ability and item difficulty on the same logit scale, allows the researcher to identify whether the items match the ability of the students (Bond & Fox, 2007). In the variable map, item difficulty is listed on the right side, with the most difficult item at the top and the easiest item at the bottom. Person ability is listed on the left side, with low-ability individuals at the bottom and high-ability individuals at the top. In other words, higher logits indicate persons with higher ability and more difficult items, and vice versa (Iramaneerat, Smith & Smith, 2008).
It is also important to evaluate an instrument's unidimensionality to ensure that it measures what it is supposed to measure (Abdul Aziz, Jusoh, Omar, Amlus, & Awang Salleh, 2014; Sumintono & Widhiarso, 2015), which in this case is the construct of statistical reasoning. According to Ariffin, Omara, Isaa and Sharif (2010), the items developed should test constructs that measure a single dimension only. The Rasch analysis uses Principal Component Analysis (PCA) of the standardized residuals to assess the extent to which the instrument measures what it is meant to measure. Sumintono and Widhiarso (2015) provided criteria for unidimensionality based on the 'raw variance explained by measures' from the standardized residual variance. A 'raw variance explained by measures' value higher than 20% is acceptable, higher than 40% is good, and higher than 60% is excellent. Meanwhile, the 'unexplained variance' should ideally not exceed 15%.
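The unidimensionality criteria of Sumintono and Widhiarso (2015) can be expressed as a short check. The function names are hypothetical, and the contrast percentages in the example are illustrative values rather than the WINSTEPS output; only the thresholds (20%, 40%, 60% and 15%) and the 61.9% figure come from this paper.

```python
def rate_explained_variance(percent):
    """Rate the 'raw variance explained by measures' (in percent)."""
    if percent > 60:
        return "excellent"
    if percent > 40:
        return "good"
    if percent > 20:
        return "acceptable"
    return "below the acceptable threshold"


def contrasts_within_limit(unexplained_percents, limit=15.0):
    """Check that no contrast's unexplained variance exceeds the limit."""
    return all(p <= limit for p in unexplained_percents)


# 61.9% is the value reported for the SRTS; the contrast values are hypothetical.
rating = rate_explained_variance(61.9)
contrasts_ok = contrasts_within_limit([9.0, 7.5, 6.0, 5.0, 4.0])
print(rating, contrasts_ok)
```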

Reliability, Item and Person Separation
Table 5 shows the values of person reliability, item reliability, person separation, item separation and Cronbach's alpha (KR-20) for the SRTS instrument based on the Rasch analysis in WINSTEPS. The person reliability is 0.81, with a person separation value of 2.04. Sumintono and Widhiarso (2015) stated that a person reliability higher than 0.80 is 'good', while Bond and Fox (2007) stated that a person reliability higher than 0.80 indicates good and consistent responses from the respondents. The person separation value of 2.04 is interpreted as 'good'; this is supported by Linacre (2003), who stated that a separation value higher than 2.00 is good. Meanwhile, Krishnan and Idris (2014) stated that the person separation must be more than 1.00 to warrant that the students are measured across the spread.
In this study, the item reliability is 0.99, with an item separation value of 9.57. Sumintono and Widhiarso (2015) stated that an item reliability higher than 0.94 is interpreted as 'excellent'. Meanwhile, Bond and Fox (2007) stated that an item reliability higher than 0.80 is good and strongly acceptable, while a value lower than 0.80 is less acceptable. The item separation value of 9.57 is interpreted as high and fulfills the condition given by Linacre (2003), who asserted that an item separation value higher than 2.00 is good. Meanwhile, Krishnan and Idris (2014) stated that an item separation value higher than 1.00 indicates that the items have enough spread.
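Separation and reliability are two expressions of the same information. In Rasch measurement, reliability R and the separation index G are related by R = G^2 / (1 + G^2), so the reported pairs can be cross-checked. The sketch below applies this standard relation to the values reported above; the function name is hypothetical.

```python
def reliability_from_separation(separation):
    """Convert a Rasch separation index G into reliability R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


# Separation values reported for the SRTS instrument.
person_reliability = reliability_from_separation(2.04)
item_reliability = reliability_from_separation(9.57)

print(round(person_reliability, 2), round(item_reliability, 2))
```

The rounded results agree with the person reliability of 0.81 and the item reliability of 0.99 reported in this study.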
Moreover, the Cronbach's alpha (KR-20) value of 0.81 indicates that the SRTS instrument has very high internal consistency reliability (Sumintono & Widhiarso, 2015). Bond and Fox (2007) likewise stated that a Cronbach's alpha value (based on the Rasch analysis approach) ranging from 0.71 to 0.99 is acceptable, as it is at the best level. Thus, the SRTS instrument is suitable for use in the actual research.

Item Fit
Table 6 presents the misfit order of the items based on the values of Outfit MNSQ, Outfit ZSTD and PTMEA-CORR. The bold figures indicate values that failed to fulfill the criteria suggested by Boone et al. (2014). The item placed at the top of the misfit order (OD2c) is the item most prone to misfit and was therefore the first to be considered for change or removal. However, item OD2c fulfilled all three criteria, with an Outfit MNSQ of 1.31, an Outfit ZSTD of 1.4, and a PTMEA-CORR of 0.48, so it was retained unchanged. Four items (OD1, OD2b, OD3a, and RD2) fulfilled at least one, but not all, of the three criteria, while the remaining items fulfilled all of them. According to Sumintono and Widhiarso (2015), items which fulfill at least one of the criteria should be retained. Similarly, Abdul Aziz et al. (2014) stated that an item is misfit only if all three criteria are out of the fit range. Thus, no items were changed or removed from the instrument.

Person Fit
Table 7 shows the persons (students, in this case) whose responses were most misfit in the Rasch analysis; in other words, their responses differed from those expected by the Rasch model. The students in the sample were coded accordingly: the F in F099 refers to female, while 099 is the student's number. The students are ordered from the highest value of Outfit ZSTD.
Based on Table 7, three students (F099, F082, and F085) had an Outfit ZSTD value higher than 2.0, while one student (F004) had an Outfit ZSTD value lower than -2.0. The remaining students had Outfit ZSTD values within the acceptable range (from -2.0 to +2.0). This indicates that, in the pilot study, the items were suitable for almost all the students (97.35%) and that the analysis conducted on those students yielded quality findings for the assessment using the Rasch analysis.
Student F099 had a considerably high total score and MEASURE, which, together with an Outfit ZSTD value above 2.0, suggests that she most likely answered an easy item incorrectly. This was indeed the case: student F099 scored only "2" on item RD1, although item RD1 is regarded as an easy item based on the Rasch analysis (-1.74 logit). Meanwhile, students F082 and F085 had a low MEASURE but an Outfit ZSTD value higher than 2.0, which may indicate that they answered a difficult item correctly but answered other items incorrectly. This was also the case: student F082 scored "3" on a quite difficult item, DD1 (-0.05 logit), while student F085 scored "3" on a difficult item, AI1 (1.21 logit). Furthermore, the large negative Outfit ZSTD value for student F004 (-2.2) is to be viewed as "too predictable" (Linacre, 2002).

Variable Map
Figure 1 presents the variable map, which shows the distribution of persons (students) and items on a logit measurement scale. The variable map provides useful information on how well the spread of item difficulty matches the person ability (Sumintono & Widhiarso, 2015). On the right side of the variable map, item DD1 is located at the mean of the item difficulty estimates, with a value of 0.00 logit. Six items are spread above item DD1, while five items are spread below it. Item AI2 was the most difficult item in the SRTS instrument, with a value of +1.96 logit, while item OD2a was the easiest item for the students in the pilot study, with a value of -2.12 logit. This result is unsurprising, since item AI2 assessed the students' statistical reasoning in Analyzing and Interpreting Data (reading beyond data), a type of question which is not usually presented in statistical assessments within the Malaysian Mathematics syllabus.
In contrast, item OD2a is related to the concept of the mean, to which students are exposed as early as fifth grade, which makes it easier for Tenth Grade science-stream students to answer this question.
The left side of the variable map shows the ability of the students. On average (denoted by M on the map), the students had an ability of -0.90 logit, which is below the 0.0 logit mark. One student (F079) recorded the highest ability, with a value of +1.94 logit, which exceeds the upper T boundary (two standard deviations from the mean) and indicates that this student has a markedly higher ability than the rest. In addition, six students fell below the lower T boundary; the lowest three (F086, F087, and F091) recorded a value of -4.42 logit, which indicates that these three possessed the lowest ability among the students.
Based on the variable map, student F079, who had the highest ability, scored highly on all the items in the SRTS instrument. This is because student F079 has a measure of +1.94 logit, which almost matches the +1.96 logit of the most difficult item in the SRTS instrument, item AI2. In contrast, students F086, F087, and F091 were unable to answer the items, since their ability (-4.42 logit) was far below the easiest item in the SRTS instrument, item OD2a (-2.12 logit). Nonetheless, based on the spread of student ability and item difficulty, some of the items (those placed above 0.0 logit) are quite difficult for the students. Thus, the researcher will take action to reduce the difficulty of these items so that the items in the SRTS instrument are well targeted to the students in the study.

Unidimensionality
Based on Figure 2, the 'raw variance explained by measures' is 61.9%. According to Sumintono and Widhiarso (2015), a value higher than 60% is 'excellent', which indicates that the SRTS instrument has strong evidence of unidimensionality, that is, the instrument measured the construct of statistical reasoning. In addition, the unexplained variance in each of the first to fifth contrasts is less than 10%, which falls within the ideal range of less than 15%.

DISCUSSION AND CONCLUSION
Overall, the SRTS instrument has a very high Cronbach's alpha as well as high item and person reliability based on the analysis from the Rasch Model. This indicates that the SRTS instrument is a highly reliable instrument for assessing statistical reasoning among Tenth Grade science-stream students in rural schools, particularly in Sabah, Malaysia. The high item separation value indicates that the SRTS instrument has a wide spread of items (Klooster, Taal & Laar, 2008). Meanwhile, the person separation value indicates that the students in the study can be well distinguished into three ability groups, that is, high, medium, and low ability. Whether this is also the case for students in urban schools remains unknown; it is therefore suggested that a Rasch analysis of the SRTS instrument be conducted on a sample of students from urban schools.
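One common way to link a separation index to the number of statistically distinct levels is the strata index, H = (4G + 1) / 3. Whether the three ability groups reported here were derived with this formula is an assumption; the sketch below only shows that the reported person separation of 2.04 is consistent with three levels. The function name is hypothetical.

```python
def strata(separation):
    """Number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4 * separation + 1) / 3


# Person separation reported for the SRTS instrument.
person_strata = strata(2.04)
print(round(person_strata, 2), int(person_strata))
```

A strata value just above 3 supports distinguishing roughly three ability groups, such as high, medium, and low.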
In terms of validity, the researcher decided to retain all the items, since each fulfilled at least one of the fit criteria for Outfit MNSQ, Outfit ZSTD, and PTMEA-CORR. Moreover, all of the items have a positive PTMEA-CORR value, which indicates that the items move in one direction (Bond & Fox, 2015). In addition, all the items have an Outfit MNSQ value within the acceptable range, which indicates that the items are consistent with the measurement. Bond and Fox (2007) stated that an Outfit MNSQ value in the acceptable range is considered good and productive for item measurement. For person fit, only four students showed misfit, which indicates that the rest of the students provided meaningful responses for the Rasch analysis. Finally, the SRTS instrument has strong evidence of unidimensionality based on the standardized residual variance, and is thus an appropriate instrument for the students in the study.