The Effect of Science, Technology, Engineering and Mathematics (STEM) Program on Students’ Achievement in Mathematics: A Meta-Analysis

The positive impact of Science, Technology, Engineering, and Mathematics (STEM) programs on student achievement, attitude, interest, communication skills, and problem-solving has prompted the education community to reform instructional approaches in STEM subjects. This meta-analysis synthesizes the results of previous studies on the impact of STEM programs on students' mathematics achievement. The criteria for including studies in the meta-analysis were publication between 1998 and 2017, an experimental research design, and reporting of the data necessary to compute effect sizes. Based on the Hedges' g values, three studies were categorized as having large effects, two as having medium effects, and twelve as having small effects. The overall weighted average effect size was 0.242, with a corresponding p-value of 0.023, indicating that STEM programs had an impact on students' mathematics achievement. However, no evidence was found that this impact differed across the three moderator variables: education level, publication source, and length of intervention. Limitations and implications of the study are discussed.


INTRODUCTION
In the 21st century, the workforce in science, technology, engineering, and mathematics (STEM) fields has become increasingly important (Ashford, 2016; Khalil & Osman, 2017; Wilhelm, 2014). Many countries have integrated STEM education into their school curricula to provide a meaningful learning environment. According to McCaslin (2015), STEM education is a critical tool for improving students' knowledge and understanding in related fields. The integration of STEM project-based learning (PBL) also embraces constructivist and cognitive principles of learning (McCaslin, 2015). These principles are believed to benefit students because students learn more by actively engaging than by merely listening, and because the focus is on critical thinking and the conceptual understanding of problems. In STEM PBL classrooms, students are accustomed to cooperative learning and discussion, learning by questioning and exploring, investigating various tasks, and applying the knowledge they possess (Olivarez, 2012; Olusegun, 2015; Shahali et al., 2017). Classroom environments should therefore focus on collaboration and the exchange of ideas. The integration of STEM activities can cultivate thinking skills that help students analyze, evaluate, draw conclusions, and construct arguments correctly and logically about the problems to be solved (Chia & Maat, 2018; Dwyer, Hogan, & Stewart, 2014). Tolliver (2016) stated that students need useful innovation and creativity skills to find solutions to related problems. STEM integration can produce active, creative, critical, and communicative individuals (Bahri, Suryawati, & Osman, 2014; Tolliver, 2016). During STEM activities, students learn in context and focus on applying STEM knowledge to solve real-world problems (Berland, Steingut, & Ko, 2014).
However, developing and integrating meaningful STEM activities that raise students' interest and ultimately boost their academic potential in STEM subjects is challenging for educators (Shahali et al., 2015).
Improving students' academic achievement is one of the long-term goals of any educational institution (Brown, 2012). Prior research has shown that STEM integration has a positive impact on the achievement of elementary, middle, and high school students (Han et al., 2016; Hansen & Gonzalez, 2014; Ing, 2013, 2014; James, 2014; Judson, 2014; McCaslin, 2015; Tolliver, 2016). For example, McCaslin's (2015) experimental study with fourth-grade students in Georgia schools examined the effects of STEM education on achievement in number and operations, data measurement and analysis, geometry, and algebra, and found that achievement improved after STEM-based learning. Ashford (2016) found that an after-school STEM program increased the academic achievement of 75 third-, fourth-, and fifth-grade students. Similarly, Olivarez (2012) found that a STEM program had a positive impact on the mathematics, science, and reading achievement of 176 eighth-grade students.
Several studies have reported that STEM integration can improve student engagement during classroom instruction. Educators who integrate STEM into the learning process encourage students to be active learners. Meaningful activities should include all the STEM disciplines and real-world applications so that students can see the connection between the content they are learning and their daily lives (Kuenzi, 2008; Osman & Saat, 2014). Thomas (2013) stated that more coordinated STEM activities could cultivate positive attitudes that encourage students to pursue further study in mathematics. Similarly, Wong and Wong (2010) claimed that meaningful STEM activities not only improve conceptual understanding but also increase student interest in these subjects (see also Kutch, 2011; Lee, 2013).
The integration of STEM in school curricula aims to strengthen students' ability to be critical thinkers and analytical problem-solvers (Nasarudin, Halim, & Zakaria, 2014) through interactive learning experiences (Fortus et al., 2005). Although STEM has improved student achievement outcomes in many studies, some studies report null results. For example, James (2014) found no effect of the STEM approach on the mathematics achievement of seventh graders in central Tennessee. This may have occurred because the teachers were insufficiently trained in the STEM approach or because the school lacked support. In addition, the findings indicated that the participants were not at the same mathematics level. Along the same lines, the role of the teacher is essential when applying a specific STEM approach in the classroom (McCaslin, 2015). Because teachers play a significant role in student learning, they must understand the content and their students' prior knowledge, and provide the challenge and support students need to learn actively, build new understanding, and solve various mathematical problems (Walker, 2008). A well-designed STEM approach is needed to modernize classroom instruction and help teachers and students solve problems relevant to the 21st century (Wilhelm, 2014).
Previous studies have reported empirical data on STEM in schools from primary through high school (Han et al., 2016; Hansen & Gonzalez, 2014; Ing, 2013, 2014; James, 2014; Judson, 2014; McCaslin, 2015; Tolliver, 2016). All these studies integrated STEM activities into teaching and learning to improve student achievement in STEM subjects. One review, by Kulturel-Konak, D'Allegro, and Dickinson (2011), focused on gender differences in STEM programs, but no prior research has statistically synthesized the effects of the STEM approach on students' mathematics achievement. Therefore, a meta-analytic study of the effect of STEM activities on students' mathematics achievement is needed.

The Purpose of the Study
Many scholars believe STEM programs are an appropriate curriculum from primary school to university level because of their interdisciplinary nature (Han et al., 2016). Thus, it is essential to understand how effective STEM programs (independent variable) are in improving student mathematics achievement (dependent variable). We also investigated factors (moderator variables) that could affect the incorporation of STEM activities into mathematics learning. This meta-analysis was conducted to answer the following questions:
a) What effect does STEM integration have on improving student achievement in mathematics?
b) What factors affect the effectiveness of STEM integration in learning mathematics?

The Journal Selection Criteria
One of the challenging issues in mathematics education is integrating STEM to increase student achievement and to cultivate positive attitudes toward, and interest in, the STEM field (Berlin & Lee, 2005; Brown, 2012; Dejarnette, 2012; Kuenzi, 2008). Many researchers have conducted studies on the integration of STEM in the classroom over the past 20 years, and their findings suggest that STEM education is an appropriate curriculum for 21st-century classroom instruction. Hence, this meta-analysis included studies from 1998 to 2017 that focused on the impact of STEM on students' mathematics achievement.
The first step in conducting the meta-analysis was to identify the keywords for the search process. The keywords were "Science, Technology, Engineering, and Mathematics," "STEM," and "STEM and Achievement." The following databases were searched: a) PsycINFO (ProQuest), b) ProQuest Digital Dissertations and Theses Full Text, and c) Education Resources Information Center (ERIC, through EBSCOhost). Studies in mathematics education were also searched manually in leading journals: Teaching Children Mathematics, Educational Studies in Mathematics, Mathematical Thinking and Learning, American Educational Research Journal, School Science and Mathematics, For the Learning of Mathematics, and Journal for Research in Mathematics Education; Google Scholar was also searched. Finally, the university's institutional repository was searched for relevant articles.
The first stage of the search process yielded a total of 5064 articles, many of which were not relevant to the topic. From these, we located 134 research sources (dissertations, journal articles, and proceedings) related to STEM education. The last step was to examine each of these in depth against the inclusion and exclusion criteria. The inclusion criteria were (a) the research design (i.e., a quasi-experiment with treatment and control groups or a pretest-posttest group design), (b) mathematics achievement as the outcome, (c) reporting on the integration of STEM learning, and (d) statistical data for calculating effect sizes (e.g., frequencies, means, proportions, t-tests, standard deviations, coefficients).
Many articles were rejected because (a) the outcome measure was not a mathematical topic (n=45; e.g., Richardson, 2016; Robinson et al., 2014), (b) the studies were based on pedagogical strategies for teachers (n=38; e.g., DeBiase, 2016; Glavich, 2016), (c) they used a non-experimental design (n=21; e.g., Alumbaugh, 2015; Petersen, 2014), (d) they were literature reviews or frameworks (n=9; e.g., Becker & Park, 2011; Harper, 2010), or (e) they did not report statistical data (n=4; e.g., Uttal, Miller, & Newcombe, 2013; Walker & Sherman, 2017). After full examination, 17 articles were selected for further analysis, as shown in Table 1. Most of the studies were from the United States because STEM education has become a more regular part of the school curriculum in the U.S. than in other countries.
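The screening counts above can be checked with a few lines of arithmetic. This is a minimal Python sketch; the category labels are shorthand for the rejection reasons listed in the text, not names used by the authors.

```python
# Full-text sources screened against the inclusion and exclusion criteria.
screened = 134

# Rejection reasons and counts as listed in the text (labels are shorthand).
excluded = {
    "outcome not a mathematics topic": 45,
    "pedagogical strategies for teachers": 38,
    "non-experimental design": 21,
    "literature review or framework": 9,
    "no statistical data reported": 4,
}

total_excluded = sum(excluded.values())
included = screened - total_excluded
print(f"excluded: {total_excluded}, included: {included}")  # excluded: 117, included: 17
```

The five rejection categories account for 117 sources, leaving exactly the 17 studies analyzed.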

Coding Process
The information from each study was coded using a coding sheet developed iteratively from the information the studies provided. The first researcher coded the articles for predetermined information such as author name, publication type, publication year, study participants, sample size, instrument used, statistical analysis performed, and treatment type. The coding sheet was the essential instrument of the meta-analysis for gathering the information needed to compute effect sizes and compare results across studies. An effect size is a statistic calculated from the quantitative information in each study and is essential for the moderator analysis that explores variability among studies. The first researcher consulted frequently with the second researcher throughout the coding process and the computation of effect sizes. Many articles did not provide complete information; hence, we contacted the authors to obtain the data needed to calculate effect sizes.

Statistical Analysis
Results were obtained from the effect sizes derived from nine published journal articles and eight doctoral dissertations. All studies used an experimental design with posttest means for the treatment and control groups. The standardized mean difference (Cohen's d) was used to synthesize and compare the mean results of these treatment and control groups (Lipsey & Wilson, 2001). Because it is standardized, this effect size can be used to compare studies that used different measurement procedures. It is calculated from the means and standard deviations of the two contrasted groups. To correct for small-sample bias, d was converted to the unbiased estimator g (Hedges & Olkin, 1985). Hedges' g is an interpretable and meaningful value that helps researchers make comparisons across a variety of studies.
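The conversion from d to g described above can be sketched in Python. This sketch assumes the commonly used formulas: a pooled within-group standard deviation, the approximate small-sample correction factor J = 1 - 3/(4df - 1), and the large-sample variance of d. The exact formulas CMA applies for each data-entry format are not stated in the text, and the function name `hedges_g` and the summary statistics in the usage line are hypothetical, not taken from any included study.

```python
import math


def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Return (d, g, var_g) for a treatment-vs-control posttest comparison."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (m_t - m_c) / sd_pooled  # Cohen's d
    # Large-sample sampling variance of d.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    # Approximate small-sample correction factor J; g = J * d.
    j = 1 - 3 / (4 * df - 1)
    return d, j * d, j ** 2 * var_d


# Hypothetical posttest summary statistics (illustrative only).
d, g, var_g = hedges_g(m_t=80, sd_t=10, n_t=50, m_c=75, sd_c=10, n_c=50)
print(round(d, 3), round(g, 3), round(var_g, 4))
```

With these made-up inputs, d = 0.5 and the correction shrinks it slightly, to g of about 0.496; the shrinkage grows as the total sample size falls.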
The Comprehensive Meta-Analysis 3.0 software (CMA) was used to compute the Hedges' g effect sizes and the standard error, variance, lower and upper confidence limits, Z-value, and p-value for each study. CMA was also used to calculate the heterogeneity statistics (the Q statistic, I-squared, and Tau-squared) and to run the moderator analyses for the variables identified in the coding sheets (Borenstein et al., 2005). Microsoft Office Excel 2010 was used to compute the prediction interval.
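The random-effects quantities reported by CMA (pooled mean, standard error, Z, p, Q, I-squared, and Tau-squared) can be sketched in a few lines of Python. This sketch assumes the DerSimonian-Laird (method-of-moments) estimator of Tau-squared; the text does not state which estimator was selected in CMA. The function name `random_effects_dl` is hypothetical, and the two-study input is made-up toy data, not data from the included studies.

```python
import math
from statistics import NormalDist


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2.

    effects   -- per-study effect sizes (e.g., Hedges' g)
    variances -- per-study sampling variances of those effect sizes
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    # Fixed-effect (inverse-variance) weights and mean, used for Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Method-of-moments between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = mean / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test of H0: mean = 0
    return {"mean": mean, "se": se, "z": z, "p": p,
            "Q": q, "df": df, "I2": i2, "tau2": tau2}


# Made-up two-study example (not data from the included studies).
result = random_effects_dl([0.0, 1.0], [0.1, 0.1])
print(result)
```

For this toy input, Q = 5 on 1 df, Tau-squared = 0.4, I-squared = 80%, and the pooled mean is 0.5 with a standard error of 0.5 (Z = 1.0). Adding Tau-squared to every study's variance is what makes the weights more equal than in a fixed-effect model.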
In this study, we hypothesized three potential variables that might moderate the effect of STEM programs on students' achievement: a) education level (elementary, secondary, and university); b) publication source (dissertation and journal); and c) length of intervention (long-term, for programs longer than one year, and short-term, for programs shorter than one year). The moderator analysis was performed in CMA by comparing the average effect sizes for mathematics achievement across the categories of each moderator. It used a mixed-effects model with a Q test analogous to analysis of variance (ANOVA).

Description of Selected Studies
The present meta-analytic study reviewed 17 articles published between 2004 and 2016 (see Table 1). The studies were from two countries: the U.S. (n=16) and Malaysia (n=1). The total number of participants across all studies was approximately 137,389 students. On average, studies used samples of more than 100 students at the elementary (grades 3-8), secondary (grades 6-12), and university levels. The studies covered the mathematics content areas of algebra, geometry, probability, and problem-solving. The STEM program approaches were Discovery-STEM, STEM-PBL, and the VSTops-STEM Module. All studies used a quasi-experimental research design.
In the next section, we discuss results for the first research question: What effect does STEM integration have on improving student achievement in mathematics? The effect sizes of the STEM programs on students' mathematics achievement were classified into three groups using the criteria of Becker (2000): 0.2 (small effect), 0.5 (medium effect), and 0.8 (large effect). Results based on the Hedges' g values are displayed in Table 2. All 17 studies were selected according to the inclusion and exclusion criteria outlined above. The CMA results showed that two studies fall into the large effect category, two into the medium effect category, and 13 into the small effect category. Table 2 displays the overall weighted average effect size of 0.242, which represents the standardized mean difference between STEM and non-STEM programs. The 95% confidence interval for this mean difference ranged from 0.034 to 0.450. Because this interval does not include zero, the true mean difference is probably not zero. Consistent with this, the Z-value for testing the null hypothesis (that the mean difference is 0) is 2.277, with a corresponding p-value of 0.023, indicating that STEM programs have an impact on students' mathematics achievement.
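The link between the reported pooled estimate, its confidence interval, Z, and p can be checked with Python's standard library. The text does not report the standard error of the pooled mean, so this sketch assumes it can be recovered from the reported values as SE = g / Z; because both inputs are rounded, the results agree with the reported values only to about three decimal places.

```python
from statistics import NormalDist

g = 0.242   # reported overall weighted mean effect (Hedges' g)
z = 2.277   # reported Z-value for H0: mean effect = 0

se = g / z                                # SE implied by Z = g / SE (assumption)
z_crit = NormalDist().inv_cdf(0.975)      # about 1.96 for a two-sided 95% CI
ci = (g - z_crit * se, g + z_crit * se)
p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value

print(f"SE = {se:.4f}, 95% CI = {ci[0]:.3f} to {ci[1]:.3f}, p = {p:.3f}")
# SE = 0.1063, 95% CI = 0.034 to 0.450, p = 0.023
```

The reproduced interval and p-value match Table 2, which confirms that the reported CI, Z, and p describe the same normal-theory test.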

Heterogeneity of Effect Sizes
The observed effect size may vary from study to study, and some variation is expected from sampling error alone. The Q statistic is a weighted sum of squared deviations that tests the null hypothesis that all studies share a common effect size; under this null, the expected value of Q equals its degrees of freedom (the number of studies minus 1). In this meta-analysis, Q = 1311.583 (df = 16, p < 0.05), which is statistically significant and indicates that the studies are heterogeneous (the true effect size varies from study to study). I-squared was 98.780%, meaning that nearly 99% of the variance in observed effects reflects true heterogeneity rather than sampling error. The estimated variance of the true effect sizes was Tau-squared = 0.182, and the estimated standard deviation of the true effect sizes was 0.426. The prediction interval, calculated in Microsoft Excel, ranged from 0.08 to 0.75, suggesting that the true effect of STEM programs on mathematics achievement would range from about 0.08 in some settings to 0.75 in others.
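The heterogeneity summary can be sketched from the reported values. This Python sketch computes I-squared as (Q - df) / Q, takes the square root of Tau-squared, and builds a 95% prediction interval with the widely used formula of Higgins, Thompson and Spiegelhalter (2009), which uses a t critical value with k - 2 degrees of freedom. The text does not state which formula was implemented in Excel, so that choice is an assumption, as is recovering the pooled standard error from the reported Z-value.

```python
import math

# Summary values as reported in the text.
g_pooled = 0.242
z_pooled = 2.277
q_stat = 1311.583
k = 17          # number of studies
tau2 = 0.182

df = k - 1
i2 = max(0.0, (q_stat - df) / q_stat) * 100   # I-squared, in percent
tau = math.sqrt(tau2)                          # SD of true effects

# Assumption: SE of the pooled mean recovered as g / Z.
se_pooled = g_pooled / z_pooled

# Higgins-Thompson-Spiegelhalter 95% prediction interval; t(0.975, df = 15) is about 2.131.
T_CRIT_DF15 = 2.131
half_width = T_CRIT_DF15 * math.sqrt(tau2 + se_pooled ** 2)
pi_low, pi_high = g_pooled - half_width, g_pooled + half_width

print(f"I^2 = {i2:.2f}%, tau = {tau:.3f}")
print(f"95% PI: {pi_low:.2f} to {pi_high:.2f}")
```

The sketch reproduces I-squared = 98.78%. Under this formula the prediction interval is symmetric about the pooled mean and, with the reported inputs, runs from about -0.69 to 1.18.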

Moderator Analyses
The significant heterogeneity suggests that other study characteristics may account for systematic or other differences among the effect sizes. For the second research question (What factors affect the effectiveness of STEM integration in learning mathematics?), we explored the heterogeneity further and examined whether education level, publication source, and length of intervention influenced the effect of the STEM program on mathematics achievement.

Level of education
The moderator analysis was run on seven studies of STEM programs at the elementary level, eight at the secondary level, and two at the university level. The values of Hedges' g, the 95% confidence interval, Z, and p were computed in CMA (see Table 3) and are described below.
a) Is the STEM program more effective at the elementary level? The Hedges' g value is 0.156 with a 95% confidence interval of -0.210 to 0.521. The Z-value for a test of the null is 0.833 with p = 0.405.
b) Is the STEM program more effective at the secondary level? The Hedges' g value is 0.314 with a 95% confidence interval of -0.024 to 0.651. The Z-value for a test of the null is 1.820 with p = 0.069.
c) Is the STEM program more effective at the university level? The Hedges' g value is 0.259 with a 95% confidence interval of -0.416 to 0.93. The Z-value for a test of the null is 0.753 with p = 0.452.
The subgroup analysis grouped the studies by level of education. The mean effects at the elementary, secondary, and university levels (Hedges' g of 0.156, 0.314, and 0.259, respectively) did not differ significantly: the test comparing the three levels yielded a Q-value of 0.390 with df = 2 and p = 0.823. Thus, there is no evidence that the impact of STEM programs varies by education level.

Publication source
Table 4 displays the results of the moderator analysis based on publication source (eight dissertation studies and nine journal article studies). The values of Hedges' g, the 95% confidence interval, Z, and p were computed in CMA and are described below.
a) Is STEM program research published as a dissertation more effective? For dissertations, the mean effect size is 0.089 with a 95% confidence interval of -0.249 to 0.426, a Z-value of 0.516, and a corresponding p-value of 0.606.
b) Is STEM program research published in journals more effective? For journals, the mean effect size is 0.376 with a 95% confidence interval of 0.062 to 0.691, a Z-value of 2.348, and a corresponding p-value of 0.019. This p-value (< 0.05) indicates that, among the journal studies, STEM programs were more effective than non-STEM instruction.
The test comparing the two effect sizes (0.089 and 0.376) yielded a Q-value of 1.494 with one df and a corresponding p-value of 0.222. Thus, there is no evidence that the effect of STEM programs on students' mathematics achievement differs by publication source (dissertation versus journal).

Length of intervention
The moderator analysis (see Table 5) was based on effect sizes computed for 12 studies categorized as long-term programs and five studies categorized as short-term programs. The values of Hedges' g, the 95% confidence interval, Z, and p were computed in CMA and are described below.
a) Are long-term STEM programs more effective than long-term non-STEM programs? The Hedges' g value was 0.192 with a 95% confidence interval of -0.065 to 0.448. The Z-value for H0 was 1.465 with a p-value of 0.143.
b) Are short-term STEM programs more effective than short-term non-STEM programs? The Hedges' g value was 0.370 with a 95% confidence interval of -0.038 to 0.777. The Z-value for H0 was 1.778 with a p-value of 0.075.
The subgroup analysis grouped the studies by length of intervention. The mean effect was somewhat larger for short-term than for long-term programs (Hedges' g of 0.370 and 0.192, respectively), but the difference was not significant: the test comparing the two effect sizes yielded a Q-value of 0.524 with df = 1 and p = 0.469. Thus, there is no evidence that the impact of STEM programs varies by length of intervention.
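The between-subgroup Q test used in the three moderator analyses can be sketched from the subgroup results reported above. In this Python sketch, each subgroup's standard error is recovered as g / Z (an assumption, since the subgroup standard errors are not given in the text), subgroup means are combined with inverse-variance weights, and the p-value uses the closed-form chi-square tail probabilities for 1 and 2 degrees of freedom. The helper names `q_between` and `chi2_sf` are hypothetical. Because the inputs are rounded, the Q-values match the reported 0.390, 1.494, and 0.524 only approximately.

```python
import math
from statistics import NormalDist


def q_between(subgroups):
    """Between-subgroup Q statistic from (g, Z) pairs.

    Each subgroup's standard error is recovered as SE = g / Z (assumption),
    so its inverse-variance weight is (Z / g) ** 2.
    """
    means = [g for g, _ in subgroups]
    weights = [(z / g) ** 2 for g, z in subgroups]
    grand_mean = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    q = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))
    return q, len(subgroups) - 1


def chi2_sf(q, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return 2 * (1 - NormalDist().cdf(math.sqrt(q)))
    if df == 2:
        return math.exp(-q / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")


# Subgroup (Hedges' g, Z) pairs as reported in the moderator results.
moderators = {
    "education level": [(0.156, 0.833), (0.314, 1.820), (0.259, 0.753)],
    "publication source": [(0.089, 0.516), (0.376, 2.348)],
    "length of intervention": [(0.192, 1.465), (0.370, 1.778)],
}

for name, groups in moderators.items():
    q, df = q_between(groups)
    print(f"{name}: Q = {q:.3f}, df = {df}, p = {chi2_sf(q, df):.3f}")
```

With these inputs the sketch gives Q of roughly 0.39, 1.49, and 0.52, with p-values of about 0.82, 0.22, and 0.47, consistent with the conclusion that none of the three moderators reached significance.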

DISCUSSION
This meta-analysis provides a standard measure of the outcome of each study and produces an overall weighted outcome across studies. Studies were identified through electronic databases and a review of relevant references in studies and articles. The results show a significant effect of the independent variable (STEM program) on the dependent variable (students' mathematics achievement). The overall weighted average effect size of 0.242, which falls in the small range, indicates that STEM programs have a positive and statistically significant effect on student achievement in mathematics. Among the individual studies, the majority (10 of 17) yielded statistically significant positive effect sizes between 0.118 and 1.571. These findings suggest that the STEM program approaches used in these ten studies may have improved students' achievement in mathematics. Another four studies showed positive but non-significant effects, with small effect sizes between 0.004 and 0.127, and the remaining three studies produced negative effect sizes. Interestingly, none of the three moderator variables (education level, publication source, and length of intervention) significantly moderated the difference between STEM and non-STEM programs in students' mathematics achievement.
These meta-analytic results support previous individual studies showing that STEM programs are effective in improving student mathematics achievement (Han et al., 2016; Hansen & Gonzalez, 2014; Ing, 2013, 2014; James, 2014; Judson, 2014; McCaslin, 2015; Tolliver, 2016). Overall, the results indicate that STEM programs have a positive impact on students' mathematics achievement. These findings could help teachers decide whether to integrate a STEM program and guide them in developing materials (McCaslin, 2015; Wilhelm, 2014). This decision is critical for schools and teachers, particularly given the budget required to implement a STEM program. The two studies with large positive effect sizes (1.103 and 1.571) may serve as reference points for future research, especially when choosing effective programs, sample sizes, and research designs.
Several limitations of this meta-analysis should be noted. First, some studies did not provide the data needed to compute effect sizes or detailed information about the STEM program they implemented. Second, the results in some of the selected studies were not reported explicitly or systematically. Consequently, the analysis was based on the data available even after contacting the authors, and studies from which the required information could not be extracted were excluded.
The findings may also encourage future researchers to plan comprehensive studies examining the effect of STEM programs on other learning outcomes, such as attitude, interest, parental support, and motivation, and to examine other moderator variables, such as time period and research design.

CONCLUSION
In summary, the results of this meta-analytic study are promising and show that the overall effect of STEM programs on students' mathematics achievement is positive and statistically significant. Policymakers and teachers should use this evidence when reforming classroom instructional approaches to improve student achievement at all levels. More research is needed to answer the many open questions and to deepen our understanding of the complex nature of teaching and learning STEM in providing students with 21st-century skills.

Disclosure statement
No potential conflict of interest was reported by the authors.