Use of Statistical Tools in EFL Research

Ashok Sapkota*


This article explores the use of statistics, its types, and its role in English as a Foreign Language (EFL) research. In general, it relates how the components (tools) of descriptive and inferential statistics are useful in accomplishing EFL research. It connects current approaches and practices with the role of statistics in research. Furthermore, it introduces some of the statistical software programs that are useful in analyzing quantitative data.


The word ‘statistics’ derives from the modern Latin term statisticum collegium (council of state) and the Italian word statista (statesman or politician). The related word ‘statist’ was in use by 1584 for a person skilled in state affairs, having political knowledge, power or influence (Brown & Saunders, 2008). Statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data. ‘Data’ is the common term in quantitative research for facts that have been obtained and subsequently recorded; for statisticians, ‘data’ usually refers to quantitative data, that is, numbers. Quantitative data analysis is associated with the positivist research tradition. It is useful in both small-scale and large-scale investigations and across research designs such as experimental, quasi-experimental, case study, action research and correlational research (Cohen, Manion & Morrison, 2011). Understanding statistics is essential to a scientific approach to analyzing numerical data. Furthermore, statistics helps us to maximize the interpretation and use of data: it turns data into information, that is, data that have been interpreted, understood and are useful to the recipient.

Formally, statistics is the systematic collection and analysis of numerical data in order to investigate or discover relationships among phenomena so as to explain, predict and control their occurrence (Levin & Fox, 2006). Confusion can arise because the word covers both the statistical techniques applied to quantitative data and the numerical results that those analyses produce.

The Need for Statistics

Statistics is a quantitative research tool used to turn data into information. It comprises a range of procedures for gathering, organizing, analyzing and presenting quantitative data (numbers) (Mackey & Gass, 2005). Society cannot be run effectively on the basis of hunches or trial and error; in business, economics, the social sciences and education, much depends on the correct analysis of numerical information (Brown & Saunders, 2008). Decisions based on data generally lead to better results than those based on intuition or gut feeling.

Much of everyday life depends on making forecasts, and business cannot progress without being able to audit change or plan action. In social science research, areas such as purchasing, production, capital investment, long-term development, quality control, human resource development, recruitment and selection, marketing, credit risk assessment and financial forecasting all require statistics to collect and analyze data. In research practice, statistics plays an important role in answering research questions: it allows the researcher to quantify and measure, to place a level of confidence on the findings, and to assess the contribution of each variable (Kumar, 2011). Likewise, in EFL research, statistics is essential when we compare students’ proficiency on different test items, the regularity of students, the effect of non-linguistic factors on language learning, achievement in languages, language skills and so on (Sapkota, 2012). The uses of statistics can be summarized as follows:

  • Statistics helps to quantify information, that is, to convert textual information into numbers, which makes the information easier to understand and interpret.
  • Statistics helps to display information clearly using graphs, charts, tables, frequency distributions, cross-tabulations, etc.
  • Statistics helps to reduce a large body of quantitative data to a small number of summary values in an efficient way.
  • Statistical tools such as measures of central tendency and dispersion help to describe the data. Central tendency gives the central value of a data distribution, whereas dispersion describes the scatter or variability of individual values around that central value.
  • Statistical tools help to show the relationship between two or more variables; correlation, for example, gives the direction, strength and significance of the relationship between variables.
  • Statistics helps to predict the value of a dependent variable from one or more independent variables. Regression analysis is the statistical tool that establishes the relationship between the explained and explanatory variables.
  • In quantitative analysis, generalizations about the population are made on the basis of sample results.
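As a rough illustration of the display tools mentioned in the list above, the following Python sketch builds a frequency distribution and a cross-tabulation. The data are invented for illustration only (hypothetical class sections and proficiency bands), and the helper names `frequency_distribution` and `cross_tabulation` are assumptions of this sketch, not part of any statistical package.

```python
from collections import Counter

# Hypothetical data: one (class section, proficiency band) pair per learner.
records = [
    ("A", "high"), ("A", "mid"), ("A", "mid"), ("A", "low"),
    ("B", "high"), ("B", "high"), ("B", "mid"), ("B", "low"), ("B", "low"),
]


def frequency_distribution(values):
    """Return {category: (count, percentage)} for a list of categories."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


def cross_tabulation(pairs):
    """Return a nested mapping: row category -> column category -> count."""
    table = {}
    for row, col in pairs:
        table.setdefault(row, Counter())[col] += 1
    return table


band_freq = frequency_distribution([band for _, band in records])
crosstab = cross_tabulation(records)

# Frequency distribution of proficiency bands across all learners.
for band, (n, pct) in band_freq.items():
    print(f"{band}: {n} learners ({pct}%)")

# Cross-tabulation of class section by proficiency band.
for section, counts in crosstab.items():
    print(section, dict(counts))
```

The same counts could then be presented as a table or chart; the sketch only shows how text categories become numbers.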

Types of Statistics and Their Components

Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data. In general, there are two major types of statistics: descriptive and inferential statistics.

Descriptive statistics is concerned with quantitative data and the methods for describing them (‘data’ (facts) is the plural of ‘datum’ (a fact)). This branch of statistics is the more familiar one, because descriptive statistics are used in everyday life in areas such as government, healthcare, business, sport, education and pedagogical situations (Brown & Saunders, 2008) to describe the status of something or to summarize data. Descriptive statistics involves methods of organizing, presenting (picturing) and summarizing information from data. It is not concerned with inferring or predicting population parameters; rather, it is concerned with the enumeration and organization of data (Cohen, Manion & Morrison, 2011). It includes the following statistical components:

  • Measures of frequency (how often a particular behavior or phenomenon occurs)
  • Measures of central tendency: the mode (the most frequent score, i.e. the score obtained by the greatest number of people), the mean (the average score) and the median (the score obtained by the middle person in a ranked group of people)
  • Measures of dispersion (the extent to which items vary from the central value)
  • Range (the difference between the highest and lowest score in a data set, i.e. L - S, where L is the largest and S the smallest score)
  • Interquartile range (the difference between the third and first quartile values of a data set, i.e. Q3 - Q1)
  • Mean deviation (the average absolute deviation, i.e. the average amount by which the items differ from the mean or median)
  • The variance (the average of the squared deviations from the mean)
  • The standard deviation (a measure of the spread of scores, calculated as the square root of the variance)
  • The standard error (the standard deviation of the sample means)
  • Skewness (how far the data are asymmetrical in relation to a ‘normal’ curve distribution)
  • Kurtosis (how steep or flat the shape of a distribution is; a measure of how peaked the distribution is and how the data spread around the peak)
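The descriptive measures listed above can be sketched with Python's standard library. The scores below are hypothetical and chosen only to keep the arithmetic easy to follow. Several conventions are assumptions of this sketch rather than the only options: the variance and standard deviation use the population form (the plain average of squared deviations, as defined above), the standard error uses the sample standard deviation divided by the square root of n, quartiles follow the default method of `statistics.quantiles`, and skewness and kurtosis use the simple moment-based formulas.

```python
import math
import statistics

# Hypothetical quiz scores for eight learners.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion.
score_range = max(scores) - min(scores)            # L - S
q1, _, q3 = statistics.quantiles(scores, n=4)      # quartile cut points
iqr = q3 - q1                                      # Q3 - Q1
mean_deviation = sum(abs(x - mean) for x in scores) / n
variance = statistics.pvariance(scores)            # average squared deviation
std_dev = statistics.pstdev(scores)                # square root of the variance

# Standard error of the mean, estimated from the sample standard deviation.
standard_error = statistics.stdev(scores) / math.sqrt(n)

# Moment-based shape measures (a normal curve has skewness 0 and kurtosis 3).
skewness = sum((x - mean) ** 3 for x in scores) / n / std_dev ** 3
kurtosis = sum((x - mean) ** 4 for x in scores) / n / std_dev ** 4

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", score_range, "IQR:", iqr, "mean deviation:", mean_deviation)
print("variance:", variance, "standard deviation:", std_dev)
print("standard error:", round(standard_error, 3))
print("skewness:", round(skewness, 3), "kurtosis:", round(kurtosis, 3))
```

Statistical packages may use slightly different formulas (for example, sample-based variance or a different quartile rule), so small differences from this sketch are to be expected.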

Inferential (analytical) statistics makes inferences about populations (entire groups of people or firms) by analysing data gathered from samples (smaller subsets of the entire group), and deals with methods that enable a conclusion to be drawn from these data. An inference is an assumption, supposition, deduction or possibility (Brown & Saunders, 2008). Inferential statistics starts with a hypothesis (a statement of, or a conjecture about, the relationship between two or more variables that you intend to study; Levin & Fox, 2006) and investigates whether the data are consistent with that hypothesis. It involves methods of using information from a sample to draw conclusions about the population (Hatch & Farhady, 1982). Statistical inferences are no more accurate than the data they are based on (the weakest link). Statistical results need to be interpreted by someone who understands both the methods used and the subject matter. Inferential statistics includes the following statistical components:

  • Correlation (measures the magnitude of the relationship between two variables, i.e. the direction and strength of their association)
  • T-test (tests whether the difference between two groups’ means, or between a group mean and a predetermined value, is statistically significant)
  • Regression (models the average relationship between two or more variables)
  • Simple linear regression (predicts the value of one variable from the known value of another variable)
  • Multiple regression (calculates the different weightings of several independent variables on a dependent variable)
  • Chi-square test (assesses the magnitude of the difference between observed frequencies and the frequencies expected under certain assumptions)
  • Analysis of variance (ANOVA) (analyzes the differences between group means by examining the variation among and between groups)
  • Analysis of covariance (ANCOVA) (a general linear model which blends ANOVA and regression and evaluates whether the population means of a dependent variable are equal across levels of a categorical independent variable)
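Some of the inferential tools listed above can be sketched in plain Python. The sketch below computes a Pearson correlation coefficient, a simple linear regression line, a pooled-variance t statistic for two independent groups, and a chi-square statistic. All data are hypothetical, the function names are assumptions of this sketch, and the pooled-variance form of the t-test is only one of several variants. The sketch stops at the test statistic; judging significance still requires a critical value or p-value from tables or statistical software.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation: direction and strength of a linear association."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_linear_regression(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def pooled_t_statistic(a, b):
    """t statistic and degrees of freedom for two independent groups."""
    na, nb = len(a), len(b)
    var_a = sum((v - mean(a)) ** 2 for v in a) / (na - 1)
    var_b = sum((v - mean(b)) ** 2 for v in b) / (nb - 1)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: weekly study hours and test scores for five learners.
hours = [1, 2, 3, 4, 5]
test_scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, test_scores)
slope, intercept = simple_linear_regression(hours, test_scores)

# Hypothetical scores of two independent groups of learners.
t, df = pooled_t_statistic([4, 5, 6, 7, 8], [2, 3, 4, 5, 6])

# Hypothetical observed counts against equal expected counts.
chi2 = chi_square_statistic([30, 20], [25, 25])

print("r:", round(r, 3))
print("predicted score = {:.2f} + {:.2f} * hours".format(intercept, slope))
print("t:", round(t, 3), "df:", df)
print("chi-square:", round(chi2, 3))
```

With samples this small the statistics are only illustrative; ANOVA, ANCOVA and multiple regression are better left to a dedicated statistical package.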

Current Approaches and Practices: Statistics and EFL Research

The use of statistical or quantitative analysis in EFL research is not a new concept. Statistical tools such as measures of frequency (particularly percentiles) and measures of central tendency (mean, median and mode) are frequently used in EFL research in Nepal, mostly in university theses (see Kumari, 2012; Shrestha, 2012) and in mini-research projects (see Sharma, 2007; Sapkota, 2012) conducted under several universities or research centers. Reviewing past research on EFL and educational issues in Nepal, we find that some doctoral dissertations (see Giri, 2007; Mukundan, 2004) and large-scale studies (see BPEP, 1999; NASA, 2013) use other statistical tools, such as the t-test, correlation, regression analysis and the F-test, to analyze their data. This shows that the choice of statistical tools differs depending on the research and on the depth at which we describe the data or draw inferences from them. In several aspects of EFL research, statistical tools are essential to measure things, find the central value of the data, examine relationships, test hypotheses, explore issues, make comparisons to identify similarities and differences, and draw conclusions about the population from sample results.

Statistical Software Programs for Use with Quantitative Data

In recent times, many different software programs have been designed for the analysis (descriptive or inferential) of quantitative data. The use of such programs helps the researcher to process the data more efficiently. This article simply introduces some of the programs most commonly used by EFL researchers.

  1. SPSS. SPSS originally stood for Statistical Package for the Social Sciences and was later expanded as Statistical Product and Service Solutions. It is one of the popular quantitative analysis software programs used in EFL research. It can be used to generate tabulated reports, charts, and plots of distributions and trends, as well as descriptive statistics and more complex statistical analyses. Its data editor has a data view and a variable view. The data view is the sheet that is visible when you first open the data editor and contains the data; it provides a convenient spreadsheet-like facility for entering, editing and displaying the contents of your data file. The variable view contains information about the variables for which we derive the frequencies, central tendency, variation, correlation, etc. of the respective data. It is simple and easy to enter and edit data directly in the program. There are a few drawbacks, however, which may make it less suitable for some researchers. For example, there is a limit on the number of cases you can analyze. It is also difficult to account for weights, strata, and group effects with SPSS.
  2. STATA. STATA is another statistical software package, created in 1985 by StataCorp. It can be used for both simple and complex statistical analyses. STATA offers a point-and-click interface as well as command syntax, which makes it easy to use. STATA also makes it easy to generate graphs and plots of data and results. Analysis in STATA is centered on four windows: the command window, the review window, the results window, and the variables window. Analysis commands are entered into the command window, and the review window records those commands (Crossman, 2014). The variables window lists the variables that are available in the current data set along with the variable labels, and the results window is where the results appear.
  3. SAS. SAS, which stands for Statistical Analysis System, is also used by many businesses because, in addition to statistical analysis, it allows programmers to perform report writing, graphics, business planning, forecasting, quality improvement, project management, and more. In recent times it has also been used in EFL research. SAS is good for analyses that require you to take into account weights, strata, or groups. Unlike SPSS and STATA, SAS is run largely by programming syntax rather than point-and-click menus, so some knowledge of the programming language is required (Crossman, 2014).


Hence, statistics helps to deal with both the descriptive and the inferential measures of data. It helps to condense large amounts of textual information into a concise form and to display the information coherently using devices such as pie charts, bar graphs and tables. It includes a range of procedures for gathering, organizing, analyzing and presenting quantitative data (numbers). In EFL research, very few studies have used multiple statistical tools to verify the data. In theses and mini-research projects, we mostly find the data presented using measures of frequency or measures of dispersion.



Brown, R.B. & Saunders, M. (2008). Dealing with statistics. England: Open University Press.

Cohen, L., Manion, L. & Morrison, K. (2011). Research methods in education. London: Routledge.

Crossman, A. (2014).  Analyzing quantitative data: Statistical software programs for use with quantitative data (as retrieved from /a/ Computer-programs-quantitative-data.htm)

Educational Review Office (2013). NASA Public (Nepali version summary) Report (As retrieved from

Giri, A. (2007). A study of grammatical errors and their gravity. Unpublished PhD dissertation, Tribhuvan University.

Hatch, E. & Farhady, H. (1982). Research design and statistics for applied linguistics. Rowley: Newbury House Publishers.

Kumar, R. (2011). Research methodology. India: Sage Publications.

Kumari, K. (2012). Learning styles adopted by M.Ed students. Unpublished M.Ed dissertation, Tribhuvan University.

Levin, J. & Fox, J.A. (2006). Elementary statistics in social research (3rd ed.). New Delhi: Dorling Kindersley.

Mackey, A. & Gass, S.M. (2005). Second language research: Methodology and design. New Jersey: Lawrence Erlbaum Associates, Publishers.

Mukundan, J. (2004). A composite  for ESL textbook evaluation. Unpublished doctoral thesis, University Putra Malaysia, Serdang.

Sapkota, A. (2012). Self-monitoring practices of EFL teachers for their professional development at university level. Mini-research submitted to the University Grants Commission. Kathmandu: UGC.

Sapkota, A. (2013). Research methodology in language education & thesis writing. Kathmandu: Sunlight Publication.

Sharma, B.K. (2007). Plagiarism among university students: Intentional or accidental? Journal of NELTA, 12(1-2). Kathmandu: NELTA.

Shrestha, C. L. (2012). Language used in the editorials in Journal of NELTA. Unpublished M.Ed dissertation, Tribhuvan University.

World Bank. (1999). Nepal – Second Basic and Primary Education Project (BPEP II). Washington, DC: World Bank. (as retrieved from http://documents.worldbank.org/curated/en/1999/04/442240/nepal-second-basic-primary-education-project-bpep-ii)

(*Ashok Sapkota is a faculty member at the Central Department of English Education, Tribhuvan University, Kirtipur, and at Kathmandu Shiksha Campus, Kathmandu, Nepal. He is a former teacher trainer at the British Council, an executive member of the NELTA central committee, and a member of the South Asian Teachers’ Association, ELTECs/U.K. He is one of the editorial members of the NELTA ELT FORUM blog.)

