Friday, September 20, 2019

Measures of Dispersion

Measures of central tendency, discussed in the previous chapter, describe only one characteristic of a series: its central value. They say nothing about how the individual observations vary around that value. An example makes the point. Suppose the formula for the mean shows that the average depth of a river is 6 feet. One still cannot enter the river with confidence, because the depth may be 12 feet in some places and only 3 feet in others. An interpretation based on central tendency alone can therefore prove useless, and a measure of central tendency by itself is not sufficient to draw a valid conclusion about a series of observations.

Along with the central value, one must know how the data are distributed. Different sets of data may have the same measure of central tendency yet differ greatly in variation, so knowledge of the central value is not enough to appreciate the nature of the distribution. Some additional measure is required that describes the spread of the entire set of values around the central value. Such a measure is called dispersion or variation. The study of dispersion enables us to know whether a series is homogeneous (all the observations lie close to the central value) or heterogeneous (the observations vary widely around the central value, for example 1, 50, 20, 28, etc., where the central value is 33). In short, a measure of dispersion describes the spread or scattering of the individual values of a series around its central value.

Experts have offered several reasons why the variation in a distribution is so important to consider.
Following are some views on the usefulness of measures of dispersion:
(i) Measures of variation give the researcher additional information about the behaviour of the series, alongside the measures of central tendency. With this information one can judge the reliability of the central value. If the data are widely dispersed, the central location is less representative of the data as a whole. When the data are less dispersed, the central location is more representative of the entire series. In other words, a high degree of variation means little uniformity, whereas a low degree of variation means greater uniformity.
(ii) Widely dispersed data create practical problems in handling the data. Measures of dispersion help in understanding and tackling such data.
(iii) They help to determine the nature and cause of variation so that the variation itself can be controlled.
(iv) They allow two or more series to be compared with regard to their variability.

DEFINITION: Following are some definitions of measures of dispersion given by different experts. L.R. Connor defines it as ‘dispersion is a measure of the extent to which the individual items vary’. Similarly, Brookes and Dick describe it as ‘dispersion or spread is the degree of the scatter or variation of the variables about a central value’. Robert H. Wessel defines it as ‘measures which indicate the spread of the values are called measures of dispersion’. All these definitions make it clear that a measure of dispersion describes the spread or scattering of the individual values of a series around its central value.
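The river example can be put into numbers with a small Python sketch. The two depth lists are hypothetical values, chosen only so that both average 6 feet. The helper name `spread` is likewise just an illustration and not a standard function.

```python
from statistics import mean

# Hypothetical river depths in feet (illustrative values, not survey data).
even_river = [5, 6, 6, 7, 6]       # depths stay close to the average
uneven_river = [3, 12, 3, 9, 3]    # same average, very different depths


def spread(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)


for name, depths in [("even", even_river), ("uneven", uneven_river)]:
    print(name, "mean =", mean(depths), "spread =", spread(depths))
```

Both rivers have the same mean depth, yet only the first is safe to wade through. That difference is exactly what a measure of dispersion is meant to capture.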
METHODS OF MEASURING DISPERSION: The widely used methods of measuring the dispersion of a series fall into two groups.
(i) Dispersion measured on the basis of the difference between two extreme values selected from the series. The two well-known measures are the range and the inter-quartile range (or quartile deviation).
(ii) Dispersion measured on the basis of the average deviation from some measure of central tendency. The well-known measures are the mean/average deviation, the standard deviation, the coefficient of variation, and the Gini coefficient with the Lorenz curve.
All these tools are discussed in detail below, one after the other.

THE RANGE: The range is the simplest measure of dispersion. It is defined as the difference between the highest value and the lowest value of the series. As a measure of variation the range has limited applicability. It is widely used by meteorological departments in weather forecasting, and it is also used in statistical quality control. The range is a good indicator of fluctuations in prices, for example when studying variations in the prices of shares and debentures. It is calculated as follows:
Range = value of the highest observation (H) – value of the lowest observation (L), or Range = H – L

Advantages of Range:
(i) The range is the simplest way of obtaining dispersion.
(ii) It is easily understood and easily interpreted.
(iii) It takes less time to compute than the other measures of variation.

Disadvantages of Range: Because it considers only the two extreme values, it ignores all the other observations of the series.
It also tells us nothing about the shape of the distribution, its scope of application is very limited, and it does not lend itself to further mathematical treatment.

THE INTER-QUARTILE RANGE OR QUARTILE DEVIATION: A second measure of dispersion is the inter-quartile range, which takes into account only the middle half (50%) of the data and thus avoids the problem of extreme values. It measures approximately how far from the median one must go on either side to include one-half of the values of the data set. To calculate it, the ordered series is divided into four parts, each containing 25 percent of the observations. The quartiles are the highest values in the first three of these parts, and the inter-quartile range is the difference between the third and the first quartile, $Q_3 - Q_1$. The quartile deviation is half of this difference, $(Q_3 - Q_1)/2$.

Following are the steps of calculating the inter-quartile range:
(i) Arrange the data of the series in ascending order.
(ii) Calculate the first quartile ($Q_1$). For ungrouped data, $Q_1$ is the value of the $\left(\frac{N+1}{4}\right)$th item. For grouped data, $Q_1 = L + \frac{N/4 - \text{p.c.f.}}{f} \times i$, where N = number of observations in the series (the sum of frequencies), L = lower limit of the quartile class, p.c.f. = cumulative frequency prior to the quartile class, f = frequency of the quartile class and i = class interval. The quartile class is the class in which the $\frac{N}{4}$th observation falls.
(iii) Calculate the third quartile ($Q_3$). For ungrouped data, $Q_3$ is the value of the $\left(\frac{3(N+1)}{4}\right)$th item. For grouped data, $Q_3 = L + \frac{3N/4 - \text{p.c.f.}}{f} \times i$, with the symbols defined as above. The quartile class is the class in which the $\frac{3N}{4}$th observation falls.
(iv) Take the difference $Q_3 - Q_1$ to obtain the inter-quartile range.
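The range and quartile steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `quartile` and `grouped_quartile` are made up for this sketch, fractional positions are resolved by linear interpolation (one common convention among several), the ages are the ten student ages from the question list at the end of this post, and the grouped frequency table is hypothetical.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) of ungrouped data: the k(N+1)/4-th item."""
    ordered = sorted(values)
    n = len(ordered)
    pos = k * (n + 1) / 4            # 1-based position, may be fractional
    pos = min(max(pos, 1), n)        # keep the position inside the data
    lower = int(pos)                 # item at or just below the position
    upper = min(lower + 1, n)        # item just above it
    frac = pos - lower
    # Interpolate linearly between the two neighbouring items.
    return ordered[lower - 1] + frac * (ordered[upper - 1] - ordered[lower - 1])


def grouped_quartile(classes, k):
    """k-th quartile of grouped data: L + ((kN/4 - p.c.f.) / f) * i.

    classes is a list of (lower_limit, upper_limit, frequency) tuples
    given in ascending order.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / 4
    pcf = 0                          # cumulative frequency before the class
    for lower, upper, f in classes:
        if pcf + f >= target:        # this is the quartile class
            return lower + (target - pcf) / f * (upper - lower)
        pcf += f
    raise ValueError("empty frequency table")


ages = [19, 21, 20, 20, 23, 25, 24, 25, 22, 26]
data_range = max(ages) - min(ages)                  # H - L
q1, q3 = quartile(ages, 1), quartile(ages, 3)
print("range =", data_range)
print("IQR =", q3 - q1, "quartile deviation =", (q3 - q1) / 2)

# Hypothetical grouped data: (lower limit, upper limit, frequency).
table = [(0, 10, 4), (10, 20, 6), (20, 30, 10), (30, 40, 12), (40, 50, 8)]
print("grouped Q1 =", grouped_quartile(table, 1))
print("grouped Q3 =", round(grouped_quartile(table, 3), 2))
```

Note that the range uses only two of the ten ages, while the quartiles depend on the order of all of them but still ignore the values in the two outer quarters.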
THE MEAN/AVERAGE DEVIATION: The mean/average deviation is the arithmetic mean of the deviations of the observations from a measure of central tendency, which may be the mean, the median or the mode. The deviations are taken as absolute values. Clark and Schekade describe the mean deviation or average deviation as the average amount of scatter of the items in a distribution from either the mean or the median, ignoring the signs of the deviations. Because the average taken of this scatter is an arithmetic mean, the measure is often called the mean deviation or average deviation.

Calculations of Mean Deviation in case of Individual Series: In case of an individual series, mean deviation can be calculated through the following steps:
(i) Calculate the mean, median or mode of the given series.
(ii) Compute the deviation of each observation from the calculated mean, median or mode. This deviation is denoted by the capital letter D and is always taken as its modulus value $|D|$, i.e., ignoring the plus or minus sign.
(iii) Take the sum of the deviations and divide it by the number of observations (N): $MD = \frac{\sum |D|}{N}$.
In the same way one can calculate the mean deviation from the median or the mode in case of an individual series.

Calculations of Mean Deviation in case of discrete series: For a discrete series the mean deviation is calculated in a slightly different way. The steps are:
(i) Calculate the mean, median or mode of the given series by using the formula discussed in the previous chapter.
(ii) Compute the deviation of each observation from the calculated mean, median or mode. This deviation is again denoted by D and is always taken as its modulus value $|D|$.
(iii) Multiply each deviation by its corresponding frequency, i.e., calculate $f \times |D|$.
(iv) Take the sum of these products and divide it by the number of observations ($N = \sum f$): $MD = \frac{\sum f|D|}{N}$.
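A short Python sketch of the mean deviation for an individual and a discrete series, assuming the last step for the discrete case is to divide the sum of f × |D| by N = sum of f. The function names are illustrative only, and the data are the ten student ages from the question list at the end of this post, written once as raw values and once as a value/frequency table.

```python
from statistics import mean


def mean_deviation(values, centre):
    """Average of the absolute deviations |X - centre| (individual series)."""
    return sum(abs(x - centre) for x in values) / len(values)


def mean_deviation_freq(values, freqs, centre):
    """Sum of f * |X - centre| divided by N = sum of f (discrete series)."""
    n = sum(freqs)
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n


# Individual series: every observation listed separately.
ages = [19, 21, 20, 20, 23, 25, 24, 25, 22, 26]
md_individual = mean_deviation(ages, mean(ages))

# The same data as a discrete series: distinct values with frequencies.
distinct_ages = [19, 20, 21, 22, 23, 24, 25, 26]
frequencies = [1, 2, 1, 1, 1, 1, 2, 1]
md_discrete = mean_deviation_freq(distinct_ages, frequencies, mean(ages))

print("mean deviation (individual):", md_individual)
print("mean deviation (discrete):  ", md_discrete)
```

Both routes give the same figure because the frequency table is only a compact way of writing the same ten observations. Passing the median or the mode as `centre` gives the mean deviation about those values instead.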
Similarly, one can calculate the mean deviation or average deviation by taking the deviations from the median or the mode.

Calculations of Mean Deviation in case of continuous series:
(i) Calculate the mean, median or mode of the given series by using the formula discussed in the previous chapter.
(ii) Find the mid-value (m) of each class.
(iii) Compute the deviation of each mid-value from the calculated mean, median or mode, i.e., D = m – (mean, median or mode). The deviation is always taken as its modulus value $|D|$, i.e., ignoring the plus or minus sign.
(iv) Multiply each deviation by its corresponding frequency, i.e., calculate $f \times |D|$.
(v) Take the sum of these products and divide it by the number of observations (N): $MD = \frac{\sum f|D|}{N}$.

Advantages of mean deviation:
(i) The computation of the mean deviation is based on all the observations of the series.
(ii) Its value is less affected by extreme items.
(iii) The researcher has three alternatives while calculating the mean deviation: the deviations can be taken from the mean, the median or the mode. Hence it is more flexible in calculation.

Disadvantages of mean deviation:
(i) The practical usefulness of the mean deviation is limited.
(ii) It offers little scope for further mathematical calculation.
(iii) Only modulus values are considered, i.e., the signs of the deviations are ignored. Some experts criticize this as illogical and unsound.

THE STANDARD DEVIATION: Standard deviation, otherwise called the root mean square deviation, is the most important and widely used measure of variation. It measures the absolute variation of a distribution, that is, the spread of the observations around the mean value. The greater the variation among the observations in a series, the greater will be the value of the standard deviation.
A small value of standard deviation implies a high degree of homogeneity among the observations in the series. When the standard deviations of two or more series are compared, the series with the smaller standard deviation is the more consistent one. Standard deviation is always measured from the mean or average value of the series. The credit for introducing this concept in the literature goes to Karl Pearson, a famous statistician. It is denoted by the Greek letter $\sigma$ (pronounced as sigma).

Standard deviation is calculated for the following three types of series:
(i) Standard deviation in case of Individual series
(ii) Standard deviation in case of Discrete series
(iii) Standard deviation in case of Continuous series
All these cases are discussed in detail below.

a. Standard deviation in case of individual series: In case of an individual series, the value of standard deviation can be calculated by using two methods:
(i) Direct method- when deviations are taken from actual mean
(ii) Short-cut method- when deviations are taken from assumed mean

1. Direct method- when deviations are taken from actual mean: Following are the steps to be followed for calculating the value of standard deviation:
(i) Calculate the actual mean $\bar{X}$ of the observations.
(ii) In the next column calculate the deviation of each observation from the mean, i.e., find $(X - \bar{X})$, where $\bar{X}$ is the mean of the series.
(iii) In the next column calculate the square of each deviation, and at the end of the column calculate the sum of the squared deviations, i.e., $\sum (X - \bar{X})^2$.
(iv) Divide this total by the number of observations (N) and then take the square root. The formula is $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$.
Since the series consists of individual observations, it is sometimes unnecessary to take the deviations at all. In such a case the researcher can calculate the standard deviation directly from the observations. The formula for calculating directly is $\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$.

2.
Short-cut method- when deviations are taken from assumed mean: In practice the arithmetic mean often turns out to be a fraction (for example a value ending in .25), which makes the calculation of the standard deviation cumbersome. For this reason, instead of using the actual arithmetic mean as above, researchers generally prefer the short-cut method, which is simply the calculation of standard deviation from an assumed mean value. Following are the steps to be followed for calculating standard deviation by the assumed mean method:
(i) Assume one of the X values as the mean. This assumed mean is denoted as A.
(ii) Calculate the deviation of each observation from the assumed mean, D = X – A. At the end of the same column, calculate the sum of the deviations, $\sum D$.
(iii) Calculate the square of each deviation, $D^2$, and the sum $\sum D^2$.
(iv) Use the following formula to calculate the standard deviation of the series: $\sigma = \sqrt{\frac{\sum D^2}{N} - \left(\frac{\sum D}{N}\right)^2}$, where N is the number of observations in the series.

b. Standard deviation in case of discrete series: Discrete series are series in which the observations carry frequencies, i.e., repetitions of observations. In case of a discrete series standard deviation is calculated by using the following two methods:
(i) when deviations are taken from actual mean
(ii) when deviations are taken from assumed mean
The two methods are described in detail below.

1. When deviations are taken from actual mean: The steps to calculate standard deviation when deviations are taken from the actual mean are as follows. The first step is to calculate the actual mean value of the observations. In the next column calculate the deviation of each observation from the mean, i.e., find $(X - \bar{X})$, where $\bar{X}$ is the mean of the series. This deviation can be denoted as D.
In the next column calculate the square of each deviation, $D^2$. Multiply each $D^2$ by its corresponding frequency (f) in the next column, and at the end of that column calculate the sum $\sum fD^2$. Divide this total by the number of observations ($N = \sum f$) and then take the square root. The formula is $\sigma = \sqrt{\frac{\sum fD^2}{N}}$.

2. When deviations are taken from assumed mean: The steps to calculate standard deviation when deviations are taken from an assumed mean are:
(i) Assume a mean value (A) from among the observations.
(ii) In the next column calculate the deviation of each observation from it, i.e., find $(X - A)$, where A is the assumed mean of the series. This deviation can be denoted as D.
(iii) In the next column calculate the square of each deviation, $D^2$.
(iv) Multiply each D and each $D^2$ by its corresponding frequency (f) and calculate the sums $\sum fD$ and $\sum fD^2$ at the end of the columns.
(v) Use the following formula to calculate standard deviation: $\sigma = \sqrt{\frac{\sum fD^2}{N} - \left(\frac{\sum fD}{N}\right)^2}$.

c. Standard deviation in case of Continuous series: Standard deviation in case of a continuous series can be calculated by using the following steps:
(i) Calculate the mid-value of each class and denote it as ‘m’.
(ii) Assume any one of the mid-values as the mean and denote it as A.
(iii) Calculate the deviation of each mid-value from A and then divide it by the class interval (i), i.e., $d = \frac{m - A}{i}$.
(iv) Multiply each deviation by its corresponding frequency and take the sum at the end of the column, i.e., calculate $\sum fd$.
(v) In the next column square each deviation, i.e., calculate $d^2$.
(vi) Multiply each $d^2$ by its frequency and take the sum, i.e., calculate $\sum fd^2$.
(vii) Use the following formula to get the standard deviation: $\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$.

Properties of standard deviation: As a measure of variation, standard deviation is well suited to interpreting the scatter of the observations of a series.
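The calculation routes described above (deviations from the actual mean, deviations from an assumed mean A, and step deviations d = (m – A)/i for class mid-values) can be checked against each other with a short Python sketch. The function names are illustrative, the ages are the ten student ages from the question list at the end of this post, and the mid-values and frequencies of the continuous series are hypothetical.

```python
from math import sqrt


def sd_direct(values):
    """Direct method: deviations taken from the actual mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_assumed_mean(values, freqs, a):
    """Short-cut method: deviations D = X - A from an assumed mean A."""
    n = sum(freqs)
    sum_fd = sum(f * (x - a) for x, f in zip(values, freqs))
    sum_fd2 = sum(f * (x - a) ** 2 for x, f in zip(values, freqs))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


def sd_step_deviation(mids, freqs, a, i):
    """Step-deviation method for a continuous series: d = (m - A) / i."""
    n = sum(freqs)
    d = [(m - a) / i for m in mids]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * i


# Individual series, and the same data as a discrete series.
ages = [19, 21, 20, 20, 23, 25, 24, 25, 22, 26]
distinct_ages = [19, 20, 21, 22, 23, 24, 25, 26]
frequencies = [1, 2, 1, 1, 1, 1, 2, 1]
print(round(sd_direct(ages), 4))
print(round(sd_assumed_mean(distinct_ages, frequencies, 22), 4))

# Hypothetical continuous series: class mid-values and frequencies.
mids = [5, 15, 25, 35, 45]
class_freqs = [4, 6, 10, 12, 8]
print(round(sd_step_deviation(mids, class_freqs, 25, 10), 4))
```

The mean of the ages is 22.5, a fraction, which is the situation the short-cut method is meant for: with A = 22 every deviation is a whole number, and the result still matches the direct method.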
In a normal distribution, approximately 68 per cent of the observations of a series lie within one standard deviation of the mean, approximately 95.5 per cent lie within two standard deviations of the mean, and approximately 99.7 per cent lie within three standard deviations of the mean. Hence $\bar{X} \pm 1\sigma$ covers 68.27 per cent of the items in a series with normal distribution, $\bar{X} \pm 2\sigma$ covers 95.45 per cent of the items, and $\bar{X} \pm 3\sigma$ covers 99.73 per cent of the items.

Advantage of Standard Deviation: Following are some advantages of standard deviation as a measure of dispersion:
(i) It is the most widely used measure of dispersion.
(ii) It is regarded as a very satisfactory measure of the dispersion of a series.
(iii) It is capable of further mathematical calculations.
(iv) Algebraic signs are not ignored while measuring the standard deviation of a series.
(v) It is less affected by fluctuations of sampling than most other measures of dispersion.
(vi) The coefficients derived from it, such as the coefficient of variation, make the standard deviation a very popular measure of the scatter of a series.

Disadvantages of standard deviation: The disadvantages are:
(i) The concept is not easy to understand quickly.
(ii) It requires a good deal of computation to calculate.
(iii) It gives more weight to observations which are far away from the arithmetic mean.

THE COEFFICIENT OF VARIATION: Another useful statistical tool for measuring the dispersion of a series is the coefficient of variation. The coefficient of variation is the relative measure corresponding to the standard deviation, which is itself an absolute measure of dispersion. This tool is mostly used for comparing the variability of two or more series of observations. While comparing, the series for which the value of the coefficient of variation is greater is said to be more variable (i.e., the observations of the series are less consistent, less uniform, less stable or less homogeneous).
For this reason, when consistency is what matters, the series with the lower coefficient of variation is preferred. A lower value of the coefficient implies a series that is more consistent, more uniform, more stable and more homogeneous. The coefficient of variation is always calculated from the standard deviation and the corresponding arithmetic mean. It is denoted as C.V., and is measured by using the simple formula $C.V. = \frac{\sigma}{\bar{X}} \times 100$. In practice, researchers generally prefer to use the standard deviation rather than the coefficient of variation as a tool to measure dispersion, for a number of reasons (readers are advised to refer to any standard statistics book to know more about the coefficient of variation and its usefulness).

GINI COEFFICIENT AND THE LORENZ CURVE: An illuminating way of viewing the Gini coefficient is in terms of the Lorenz curve, due to Lorenz (1905), and the coefficient is generally defined on the basis of this curve. It is also known as the Lorenz ratio. In terms of the Lorenz diagram, the most common definition of the Gini coefficient is the ratio of the area between the Lorenz curve and the line of equality to the area of the triangle OBD below this line (figure-1). The Gini coefficient varies between the limits of 0 (perfect equality) and 1 (perfect inequality), and the greater the departure of the Lorenz curve from the diagonal, the larger is the value of the Gini coefficient. Various geometrical definitions of the Gini coefficient discussed in the literature and useful for different purposes are examined here.

CONCLUSIONS: The study of dispersion enables us to know whether a series is homogeneous (all the observations lie close to the central value) or heterogeneous (the observations vary widely around the central value). Hence it can be said that a measure of dispersion describes the spread or scattering of the individual values of a series around its central value.
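Looking back at the two relative measures discussed above, the following Python sketch computes the coefficient of variation from the formula C.V. = (σ / mean) × 100 and approximates the Gini coefficient as the ratio of areas in the Lorenz diagram. It is a rough illustration under stated assumptions: the function names are made up, the area under the Lorenz curve is approximated with straight-line segments between observations (the trapezoid rule), and all data values are hypothetical.

```python
from math import sqrt


def coefficient_of_variation(values):
    """C.V. = (standard deviation / arithmetic mean) * 100."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100


def gini(values):
    """Gini coefficient as a ratio of areas in the Lorenz diagram.

    Assumes non-negative values with a positive total.
    """
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    cumulative = 0.0
    prev_share = 0.0
    area_under_lorenz = 0.0
    for x in ordered:
        cumulative += x
        share = cumulative / total           # cumulative share of the total
        area_under_lorenz += (prev_share + share) / 2 / n   # one trapezoid
        prev_share = share
    # Area between the line of equality and the Lorenz curve, divided by
    # the area of the triangle below the line of equality (which is 1/2).
    return (0.5 - area_under_lorenz) / 0.5


# Hypothetical series measured on very different scales.
series_a = [50, 52, 48, 51, 49]
series_b = [5, 9, 1, 7, 3]
print("C.V. of A:", round(coefficient_of_variation(series_a), 2))
print("C.V. of B:", round(coefficient_of_variation(series_b), 2))

# Hypothetical incomes: equal shares versus one person holding everything.
print("Gini, equal incomes:", gini([5, 5, 5, 5]))
print("Gini, one earner:   ", gini([0, 0, 0, 10]))
```

With only four observations the most unequal case gives 0.75 rather than 1, because the piecewise Lorenz curve of a small sample cannot hug the axes completely. The value approaches 1 as the number of observations grows.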
A number of methods for determining variation have been discussed in this chapter. Researchers are often unsure which of these techniques is the best. The answer is simple: no single measure of dispersion can be considered the best for all types of data series. The most important factors are the type of data available and the purpose of the investigation. Critics suggest that if a series has many extreme values, then the standard deviation should be avoided. On the other hand, for highly skewed observations the standard deviation may be used but the mean deviation should be avoided, whereas if the series has large gaps between observations, the quartile deviation is not an appropriate measure. With these reservations, the standard deviation is generally regarded as the most satisfactory measure for most purposes.

SUMMARY: The study of dispersion enables us to know whether a series is homogeneous (all the observations lie close to the central value) or heterogeneous (the observations vary widely around the central value).
Dispersion can be measured on the basis of the difference between two extreme values selected from a series of data. The two well-known measures are (i) the range and (ii) the inter-quartile range.
Dispersion can also be measured on the basis of the average deviation from some measure of central tendency. The well-known measures are (i) the mean/average deviation, (ii) the standard deviation, (iii) the coefficient of variation and (iv) the Gini coefficient and the Lorenz curve.
The range is defined as the difference between the highest value and the lowest value of the series. As a measure of variation it has limited applicability.
The inter-quartile range measures approximately how far from the median one must go on either side to include one-half of the values of the data set.
The mean/average deviation is the arithmetic mean of the deviations of the observations from a measure of central tendency, i.e., from the mean, the median or the mode. The deviations are taken as absolute values.
A small value of standard deviation implies a high degree of homogeneity among the observations in the series. When the standard deviations of two or more series are compared, the series with the smaller standard deviation is the more consistent one. Standard deviation is always measured from the mean or average value of the series.
The coefficient of variation is the relative measure corresponding to the standard deviation, which is an absolute measure of dispersion. It is mostly used for comparing the variability of two or more series of observations.
In terms of the Lorenz diagram, the most common definition of the Gini coefficient is the ratio of the area between the Lorenz curve and the line of equality to the area of the triangle below the line of equality.

IMPORTANT QUESTIONS:
The ages of ten students in a class are given below. Find the mean and standard deviation. 19, 21, 20, 20, 23, 25, 24, 25, 22, 26
The following table gives the marks obtained in the Statistics paper by 100 students in a class. Calculate the standard deviation and mean deviation.
The monthly profits of 150 shopkeepers selling different commodities on a city footpath are given below. Calculate the mean, mean deviation and standard deviation of the distribution.
The daily wages of 160 labourers working in a cotton mill in Surat city are given below. Calculate the range, mean deviation and standard deviation of the distribution.
Calculate the mean deviation and standard deviation of the following distribution.
What do you mean by measure of dispersion? How far is it helpful to a decision-maker in the process of decision making?
Define measure of dispersion.
Among the various tools of dispersion, which tool according to you is the best one? Give suitable reasons for your answer.
What do you mean by measure of dispersion? Compare and contrast the various tools of dispersion by pointing out their advantages and disadvantages.
Discuss with examples the relative merits of range, mean deviation and standard deviation as measures of dispersion.
Define standard deviation. Why is standard deviation more useful than other measures of dispersion?
The data given below show the ages of 100 students pursuing their master's degree in economics. Calculate the mean deviation and standard deviation.
Following are the results of a study carried out to determine the mileage that marketing executives drove their cars over a 1-year period. For this, 50 marketing executives were sampled. Based on the findings, calculate the range and inter-quartile range.
An enquiry recorded the number of days that 230 randomly chosen patients stayed in a Government hospital after an operation. On the basis of the observations, calculate the standard deviation.
Cars sold in the small car segment in November 2009 at 10 Maruti Suzuki dealers in Delhi city are given below. Compute the range, mean deviation and standard deviation of the data series.
Following is the daily data on the number of persons who entered an institute through its main gate in a month. Calculate the range and standard deviation of the series.
Calculate the range and coefficient of range for a group of students from the marks obtained in two papers as given below:
Following are the marks obtained by some students in a class-test. Calculate the range and coefficient of range.
By using the direct and indirect methods, calculate the mean deviation about both the arithmetic mean and the mode from the following data set, which relates to the age and number of residents of Vasundara apartment, Gaziabad.
A local geyser manufacturer at Greater Noida has developed a new and cheap variety of geysers meant for lower and middle income households. He carried out a survey in some apartments asking customers how much they are ready to spend on the purchase of a geyser. Calculate the standard deviation of the series.
Calculate the median of the following distribution. From the median value calculate the mean deviation and coefficient of mean deviation.
Calculate the median of the following distribution. From the median value calculate the mean deviation and coefficient of mean deviation.
Calculate the arithmetic average and standard deviation from the following daily data of rickshaw pullers of Hyderabad City.
For a group of 250 candidates, the mean and standard deviation of their total marks were calculated as 60 and 17. Later, in the process of verification, it was found that a score of 46 had been misread as 64. Recalculate the correct mean and standard deviation.
The daily wage structures of two cotton factories are given below. In order to show the inequality, draw the Lorenz curve.
Total marks obtained by the students in two sections are given below. By using the data draw a Lorenz curve.
Draw the Lorenz curve of the following data.
Find the range and coefficient of range for the following data set.
The heights of 10 firemen working in a fire station are 165, 168, 172, 174, 175, 178, 156, 158, 160, 179 cms. Calculate the range of the series. Now suppose that the tallest and the shortest firemen are transferred from the fire station. Calculate the range of the heights of the remaining firemen. What is the percentage change between the earlier range and the latter range?
Calculate the quartile deviation from the following data.
Calculate the inter-quartile range, quartile deviation and its coefficient for the following data series.
Calculate the mean deviation from the following data.
Calculate the mean deviation from the median and from the mean for the following series.
The distribution given below shows the difference in age between husband and wife in a community. Based on the data, calculate the mean deviation and standard deviation.
Calculate th
