Shape, Center, and Spread of a Distribution

A population parameter is a characteristic or measure obtained by using all of the data values in a population. A sample statistic is a characteristic or measure obtained by using data values from a sample. The parameters and statistics with which we first concern ourselves attempt to quantify the "center" i.

Note that there are several different measures of center and several different measures of spread that one can use -- one must be careful to use measures appropriate to the shape of the data's distribution, the presence of extreme values, and the nature and level of the data involved. As we consider different measures of center and spread, recall that we really want to know about the center and spread of the population in question i.

As such, we calculate sample statistics to estimate these population parameters.

The Shape of a Distribution

We can characterize the shape of a data set by looking at its histogram. First, if the data values seem to pile up into a single "mound", we say the distribution is unimodal. If there appear to be two "mounds", we say the distribution is bimodal.

If there are more than two "mounds", we say the distribution is multimodal. Second, we focus on whether the distribution is symmetric, or if it has a longer "tail" on one side or another. In the case where there is a longer "tail", we say the distribution is skewed in the direction of the longer tail.

In the case where the longer tail is associated with larger data values, we say the distribution is skewed right or positively skewed. In the case where the longer tail is associated with smaller or more negative values, we say the distribution is skewed left or negatively skewed. If the distribution is symmetric, we will often need to check if it is roughly bell-shaped, or has a different shape. In the case of a distribution where each rectangle is roughly the same height, we say we have a uniform distribution.

The graphic below gives a few examples of the aforementioned distribution shapes.

Measures of Center

For interval or ratio level data, one measure of center is the mean. The population mean and the sample mean are calculated in a very similar way. Another measure of center is the median, the middle value of the data once they are sorted. In the case of an even number of data values (and thus no exact middle), it is the average of the middle two data values. The median is largely unaffected by the presence of extreme values in the data set.
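As a rough illustration of these two measures of center, the sketch below uses Python's standard `statistics` module. The data values are made up for the example and do not come from the text.

```python
from statistics import mean, median

# Hypothetical data set with an odd number of values.
values = [1, 2, 2, 3, 4, 5, 6]
mean_values = mean(values)      # sum of the values divided by their count
median_values = median(values)  # middle value of the sorted data

# An even number of values has no exact middle, so the median is the
# average of the two middle values (here, 2 and 3).
even_count = [1, 2, 3, 4]
median_even = median(even_count)

# Replace the largest value with an extreme one: the mean moves a lot,
# while the median stays where it was.
with_extreme = [1, 2, 2, 3, 4, 5, 60]
mean_extreme = mean(with_extreme)
median_extreme = median(with_extreme)

print("mean:", mean_values, "median:", median_values)
print("median of even-length list:", median_even)
print("with extreme value -> mean:", mean_extreme, "median:", median_extreme)
```

In this made-up data set, a single extreme value pulls the mean well away from most of the data while leaving the median unchanged, which is one reason the median is often preferred for skewed data.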

With ordinal data, however, an even total number of values creates a complication -- we can't average two ordinal values, as we can with ratio or interval-level values, to find a "middle value". Suppose the two middle ranks are a jack (J) and a queen (Q). What would their average be? Because this question has no good answer, some texts suggest that for an even-length list of ordinal data, one should instead simply choose the lower of the two middle values to be the median.
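The "choose the lower middle value" convention can be sketched as follows. The rank ordering `RANK_ORDER` and the function name `ordinal_median_low` are hypothetical, introduced only for this illustration; the text itself mentions only a jack and a queen.

```python
# Assumed ordinal scale, listed from lowest to highest rank.
RANK_ORDER = ["9", "10", "J", "Q", "K", "A"]

def ordinal_median_low(values, order):
    """Median of ordinal data, taking the lower of the two middle
    values when the list has an even length."""
    if not values:
        raise ValueError("no median for empty data")
    position = {rank: i for i, rank in enumerate(order)}
    ranked = sorted(values, key=lambda v: position[v])
    # For n values, index (n - 1) // 2 is the exact middle when n is odd
    # and the lower of the two middle values when n is even.
    return ranked[(len(ranked) - 1) // 2]

even_hand = ["K", "9", "Q", "J"]   # sorted by rank: 9, J, Q, K
odd_hand = ["K", "9", "Q"]         # sorted by rank: 9, Q, K

print(ordinal_median_low(even_hand, RANK_ORDER))  # lower of J and Q
print(ordinal_median_low(odd_hand, RANK_ORDER))   # exact middle
```

For data that already sort correctly with Python's default ordering, the standard library's `statistics.median_low` follows the same convention.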

If most of the data in our graph sit on the left, with a longer tail stretching to the right, then we say that our graph is skewed right. For our donuts survey, it would mean that most people eat relatively few donuts. A good way to remember this is to view the graph as a slide. If you slide down to the right, then it is skewed right, and if you slide down to the left, then it is skewed left.
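The direction of skew can also be checked numerically. The sketch below uses the sign of the moment coefficient of skewness, a standard statistic that the text does not itself introduce, so treat it as a swapped-in technique. The function name, the `tolerance` cutoff of 0.1, and the donut counts are all assumptions made for illustration.

```python
from statistics import mean

def skew_direction(data, tolerance=0.1):
    """Classify skew from the moment coefficient of skewness.

    A positive coefficient means a longer right tail; a negative one,
    a longer left tail. The tolerance is an arbitrary cutoff for
    treating a distribution as roughly symmetric.
    """
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        return "symmetric"  # all values are identical
    g1 = m3 / m2 ** 1.5
    if g1 > tolerance:
        return "skewed right"
    if g1 < -tolerance:
        return "skewed left"
    return "symmetric"

# Hypothetical donut counts: most people eat few, a handful eat many.
few_donuts = [0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 8]
many_donuts = [8 - x for x in few_donuts]  # mirror image of the above
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(skew_direction(few_donuts))
print(skew_direction(many_donuts))
print(skew_direction(balanced))
```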

Boxplots often provide information about the shape of a data set. The examples below show some common patterns.

[Figure: three boxplots, labeled Skewed Right, Symmetric, and Skewed Left]

Each of the above boxplots illustrates a different skewness pattern. If most of the observations are concentrated on the low end of the scale, the distribution is skewed right; and vice versa. If a distribution is symmetric, the observations will be evenly split at the median, as shown above in the middle figure.
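One rough way to read skew off a boxplot without drawing it is to compare the two halves of the box on either side of the median. The sketch below does this with `statistics.quantiles`. The comparison rule is a heuristic assumed for this example, the data are made up, and quartile values depend on the calculation method (the library's default "exclusive" method is used here).

```python
from statistics import quantiles

def boxplot_skew(data):
    """Compare the lower and upper halves of the box (Q1 to median,
    median to Q3) as a rough indicator of skew."""
    q1, med, q3 = quantiles(data, n=4)  # the three quartile cut points
    lower_half = med - q1
    upper_half = q3 - med
    if upper_half > lower_half:
        return "skewed right"  # box stretches further above the median
    if upper_half < lower_half:
        return "skewed left"   # box stretches further below the median
    return "symmetric"

# Hypothetical data sets.
low_heavy = [1, 1, 2, 2, 3, 4, 6, 9, 15]
high_heavy = [1, 7, 10, 12, 13, 14, 14, 15, 15]
even_split = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for name, data in [("low_heavy", low_heavy),
                   ("high_heavy", high_heavy),
                   ("even_split", even_split)]:
    print(name, quantiles(data, n=4), boxplot_skew(data))
```

With real data the two halves are rarely exactly equal, so in practice one would treat small differences as roughly symmetric rather than insisting on equality.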

Uniform

If our survey of people's donut eating habits showed that each number of donuts was chosen by the same number of people, then our graph would look flat all across the top, and we would call it uniform. A uniform shape has no peaks, nor is it skewed.

Spread

A measure of spread, sometimes also called a measure of dispersion, is used to describe the variability in a sample or population. It is usually used in conjunction with a measure of central tendency, such as the mean or median, to provide an overall description of a set of data. There are many reasons why the measure of the spread of data values is important, but one of the main reasons regards its relationship with measures of central tendency. A measure of spread gives us an idea of how well the mean, for example, represents the data.

If the spread of values in the data set is large, the mean is not as representative of the data as it would be if the spread were small. This is because a large spread indicates that there are probably large differences between individual scores. Additionally, in research, it is often seen as positive if there is little variation in each data group.

We will be looking at two measures of spread: range and quartiles.

Range

The range is the difference between the highest and lowest scores in a data set and is the simplest measure of spread.
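A minimal sketch of the range, using two made-up groups of scores that share the same mean but differ greatly in spread. The function name `value_range` and both data sets are hypothetical.

```python
from statistics import mean

def value_range(data):
    """Range: the highest score minus the lowest score."""
    return max(data) - min(data)

# Two hypothetical groups with the same mean but different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

print("group_a mean:", mean(group_a), "range:", value_range(group_a))
print("group_b mean:", mean(group_b), "range:", value_range(group_b))
```

Both groups have the same mean, but every score in the first group lies close to it, while most scores in the second lie far from it. The mean therefore represents the first group much better than the second.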

