Unit 4. Descriptive Statistics with Excel
J Toby Mordkoff
Summary. This unit shows you, in a series of short videos, how to calculate descriptive statistics and make simple plots of numerical data using Microsoft Excel.
Prerequisite Units
Unit 1. Introduction to Statistics for Psychological Science
Unit 2. Managing Data
Unit 3. Descriptive Statistics for Psychological Research
Mean and Sample Standard Deviation
The most popular way to define the center and spread of a single set of numerical data is the mean and standard deviation. You can also use the variance for spread, instead of the standard deviation, but then the units of measure are squared, which can be confusing (e.g., what is a squared millisecond?). In any event, we will be calculating the variance on the way to the standard deviation, so both of these will be available.
Here are the formulae that we will use (see Unit 3):
(1)
(2)
(3)
Note that we will be calculating the sample values of variance and standard deviation, so we’re using the formula with n – 1 in the denominator. We will do the calculations on the following data (from Unit 3):
65 | 54 | 79 | 57 | 35 | 14 | 56 | 55 | 77 | 45 | 92 |
Watch the video: https://tinyurl.com/Mean-and-Standard-Deviation
Median and Inter-Quartile Range
The other way to define the center and spread of a single set of numerical data is the median and inter-quartile range (IQR). These are both based on percentiles. The median is 50th percentile; the IQR is the difference between the 75th and 25th percentiles.
Note that there are two different ways to calculate a percentile. One method includes the median (when calculating percentiles other than the 50th); the other method excludes the median. We use the exclusive version of percentiles, because the inclusive version has a tendency to under-estimate the IQR in the population from which the sample was taken. Warning: for many spreadsheets and stats packages, the exclusive version is not the default, so you have to be careful.
For the demonstration, we will use an extremely simple set of data: 1, 2, 3, 4, & 5.
Watch the video: https://tinyurl.com/Median-and-IQR
Numerical Values of Skew and Kurtosis
When center and spread are defined as the mean and standard deviation (or variance), there are parallel definitions of skewness and kurtosis that may be used for the shape of the data. These are also based on the method of moments. The formulae for these are quite complicated, so we will be jumping directly to Excel’s built-in functions. Note that a perfect bell curve, known as the normal distribution, has skew = 0 and kurtosis = 0. A new set of (random) data will be used for this demonstration.
Watch the video: https://tinyurl.com/Skew-and-Kurtosis
Simple Frequency Plots (Histograms) of Numerical Data
The more general approach to skew and kurtosis is to make a plot of the data and simply look for asymmetry (skew) and peakedness (kurtosis). As will be shown, in some cases, the data must be “binned” before being plotted.
Watch the video: https://tinyurl.com/Simple-Histograms