"

Unit 4. Descriptive Statistics with Excel

J Toby Mordkoff

Summary.  This unit shows you, in a series of short videos, how to calculate descriptive statistics and make simple plots of numerical data using Microsoft Excel.

Prerequisite Units

Unit 1. Introduction to Statistics for Psychological Science

Unit 2. Managing Data

Unit 3. Descriptive Statistics for Psychological Research

Mean and Sample Standard Deviation

The most popular way to define the center and spread of a single set of numerical data is the mean and standard deviation.  You can also use the variance for spread, instead of the standard deviation, but then the units of measure are squared, which can be confusing (e.g., what is a squared millisecond?).  In any event, we will be calculating the variance on the way to the standard deviation, so both of these will be available.

Here are the formulae that we will use (see Unit 3):

(1)    \begin{equation*} \ Mean \medspace of \medspace X = \bar {X} = \frac {\sum {X_i}} {n} \end{equation*}

(2)    \begin{equation*} \ Sample \medspace Variance \medspace of \thinspace X = {s_x^2} = \frac {\sum {(X_i - \bar {X})^2}} {n - 1} \end{equation*}

(3)    \begin{equation*} \ Sample \medspace Standard \medspace Deviation \medspace of \thinspace X = {s_x} = \sqrt{s_x^2} \end{equation*}

Note that we will be calculating the sample values of variance and standard deviation, so we’re using the formula with n – 1 in the denominator.  We will do the calculations on the following data (from Unit 3):

65 54 79 57 35 14 56 55 77 45 92

Watch the video:  https://tinyurl.com/Mean-and-Standard-Deviation

Median and Inter-Quartile Range

The other way to define the center and spread of a single set of numerical data is the median and inter-quartile range (IQR).  These are both based on percentiles.  The median is 50th percentile; the IQR is the difference between the 75th and 25th percentiles.

Note that there are two different ways to calculate a percentile.  One method includes the median (when calculating percentiles other than the 50th); the other method excludes the median.  We use the exclusive version of percentiles, because the inclusive version has a tendency to under-estimate the IQR in the population from which the sample was taken.  Warning: for many spreadsheets and stats packages, the exclusive version is not the default, so you have to be careful.

For the demonstration, we will use an extremely simple set of data: 1, 2, 3, 4, & 5.

Watch the video:  https://tinyurl.com/Median-and-IQR

Numerical Values of Skew and Kurtosis

When center and spread are defined as the mean and standard deviation (or variance), there are parallel definitions of skewness and kurtosis that may be used for the shape of the data.  These are also based on the method of moments.  The formulae for these are quite complicated, so we will be jumping directly to Excel’s built-in functions.  Note that a perfect bell curve, known as the normal distribution, has skew = 0 and kurtosis = 0.  A new set of (random) data will be used for this demonstration.

Watch the video:  https://tinyurl.com/Skew-and-Kurtosis

Simple Frequency Plots (Histograms) of Numerical Data

The more general approach to skew and kurtosis is to make a plot of the data and simply look for asymmetry (skew) and peakedness (kurtosis).  As will be shown, in some cases, the data must be “binned” before being plotted.

Watch the video:  https://tinyurl.com/Simple-Histograms