Tools such as SAS Visual Analytics use terms like outlier and maximum, and I wasn’t sure of the difference.

Today I was looking at box plots, and while I could see the difference between an outlier and a maximum, I didn’t really know how each is calculated. Here goes…

While mean, median and percentile are fairly clear, I’m not sure how the **maximum** and **outlier** are calculated.

SAS defines the maximum and outlier as:

Outlier: an observation outside the lower and upper fences. The fences are located at a distance 1.5 times the Interquartile Range (IQR = Q3 – Q1) above and below the box.

Max: maximum data value less than or equal to the upper fence.

Where:

**IQR** (inter quartile range) = Q3 – Q1

**Q1** – 1st quartile (25th percentile). The data must contain a nonmissing value for this quartile.

**Q3** – 3rd quartile (75th percentile). The data must contain a nonmissing value for this quartile.
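To check my understanding of the definitions above, here is a rough Python sketch that computes the fences, the maximum and the outliers for a small made-up sample. The names `quantile` and `box_plot_summary` are my own, not anything from SAS, and the linear-interpolation quartile rule is an assumption: SAS may use a different percentile definition, so its Q1 and Q3 can differ slightly from these.

```python
def quantile(sorted_values, p):
    """Quantile by linear interpolation (assumed rule; other definitions exist)."""
    pos = (len(sorted_values) - 1) * p
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def box_plot_summary(values):
    """Return fences, min/max inside the fences, and the outliers."""
    data = sorted(values)
    q1 = quantile(data, 0.25)
    q3 = quantile(data, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # "Max" is the largest value that is still at or below the upper fence.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    # Outliers are the observations outside the fences.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "min": inside[0],
        "max": inside[-1],
        "outliers": outliers,
    }


# Made-up sample: 30 sits well above the rest of the data.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(sample)
print(summary)
```

With this sample the upper fence works out at 14.5, so the box plot maximum is 9 and the value 30 is drawn as an outlier, even though 30 is the largest number in the data.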

I’m doing a part-time course in Data Analytics at the University of Brighton. Seems like a good excuse to post some stuff here and see if my site gets a new lease of life.

I’ll probably post things on here that I need to know.

This is an example of a plot of the GA (Google Analytics) data stored in JSON format. The data is plotted using d3.js. Have a look at the gallery. I do not think I have done d3.js justice with my line charts.

Using data from Google Analytics (GA) (see other posts), a Fast Fourier Transform is performed on the time series data. The absolute value (magnitude) of each FFT coefficient is saved for plotting using d3.js.
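For anyone wanting to see the "absolute value" step in code, here is a minimal Python sketch. It is not the script behind the plot: it uses a naive discrete Fourier transform written with the standard library rather than a real FFT routine (same result, just slower), and the input is a made-up sinusoid standing in for the GA time series. The name `dft_magnitudes` is my own.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns the absolute value of each coefficient."""
    n = len(samples)
    magnitudes = []
    for k in range(n):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        # The coefficient is complex; its absolute value is what gets plotted.
        magnitudes.append(abs(coefficient))
    return magnitudes


# Stand-in signal (an assumption): 64 samples containing exactly 4 cycles.
n = 64
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]

magnitudes = dft_magnitudes(signal)

# Only the first half of the bins is needed for a real-valued signal.
peak_bin = max(range(n // 2), key=lambda k: magnitudes[k])
print(peak_bin, round(magnitudes[peak_bin], 3))
```

The peak lands in bin 4, matching the 4 cycles in the signal, which is the kind of spike you would look for in the GA data when there is a repeating pattern such as a weekly cycle.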

I’m doing a talk at Big Data Brighton on April 25th. I will add the slides afterwards, but for now I’ll put up the MATLAB .m scripts. All at: https://github.com/espeecat/big-data-brighton-april-2013

Slides for Time Series Data and FFT

This shows a single sinusoid and its Fourier Transform.

This is the example which takes several sinusoids, sums them and shows their Fourier Transforms.

This is the filtering example.