The purpose of this post is to introduce the use of Q-Q plots for comparing distributions of sampled observational data to standard, reference distributions. For example, we often assume that our sampled observations are normally distributed. A Q-Q plot can quickly show whether this assumption is justified. A Q-Q plot can be used with other distributions beside normal. For example, both log-normal and Poisson distributions are demonstrated below. I also show an estimated density plot, which is similar to a histogram, but is "more robust and easier to read" (R in a Nutshell, Joseph Adler, O'Reilly Media 2009, p.238).
Performance data gathered from a production Oracle database are used to illustrate the technique. Since these data fit none of these standard distributions, the Q-Q plots all look rather strange. To illustrate a Q-Q plot for a good fit, I generate random log-normal data, and compare them to normal and log normal distributions.
A secondary purpose of this post is to show how Q-Q plots can be generated quickly and easily using the R programming language, which is free and open source. The R programming language has become the lingua franca of data science and statistics.
The data and R code I used for this analysis are available for download (links below).