Saturday, 30 December 2017

Excel: do not be fooled by Excel trend lines

Something I see quite often is blind faith in Excel trend lines. Somebody collates some time based data, creates a graph in Excel and then ...Add Trendline.

Sadly these trends can be significantly wrong as easily as they can be right.

One illustration of this is provided by the datasets crafted by Francis Anscombe known as ‘Anscombe’s Quartet’ (see Graphs in Statistical Analysis by F.J.Anscombe. American Statistician. 27 (February1973): 17-21)

The summary statistics for the four datasets (A, B, C, D) are identical:























But plotting the datasets shows that the four are really completely different




















Now add Excel trendlines:




















It is clear from the illustrations that three out of the four trendlines are completely wrong

Putting this another way: our visual inspection of the patterns is far more reliable than Excel's automatic calculation. We should trust ourselves.

In his book 'Show Me the Numbers: Designing Tables and Graphs to Enlighten' (Second Edition. Analytics Press. 2012, pp.224-8), Stephen Few shows  how the trend shown can be affected significantly by the small differences in range of values selected.

So there are two big things to watch out for - the range of data that you decide to trend and the way Excel then attempts to fit a trend line














No comments:

Post a Comment

Crash Course in SQL Part 2: SELECT

The SQL SELECT instruction is the most important single thing to master It is the instruction that actually gives the answers to any que...