Rejecting data (Outliers)

“So unexpected was the hole that for several years computers analysing ozone data had systematically thrown out the readings that should have pointed to its growth.”

New Scientist 31st March 1988

Published 10th April 2017 by Andy Connelly. Updated 10th May 2017.

Introduction

The quote above shows the importance of not rejecting data without a very strong reason. The data may not fit your pet idea, or even the generally accepted scientific theory, but that does not mean they are wrong.

DISCLAIMER: I am not an expert in analytical chemistry. The content of this blog is what I have discovered through my efforts to understand the subject. I have done my best to make the information here as accurate as possible. If you spot any errors or omissions, or have any comments, please let me know.

Rejecting data

Before you reject data:

  1. Go back as far as you can and check the readings. Check that you haven’t written something down wrongly and that there isn’t an obvious error (e.g. a damaged pipette).
  2. Use the checks you have built into your experiment to look for other possible experimental problems (e.g. check standards).
  3. Repeat the measurement multiple times.
  4. Only reject data if you are absolutely comfortable accepting that they are in error.

If the unexpected point is on a calibration curve then there are various statistical tests you can carry out before rejecting points. However, still only ever reject a point if you have really considered all other options. A test gives you a defensible, documented reason for a rejection. However, it DOES NOT tell you whether or not to remove the extreme observation(s): that is still your judgement.

Examples of tests are (see Figure 1) [1]:

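  • Dixon’s Q-test: the ratio of the ‘outlier gap’ (the distance between the suspect value and its nearest neighbour) to the data range.
  • Grubbs’ test: essentially a z score (the distance of the suspect value from the mean, in units of the standard deviation) that is compared against a modified t table. Very similar to a one-sample t-test.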
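To make the two definitions above concrete, here is a minimal Python sketch of how the test statistics could be calculated. It is only an illustration: the function names (dixon_q, grubbs_g) and the replicate readings are my own inventions, and it assumes the simplest form of each test (one suspect value, at least three readings). It calculates the statistic only. You still need to compare the result against a published critical-value table for your number of readings and chosen confidence level.

```python
import statistics


def dixon_q(values):
    """Return (suspect, Q) for the more isolated end value of the data.

    Q = outlier gap / data range, where the gap is the distance between
    the suspect value and its nearest neighbour.
    """
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("need at least 3 values")
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical")
    gap_low = data[1] - data[0]      # gap if the smallest value is suspect
    gap_high = data[-1] - data[-2]   # gap if the largest value is suspect
    if gap_high >= gap_low:
        return data[-1], gap_high / data_range
    return data[0], gap_low / data_range


def grubbs_g(values):
    """Return (suspect, G) for the value furthest from the mean.

    G = |suspect - mean| / s, where s is the sample standard deviation.
    """
    data = list(values)
    if len(data) < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("all values are identical")
    suspect = max(data, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / s


# Hypothetical replicate readings, made up for illustration only.
readings = [10.1, 10.2, 10.0, 10.3, 12.5]

suspect_q, q = dixon_q(readings)
suspect_g, g = grubbs_g(readings)
print(f"Dixon:  suspect = {suspect_q}, Q = {q:.3f}")
print(f"Grubbs: suspect = {suspect_g}, G = {g:.3f}")
```

If the calculated Q or G is larger than the tabulated critical value, the test flags the point as a possible outlier. Whether you then remove it is still your decision.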

Conclusion

The Grubbs’ test picks up extreme values earlier than the Dixon test and is recommended by ISO. However, you should choose the test that is most appropriate based on your knowledge of the data.

Figure 1: Comparison of the outlier tests discussed in text. [2]

References

[1] Statistics: A guide to the use of statistical methods in the physical sciences. Roger Barlow, John Wiley & Sons, 1989.

[2] http://webspace.ship.edu/pgmarr/Geo441/Lectures/OPT%201%20-%20Outlier%20Detection.pdf

Further reading

  • Statistics and Chemometrics for Analytical Chemistry, Miller & Miller, 5th ed. Pearson (2005)
  • Data analysis for chemistry: An introductory guide for students and laboratory scientists, Hibbert & Gooding, (2006)