Managing data quality

  By Andy Youell   - Friday 23 November 2018

In my previous article I discussed the myriad ways in which data can fail and the idea that data quality often seems like more of an art than a science. Managing data quality – in order to achieve fitness for a specified purpose – requires an ongoing and systematic approach, often described as a quality management system.

A quality management system provides a structured approach to managing quality, with clearly defined roles, responsibilities and oversight. The most common definition of a quality management system is set out in the ISO9001 standard.

A brief history of quality management

Formal quality management standards first emerged in the defence industry in the 1950s and 60s. In 1979 the British Standards Institution published BS5750, which defined a standard approach to quality management that could be applied to any type of industrial production. The quality standard was adopted by the International Organization for Standardization (which, for some reason, is abbreviated to “ISO”) and the first international quality standard – ISO9000 – was published in 1987. ISO published major revisions in 1994, 2000, 2008 and most recently in 2015, and the art of quality management has evolved as the standard has become applicable to a broader range of organisations and processes. The modern ISO standards are applicable to any type of production or service provision and can be applied across an organisation or to just a defined system or process within an organisation.

How does this work for data?

Although the detail of the ISO9000 family of standards has changed over the years, the core elements have been stable for many years and cover the key areas of quality management:

  • Establishing a high-level commitment to quality and ensuring that operations are adequately resourced
  • Having a clear specification of the required product or service
  • Having managed delivery/operational processes 
  • Testing outputs
  • Dealing with failures and learning from them

Quality management requires a clear specification of the desired end state in order to execute tests on the outputs against that specification. When dealing with data quality issues, the specification and subsequent tests need to cover the full range of ways in which data can fail, not just simple data format or range tests. Tests should consider the data specification and processing against the desired use. They should test the data you have and test the data that you haven’t got; establishing that something is missing can be one of the most complex areas of quality assurance.
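The distinction between testing the data you have and the data you haven’t got can be sketched in code. The example below is a minimal illustration, assuming a hypothetical dataset of records with `id` and `age` fields; the field names, the age range and the set of expected identifiers are all invented for illustration, not drawn from any real specification.

```python
def validate_record(record, expected_fields):
    """Return a list of failure descriptions for a single record."""
    failures = []
    # Testing the data you have: presence, format and range checks.
    for field in expected_fields:
        if field not in record or record[field] in (None, ""):
            failures.append(f"missing field: {field}")
    age = record.get("age")
    if age is not None and not (0 <= age <= 120):
        failures.append(f"age out of range: {age}")
    return failures


def validate_dataset(records, expected_fields, expected_ids):
    """Dataset-level checks, including records that are absent entirely."""
    failures = {}
    seen_ids = set()
    for rec in records:
        rec_failures = validate_record(rec, expected_fields)
        if rec_failures:
            failures[rec.get("id")] = rec_failures
        seen_ids.add(rec.get("id"))
    # Testing the data you haven't got: whole records that should exist
    # (per the specification) but never arrived.
    for missing in expected_ids - seen_ids:
        failures[missing] = ["record entirely absent"]
    return failures
```

The point of the second function is that absence can only be detected against a specification: without the list of expected identifiers, a missing record produces no failure at all, which is precisely why missingness is one of the hardest areas of quality assurance.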

A quality management system also needs to define how failures are managed. This means having a defined process for putting things right and dealing with the consequences. In many ways failures are the most interesting area of quality management since these provide the opportunity for a quality management system to investigate how the failure occurred and identify how the system can be improved. As the old saying goes, it is not a sin to make a mistake but it is a sin to repeat it.

The case for quality management

Failures in any system generate additional costs, so the business case for comprehensive quality management ought to be compelling. However, it is often difficult to articulate without a rich understanding of how things fail and what their impact can be. Failures become more expensive the longer they exist without resolution, so good quality management will always aim to identify failures at the earliest possible opportunity.

Data often fails in ways we don’t expect and it has a habit of tripping us up when we least expect it; the reputation of a dataset can fall swiftly and that reputation can take a long time to recover. As we come to rely more on data, then a professional approach to the management of data quality becomes increasingly important.

Andy

Andy Youell is a writer, speaker and strategic data advisor. Formerly the Director of Data Policy and Governance at HESA, Andy has been at the leading edge of data issues across higher education for over 25 years. His work has covered all aspects of the data and systems lifecycle and in recent years has focussed on improving the HE sector’s relationship with data. Follow him on Twitter @AndyYouell
