Is the data challenge just about being open?

The recent report from the House of Commons Science and Technology Committee on “The big data dilemma” (http://www.publications.parliament.uk/pa/cm201516/cmselect/cmsctech/468/468.pdf) raised some interesting challenges to be addressed. There is much to commend in the findings, but is the answer really just about being open with data, or are we missing areas that still need to be addressed before we can exploit the full potential of the information held within ‘big data’?

The rise of dark data

The reported fact that 90% of the data in the world has been created in the past two years is of note, given that a report from Veritas last year (http://www.computerweekly.com/news/4500256309/Lack-of-data-classification-very-costly-to-firms-says-survey?utm_medium=EM&asrc=EM_EDA_49210577&utm_campaign=20151029_BT%20revenue%20up%202%25%20on%20broadband%20and%20BT%20Sport%20Europe_&utm_source=EDA) showed that a typical organisation in the UK is protecting, maintaining and storing 59% of its data without knowing what information is held within it.

There is therefore a real issue surrounding data as a whole. It is estimated that maintaining 500TB of this “dark data” wastes around £1m in protection and storage costs. How can you start to exploit the opportunities afforded by big data if you are wasting time and money trying both to protect data and to extract useful information from it?
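To put those figures in context, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the cost scales linearly with volume (so £1m per 500TB becomes about £2,000 per TB) and that the 59% share applies evenly to an organisation's total holdings. The function name and the linear model are illustrative assumptions, not something taken from the Veritas report.

```python
# Illustrative only: assumes protection and storage costs scale linearly with volume.
COST_PER_TB_GBP = 1_000_000 / 500  # around £1m per 500TB of dark data (figure quoted above)
DARK_FRACTION = 0.59               # share of data that is "dark" (figure quoted above)


def dark_data_cost(total_tb, dark_fraction=DARK_FRACTION, cost_per_tb=COST_PER_TB_GBP):
    """Return (dark_tb, wasted_gbp) for an organisation holding total_tb of data."""
    if total_tb < 0:
        raise ValueError("total_tb must not be negative")
    if not 0 <= dark_fraction <= 1:
        raise ValueError("dark_fraction must be between 0 and 1")
    dark_tb = total_tb * dark_fraction
    return dark_tb, dark_tb * cost_per_tb


if __name__ == "__main__":
    # Hypothetical organisation holding 1,000TB in total.
    dark_tb, wasted = dark_data_cost(1000)
    print(f"Dark data: {dark_tb:.0f}TB, estimated waste: £{wasted:,.0f}")
```

Under these assumptions, the estimate is only as good as the linear model; real storage pricing is usually tiered, so treat the result as an order-of-magnitude guide.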

Looking at common attributes

One way to uncover the information hidden within data is to identify the attributes we require and standardise them. We often use common data formats to ensure compatibility, yet they are rarely recorded. Beyond this, we have common reporting requirements for things such as tax, organisational returns and medical records; yet we appear unable to capture these for reuse in other situations.

When we look at the myriad of questions that suppliers are asked to complete just to prove they are viable, there are improvements that can be made rapidly.  Better still, when we start to look at attributes from a vendor-agnostic standpoint, we often find easier ways to search for similar information types within our unstructured data.

Does open data require open standards?

The answer, of course, is open data; yet we often treat this as synonymous with open standards. The two are very different, and the governance challenges for open data are arguably greater than those for closed data. Once you open up data sources, people use them; if the sources cannot be relied upon (or can be changed without warning) then the consequences can be severe (think about environmental data such as traffic or weather). Therefore, ensuring that the information held within open data can be trusted for its accuracy and is available when required is of the utmost importance. This requires a cultural evolution in governance, which is currently still very much resident in the “security” mindset that tries to control access; a mindset that is largely redundant for open data.

When we look at open standards, we often look to implement standards (such as ODF) without understanding what issue we are trying to resolve.  This matters, because once we understand the information we are trying to share, we can look to existing technologies and determine if any present a barrier to interoperability of existing systems.

Why is this important? If you try to create a standards-based approach, current thinking estimates that it will take 5-10 years to deliver (https://ec.europa.eu/digital-agenda/en/open-standards). Unless the standards address real issues, adopting them will take at least two technology cycles.

If we take the Open Document Format as an example, all Office-based platforms are capable of using the Microsoft Office document format (Google have even stated a goal of treating the format as a first-class citizen in its products), so interoperability is assured even if you don’t use Microsoft products.

Delays in adopting new standards where consensus has already been reached take focus away from delivering agreement on the key standards that are required for areas such as Smart Cities and the Internet of Things. Not only that, but uncertainty hampers investment within the supply chain.

Providing the foundation for open data

So how can we determine where open data and associated standards can provide value? We only need to look at the example of the New York Mayor’s Office of Data Analytics (MODA), which has advised on the digitisation of London (http://capitalcityfoundation.london/big-data-in-the-big-apple/). Its key advice is to understand where open data can assist, and to reuse the technologies that already support the chosen approach to sharing data.

We need to ensure that we understand the information we need to share, then select technologies (existing as well as new) that can support this. This means defining requirements for format, structure, handling and retrieval of information to allow platforms to be reviewed, reused or created as necessary.
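As a minimal sketch of what "defining requirements" might look like in practice, the Python below records the four dimensions named above (format, structure, handling, retrieval) for a piece of information and checks an existing platform against them. All class names, field values and the example platform are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationRequirement:
    """What we need in order to share one kind of information (hypothetical model)."""
    name: str
    format: str
    structure: str
    handling: str
    retrieval: str


@dataclass(frozen=True)
class Platform:
    """What an existing or candidate platform supports (hypothetical model)."""
    name: str
    formats: frozenset
    structures: frozenset
    handling: frozenset
    retrieval: frozenset


def unmet_requirements(req, platform):
    """Return the dimensions of the requirement that the platform cannot meet."""
    checks = [
        ("format", req.format, platform.formats),
        ("structure", req.structure, platform.structures),
        ("handling", req.handling, platform.handling),
        ("retrieval", req.retrieval, platform.retrieval),
    ]
    return [dimension for dimension, needed, supported in checks if needed not in supported]


def decide(req, platform):
    """Suggest reusing the platform if it meets every dimension, else reviewing it."""
    return "reuse" if not unmet_requirements(req, platform) else "review"


if __name__ == "__main__":
    # Illustrative values only.
    req = InformationRequirement("traffic counts", "csv", "tabular", "public", "http-download")
    portal = Platform(
        "existing portal",
        formats=frozenset({"csv", "json"}),
        structures=frozenset({"tabular"}),
        handling=frozenset({"public"}),
        retrieval=frozenset({"email"}),
    )
    print(decide(req, portal), unmet_requirements(req, portal))
```

The point of the sketch is the ordering: the requirement is written down first, and platforms are then assessed against it, rather than the other way round.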

This also requires a change in the governance culture we have in organisations, from one of protection and compliance to one of risk management and governance. Yet we are told that 71% of all executives view security as impeding their businesses (http://www.theregister.co.uk/2016/02/17/cyber_security/), so how do we address the governance culture to ensure that organisations manage information?

Is ethics the correct approach to governance?

It is often thought that we need an ethical approach to managing information, yet this is something that is intrinsic within the legal framework of English and Welsh law through Common Law.  The torts of negligence and confidence play strongly in the context of this debate, so maybe the answer is to use the existing legal system in its entire context to ensure that we drive the behaviours that we expect from those organisations who acquire and share data.

With government appearing to create canonical datasets, maybe the time has arrived to create a layered approach to governance that starts with basic activities derived from the legal obligations placed upon different types of organisation, and then looks towards the information acquired and processed for further requirements, to avoid duplication and deliver stability in the compliance regimes?

This layered approach would deliver awareness not only of the requirements but also of the benefits of managing information, from the board room down through the organisation to the functions delivering and supporting services.
