I was recently asked to speak on the information management and data protection implications of Artificial Intelligence for the Information and Records Management Society (IRMS) Public Sector Group. It was a fascinating discussion exploring how AI is very much in your world, whether you realise it or not.

AI is often seen as something infinitely more complex than what is in our ‘everyday’ working or personal lives. When someone says ‘Artificial Intelligence’, you typically picture a dancing robot or Sophia, the so-called ‘world’s smartest robot’. Sophia has even been interviewed on the ITV programme ‘This Morning’ (search for it on YouTube, it’s really weird!).

In actuality, however, AI in its full and wide sense is already widely used and deployed within organisations. Surprised? Panicking that ‘Skynet’ is taking over? Not to worry, it’s nothing apocalyptic just yet.

The term ‘Artificial Intelligence’ actually covers a wide range of machine-based learning and functionality (see below).

As you can see, there is a range of services that your organisation may already be using without even realising it, especially around speech recognition, predictive analytics, scheduling and programmes that run an advanced version of ‘if this then that’.

‘Machine learning’, as it’s also known, has been coming for some time. Do you remember Clippy? The handy (if annoying) little assistant in Office 97? Well, he used to monitor what you were doing and offer you handy little suggestions. While it was annoying, and a very basic form of the technology, it was, nonetheless, a form of machine learning.

Machine learning falls into two broad types. The first is known as ‘supervised learning’. This is where the ‘machine’ (usually a piece of software) is shown examples with the right answers already attached, studies how something works and looks to replicate it. That often lets you collect data or produce a data output from a previous deployment, because it knows what you are doing and can replicate it.

The other type is ‘unsupervised learning’, where the ‘machine’ is essentially given ‘free rein’ to explore how a system or process works, with no right answers supplied, and can help you find all sorts of known and unknown patterns in the data it collates and reviews. It uses what it learns to determine whether a pattern is present and summarises it for you to investigate further.
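To make the distinction a little more concrete, here is a minimal sketch of both approaches in Python – my own illustration using the scikit-learn library and a tiny made-up dataset, not anything lifted from a real product:

```python
# A minimal sketch contrasting supervised and unsupervised learning.
# The data is invented: each row is [dates mentioned, names mentioned]
# for a document, and the task is spotting documents with personal data.
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

documents = [[0, 1], [1, 8], [0, 2], [2, 9], [0, 0], [3, 7]]

# Supervised learning: the "right answers" (labels) are supplied up front,
# and the model learns to replicate that judgement on new documents.
labels = [0, 1, 0, 1, 0, 1]  # 1 = contains personal data
classifier = LogisticRegression().fit(documents, labels)
print(classifier.predict([[2, 10]]))  # label predicted for an unseen document

# Unsupervised learning: no labels at all - the model is left to find
# whatever structure (here, two clusters) exists in the data by itself.
clusterer = KMeans(n_clusters=2, n_init=10).fit(documents)
print(clusterer.labels_)  # the groupings it discovered on its own
```

The supervised model is only ever as good as the answers it is given, and the unsupervised one will happily report ‘patterns’ whether or not they mean anything – which is exactly why the data quality points further on matter so much.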

Machine learning, even in its most basic form, has huge potential benefit for organisations, from helping to detect patterns in the climate right the way through to recognising data types and recommending classifications or document controls.

And, as with most things in life, with great power comes great responsibility, and this is where the challenges around such technology start to present themselves.

Start with Microsoft’s ‘Cortana’: it is included in even the most basic Microsoft Office packages and works alongside your tools and processes. Cortana was originally the AI character from Microsoft’s hit game series ‘Halo’ and has now become the ‘face’, or brand, of their AI/voice assistant programme.

In order to work, Microsoft’s AI (be that Cortana or the recently announced Viva) needs access to data within your diary, email, drives, SharePoint sites and individual files, both to determine your ‘productivity’ and to give you suggestions on how to manage your own health: spending too long in front of a screen, for example, or even analysing the language used in emails and flagging any stressed or aggressive tones. If this AI is scanning all this data, it is learning from it. In humans, we call that memory stored in our brain; in an AI, it is data stored within its programming.
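To illustrate that last point – that a feature like this only works if it reads, and therefore holds, your content – here is a deliberately crude toy in Python. It is purely my own illustration and bears no relation to how Microsoft’s tools actually work; it simply ‘flags’ a stressed tone by scanning an email for a handful of keywords:

```python
# A toy tone-flagger (purely illustrative - real products use far more
# sophisticated language models than a keyword list).
STRESS_WORDS = {"urgent", "asap", "immediately", "unacceptable", "frustrated"}

def flag_stress(email_body: str) -> bool:
    """Return True if the email contains any of the 'stress' keywords."""
    words = {word.strip(".,!?").lower() for word in email_body.split()}
    return bool(words & STRESS_WORDS)

print(flag_stress("I need that report immediately, this delay is unacceptable!"))
# True - and note the function only 'knows' this because it was handed
# the full text of the email in the first place.
```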

From the phone that can scan your heart rate and determine whether you need medical attention, to the ‘mutant’ algorithm behind the national exam results in 2020, the one thing AI needs to be even remotely effective is data. And data comes with complications.

I believe it was the Borg Queen in the film Star Trek First Contact who said to the android character, ‘Data’:

“you are an imperfect being, created by an imperfect being. Finding your weakness is only a matter of time”.

Star Trek First Contact – Paramount Pictures

For me, this summarises one of the biggest information and data management challenges with AI – poor quality data! For years the Information, Records and Data communities have been highlighting the issues of unstructured data, but now we are seeing the rise of structured data issues too. As we rely more and more on structured data to run our lives, we are seeing more and more challenges around both collecting and managing good quality data, from poor system design to plain poor data quality, with different data formats used for the same purpose. For example, do all your systems record date of birth in exactly the same way? It seems like a small thing, but get that translation between one system and another wrong and it becomes a quality issue – unless we put things in place to stop it!
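As a quick, hypothetical illustration of how small that ‘small thing’ is, here is a short Python sketch (the system names and formats are invented for the example) showing the same date of birth held three different ways, and what happens when one system’s assumption is applied to another’s data:

```python
from datetime import datetime

# The "same" date of birth as three hypothetical systems might record it.
RAW_VALUES = {
    "system_a": "01/02/1990",   # UK style: day/month/year
    "system_b": "02/01/1990",   # US style: month/day/year
    "system_c": "1990-02-01",   # ISO 8601: year-month-day
}

FORMATS = {
    "system_a": "%d/%m/%Y",
    "system_b": "%m/%d/%Y",
    "system_c": "%Y-%m-%d",
}

# Translated correctly, all three agree on 1 February 1990.
for system, raw in RAW_VALUES.items():
    parsed = datetime.strptime(raw, FORMATS[system]).date()
    print(f"{system}: {raw!r} -> {parsed.isoformat()}")

# Get the translation wrong - read system_b's value with system_a's format -
# and 1 February silently becomes 2 January: a data quality issue is born.
wrong = datetime.strptime(RAW_VALUES["system_b"], FORMATS["system_a"]).date()
print(f"system_b misread: {wrong.isoformat()}")
```

A validation or translation step like this is exactly the sort of control that stops a quiet formatting difference becoming a quality issue downstream.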

In order for an AI system to ‘learn’ what we do, it needs clearly defined and structured data. If what you put in is ‘rubbish’, for want of a better word, whether through lack of care or simply because that data was never designed for such a purpose, then you will get rubbish out – the old ‘garbage in, garbage out’ principle.

The practice of Data Management can help with that. It can challenge and agree controls for the entire data lifecycle, so you can get control of your data and make sure it is indeed fit for such purposes. It can also help challenge and root out issues with your data, such as those referred to as ‘data biases’.

Several recent reviews of predictive analytics and facial recognition software have revealed that, either through unconscious design or through misleading data, the software has developed a bias based on race, gender or religion – a bias that, on scrutiny, doesn’t hold up to challenge and review.

For example, in a policing context a police force would (quite rightly) deploy its resources based on where it thinks it will need them. So, out of ten officers, nine might cover one area and one would cover another. At face value, the data would show that crime rates are higher in the area with the most officers than in the area covered by one. Unless we factor in the disparity in resources – and instead look, for example, at arrests or recorded crimes per officer across all ten – our data will always be misleading. Of course, nine officers will report more crimes: there are more of them and they cover a wider area. That, in and of itself, doesn’t mean area X is worse for crime than area Y.
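A quick worked example, with entirely invented numbers, shows how normalising by officer changes the picture:

```python
# Toy figures for the two areas described above (invented for illustration).
areas = {
    "area_x": {"officers": 9, "crimes_recorded": 180},
    "area_y": {"officers": 1, "crimes_recorded": 25},
}

for name, stats in areas.items():
    per_officer = stats["crimes_recorded"] / stats["officers"]
    print(f"{name}: {stats['crimes_recorded']} crimes recorded, "
          f"{per_officer:.1f} per officer")

# area_x: 180 crimes recorded, 20.0 per officer
# area_y: 25 crimes recorded, 25.0 per officer
#
# On raw totals, area X looks over seven times 'worse' than area Y; per
# officer, area Y actually comes out higher. Feed only the raw totals into
# a predictive model and it will 'learn' that area X is the hotspot,
# reinforcing the original deployment decision.
```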

The more we come to rely on data the more we have to understand the nature of it. Where does it come from? How does it get stored? Shared? Analysed? Reused? How do you account for ‘the human factor’ in data?

Now that’s just from a Data Quality point of view! Look at it instead through a Data Protection lens. How can we, for example:

  • Justify, under the law, using such technology to directly affect an individual? Is it ever fair to mandate its use under the law? Can you even get consent if the person has no idea what the AI will be doing with their data?
  • Without controls, how can we ensure data creep doesn’t occur? How can we ensure transparency on how a decision was reached by the programme?
  • If the programme learns by doing, then without controls in place to remove personal data from its ‘memory’, it can keep hold of the very data it learned from and factored into its algorithm.

To name but a few areas. And, fundamentally, how do we, as information professionals, even begin to assess the risk and provide support to the organisation on something that, for more and more of us, is way out of our comfort zone and experience? Many of us are still getting our heads around things like blockchain and SharePoint – now this?

So, what can we do about it? Well, in short, all of us who work with information have to widen our skills to take in data too. The practice of Data Management is very similar to Information and Records Management, so the basics are fairly easy to learn and relate to. And if we get the basics right, that makes some of this otherwise daunting technology a little more ‘manageable’ to digest and pull apart.

AI is very much here, in one form or another, and there is already a range of resources out there to help. While the ICO has been criticised for producing its AI guidance, that guidance, and others (see links below), is useful to read and understand. (I just wish they had also focused on the other basic stuff that still requires attention as well as this! But I digress.)

I suspect I’ll do some more blogs, research and webinars on this, as it is fascinating. The potential of ‘machine learning’, and indeed ‘AI’, for organisations is huge. And the challenges of ensuring we get it right, do right by our staff, customers, patients etc. and avoid some of the pitfalls are equally huge.

You can view the full recording of the IRMS Public Sector Group webinar where I go into more detail on some of these points via the IRMS YouTube channel.

Lighthouse IG is a data specialist consultancy supporting organisations across various sectors with data-related challenges. With experience of working with organisations of various sizes, we can provide practical, simple advice to help navigate what can sometimes be a complex sea of requirements on how to handle data and information effectively. For more information, see www.lighthouseig.com/services.