How to Adopt Compliant AI

We look at the issues, the risks and the steps needed to remain compliant while adopting AI.

Organisations are hurrying to implement AI-related features in their products and business tools. No one wants to miss out, and little can be done to stop the workforce from using generative AI. Earlier this month, Samsung banned ChatGPT internally following a data leak, only to announce that it is developing its own AI tools.

Only a year ago, even the smartest technologies and tools on the market could barely rally interest. Early machine learning models were clunky, had to be trained on labelled data and provided only limited output, so finding use cases for them was not worth the disruption to existing corporate processes.

Over the last six months, Artificial Intelligence (AI) has demonstrated generative capabilities that open up vast opportunities for application, and organisations have embraced them with unprecedented enthusiasm.

So what has changed?

AI is a field of computer science that aims to create programs able to perform tasks generally performed by humans. It is often associated with machine learning, predictive analytics, natural language processing and related terms.

Some AI has been around for a while, such as facial recognition, security monitoring of networks, transactions or emails, language translation, and speech recognition. Input datasets had to be labelled by humans before being used to train the AI model through supervised learning, reinforced by a human review of outputs. This was time-consuming, expensive and dependent on the quality of the labelling and the review.

However, recent advancements in neural networks, particularly the introduction of the transformer deep learning architecture in 2017, together with access to powerful GPU-based hardware, enabled the creation of foundation models such as GPT-3 (generative pre-trained transformer). Unsupervised pre-training on large unlabelled datasets, combined with supervised fine-tuning or reinforcement learning, helped establish a very large number of parameters (1.6 trillion for GPT-4 according to unofficial sources), which enable the model to weigh the significance of each part of the input data and provide nuanced outputs.
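To make "weighing the significance of each part of the input" more concrete, here is a toy sketch of scaled dot-product attention, the core operation of the transformer architecture. The two-dimensional vectors are made up for illustration; a real model learns its vectors during training and works with far larger dimensions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Score one query vector against every key vector, then normalise."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical vectors standing in for three input tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]

weights = attention_weights(query, keys)
print(weights)  # the first and third tokens receive more weight than the second
```

The weights always sum to one, so they can be read as the share of "attention" the model gives to each part of the input when producing an output.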

Today, major providers such as OpenAI, IBM, and Google are offering (or preparing to offer) foundation models which can be adapted to any desired downstream task, to follow instructions, summarise documents, generate novel human-like content, and much more. In addition, there are plenty of resellers who will help customise the models for their clients’ needs.

But how to implement AI in a compliant way?

For now, the answer to this question appears to lie in AI governance, which should address the following risks throughout the AI lifecycle, including its creation, continuous improvement, operation and use by end users.

Local laws. Whilst the risks will vary between countries, many GDPR-clone data protection laws around the world will likely be triggered by introducing an AI-based system. In particular, where the system produces outputs for wider dissemination, the legal risk increases because laws other than data protection laws could also be breached.

Input data. It is reported that much of AI creators’ input data was scraped from the internet without a licence. Perhaps the foundation models only exist because their creators acted in breach of local laws or website terms. While this may not have an impact on your organisation (unless the training data results in bias or inefficiencies), any further training and benchmark data used by your organisation to tweak the foundation model must be processed in compliance with your local data privacy laws, often even if it is sourced from another jurisdiction.

Data minimisation. AI is trained on vast amounts of data, scaled up further by unsupervised learning, which no longer requires human input. AI systems designed to train other AI systems have brought a leap in efficiency. Nevertheless, an effort should be made to limit training data to what is necessary to achieve the objective, where this is required under local data privacy laws.

Data rights. Individuals might have the right to opt out of having their data included in a training dataset. Data ethics dictates that an effort should be made to allow individuals to exercise their rights before data is anonymised and turned into a training dataset. The actual data rights will depend on local data privacy laws.
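As a minimal sketch of how an opt-out could be honoured before a training dataset is built, the snippet below drops every record belonging to an individual on an opt-out list. The `Record` type, the `subject_id` field and the `exclude_opted_out` helper are hypothetical names chosen for illustration; which rights actually apply remains a question of local law.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    subject_id: str  # hypothetical identifier of the individual the data relates to
    text: str


def exclude_opted_out(records, opted_out_ids):
    """Return only the records whose data subject has not opted out."""
    blocked = set(opted_out_ids)
    return [r for r in records if r.subject_id not in blocked]


records = [
    Record("u1", "support ticket about billing"),
    Record("u2", "feedback on the mobile app"),
    Record("u1", "follow-up question"),
]

training_set = exclude_opted_out(records, opted_out_ids=["u1"])
print(training_set)  # only the record for u2 remains
```

The filter has to run while records can still be linked to an individual, which is why the opt-out should be applied before any anonymisation step.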

Accuracy risk. AI is good at pattern recognition and at reproducing, say, text, but today’s “narrow AI” is still decades away from any artificial general intelligence (AGI), which one day might critically evaluate its own responses and ensure accuracy. AI should not be used to make wholesale decisions or relied on without a qualified human supervisor.

Auditability or explainability. By their very nature, the hidden layers of a neural net are a black box, and explaining what goes on inside them is difficult. Nevertheless, organisations purchasing AI systems might want to know how the technology works in order to mitigate legal risks, inform users about it and implement testing and controls to comply with the law. It is important to discuss these requirements with the AI provider at the outset, as without this information the organisation may not be able to comply with local data privacy laws.

Security. Most new technology is vulnerable to attacks. It is important that any AI system is deployed on secure infrastructure without sharing input data with the provider or, if data sharing cannot be avoided, by using secure multiparty computation or homomorphic encryption to safeguard the data.
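To give a feel for what secure multiparty computation means in practice, here is a toy sketch of additive secret sharing, one of its basic building blocks. It is an illustration under simplifying assumptions (honest parties, a made-up public modulus, addition only) and not a production protocol; it does not cover homomorphic encryption.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical public modulus agreed by all parties


def share(value, parties=3):
    """Split a value into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def add_shared(a_shares, b_shares):
    """Each party adds the two shares it holds; no party sees either input."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


def reconstruct(shares):
    """Combine all shares to reveal the value they encode."""
    return sum(shares) % MODULUS


# Two confidential figures are split between three parties.
figure_a = share(52_000)
figure_b = share(61_000)

total = reconstruct(add_shared(figure_a, figure_b))
print(total)  # 113000
```

Any individual share is just a random-looking number, so a party holding one share learns nothing about the underlying figure, yet the parties can still compute the sum together.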

Anonymisation before or after ingestion. Your provider might offer to strip your data of any personal identifiers before using it for training. However, doing so at application level means that you have already shared the data with the provider, who might use it for its own purposes. Depending on the circumstances, this could be a risk.
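One way to reduce this risk is to strip obvious identifiers on your own side, before anything is sent to the provider. The sketch below uses two simple regular expressions for email addresses and phone numbers; the patterns and the `redact` helper are illustrative assumptions and will not catch every format.

```python
import re

# Simplified patterns for illustration only; real data needs broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


sample = "Contact Jane at jane.doe@example.com or +44 20 7946 0958."
print(redact(sample))  # Contact Jane at [EMAIL] or [PHONE].
```

Note that the name "Jane" survives. Pattern-based redaction of this kind is closer to pseudonymisation than to true anonymisation, so the redacted text may still count as personal data under local data privacy laws.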

Confidentiality & Intellectual Property. When using a public AI model, data input by users might become part of the data the model learns from. Only snippets of that data would be retained. For example, the GPT-3 model reportedly requires 800GB of storage but was apparently trained on 40TB of data. However, if such input data included confidential client data or trade secrets, it could lead to a data breach, a breach of confidentiality or an infringement of intellectual property.

Breach of intellectual property rights & Indemnity. Some AI platforms may seek an indemnity to protect themselves from third-party claims arising from your use of their AI. This could be a particular risk if you use the AI to generate content for wider distribution and the AI happens to infringe somebody’s copyright.

Contractual risk. Organisations should not accept AI provider terms as some kind of industry standard, as none currently exists. Appropriate commitments in relation to compliance with the law, non-infringement, IP rights in outputs, product liability, transparency, regular benchmarking, issue reporting, continued technical support, liability and other matters, should be sought. Some terms should remain open to renegotiation as compliance requirements are maturing around the world.

Regulated industries. Despite the alignment of regulatory stances through the likes of the Digital Regulation Cooperation Forum, inconsistency in regulatory opinions cannot be ruled out, as regulators pursue different objectives. Keep up to date with your regulator’s views, but also understand when the regulator oversteps its competence.

Automated decisions. Laws such as the UK GDPR set out rules for making automated decisions with significant effect on individuals. Depending on how AI is deployed, a level of human intervention must be ensured if such laws apply to you.
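In a system design, ensuring human intervention can be as simple as refusing to finalise certain outcomes automatically. The sketch below is a hypothetical gate: the `route_decision` helper and the `significant_effect` flag are assumptions for illustration, and whether a decision has a significant effect on an individual remains a legal assessment made outside the code.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    outcome: str
    needs_human_review: bool


def route_decision(model_outcome, significant_effect):
    """Hold decisions with a significant effect on a person for human review."""
    if significant_effect:
        # The model's suggestion is not applied until a person has reviewed it.
        return Decision(outcome="pending", needs_human_review=True)
    return Decision(outcome=model_outcome, needs_human_review=False)


loan = route_decision("reject", significant_effect=True)
tagging = route_decision("tag:invoice", significant_effect=False)
print(loan)
print(tagging)
```

The design choice here is that the gate defaults to human review whenever the flag is set, regardless of how confident the model is in its suggestion.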

Harms and bias. Given the strength of generative AI models, failing to set parameters for how the AI should be used creates a risk of harm to vulnerable users, who are more easily influenced by the AI’s human-like interactions. Bias will be a problem if the AI system produces offensive or discriminatory outputs or fails to produce an output due to bias. Various laws could be breached by introducing an AI-based service that causes harm to individuals.

Workforce and societal risk. Rolling out AI too quickly could diminish certain professions. The loss of qualified people in a specific sector could mean that there is no one to supervise AI. This could lead to overreliance on AI and our inability to audit and further develop the quality of AI generated outputs.

Other risks may arise in different circumstances.

AI Governance

Organisations exploring AI should adopt an AI governance programme which will allow them to address all risks arising in the context of their use cases and specific circumstances.

With the growing awareness of the legal and ethical risks, the increasing regulatory interest in the new technology and, generally, the uncertainty of what the near future holds, ignoring the compliance and legal requirements would not seem like a sensible option for any organisation, its officers or shareholders.