The Artificial Intelligence 2023 guide provides the latest legal information on industry use of AI, machine learning, AI regulatory regimes and legislative developments.
Last Updated: May 30, 2023
Legal Issues in AI: An Introduction
Artificial Intelligence (AI) has become integral to today’s business success and requires the adoption of new business and legal practices. The use of this technology is accelerating as companies move from using AI for specific tasks to integrating AI into business operations, thereby becoming AI-enabled companies.
AI-enabled companies use AI internally and externally, tactically and strategically. Internal uses are to reduce costs and to achieve competitive advantage. External uses include monetising data and leveraging a company’s relative power in joint ventures, strategic alliances, and similar business arrangements. Overall, new technology and data analytics platforms are creating fresh ways to leverage corporate assets, deliver new products and services, and create new business opportunities.
This introductory section distinguishes between Generative AI on the one hand and machine learning, deep learning, and neural networks on the other. Machine learning, deep learning, and neural networks are subsets of AI and, for convenience of reference, they will be referred to as “AI” to distinguish these technologies from Generative AI.
AI identifies patterns and correlations in a given dataset in order to generate predictions and insights (and ways to derive insights). Generative AI, in contrast, essentially uses word prediction technology. It creates outputs such as text or images by predicting what is next, based on “large language models” that consist of text, images and other content.
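The word-prediction idea behind Generative AI can be illustrated with a toy sketch. This is not how large language models are actually built – they use neural networks trained on vast corpora – but a simple bigram counter, using a hypothetical corpus and invented function names, shows the underlying mechanic of predicting what comes next from observed patterns:

```python
from collections import Counter, defaultdict

# Toy "next-word" predictor: counts which word follows which.
# Hypothetical sketch only; real large language models use neural
# networks, but the core mechanic of predicting what comes next
# from observed patterns is the same.
def train_bigrams(corpus: str):
    """Map each word to a Counter of the words observed after it."""
    words = corpus.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word: str):
    """Return the most frequently observed next word, or None."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

model = train_bigrams("the court held the motion the court granted")
```

Here, `predict_next(model, "the")` returns `"court"` because it follows “the” most often in the toy corpus – and, as with hallucinations, the prediction reflects observed patterns rather than truth.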
At a high level, Generative AI presents the following issues.
Generative AI is said to generate “hallucinations” – that is, to incorporate fictional or nonsensical features into content. Hallucinations occur because Generative AI algorithms have learned, for example, the point in an article or a brief at which a footnote or a citation is typically needed and thus proceed to add footnotes or citations, including some that bear no relation to the content into which they are inserted.
Machine learning and other forms of AI ultimately need some bias – in the statistical sense of weighting factors rather than the discriminatory sense. If there were no weighting factors, AI would be essentially the same as a coin toss. AI’s power depends on data.
AI, Data and Due Diligence
Given that it depends on data (and, more importantly, how data is used), AI is best analysed in combination with data. From a business perspective, data is a corporate asset – rather than merely a problem set for privacy and database breaches. From a legal perspective, data issues are multi-dimensional and cross-disciplinary, as they involve different types of data from different sources and are subject to different areas of law simultaneously. This requires cross-functional legal services. A data privacy issue is seldom simply a privacy issue; in most cases, it will also involve IT services, cloud computing, the legal status of digital assets, the structuring of business partnerships, tax issues, etc.
Data has been catapulted into a top business asset class. The following sequence shows how data becomes valuable:
Further, data’s value is determined by how it is used. Data’s meaningfulness and usefulness depend upon the business context – for example, a temperature of 10°C is cold in Fiji and warm in Norway. As the business context changes, so does the meaning and relevance of the data. A business transaction such as a merger, acquisition or divestiture with a new business purpose will create a new context for the data.
In addition to acquiring rights to data, in many circumstances, the acquiring party will also need to acquire AI solutions used by the party that originally controlled the data. This enables the acquiring party to use AI to perform data analytics and other operations.
Due diligence and the structure of agreements will need to change if data is a key business asset in a corporate transaction. Companies should not buy data they will not use, which includes:
In such cases, the acquiring company either cannot use data for its business purpose or is limited in how it can obtain and use the data.
Companies must avoid building a “data museum” – that is, they should not buy legacy data for which there is no future business use. Having such data means that, in the event of a breach or inadvertent misuse under a regulatory regime, liability could be imposed and costs could be incurred (including with regard to personal data) as a result of storing data that is not used in business operations. In this scenario, the company will have effectively bought a lawsuit or regulatory enforcement action rather than a useful business asset. In addition, this data is lacking in meaningfulness or usefulness without the relevant business context.
AI as a Change Agent
As noted, companies use AI strategically to increase business capabilities and enhance their position in the marketplace. Change involves risk, but so does not changing. As previously mentioned, for example, data has a life cycle (especially when used in AI) and keeping outdated data risks greater liability in the event of a data breach. AI requires work with compliance officers in order to effect change. It is easier to build in regulatory compliance than to retrofit after the work is done.
AI as a change agent also requires that the company’s data and AI professionals work in concert with the IT department. For many companies, this combination is a new way of implementing business transformation or, specifically, digital transformation. AI requires data management and a supporting – and often improved – IT infrastructure, with the support of the law department and outside counsel.
AI has also changed how lawyers should draft agreements because AI has changed the ecosystem of technology and data transactions. Effective AI requires an IT infrastructure that delivers data as and when needed. AI uses:
Furthermore, data is not just collected – it is generated (including machine-generated data from machine learning).
The AI ecosystem is a multi-stakeholder, multi-vendor, multi-technology, self-operated and vendor-provided data services environment. To unlock the value of AI, the IT and data ecosystem should not be balkanised but should instead be redesigned to provide increased data interoperability. This, in turn, requires agreements to enable – if not require – co-operation between vendors. By way of an example, corporate law departments should establish and enforce company-wide policies that establish minimum levels of co-operation between vendors and minimise siloed data operations.
Company AI and Data Policies
Data does not manage itself. Algorithms and data analytics need to be monitored for bias. An important function of corporate law departments is creating, enforcing and – as technology and methodologies improve – updating corporate data and AI policies. The policies should combine regulatory compliance with corporate rules for both internal and external use of data and AI analytics. An example of internal use would be to reduce costs and improve the efficiency of operations; an example of external use would be to commercialise data or the results of data analytics.
Company policies should address the following aspects of AI.
Best efforts to discover discriminatory bias
A financial institution that systematically denied mortgages or issued credit cards bearing higher-than-normal interest rates to people living in a particular neighbourhood – for example, an area associated with ethnic or racial background – would become subject to regulatory scrutiny if said bank based its assumptions about the creditworthiness of an individual on opinions about the neighbourhood rather than on data concerning the specific individual. In such a case, the financial institution would be subject to penalties for “redlining” (ie, essentially drawing a red line around a neighbourhood on a map).
The bank could run the same risk if it purchases data from third-party sources to use in making credit determinations and then repurposes it. The third-party dataset might originally have been developed for advertising and could contain classifications of individuals based on such neighbourhoods for advertising purposes. Even though this classification may not be deemed discriminatory for advertising, when the data is repurposed for credit evaluation purposes, the machine learning scoring of individuals would be based on classifications that could digitally redline the same neighbourhoods – thereby potentially inviting the same regulatory scrutiny. Thus, corporate data policy requires testing to try to uncover discriminatory algorithmic bias.
As part of conducting due diligence of third-party data similar to the foregoing, the corporate policy should require a vetting of the purposes for which a third-party dataset was created so as to determine whether – when repurposed – the dataset will distort the machine learning carried out by the company.
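Testing for discriminatory algorithmic bias of the kind described above can take many forms. A minimal sketch, using invented group labels and approval records, is a disparate-impact screen along the lines of the “four-fifths rule” that US regulators use as a rough guide:

```python
# Hypothetical sketch of a disparate-impact screen (the "four-fifths
# rule" used by US regulators as a rough guide). The group labels
# and approval records are invented for illustration.
def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> rate per group."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact_flags(decisions, threshold=0.8):
    """Flag groups approved at less than `threshold` times the
    best-treated group's approval rate."""
    rates = approval_rates(decisions)
    best = max(rates.values())
    return {g for g, r in rates.items() if r < threshold * best}

records = [("group_a", True)] * 8 + [("group_a", False)] * 2 \
        + [("group_b", True)] * 5 + [("group_b", False)] * 5
```

With these invented records, group_b is approved at half the rate of group_a and would be flagged for closer review. A flag is a signal to investigate the underlying classifications – such as repurposed advertising segments – not proof of unlawful discrimination.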
Protecting against disclosure of corporate data that provides a competitive advantage
This means identifying the data when it is in a database. It may also involve establishing access and use privileges in addition to a hierarchy of sensitive data. This can be done on a general corporate level and also created for specific projects in order to address the rights and responsibilities of team members on a particular project.
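The access and use privileges and the hierarchy of sensitive data described above can be sketched minimally as follows; the sensitivity levels and clearance names are hypothetical, not a prescribed classification scheme:

```python
# Hypothetical sketch of a sensitivity hierarchy with access checks.
# The levels and clearances are illustrative, not a prescribed scheme.
SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

def can_access(clearance: str, data_level: str) -> bool:
    """A role may read data classified at or below its clearance."""
    return SENSITIVITY[clearance] >= SENSITIVITY[data_level]

def project_acl(team):
    """Per-project view: map each member to the levels they may read.
    `team` maps member name -> clearance level."""
    return {
        member: [lvl for lvl in SENSITIVITY if can_access(clr, lvl)]
        for member, clr in team.items()
    }
```

The per-project view mirrors the text’s point that privileges can be set at a general corporate level and then tailored to the team members on a particular project.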
Protecting against reverse engineering
This involves protecting against disclosing data that, while uncontroversial at face value, may in fact enable a third party to “reverse engineer” the data to discover a proprietary technique – for example, a bank’s trading strategies, a pricing model, or even the identity and sources of critical components for company proprietary products when those components provide a technology or competitive advantage. This may include protecting the company’s supply chain.
Preventing premature disclosure
Companies should prevent premature disclosure of patentable inventions that could prejudice the ability to obtain patent rights.
Structuring the use of proprietary algorithms
The use of proprietary algorithms should be structured to protect the company in joint ventures, strategic alliances and other business partnerships where today’s partner can be tomorrow’s competitor.
Establishing corporate rules
Corporate rules should be established in relation to the following:
Approval process for terms of IT acquisition agreements
This addresses the point made previously concerning AI’s need for an IT infrastructure that meets specific requirements. This part of the policy is to ensure the terms of the agreements with different vendors meet those requirements.
Corporate requirements for licensing company data to third parties
If regulated data is involved or the company is in a regulated industry, the policy is a combination of regulatory compliance, the company’s determination of how it achieves regulatory compliance, and additional corporate policies beyond those dictated by regulatory requirements.
Determining the company personnel to be involved in evaluating requests for proposal from AI and IT vendors and in selecting vendors
As a practical matter, this means involving the chief data officer, the chief analytics officer, the chief digital officer, or others in the company’s data department. These data professionals are not part of the IT organisation and include data modellers, data architects, data integrity officers and data governance officers.
Subpoenas
Companies should establish procedures and escalation paths for responding to subpoenas, as well as AI and data controls to protect against unnecessary disclosure of documents.
Data life cycle management
Companies should determine how long to store data in different categories, including with regard to regulatory compliance and corporate use of data, as well as its value in external monetisation.
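A data life cycle policy of this kind can be sketched as a simple retention table keyed by data category. The categories and retention periods below are hypothetical placeholders; actual periods would be driven by regulation and corporate policy:

```python
from datetime import date, timedelta

# Hypothetical sketch: retention periods per data category. The
# categories and day counts are invented placeholders -- actual
# periods are driven by regulation and corporate policy.
RETENTION_DAYS = {
    "transaction_records": 7 * 365,   # eg, a regulatory minimum
    "marketing_analytics": 2 * 365,   # a corporate policy choice
    "device_telemetry": 90,           # short-lived operational data
}

def is_expired(category: str, created: date, today: date) -> bool:
    """True when data in `category` has outlived its retention period
    and is a candidate for deletion or archival."""
    return today - created > timedelta(days=RETENTION_DAYS[category])
```

Routinely purging expired categories is one way to avoid building the “data museum” described earlier, where stored-but-unused data creates liability without business value.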
AI and Data Issues in Mergers, Acquisitions and Divestitures
Rights to corporate assets are an important aspect of mergers, acquisitions and divestitures. Machine learning algorithms and data are valued corporate assets and, therefore, now play an important role in these transactions.
In divestitures, both the business of the spin-off company and the company divesting the spin-off company will have needs for the data; however, the data rights will initially reside with the divesting company. Issues to be addressed include:
These issues are compounded when the data resides in one database or when data needed by both companies resides in multiple databases in different company buildings or different geographic locations. This is further complicated by the lack of clear legal standards for data ownership. Additional issues include:
Dividing a common database raises the question of which company has exclusive rights to a specific dataset. If this is contested, an arbitrator (or panel of arbitrators) or other neutrals can be designated to decide the allocation. A multi-member panel may be needed to provide both data science expertise and industry knowledge.
The same issues occur in an acquisition, but in reverse. Issues concern the AI technologies and data that the acquired company is able to provide to the acquiring company and whether the acquiring company needs additional licences or rights from third parties to use the technology and data used by the acquired party.
In both divestitures and acquisitions, a transition service agreement may be required when one of the companies needs to use the IT infrastructure of the other to conduct machine learning for a temporary period or have the other party host data. Transition service agreements used in outsourcing provide a model for how to structure this arrangement. In each case, special due diligence is required to cover the overlapping rights provided under the governing agreements and controlling law.
How “Solid” Internet Specification Applies to AI
“Solid” is an abbreviation of “social linked data” and refers to a set of internet specifications developed by Sir Tim Berners-Lee, the inventor of the World Wide Web, in collaboration with the Massachusetts Institute of Technology. It is a Web3 “web decentralisation project” designed to give individuals more control over which persons and things access and use their data. “Things” refers to the applications on the internet. In this sense, Solid is designed to “fix” the World Wide Web, where individuals currently have limited control over how their data can be used.
Solid makes use of “pods” (personal online data storage). Pods are storage areas controlled by individuals and function as secure personal web servers for data. Each pod has access rules that are specific to it. Individuals have the right to grant or revoke access to data in their pods (an individual can have more than one pod). Any person or application that accesses data in a pod uses a unique ID. Solid’s access control system uses these IDs to check whether an entity or internet application has the right to access and use data in the pod.
The connection between AI and Solid is that an individual can use AI to determine which data to load into the pod. The individual controls the machine learning algorithm and can change algorithms and thus the data loaded into the pod. The algorithm can be trained to screen for data features to be included and excluded from the pod. Given that a pod controls access and use of the data, it indirectly controls the use of a third-party AI to which the pod owner has granted use rights. A related issue is how the individual gains the right to use machine learning algorithms to perform analytics and determine the data to be loaded in the pod.
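The pod access model described above can be illustrated with a toy sketch. Real Solid relies on WebIDs and the Web Access Control specifications; the class and identifiers below are simplified, hypothetical stand-ins for that machinery:

```python
# Hypothetical toy model of a Solid-style pod. Real Solid uses WebIDs
# and the Web Access Control specifications; the class and IDs here
# are simplified stand-ins.
class Pod:
    def __init__(self, owner_id):
        self.owner_id = owner_id
        self.data = {}        # data the owner has chosen to load
        self.grants = set()   # unique IDs currently granted access

    def grant(self, agent_id):
        self.grants.add(agent_id)

    def revoke(self, agent_id):
        self.grants.discard(agent_id)

    def read(self, agent_id, key):
        """Only the owner or a granted ID may read data in the pod."""
        if agent_id != self.owner_id and agent_id not in self.grants:
            raise PermissionError(f"{agent_id} has no access to this pod")
        return self.data[key]
```

Revoking a grant immediately cuts off a previously authorised application – the control over “persons and things” that Solid is designed to return to individuals.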
Proposed Licensing Paradigm: “Decision Rights”
As a practical matter, it is often difficult for parties to a transaction to reach an agreement on ownership of data because the scope of ownership and its status with regard to IP rights is unclear under the present state of the law. A party is often concerned that, by assigning ownership rights, it will be giving up rights it may need in the future. Accordingly, parties focus on sharing data and the scope of use rights under sharing arrangements.
If we shift the focus from ownership to data use, which is often the real issue involved, then we need a legal framework that governs the scope of use and sharing with particularity in order to protect both providers and users of datasets.
The author proposes “Decision Rights” as that legal framework. Decision Rights is a licensing model that defines the purpose of conducting analytics and the use of the results in terms of decisions that can be made based on them. The model also provides the entity controlling the data with a mechanism that grants (and enforces) rights to the same data to different users for different purposes, thus enhancing data monetisation and revenue generation.
Decision Rights protects against regulatory sanctions by putting boundaries on the data use that constrain the use rights of downstream parties. Under a Decision Rights framework, those entities owning or controlling a database would grant a set of rights defined by the decisions that can be made and, if desired, limit the rights to a business unit or even specific individuals. This framework applies to all industries. The following example concerns a digital healthcare scenario.
Hospital No 1 wishes to learn the leading indicators of malnutrition in young children. Hospital No 2 has collected health data from young children in the course of providing medical care. Hospital No 2 can license such health data to Hospital No 1 by defining the scope of use as the right to make decisions only regarding malnutrition. This readily enables Hospital No 2 to grant licences to other entities for other purposes – that is, for the purpose of making other decisions. By way of an example, Hospital No 2 could license all of or some of the child health data to a pharmaceutical company that wishes to create a patient cohort for a clinical trial.
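The Decision Rights example above can be expressed as a minimal licence object whose scope of use is defined by the decisions the licensee may make. The class and field names are hypothetical illustrations of the author’s proposed framework, not an established standard:

```python
from dataclasses import dataclass

# Hypothetical sketch of the author's proposed "Decision Rights"
# licensing model; the class and field names are invented illustrations.
@dataclass(frozen=True)
class DecisionRightsLicence:
    licensor: str
    licensee: str
    dataset: str
    permitted_decisions: frozenset   # decisions the licensee may make

    def permits(self, decision: str) -> bool:
        return decision in self.permitted_decisions

licence = DecisionRightsLicence(
    licensor="Hospital No 2",
    licensee="Hospital No 1",
    dataset="child_health_records",
    permitted_decisions=frozenset({"malnutrition_indicators"}),
)
```

The same dataset can support a second licence to a different party with a different permitted decision – eg, selecting a clinical trial cohort – which is how the model enhances monetisation while bounding downstream use.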
AI, the Internet of Things and Cybersecurity
Connected devices – also referred to as the internet of things (IoT) – generate rich datasets for machine learning. Connected devices, by their nature, introduce cybersecurity risks. These “smart” devices are often relatively simple (eg, a device whose function is to serve as a monitor) and, as such, are subject to cyber-attacks. Moreover, given that these devices are – by definition – connected, cybersecurity risks need to be addressed on three levels.
A device itself (or the IoT of which it is a part) can be an avenue for a cyber-attack, including on the company’s larger IT systems.
In addition, a cyber-attack carries the risk of an intentional malicious change to the data that will adversely affect data analytics and the decisions a company makes based upon them. In this sense, a data cyber-attack can be a form of corporate sabotage.
The Outlook
AI considerations in healthcare and autonomous vehicles will converge. In both fields, AI decisions can lead to bodily injury or even death. AI is used to determine whether it is a shadow or a pedestrian and whether a tumour is malignant or benign. Solutions to hacking risks, enhancing data integrity and uncovering the weight given to specific factors by machine learning models to open the AI “black box” to analysis will be shared, upgraded, and used in both fields because of the overlapping requirements of automotive engineers and physicians. Automobiles will enhance patient care, while healthcare will improve driver safety.
Accelerating advances in AI will accelerate innovation in how companies conduct business but will also require enhanced IT equipment and new technology agreements in order to provide a new type of integration between data, the IoT and machine learning. Machine learning will lead to corporate use of more sophisticated analytics, including deep learning and neural networks.
AI brings great power, but also risks arising both from algorithmic bias and the difficulty of discerning the weight that AI technologies assign to different factors when generating predictions and other output on which businesses rely.
AI as a change agent impacts on how companies staff innovation projects. An AI project requires integration of the skills and services of a company’s data professionals and IT staff in ways that are different from other corporate technology projects. This change is one of the ways in which companies using new AI technology and analytics platforms require cross-functional teams in order to succeed in becoming AI-enabled.