Last Updated May 28, 2024


Trends and Developments


Stirling & Rose is a first-of-a-kind legal and commercial advisory practice specialising in emerging technology, including artificial intelligence, digital assets, tokenisation, Web 3.0, metaverse, smart legal contracts, data and privacy. The firm has a long track record in some of the most complex transactions and applications of digital law in the market, and has been influential in policy-making and regulation internationally. It serves investors, platform providers, entrepreneurs, financial institutions and governments on the legality and regulation of new digital assets and alternative finance.

The Artificial Intelligence (AI) Economy – Transforming Data Into Profitable Assets

Data is the indispensable resource underpinning an AI-driven economy. Accurate, fair and lawful AI needs vast amounts of high-quality data, with trusted provenance and clear legal attribution. 

The opportunity to capture new value driven by data is emerging in the burgeoning AI economy. Organisations who act early and position themselves to capture this advantage stand to benefit, perhaps significantly. 

Data, the “digital representation of acts, facts or information and any compilation of such acts, facts or information including in the form of sound, visual or audio-visual recording” (European Data Act Article 2(1)), is arguably the most undervalued, underutilised and uncommercialised resource possessed by organisations in this AI age. 

While the world is awash with data, its practical usability and value realisation are predominantly limited to a select group of leading AI enterprises. Data is rarely recognised on corporate balance sheets, and data management is typically viewed as an unavoidable operational cost to the business. 

Data-savvy organisations are poised to reverse this historical trend, transforming data into lucrative, renewable assets and repositioning data management from a financial liability into a centre of profit generation. In 2022, Elon Musk paid USD44 billion to acquire Twitter (now “X”) describing it as the “digital town square” of data; and in 2024, Reddit’s content-licensing deal for AI training purposes is reportedly worth USD60 million annually.

The potential for value realisation in the burgeoning data economy required to support a voracious AI ecosystem has never been more compelling. 

Indispensable Role of Data in AI

Data is the enabling resource of machine learning (ML), a sub-branch of AI where machines learn the patterns inherent in large swathes of data (“training data”) to create AI models. ML is comparable to teaching a child through examples and experience rather than by providing explicit instructions. The quality of those examples and experience – ie, the quality of the data – directly and materially affects the accuracy and performance of the resultant AI model. 

Big data (data with massive volume, velocity and variety) has enabled the significant leaps in AI capability exhibited by multi-modal foundation models such as GPT-4, Claude 3 and Gemini. 

Without data, AI cannot be developed. Without fresh, current data, existing AI applications fall into the error and obsolescence of model drift. Without contextually relevant, representative data, AI struggles to generalise training data into real-world settings, resulting in accuracy errors, fairness failures, and reinforcement of existing harmful biases and discrimination. These issues expose organisations to reputational and legal consequences under civil laws and regulation. Responsible data is a precondition for responsible AI. 

End of “Free” AI Training Data and Dawn of the New Data-Infused AI Economy

To date, large tech companies have extracted “free” data sourced from the internet for foundational AI model training. This practice is now subject to legal scrutiny, with human creators asserting rights to compensation in numerous multi-billion-dollar lawsuits, including:

  • New York Times v OpenAI;
  • Authors Guild v OpenAI; and
  • class action lawsuits against Stability AI, Midjourney and DeviantArt.   

Foundational model providers have responded by offering liability protection to users against copyright infringement lawsuits, or by differentiating their generative AI product as being trained only on data where the creators have expressly consented. 

Some organisations now favour procurement of AI solutions trained on expressly permissioned, lawfully provenanced data sets. As environmental, social and governance (ESG) aspects continue to gain prominence and momentum, this trend will increase. 

The data governance imperative is further bolstered by the recently adopted EU AI Act, which requires providers of general-purpose AI models caught by the Act to “draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model” (Article 53, paragraph 1(d)). Further, providers of high-risk AI systems must:

  • govern “data collection processes and the origin of data, and in the case of personal data, the original purpose of the data collection” (Article 10, paragraph 2(b)); and 
  • implement “systems and processes for data management, including data acquisition, data collection… data retention and any other operation regarding the data that is performed before and for the purpose of the placing on the market or the putting into service of high-risk AI systems” (Article 17, paragraph 1(f)). 

Despite the absence of comprehensive AI-specific regulation elsewhere, numerous jurisdictions have adopted principles of responsible AI development, such as fairness, transparency and accountability, confirming that ethical sourcing and curation of training data is foundational for ensuring ethical AI development.

The era of unrestricted internet scraping appears to be drawing to a close, leading to a scenario where the significance and value of authenticated data is poised to rise, perhaps exponentially. 

Who might benefit from this increase in data value?

Though Valuable, Data Is Not Recognised As Property

Despite the frequently expressed yet erroneous statement “I own my data”, data is not recognised as property in numerous legal systems around the world (save for statutory recognition of databases in certain jurisdictions). This includes the EU, the USA, Singapore and Australia. This lack of recognition is significant because, were data to be recognised as property, data holders would possess powerful proprietary rights capable of being enforced against the world at large. They do not.

In the absence of explicit property rights, the legal mechanisms to control data are a nuanced interplay of:

  • intellectual property (IP) rights (eg, copyright claims in recent litigation); 
  • statutory characterisation of databases as property; 
  • common law principles of trade secrets and confidentiality; and 
  • regulatory obligations relating to privacy (eg, the General Data Protection Regulation (GDPR)) and data portability (the Australian Consumer Data Right and the European Data Act). 

Against this complex legal backdrop, clear evidence of data provenance and legally authenticated rights to control data materially enhance data value. With appropriate infrastructure and systems, verifiable documentation, authentication or other cogent evidence detailing the data’s origin, its life cycle and authenticity may be obtained as follows: 

  • directly from the data source without the involvement of any other party (eg, on-premises data collection utilising edge computing);
  • indirectly from a complex lineage evidenced by numerous intermingled contractual arrangements and legal terms (eg, provider of sensor, user of sensor, landholder on which sensor is located, connecting infrastructure, cloud storage);
  • by exercise of data portability rights established by legislation (eg, the Australian Consumer Data Right and the European Data Act); and
  • under explicit contractual arrangements (eg, licensing, data-sharing). 
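One way such verifiable documentation of origin and life cycle might be kept is as a tamper-evident log in which each event is linked to the previous one by a hash. The following Python sketch is illustrative only: the function names, field names and example actors are hypothetical, and a log of this kind supports, but does not by itself establish, legal entitlement to data.

```python
import hashlib
import json


def record_event(chain, actor, action, data_bytes):
    """Append a provenance event that is linked to the previous event by hash."""
    previous_hash = chain[-1]["event_hash"] if chain else ""
    event = {
        "actor": actor,            # hypothetical party name
        "action": action,          # e.g. "collected", "licensed"
        "data_hash": hashlib.sha256(data_bytes).hexdigest(),
        "previous_hash": previous_hash,
    }
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    event["event_hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(event)
    return event


def verify_chain(chain):
    """Return True only if every event is unaltered and correctly linked."""
    previous_hash = ""
    for event in chain:
        body = {k: v for k, v in event.items() if k != "event_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if event["previous_hash"] != previous_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != event["event_hash"]:
            return False
        previous_hash = event["event_hash"]
    return True


# Hypothetical life cycle: a sensor operator collects a reading, then licenses it.
chain = []
reading = b"temperature=21.5;location=site-1"
record_event(chain, "sensor-operator", "collected", reading)
record_event(chain, "licensee", "licensed", reading)
is_valid = verify_chain(chain)
```

Altering any recorded field after the fact (for example, the actor on the first event) causes `verify_chain` to return `False`, which is the property that makes the log useful as supporting evidence of lineage.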

Entities seeking to enhance the commercial value of data may do so by addressing the following three key legal concerns:

  • clear evidence of legal entitlement to confer the data rights sought; 
  • clear evidence confirming data quality is as described – for example, evidence that human-created data is authentic and not synthetic (ie, machine-generated) or that weather data has been authentically collected from a specified location; and
  • ability to identify, gather evidence and enforce any potential breaches by the counterparty of any contractual restrictions imposed on the data rights.

AI Training Data – Quality Authentication

The quality of AI training data, including characteristics such as accuracy, completeness, fidelity and interoperability, has always been a fundamental focus for AI development. The EU AI Act sharpens the focus on training data quality by requiring training data to “take into account, to the extent required by the intended purpose, the characteristics or elements that are particular to the specific geographical, contextual, behavioural or functional setting within which the high-risk AI system is intended to be used” (Article 10, paragraph 4). Where AI systems are trained and tested on data reflecting these specific settings, “they are presumed to be in compliance with these requirements” (Article 42, paragraph 1). This is separate to any conformity assessment of the high-risk AI system under Article 43.

Training, validation and testing data sets are required to be “relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose” (Article 10, paragraph 3). 

Isolation Requirements for Testing Data Sets – a Niche Commercialisation Opportunity

More specifically, AI accuracy metrics (Article 15, paragraph 3) required under the EU AI Act rely on isolation of the testing data to ensure that it does not include any data previously utilised for training the AI. It is only this isolation of the testing data that enables it to be “used for providing an independent evaluation of the AI system in order to confirm the expected performance of that system before its placing on the market or putting into service” (Article 3(32)). 
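As a rough illustration of what checking this isolation could involve, the Python sketch below fingerprints each record and flags any testing record that also appears in the training set. It is a minimal sketch under stated assumptions: the function names and sample records are hypothetical, records are assumed to be text, and only exact matches after light normalisation are detected (near-duplicates or paraphrases would need other techniques).

```python
import hashlib


def fingerprint(record: str) -> str:
    """Return a SHA-256 fingerprint of a record after light normalisation."""
    normalised = " ".join(record.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_leakage(training_records, testing_records):
    """Return the testing records that also appear in the training records."""
    training_prints = {fingerprint(r) for r in training_records}
    return [r for r in testing_records if fingerprint(r) in training_prints]


# Hypothetical sample data for illustration only.
training = ["Rainfall 12mm at Station A", "Wind 30 km/h at Station B"]
testing = ["rainfall  12mm at station a", "Humidity 80% at Station C"]

leaked = find_leakage(training, testing)
print(leaked)
```

An empty result from `find_leakage` suggests, within the limits of exact matching, that the testing set has been kept apart from the training set; any returned records would need to be removed from one set or the other before the testing data is relied on for independent evaluation.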

The necessity of quarantined testing data unveils a new commercial opportunity for data holders who do not wish their data to be used to train AI models, but who are open to commercialising it as a dedicated resource for testing purposes. 

Beyond the scope of regulatory mandates, opportunities exist for institutions to curate their own testing data sets, enabling them to rigorously evaluate multiple AI models and to select the most effective solution for their specific needs. Data provenance is essential for instilling confidence in testing sets used for AI evaluation, ensuring the integrity and reliability of the testing data. 

Data holders are increasingly opting for contractual arrangements to bring clarity to data rights and obligations. This is particularly the case with contractual arrangements between users and sophisticated providers of devices connected to the internet of things (IoT). 

IoT and the European Data Act

The proliferation of IoT devices and products connected to the internet underpins an exponential increase in data available for AI training. By 2025, 79.4 zettabytes of data is expected to be generated by an estimated 41.6 billion IoT devices (IDC). IoT sensors capture multifarious data from a vast array of sources – expansive industrial operations, satellite networks, vehicles, health applications and consumer white goods in ordinary households. The creation, collection and accumulation of IoT data involves numerous players – at its simplest level, the provider of the IoT device and the user. In the absence of contractual arrangements, identifying the party who lawfully controls this data is legally fraught. 

Data control arrangements are well established in contracts between original equipment manufacturers (OEMs) of sophisticated machinery that incorporates IoT devices and their customers. Big data generated from the equipment used by large numbers of customers is captured and utilised by the OEM to train AI models capable of improving equipment performance and revenue. The data benefit typically flows to the OEM under contractual provisions entitling OEMs to control data collection and access. This OEM control of data has the potential to lock in customers, including for costly aftermarket services. 

In order to enhance fairness in the digital economy and to “prevent the exploitation of contractual imbalances that hinder fair access and use of data”, the EU passed the European Data Act, to ensure that “users of a connected product or related service in the EU can access in a timely manner the data generated by the use of that connected product or related service” (Recital 5). 

In addition, data holders must “make data available to… third parties of the user’s choice in certain circumstances” (Recital 5). This data-sharing requirement enables customers to shift more easily between device providers and to investigate alternatives beyond the OEM for after-market services.   

OEMs captured by this Act must implement technical arrangements to facilitate the data-sharing mandated by this regulation. These required technical arrangements offer the opportunity to deploy data infrastructure and digital solutions for addressing the data provenance issue. 

Ten Key Actions to Capture Data Value in the AI Age

In today’s increasingly AI-fuelled ecosystem, opportunities for leveraging value from data abound. Even where organisations do not pursue data commercialisation, positioning an organisation to defend against any allegations of data misuse or regulatory non-compliance is an important risk-management strategy. 

Ten key actions for organisations to consider are set out below.

  • Educate the organisation on the value of data to AI development, both as training data for organisational AI and as commercialisable assets. This education should include how data value can be enhanced by increased accuracy, completeness, fidelity, interoperability, provenance and legal certainty (ie, quality), together with information about regulatory requirements.
  • Reconsider data strategy to position the organisation to capture new opportunities for data value in an AI-infused ecosystem. 
  • Refresh the risk-control matrix with a data-focused lens to capture and implement control mechanisms for new and emerging data-related risks, such as non-compliance with new regulations and claims of data misuse (such as breach of data-use restrictions). Arguably, the greatest risk is a failure to capture, use and commercialise data for competitive advantage. 
  • Amend standard procurement contracts to clarify control of data created by the contract. Ensure that the organisation has clear rights of use and commercialisation, and add standard provisions aimed at enhancing the quality of the data created or collected under the contract. Create a data-focused procurement philosophy where opportunities to procure data or increase data quality are appropriately balanced against associated costs.
  • Evaluate all contractual arrangements with a data-focused lens to ensure that adequate rights of control are granted. Be aware of the increasing risk of contractual use restrictions. Consider implementing a “by exception only” policy on accepting use restrictions. 
  • Ensure that IP rights in data created under employment contracts, professional and other service contracts, and procurement and commercial contracts are clearly granted to the organisation, to guarantee a strong, documented defence against any claims of infringement. 
  • Review consents to use personal data against expected and edge use in the context of the new data economy, changing regulatory requirements and ethical considerations. 
  • Stay abreast of regulatory developments such as the EU AI Act and the EU Data Act. Even if the organisation is not captured by the jurisdictional reach of these regulations, they provide valuable guidance on the possible direction of local regulation. Whether these acts will have the Brussels effect of being complied with on a global scale as de facto global regulations remains to be seen.
  • Invest in infrastructure, and in associated configurations and connections, to facilitate cogent evidence of the origin and life cycle of data, including any downstream licensing or assignments of data rights. 
  • Establish data-sharing initiatives with trusted collaborators, to create enhanced data sets with increased variety, volume and value for use in AI training, validation and testing. 

As organisations navigate and capitalise on the new AI economy, data quality and legal authentication emerge as indispensable requirements for harnessing the potential of data for training AI models and recognising data on corporate balance sheets. Organisations that proactively address these aspects will effectively navigate the complexities of the AI age, and will unlock unprecedented opportunities for innovation, competitive advantage and a fairer, ESG-aligned sharing of the digital economy. The journey ahead is fraught with challenges, but is also replete with opportunities for those prepared to navigate the complexities of data with diligence, foresight and a commitment to lawful and ethical data practices. 



