The Artificial Intelligence (AI) Economy – Transforming Data Into Profitable Assets
Data is the indispensable resource underpinning an AI-driven economy. Accurate, fair and lawful AI needs vast amounts of high-quality data, with trusted provenance and clear legal attribution.
The burgeoning AI economy is creating new opportunities to capture value from data. Organisations that act early and position themselves to capture this advantage stand to benefit, perhaps significantly.
Data, the “digital representation of acts, facts or information and any compilation of such acts, facts or information including in the form of sound, visual or audio-visual recording” (European Data Act Article 2(1)), is arguably the most undervalued, underutilised and uncommercialised resource possessed by organisations in this AI age.
While the world is awash with data, its practical usability and value realisation are predominantly limited to a select group of leading AI enterprises. Data is rarely recognised on corporate balance sheets, and data management is typically viewed as an unavoidable operational cost to the business.
Data-savvy organisations are poised to reverse this historical trend, transforming data into lucrative, renewable assets and repositioning data management from a financial liability into a centre of profit generation. In 2022, Elon Musk paid USD44 billion to acquire Twitter (now “X”), describing it as the “digital town square”; and in 2024, Reddit struck a content-licensing deal for AI training purposes reportedly worth USD60 million annually.
The potential to realise value in the burgeoning data economy that supports a voracious AI ecosystem has never been more compelling.
Indispensable Role of Data in AI
Data is the enabling resource of machine learning (ML), a sub-branch of AI where machines learn the patterns inherent in large swathes of data (“training data”) to create AI models. ML is comparable to teaching a child through examples and experience rather than by providing explicit instructions. The quality of those examples and experience – ie, the quality of the data – directly and materially affects the accuracy and performance of the resultant AI model.
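For technically inclined readers, this point can be illustrated with a short Python sketch (the scikit-learn tooling, synthetic data and 30% noise figure are illustrative assumptions, not drawn from any source cited here): corrupting a portion of the training labels measurably degrades the accuracy of the resulting model.

```python
# Illustrative sketch: the same model trained on clean versus
# noisy labels, showing data quality driving model accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic "training data": features (X) and labels (y).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("clean labels:", round(accuracy_score(y_test, model.predict(X_test)), 3))

# Flip 30% of the training labels to simulate poor-quality data;
# the resulting accuracy drop illustrates why data quality matters.
rng = np.random.default_rng(0)
noisy = y_train.copy()
flip = rng.random(len(noisy)) < 0.3
noisy[flip] = 1 - noisy[flip]
noisy_model = LogisticRegression(max_iter=1000).fit(X_train, noisy)
print("noisy labels:", round(accuracy_score(y_test, noisy_model.predict(X_test)), 3))
```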
Big data – data of massive volume, velocity and variety – has enabled the significant leaps in AI capability exhibited by multi-modal foundation models such as GPT-4, Claude 3 and Gemini.
Without data, AI cannot be developed. Without fresh, current data, existing AI applications fall into the error and obsolescence of model drift. Without contextually relevant, representative data, AI struggles to generalise training data into real-world settings, resulting in accuracy errors, fairness failures, and reinforcement of existing harmful biases and discrimination. These issues expose organisations to reputational and legal consequences under civil laws and regulation. Responsible data is a precondition for responsible AI.
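A minimal sketch of one common way model drift is caught in practice, assuming scipy and synthetic data purely for illustration: comparing the statistical distribution of live inputs against the original training data.

```python
# Illustrative drift check: a two-sample Kolmogorov-Smirnov test
# comparing a feature's distribution at training time with live data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # world at training time
live_feature = rng.normal(loc=0.6, scale=1.0, size=5000)      # world has shifted

stat, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.01:
    print(f"distribution shift detected (KS statistic {stat:.3f}): "
          "retrain on fresh, current data")
```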
End of “Free” AI Training Data and Dawn of the New Data-Infused AI Economy
To date, large tech companies have extracted “free” data sourced from the internet for foundation model training. This practice is now subject to legal scrutiny, with human creators asserting rights to compensation in numerous multi-billion-dollar lawsuits.
Foundation model providers have responded by offering users liability protection against copyright infringement claims, or by differentiating their generative AI products as trained only on data for which the creators have expressly consented.
Some organisations now favour procurement of AI solutions trained on expressly permissioned, lawfully provenanced data sets. As environmental, social and governance (ESG) aspects continue to gain prominence and momentum, this trend will increase.
The data governance imperative is further bolstered by the recently adopted EU AI Act, which requires providers of general-purpose AI models caught by the Act to “draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model” (Article 53, paragraph 1(d)). Further, providers of high-risk AI systems must comply with the data and data governance requirements of Article 10, discussed below.
Despite the absence of comprehensive AI-specific regulation elsewhere, numerous jurisdictions have adopted principles of responsible AI development, such as fairness, transparency and accountability, confirming that ethical sourcing and curation of training data is foundational for ensuring ethical AI development.
The era of unrestricted internet scraping appears to be drawing to a close, leading to a scenario where the significance and value of authenticated data is poised to rise, perhaps exponentially.
Who might benefit from this increase in data value?
Though Valuable, Data Is Not Recognised As Property
Despite the frequently expressed yet erroneous statement “I own my data”, data is not recognised as property in numerous legal systems around the world (save for statutory recognition of databases in certain jurisdictions). This includes the EU, the USA, Singapore and Australia. This lack of recognition is significant because, were data to be recognised as property, data holders would possess powerful proprietary rights capable of being enforced against the world at large. They do not.
In the absence of explicit property rights, control of data rests on a nuanced interplay of contractual rights, intellectual property, the protection of confidential information and trade secrets, and privacy and data protection regulation.
Against this complex legal backdrop, clear evidence of data provenance and legally authenticated rights to control data materially enhance data value. With appropriate infrastructure and systems, verifiable documentation, authentication or other cogent evidence detailing the data’s origin, life cycle and authenticity can be captured and maintained.
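By way of illustration only, a provenance record might pair a cryptographic fingerprint of a data set with origin and licensing metadata. The field names and file name in this Python sketch are hypothetical, not a recognised standard.

```python
# Hypothetical provenance record: a content hash plus origin and
# licensing metadata, providing tamper-evident documentation.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def provenance_record(path: str, source: str, licence: str) -> dict:
    """Return a tamper-evident record of a data set's origin and rights."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return {
        "sha256": digest,        # any later change to the file is detectable
        "source": source,        # where and how the data originated
        "licence": licence,      # the legal basis for holding and using it
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

# "training_data.csv" is a placeholder file name for illustration.
print(json.dumps(provenance_record(
    "training_data.csv",
    source="consented customer telemetry",
    licence="internal use; AI training expressly permitted"), indent=2))
```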
Entities seeking to enhance the commercial value of data may do so by addressing three key legal concerns, explored in turn in the sections below.
AI Training Data – Quality Authentication
The quality of AI training data, including characteristics such as accuracy, completeness, fidelity and interoperability, has always been a fundamental focus for AI development. The EU AI Act sharpens the focus on training data quality by requiring training data to “take into account, to the extent required by the intended purpose, the characteristics or elements that are particular to the specific geographical, contextual, behavioural or functional setting within which the high-risk AI system is intended to be used” (Article 10, paragraph 4). Where AI systems are trained and tested on data reflecting these specific settings, “they are presumed to be in compliance with these requirements” (Article 42, paragraph 1). This is separate from any conformity assessment of the high-risk AI system under Article 43.
Training, validation and testing data sets are required to be “relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose” (Article 10, paragraph 3).
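As a hedged illustration of how such criteria might be screened in practice, the following pandas sketch checks a data set for completeness and duplicate records; the file name and the 95% threshold are assumptions, not figures from the Act.

```python
# Illustrative data quality screen: completeness and duplicates,
# two of the characteristics the Act's quality criteria speak to.
import pandas as pd

df = pd.read_csv("training_data.csv")        # placeholder file name

completeness = 1 - df.isna().mean()          # non-missing share per column
duplicate_rows = int(df.duplicated().sum())  # exact duplicate records

print(completeness.round(3))
print("duplicate rows:", duplicate_rows)

# 0.95 is an assumed internal threshold, not a figure from the Act.
assert completeness.min() >= 0.95, "a column falls below the completeness threshold"
```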
Isolation Requirements for Testing Data Sets – a Niche Commercialisation Opportunity
More specifically, AI accuracy metrics (Article 15, paragraph 3) required under the EU AI Act rely on isolation of the testing data to ensure that it does not include any data previously utilised for training the AI. It is only this isolation of the testing data that enables it to be “used for providing an independent evaluation of the AI system in order to confirm the expected performance of that system before its placing on the market or putting into service” (Article 3(32)).
The necessity of quarantined testing data unveils a new commercial opportunity for data holders who do not wish their data to be used to train AI models, but who are open to licensing it as a dedicated resource for AI testing purposes.
Beyond the scope of regulatory mandates, opportunities exist for institutions to curate their own testing data sets, enabling them to rigorously evaluate multiple AI models and to select the most effective solution for their specific needs. Data provenance is essential for instilling confidence in testing sets used for AI evaluation, ensuring the integrity and reliability of the testing data.
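A minimal sketch of this evaluation pattern, using scikit-learn and synthetic data purely for illustration: the test set is split off once and never used in training, so every candidate model is scored on genuinely unseen data.

```python
# Illustrative evaluation of candidate models against one curated,
# quarantined test set that no model was trained on.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Split the quarantined set off once; it is never touched during training.
X_train, X_quarantined, y_train, y_quarantined = train_test_split(
    X, y, test_size=0.2, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}
for name, model in candidates.items():
    model.fit(X_train, y_train)              # training sees only X_train
    score = accuracy_score(y_quarantined, model.predict(X_quarantined))
    print(f"{name}: accuracy on quarantined test set = {score:.3f}")
```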
Data holders are increasingly opting for contractual arrangements to bring clarity to data rights and obligations. This is particularly the case with contractual arrangements between users and sophisticated providers of devices connected to the internet of things (IoT).
IoT and the European Data Act
The proliferation of devices and products connected to the internet of things (IoT) underpins an exponential increase in data available for AI training. By 2025, 79.4 zettabytes of data is expected to be generated by an estimated 41.6 billion IoT devices (IDC). IoT sensors capture multifarious data from a vast array of sources – expansive industrial operations, satellite networks, vehicles, health applications and consumer white goods in ordinary households. The creation, collection and accumulation of IoT data involves numerous players – at its simplest level, the provider of the IoT device and the user. In the absence of contractual arrangements, identifying the party who lawfully controls this data is legally fraught.
Data control arrangements are well established in contracts between original equipment manufacturers (OEMs) of sophisticated machinery that incorporates IoT devices and their customers. Big data generated from the equipment used by large numbers of customers is captured and utilised by the OEM to train AI models capable of improving equipment performance and revenue. The data benefit typically flows to the OEM under contractual provisions entitling OEMs to control data collection and access. This OEM control of data has the potential to lock in customers, including for costly aftermarket services.
In order to enhance fairness in the digital economy and to “prevent the exploitation of contractual imbalances that hinder fair access and use of data”, the EU passed the European Data Act to ensure that “users of a connected product or related service in the EU can access in a timely manner the data generated by the use of that connected product or related service” (Recital 5).
In addition, data holders must “make data available to… third parties of the user’s choice in certain circumstances” (Recital 5). This data-sharing requirement enables customers to shift more easily between device providers and to investigate alternatives beyond the OEM for after-market services.
OEMs captured by this Act must implement technical arrangements to facilitate the data-sharing mandated by this regulation. These required technical arrangements offer the opportunity to deploy data infrastructure and digital solutions for addressing the data provenance issue.
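Purely as a sketch of the kind of technical arrangement contemplated (the payload shape and field names are invented for illustration, not prescribed by the regulation), device-generated data might be packaged for the user, or a third party of their choice, in a common machine-readable format.

```python
# Hypothetical export of device-generated data in a common
# machine-readable format, for the user or their chosen third party.
import json
from datetime import datetime, timezone

def export_user_data(readings: list[dict]) -> str:
    """Package device readings for sharing; field names are illustrative."""
    payload = {
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "format": "application/json",   # a common machine-readable format
        "readings": readings,
    }
    return json.dumps(payload, indent=2)

print(export_user_data([
    {"sensor": "temperature", "value": 21.4, "recorded_at": "2024-05-01T10:00:00Z"},
]))
```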
Ten Key Actions to Capture Data Value in the AI Age
In today’s increasingly AI-fuelled ecosystem, opportunities for leveraging value from data abound. Even where organisations do not pursue data commercialisation, positioning the organisation to defend against allegations of data misuse or regulatory non-compliance is an important risk-management strategy.
Ten key actions for organisations to consider are set out below.
As organisations navigate and capitalise on the new AI economy, data quality and legal authentication emerge as indispensable requirements for harnessing the potential of data for training AI models and recognising data as an asset on corporate balance sheets. Organisations that proactively address these requirements will be well placed to navigate the complexities of the AI age and to unlock unprecedented opportunities for innovation, competitive advantage and a fairer, ESG-aligned digital economy. The journey ahead is fraught with challenges, but it is also replete with opportunities for those prepared to approach data with diligence, foresight and a commitment to lawful and ethical data practices.