How AI Risks Widening the Justice Gap

The rapid development of artificial intelligence systems like ChatGPT is raising urgent concerns about whether they can provide accurate legal advice. As the president of the UK’s Supreme Court recently noted, some litigants relying solely on AI tools have submitted entirely fictional claims, inadvertently misleading the court. At first glance, limiting public use of legal AI systems seems a pragmatic way to stem misinformation. But this overlooks a deeper systemic issue underpinning these tools’ development – the lack of open access to the raw legal data on which they depend.

The root of inaccurate or misleading AI legal advice is insufficient training data. Experts widely agree that training AI systems on large corpora of relevant legal documents – past court judgments, legislation, and legal analysis – can dramatically improve the accuracy and nuance of their outputs. In England and Wales, however, access to the most comprehensive collections of these materials is tightly controlled by private legal publishers, who charge subscribers substantial premiums. The raw data trapped behind these paywalls vastly outstrips what is freely available to the public: a 2022 study found that only around half of all judicial review judgments are accessible on BAILII (the British and Irish Legal Information Institute), a widely used free legal database.

This asymmetry of access between paid legal data and freely available public data entrenches existing inequalities within the legal system. Simply put, the most powerful and sophisticated legal AI systems will be developed primarily by well-resourced groups who already hold substantial legal data – private publishers, insurance companies, and bulk litigants. Those unable to afford legal help or data access, such as individual citizens and underfunded non-profits, will be left even further behind as AI augments the capabilities of already dominant parties.

The outsized advantage conferred by pre-existing access to legal data is evident in Thomson Reuters’ recent acquisition of Casetext for $650 million. Casetext is a relatively small legal AI firm that had only recently begun building on large language models of the kind behind ChatGPT. Yet Thomson Reuters executives justified the massive price tag primarily on the basis that Casetext was “one of a very few sets of companies out there that also have access to data.” This demonstrates how holding scarce legal data can translate directly into disproportionate influence over the future development of legal AI, well beyond direct revenue generation.

Access to justice and access to legal data are becoming increasingly intertwined in the age of AI. While clause 40 of Magna Carta prohibits selling or denying justice directly, selling exclusive access to the raw outputs of the legal system – judgments, decisions, materials – grows ever more consequential. If we want to truly harness the potential of AI to increase access and fairness within the law, enabling open public access to raw legal data is the place to start. Mandating publication of the full text of all judgments is now urgent; otherwise we risk further entrenching inequality by allowing legal AI to be dominated by a select few.

Beyond judgments, legislation and case law should also be opened up as machine-readable structured data, not just digitized PDFs of printed pages. Data standards like Akoma Ntoso enable linking legal concepts across documents, which is vital for training sophisticated AI. The UK has already begun standardizing its legislation this way. However, the majority of legislation worldwide remains locked in print-centric PDFs that are difficult to analyze computationally at scale.
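To make the difference concrete, here is a minimal sketch of what structured formats enable. The Akoma Ntoso fragment below is hypothetical – the element names and namespace follow the published AKN 3.0 standard, but the judgment text and href identifier are invented for illustration – and the parser uses only Python’s standard library.

```python
# A minimal sketch of how an Akoma Ntoso document exposes legal
# cross-references to machines. The fragment is hypothetical: the
# element names and namespace follow the AKN 3.0 standard, but the
# judgment text and href are invented for illustration.
import xml.etree.ElementTree as ET

AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"

FRAGMENT = f"""
<akomaNtoso xmlns="{AKN_NS}">
  <judgment>
    <judgmentBody>
      <paragraph>
        <content>
          <p>The claim engages
            <ref href="/akn/uk/act/1998/42">the Human Rights Act 1998</ref>.
          </p>
        </content>
      </paragraph>
    </judgmentBody>
  </judgment>
</akomaNtoso>
"""

root = ET.fromstring(FRAGMENT)

# Every <ref> element is an explicit, machine-readable link between
# documents: the cross-document structure a scanned PDF cannot provide.
for ref in root.iter(f"{{{AKN_NS}}}ref"):
    print(f"{ref.text!r} -> {ref.get('href')}")
```

Because each reference carries a stable identifier, a pipeline can follow the citation graph between judgments and legislation at scale – exactly the structure that print-centric PDFs hide from machines.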

Enabling public access to raw legal data aligns with the UK’s commitment to open justice – the principle that transparency and scrutiny are vital to a just legal system. However, the new government site publishes only selected “significant” cases, still omitting a full quarter of recent judgments. Unless urgent action is taken, this risks creating a two-tiered system in which only those who can pay for legal data see the full picture.

Similarly, bulk access fees for historical cases in the BAILII database pose a barrier. Waiving these fees for non-profits would enable valuable public interest AI applications. Continued dependence on voluntary databases like BAILII is also precarious, as they may not withstand competition from private alternatives. Properly funding free public access to legal data is critical.

We must also be cautious about private legal AI systems amplifying embedded biases or undermining due process in unintended ways. But rather than prohibiting access, the solution is intentional stewardship. Establishing representative oversight bodies and external audits could help ensure AI systems meet public interest goals around fairness, accountability, and transparency.

Legal AI stands at a crossroads, and the path chosen now will shape its trajectory for years to come. Without urgent action to guarantee open public access to legal data, these technologies risk primarily benefiting groups that already hold power. If we believe in equal access to justice, publishing all primary legal materials must be the baseline. The alternative is an increasingly uneven legal landscape in which AI exacerbates rather than resolves existing inequalities. The choice we make today will determine whether legal AI remains an elite tool or truly expands access for all.
