Towards Legal AI in Hong Kong: Intelligent Judgement Prediction with Large Language Models Using Hong Kong Legal Data

Researchers:
KIT, Chunyu
Duration:
-

The primary goal of this project is not only to launch, but more importantly, to promote and enhance legal AI in Hong Kong (HK). Specifically, it will explore the feasibility and practicality of intelligent judgment prediction with large language models (LLMs) using available HK legal data, and on that basis move forward to address other challenging issues in applying LLMs to the legal domain, such as the scarcity of labeled data and the interpretability and reliability of judgment prediction. To this end, our first task is to develop HK legal benchmark datasets (from the legal data we have acquired by web crawling) for training and testing LLMs on tasks at the core of legal AI, such as similar case retrieval and judgment prediction.

This project will have a significant impact on various stakeholders, especially potential users of legal AI, including legal practitioners, law firms, judicial bodies, policymakers, government departments, and the general public. It will apply state-of-the-art natural language processing (NLP) technologies to collect, process, and analyze legal texts from HK courts and legislation, with the aim of training machine learning models and LLMs that perform well enough to power an online judgment prediction platform for practical use, as the project's major contribution to advancing legal AI and its applications in HK and beyond.

Aiming at new knowledge and insights about legal AI, this project addresses three intertwined research questions: how to undertake judgment prediction (1) as multi-class classification with hierarchical labels (each consisting of law article, charge, and penalty) and (2) as automated text generation by LLMs using unlabeled data of available judgments, and (3) how to achieve and ensure the interpretability and reliability demanded of such predictions.
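To make the hierarchical label triple and the similar case retrieval task concrete, the following minimal sketch (standard library only) represents each judgment with a (law article, charge, penalty) triple and predicts the triple for a new case by retrieving the most similar prior case via bag-of-words cosine similarity. The `Judgment` schema, the toy corpus, and the statutory citations in it are illustrative placeholders, not the project's actual data or models.

```python
from collections import Counter
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Judgment:
    """A prior case with the hierarchical label triple (hypothetical schema)."""
    text: str
    law_article: str  # illustrative citation, not the project's annotation format
    charge: str
    penalty: str

def bow(text: str) -> Counter:
    """Bag-of-words representation; real systems would use learned embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_by_retrieval(query: str, corpus: list[Judgment]) -> Judgment:
    """Return the most similar prior case; its label triple serves as the prediction."""
    q = bow(query)
    return max(corpus, key=lambda j: cosine(q, bow(j.text)))

# Toy corpus with placeholder labels.
corpus = [
    Judgment("defendant convicted of theft of goods from a shop",
             "Cap. 210 s. 9", "theft", "6 months"),
    Judgment("dangerous driving causing injury on a public road",
             "Cap. 374 s. 36", "dangerous driving", "fine"),
]
best = predict_by_retrieval("shop theft of goods by defendant", corpus)
print(best.charge)  # → theft
```

In the project itself the retrieval component would rank real HK judgments and feed the retrieved cases to an LLM, rather than copying the nearest neighbour's labels verbatim as this sketch does.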
Procedurally, we will carry out data preparation on the very large volume of existing judgments via a human-machine mutually assisted strategy, so as to obtain the annotated data needed for the two comprehensive experiments we plan to conduct, one for (1) and the other for (2). Building on these experiments, we will move on to testing the effectiveness of various prompting approaches to tackling interpretability in judgment prediction. Specifically, we will attempt two approaches towards a sound solution to the interpretability problem: one assesses prediction results using retrieval-augmented generation over integrated external databases of related legal knowledge, and the other develops prompt engineering and chain-of-thought prompting to generate explanations.
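The second approach, combining retrieved legal knowledge with chain-of-thought prompting, can be sketched as a prompt-assembly step like the one below. The template wording, the helper name `build_cot_prompt`, and the placeholder provisions are all assumptions for illustration; the project's actual prompts and knowledge bases are not specified here.

```python
def build_cot_prompt(facts: str, retrieved_provisions: list[str]) -> str:
    """Assemble a retrieval-augmented, chain-of-thought prompt (illustrative
    template only; no LLM is called in this sketch)."""
    context = "\n".join(f"- {p}" for p in retrieved_provisions)
    return (
        "You are a legal assistant for Hong Kong courts.\n"
        f"Relevant provisions retrieved from the knowledge base:\n{context}\n"
        f"Case facts: {facts}\n"
        "Reason step by step: (1) identify the applicable law article, "
        "(2) determine the charge, (3) suggest a penalty range; "
        "then state the final prediction together with your reasoning."
    )

prompt = build_cot_prompt(
    "The defendant took goods from a shop without payment.",
    ["Provision A (placeholder)", "Provision B (placeholder)"],
)
print(prompt)
```

Because the explanation is generated alongside the prediction and is grounded in the retrieved provisions, the output can be checked against the cited legal knowledge, which is the interpretability property the two approaches aim for.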