Using AI to automatically search through legal documents

The need for tools that can quickly search through legal documents has grown rapidly in recent years, driven by the vast number of legal documents now available electronically. Searching through text is an area where AI has shown great success, and this article explains how to apply AI-driven search methods to find relevant information in legal documents.

... how much time should it take to get informed?

02/ The problem

The legal system relies on accurate information retrieval; in some jurisdictions, legal professionals are even ethically obligated to stay reasonably informed about relevant legal documents. Information retrieval in legal documents is, however, complicated, and standard methods often do not apply because law rarely has an inherent taxonomy. At the same time, having legal professionals search through the documents manually would be extremely time-consuming. This highlights the need for NLP techniques and deep learning to search through legal documents automatically.

03/ The opportunity for deep learning

Modern NLP techniques can learn to capture the meaning of a piece of text, rather than just looking at the words within it. These techniques are valuable in many areas and are well suited to finding relevant information within legal documents. The technique used in this use case is called text similarity. It can automatically search through selected legal documents and return the sentences closest in meaning to a search term, even if they do not share a single word with it. Similarity search can therefore serve as the engine of a legal document search.

04/ Platform model to use

This use case is best solved using text similarity with the Universal Sentence Encoder model on the Peltarion platform.

05/ How does the model work?

In short, a text similarity search for this use case works by comparing a search term against every sentence in your legal documents and returning the sentences that are most similar to it. The model is trained to turn a piece of text into a numerical representation that captures its meaning, so that texts with similar meanings get representations that lie close to each other in terms of numerical distance. By creating a numerical representation for each sentence in your legal documents, you can compare the representation of a search term against those of all the sentences and return the ones closest to it, i.e. the most similar sentences.
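The comparison step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Peltarion platform's implementation: it assumes the sentence representations (embeddings) have already been computed, for example by the Universal Sentence Encoder, whose real vectors have 512 dimensions. It uses cosine similarity as the closeness measure, a common choice for sentence embeddings but an assumption here. The sentences and three-dimensional vectors are made-up toy values for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, treat as dissimilar
    return dot / (norm_a * norm_b)


def search(query_vec: Vector, sentence_vecs: Dict[str, Vector],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k sentences whose embeddings are closest to the query."""
    scored = [(sentence, cosine_similarity(query_vec, vec))
              for sentence, vec in sentence_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:top_k]


# Toy "embeddings" standing in for real encoder output (values are made up).
sentence_vecs = {
    "The tenant must pay rent by the first of each month.": [0.9, 0.1, 0.0],
    "Lessees are obliged to settle monthly payments on time.": [0.85, 0.2, 0.05],
    "The court dismissed the appeal for lack of standing.": [0.05, 0.1, 0.95],
}
query_vec = [0.88, 0.15, 0.02]  # pretend embedding of "When is rent due?"

for sentence, score in search(query_vec, sentence_vecs, top_k=2):
    print(f"{score:.3f}  {sentence}")
```

Note that the two rent-related sentences share almost no words, yet both rank above the unrelated court sentence because only their vectors are compared. For large document collections, a brute-force loop like this would typically be replaced by an approximate nearest-neighbor index.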

06/ Data requirements

To build this solution, you only need the legal documents you want to search through. The amount of data therefore depends entirely on how many documents you want your search to cover.
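Since the search compares a query against individual sentences, each document first has to be broken into sentences. The sketch below is a deliberately naive, hypothetical splitter for plain-text input: it splits on `.`, `?` or `!` followed by whitespace and uses a small, made-up `ABBREVIATIONS` set to avoid splitting after references such as "Art.". Real legal text contains many more such cases (citations, section numbering), so a dedicated sentence tokenizer would likely be a better choice in practice.

```python
import re
from typing import List

# Hypothetical, deliberately small list of abbreviations that should not end a sentence.
ABBREVIATIONS = {"art.", "no.", "e.g.", "i.e.", "sec.", "para."}


def split_sentences(text: str) -> List[str]:
    """Naively split text into sentences, skipping known abbreviations."""
    # Split wherever '.', '?' or '!' is followed by whitespace.
    tokens = re.split(r"(?<=[.?!])\s+", text.strip())
    sentences: List[str] = []
    buffer = ""
    for token in tokens:
        if not token:
            continue  # empty input produces a single empty token
        buffer = f"{buffer} {token}".strip()
        if buffer.split()[-1].lower() in ABBREVIATIONS:
            continue  # ends in an abbreviation, keep accumulating
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)  # trailing text that ended in an abbreviation
    return sentences


text = ("The lease is governed by Art. 5 of the Act. "
        "The tenant must pay rent monthly. Is notice required?")
for sentence in split_sentences(text):
    print(sentence)
```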

07/ Model performance and success

For this use case, we are using the Universal Sentence Encoder, which is already pretrained for similarity, so the case can be solved without further training of the model. As a result, there is no number quantifying how well the model performs. Quantifying the performance of a similarity model requires labeled data, and this use case does not need any. Instead, you can get a sense of how well the model performs by using the test deployment, which you can find in the deployment view: enable the API, click the test deployment link, run a few queries, and evaluate the results qualitatively.

08/ Where to learn more

Want to learn more about building an AI-based search solution? Check out our tutorial on finding similar Google questions, where you learn how to apply text similarity to search for the most similar Google question.

Or follow this link to learn even more about text similarity on the Peltarion platform.