LangChain 101: Extract structured data (JSON)

Following Medium’s new policies, I am starting a series of short articles that deal only with the practical aspects of various LLM-related software.


The Tutorial

In this tutorial, we will learn how to extract structured data from free text. First, let's get some sample text.

```python
# Sample text (truncated): the abstract of https://arxiv.org/abs/2308.03279
inp = """Large language models (LLMs) have demonstrated remarkable \
generalizability, such as understanding arbitrary entities and relations. \
Instruction tuning has proven effective for distilling LLMs \
into more cost-efficient models such as Alpaca and Vicuna. \
Yet such student models still trail the original LLMs by \
large margins in downstream applications. In this paper, \
we explore targeted distillation with mission-focused instruction \
tuning to train student models that can excel in a broad application \
class such as open information extraction. Using named entity \
recognition (NER) for case study, we show how ChatGPT can be distilled \
into much smaller UniversalNER models for open NER. For evaluation, \
we assemble the largest NER benchmark to date, comprising 43 datasets \
across 9 diverse domains such as biomedicine, programming, social media, \
law, finance. Without using any direct supervision, UniversalNER \
attains remarkable NER accuracy across tens of thousands of entity \
types, outperforming general instruction-tuned models such as Alpaca \
and Vicuna by over 30 absolute F1 points in average. With a tiny"""
```
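
Before handing `inp` to a model, it helps to pin down what "structured" means for this text. The sketch below is plain Python, not LangChain's API, and rests on assumptions of its own: a hypothetical target schema (a list of named entities, each with a `text` and a `type`), a prompt that asks for JSON in that shape, and a parser that tolerates the Markdown code fence models often wrap around JSON. The names `Entity`, `build_prompt` and `parse_entities` are illustrative only.

```python
import json
import re
from dataclasses import dataclass


# Hypothetical target schema (an assumption, not from the article):
# one record per entity mention.
@dataclass
class Entity:
    text: str  # the span as it appears in the source text
    type: str  # a free-form entity type, e.g. "model" or "dataset"


def build_prompt(text: str) -> str:
    """Ask the model for JSON only, with an explicit shape to follow."""
    return (
        "Extract the named entities from the text below.\n"
        'Respond with JSON only, in the form {"entities": '
        '[{"text": "...", "type": "..."}]}.\n\n'
        f"Text:\n{text}"
    )


def parse_entities(reply: str) -> list[Entity]:
    """Parse a model reply into Entity records; raise ValueError if malformed."""
    # Models often wrap JSON in a Markdown code fence (three backticks,
    # optionally tagged "json"); keep only what is inside it, if present.
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", reply, re.DOTALL)
    payload = match.group(1) if match else reply
    data = json.loads(payload)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict) or not isinstance(data.get("entities"), list):
        raise ValueError('expected an object with an "entities" list')
    entities = []
    for item in data["entities"]:
        if not isinstance(item, dict) or not {"text", "type"}.issubset(item):
            raise ValueError(f"malformed entity: {item!r}")
        entities.append(Entity(text=str(item["text"]), type=str(item["type"])))
    return entities
```

In practice, `build_prompt(inp)` would be sent to whichever chat model you use, and `parse_entities` applied to the text it returns. Raising on malformed output, rather than silently returning an empty list, makes it easy to retry or log bad replies.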
