Build Your Own SQL Analyst Bot

One of the challenges of using LLMs (Large Language Models) in a business context is getting the model to answer factually and accurately about your company’s data. One possible solution is Retrieval Augmented Generation (RAG), which uses a vector database to populate the prompt context (see my post: Q&A With Your Docs: A Gentle Introduction to Matching Engine + PaLM). This works well for semi-structured data like text files and PDFs. But what if you wanted to retrieve data from a structured data source? What if the LLM could use the results of an analytic database query? That’s what we are going to explore in this post.

Using Google’s new Codey APIs, announced at Google I/O earlier this year, we’ll build a system that:

  1. Converts the user’s natural language question to a SQL statement
  2. Runs that SQL statement against an analytic database
  3. Uses the query result to answer the user’s original question
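
To make the three steps concrete before we get to the real thing, here is a minimal sketch of the control flow in Python. Everything in it is a stand-in: `fake_generate_sql` and `fake_summarize` are hypothetical placeholders for the two model calls (they are not the Codey API), an in-memory SQLite database plays the role of the analytic database, and the `trips` table with its rows is toy data I made up for illustration, not the Citibike dataset.

```python
import sqlite3
from typing import Callable, List, Tuple

Rows = List[Tuple]


def answer_question(
    question: str,
    generate_sql: Callable[[str], str],
    summarize: Callable[[str, Rows], str],
    conn: sqlite3.Connection,
) -> str:
    """Run the three-step pipeline: question -> SQL -> rows -> answer."""
    sql = generate_sql(question)          # step 1: natural language to SQL
    rows = conn.execute(sql).fetchall()   # step 2: run the SQL statement
    return summarize(question, rows)      # step 3: answer from the result


# Hypothetical stand-in for the model call that writes SQL.
def fake_generate_sql(question: str) -> str:
    return "SELECT COUNT(*) FROM trips WHERE start_station = 'Broadway'"


# Hypothetical stand-in for the model call that phrases the answer.
def fake_summarize(question: str, rows: Rows) -> str:
    return f"There were {rows[0][0]} trips."


# Toy in-memory database standing in for the analytic database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (start_station TEXT, duration_sec INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Broadway", 300), ("Broadway", 450), ("Canal St", 600)],
)

result = answer_question(
    "How many trips started at Broadway?",
    fake_generate_sql,
    fake_summarize,
    conn,
)
print(result)  # There were 2 trips.
```

Passing the two model calls in as plain functions keeps the pipeline itself independent of any particular LLM client, so the stubs can later be swapped for real calls without touching `answer_question`.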

We’ll also discuss prompt tuning and some of the shortcomings and limitations of a system like this.

In this how-to, I will be querying the NYC Citibike public dataset.

While I will explain each code module in order, the full code is provided at the end of this tutorial.
