Embedding Postgres schemas in GPT

Everyone from your investors to your other investors want to know how you’re planning on integrating GPT into your product. The why and the how aren’t important; those can be written later, by GPT, once it’s in your product. For now, let’s just add GPT.

Speccing

Our product, Canvas, is a data analysis tool for people who don’t know SQL (and also those who do). You build your model in the Canvas UI and we write the SQL for you. Canvas also includes a SQL editor that lets you inspect and edit the SQL we generate, as well as write your own queries.

GPT can write SQL, too. If we let our users write their question in plain English, and have GPT write the SQL to get the answer, well then we could save a lot of UI real estate.

Implementing

OpenAI has a nice, well-typed Node SDK. You can get started just by reading Readme and the TS docs. We’ll use the completions API, which is self explanatory. You know, GPT.

Besides adding some boilerplate GraphQL, implementation takes just a few lines:

async writeSql(text: string): Promise<string | undefined> {
  const apiKey = this.configService.get<string>('OPENAI_API_KEY');
  const configuration = new Configuration({
      apiKey,
  });
  const openai = new OpenAIApi(configuration);
  const prompt = `Write a SQL query to do the following: ${text}`;
  const completion = await openai.createCompletion({
  	  prompt,
      model: 'text-davinci-003', // best model with largest token size
      max_tokens: 500, // Cap the response size
	  n: 1, // we'll only use the "best" completion
  });
  return completion.data.choices[0].text;
}

A couple notes here.

First, models. Choose the LLM that should process your prompt. The upshot here is you usually want text-davinci-003 (the best/slowest/priciest) but can get away with using cheaper/faster models (shallower neural networks) for many tasks.

There's another family of models optimized for coding tasks. Ideally we would use one of those here. However, it's still in beta and I’ve mostly gotten inferior results so far. This model seems to prefer writing in Python even when asked for SQL.

Next we have to talk about tokens. Using GPT effectively means optimizing your token usage. A token roughly corresponds to 4 characters. Your token budget depends on the model. For text-davinci-003 the limit is 4000 tokens between your prompt and the response. max_tokens caps the amount of tokens used in the response, then you figure out how many you have left for the prompt. If you exceed the limit, you get an unforgiving error; GPT doesn't give freebies, and you pay by the token.

Testing

Let’s try it out with a simple query.

> "Get all the customers"

who have not made any order

SELECT customers.*
FROM customers
LEFT JOIN orders
	ON customers.customer_id = orders.customer_id
WHERE orders.customer_id IS NULL;

Hmm, doesn't look great.

I guess GPT thought our question could use some spicing up so it added a predicate and included this in English
GPT doesn’t know about our tables, so its making up its own
The SQL is incorrect; this should be customers.order_id is null

Note that GPT confidently returns an incorrect answer even though it lacks almost all the necessary information to do so correctly. GPT has no notion of asking a follow-up question.

Reading the docs to debug this odd response, I see there’s a temperature parameter that controls how free-wheeling GPT should be in its response. 0 is a total square and 1 is a guy at Burning Man (see: hallucinations). The default is 1, drunk confidence guy mode.

Let’s try again with the temperature cooled to 0:

> "Get all the customers"

SELECT * FROM customers;

Very cool and very square. But GPT still doesn’t know the schema or the actual table name so this query won't run.

Iterating

We need to tell GPT more information about the data we’re working with; what schemas have what tables, and what columns those tables have.

The only mechanism GPT offers for this is the prompt, so we’ll have to do what the kids call prompt engineering - a fancy phrase for cramming the most information possible into a limited amount of space.

We can generate some terse, plain English descriptions of our tables to add to the prompt:

Schema STRIPE has tables: 
  CHARGE with columns [ID, AMOUNT, CREATED, CURRENCY, CUSTOMER_ID, ...]
  CUSTOMER with columns [ID, NAME, CREATED, SHIPPING_ADDRESS_STATE, ...]
	...

At this point I realized our big goal was probably unrealistic given the 4000 token limit. Each table schema uses about 250 characters. If we leave space for the user input and the response, we'll only have room for about 60 table schemas. Bummer.

Iterating

We do have some users with fewer than 60 tables, so maybe we can still get this working for them. We add the schema information to the prompt:

const prompt = `Schema STRIPE has tables: 
                CHARGE with columns [ID, AMOUNT, CREATED, CURRENCY, CUSTOMER_ID, ...]
                CUSTOMER with columns [ID, NAME, CREATED, SHIPPING_ADDRESS_STATE, ...]
								...
					      Write a SQL query to: ${text}`;

Running the same query with the new context, GPT returns:

> "Get all the customers"

SELECT * FROM STRIPE.CUSTOMER;

Looking good and running successfully for the first time! Let’s try another:

> "Get all charges from customers created in the past year"

SELECT c.ID, c.AMOUNT, c.CREATED, c.CURRENCY, c.CUSTOMER_ID
        FROM CHARGE c
        INNER JOIN CUSTOMER cu
        ON c.CUSTOMER_ID = cu.ID
        WHERE cu.CREATED > date('now', '-1 year');

I’ve noticed GPT prefers to enumerate columns even when * would do. Did I mention OpenAI charges by the token?

This looks good but does not run.

I'm not sure where date('now', '-1 year') is valid syntax, but it's not in Snowflake so we should specify the warehouse.
The schema needs to be specified before each table reference
GPT thinks the CREATED column is a date (reasonable) but this is actually a VARCHAR that needs to be cast. Ideally we'd tell GPT the type of every column, but, once again, token management.

Let's clarify this in the prompt and try another query:

> "Get the three users from the US who have paid the most money since February 2022"

SELECT c.name, c.email, SUM(ch.amount) AS total_amount
        FROM STRIPE.CUSTOMER c
        INNER JOIN STRIPE.CHARGE ch
        ON c.id = ch.customer_id
        WHERE c.shipping_address_country = 'US'
        AND date_part(MONTH, CAST(ch.created AS DATE)) >= 2
        AND date_part(YEAR, CAST(ch.created AS DATE)) >= 2022
        GROUP BY c.name, C.email
        ORDER BY total_amount DESC
        LIMIT 3;

SQL nitpicks aside, this is incredible. My vague, poorly written question about my own business was answered instantly by a machine. We live in exciting times.

Shipping

We considered a few angles when deciding whether to go live with this feature.

Correctness

Even after tuning, GPT continues to output non-functional SQL. For more complicated queries there is almost always at least one syntax error such as:

Repeated columns in a join not disambiguated
Interval, extract, and other functions used with incorrect syntax
Column referenced that did not exist

We could tune some of these away, but every instruction added has to be paid for with the removal of some schema information.

Worse than errors, GPT can also return SQL that runs but is incorrect. For example, if you ask GPT to sort a join table by an ambiguous column CREATED it will pick one of the columns at random.

Token limits

The other real killer is the token limit. Without being able to add more contextual information this feature wouldn’t be useful for most of our customers. And I'd like to have even more information in each table schema. Without knowing the types of each column, GPT is prone to using functions invalid for a given type.

Stability

While developing I frequently received BAD_REQUEST responses from OpenAI. Unfortunately that’s about all the error information you get from the most verbose product on the planet. Just like humans, apparently AI dislikes reporting on its failures. So far as I can tell these were always due to using too many tokens.

Besides that the API was remarkably reliable and performant. No concerns here.

Price

OpenAI charges per token. We're using 4k tokens per query on DaVinci, meaning each query costs almost one cent. I'm sure this price will come down quickly, but for now that's prohibitive - especially with a low success rate.

Decision

Unfortunately, for the time being this is only ready to ship in private preview. Let me know if you’d like to try it out.

Thinking

I love GPT and I'm looking forward to GPT-4 and beyond. I haven't been so awed by technology since I held the first iPhone.

From a developer usability standpoint, I'm looking for a way to give more context to the model, and a way to ascertain the model's confidence in its responses. I'm also hoping the conversational mode of ChatGPT - wherein the model can remember your previous prompts - makes its way to the (official) API.

If you're thinking about any of these problems or implementing GPT in your own product I'd love to talk.

Footnotes

GPT does, in fact, give freebies.
Being less snarky, a temperature of 1 makes sense for more artistic endeavors like writing a sonnet. In this case, we want correctness and concision so 0 makes sense.
Fine tunes (in beta) provide a way to load a ton of example prompts and responses to GPT and save them in your own model at a lower cost structure than normal tokens. I only spent a couple hours working with these. My impression is that they’re useful for tuning the final layer of the model (”output in this format”), but not so useful in teaching the model other facts about the universe.