How Do LLMs Do?

A practical introduction to how we make large language models take actions in the real world

In conversations with founders, engineers, and enthusiasts of ChatGPT-enabled academic dishonesty, one point of confusion about LLMs that frequently surfaces is how, on a technical level, services that give LLMs access to information from databases, Wikipedia, or even the Weather Channel actually work. How do you take an AI-powered chatbot that takes in words and spits out words and give it access to the internet, databases, or anything else of personal or economic interest? Over the next few thousand words, let’s break this down. This post is intended to be accessible to a non-technical audience. We’ll be using the terms prompt¹, tokens², training³, and fine-tuning⁴. If you’d like a definition for any of those, follow the little numbers.

Summary

There are only two possible ways for an LLM to know anything:

  1. It’s in the training data
  2. It’s in the prompt

Alright, thanks for reading. See you next week.

What?

Every time any LLM-based product answers questions, or generates output using any information at all, the software behind the scenes is using one or both of the above two strategies to give the LLM access to that information. I’ll explain the most common techniques for both by way of a few scenarios, but first, an aside on how LLMs are trained that’ll be relevant later on.

Training

The most common way an LLM is trained involves four steps. A useful metaphor: an LLM’s “brain” consists of a fixed structure plus a rather long list of numbers (“parameters”) that slot into that structure and encode everything the brain knows and can do, almost like downloading your brain as a long list of numbers. Training means adjusting those numbers directly. GPT-3.5-turbo, one of the two LLMs behind ChatGPT, is rumored to have around 154 billion adjustable parameters (OpenAI hasn’t published an official count).

  1. Take your text data, and generate pairs of prompts and desired responses from the LLM for each, like so:

    Input: "What's the square root of 4?"
    Output: "The square root of 4 is 2"
    Input: "Why can't humans fly?"
    Output: "Human chest muscles are too small and weak relative..."
    Input: "Great to meet you!"
    Output: "It's great to meet you too!"
    Input: "Who are you"
    Output: "I'm ChatGPT, an AI chatbot built to help you with various..."
    ...
    
  2. Feed each input to the LLM, and measure how different the LLM’s output is from the desired output in the training data

  3. Use math to figure out which parameters in the LLM’s “brain” are responsible for most of the difference between the desired output and the actual output, and then add or subtract just a bit from each parameter to try to correct the error (the math behind this step goes by the names “backpropagation” and “gradient descent”)

  4. Repeat with more data

In the process of adjusting the LLM’s parameters to close the gap between the output it gives for a particular input and the desired output, the math behind LLM training ends up encoding a version of the patterns in the training data into the parameters. Exactly how facts translate into LLM parameters is poorly understood and an active area of study, but suffice it to say that your brain does something similar when encoding information into its own equivalent of parameters.
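If you’d like to see those four steps in miniature, here’s a toy sketch in Python: a made-up “model” with just four parameters gets nudged toward a desired rule. This is emphatically not an LLM, and every name in it is mine, but the loop has the same shape as the steps above.

    import random

    # The four numbers below play the role of the "long list of numbers":
    # a comically small brain whose parameters we adjust directly.
    params = [random.uniform(-1, 1) for _ in range(4)]

    def model(x):
        # Stand-in for the LLM: turns an input into an output using the parameters.
        return params[0] * x**3 + params[1] * x**2 + params[2] * x + params[3]

    # Step 1: pairs of inputs and desired outputs (here the desired rule is y = 2x + 1).
    training_data = [(x, 2 * x + 1) for x in range(-3, 4)]

    learning_rate = 0.001  # how big a nudge each parameter gets
    for _ in range(2000):  # step 4: repeat, many times over
        for x, desired in training_data:
            error = model(x) - desired  # step 2: measure the difference
            # Step 3: nudge each parameter against its share of the error,
            # in proportion to how much it influenced this output.
            for i, influence in enumerate([x**3, x**2, x, 1]):
                params[i] -= learning_rate * error * influence

    print(model(2))  # now close to 2*2 + 1 = 5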

The Executor

The key to the whole system is a piece of software that interfaces between the natural-language input and output of the LLM and the rigid APIs of the software world. If there exists a way to turn on some lights with code, or fetch some data, or do anything else useful, the LLM can’t do it by itself. It needs a layer of software that can translate the LLM’s output into a series of API calls, and translate the API’s responses into text the LLM can understand.
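Here’s a minimal sketch of what such a layer might look like. Every name in it is hypothetical: check_weather is a made-up tool, and call_llm is a placeholder for a real LLM API call.

    import re

    # Hypothetical tool: a real product would call an actual weather API here.
    def check_weather(city: str) -> str:
        return f"Sunny and 72°F in {city}"

    TOOLS = {"Weather": check_weather}

    def call_llm(conversation: list[str]) -> str:
        # Placeholder: send the conversation to a real LLM API, return its reply.
        raise NotImplementedError

    def run_executor(user_message: str) -> str:
        conversation = [
            f"User: {user_message}",
            "Executor: To use a tool, put its name and input in brackets, "
            "e.g. {Weather: Boston}. Otherwise reply in plain text. "
            "Available tools: " + ", ".join(TOOLS),
        ]
        while True:
            reply = call_llm(conversation)
            conversation.append(f"LLM: {reply}")
            match = re.search(r"\{(\w+):\s*(.*?)\}", reply)
            if match and match.group(1) in TOOLS:
                # Translate the LLM's bracketed request into a real API call...
                result = TOOLS[match.group(1)](match.group(2))
                # ...and translate the response into text the LLM can read.
                conversation.append(f"Executor: Result: {result}")
            else:
                return reply  # plain text goes straight back to the user

This sketch compresses the multi-step phone trees you’ll see below into a single step, but the division of labor is the same.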

Examples

“ChatGPT, when is William Shakespeare’s birthday?”

This one is all “strategy 1” above. When OpenAI trained GPT-3.5-turbo and GPT-4 (the LLMs underlying the ChatGPT product), somewhere in the training data was at least one input-output pair mentioning the date of Shakespeare’s birth, and the resulting adjustments to the LLM’s parameters encoded this information within the LLM.

“Siri, what’s Johnny Appleseed’s phone number?”

While Siri isn’t yet LLM-based as of this writing, it almost certainly will be very soon. Here’s how it’ll answer the above question.

(Behind the scenes)

User (to Siri): Siri, what’s Johnny Appleseed’s phone number?

Executor: Hey LLM, a user asked the following question:

“Siri, what’s Johnny Appleseed’s phone number?”

You’ll likely need to use a tool to answer it, so here’s a list of tools I can use for you. If you want to use a tool, put the name of the tool in brackets (e.g. {Calculator}), otherwise simply respond with plain text to reply directly to the user. Do not use brackets anywhere else in your output:

- Calculator
- Calendar
- Contacts
- Notes
- Weather

LLM: {Contacts}

Executor: I see you want to use contacts. You can take any of the following actions. To pick an action, put its name in brackets (e.g. {Add Contact}), otherwise simply respond with plain text to reply directly to the user. To go back, use the {Go Back} action. Do not use brackets anywhere else in your output:

- Add Contact
- Edit Contact
- Delete Contact
- Retrieve Contact

- Go Back

LLM: {Retrieve Contact}

Executor: To find a contact, put some search text in brackets (e.g. {Philip}), and I’ll give you a list of matching contacts to choose from. Otherwise simply respond with plain text to reply directly to the user. To go back, search for {Go Back}. Do not use brackets anywhere else in your output.

LLM: {Johnny Appleseed}

Executor: Here are the matching contacts. To pick one, put its number in brackets. To get the next 10 contacts, respond with {Next}, otherwise simply respond with plain text to reply directly to the user. To go back, respond with {Go Back}. Do not use brackets anywhere else in your output:

1. Tommy Baynard
2. Bobby Parsnip
3. Danny DeVito
4. Sammy Solomon
5. Ricky Pinkus
6. Johnny Appleseed
7. Joey Barbossa
8. Billy Taylor
9. Frankie Fortress
10. Tommy Applefarmer

LLM: {Johnny Appleseed}

Executor: Invalid Input.

Here are the matching contacts. To pick one, put its number in brackets. To get the next 10 contacts, respond with {Next}, otherwise simply respond with plain text to reply directly to the user. To go back, respond with {Go Back}. Do not use brackets anywhere else in your output:

1. Tommy Baynard
2. Bobby Parsnip
3. Danny DeVito
4. Sammy Solomon
5. Ricky Pinkus
6. Johnny Appleseed
7. Joey Barbossa
8. Billy Taylor
9. Frankie Fortress
10. Tommy Applefarmer

LLM: {6}

Executor: Here is the requested contact. To go back, respond with {Go Back}, otherwise simply respond with plain text to reply directly to the user. Do not use brackets anywhere else in your output:

Name: Johnny Appleseed
Phone: (123) 456 7890
Address: 1 Min Strot Lane, Arlington, GA 39813

LLM: (123) 456 7890

Executor: Just to confirm, you put the following in plain text:

“(123) 456 7890”

Are you sure you want to say this to the user? Say {Yes} or {No}, or respond with {Go Back} to go back. Otherwise, simply respond with plain text to reply directly to the user. Do not use brackets anywhere else in your output.

LLM: {Yes}

Siri (to User): (123) 456 7890

You may notice the similarity between the above interaction and working through a phone tree. It turns out this is actually how we give LLMs access to tools: we have a software-defined phone tree (sometimes referred to as the “executor”), and the LLM takes the role of “user” talking to it. The software administering the phone tree, like real-world phone trees, just does dumb text matching. It checks at each step whether the LLM’s reply is in its list of available options, and has a number of hardcoded functions that actually do whatever the LLM asks for.

The executor is ultimately responsible for interacting with the outside world on behalf of the LLM, and the LLM is responsible for choosing from the options provided at each step to produce the desired result. The biggest difference from a human caller is that this entire process can take place in seconds or less, so to the user it can feel nearly instantaneous. LLMs are also less likely than most humans to get frustrated by the process, and most are quite forgiving of being put on hold.
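For the concretely minded, here’s roughly what that “dumb text matching” could look like in code, assuming replies follow the bracket convention above (the function name is mine):

    import re

    def interpret_reply(reply: str, options: list[str]):
        """Classify one LLM reply against the current list of phone-tree options."""
        match = re.search(r"\{(.+?)\}", reply)
        if match is None:
            return ("reply_to_user", reply)    # plain text: relay it verbatim
        choice = match.group(1)
        if choice in options:
            return ("run_option", choice)      # hand off to a hardcoded function
        return ("reprompt", "Invalid Input.")  # the retry you saw above

    # interpret_reply("{Retrieve Contact}", ["Add Contact", "Retrieve Contact"])
    # -> ("run_option", "Retrieve Contact")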

“Tesla, turn on the AC”

This is where we start to see the real power of LLMs put to work. The executor isn’t limited to reading information: it can carry out any action the LLM requests, provided it has a corresponding function to run. Here, the executor translates the LLM’s output into a command the car’s climate control system can understand, while the LLM chooses from the options provided at each step to produce the desired result.

(Behind the scenes)

User (to Car): Tesla, turn on the AC

Executor: Hey LLM, a user asked the following question:

“Tesla, turn on the AC”

Pick a subsystem or resource to respond to this command from the following list. To select a subsystem or resource, put its name in brackets (e.g. {Suspension Control}), otherwise simply respond with plain text to reply directly to the user. Do not use brackets anywhere else in your output:

- Engine Control
- Transmission Control
- Brake/Traction System
- Suspension Control
- Telemetry
- Climate Control
- Infotainment System
- Cabin Controls
- Internet Search

LLM: {Climate Control}

Executor: I see you want to use climate control. You can take any of the following actions. To pick an action, put its name in brackets (e.g. {Set Temperature}), otherwise simply respond with plain text to reply directly to the user. To go back, use the {Go Back} action. Do not use brackets anywhere else in your output:

- Set Temperature
- Set Fan Speed
- Set Mode
- Set Airflow Direction
- Set Airflow Recirculation
- Set Airflow Defrost
- Set Airflow Rear Defrost

- Go Back

LLM: {Set Mode}

Executor: To set climate control mode, choose from the following options. To pick an option, put its name in brackets (e.g. {Fan Only}), otherwise simply respond with plain text to reply directly to the user. To go back, use the {Go Back} action. Do not use brackets anywhere else in your output:

- Heating
- Cooling
- Fan Only
- Go Back

LLM: {Cooling}

Executor: Just to confirm, you would like to change the climate control mode from “Fan Only” to “Cooling”? Say {Yes} or {No}, or respond with {Go Back} to go back. Do not use brackets anywhere else in your output.

LLM: {Yes}

Car: (AC turns on)

Any action that can be expressed as a function can be executed by the executor, and the LLM can be prompted (or fine-tuned) to choose from the list of options the executor provides. This is the core of how LLMs can be used to control any system a computer can control.
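In code, the mapping from phone-tree options to functions might look something like the sketch below. The car interface here is hypothetical; a real vehicle would expose its own control API.

    # Every leaf of the phone tree is just a named, hardcoded function.
    # `car` stands in for whatever control interface the vehicle exposes.
    CLIMATE_ACTIONS = {
        "Set Temperature": lambda car, value: car.set_temperature(value),
        "Set Fan Speed":   lambda car, value: car.set_fan_speed(value),
        "Set Mode":        lambda car, mode: car.set_mode(mode),  # e.g. "Cooling"
    }

    def execute(car, action: str, argument):
        # The executor's final step: a dumb lookup, then a real side effect.
        CLIMATE_ACTIONS[action](car, argument)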

Caveats

As you can imagine, this is a slightly oversimplified depiction of things, but not by much. While human-facing phone trees tend to be optimized first to minimize labor cost for customer service agents, and only then to minimize frustration if the provider is feeling generous, LLM-facing phone trees are optimized around a different set of constraints, roughly:

  • Minimizing the risk of a mistake that could upset a user, cost the company money, or damage the company’s reputation
  • Minimizing the compute cost for the LLM, which in turn means minimizing the amount of text input to and output from the LLM

Less important but still relevant:

  • Minimizing the number of round trips between the LLM and the phone tree
  • Maximizing the amount of useful data that can be acquired from this process

Nearly every element of this system can be optimized in myriad ways to meet these aims. There are papers describing innovations ranging from replacing full-text responses to the phone tree ({Retrieve Contact}) with random tokens ({Q}) to save tokens and decrease bias⁵, to incorporating caching processes that allow the system to intelligently skip steps that were recently performed and cached, like so:

User (to Siri): Siri, what’s Johnny Appleseed’s phone number?

Executor: Hey LLM, a user asked the following question:

“Siri, what’s Johnny Appleseed’s phone number?”

You’ll likely need to use a tool to answer it, so here’s a list of tools I can use for you. If you want to use a tool, put the name of the tool in brackets (e.g. {Calculator}), otherwise simply respond with plain text to reply directly to the user. Do not use brackets anywhere else in your output:

- Calculator
- Calendar
- Contacts
- Notes
- Weather

Executor Helper: By the way LLM, this sounds very similar to a previous user request, “Siri, can you get me Jimmy Robbins’ number?”, resolved with the following steps:

1. Choose a tool → {Contacts}
2. Choose an action → {Retrieve Contact}
3. Provide argument for action → {Jimmy Robbins}
4. Choose from results → {1}
5. Output to User → (402) 322 2748
6. Confirm → {Yes}

If you’d like to skip ahead to just after one of these steps, put the number of the step in brackets (e.g. {4} for “Choose from results”). Otherwise, continue following the instructions above.

LLM: {2}

Executor: To find a contact, put some search text in brackets (e.g. {Philip}), and I’ll give you a list of matching contacts to choose from. Otherwise simply respond with plain text to reply directly to the user. To go back, search for {Go Back}. Do not use brackets anywhere else in your output.

LLM: {Johnny Appleseed}

The above is a relatively simple example of a caching process, but it’s not hard to imagine more sophisticated versions that cleverly reuse previous lookups, or that rewrite full conversation histories from cached interactions. While it’s still early days for this type of tooling, it’s a fairly active area of research, and we’re likely to see a lot of progress in the coming months and years.
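As a sketch of the idea (not any production system), a cache might store each resolved request alongside the steps taken to resolve it, and surface the closest past request as a hint. A real system would likely compare requests with embeddings; plain word overlap stands in here.

    # Store each resolved request with the steps that resolved it.
    cache: list[tuple[str, list[str]]] = []

    def similarity(a: str, b: str) -> float:
        # Jaccard word overlap: a crude stand-in for embedding similarity.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb)

    def cached_hint(request: str, threshold: float = 0.4):
        # Find the most similar past request; if it's close enough, inject
        # it into the prompt the way the "Executor Helper" does above.
        best = max(cache, key=lambda entry: similarity(request, entry[0]),
                   default=None)
        if best and similarity(request, best[0]) >= threshold:
            return best
        return None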

Final Questions

What if the LLM makes a mistake?

That depends on whether the LLM feeds an invalid response to the phone tree (like replying {Johnny Appleseed} for the contact above when the phone tree was looking for a number from 1 to 10), or a valid but incorrect response to the phone tree (like replying {5} instead of {6}).

If the response is invalid, in the most basic case the phone tree simply asks again, but more advanced implementations can use the same tricks human-facing phone trees do, like asking “Did you mean…?”, since we’ve long had software for that.

If the response is valid but incorrect, the best defense is ample access to “Go Back” and asking for confirmation before doing anything that could make a user unhappy, like deleting a contact or spitting out the wrong number. You can see this in the last few steps above.

But how did the LLM know the answer at the end?

When you feed text to an LLM, you don’t feed it just the most recent message; you feed it the entire conversation, or some version of it. The LLM has access to anything you include in the prompt, and its job is simply to generate the next message. So the LLM can see the answer just by looking at the rest of the conversation in its input.
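For example, with OpenAI’s chat API, each request carries the whole conversation as a list of messages. The sketch below compresses the Siri example above into that shape:

    # Each call to a chat-style LLM includes the whole conversation so far;
    # the model's only job is to generate the next message.
    messages = [
        {"role": "system", "content": "You can use tools: Calculator, Contacts, ..."},
        {"role": "user", "content": "Siri, what's Johnny Appleseed's phone number?"},
        {"role": "assistant", "content": "{Contacts}"},
        # ...more executor prompts and LLM replies...
        {"role": "user", "content": "Name: Johnny Appleseed\nPhone: (123) 456 7890"},
    ]
    # The phone number is sitting right there in the prompt, so "knowing"
    # the answer at the end is just a matter of reading it back out.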

What if the user’s request involves multiple steps?

In the simplest case, the LLM is clever enough to know to say “{Go Back}” a few times at the end to start over with a different tool, and as mentioned above, it can see everything prior in the conversation. As long as all the information it needs is somewhere in the conversation, the LLM has a good shot at finding it and using it correctly.

In more complicated cases, the phone tree might first ask the LLM to construct a strategy, and then perform some degree of handholding to walk the LLM through the steps in that strategy. For situations that require even more demanding logical flows, a number of frameworks exist that let the LLM itself take control of the process of arriving at the desired result, rather than letting the phone tree dictate the flow. Self-Ask and ReAct (not to be confused with the very popular frontend framework of the same name) are two well-known frameworks in this category.

Here’s what a ReAct phone tree looks like, for the curious. Note how the Executor finishes its messages with an Observation and the start of a Thought, and the LLM picks up where it left off, completing the Thought and then providing the next Action to take.

User (to LLM ReAct Agent): Who is the current president of France, and what is the last letter of his birth month?

ReAct Executor: Answer the following questions as best you can. You have access to the following tools:

google_serper: A low-cost Google Search API. Useful for when you need to answer questions about current events. Input should be a search query.
Calculator: Useful for when you need to answer questions about math.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google_serper, Calculator]
Action Input: the input to the action
Observation: the result of the action
… (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: Who is the current president of France, and what is the last letter of his birth month?
Thought:

LLM: I need to look up the current president of France
Action: google_serper
Action Input: current president of France

ReAct Executor: Observation: Emmanuel Macron
Thought:

LLM: I need to determine the last letter of his birth month
Action: google_serper
Action Input: Emmanuel Macron birth month

ReAct Executor: Observation: December
Thought:

LLM: I now know the final answer
Final Answer: Emmanuel Macron, December - r

Result (to User): Emmanuel Macron, December - r
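Behind the scenes, the ReAct executor just pattern-matches on the LLM’s reply. Here’s a rough sketch of that parsing step (the function name is mine; real implementations such as LangChain’s do something similar with regexes):

    import re

    def parse_react_reply(reply: str):
        # Either the LLM is done ("Final Answer: ..."), or it requested an action.
        if "Final Answer:" in reply:
            return ("finish", reply.split("Final Answer:", 1)[1].strip())
        action = re.search(r"Action:\s*(.+)", reply)
        action_input = re.search(r"Action Input:\s*(.+)", reply)
        return ("act", action.group(1).strip(), action_input.group(1).strip())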

What about OpenAI’s Function Calling API?

Function calling is a framework for reducing the amount of work needed to build the executor. It provides the LLM with a list of “functions”, or specific actions the LLM can take, and then has the LLM choose from that list and provide whatever inputs are necessary to run the function. It’s then up to the developer to actually execute the function. It’s a relatively simple framework: it tells the engineer that the LLM requested to use function check_weather, and leaves it to the engineer to identify and run the correct code to check the weather given the LLM’s inputs. There exist a number of alternative frameworks that provide more or less of the planning and execution sides of the executor, but they’re mostly just different flavors of phone tree.
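For a concrete picture, here’s roughly what a function-calling request looked like at the time of writing, using the 2023-era openai Python library (the interface has since evolved, so treat this as a sketch):

    import json
    import openai

    # Describe the function to the LLM as a JSON schema.
    functions = [{
        "name": "check_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }]

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What's the weather in Boston?"}],
        functions=functions,
    )

    call = response.choices[0].message.get("function_call")
    if call:
        # The LLM asked to run check_weather; actually running it is our job.
        arguments = json.loads(call["arguments"])
        print(call["name"], arguments)  # e.g. check_weather {'city': 'Boston'}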


  1. Prompt: The text fed into an LLM as input ↩︎

  2. Tokens: The smallest, most granular unit of text you can feed an LLM. For technical but mostly irrelevant reasons, LLMs generally enforce a limit on the length of a single request, measured in tokens rather than characters. Each token corresponds to roughly 4 characters of English text. For the curious, see here ↩︎

  3. Training: The process of taking an LLM (or any machine learning model, really) and adjusting its “brain” to push its response to a given input in a desired direction ↩︎

  4. Fine-tuning: Doing additional training on an already trained model ↩︎

  5. This is called “symbol tuning” and is described further in this paper ↩︎
