LLMs in LangChain - Part 1. Conceptual


I. Large language models (LLMs)

Language Modeling (LM) is a key technique for enhancing the language comprehension abilities of machines. Essentially, LM aims to model the generative likelihood of word sequences to predict the probabilities of future (or missing) tokens. The evolution of LM has been extensively studied in the literature, progressing through four major stages [1]:

  1. Statistical Language Models (SLMs):
    • These models emerged in the 1990s and rely on statistical learning methods. The core idea is the Markov assumption: predict the next word from the most recent preceding context.
    • SLMs with a fixed context length n are known as n-gram language models (e.g., bigram and trigram models); a minimal bigram sketch follows this list.
    • SLMs have significantly improved performance in information retrieval (IR) and natural language processing (NLP).
  2. Neural Language Models (NLMs):
    • These models use neural networks, such as multi-layer perceptrons (MLP) and recurrent neural networks (RNNs), to characterize the probability of word sequences.
    • A notable advancement was the introduction of distributed word representations, creating a word prediction function based on aggregated context features (distributed word vectors). This led to a unified, end-to-end solution for various NLP tasks.
    • The development of word2vec further simplified the process by using a shallow neural network to learn these representations, proving effective across numerous NLP applications.
  3. Pre-trained Language Models (PLMs):
    • The initial approach, ELMo, used a bidirectional LSTM (biLSTM) network to pre-train context-aware word representations, which were then fine-tuned for specific tasks.
    • The advent of the Transformer architecture and self-attention mechanisms led to BERT, which pre-trains bidirectional language models on large-scale unlabeled data, producing highly effective semantic features.
    • This pre-training and fine-tuning paradigm has inspired numerous studies and led to the development of models like GPT-2 and BART, each introducing different architectures or improved pre-training strategies.
Figure: The family of pre-trained language models (PLMs)

Source: https://github.com/thunlp/PLMpapers/tree/master

  4. Large Language Models (LLMs): This category encompasses very large-scale PLMs with massive parameter counts, trained on extensive text data. LLMs excel at a wide range of NLP tasks thanks to their scale and vast training datasets.
    • GPT-3, with approximately 175 billion parameters, is trained on about 45 terabytes of text data.
    • GPT-4, though not officially detailed, is estimated to have 1.7 trillion parameters.
    • PaLM features 540 billion parameters.
    • The original LLaMA family offers models ranging from 7 billion to 65 billion parameters, trained on more than a trillion tokens.
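To make the Markov assumption behind SLMs concrete, here is a minimal bigram sketch in plain Python (the toy corpus and function names are illustrative only): it estimates P(next word | previous word) directly from counts.

# Minimal bigram (2-gram) language model: the Markov assumption in practice.
# Illustrative sketch only; the toy corpus is made up.
from collections import Counter, defaultdict

corpus = 'the cat sat on the mat the cat ate the fish'.split()

unigram_counts = Counter(corpus)
bigram_counts = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    bigram_counts[prev][curr] += 1

def bigram_prob(prev, curr):
    # P(curr | prev) = count(prev curr) / count(prev)
    return bigram_counts[prev][curr] / unigram_counts[prev]

print(bigram_prob('the', 'cat'))  # 2 "the cat" bigrams / 4 "the" tokens = 0.5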
Figure: The trends of the cumulative numbers of arXiv papers that contain the keyphrases “language model” (since June 2018) and “large language model” (since October 2019), respectively [1]

Moreover, LLMs are not standalone solutions; they need to be integrated with applications, and sometimes with specific data sources, to work effectively. For instance, OpenAI's ChatGPT is a conversational AI application fine-tuned for dialogue on top of the GPT-3.5 or GPT-4 language model, depending on the version. Separately, OpenAI offers a commercial API that provides direct access to these LLMs, enabling developers to use them in their own applications. In other words, GPT-3.5 or GPT-4 itself is a language model, not an application.

Another example of fine-tuned LLMs is Meta's Llama 2 model family, which includes a base model, a variant fine-tuned for dialogue (Llama-2-chat), and a variant fine-tuned for coding (Code Llama).

II. LangChain

LangChain [2] is an open-source framework for building applications based on large language models (LLMs). LangChain provides tools and abstractions to improve the customization, accuracy, and relevance of the information the models generate.

For example, developers can use LangChain components to build new prompt chains or customize existing templates. LangChain also includes components that allow LLMs to access new data sets without retraining.

Figure: LangChain

Source: LangChain tutorial #5: Build an Ask the Data app [3]

III. Chat Models

  • Chat models are central to LangChain: they are language models that take chat messages as input and return chat messages as output, rather than plain text.
  • LangChain integrates with various model providers (such as OpenAI, Cohere, Hugging Face) and offers a standard interface for interacting with these models.
  • LangChain supports models in synchronous, asynchronous, batching, and streaming modes, and provides additional features like caching (a short streaming/batching sketch follows the example below).
  • The OpenAI Chat Completions API defines three message roles, which LangChain exposes as message classes [4]:
    • SystemMessage: Sets the assistant's behavior.
    • AIMessage: Represents the assistant's message.
    • HumanMessage: Represents the user's message or prompt.
# Chat Completions API Messages: System, Assistant and Human

from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    SystemMessage,
    AIMessage,
    HumanMessage,
)

# Instantiate a chat model to receive the messages (GPT-3.5 Turbo here)
llm = ChatOpenAI(model='gpt-3.5-turbo')

messages = [
    SystemMessage(content='You are an experienced assistant and respond only in 3-4 sentences.'),
    HumanMessage(content='Explain the concept of System, Assistant and Human prompt in LLMs.')
]

output = llm.invoke(messages)
print(output.content)

# Output:
# In LLMs, the System prompt refers to the initial input that sets the context for generating text. The Assistant prompt is additional information or instructions provided to guide the language model in producing the desired output. The Human prompt is the feedback or correction given by a human to guide the model towards more accurate or relevant responses. Together, these prompts help shape the output of the language model to better meet the user's needs.
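As a quick illustration of the batching and streaming modes mentioned above, here is a minimal sketch that reuses the same llm chat model and message classes from the previous snippet (the example prompts are made up):

# Streaming: chunks of the answer arrive as they are generated
for chunk in llm.stream(messages):
    print(chunk.content, end='', flush=True)

# Batching: several independent message lists handled in one call
outputs = llm.batch([
    [HumanMessage(content='What is a language model?')],
    [HumanMessage(content='What is a prompt template?')],
])
print(outputs[0].content)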

IV. Core concepts of LangChain

  1. LLM components [4]
    1. LLM wrappers are simply intermediaries that let you connect to all popular LLMs.
    2. Prompt templates
      • A prompt refers to the input to the model.
      • Prompt templates are a way to create dynamic prompts for LLMs.
    3. Indexing refers to ways of structuring documents/texts so that LLMs can interact with them effectively and extract useful information.
    4. Memories save conversation history and feed it, along with new questions, to the LLM so that natural multi-turn conversations can be implemented.
  2. Chains combine multiple components to solve a specific task and build an entire LLM application (a short sketch follows this list).
  3. Agents facilitate interaction between the LLM and external APIs, enabling tool use by the LLM.
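To show how these pieces fit together, here is a minimal sketch that chains a prompt template with the same ChatOpenAI wrapper (llm) used earlier; the template text and variable names are illustrative assumptions, and LLMChain is used as the simplest chaining construct:

# Prompt template + LLM wrapper combined into a chain (illustrative sketch)
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=['concept', 'audience'],
    template='Explain {concept} to a {audience} in two sentences.',
)

# The chain fills in the template, sends the prompt to the LLM and returns the text
chain = LLMChain(llm=llm, prompt=prompt)
result = chain.invoke({'concept': 'prompt templates', 'audience': 'beginner'})
print(result['text'])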

Check out the next article for more details.

References

[1] Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J. R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223.

[2] LangChain Documentation: Introduction

[3] LangChain tutorial #5: Build an Ask the Data app

[4] Chat Completions API

[5] Learn the Language of Large Language Models: A Simple Introduction to Accessing LLM using Langchain