Understanding Semantic Search — (Part 9: Introduction to Generative Question Answering, Prompt Engineering, LangChain Library, and more!)
TABLE OF CONTENTS:
- Introduction to Generative Question Answering
- Prompt Engineering
- LangChain Library
- Which question answering is better? (Extractive vs. Generative)
Introduction to Generative Question Answering:
In part 0 of the series, I introduced different types of question answering, including extractive and generative. Extractive Question Answering (EQA) systems find the answer to a question within a given text. For example, if the question is "What is the capital of India?" an EQA system might identify the text "Delhi" as the answer. In contrast, generative question-answering (GQA) systems generate an answer that is not explicitly stated in the text. For example, if the question is "What is the meaning of life?" a GQA system might generate an answer like "The meaning of life is to find your purpose and live it to the fullest."
In part 2 of the series, I introduced the retriever and reader architecture for EQA. EQA systems work by first identifying the relevant passages of text that are likely to contain the answer to the question (Retriever). Once these passages have been identified, the system extracts the answer from the text (Reader). This approach is relatively straightforward but can be limited in its ability to answer complex questions.
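The retriever and reader steps can be sketched with a toy example. This is only an illustration: word overlap stands in for a real retriever, and the "reader" returns the best-matching sentence rather than an exact answer span as a trained EQA model would. The function names and sample passages are assumptions made for this sketch.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, passages, top_k=1):
    """Retriever: rank passages by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:top_k]

def read(question, passage):
    """Reader (toy): return the sentence that overlaps most with the question."""
    q = tokenize(question)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    return max(sentences, key=lambda s: len(q & tokenize(s)))

# Hypothetical mini corpus, used only for illustration.
passages = [
    "Tokyo is the capital of Japan. It is a very large city.",
    "India is a country in South Asia. Delhi is the capital of India.",
]
question = "What is the capital of India?"

top_passage = retrieve(question, passages)[0]
answer = read(question, top_passage)
print(answer)
```

A real reader would go one step further and extract only the span "Delhi" from the selected sentence.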
Retriever and Reader Architecture:
The retriever in GQA systems is similar to the retriever in EQA. The EQA reader treats each retrieved segment or passage independently. A GQA reader, however, can combine several relevant segments to generate an accurate answer. The reader in GQA is a transformer-based neural network, such as a decoder model like GPT-3 or GPT-4, trained on a massive dataset of text and code. It takes the encoded question and the retrieved texts, uses them to understand the meaning of the text and identify entities and relationships, and then generates the answer. Below is an example of GQA.
Question: What is the meaning of life?
Relevant retrieved segments (documents or passages or sentences):
1. There is no one answer to the meaning of life.
2. The meaning of life is subjective and varies from person to person.
3. The meaning of life is a question that philosophers and theologians have pondered for centuries.
Answer generated by a reader after summarizing the retriever output:
The meaning of life is different for everyone. Some people find meaning in their relationships, while others find meaning in their work or hobbies. There is no right or wrong answer to the meaning of life. It is up to each individual to find purpose in life.
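As a rough sketch of how the question and the retrieved segments might reach a generative reader, the snippet below joins them into a single prompt string. The template wording and the function name `build_gqa_prompt` are assumptions for illustration; the actual call to a language model is left out.

```python
def build_gqa_prompt(question, segments):
    """Combine the question and all retrieved segments into one reader prompt."""
    context = "\n".join(f"{i}. {seg}" for i, seg in enumerate(segments, start=1))
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# The three retrieved segments from the example above.
segments = [
    "There is no one answer to the meaning of life.",
    "The meaning of life is subjective and varies from person to person.",
    "The meaning of life is a question that philosophers and theologians "
    "have pondered for centuries.",
]

prompt = build_gqa_prompt("What is the meaning of life?", segments)
print(prompt)
```

Unlike the extractive reader, the generative reader sees all segments at once, so it can draw on several of them in one answer.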
Prompt Engineering for Question Answering:
Prompt engineering is the process of writing natural language instructions and passing them to large language models to improve their performance. It is a new way to interact with and program language models. It can help generative reader models produce answers that are more consistent with the user's expectations without updating the model weights. Prompts can also give models information about the user's background knowledge and interests. This makes prompt engineering a valuable tool for improving the accuracy of question-answering models.
There are different types of prompts, including:
- Example Prompts: Providing question-and-answer examples to the model can help the model understand the format in which the output should be generated. For instance, for questions asking about capital cities, sample prompts include "Paris is the capital of France" and "The capital of Japan is Tokyo."
- Constraint Prompts: Providing constraints on the desired answer can help the model generate more accurate and informative answers. For example, for the same query, a constraint prompt can be "capital must be a place" or "capital must be a city."
- Query Modification Prompts: These prompts modify the query to make it more specific or to provide additional context. For example, if you are asking for the capital of France, you could modify the query to "What was the capital of France during World War 2?" and the model can generate the answer "During the German occupation (WW2, 1940–1944), the capital of France was Vichy."
- Multi-hop Prompts: Multi-hop prompts are used to answer multi-hop questions, which require the model to combine multiple pieces of information. For the user query "What is the capital of the country that is home to the Eiffel Tower?" a multi-hop prompt can be "In which country is the Eiffel Tower located?" The answer to this prompt is France, and this information makes it easy to find the capital.
One might wonder when to fine-tune and when to use prompt engineering. Fine-tuning language models requires thousands of samples (depending on the data and the problem). However, if only a few training samples exist, well-designed prompts can help the question-answering model generate more accurate and informative answers.
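The four prompt types above can be sketched as small string-building helpers. The function names and template wording are assumptions chosen for this illustration, not a standard API; each helper only builds the text that would be sent to a model.

```python
def example_prompt(question, examples):
    """Few-shot prompt: show question/answer pairs, then the new question."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {question}\nA:"

def constraint_prompt(question, constraints):
    """Constraint prompt: list rules the answer must satisfy."""
    rules = "\n".join(f"- {c}" for c in constraints)
    return f"Follow these constraints:\n{rules}\nQ: {question}\nA:"

def query_modification_prompt(question, extra_context):
    """Query modification: append extra context to make the query specific."""
    return f"{question.rstrip('?')} {extra_context}?"

def multi_hop_prompts(sub_questions):
    """Multi-hop: break one question into ordered intermediate questions."""
    return [f"Step {i}: {q}" for i, q in enumerate(sub_questions, start=1)]

few_shot = example_prompt(
    "What is the capital of India?",
    [("What is the capital of France?", "Paris"),
     ("What is the capital of Japan?", "Tokyo")],
)
constrained = constraint_prompt(
    "What is the capital of France?", ["The capital must be a city."]
)
modified = query_modification_prompt(
    "What was the capital of France?", "during World War 2"
)
hops = multi_hop_prompts([
    "In which country is the Eiffel Tower located?",
    "What is the capital of that country?",
])
print(few_shot, constrained, modified, *hops, sep="\n\n")
```

In practice these types are often combined, for example a few-shot prompt that also lists constraints.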
A brief introduction to the LangChain Library for Generative Question Answering applications:
LangChain is an open-source framework that provides a high-level API for interacting with large language models. It can be used to create various applications, including chatbots, question-answering systems, and text generators. Chains are an essential component of LangChain: a chain combines multiple language models into one pipeline. In our case, the retriever, summarizer (which condenses the retrieved outputs and sends them to the reader), and reader models can be chained together to get an accurate answer. Moreover, a reader need not be a single model. It can also be a chain of language models or tools, each with its own prompt.
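The idea of chaining a retriever, a summarizer, and a reader can be sketched in plain Python. This sketch deliberately does not use LangChain's own classes; `make_chain` and the three step functions are hypothetical stand-ins, and the reader only builds a prompt instead of calling a language model.

```python
from functools import reduce

def make_chain(*steps):
    """Return a function that pipes a state dict through each step in order."""
    def run(state):
        return reduce(lambda acc, step: step(acc), steps, state)
    return run

def retriever(state):
    """Keep documents that share at least one word with the question."""
    q = set(state["question"].lower().rstrip("?").split())
    hits = [d for d in state["corpus"] if q & set(d.lower().rstrip(".").split())]
    return {**state, "retrieved": hits}

def summarizer(state):
    """Placeholder summarizer: simply joins the retrieved documents."""
    return {**state, "summary": " ".join(state["retrieved"])}

def reader(state):
    """Placeholder reader: builds the prompt a generative model would receive."""
    prompt = f"Context: {state['summary']}\nQuestion: {state['question']}\nAnswer:"
    return {**state, "prompt": prompt}

qa_chain = make_chain(retriever, summarizer, reader)

result = qa_chain({
    "question": "What is the meaning of life?",
    "corpus": [
        "There is no one answer to the meaning of life.",
        "Bananas are yellow.",
        "The meaning of life is subjective.",
    ],
})
print(result["prompt"])
```

Because every step takes and returns the same kind of state, steps can be swapped or extended (for example, replacing the reader with a chain of its own) without changing the others.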
Agents are another essential component of LangChain. Some user queries might be complicated and require additional mathematical processing.
Let's say we want to know for how many years Paris was the capital of France. The above picture shows the list of capitals for France. For this question, the agent selects a language model trained on mathematical equations as the reader so that it can predict an accurate answer. For a different query, it can use another language model.
An agent can use specific models or tools for the respective situation, task, or user query.
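A toy version of this routing idea is sketched below. Real agents typically let a language model decide which tool to call; here a simple keyword rule stands in for that decision. All names are hypothetical, and the year intervals are made-up placeholders, not historical data.

```python
def calculator_tool(intervals):
    """Sum the lengths, in years, of (start_year, end_year) intervals."""
    return sum(end - start for start, end in intervals)

def text_tool(question):
    """Placeholder for a general-purpose language model reader."""
    return f"[general reader would answer: {question}]"

def choose_tool(question):
    """Toy routing rule: send counting questions to the calculator."""
    return "calculator" if "how many" in question.lower() else "text"

def agent(question, intervals=None):
    """Pick a tool for the question and return (tool_name, result)."""
    tool = choose_tool(question)
    if tool == "calculator" and intervals is not None:
        return tool, calculator_tool(intervals)
    return "text", text_tool(question)

# Placeholder intervals for illustration only (NOT real dates).
placeholder_intervals = [(1000, 1100), (1200, 1250)]

math_result = agent(
    "For how many years was Paris the capital of France?",
    placeholder_intervals,
)
text_result = agent("What is the capital of France?")
print(math_result)
print(text_result)
```

The counting question goes to the calculator tool, while the plain factual question falls through to the general reader.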
Learn more about LangChain here.
Which question answering is better? (Extractive vs. Generative QA)
The best approach to question answering depends on the specific question and the type of text available. For simple questions that can be answered with a single fact, extractive QA is often sufficient. However, generative QA is often more effective for complex questions requiring reasoning or inference. Below are some of the differences between EQA and GQA.
Overall, EQA and GQA systems have different strengths and weaknesses. EQA systems are simpler, faster, and more accurate but unsuitable for all question-answering tasks. GQA systems are more complex, slower, and less accurate but can handle a broader range of question-answering tasks.
After ChatGPT and Bard, people realized the potential of Generative AI to revolutionize how we interact with computers. As GQA systems become more powerful and efficient, they will become essential tools for anyone needing to access and understand information. By generating answers that are not explicitly stated in the text, GQA systems can provide us with more comprehensive and informative solutions.
However, some challenges must be addressed before GQA can be widely adopted. One challenge is that GQA systems can be computationally expensive to train and run. Another challenge is that GQA systems can be susceptible to bias and errors inherited from the data they were trained on. Most importantly, compliance and governance of these models will remain the biggest challenge for companies and industries.
In the next series of articles, I will explain different generative models and their architecture in detail. Stay tuned for more articles in the Understanding Semantic Search Series! (Learn more about other pieces in the series here)