4.2 - 2 Demo Self-Consistency Prompting with LangChain and OpenAI
Self-Consistency Prompting
Objective:
To perform self-consistency prompting using LangChain and the OpenAI API, enabling the model to generate multiple reasoning paths and identify the most consistent and logical answer across them.
Note:
Before running any demo, ensure that the requirements.txt file is installed. This file contains all the required dependencies for all demos and guided practices under Building LLM Applications.
If the dependencies were already installed earlier (after creating the virtual environment), there is no need to install them again. You can directly proceed with running the demo.
Refer to Step 1 of Lesson_01 Demo_01_Zero_Shot_Prompting.ipynb for creating a virtual environment and installing requirements.txt.
Ensure you select the right kernel, Python (myenv), when running the demos.
Steps to perform:
Set up the OpenAI API key
Define a function to get completion
Evaluate the output
Step 1: Set up the OpenAI API key
Import the required libraries
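The os library gives access to the operating system, including environment variables, and openai is the client library used to communicate with the OpenAI API. When OpenAI() is called with no arguments, it reads the API key from the OPENAI_API_KEY environment variable.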
Initialize the OpenAI client
import os
from openai import OpenAI
client = OpenAI()
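If the key is missing, the client fails with an authentication error that can be hard to trace back to the environment. The following is a minimal sketch of an early check. The helper name require_api_key is hypothetical and not part of the openai library; it only assumes the key is stored in OPENAI_API_KEY.

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client.")
    return key

# Possible usage (assumes the key was exported in the shell beforehand):
# client = OpenAI(api_key=require_api_key())
```

Passing the environment as a parameter keeps the check easy to try with a plain dictionary instead of the real environment.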
Step 2: Define a function to get completion
Define a function that sends a user prompt to the model and retrieves the output
Call the client.chat.completions.create method to get a response from the model
def get_completion(prompt, model="gpt-5-mini"):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": prompt}
        ],
        max_completion_tokens=5000
    )
    return response.choices[0].message.content
Step 3: Evaluate the output
Give the model several lines of reasoning for the same question and ask it to determine the most consistent conclusion
# Example 1: Using multiple reasoning paths for a weight comparison
prompt = """
Let's consider which is heavier: 1000 feathers or a 30-pound weight.
I'll think through this in a few different ways and then decide which answer seems most consistent.
1. First line of reasoning: A single feather is very light, almost weightless.
So, 1000 feathers might still be quite light, possibly lighter than a 30-pound weight.
2. Second line of reasoning: 1000 is a large number, and when you add up the weight of so many feathers,
it could be quite heavy. Maybe it's heavier than a 30-pound weight.
3. Third line of reasoning: The average weight of a feather is very small. Even 1000 feathers would not add up to 30 pounds.
Considering these reasonings, the most consistent answer is:
"""
response = get_completion(prompt)
print(response)
In this single-prompt form, self-consistency prompting works by writing several different lines of reasoning into the context window so the model can compare them and select the most consistent final answer.
To implement self-consistency prompting within a single prompt, you provide multiple distinct paths of thought and then ask the model to look across them for a consensus.
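The structure just described (a question, several numbered reasoning paths, and a closing consensus cue) can be assembled programmatically. The sketch below is illustrative only: build_self_consistency_prompt is a hypothetical helper, and the wording of the paths is shortened from Example 1.

```python
def build_self_consistency_prompt(question, reasoning_paths):
    """Assemble one prompt: question, numbered paths, then a consensus cue."""
    lines = [question, "I'll think through this in a few different ways."]
    for i, path in enumerate(reasoning_paths, start=1):
        lines.append(f"{i}. {path}")
    lines.append("Considering these reasonings, the most consistent answer is:")
    return "\n".join(lines)

prompt = build_self_consistency_prompt(
    "Which is heavier: 1000 feathers or a 30-pound weight?",
    [
        "A single feather is very light, so 1000 may still weigh little.",
        "1000 is a large number, so the total could be heavy.",
        "Even 1000 feathers would not add up to 30 pounds.",
    ],
)
print(prompt)
```

The resulting string can be passed to get_completion in the same way as the hand-written prompts.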
Here are the trigger phrases in the Example 1 prompt that create that mechanism, each with an explanation:
1. The Multi-Path Setup Phrase
"I'll think through this in a few different ways"
Why it matters: This is the foundational instruction. Stating it explicitly discourages the autoregressive model from jumping straight to a single, linear calculation and signals that several independent lines of reasoning will follow.
2. The Internal Sampling Anchors
"1. First line of reasoning..."
"2. Second line of reasoning..."
"3. Third line of reasoning..."
Why it matters: These structural headers serve as delimiters that isolate three distinct logic paths inside the context window. They provide different data points (some weak, some factually sound) to mimic the way a developer would sample multiple individual outputs from an API using a high temperature setting.
3. The Democratic Voting Trigger
"and then decide which answer seems most consistent."
"Considering these reasonings, the most consistent answer is:"
Why it matters: These are the most critical words for self-consistency. In a typical production pipeline, a script takes the majority vote over multiple API runs. Because everything here happens inside a single prompt string, these words stand in for that voting step: they ask the model to look back at the three lines of reasoning, set aside the outlier (Line 2), and select the conclusion that Lines 1 and 3 agree on, namely that 1000 feathers weigh less than 30 pounds.
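For comparison, the pipeline variant mentioned above, where a script votes over several sampled runs, might look like the following sketch. It is an assumption-laden outline rather than the demo's method: self_consistency and sample_fn are hypothetical names, and the sampler is a canned stand-in so the block runs without an API call. A real sampler would wrap get_completion and reduce each completion to its final answer.

```python
from collections import Counter

def self_consistency(sample_fn, prompt, n=5):
    """Call sample_fn(prompt) n times; return (majority answer, vote counts)."""
    answers = [sample_fn(prompt).strip().lower() for _ in range(n)]
    votes = Counter(answers)
    winner, _ = votes.most_common(1)[0]
    return winner, votes

# Stand-in sampler (hypothetical data) so the sketch runs offline.
canned = iter(["30-pound weight", "feathers", "30-pound weight",
               "30-pound weight", "30-pound weight"])
winner, votes = self_consistency(lambda p: next(canned), "Which is heavier?", n=5)
print(winner, dict(votes))
```

Injecting the sampler as a function keeps the voting logic separate from the API call, so the vote can be checked on its own.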
The 30‑pound weight is heavier.
Reason: 30 lb = 13.6078 kg, so 1000 feathers would each have to weigh 13.6078 g (≈0.48 oz) to total 30 lb. Real feathers are almost always much lighter than ~13.6 g, so 1000 of them weigh far less than 30 lb.
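The unit conversion in this output can be checked with a few lines of arithmetic. This is only a verification sketch; the constant names are illustrative.

```python
LB_TO_KG = 0.45359237  # definition of the avoirdupois pound in kilograms
OZ_PER_LB = 16
FEATHERS = 1000

total_kg = 30 * LB_TO_KG                 # weight of 30 lb in kilograms
per_feather_g = total_kg * 1000 / FEATHERS   # grams each feather would need to weigh
per_feather_oz = 30 * OZ_PER_LB / FEATHERS   # the same threshold in ounces

print(f"{total_kg:.4f} kg total; {per_feather_g:.4f} g or {per_feather_oz:.2f} oz per feather")
```

The figures match the model's reasoning: each feather would need to weigh about 13.6 g for 1000 of them to reach 30 lb.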
# Example 2: Using multiple reasoning paths for a word problem
prompt = """
A farmer has 17 sheep, all but 9 run away. How many are left?
1. All but 9 ran away --> 9 are left
2. "All but 9" means 9 stayed --> 9 are left
3. Subtracting 17 - 9 --> 8 are left
Considering these reasonings, the most consistent answer is:
"""
response = get_completion(prompt)
print(response)
The same kinds of trigger phrases drive this example. Here are the specific words that produce self-consistency for the sheep problem:
"all but 9 run away" and "Subtracting 17 - 9": These phrases write the different logical paths (the correct literal reading versus the common subtraction trap) into the context window.
"1.", "2.", "3.": These numerical markers act as structural anchors that separate the conflicting lines of thought.
"most consistent answer is": This is the voting mechanism. It asks the model to look at the three paths, recognize that Path 1 and Path 2 agree (9 are left), and outvote the trick answer in Path 3 (8 are left).
9 are left.
Reason: "All but 9 ran away" means "all except 9 ran away" — so 9 stayed. The subtraction 17 − 9 = 8 gives the number that ran away, not the number remaining.
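The vote over the three sheep paths can also be made explicit in code. The sketch below pulls the number after each "-->" marker and counts agreement; final_answer is a hypothetical helper, and the regular expression assumes every path ends in the "--> N are left" form used in this prompt.

```python
import re
from collections import Counter

paths = [
    "All but 9 ran away --> 9 are left",
    '"All but 9" means 9 stayed --> 9 are left',
    "Subtracting 17 - 9 --> 8 are left",
]

def final_answer(path):
    """Return the integer stated after the '-->' marker in one reasoning path."""
    match = re.search(r"-->\s*(\d+)", path)
    return int(match.group(1)) if match else None

votes = Counter(final_answer(p) for p in paths)
winner, count = votes.most_common(1)[0]
print(f"{winner} are left ({count} of {len(paths)} paths agree)")
```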
# Example 3: Verifying reasoning paths for a math logic problem
prompt = """
I will solve the following math problem in several different ways and check
if I arrive at the same answer each time.
Problem: There were 15 apples and you took away 4, how many apples do you have?
1. First approach: 15 apples - 4 apples = 11 apples, which is incorrect
2. Second approach: If I take away 4 apples, then I have 4 apples with me, which is correct
3. Third approach: Taking away 4 apples means I have 4 apples, which is correct
4. Fourth approach: Subtracting 4 from 15 will give me 11 apples, which is incorrect.
Let's see which approach gives the most consistent result.
"""
response = get_completion(prompt)
print(response)
For this apple riddle, the key phrases driving the self-consistency mechanism are:
"solve... in several different ways": This tells the model that multiple competing logical paths follow, rather than a single calculation.
"1. / 2. / 3. / 4.": These numerical markers serve as the structural anchors that separate the conflicting paths from one another.
"which approach gives the most consistent result": This is the voting trigger. It asks the model to compare the results. Note that the count is tied here: Approaches 2 and 3 agree on 4 apples, while Approaches 1 and 4 agree on 11. What tips the outcome toward 4 is the "correct" and "incorrect" labels written into the prompt, not a majority.
Summary:
- Approaches 2 and 3 are correct for the usual reading of the sentence: "You took away 4 apples" means you now possess the 4 apples you took. So you have 4 apples.
- Approaches 1 and 4 (15 − 4 = 11) are answering a different question: how many apples remain in the original group after 4 are removed. They are consistent with each other and correct for the question "how many are left," but not for "how many do you have."
Conclusion: Approaches 2 and 3 give the answer that matches the wording "you have" (4 apples). Approaches 1 and 4 are consistent with the interpretation "how many remain" (11 apples).
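Counting the final answers in the apple prompt gives a 2 to 2 split, which is why the output above describes two consistent interpretations instead of a single winner. A vote that reports ties, sketched below with a hypothetical vote_with_ties helper, makes that case visible instead of silently picking one answer.

```python
from collections import Counter

def vote_with_ties(answers):
    """Return every answer that shares the highest vote count, sorted."""
    votes = Counter(answers)
    top = max(votes.values())
    return sorted(a for a, c in votes.items() if c == top)

# Final answers of Approaches 1 to 4 in the apple prompt.
apple_answers = [11, 4, 4, 11]
leaders = vote_with_ties(apple_answers)
print(leaders)
```

When more than one leader comes back, the question wording (here, "how many do you have" versus "how many are left") has to break the tie, not the count.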
Conclusion:
By following these steps, you have successfully implemented self-consistency prompting using LangChain and the OpenAI API. This technique lets the model explore multiple reasoning paths, compare outcomes, and identify the most consistent and logical answer, which can improve accuracy in reasoning-based and analytical tasks.