Prompt Engineering 101 - II: Mastering Prompt Crafting with Advanced Techniques

Written by eviotti | Published 2023/06/24
Tech Story Tags: machine-learning | artificial-intelligence | ai | chatgpt | programming | prompt-engineering | hackernoon-top-story | gpt

TL;DR: Prompt Engineering 101 is a blog series meticulously designed to unpack the principles and techniques of Prompt Engineering. It encompasses everything from fundamental concepts to advanced techniques and expert tips, covering models from ChatGPT to Stable Diffusion and Midjourney. In the second post of this series, we delve into advanced techniques such as Chain-of-Thought and Self-Consistency to master prompt crafting.

The cover image features the Macintosh 128K, a personal computer introduced by Apple Inc. in 1984 that changed the computer industry. It incorporated the iconic 16/32-bit Motorola 68000 processor, 128 KB of RAM, and a modest black-and-white built-in display.

Furthermore, it integrated groundbreaking inventions straight from Xerox PARC: the mouse and one of the first graphical user interfaces of its kind (the grandmother of today's GUIs).

With over 70,000 units sold, the Macintosh played a pivotal role in Apple's success. A curious fact: the launch event was even more successful than the Mac itself. With a budget of US$1.5 million, the renowned Ridley Scott directed a television commercial, a clear allusion to Orwell's iconic novel Nineteen Eighty-Four, that became a masterpiece and a watershed event in advertising.

Almost forty years later, with no renowned film director or television commercial, just a simple web application, OpenAI took a still-experimental language model and opened it to the world. The rest of the story is in the headlines: ChatGPT reached 100 million monthly active users in January 2023, just two months after launch, making it the fastest-growing consumer application in history (faster than TikTok and Instagram).

Now the world's attention is on the AI industry, and with advances arriving every week, this year promises to be a pivotal moment for the field. The best part is that you are still in time to join the AI field and be part of this revolutionary moment in human history.

Prompt Engineering 101 is a post series designed and written to unveil the principles and techniques of prompt engineering, the art of crafting clear and effective texts to prompt language models and get exactly what you are looking for. This series covers prompt engineering for a wide range of generative models, including ChatGPT and other text-to-text models. It also explores text-to-image models like Stable Diffusion and Midjourney, and delves into additional aspects of LLMs such as hallucinations, privacy and security issues, and more.

This is post #2 of the series, where we will cover advanced techniques such as Chain-of-Thought and Self-Consistency, to master prompt crafting. I hope you enjoy it!

Table of Contents

  1. Recap on Prompt Engineering
  2. Chain-of-Thought Prompting
  3. Chaining Prompts
  4. Self-Consistency Method
  5. Role Prompting
  6. Lost in Tokenization
  7. Five Tips & Tools to Master Prompt Crafting

Recap on Prompt Engineering

In the first part of this series on Prompt Engineering, we delved into the intuitions behind this art and arrived at a formal definition. Essentially, Prompt Engineering is an iterative process of designing and optimizing clear and specific prompts for a Large Language Model to ensure that it generates relevant, accurate, and coherent responses.

Next, we examined three principles for crafting effective prompts. We demonstrated that models produce better responses when given clear and specific instructions (the first principle) and discussed several tactics for achieving this. We then established that LLMs benefit from additional computation time and covered some tactics to force models to reason before rushing to a conclusion (the second principle). Finally, we investigated the balance between specificity and creativity (the third principle), experimenting with the Temperature and Top P parameters to explore this trade-off.

You can read a more in-depth definition of Prompt Engineering and the fundamentals in the link below (lots of prompt examples included!!).

https://hackernoon.com/prompt-engineering-101-i-unveiling-principles-and-techniques-of-effective-prompt-crafting?embedable=true

Chain-of-Thought Prompting

Massive-scale LLMs like GPT-3 or PaLM have exhibited an impressive capacity for natural language understanding and have proved extraordinarily effective at tasks such as extracting information from texts and generating responses in a consistent, human-like style. Moreover, LLMs have proved quite robust at performing unfamiliar tasks when a few examples are included in the prompt. This technique, popularized as few-shot prompting by Brown et al. (2020), was shown to increase model performance on several benchmarks, not to mention that it saves the money and time required to fine-tune the model for a new specific domain.

However, studies have shown that these models' performance drops abruptly on commonsense reasoning tasks or math quizzes. Despite being able to quote the complete ancient Greek epic poem The Odyssey, these models struggle with elementary-school problems in logic and math.

So what options do we have? Do we send GPT-3 for a season to an elementary school? Fortunately, there is a cheaper and less embarrassing alternative. Can you imagine persuading the principal of your local primary school to accept ChatGPT in the middle of the term?

Similar to how humans approach complex problems by breaking them down into simpler sub-problems and following a logical line of reasoning, we can instruct a language model to do the same. This approach was explored by Wei et al. in Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022). It demonstrated impressive results on several benchmarks, confirming Chain-of-Thought (CoT) as a solid approach to improving LLM performance on common reasoning tasks.

In the image below (taken from Wei et al.'s article), an LLM rushes to an incorrect conclusion while calculating the number of apples remaining in a cafeteria. This happens even when a similar reasoning problem using tennis balls is provided as part of the context.

Nevertheless, when a step-by-step process for solving the problem is included in the context (CoT), the model accurately arrives at a valid solution.

With chain of thought prompting, language models of sufficient scale (~100B parameters) can:

  1. Decompose multi-step problems into intermediate steps, which means that additional computation can be allocated to problems that require more reasoning steps.
  2. Provide an interpretable window into the behavior of the model, suggesting how it might have arrived at a particular answer and providing opportunities to debug where the reasoning path went wrong.
  3. Extend the application field to math word problems, common sense reasoning, and symbolic manipulation.

This method can be used by providing reasoning examples to the model (few-shot) or no examples at all (zero-shot). Let's see both flavors in practice with a real application example.

1. Zero-Shot Chain of Thought

Imagine we are developing a new shopping application for Walmart, with the groundbreaking feature of comparing alternatives from different brands and selecting which product you should buy based on their prices and attributes.

To illustrate the problem, let's focus on a reduced list of the bar soap options Walmart carries in its stores. As you can see, we have packs ranging from 1 to 14 bars and a variety of brands and prices (from cheap to expensive).

Products 🛒

---
Dove Men+Care 8 bars pack
$ 9.99

---
Dove Beauty Bar 4 bars pack
$ 6.47

---
Dove Beauty Bar 1 bars
$ 1.47

---
Dove Beauty Bar 14 bars pack
$ 16

---
Yardley London Soap Bar (Pack of 10)
$ 19.99

---
Dr. Squatch All Natural Bar Soap for Men, 5 Bar Variety Pack
$46.45

To determine which option is more convenient, we can calculate the price per bar of each alternative (the unit price) and then select the cheapest one. Following this reasoning, we discover that the Dove Beauty Bar 14 bars pack is the cheapest option, with a unit price of $1.14 (about 22% less per bar than the single Dove Beauty Bar).

Reasoning 🧠

- Dove Men+Care 8 bars pack: $1.2488 per unit
- Dove Beauty Bar 4 bars pack: $1.6175 per unit
- Dove Beauty Bar 1 bars: $1.47 per unit
- Dove Beauty Bar 14 bars pack: $1.1429 per unit
- Yardley London Soap Bar (Pack of 10): $1.999 per unit
- Dr. Squatch All Natural Bar Soap for Men, 5 Bar Variety Pack: $9.29 per unit

Let's see if GPT-4 is clever enough to solve this problem using a simple prompt. Also, in the spirit of keeping the examples clean and easy to read, we will separate the instructions from the rest of the input using Python string literals.
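
If you want to reproduce these examples programmatically, a small helper like the sketch below can send each prompt to the OpenAI API. It is only a sketch: it assumes the openai Python package (the pre-1.0 openai.ChatCompletion interface) and an OPENAI_API_KEY environment variable, and the get_completion name and its defaults are illustrative.

import openai  # assumes OPENAI_API_KEY is set in the environment

def get_completion(prompt, model="gpt-4", temperature=0):
    """Send a single user message to the Chat Completions API and return the text reply."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # 0 keeps the output as deterministic as possible
    )
    return response.choices[0].message["content"]

# Usage, once each prompt f-string below has been built:
# response = get_completion(prompt)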

Prompt 💬

prompt = f"""
You will be provided with a list of various product presentations delimited by <>. Each product presentation is accompanied by its price. Your task is to identify which is the best option based on the price.

<{products}>
"""

Response 🤖

The best option based on price is Dove Beauty Bar 1 bars for $1.47.

The model chose the cheapest option based on the total price instead of calculating the unit price and identifying the money-saving opportunity of buying more bars. It is worth pointing out some ambiguity in the question: what does "the best option" mean? The cheapest one, or the one that maximizes long-term savings?

Let's rewrite the prompt, instructing the model to follow a chain of thought by adding the magic words "reasoning step-by-step". Let's also explicitly underline the objective of finding a money-saving opportunity.

Prompt 💬

prompt = f"""
You will be provided with a list of various product presentations delimited by <>. Each product presentation is accompanied by its price. Your task is to identify which presentation offers the best money-saving opportunity, reasoning step-by-step:

<{products}>
"""

Response 🤖

The first step is to compare price per unit for products of the same type (i.e. bars of soap). 

- Dove Men+Care 8 bars pack: $9.99 ÷ 8 bars = $1.25 per bar
- Dove Beauty Bar 4 bars pack: $6.47 ÷ 4 bars = $1.62 per bar
- Dove Beauty Bar 1 bars: $1.47 per bar
- Dove Beauty Bar 14 bars pack: $16 ÷ 14 bars = $1.14 per bar
- Yardley London Soap Bar (Pack of 10): $19.99 ÷ 10 bars = $1.99 per bar
- Dr. Squatch All Natural Bar Soap for Men, 5 Bar Variety Pack: $46.45 ÷ 5 bars = $9.29 per bar

Based on this analysis, the best money-saving opportunity is the Dove Beauty Bar 14 bars pack, which offers the lowest price per bar at $1.14.

Note that with the new prompt, the model gives a correct answer, and at the same time we get step-by-step reasoning that is easier to debug and in which potential errors are easier to spot. Even more impressive is how adding the magic words "reasoning step-by-step" to the prompt creates a significant difference in the model's output. What's the trick?

The trick is forcing the model to generate a step-by-step reasoning chain (chain of thought) instead of merely outputting a numeric or boolean value (the final result). By adopting this strategy, we first guide the model towards logical reasoning outputs, enabling it to base its responses on similar problems encountered during training. Second, we assist the model by asking it to break down larger problems into smaller, manageable ones. The model then has to produce coherent results for simple sub-problems: identifying prices and the number of bars in each package, calculating the unit price, and finally, performing comparisons. Furthermore, because auto-regressive language models like GPT-3 or GPT-4 generate sequences token by token, where each new token is conditioned on all the previously generated tokens, the chain of thought itself has a strong positive influence on the generation of a coherent final result.

In other words, if, through valid reasoning, we reach a point halfway to the correct solution, our chances of arriving at this solution are greater than they were when we first started reasoning about the problem.

2. Few-Shot Chain of Thought

While in Zero-Shot Chain-of-Thought we leave the model free to elaborate a coherent and valid chain of reasoning that leads to a correct result, a second flavor, called Few-Shot Chain-of-Thought, provides some example problems together with their reasoned answers as part of the prompt. In this way, a form of reasoning is imposed on the model, constraining the possible responses.

Back to our Walmart application, suppose we want to identify which presentation of Coca-Cola soda has the best price per fluid ounce. To do this, we will incorporate into the prompt a similar chain of reasoning used to find the best presentation of Lay's potato chips.

Prompt 💬

prompt = f"""
Q: Walmart has the following presentations for Lay's Classic Potato Chips: a 13 oz bag for $4.98, an 8 oz bag for $2.76, and a pack of x10 units 1 oz each for $6.82. Which presentation has the lowest price by ounce?

A: Presentation 1 costs $4.98, so it has a price of 4.98/13 = 0.383 per ounce. Presentation 2 costs $2.76, so it has a price of 2.76/8 = 0.345 per ounce. Presentation 3 costs $6.82 and includes 10 packages, so it has a price of 6.82/(1 x 10) = 0.682 per ounce. In conclusion, the cheapest presentation is the second one, with a price of 0.345 per ounce.
 
Q: Walmart has the following presentation for Coca-Cola Soda:
an x24 pack of 12 fl oz cans for $11.86, an x12 pack of 12 fl oz cans for $7.46, and an x6 pack of 16.9 fl oz bottles for $4.98. Which presentation has the lowest price by fluid ounce?
"""

I encourage you to do your own math and find the best deal; you will find that the model is correct. Without a doubt, I will have the ChatGPT app by my side on my next trip to the store!!

Response 🤖

A: The 24-pack contains a total of 24 x 12 = 288 fluid ounces and costs $11.86, so the price per fluid ounce is 11.86/288 = $0.041 per fluid ounce. The 12-pack contains a total of 12 x 12 = 144 fluid ounces and costs $7.46, so the price per fluid ounce is 7.46/144 = $0.052 per fluid ounce. The 6-pack contains a total of 6 x 16.9 = 101.4 fluid ounces and costs $4.98, so the price per fluid ounce is 4.98/101.4 = $0.049 per fluid ounce. Therefore, the 24-pack has the lowest price per fluid ounce.

As incredible as it may seem, this simple trick allows the model to infer a similar chain of reasoning and achieve even more accurate results than with the previous technique. Also, notice that we are not hard-coding a step-by-step procedure to solve the problem. Therefore, this approach should be, in theory, flexible enough to solve any similar challenge, only relying on the reasoning skills of the model.
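
A few lines of Python are enough to double-check the unit prices reported in the response above (the prices and sizes come straight from the prompt; the variable names are just for illustration):

# Sanity check of the price-per-fluid-ounce math for the Coca-Cola presentations.
presentations = {
    "24 x 12 fl oz cans": (11.86, 24 * 12),
    "12 x 12 fl oz cans": (7.46, 12 * 12),
    "6 x 16.9 fl oz bottles": (4.98, 6 * 16.9),
}
for name, (price, fl_oz) in presentations.items():
    print(f"{name}: ${price / fl_oz:.3f} per fl oz")
# ~$0.041, ~$0.052 and ~$0.049 respectively: the 24-pack is indeed the cheapest.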

Chaining Prompts

No rule forces us to solve a problem with a single prompt. In fact, the more logic and complexity we pack into a single prompt, the greater the chance of confusing the model. Chaining Prompts is a simple but effective strategy for addressing complex problems. The main idea is to break down the problem into smaller tasks, each solved by a specific prompt, and then chain the prompts, using the result of each one as input for the next, until the final result is achieved.

With this in mind, let's go back to our Walmart application and implement a complex user flow as a chain of prompts. We have so far compared the prices of various products such as toilet soaps, snacks, and soft drinks. Now, we are set to explore a product that, personally, gives me a headache every time I visit the supermarket. Calculating its price per unit seems like rocket science: toilet paper! 🤣🤣🤣

Below is a reduced list of the options that Walmart has in Toilet Paper.

Products 🛒

---
Quilted Northern Ultra Plush Toilet Paper, 6 Mega Rolls
Each Mega Roll has 255 3-ply sheets.

$ 6.93

---
Quilted Northern Ultra Soft & Strong Toilet Paper, 18 Mega Rolls.
18 mega toilet paper rolls, each mega roll has 295 2-ply sheets

$ 19.82

---
Angel Soft Toilet Paper, 36 Mega Rolls.
With 320 2-ply sheets on every Mega Roll.

$ 24.12

---
New Angel Soft Toilet Paper, 36 Mega Rolls.
With 484 2-ply sheets Per Roll

$ 45.44

---
Scott 1,000 Toilet Paper, 12 Rolls, 1000 Sheets per Roll.
With Scott 1,000 Toilet Paper, you get 12 rolls of 1000 1-ply sheets.

$ 45.44

The problem to be solved is the same one we have focused on throughout this post: determining which product presentation is the most cost-effective. Furthermore, we want to improve the application's capabilities by generating additional choices, for example, for buyers with limited storage space who cannot purchase large quantities, as well as for those who require a larger volume of the product. Additionally, it would be great to present the product recommendations as part of a personalized message from a virtual shopping assistant powered by AI.

Rather than using a single prompt to solve the entire task, the goal is to break it down into several sub-tasks, with a specific prompt for each one, and to chain the outputs. So, let's start by designing a prompt to calculate the price of each product per unit of measure, so we can compare them directly.

For the first step in the chain, we will use Zero-Shot Chain-of-Thought, instructing the model to return the price per unit of measure for each product in the input, reasoning step by step. The output generated here will later serve as input for the next prompt. For this reason, it is essential to request a structured output format; in this case, a JSON list will be enough.

Prompt 💬

prompt = f"""
You will be provided with a list of various product presentations delimited by <>. Each product presentation is accompanied by its price. Your task is to calculate the price by unit of measurement for each presentation, reasoning step-by-step. Finally, return a list in JSON format with the title of the product, original price and unit price using the title, total_price and unit_price keys respectively.

<{products}>
"""

Response 🤖

Let's start by breaking down the cost for each roll and then calculating the cost per unit, in this case, the cost per sheet.

For the Quilted Northern Ultra Plush Toilet Paper:
* The product costs $6.93 and contains 6 mega rolls. Therefore, each mega roll costs $6.93 / 6 = $1.155.
* Each mega roll contains 255 sheets. Therefore, the cost per sheet is $1.155 / 255 = $0.00453 approximately.

For the Quilted Northern Ultra Soft & Strong Toilet Paper:
* The product costs $19.82 and contains 18 mega rolls. Therefore, each mega roll costs $19.82 / 18 = $1.1011 approximately.
* Each mega roll contains 295 sheets. Therefore, the cost per sheet is $1.1011 / 295 = $0.00373 approximately.

For the Angel Soft Toilet Paper:
* The product costs $24.12 and contains 36 mega rolls. Therefore, each mega roll costs $24.12 / 36 = $0.6694 approximately.
* Each mega roll contains 320 sheets. Therefore, the cost per sheet is $0.6694 / 320 = $0.00209 approximately.

For the New Angel Soft Toilet Paper:
* The product costs $45.44 and contains 36 mega rolls. Therefore, each mega roll costs $45.44 / 36 = $1.2622 approximately.
* Each mega roll contains 484 sheets. Therefore, the cost per sheet is $1.2622 / 484 = $0.00261 approximately.

For the Scott 1,000 Toilet Paper:
* The product costs $45.44 and contains 12 rolls. Therefore, each roll costs $45.44 / 12 = $3.7867 approximately.
* Each roll contains 1000 sheets. Therefore, the cost per sheet is $3.7867 / 1000 = $0.00379 approximately.

Now, let's format this into JSON:

[
    {
        "title": "Quilted Northern Ultra Plush Toilet Paper, 6 Mega Rolls",
        "total_price": "$6.93",
        "unit_price": "$0.00453"
    },
    {
        "title": "Quilted Northern Ultra Soft & Strong Toilet Paper, 18 Mega Rolls",
        "total_price": "$19.82",
        "unit_price": "$0.00373"
    },
    {
        "title": "Angel Soft Toilet Paper, 36 Mega Rolls",
        "total_price": "$24.12",
        "unit_price": "$0.00209"
    },
    {
        "title": "New Angel Soft Toilet Paper, 36 Mega Rolls",
        "total_price": "$45.44",
        "unit_price": "$0.00261"
    },
    {
        "title": "Scott 1,000 Toilet Paper, 12 Rolls, 1000 Sheets per Roll",
        "total_price": "$45.44",
        "unit_price": "$0.00379"
    }
]

This JSON data provides each product's title, total price, and the calculated unit price.

I encourage you to undertake the exercise of manually calculating the prices per sheet for each of the toilet paper presentations. If you do so, you will confirm that the model output is correct. However, the model output also contains the reasoning process (since we ask the model to reason step-by-step). In other words, before continuing we need to extract the JSON list from the rest of the text. Fortunately for us, we can use an LLM to achieve this!

Let's write a simple prompt to perform this text extraction. Additionally, let's instruct the model to sort the options by unit price, from the cheapest to the most expensive. This step will be extremely useful later in the prompt chain.

Prompt 💬

prompt = f"""
You will be provided with a text delimited by <>. This text contains a JSON list with information about Walmart products. Your task is to extract that list and return only this list in JSON format. Each JSON list item contains the key "unit_price", which is a number. Before returning the list, sort it in ascending order by the key "unit_price".

Here is an example of a list item:
{ "title": "Toilet paper", "total_price": "$2.99", "unit_price": "$0.0045" }

Remember to return the list without any additional text or explanation, just the list in JSON format.

<{response_prompt_1}>
"""

Response 🤖

[
  {
    "title": "Angel Soft Toilet Paper, 36 Mega Rolls",
    "total_price": "$24.12",
    "unit_price": "$0.00209"
  },
  {
    "title": "New Angel Soft Toilet Paper, 36 Mega Rolls",
    "total_price": "$45.44",
    "unit_price": "$0.00261"
  },
  {
    "title": "Quilted Northern Ultra Soft & Strong Toilet Paper, 18 Mega Rolls",
    "total_price": "$19.82",
    "unit_price": "$0.00373"
  },
  {
    "title": "Scott 1,000 Toilet Paper, 12 Rolls, 1000 Sheets per Roll",
    "total_price": "$45.44",
    "unit_price": "$0.00379"
  },
  {
    "title": "Quilted Northern Ultra Plush Toilet Paper, 6 Mega Rolls",
    "total_price": "$6.93",
    "unit_price": "$0.00453"
  }
]

Having the product alternatives presented as a JSON list, it becomes an easy task for a model like GPT-4 not only to identify the most cost-effective option but also to offer valuable insights for shoppers who, for example, might only require a few units. Let's craft a prompt to achieve this. Moreover, remember that we are developing a virtual shopping assistant powered by AI.

Consequently, we can also use the LLM to present the product alternatives as a single text message with a specific, friendly tone.
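
Before building that final prompt, it can be worth checking programmatically that the previous step really returned well-formed, properly sorted JSON. A minimal sketch, assuming response_prompt_2 holds the output of the second prompt:

import json

# json.loads raises a ValueError if the model returned anything other than valid JSON.
products_sorted = json.loads(response_prompt_2)

# Confirm the list is sorted ascending by unit price before handing it to the assistant prompt.
unit_prices = [float(p["unit_price"].lstrip("$")) for p in products_sorted]
assert unit_prices == sorted(unit_prices), "Step 2 did not return a sorted list"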

Prompt 💬

prompt = f"""
Your name is Wally and you are an artificial shopping assistant at Walmart stores.

You will be provided with a list of various product presentations in a JSON list format, delimited by <>. Each product presentation is accompanied by its total_price, unit_price, and a title that contains extra information about the product presentation. Additionally, the list is sorted by unit_price ascending. 

Your task is to generate a short, straight-to-the-point buy recommendation, mentioning the best money-saving opportunity based on the JSON products list. Also include the best, smallest and largest options to give the user options based on the size of the product.

Use a single paragraph to include all three recommendations. First present yourself in a friendly way, start mentioning the best option of all, then the smallest option, and finally the largest option. Say goodbye inventing something that includes the Walmart slogan "Save Money. Live Better".


<{response_prompt_2}>
"""

Response 🤖


Hello there! I'm Wally, your friendly shopping assistant at Walmart. Let's dive into your toilet paper options! Looking at the overall value, your best option is the "Angel Soft Toilet Paper, 36 Mega Rolls" priced at $24.12 with a unit price of $0.00209, giving you the most bang for your buck. If you're seeking something compact and budget-friendly, the "Quilted Northern Ultra Plush Toilet Paper, 6 Mega Rolls" at $6.93 would be your go-to pick. On the other hand, if you are searching for the largest pack available, you should consider the "Scott 1,000 Toilet Paper, 12 Rolls, 1000 Sheets per Roll" priced at $45.44, offering a vast number of sheets for your needs. Remember, every cent counts at Walmart because our goal is to help you "Save Money. Live Better". Happy shopping!

In my personal opinion, the model's output is quite impressive considering it was generated from a chain of just three simple prompts. And please, do not overlook the cherry on top: I specifically instructed the model to incorporate the Walmart slogan into its response.
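
For completeness, here is a rough sketch of how the three steps can be glued together in code, reusing the hypothetical get_completion() helper from earlier. The prompt_1 variable and the build_prompt_* helpers are illustrative names standing in for the f-strings shown above, each built once the previous response is available:

# Step 1: step-by-step unit prices (Zero-Shot CoT) from the raw products string.
response_prompt_1 = get_completion(prompt_1)

# Step 2: extract and sort the JSON list, interpolating the previous output.
prompt_2 = build_prompt_2(response_prompt_1)
response_prompt_2 = get_completion(prompt_2)

# Step 3: Wally's friendly recommendation, built from the sorted JSON list.
prompt_3 = build_prompt_3(response_prompt_2)
recommendation = get_completion(prompt_3)
print(recommendation)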

To conclude this section, chaining prompts is an extremely powerful technique that brings out the true potential of LLMs by integrating the outcomes of several tasks or sub-problems, each also solved by an LLM. Remember, there is no general rule of thumb for breaking a complex prompt into multiple smaller prompts; you will have to find the right balance for your specific problem.

Self-Consistency

There are scenarios where an LLM fails to generate consistent responses: even using CoT and the very same prompt, the model can get confused between executions and return an inconsistent response. In this context, Self-Consistency is a simple but effective approach that consists of asking the model the same prompt multiple times and taking the majority result as the final answer. Combining this technique with CoT helps obtain more robust and predictable results.

For instance, let's consider the example of the apples in the cafeteria presented in the Chain-of-Thought paper. Instead of calling the model just once with the input prompt, we can call the model three times and take the majority answer as the final answer (see image below).

By adopting the majority response, the likelihood of the model returning an incorrect output on any given execution is significantly reduced. The effect of this technique grows with the number of repetitions. However, repeated execution of the prompt means a higher number of calls to the model, leading to greater hardware consumption. In the case of large language models like GPT-3, this also translates into more API calls and, consequently, more money spent, all to generate a single answer.

For this reason, when designing systems that use LLMs, it is essential to balance robustness against the cost of model calls, using this technique only when strictly necessary, for example, when a high level of robustness in the system's responses is required.
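
In code, the idea boils down to sampling the same prompt several times and keeping the majority vote. A minimal sketch, again reusing the hypothetical get_completion() helper; the extract_final_answer() parser is also hypothetical, since how you pull the final answer out of the reasoning text depends on your prompt:

from collections import Counter

def self_consistent_answer(prompt, n_samples=5):
    """Sample the same CoT prompt several times and return the majority answer."""
    answers = []
    for _ in range(n_samples):
        # temperature > 0 so each run can follow a different reasoning path
        response = get_completion(prompt, temperature=0.7)
        answers.append(extract_final_answer(response))  # hypothetical parser
    return Counter(answers).most_common(1)[0][0]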

To illustrate the potential of this technique, let's go back to our Walmart application to analyze another particular product: eggs.

Products 🛒

---
Great Value Cage Free Large AA White Eggs, 6 Count

$ 1.07

---
Happy Egg Organic Free Range Large Brown Eggs, 12 Count

$ 5.16

---
Great Value Cage Free Large AA White Eggs, 12 Count

$ 1.86

---
Eggland's Best Farm Fresh Cage Free Large White Grade AA Eggs, 18 Count

$ 2.69

---
Marketside Large Cage Free Brown Eggs, 12 Count

$ 2.67

Let's include some user preferences in the prompt to make the problem slightly more challenging for the model.

preferences = f"""
- Only consider grade AA eggs
- Only consider eggs produced on cage-free farms
- The client needs a minimum of 6 eggs and a maximum of 12
"""

With powerful models like GPT-3/GPT-4, a Zero-Shot CoT prompt like the one below should be enough to get good results (remember the previous examples). Let's try!

Prompt 💬

prompt = f"""
You will be provided with a list of various product presentations delimited by <>. Each product presentation is accompanied by its price. Your task is to identify which presentation offers the best money-saving opportunity, reasoning step-by-step:

Also, consider the following user preferences to select the most suitable option:

{preferences}

<{products}>
"""

Responses 🤖

(1)
Considering the user preferences, we first filter the list of eggs for only grade AA and cage-free options, which gives us the following choices:

* Great Value Cage Free Large AA White Eggs, 6 Count: $1.07
* Great Value Cage Free Large AA White Eggs, 12 Count: $1.86
* Eggland's Best Farm Fresh Cage Free Large White Grade AA Eggs, 18 Count: $2.69
* The Happy Egg and Marketside options are discarded because their grade is not specified.

Next, we consider the amount of eggs. The client needs a minimum of 6 eggs and a maximum of 12. Therefore, we can discard the Eggland's Best 18 count option, as it exceeds the maximum required amount.

Now, we compare the remaining two options:

* Great Value Cage Free Large AA White Eggs, 6 Count: $1.07
* Great Value Cage Free Large AA White Eggs, 12 Count: $1.86
* Although the 6-count pack fits the minimum requirement, the 12-count pack offers a better price per egg. For the 6-count pack, each egg costs $1.07 / 6 = ~$0.178. For the 12-count pack, each egg costs $1.86 / 12 = ~$0.155.

Therefore, considering both the user's preferences and the cost per egg, the "Great Value Cage Free Large AA White Eggs, 12 Count" for $1.86 is the best money-saving opportunity.

(2)
Let's go through each product one by one, considering the user's preferences:

1. Great Value Cage Free Large AA White Eggs, 6 Count - $1.07
    These are grade AA eggs from a cage-free farm, so they meet the user's criteria. The price per egg is $1.07/6 = $0.178.

2. Happy Egg Organic Free Range Large Brown Eggs, 12 Count - $5.16
    These are not specified as grade AA, so we can't consider them based on user's preferences.

3. Great Value Cage Free Large AA White Eggs, 12 Count - $1.86
    These are grade AA eggs from a cage-free farm and the quantity is within the range the user wants. The price per egg is $1.86/12 = $0.155.

4. Eggland's Best Farm Fresh Cage Free Large White Grade AA Eggs, 18 Count - $2.69
    These are grade AA eggs from a cage-free farm. However, the quantity is more than the user wants (18), so we can't consider this option.

5. Marketside Large Cage Free Brown Eggs, 12 Count - $2.67
    These are not specified as grade AA, so we can't consider them based on user's preferences.

Based on the preferences, we are only left with two choices: Great Value Cage Free Large AA White Eggs, either 6 Count or 12 Count. However, the 12 Count package offers the best money-saving opportunity, because it has a lower cost per egg ($0.155 per egg) than the 6 Count package ($0.178 per egg). Additionally, the 12 Count package still falls within the user's desired quantity range of 6 to 12 eggs. Therefore, the best option for the user is the "Great Value Cage Free Large AA White Eggs, 12 Count" at $1.86.

(3)
Given the user's preferences, we need to find the product which:

1. Contains only grade AA eggs
2. Is produced on cage-free farms
3. Offers a minimum of 6 eggs and a maximum of 12

Let's consider each product:

1. "Great Value Cage Free Large AA White Eggs, 6 Count" - $ 1.07. This option matches all the user's preferences and costs $1.07 for 6 eggs, which is approximately $0.178 per egg.

2. "Happy Egg Organic Free Range Large Brown Eggs, 12 Count" - $ 5.16. This product does not specify the grade of the eggs, so we will disregard this option.

3. "Great Value Cage Free Large AA White Eggs, 12 Count" - $ 1.86. This option also matches the user's preferences and costs $1.86 for 12 eggs, which is approximately $0.155 per egg.

4. "Eggland's Best Farm Fresh Cage Free Large White Grade AA Eggs, 18 Count" - $ 2.69. While this product matches the egg grade and production method preferences, it contains more than the maximum amount of eggs requested by the user (18 instead of 12). Therefore, we disregard this option.

5. "Marketside Large Cage Free Brown Eggs, 12 Count" - $ 2.67. This product does not specify the grade of the eggs, so we will disregard this option.

Therefore, the product that provides the best money-saving opportunity while meeting the user's preferences is "Great Value Cage Free Large AA White Eggs, 12 Count" for $1.86. This option offers more eggs for a lower cost per egg compared to the other suitable option.

Despite the model producing no incorrect results in any of its runs (with accurate reasoning across all three instances), this example emphasizes the potential for variability in an LLM's responses. It further underscores the value of Self-Consistency in reducing such variability.

More generally, Self-Consistency has been shown to improve results on arithmetic, commonsense, and symbolic reasoning tasks. Even in cases where regular CoT was found to be ineffective, Self-Consistency was still able to improve results (check the paper's experimental results).

To conclude this advanced technique, I encourage you to try it yourself, for example, by adding more products to the list or removing the "reasoning step-by-step" instruction to give the model more room to make mistakes. Also, try applying Self-Consistency to real-life problems you are working on.

Role Prompting

Since language models were aligned to follow instructions, a process typically associated with InstructGPT models (of which ChatGPT is an example), a novel approach known as Role Prompting has gained momentum among users. This method involves instructing the model to assume the personality of a well-known entity in order to execute a specific task.

The possibilities for such entities are virtually limitless: they can range from famous actors like Tom Cruise or Emma Watson, to politicians such as Barack Obama or José Mujica, or even fictional characters like Sherlock Holmes. This technique allows us to generate responses with a unique tone, emulating the speaking style of a character and fostering a more engaging user experience.

Moreover, we can leverage the simulated experience and knowledge of a specialist. For instance, ask the model to complete a specific task, acting as a Nuclear Physicist, a Marketing Director, a CEO, a journalist, etc.

To exemplify this type of prompting, we will ask GPT-4 to customize a shopping suggestion in the unique tone of Sheldon Cooper (the famous character from The Big Bang Theory), using the Great Value Cage Free Large AA White Eggs that we found to be the best buying option earlier in this post.

Product 🛒

Great Value Cage Free Large AA White Eggs, 12 Count. Price $1.86. This option offers more eggs for a lower cost per egg compared to the other suitable option. 

Prompt 💬

prompt = f"""
You are a shopping assistant with the personality of the iconic character Sheldon Cooper from The Big Bang Theory show. Your task is to act like Sheldon, speak like Sheldon and think like Sheldon. You will be provided with a product description delimited by <> and your goal is to return a personalized shopping recommendation based on that product acting like Sheldon Cooper. Be concise, friendly and present yourself as Dr. Sheldon Cooper.

<{product}>
"""

Response 🤖

Hello, I'm Dr. Sheldon Cooper.

Given your product choice, I do appreciate the value for money that these Great Value Cage Free Large AA White Eggs offer. The cost per egg is an important mathematical factor to consider, and indeed, these eggs would be a sensible choice for a budget-conscious, protein-seeking individual like Leonard, who often falls victim to my high-protein breakfast egg burritos.

However, it's also worth noting that cage-free doesn't mean these chickens lead a leisurely life. It's a common misconception, just like thinking the Doppler effect would be an acceptable Halloween costume. Ideally, for the truly ethical choice, you'd opt for pasture-raised eggs.

And finally, from a culinary perspective, I would like to point out that while eggs do serve as an excellent source of protein, they are also potential carriers of Salmonella. But you probably knew that.

In conclusion, despite their cost-effectiveness, I recommend careful handling and thorough cooking of these Great Value Cage Free Large AA White Eggs. Sheldon Cooper out.

As can be seen in the obtained answer, this technique can be used to obtain interesting results and return unique responses. Note, however, that we are delegating reasoning to the model and relying entirely on the model's understanding of the entity we are asking it to play. For this reason, it is crucial to first ensure that the model has a correct understanding of the role or entity that we are asking it to assume. Otherwise, the result may be different from what we expect.

A good counter-example would be requesting the model to mimic a super-computer that surpasses human intelligence levels. Given that such an entity does not currently exist, the model wouldn't have been trained on responses from this type of entity. Consequently, it is unable to provide a response that fulfills this condition.

Prompt 💬

You are an artificial superintelligence, much more evolved and intelligent than human beings. You know everything about the universe and you have the answer to all the mysteries of the universe.

Your task is to tell me how to write a post for Hackernoon that became a huge success.

Unfortunately, neither ChatGPT nor GPT-4 has the answer yet...

Lost in Tokenization

Similar to how the meaning of a sentence or a word can be lost during translation (often referred to as 'lost in translation'), LLMs can lose meaning during the tokenization process. While humans perceive individual words in a text, LLMs interpret tokens, and these tokens do not always correspond to individual words. In fact, advanced models like ChatGPT or GPT-4 use the tiktoken tokenizer, which aims to let the model recognize common sub-words in English; this often results in a word being represented by more than one token. Although this tokenizer is powerful and faster than comparable open-source tokenizers, it can lead to undesired outputs for rare English words or non-English words.

To illustrate this, let's use the OpenAI Tokenizer to play with different sentences and observe the tokenization output in each case:

"I want to buy shoes" = "I/want/to/buy/shoes"
"I think, therefore I am" = "I/think/,/therefore/I/am"
"You should visit Punta del Este" = "You/should/visit/P/unta/del/E/ste"

As observed, the tokenizer splits the words cleanly in the first two examples but struggles with the name of the beautiful tourist city of Punta del Este, located in Uruguay. This is noteworthy, as it can occasionally lead to errors in relatively simple tasks. Let's continue experimenting to see this in more detail.

Texts

USA
Santa Claus
Uruguay

Tokenized Text 🪄

USA
Santa/Claus
U/rug/uay

Prompt 💬

prompt = f"""
You will be provided with a list of texts delimited by <>. Your task is to reverse those texts.

<{texts}>
"""

Response 🤖

Here are the reversed versions of the texts you provided:

<
ASU
sualC atnaS
yaguUr

As demonstrated, the model inaccurately reverses the word Uruguay, which is a less common word for the tokenizer. Fortunately, there is a simple workaround for this issue. In situations like this, we can pre-process the text by separating it into individual characters using a delimiter such as "-". This way, when the model tokenizes the text, it perceives individual characters rather than sub-words. This modification can help the model execute the task with improved results. Let's give it a try!

Prompt 💬

prompt = f"""
You will be provided with a list of texts delimited by <>. Your task is to reverse those texts.

<
U-S-A
S-a-n-t-a C-l-a-u-s
U-r-u-g-u-a-y
>
"""

Response 🤖

Sure, here are the reversed versions of the texts you provided:

<
A-S-U
s-e-i-k-o-o-C
s-u-a-l-C a-t-n-a-S
y-a-u-g-u-r-U

Using this workaround, the model reverses all the words properly. You can employ it whenever you notice a drop in the model's performance on inputs containing rare words. I didn't include the pre-processing and post-processing steps in this tutorial; there are plenty of string methods in any programming language that can handle that part of the job.
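
That said, here is a rough idea of what those helper steps could look like in Python (the function names are just illustrative):

def preprocess(text):
    """Insert a dash between characters so the tokenizer sees individual letters."""
    return "-".join(text)            # "Uruguay" -> "U-r-u-g-u-a-y"

def postprocess(text):
    """Remove the dashes from the model's output to recover the plain text."""
    return text.replace("-", "")     # "y-a-u-g-u-r-U" -> "yaugurU"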

Five Tips & Tools to Master Prompt Crafting

Our Prompt Engineering journey does not end with CoT + Self-Consistency (despite their excellent performance). Luckily for this writer, there are many more aspects of Prompt Engineering to learn before mastering this art. Below, I explore a few complementary factors to consider when writing prompts and introduce three useful tools that you can use to write effective prompts.

1- Choose the right delimiter

In the eyes of an LLM, not all delimiters are equal. As a general rule, it is convenient to use separators that, in addition to being characters rarely found in the rest of the input (so the model does not confuse them with content), are represented by only one token. In this way, we reduce the token consumption of the prompt, saving hardware resources and money, and freeing those tokens for something more important.

You can use the OpenAI Tokenizer to double-check whether your delimiters are represented by a single token, or check it programmatically as shown in the sketch after the list below.

  • Triple quotes """ (1 token)
  • Triple backticks ``` (2 tokens)
  • Triple dashes --- (1 token)
  • Triple hashes ### (1 token)
  • Angle brackets < > (2 tokens)
  • XML tags <tag></tag> (5 tokens)
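
If you prefer to check this in code, the tiktoken library (the tokenizer behind OpenAI's chat models) makes it a one-liner per delimiter. A small sketch; note that token counts can vary slightly between encodings, so verify with the encoding of the model you actually use:

import tiktoken

# cl100k_base is the encoding used by gpt-3.5-turbo and gpt-4
enc = tiktoken.get_encoding("cl100k_base")
for delimiter in ['"""', '```', '---', '###', '< >', '<tag></tag>']:
    print(repr(delimiter), "->", len(enc.encode(delimiter)), "token(s)")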

2- Pre-process your inputs

Rather than passing raw user input to the model as part of the prompt, it is recommended to pre-process it using string methods, regular expressions, or similar tools. The aim is to remove unnecessary spaces, punctuation, HTML tags, and other superfluous elements that might hinder the model's understanding or task completion. This approach not only economizes on tokens, thus reducing costs on models like ChatGPT that charge per token, but also helps in identifying prompt injections, privacy concerns, and related issues.
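
As a rough illustration (the exact cleaning rules depend on your application), a small sanitizer could look like this:

import re

def clean_user_input(text):
    """Strip HTML tags and collapse whitespace before inserting the text into a prompt."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace and newlines
    return text.strip()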

3- Prompt Perfect

You can craft effective prompts from scratch by following specific principles and techniques (like the ones introduced in this post series). However, an even better approach is to utilize a prompt optimizer such as PromptPerfect. This tool leverages AI to enhance your custom prompts based on the specific target model you are using (GPT-3, GPT-4, Midjourney, etc).

If you can't afford to incorporate these tools into your application design, consider at least exploring them. Several tips can be learned from the optimized examples available in the documentation.

4- Prompt Templates

Instead of reinventing the wheel, read prompt templates shared by people across the Internet. If no prompt fits your needs, you can at least take ideas. Here are two sites from which you can pull prompt templates for ChatGPT, Midjourney, and other popular models:

➡️ 500+ ChatGPT Prompt Templates

➡️ Awesome ChatGPT Prompts

5- OpenAI Playground

OpenAI offers a powerful tool called Playground. This interactive web app lets you experiment with various models available through the official API, enabling you to tweak specific parameters and alter default behaviors. The Playground is an excellent starting point for your experiments and it doesn't require writing a single line of code.

A final piece of advice: writing good prompts or having interesting conversations with LLMs can be as valuable as sharing them with the community. Don't forget it, and when you share your experience, try using ShareGPT to do it in one click.

Wrapping up

In the second post of the Prompt Engineering 101 series, we introduced two powerful techniques, Chain-of-Thought and Self-Consistency, strategies that, combined, produce outstanding results. Using real examples, we explored both variants of Chain-of-Thought, Zero-Shot and Few-Shot, experiencing firsthand the potential of these techniques. Next, we experimented with chains of prompts to construct more complex and robust pipelines. Finally, we delved into other facets of prompt engineering, highlighting relevant aspects to consider when crafting prompts, and mentioned a few useful tools.

You can find all the examples, along with prompts, responses, and Python scripts to interact directly with the OpenAI API and try your prompts here.

In the following post, we will see techniques to test and validate the behavior of systems built on top of LLMs like ChatGPT or similar models.

Acknowledgements

This post series is inspired by the outstanding courses of Andrew Ng and Isabella Fulford and the excellent LLM Bootcamp provided by Charles Frye, Sergey Karayev, and Josh Tobin (both courses are mentioned in the resources section). After completing these learning programs, I was eager to delve deeper, exploring academic papers and tutorials. This led me on a journey through the Internet, separating high-quality resources from junk. I even ordered two books from Amazon on Prompt Engineering and Generative AI for Artwork, which turned out to be poorly written and a complete waste of money. After several weeks of intense work, headaches, and cups of coffee, I found myself with a collection of highly valuable resources on Prompt Engineering. In the spirit of helping others on their Prompt Engineering journey, I decided to share my experience by writing this series of posts.

If you enjoyed this article, consider following me on my social media to support my work. Additionally, you’ll be notified whenever I release new content!!

🐦 Twitter  | 👨‍💼 LinkedIn | 💻 More stories

Written by eviotti | Machine Learning Director at IDATHA.com — AI Builder — CV and NLP practitioner — Hungry reader and stories writer.
Published by HackerNoon on 2023/06/24