Prompt Engineering 101 - I: Unveiling Principles & Techniques of Effective Prompt Crafting

Written by eviotti | Published 2023/05/29
Tech Story Tags: machine-learning | artificial-intelligence | ai | chatgpt | programming | prompt-engineering | tech | technology

TLDRPrompt Engineering 101 is a blog series meticulously designed to unpack the principles and techniques of Prompt Engineering. It encompasses everything from fundamental concepts to advanced techniques, and expert tips, covering various models from ChatGPT to Stable Diffusion and Midjourney. In the first post of this series, we delve into the basic principles and strategies for crafting clear and specific prompts suitable for text-to-text models such as ChatGPT and other auto-regressive language models.via the TL;DR App

The cover image features the Commodore PET (Personal Electronic Transactor), a revolutionary machine launched in 1977 and a presage to a new era of personal computing. Despite its modest 1MHz microprocessor, 8KB of RAM, and built-in monochrome monitor, it marked a pivotal shift, taking computers from a do-it-yourself computer assembly hobbyist culture to any middle-class family (considering that it cost the equivalent of USD 3500 now).

GPT-3 and later models like ChatGPT or StableDifussion, have a similarly transformative effect on AI. Now you don't need to be an engineer well-versed in linear algebra to have access to powerful Machine Learning models; just the know how to write clear instructions.

Even so, from the eyes of many people, ChatGPT and Generative AI is still a magic box that achieves incredible things in ways that no one understands, leading many people to stay away from this technology. Surprisingly, the same thing happened with early personal computers like the Commodore. Nevertheless, there is nothing magical and nothing to fear about Language Models; these are just statistical models that predict the next most probable token based on a context input. Statistical models are a very powerful tool for many common tasks.

Prompt Engineering 101 is a post series designed and written to unveil the principles and techniques of prompt engineering, the art of crafting clear and effective texts, to prompt language models and get exactly what you are looking for. Furthermore, this series is not restricted to autoregressive language models like ChatGPT, but rather aims to address Prompt Engineering as a general technique to any Generative models, including text-to-image models such as Stable Diffusion or Midjourney. In addition, it addresses limitations of Language Models like hallucinations, prevention techniques, privacy and security concerns, and more…

On the other hand, there are a variety of tutorials on Prompt Engineering: blogs, boot camps in LLMs, reading lists, etc. This series does not pretend to be the best source or an ultimate reference. Far from it! It merely intends to offer a complementary, practical, and easy-to-read look to this subject.

In post #1 of this series, we cover basic principles and tactics for writing clear and specific prompts for text-to-text models (like ChatGPT). I hope you enjoy it!

Table of Contents

  1. What is Prompt Engineering
  2. Principles of Prompt Engineering
  3. Five Tips & Tricks to Make your Prompt Effective
  4. Wrapping Up
  5. References

What is Prompt Engineering?

From intuition, a prompt is nothing more than a text that goes inside a Language Model (LM), and “Prompt Engineering” is the art of designing that text to get the desired output. Some people joke that Prompts are like magic spells: a collection of words that follows bizarre and complex rules to achieve the impossible.

Despite the lovely metaphor, there is nothing magical about Prompt Engineering. Language Models are just statistical models for text, and prompt engineering is just a technique to design the input text that contains the precise tokens (for example, instructions and context) that orient the model in the right direction to produce a good result (for example more tokens or an image). In addition, this technique can be defined through simple rules, learned, and ultimately mastered.

Formally, Prompt Engineering is a comprehensive process of designing and optimizing specific and unambiguous prompts for a Large Language Model to ensure that it generates relevant, accurate, and coherent responses. It involves tailoring input prompts that are clear, concise, and also includes enough context in order for the model to understand the user's input and generate an appropriate response. It also involves a deep understanding of the capabilities and limitations of the model and requires iterations and testing to reach the desired outcome.

In essence, the iterative prompt development process is very similar to writing code. If it doesn't work, we refine and retry until we get the desired result.

Effective prompt engineering is essential for building conversational agents that can understand and respond to natural language inputs in a useful and engaging way for human users. Furthermore is of vital importance to prevent situations where the model may generate nonsensical, inappropriate or harmful responses.

Principles of Prompt Engineering

There are three principles behind writing effective prompts that will lead you to get the output you expect quickly and with no pain.

Principle 1: Write clear and specific instructions

The best way to obtain the desired model output is to provide instructions that are as clear and specific as you can make. This reduces the chances of getting irrelevant or incorrect responses. However, don't confuse writing clear instructions with writing short instructions. In some cases, longer prompts provide more clarity and context information that will lead the model to generate more detailed and accurate responses.

Here are some tactics you can follow to get clear and specific prompts.

1. Use delimiters

Use delimiters in your prompt to separate the prompt instructions (system instructions) from the user input. You can use any punctuation symbol that tells the model the separation between the instructions and input.

Some delimiters examples:

  • Triple quotes “““
  • Triple back-ticks ```
  • Triple dashes ---
  • Angle brackets < >
  • XML tags <tag></tag>

Remember that long or verbose delimiters generate longer prompts (in tokens number), which can be a problem on some models with a short value of maximum prompt length.

Using delimiters is also a helpful technique to avoid prompting injections by explicitly separating the model instructions from the rest of the prompt (I will cover this in more detail later in this post series).

Let's see this tactic in practice with a real example. Imagine we are working to improve an e-commerce website providing better insights to the sales team based on customer reviews. For instance, if shoppers frequently mention that a product's sizing is either too large or too small, this could indicate that the actual size chart is inaccurate.

However, manually analyzing customer reviews is a tedious and time-consuming task. For this reason, we will use GPT-4 to examine these reviews and provide a comprehensive summary.

The following examples are actual reviews that online shoppers have given of a Columbia Men's Republic Parka (Obi-Wan Kenobi™) parka. Which BTW looks soo cool!! If it didn't cost 500 bucks, I would buy it too!

1 Star
This is an over engineered waste of money

Nobody is truly going to be wearing this unless your in the frozen tundra…. Zero mobility way to large in length… Absolute waste…

---
5 Stars
Incredibly warm!

Incredibly warm! You won’t go wrong with this jacket!

---
5 Stars
So warm!

I am a 5”4 woman and got this men’s jacket as a gift. I moved from sunny Los Ángeles to Detroit Michigan during the storm and this jacket saved me. I get cold easily and I was so warm while walking in the snow that I had to unzip a bit because I was getting too warm. Love that this jacket has so many pockets, I don’t even need to bring a purse cuz this jacket holds it all. The details they put into this jacket is amazing.

---
1 Star
Waste of money great look but just not designed to be used and it’s shown by the resale value

Absolutely way too bulky and does not fit as described… This jacket is over engineered meaning unless your going to the polar ice caps you won’t need it. Was hoping for a solid winter jacket and this is definitely not it. Zero mobility in this with how bulky it is… And I have other jackets by Columbia… Also Ordered an XL just as my other jackets but this one is basically a XXL the jacket is also extremely long for no reason. Definitely will be returning this as the package also came opened and the jacket was dirty.

---
2 Stars
Overpriced, and runs big

I love Star Wars and really loved this jacket in concept. It took forever to ship and when it did finally show up the package was ripped open and the jacket was dirty and damaged. Customer service was not helpful and I kept getting kicked from chat. They wanted me to return it and rebuy while doing so meanwhile I have spent $1000 and have no jacket for most of winter. Not worth the money or the hassle. Guess I am still a north face guy.

---
5 Stars
I love it!!

A very nice parka!! Comfy and warm and it looks great! Very glad I purchased it.

---
5 Stars
Awesome

Warmest jacket I've ever had. Lots of pockets, looks excellent

---
2 Stars
Very nice looking jacket

The jacket looks really nice and is warm . The sizing chart is completely off. I would normally be a large to extra large because of wide shoulders and back (I matched the large I'm the size chart). I can almost fit 2 of me into this jacket . Definitely go a size down , maybe even 2 . If you an athletic build .

Let's start with a simple and specific prompt, indicating the model to analyze the customer's review and generate an overview of them. To make more clean and easy-to-read examples, we will separate the system instructions from the rest of the input. To do this, we will use Python string literals.

Prompt 💬

prompt = f"""
You will be provided with a list of product reviews delimited by <>.
Analyze each reviews and provide an overview of customers satisfaction, listing 
positive and negative aspects reported by customers. 

<{reviews}>
"""

Our initial prompt is extremely concise and specific, with <> to separate the instructions from the rest of the input (you can use back-ticks, triple quotes or any other character). Despite its simplicity, the model gets the idea and generates a correct response, identifying and enumerating the positive and negative aspects mentioned by shoppers in their reviews. In particular, the model properly identifies highlighted aspects by shoppers, like ample storage space, and comfortable and warm feeling, among others. Analogous to positive aspects, it recognizes customer complaints such as sizing issues and shipping issues.

Notice that the model cites some aspects not strictly included in the customer reviews. For instance: ample storage space. Customers mentioned multiple pockets but never said plainly "ample" (this is an implicit assumption). Also, none complained specifically about the jacket price. This discrepancy and inconsistency is called Hallucinations, and is a common issue in LLM. We will see later a tactic that helps to reduce it. Feel free to read through the customer's review and take your own insights, then compare them with the generated by the model.

Response 🤖

Overview of customer satisfaction:

Positive aspects:

1. Incredibly warm and suitable for cold weather conditions.
2. Multiple pockets providing ample storage space.
3. Attractive design and aesthetics.
4. Comfortable and good fit for some customers.

Negative aspects:

1. Over-engineered, bulky, and may not be suitable for everyday wear.
2. Sizing issues, with some customers reporting that the jacket runs large.
3. Poor customer service experiences, including shipping issues and unhelpful responses.
4. Expensive and potentially not worth the price for some customers.

2. Ask for structured output

Sometimes it can be helpful to ask the model for structured outputs such as JSON, XML, or any other format supported by the model. Using this tactic reduces the complexity of the model response while making it easier to process the response using Python or any other coding language. So let's go back to our initial prompt and update it, asking the model to format the output in JSON format with two lists: positive and negative aspects.

Prompt 💬

prompt = f"""
You will be provided with a list of product reviews delimited by <>.
Analyze each reviews and provide an overview of customers satisfaction, listing 
positive and negative aspects reported by customers. 

Provide them in JSON format with the following keys: 
positive_aspects, negative_aspects.

<{reviews}>
"""

The new model response is more specific, with less room for creativity and easy to read from code. This introduces another critical aspect of prompt engineering: Balance specificity with creativity. While being specific is crucial, it is also important to allow room for the model to demonstrate its creative capabilities. Striking the right balance between specificity and creativity can lead to more engaging and useful results.

Response 🤖

{
  "positive_aspects": [
    "Incredibly warm",
    "Many pockets",
    "High-quality details",
    "Comfortable",
    "Stylish",
    "Excellent appearance"
  ],
  "negative_aspects": [
    "Over-engineered",
    "Bulky",
    "Limited mobility",
    "Sizing issues",
    "Expensive",
    "Poor customer service",
    "Shipping issues",
    "Dirty and damaged upon arrival"
  ]
}

3. Check whether conditions are satisfied

When designing application prompts based on user input, ensuring that the input meets certain minimal criteria necessary to complete the instructions is good practice. If not, return a known error message. By doing so, we can guarantee that the model responds in a controlled manner to any input that exceeds the scope for which the prompt and the application were initially designed.

For instance, another brand's jacket is being promoted in the subsequent shopper review. To prevent it from skewing our analysis, we will modify our prompt to automatically ignore this type of review.

(Additional Review)

---
1 Star
Not a real Jacket

Looking for a real jacket, suitable for any weather condition and ready for the highest demands? Check out our McMurdo Men’s McMurdo Parka, the best value for money on the market.

First, iterate on our last prompt to distinguish between valid and invalid customer reviews. The way to do this is to explicitly tell or show the model what is a valid product review. In the following prompt, we are instructing the model to ignore unrelated reviews. I encourage you to do your own test with the model.

(Intermediate Prompt)

prompt = f"""
You will be provided with a product review about a Columbia Jacket, delimited by 
<>.

If the text is related to the product, provide an overview of customer satisfaction, listing positive and negative aspects reported by the customer. 

If the text is not related to the product, then simply write \"The provided review is not related to the product or is a Spam.\"

{reviews}

"""

Now we know how to validate a condition over the input, let's combine this with the original prompt to add an ignored reviews counter into the JSON response.

Prompt 💬

prompt = f"""
You will be provided with a list of product reviews about a Columbia Jacket delimited by <>.
Analyze each reviews and provide an overview of customers satisfaction, listing 
positive and negative aspects reported by customers. 

It may happen that some of the reviews are not strictly related to the product, in that case ignore the review, 
update a counter of ignored reviews that you will display with the rest of the output and continue with the next review. 

Provide them in JSON format with the following keys: 
positive_aspects, negative_aspects, ignored.

<{reviews}>
"""

Response 🤖

{
  "positive_aspects": [
    "Incredibly warm",
    "Many pockets",
    "Great details",
    "Comfy",
    "Looks great",
    "Excellent appearance"
  ],
  "negative_aspects": [
    "Over engineered",
    "Lack of mobility",
    "Too large in length",
    "Bulky",
    "Sizing issues",
    "Poor customer service",
    "Delayed shipping",
    "Package received damaged and dirty"
  ],
  "ignored": 1
}

Far better… Notice that this prompt relies on the current knowledge of the model to distinguish a review concerning a Columbia Jacket (valid) from any other text (invalid).

This example is enough, but there are more robust techniques to teach the model to distinguish between valid and invalid inputs. The few-shot prompting tactic covered in the next section is a good example. But first, feel free to add more review examples to push model limits.

4. Few-shot prompting

In some scenarios, providing examples of successful executions of the task and then asking the model to perform it is a smart strategy for more accurate and predictable results. To illustrate this tactic, let’s momentarily think of a related functionality in the e-commerce website. Imagine that now we want to generate a response message for each customer review automatically.

To do this, we will tell the model how to respond to some typical examples of product reviews.

Prompt 💬

prompt = f"""
You will be provided with a list of product reviews delimited by <>.
Analyze each review and provide a personalized answer for each one, based on the content of the review.

Here is a few examples:

---
Review: I am very happy with this jacket, it is worth every penny
Response: Thank you for your review, we are very happy that you are satisfied with our products.

---
Review: I normally wear Columbia size L jackets, this in L is like an XL. I reordered the M… unfortunately the blue was not longer available =(
Response: Thank you for your review! We are continuously working to bring you a better shopping experience and innovative products that will surprise you.

---
Review: Nobody is truly going to be wearing this unless your in the frozen tundra…. Zero mobility way to large in length… Absolute waste…
Response: Thank you for your review, the feedback from our users represents an invaluable asset for us that allows us to work to improve each product and bring you a better shopping experience. 

Provide them in JSON format with the following keys: review, response. Where review is the first 10 words of the customer 
review to easy identify it and response is the generated response.

<{reviews}>
"""

Note that the model generates different responses, based on the user message, with a tone similar to the provided examples. You can extend the teaching examples list to show the model how to act under different circumstances. Also, note the model is not strictly following our specifications. The review summary is five words in length instead of ten. This may indicate that we still need to work on our prompt to be clear and specific.

Response 🤖

{
  "1": {
    "review": "This is an over engineered waste",
    "response": "Thank you for your review. We're sorry that the product didn't meet your expectations. We value your feedback and will consider it for future product improvements."
  },
  "2": {
    "review": "Nobody is truly going to",
    "response": "Thank you for your review. We apologize if the product didn't meet your needs. We appreciate your feedback and will use it to improve our products and ensure they cater to a wider range of customers."
  },
  "3": {
    "review": "Incredibly warm! You won’t",
    "response": "Thank you for your review! We're glad to hear that you're happy with the warmth of the jacket. We hope it continues to serve you well in the colder months!"
  },
  "4": {
    "review": "I am a 5”4 woman",
    "response": "Thank you for your review! It's great to hear that the jacket has been keeping you warm and you're enjoying the many pockets. We appreciate your feedback and hope the jacket continues to serve you well!"
  },
  "5": {
    "review": "Absolutely way too bulky and",
    "response": "Thank you for your review. We apologize for the inconvenience you've experienced with the sizing and bulkiness of the jacket. Your feedback is important to us, and we'll take it into consideration for future product improvements."
  },
  "6": {
    "review": "I love Star Wars and",
    "response": "Thank you for your review. We're sorry to hear about the issues you experienced with shipping and customer service. Your feedback is valuable to us, and we'll work on improving our processes to provide a better experience in the future."
  },
  "7": {
    "review": "A very nice parka!! Comfy",
    "response": "Thank you for your review! We're happy to hear that you're enjoying the comfort and warmth of the parka. We hope it continues to serve you well in the colder months!"
  },
  "8": {
    "review": "Warmest jacket I've ever had.",
    "response": "Thank you for your review! It's great to hear that this is the warmest jacket you've ever had. We hope you continue to enjoy its warmth and style during the colder months!"
  },
  "9": {
    "review": "The jacket looks really nice",
    "response": "Thank you for your review. We apologize for the issues you experienced with sizing. We appreciate your feedback and will use it to improve our sizing charts to better accommodate customers with different body types."
  }
}

Principle 2: Give the model time to think

If you provide the model with a problem that is too complex, the model might need more time to think of a proper answer. This means spending more computational time to generate the response and also using more tokens in both input and output response. You can think of this as asking the model to organize its reasoning process in a particular way, for example, by following a series of steps or performing certain tasks before generating a final answer. But above all, use more tokens in the response and process it for more computation cycles.

Here are some tactics that can be used to instruct the model to think longer before generating a response.

1. Specify the steps to complete a task

The first tactic is to provide the model with a comprehensive list of steps to solve the task; the more specific the steps, the better. Let's go back to our shopper reviews and rewrite the model prompt to get customer insights and respond to reviews.

Prompt 💬

prompt = f"""
Your task is to perform the following actions: 
1 - Analyze the following product reviews delimited by 
  <> and summary each review with 1 sentence.
2 - Categorize the opinion according to the following categories: super happy, happy, neutral, upset
3 - Provide an overview of the customer satisfaction, listing positive and negative aspects mentioned in the review
4 - Provide a personalized answer for each one, based on the content of the review.
5 - Output a json list object that contains the 
  following keys for each review: summary, category, insights, answer.

Text: <{reviews}>
"""

I had to reduce the number of input reviews to narrow the response length, as the limit was leaving some unparsed reviews in the model output. Despite that, you can verify that the model generates more accurate review responses by understanding each review fairly well and extracting valuable insights.

Response 🤖

[
  {
    "summary": "Reviewer thinks the jacket is an over-engineered waste of money with zero mobility.",
    "category": "upset",
    "insights": {
      "positive": [],
      "negative": [
        "over engineered",
        "zero mobility",
        "large in length"
      ]
    },
    "answer": "We're sorry to hear that you found the jacket to be over-engineered and uncomfortable. We appreciate your feedback and will take it into consideration for future improvements."
  },
  {
    "summary": "Reviewer finds the jacket incredibly warm.",
    "category": "super happy",
    "insights": {
      "positive": [
        "incredibly warm"
      ],
      "negative": []
    },
    "answer": "We're thrilled to know that you're enjoying the warmth of the jacket! Thanks for sharing your positive experience."
  },
  {
    "summary": "Reviewer loves the warmth, pockets, and details of the jacket.",
    "category": "super happy",
    "insights": {
      "positive": [
        "warm",
        "many pockets",
        "amazing details"
      ],
      "negative": []
    },
    "answer": "We're glad that you love the jacket's warmth, pockets, and attention to detail. It's great to know that it has served you well during the cold weather in Detroit!"
  },
  {
    "summary": "Reviewer complains about the jacket being too bulky, not fitting as described, and receiving a dirty product.",
    "category": "upset",
    "insights": {
      "positive": [],
      "negative": [
        "too bulky",
        "does not fit as described",
        "dirty"
      ]
    },
    "answer": "We apologize for the inconvenience caused by the sizing and cleanliness issues. We appreciate your feedback and will work on improving these aspects. Please feel free to return the jacket for a refund or exchange."
  },
  {
    "summary": "Reviewer loves the parka, saying it's comfy, warm, and looks great.",
    "category": "super happy",
    "insights": {
      "positive": [
        "comfy",
        "warm",
        "looks great"
      ],
      "negative": []
    },
    "answer": "We're happy to hear that you love the parka's comfort, warmth, and style. Thank you for your positive review!"
  },
  {
    "summary": "Reviewer thinks the jacket is the warmest they've ever had, with lots of pockets and an excellent look.",
    "category": "super happy",
    "insights": {
      "positive": [
        "warmest jacket",
        "lots of pockets",
        "looks excellent"
      ],
      "negative": []
    },
    "answer": "We're delighted that you find our jacket to be the warmest you've ever had, and that you appreciate the pockets and style. Thank you for sharing your experience!"
  },
  {
    "summary": "Reviewer likes the look and warmth of the jacket but finds the sizing chart off.",
    "category": "neutral",
    "insights": {
      "positive": [
        "nice looking",
        "warm"
      ],
      "negative": [
        "sizing chart off"
      ]
    },
    "answer": "We appreciate your feedback on the jacket's look and warmth. We apologize for the sizing issue and will work to improve the accuracy of our sizing chart. Please consider exchanging for a better fit if needed."
  }
]

2. Instruct the model to work out its own solution before rushing to a conclusion

The second tactic is similar to the previous one but, in addition, asks the model to first solve the problem by itself before rushing to a conclusion. This is similar to instructing the model to first work on its own solution based on the context and then compare it with the input solution.

For example, if the task is to validate students’ answers for a simple math problem like 2 + x = 4 then x=?, instruct the model first to find a solution and then compare it with the student’s solution.

To illustrate the power of this tactic, we will explore another feature of the e-commerce website: processing returns and refunds tickets. The idea is to give the model a list of actual shopper's refund demands, to process and evaluate if a refund applies based on the company policy. Here are six examples of refund requests for various reasons.

---
User: Jack
Text: The description of the product stated that the socks were made primarily of cotton with a bit of spandex. The product I received indicated that the socks were made of primarily of polyester and a tiny bit of spandex. I wanted cotton socks so I returned the product. Several attempts to contact the supplier went unanswered and I still have not received a refund on my credit card. I need a solution and want a refund.

---
User: Tom
Text: When I received my purchase, the package was open and significantly damaged. After unpacking it, I found that the shirt was dirty and had several scratches. I want to return and have a new unit sent to me.

--- 
User: Bob
Text: I ordered an XL just as my other Columbia jackets but this one is basically a XXL. The jacket is also extremely long for no reason. I want to return it.

--- 
User: Shara
Text: The jacket looks really nice and is warm but the sizing chart is completely off. I would normally be a large to extra large because of wide shoulders and back (I matched the large I'm the size chart). However, after using it a week I can say I need to go a size down or maybe even 2. Can I change it for a medium size?

---
User: Sasha
Text: This jacket is incredibly warm and solidly built. However, they charged my credit card twice, so I am writing to get a refund of the second charge. Thanks in advance.

---

User: Chris
Text: I bought this backpack for my children because the capacity and solid construction look. However after 2 weeks of school is completely demolished. Poor quality materials and broken zips. I want a refund.

---

For those who have not had the joy of receiving a damaged product, the returns and refunds conditions are not particularly easy to read, with plenty of rules for specific cases and fine print. For example, let's imagine that the return and refund policy of our e-commerce site is reduced to the following four rules (in a real scenario, it would be much more than four).

  1. The customer received the product with damages or in poor conditions
  2. The customer received the wrong size or other product. The product must still be in the original package without usage.
  3. The customer is not satisfied with the product and wishes to change it. The product must still be in the original package without usage.
  4. The customer was charged more than once on his credit card.

Under these conditions, Shara and Chris's requests are not eligible for a refund since they took the purchase from its original packaging and used it for a while. The conditions clearly state that the products must be in their original packaging and not be used. All other requests apply for a refund.

Now we have the rules, let's design a prompt to instruct the model for processing refund requests. The idea is to follow a step-by-step reasoning process and reach a conclusion based on the rule set instead of just guessing. Feel free to try this prompt yourself or modify it to follow a different reasoning process. Notice that this would also be a great case to provide real examples as part of the prompt, and take advantage of the few-shot learning capabilities of the model.

Prompt 💬


prompt = f"""
You will be provided with a list of user complaints delimited by <>, who want a refund of their purchase for different reasons. 
Your task is to determine if corresponds or not, to give the customer a refund based on the company policy.

Refunds policies

Only apply a refund in the following scenarios:
- The customer received the product with damages or in poor conditions
- The customer received the wrong size or other product. The product must be still in the original package without usage.
- The customer is not satisfied with the product and wishes to change it. The product must be still in the original package without usage.
- The customer was charged more than once in his credit card.

To solve the problem do the following:

1 - Analyze whether the purchase applies the refunds policy based on the reasons given by the customer.
2 - Categorize the case in: "apply" or "not apply" based on your previous judgment.
3 - Provide a concise and personalized response with the reasons for your previous solution.
5 - Output a json list object that contains the 
  following keys for each case: category, answer.

Tickets: <{text}>
"""

As we can see in the answer, GPT-4 correctly solves each case and provides a more than correct justification pointing out why it does NOT apply for a refund in the two negative cases. Surely it is necessary to test on a larger and more representative dataset before concluding that LLM can perfectly deal with this task. However, the results are very encouraging.

Response 🤖

[
  {
    "category": "apply",
    "answer": "Dear Jack, we apologize for the inconvenience caused. Since you received a product with a different material than described and have already returned it, you are eligible for a refund. We will process your refund as soon as possible."
  },
  {
    "category": "apply",
    "answer": "Dear Tom, we're sorry to hear that the product arrived damaged and dirty. As per our refund policy, you are eligible for a replacement. Please return the damaged shirt, and we will send you a new one."
  },
  {
    "category": "apply",
    "answer": "Dear Bob, we apologize for the sizing issue you experienced. As you haven't used the jacket and it's still in the original package, you can return it. We will process your refund or exchange it for the correct size upon receiving the returned item."
  },
  {
    "category": "not apply",
    "answer": "Dear Shara, we appreciate your feedback on the sizing chart. However, since you have already used the jacket for a week, we cannot process an exchange or refund. We recommend referring to the sizing chart and contacting us before making future purchases to ensure a better fit."
  },
  {
    "category": "apply",
    "answer": "Dear Sasha, we apologize for the double charge on your credit card. We will refund the second charge immediately. Thank you for bringing this to our attention."
  },
  {
    "category": "not apply",
    "answer": "Dear Chris, we're sorry to hear that the backpack didn't meet your expectations. However, as the item has been used for two weeks, our refund policy does not apply in this case. We encourage you to read our refund policies carefully before making future purchases."
  }
]

Principle 3: Balance specificity with creativity

A critical aspect when you write prompts is balancing specificity with creativity. While being specific is crucial, allowing room for the model to demonstrate its creative capabilities is also important. Striking the right balance between specificity and creativity can lead to more engaging and useful results. However, giving too much space to creativity can be a bad idea for some poorly tested or unstable LLMs, making it easy for hallucinations, hate speech, or racism to appear in the response.

1. The Tolkien’s Method

J. R. R. Tolkien, a world-renowned British author known for the fantasy novel ttrilogy The Lord of The Rings, used to trust a narrative style in which details abound and delves into secondary stories and legends aplenty, all with the aim that the reader gets a precise and faithful image of his fantastic middle earth. A simple but effective technique to help the reader to picture landscapes and characters in a similar direction that Tolkien visualized. Several writers have done the same when describing their fantastic universes: George R. R. Martin, J. K Rowling, Dan Simmons, etc.

Analogously, we can play the role of a writer, like Tolkien, when designing prompts. In this way, providing exhaustive details and context, to orient the model (the reader) to picture exactly what we have in mind. This strategy, which I like to call the Tolkien Method, allows us to obtain super precise results while limiting model creativity.

For example, suppose we want to develop a new feature to automatically suggest a product description based on the product manufacturer's info and speed up the marketing team's daily work. Let's take the Columbia jacket as an example and build a prompt to complete this task.

Product Info 🧥

Columbia Men's Republic Parka (Obi-Wan Kenobi™)

Prompt 💬

prompt = f"""
You will be given a product name delimited by <>.
Your task is to provide a product description, to be used on the product listing page, within an e-commerce website.

<{product}>
"""

Response 🤖

The Columbia Men's Republic Parka (Obi-Wan Kenobi™) is the perfect intersection of fashion and fandom for all Star Wars enthusiasts. This special edition parka, inspired by the iconic Jedi Master Obi-Wan Kenobi, is designed to provide superior comfort and warmth while embracing the essence of your favorite Star Wars character.

Made from high-quality, water-resistant materials, this parka is designed to withstand diverse weather conditions, making it perfect for adventures in both urban and outdoor settings. The adjustable hood is lined with faux-fur trim reminiscent of Obi-Wan's desert attire, and the jacket is available in a Jedi-inspired color palette, adding an authentic Star Wars touch.

The parka features multiple pockets, providing ample storage for your essentials. The full-length zip with a snap-button storm flap ensures additional weather protection, and the adjustable cuffs and hem provide a customizable fit for optimal comfort.

As a part of Columbia's Star Wars collection, this Republic Parka showcases an embroidered Obi-Wan Kenobi silhouette patch, which will proudly proclaim your affinity for the legendary Jedi. Whether you're off on an interstellar adventure or simply braving the cold, the Columbia Men's Republic Parka (Obi-Wan Kenobi™) ensures that the Force will always be with you.

Giving the model absolute freedom, the response obtained is quite impressive. The product description mentions actual Columbia technologies and incorporates accurate facts about the Star Wars Saga. However, what would happen if the product is new, and the model does not know facts about it? For example, a new Columbia Parka inspired by the Star Wars saga aesthetics and Donald Trump (you can try it as an exercise).

An alternative approach is to restrict the model response by providing details about the response structure, use of language, and a source of facts to use. For example, let’s change the prompt using Tolkien’s method to restrict the possible description. First, let's include a complete sheet with product information to be used as a source of facts.

Product Info 🧥

Men's Republic Parka (Obi-Wan Kenobi™)

Details
THE JEDI™ WAY
Seize the high ground this winter with a special-edition hooded parka built to keep you warm in frigid conditions and crafted for Star Wars: The Clone Wars™ fans and outdoor enthusiasts alike.
MASTER THE ELEMENTS
Inspired by Obi-Wan Kenobi™ and his adventures on the ice planet Orto Plutonia™, the waterproof-breathable shell keeps you dry and protected while 650-fill insulation and advanced thermal-reflective lining team up against the cold.
FOR THE REPUBLIC
Fan-favorite details include Jedi and Republic insignia along with an internal “blood chit” label coded in Aurebesh™, the universal written language of the Star Wars galaxy. Additional features include security, hand, chest, and cargo pockets along with comfort cuffs and an adjustable hood (with removable faux fur) and hem.

Omni-Tech™ waterproof/breathable critically seam sealed
Omni-Heat™ Infinity advanced thermal reflective
650 fill power down insulation, RDS certified
Attached, adjustable hood
Interior security pocket
Internal mesh pockets
Zippered hand pockets
Dual entry chest pockets
Cargo pockets
Drawcord adjustable hem
Drop tail
Removable faux fur
Comfort cuff with thumb hole
Center Back Length: 33"
Imported

Prompt 💬

prompt = f"""
You will be given a product name and complete description delimited by <>.

Your task is to provide a product description, to be used on the product listing page, within an e-commerce website, based only on the information present on the product 
details. Do not use facts about the product outside the provided description. Also, ensure description is at most 150 words long and contains common used words in english. Regarding the structure, starts with a catchy sentence that captures the spirit of the 
product and then continues mentioning possible uses of the product in real life and use this to highlight the main characteristics of the product. Ends the description with a final catchy sentence that motivates the reader to go to the nearest store to buy it.

<{product}>
"""

Response 🤖

Embark on your everyday adventures with the Men's Republic Parka (Obi-Wan Kenobi™), designed to withstand the coldest winters and perfect for Star Wars: The Clone Wars™ fans and outdoor enthusiasts. Channel your inner Jedi with this parka, inspired by Obi-Wan Kenobi's exploits on the frosty Orto Plutonia™. Experience supreme warmth and comfort, thanks to the 650-fill insulation and Omni-Heat™ Infinity advanced thermal-reflective lining that works diligently against the cold.

Whether you're braving the city streets on a frosty day or exploring snowy landscapes, this parka ensures ultimate protection. The waterproof-breathable shell, combined with an adjustable hood, keeps you dry, while the cargo, chest, and hand pockets provide ample storage. Subtle Star Wars-inspired details such as Jedi and Republic insignias and an internal Aurebesh™ label elevate your style quotient.

So, don your parka and seize the high ground this winter - for the Republic! Your next cold-weather adventure awaits

As you can see in the answer, the more info you provide in the input, context, and expected output, the more predictable the result is.

2. Temperature and Top P Parameters

In addition to the improved transformer architecture and massive unsupervised training data, better decoding methods have also played an important role in achieving impressive results in language generation demonstrated by models like GPT-3. If you are not familiar with decoding methods, I strongly recommend you to read this post using different decoding methods for language generation with Transformers. In short, auto regressive language models use different decoding methods to select the next token after inferring the individual probabilities. The most popular are Greedy searchBeam searchTop-K sampling, and Top-p sampling. Particularly GPT-3 and GPT-4 use Top-p sampling and understanding how to use it can help you to get more expressive and creative results.

Top P

Top-p sampling, introduced by Holtzman et al (2019), chooses from the smallest possible set of words whose cumulative probability exceeds the probability p. In other words, if you set  Top P=0.5 the model will consider only the subset of top tokens whose cumulative probability exceeds p=0.5 for predicting the next word and ignore the rest. Similarly, setting a low value of Top P (for instance 0.1), reduces the list of tokens to be considered for the next word prediction, while setting a high value of Top P increase the size of the list.

In other words, high value of Top P increase randomness in next word prediction, allowing more creative results, while lower values of Top P decrease creativity, making the next-token selection more predictable. By default, Top P is set to 1.0 on OpenAI models.

To illustrate the effect of the Top P parameter, let’s generate a product description based on the product title and tweak this parameter to observe the impact on the generated output.

To amplify the effect on the result, we are limiting the output to 20 words long, and running the same prompt through the model four times for each value of Top P.

Prompt 💬

prompt = f"""
You will be given a product name delimited by <>.
Your task is to provide a product description, to be used on the product listing page, within an e-commerce website. Use a limit of 20 words in the description

<Columbia Men's Republic Parka (Obi-Wan Kenobi™)>
"""

Responses 🤖 for Top P = 1 (default)

(1)
Stay warm and stylish with Columbia's Republic Parka inspired by Obi-Wan Kenobi™. Perfect for any outdoor adventure.

(2)
Stay warm and stylish with this Columbia Men's Republic Parka featuring Obi-Wan Kenobi™ artwork. Perfect for any Star Wars fan.

(3)
Stay warm and stylish with the Columbia Men's Republic Parka, featuring a unique Obi-Wan Kenobi™ design.

(4)
Stay warm on Hoth with this Star Wars-inspired parka featuring Omni-Heat™ reflective lining and waterproof fabric.

Responses 🤖 for Top P = 0.1

(1)
Stay warm on Hoth with this Star Wars-inspired parka featuring Omni-Heat™ reflective lining and waterproof fabric.

(2)
Stay warm and stylish with this Columbia parka, inspired by Obi-Wan Kenobi™. Perfect for any winter adventure.

(3)
Stay warm and stylish with this Columbia parka, inspired by Obi-Wan Kenobi™. Perfect for any winter adventure.

(4)
Stay warm like a Jedi with this Columbia parka, inspired by Obi-Wan Kenobi™. Perfect for cold weather adventures.

As you can see from model responses comparing the difference between the model output for each of the repetitions, with Top P = 1 we have more variance on the output, altering several words and in different positions, even the output length changes. On the other hand, the responses for Top P = 0.1 are more deterministic, with minor variations on some words. In summary, tweak Top P parameter is an effective strategy to change from deterministic results (Top P close to 0) to unpredictable and more creative results (Top P close to 1).

Temperature

A trick inspired by thermodynamics is to make the distribution of the next words' conditioned probabilities ​sharper (increasing the likelihood of high-probability words and decreasing the likelihood of low-probability words) by lowering the so-called temperature of the softmax. A temperature parameter close to 1 would mean that the logits are passed through the softmax function without modification. If the temperature is near zero, the highest probable tokens will become very likely compared to the other tokens. An illustration from How to generate text: using different decoding methods for language generation with Transformers, reflects pretty well the effect of applying temperature to a simple example.

Enough theory; it is time to see the Temperature parameter in action with the same prompt example.

As you can see from model responses comparing the results from each configuration, the effect on the model response is very similar to the effect produced by Top P parameter. Using a Temperature value close to 0 generates more deterministic results, while using values close to 1 produces more unpredictable and creative results. You can increase Temperature till 2.0 to get even more randomness in the results. However, I need to warn you that sometimes values over 1.5 tend to produce hallucinations and nonsense tokens (try yourself).

Responses 🤖 for Temperature = 1.0 (default)

(1)
Stay warm on Hoth or anywhere in the galaxy with this insulated parka, featuring Obi-Wan Kenobi™ inspired design elements.

(2)
Stay warm on planet Hoth with this Jedi-worthy parka featuring Columbia's signature technology.

(3)
Stay warm in style with this parka inspired by Obi-Wan Kenobi™, crafted by Columbia for men's ultimate comfort.

(4)
Stay warm and stylish with the Columbia Men's Republic Parka featuring iconic Star Wars character Obi-Wan Kenobi.

Responses 🤖 for Temperature = 1.5

(1)
Battle winter chill in iconic style wearing Columbia Men's Republic Parka, designed with insight from the recognized Jedi warrior Ob, vs Kenobi.

(2)
Stay warm as you navigate treacherous planets with this Columbia parka inspired by the legendary Jedi Master Obi-Wan.

(3)
Stay toasty warm on the coldest of days with this stylish jacket inspired by Obi-Wan Kenobi™ from Columbia.

(4)
Stay warm with this insulated parka inspired by Obi-Wan Kenobi's iconic hooded style. Made by Columbia for dependable quality.

Responses 🤖 for Temperature = 0.2

(1)
Stay warm like a Jedi with this Columbia parka featuring Obi-Wan Kenobi™ design. Perfect for cold weather adventures.

(2)
Stay warm in style with this Columbia parka inspired by Obi-Wan Kenobi™. Perfect for any Star Wars fan.

(3)
Stay warm in style with the Columbia Men's Republic Parka, inspired by the legendary Jedi Master Obi-Wan Kenobi™.

(4)
Stay warm and stylish with the Columbia Men's Republic Parka, inspired by Obi-Wan Kenobi™.

OpenAI also offers two extra options called Frequency penalty and Presence penalty that can be used to encourage the model to include new tokens in the response penalizing the repetition of tokens. Read more about this here.

Five Tips & Tricks to Make your Prompt Effective

The principles can help you as a general guide to writing clear and effective prompts. In addition, the following tips & tricks can help you optimize your prompts even more.

1- Iteration

There is no such thing as a perfect prompt, so start fast with and simple and imperfect prompt and then iterate, iterate, iterate. You can follow a four steps cycle: (i) write the prompt, (ii) execute it, (iii) analyze the error, (iv) find improvements, and repeat.

2- Be Explicit and Use Objective Language

Words that do not have a clear meaning or whose meaning may be subjective can lead the model to confusion. For example, as part of an instruction, ask to rewrite an input text to make it looks better. What does better mean? Instead, it is preferable to indicate how we want to improve the text explicitly: limit the total length to 50 words, correct grammatical errors, avoid personal language, etc.

3- Itemizing Instructions

Instead of writing a long paragraph with complex instructions, break it down into simple items instructions.

4- Avoid Negation Statements

Try to avoid negation statements, which sometimes can lead the model to confusion. Turn them into assertion statements. For example, instead of writing don’t use misogynistic or sexist language expressions,” write use inclusive and respectful language.”

5- Switch words

Some words may be more effective than others in contextualizing and guiding the model in the right direction. If your prompt isn't working, try replacing important words with synonyms, opting for simpler words.

A final piece of advice, life is not just about prompting! I mean, equally important is to know well the model you are using, how it works, and what extra parameters it has. For example, we can tune model creativity or robustness by tuning model temperature or Top_P parameter.

In conclusion, read the documentation!

Wrapping up

In this first post in the Prompt Engineering 101 series, we introduce an intuitive definition of prompt engineering and address three principles behind building effective prompts:

  1. Write clear and specific instructions

  2. Give the model time to think

  3. Balance specificity with creativity.

Throw real input examples and prompts; we covered different tactics to create prompts aligned to these principles, such as using delimiters, asking for structured responses such as JSON, etc.

Finally, we played with additional parameters like temperature (included by some models like GPT3/GPT-4), which helped us obtain more interesting results.

You can find all the examples, along with prompts, responses, and Python scripts to interact directly with the OpenAI API and try your prompts here.

In the following post, we will see more advanced techniques like Chain-Of-Though and Self-Consistency and different techniques and tools to optimize prompts.

Acknowledgements

This posts series is inspired by the outstanding courses of Andrew Ng and Isabella Fulford and the excellent LLM Bootcamp provided by Charles Frye, Sergey Karayev, and Josh Tobin (both courses are mentioned in the resources section). After completing these learning programs, I was eager to delve deeper, exploring academic papers and tutorials. This led me on a journey through the Internet, discerning high-quality resources from junk. I even ordered two books from Amazon on Prompt Engineering and Generative AI for Artwork, which turned out to be poorly written and a complete waste of money. After several weeks of intense work, headaches and cups of coffee, I found myself with a collection of resources of great value about Prompt Engineering. In the spirit of helping others in their Prompt Engineering journey, I decided to share my experience writing this series of posts.

I hope these posts will be helpful and that you enjoy them as much as I did writing them.

References


Written by eviotti | Machine Learning Director at IDATHA.com — AI Builder — CV and NLP practitioner — Hungry reader and stories writer.
Published by HackerNoon on 2023/05/29