How to Communicate Better with AI: A Prompt Engineering Guide | 5Y View

五源资本五源资本·March 23, 2023

Skills that matter for the future.

5Y Capital Partner Kai Liu

The invention of the printing press in the 15th century dramatically accelerated the spread of knowledge and the advancement of productivity; the birth of the personal computer in the 20th century had a similar impact. Throughout the journey of these transformative technologies — from their inception to widespread adoption to profound societal impact — the greatest driving force has always been ordinary people's ability to understand and wield these tools.

With the printing press, literacy became a universal skill. With the personal computer, typing on a keyboard became a universal skill. Now, as we've barely cracked open the door to AGI, the ability to comprehend and harness AI has become critically important. It will determine whether, in the near future, you are left behind by history or stand on the shoulders of a new giant.

Article translated and adapted by "Data Storytelling in New Language"

Original source:

https://github.com/dair-ai/Prompt-Engineering-Guide

With the rise of large language models like ChatGPT, people have witnessed their extraordinary capabilities in automatically generating all kinds of text and their broad applications. However, because these models can produce inaccurate or inappropriate text, prompt engineering has emerged. Prompt engineering controls the output of large language models by designing and constructing input prompts, thereby improving the accuracy and reliability of generated text and delivering better results and experiences for various application scenarios. Thus, the background and significance of prompt engineering lies in optimizing input prompts to guide large language models toward generating more accurate, reliable, and expected output text, thereby better realizing their potential and value.

Introduction to Prompts

Prompt engineering is a relatively new discipline focused on developing and optimizing prompts for various applications and research topics to effectively leverage language models (LMs). Prompt engineering skills help us better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve LLM performance on a range of common and complex tasks, such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques for interacting with LLMs and other tools.

Basic Prompts

You can accomplish a great deal through prompting, but the quality of results depends on how much information you provide. A prompt can contain instructions or questions, along with other details such as contextual inputs or examples.

For example: to get the model to generate a description of today's sky in Shanghai, we can progressively refine the input prompt to help the model complete the task.

Prompt: Complete this sentence: The sky is

Output: The sky refers to the space above the atmosphere, typically what people see from the ground. Its color and form change with time, location, season, and weather conditions.

As shown in this example, the language model outputs a string of text related to the context "The sky is" — containing only instructions and a question, with no background information. The result is generic knowledge, far from what we wanted to achieve. This simple example highlights the need to provide more context or instructions to clarify our specific goal. Let's try a slight improvement:

Prompt: Based on today's weather, complete a sentence describing today's weather: The sky is

Output: The sky is overcast, with dark clouds gathering, and it is currently raining.

Further specifying location (without today's weather information):

Prompt: Based on today's weather in Shanghai, complete a sentence describing today's weather: The sky is

Output: The sky is overcast, with rain currently falling, and the temperature is rather cold.

Telling the LLM that today is sunny:

Prompt: Based on sunny weather in Shanghai today, complete a sentence describing today's weather: The sky is

Output: The sky is azure blue, with bright sunshine and no clouds.

We gave the model context about what we wanted it to do with the sentence completion, so the result looks better because it strictly follows what we told it to do ("complete the sentence"). This approach of designing optimal prompts to guide the model in performing tasks is known as prompt engineering.

Today's ChatGPT (also called LLM, or large language model) has become an essential tool for executing various advanced tasks. Whether for text summarization, question answering systems, conversational agents, mathematical reasoning, or code generation, ChatGPT has demonstrated remarkable capabilities and potential.

Parameter Tuning (LLM Settings)

When using prompts, you can interact with an LLM via API or directly. Several parameters can be configured to adjust the output results.

Temperature:

Temperature is a parameter used to control the randomness of text generation.

The lower the temperature value, the closer the generated text stays to the model's existing knowledge and training examples — more conservative and predictable. As the temperature value approaches 0, the generated text increasingly converges on the model's most probable output.

Conversely, at higher temperature values, generated text becomes more diverse, novel, and surprising.

This is because at high temperatures, the generation process becomes more random, and the model more readily selects and combines from multiple possible words and phrases, thereby creating more varied text.

In practice, for fact-based Q&A, we might prefer lower temperatures to encourage more factual and concise responses.

For poetry generation or other creative tasks, increasing temperature may be more beneficial.

Top_p:

A parameter used to control the diversity of generated text, which can be understood as a probability threshold. During text generation, the model first calculates the probability distribution across all possible words, then based on the Top_p setting, retains only the portion of vocabulary with the highest cumulative probability. Specifically, the model ranks all words by probability from highest to lowest, then continuously accumulates these probability values until the cumulative probability exceeds the Top_p threshold. At this point, the model retains only the highest-probability words and discards the rest. This effectively controls text diversity and prevents repetitive or stilted output. Larger Top_p values yield more diverse text, but may also reduce accuracy, as the model may consider lower-probability words when selecting vocabulary.

Standard Prompts

Standard prompts follow this format:

?

They can also be formatted in QA style, which is standard for many QA datasets, as shown below:

Q: ? A:

Building on this standard format, a popular and effective prompting technique is called "few-shot prompting," where we provide examples. Few-shot prompts can be formatted as follows:

? ? ? ?

The QA format is not mandatory. The format is flexible and not constrained to any specific structure — it entirely depends on the task at hand. For example, you can perform a simple classification task and provide examples demonstrating the task, as shown below:

prompt: This is great! // Positive This movie is wonderful // Positive The performance was terrible // Negative This comedy isn't funny //

output: This is great! // Positive This movie is wonderful! // Positive The performance was terrible! // Negative This comedy isn't funny. // Negative

Elements of a Prompt

A prompt may contain any of the following components:

  • Instruction: The specific task or instruction you want the model to perform
  • Context: External information or additional context that can guide the model toward better responses
  • Input Data: The input or question we are interested in finding a response for
  • Output Indicator: The type or format of the output.

Not all components are required in a prompt, and the format depends on the task at hand.

General Tips for Designing Prompts

Here are some techniques for designing prompts:

Start Simple

Designing prompts is an iterative process that requires extensive experimentation to achieve optimal results. Using a simple playground like ChatGPT is a good starting point.

You can begin with simple prompts and gradually add more elements and context as your goals become clearer. Version control is very important throughout this process. As you'll see in many examples below, clarity and conciseness typically yield better results.

When you have a large task involving many different subtasks, you can try breaking it down into simpler subtasks and build up as results improve. Avoid adding excessive complexity at the beginning of prompt design.

The Instruction

You can use instructions to design effective prompts for various simple tasks, indicating what you want the model to accomplish — such as "write," "classify," "summarize," "translate," "sort," and so on.

Keep in mind that you'll need to experiment extensively to see what works best. Try different instructions using different keywords, contexts, and data to find what works best for your specific use case and task. Generally, the more specific and relevant the context is to the task you're trying to perform, the better. We'll cover the importance of sampling and adding more context in an upcoming guide.

prompt: Increment Wang Xinyi's age by one: [{"name":"wangxinyi","age":30},{"name":"lufeiyang","age":28}] >

output: Note that this is a JSON array containing two objects, and you need to specify which object to modify. If we change one object to increment Wang Xinyi's age by 1, the JSON would look like this: [{"name":"wangxinyi","age":31}, {"name":"lufeiyang","age":28}]

prompt: Increment Wang Xinyi's age by one. Output only JSON, in a code block: [{"name":"wangxinyi","age":30},{"name":"lufeiyang","age":28}]

output:

[ {"name":"wangxinyi","age":31},

  {"name":"lufeiyang","age":28}]

Specificity

When using a model to perform a task, you need to be very specific and detailed in describing the instructions and tasks you want the model to execute, because the more detailed and thorough the instruction description, the better the results the model generates. This is especially important when you want a specific output or a particular style.

When designing instructions, you shouldn't rely too heavily on specific tokens or keywords; instead, you should have good formatting and detailed instructions. Providing examples is a very effective method, as it helps the model generate desired outputs that match specific formats.

At the same time, you also need to consider instruction length, as there are certain limits. The details in an instruction should be relevant and contribute to the current task. Excessive unnecessary details aren't necessarily a good approach. You need to balance specificity and detail when designing instructions. Finally, we encourage extensive experimentation and iteration to optimize instructions for your application. This is a process of continuous trial and improvement.

Prompt:

I need help training an intent recognizer. You need to identify my intent and slots (extract intent and slot information based on input). For example: "I want to analyze each school's average evaluation score and highest score" should be converted to: {"intent":["grouped aggregation"],"slots":{"grouping fields":["school"],"aggregation":[{"score":"average"},{"score":"maximum"}]}}. Output format: JSON, in a code block.

output: Here is the output with intent and slot information extracted based on the input:

prompt: Calculate average occupancy rate, maximum passenger capacity, and average passenger capacity by aircraft model. Output JSON.

output: I'm sorry, I cannot directly provide the data you need. Please provide data on aircraft models and seat information, and I will help you process and analyze it.

prompt: Help me recognize intent and convert slots to JSON: Calculate average occupancy rate, maximum passenger capacity, and average passenger capacity by aircraft model.

output:

Avoid Impreciseness

The above prompts tell us to be detailed and improve formatting. It's easy to fall into the trap of trying to be too clever in your prompts and potentially creating imprecise descriptions. Being direct and explicit is usually better. The analogy here is very similar to effective communication — the more direct, the more effective the message.

Prompt: Explain the concept of prompt engineering. Keep the explanation short, just a few sentences, not too descriptive.

Output: Prompt engineering refers to the methods and techniques of designing and adjusting input text when using natural language generation models, to help the model produce more accurate and expected outputs. It includes designing clear and explicit prompts and reasonable parameter settings, aiming to improve model performance and effectiveness.

Prompt: Explain the concept of prompt engineering to a high school student in 2-3 sentences.

Output: Prompt engineering refers to designing short statements or questions to guide AI models in generating desired outputs. These statements need to be specific and detailed to help the model understand task requirements. Prompt engineering is widely used in natural language processing, machine translation, and other fields.

Prompt: Prompt engineering is the way and skill of communicating with models, just like the way and skill people use to communicate with each other.

Output: Excellent explanation! Concise and clear, easy to understand.

Prompt: Prompt engineering is the way and skill of communicating with models, just like the way and skill people use to communicate with each other. Help me polish this.

Output: Prompt engineering refers to the methods and skills of effectively communicating with models, similar to communication styles and techniques in interpersonal interaction.

Prompt:

Explain what prompt engineering is to a high school student in 2-3 sentences.

Output:

Prompt engineering is the approach and technique used to better communicate with AI models, just like the ways and skills people use to communicate with each other, helping us more accurately obtain the results we want.

To Do or Not to Do?

When designing prompts, an important technique is to avoid describing what not to do, and instead describe what to do. This increases prompt specificity while making it easier for the model to understand and produce good responses.

Prompt:

Please help me write a short essay introducing Shanghai. It's best not to mention big city problems, severe aging issues, or insufficient innovation.

Output:

Shanghai is one of China's most international and modern cities, and also one of the country's centers for economy, finance, technology, and culture. It has a long history and cultural traditions, while also being a city full of vitality and innovative spirit.

In Shanghai, you can taste various cuisines, including traditional local dishes, various seafood, and international foods to satisfy your palate. You can also enjoy various arts and cultural activities such as concerts, theater, exhibitions, and festivals, experiencing different cultural atmospheres.

Beyond food and culture, Shanghai is also a modern city. It has an advanced transportation network including subways, highways, and airports, allowing you to conveniently reach anywhere. Additionally, Shanghai is a city of innovation and technology, with many high-tech enterprises and research institutions attracting innovative talent from around the world.

Shanghai also has many famous attractions and historical sites, such as the Shanghai Museum, the Bund, Yu Garden, and City God Temple. These attractions are not only important cultural heritage of Shanghai but also nationally famous tourist destinations.

In summary, Shanghai is a city full of vitality and charm. It will provide you with rich and colorful cultural and life experiences. Whether you are a tourist or innovative talent, you can find your sense of belonging here.

Prompt:

Please write an article introducing Qingdao, including information on: the city's geographical location, population, history, culture, tourist attractions, etc.

Output:

Qingdao is a coastal city in Shandong Province, China, located at the southeastern tip of the Shandong Peninsula on the western shore of the Yellow Sea. With a population of over nine million, it is one of China's most important ports. Qingdao has a long history dating back to the 7th century BC. Since the late 19th century, Qingdao became a German colony in China, leaving behind many exquisite buildings and cultural heritage. The most famous among these are the German Governor's House in Lu Xun Park and the Qingdao German Prison Museum. During World War II, Japan occupied Qingdao and established the famous Anti-Japanese War Memorial Hall there to commemorate China's sacrifices and victories in resisting aggression.

Qingdao has rich cultural traditions, most notably its beer culture. Tsingtao Beer is one of China's most popular beer brands, with a history dating back to the late 19th century. Every October, Qingdao hosts an international beer festival, attracting beer enthusiasts from around the world. Additionally, Qingdao has many stunning tourist attractions, such as Badaguan, Zhanqiao Pier, Xiaoyu Mountain, May Fourth Square, and Mount Lao. Among these, Badaguan is a group of scenic and historic buildings, one of Qingdao's most famous attractions. Zhanqiao Pier is a century-old wooden plank road, one of Qingdao's oldest structures. Xiaoyu Mountain is a natural park with beautiful scenery, plus many stone carvings and temples to visit. May Fourth Square is a square in downtown Qingdao, commemorating China's May Fourth Movement and also one of Qingdao's landmarks. Mount Lao is Qingdao's highest mountain, a nature reserve with magnificent scenery and long history.

In summary, Qingdao is a city with long history, unique culture, and beautiful scenery. Its history, culture, and tourist attractions make it one of China's most popular tourist destinations.

Basic Prompting Tasks

How to use well-designed prompts to perform various interesting and different classic NLP tasks. Using examples to illustrate key concepts and basic task types helps with better understanding and learning. These classic tasks include text summarization, information extraction, question answering, text classification, sentiment analysis, and more. Carefully designed prompts can provide sufficient context and guidance, helping models accurately understand task requirements and output high-quality results.

  • Text Summarization
  • Information Extraction
  • Question Answering
  • Text Classification
  • Conversation
  • Code Generation
  • Reasoning

Text Summarization

One of the standard tasks in natural language processing is text summarization. One of the most promising applications of large language models is the ability to summarize articles and concepts into quick, easy-to-read summaries.

For example, if you want to understand vitamins and see a summarized output:

Prompt: Explain vitamins.

Output: Vitamins are a class of organic compounds that serve as essential nutrients for the body's normal physiological and metabolic processes. The human body cannot produce vitamins on its own and must obtain them from food. Vitamins can be divided into two categories: water-soluble and fat-soluble, which differ in their functions and absorption mechanisms within the body. Vitamins play various important roles in the human body, such as promoting immune system health and maintaining the normal function of organs and tissues including the eyes, skin, and teeth.

Prompt: Summarize the following in one sentence: Vitamins are a class of organic compounds that serve as essential nutrients for the body's normal physiological and metabolic processes. The human body cannot produce vitamins on its own and must obtain them from food. Vitamins can be divided into two categories: water-soluble and fat-soluble, which differ in their functions and absorption mechanisms within the body. Vitamins play various important roles in the human body, such as promoting immune system health and maintaining the normal function of organs and tissues including the eyes, skin, and teeth.

Output: Vitamins are essential organic nutrients that must be obtained from food, classified as water-soluble or fat-soluble, and play various important roles in the human body.

Information Extraction

Large language models can easily and quickly perform classification and many other natural language processing (NLP) tasks. LLMs can extract key information from user prompts, such as dates, times, locations, people, organizations, and more.

Question Answering

Improving prompt formatting is one of the best ways to get specific answers from a model. In prompts, you can combine instructions, context, inputs, and output indicators for better results. While these components aren't mandatory, they're good practice because more specific guidance leads to better outcomes. Below is an example showing how a model responds to a more structured prompt.

When the instruction description isn't clear enough and the model fails to understand the input, the answer will be wrong — drawing on general knowledge rather than the information provided:

Prompt: Answer the following question based on the provided context. Keep your answer brief and clear. Context: Vitamins were discovered during research on leukemia. In the early 20th century, doctors found that certain foods contained a substance capable of preventing scurvy. They called this substance "vitamin," meaning "vital amine," because people initially believed it existed only in food. Later, scientists gradually discovered multiple vitamins and further explored their important effects on the human body. Question: What disease was being researched when vitamins were discovered?

Output: Vitamins were discovered during research on scurvy.

The answer above extracts from the general background knowledge rather than the provided context:

Prompt: Answer the following question based on the background information I provide. Keep your answer brief and clear. Background information: Vitamins were discovered during research on leukemia. In the early 20th century, doctors found that certain foods contained a substance capable of preventing scurvy. They called this substance "vitamin," meaning "vital amine," because people initially believed it existed only in food. Later, scientists gradually discovered multiple vitamins and further explored their important effects on the human body. Question: What disease was being researched when vitamins were discovered?

Output: Vitamins were discovered during research on leukemia.

Further refining the output format:

Prompt: Answer the following question based on the background information I provide. Keep your answer brief and clear, and return only the entity name: Background information: Vitamins were discovered during research on leukemia. In the early 20th century, doctors found that certain foods contained a substance capable of preventing scurvy. They called this substance "vitamin," meaning "vital amine," because people initially believed it existed only in food. Later, scientists gradually discovered multiple vitamins and further explored their important effects on the human body. Question: What disease was being researched when vitamins were discovered?

Output: Leukemia

Text Classification

As a prompt engineer, providing better instructions is only part of the job. When dealing with more complex application scenarios, instructions alone aren't enough. At this point, you need to think more about context and other available elements to improve the model's accuracy and efficiency. Beyond instructions, you can provide additional elements such as input data or examples to enhance the prompt's information density and effectiveness. Through these extra elements, the model can better understand your intent and deliver better results.

Let's demonstrate this with a text classification example.

Training the model:

Prompt: I need a classifier. Here are a few examples: This is amazing! -> {"classification":'positive'} This movie is fantastic -> {"classification":'positive'} The acting was terrible -> {"classification":'negative'} This comedy isn't funny -> {"classification":'negative'}

Output: You need a classifier that can categorize text. Here are a few examples, each with a text and its corresponding classification: "This is amazing!" -> {"classification":'positive'} "This movie is fantastic" -> {"classification":'positive'} "The acting was terrible" -> {"classification":'negative'} "This comedy isn't funny" -> {"classification":'negative'}

Testing:

Prompt: Classify the following text and output as JSON code: This movie is total garbage

Output: {"classification": "negative"}

Extended test:

Prompt: Classify the following text and output as JSON code: The garbage station has a lot of garbage

Output: {"classification": "neutral"}

Defining output format:

Prompt: Classify the following text and output as markdown JSON code: This movie is total garbage

Output: Classification result: {"classification": "negative"}

Conversation

One of the most interesting aspects of using prompt engineering is the ability to teach an LLM system how to behave, what its intent is, and what its identity is. This is particularly useful when building conversational systems like customer service chatbots. For example: through iterative refinement, gradually finding the content we need:

Text Summarization:

Prompt: Write a 30-50 word summary for When Notebook Meets ChatGPT (AIGC): A New Paradigm for Data Analysis/Data Science

Output: This article introduces how to combine Notebook with AI-generated conversational engines (AIGC) to form a new paradigm for data analysis and data science exploration. This new paradigm improves efficiency, lowers barriers to entry, and leverages AIGC's strengths to reduce internal communication costs within organizations. The article also notes the challenges this paradigm faces.

Summary with special requirements:

Prompt: Make it more vivid and upbeat, suitable for posting on social media

Output: Data analysis meets AI, and sparks fly! The fusion of Notebook and AIGC not only boosts data analysis efficiency but also lowers barriers and reduces communication costs. Challenges remain, but this is a whole new paradigm for data science!

Code Generation

One of the most effective applications of LLMs (large language models) is code generation. Copilot, Codex, and ChatGPT are excellent examples. Through prompting, various code generation tasks can be accomplished.

Telling ChatGPT metadata:

Prompt: I have two tables. 1. Table name: language, fields: ['chin', 'engl', 'index', 'school'], meaning: ['Chinese', 'English', 'student ID', 'school']; 2. Table name: science, fields: ['chem', 'index', 'math', 'phys', 'school'], meaning: ['Chemistry', 'student ID', 'Math', 'Physics', 'school']; Just remember this, no need to be verbose

Output: Got it. You have two tables with the following fields and meanings: Table name: language Fields: ['chin', 'engl', 'index', 'school'] Meanings: ['Chinese', 'English', 'student ID', 'school'] Table name: science Fields: ['chem', 'index', 'math', 'phys', 'school'] Meanings: ['Chemistry', 'student ID', 'Math', 'Physics', 'school']

Generating code:

Prompt: Query the top 5 students by total score for each school

Output: To query the top 5 students by total score for each school, we need to group by school, calculate each student's total score, then find the top 5 students per school. Here is a sample SQL statement:

SELECT school AS school,
       index AS student_id,
       chin AS chinese_score,
       engl AS english_score,
       math AS math_score,
       phys AS physics_score,
       chem AS chemistry_score,
       (chin + engl + math + phys + chem) AS total_score
FROM (
    SELECT l.school, l.index,
           l.chin, l.engl, s.math, s.phys, s.chem,
           ROW_NUMBER() OVER (PARTITION BY l.school ORDER BY (l.chin + l.engl + s.math + s.phys + s.chem) DESC) AS rank
    FROM language l
    JOIN science s ON l.index = s.index AND l.school = s.school
) t
WHERE rank <= 5;

Reasoning

Perhaps one of the most challenging tasks for current LLMs is anything requiring some form of reasoning capability. Reasoning is one of the most actively researched areas to follow up on, because LLM reasoning can give rise to all kinds of complex applications.

There has been some progress on mathematical ability. But it's worth noting that current LLMs struggle with reasoning tasks, which calls for more advanced prompting techniques. I explored some of these data reasoning topics separately in Why Can't ChatGPT Play the 24 Game?

This mathematical reasoning is wrong:

Prompt: All the odd numbers in this array add up to a prime number: 15, 32, 5, 13, 82, 7, 1.

Output: This statement is incorrect. While some of the odd numbers are prime (for example, 5 and 13), not all odd numbers added together equal a prime number. Let's add these odd numbers: 15 + 5 + 13 + 7 + 1 = 41. Therefore, this statement is incorrect because 41 is not a prime number. In fact, if you add all the numbers in this array together, the result is 175, which is also not a prime number.

Attempting to improve results by refining the prompt (adding reasoning steps):

Prompt: Is the following statement correct? Determine this through the following steps: identify odd numbers, add them together, and determine whether the result is a prime number. All the odd numbers in this array add up to a prime number: 15, 32, 5, 13, 82, 7, 1.

Output: Determining through the following steps: Identify odd numbers: In this array, the odd numbers are 15, 5, 13, 7, 1. Add them together: 15+5+13+7+1=41. Determine whether the result is a prime number: 41 is a prime number, because it is only divisible by 1 and 41 itself. Therefore, the result of adding all odd numbers in this array is a prime number.

Advanced Prompting

Clearly, improving prompts can yield better results across different tasks. This is the core idea of prompt engineering. Next, we'll cover some advanced topics:

  • Zero-shot Prompting
  • Few-shot Prompting
  • Chain-of-Thought (CoT) Prompting
  • Zero-shot CoT
  • Self-Consistency
  • Generate Knowledge Prompting
  • Automatic Prompt Engineer

Zero-shot Prompting

Today's large language models, trained on massive amounts of data and fine-tuned for instruction following, can perform zero-shot tasks. Here's one example we've used:

Prompt: Classify the text as neutral, negative, or positive. Text: I think the vacation was okay.

Completion: Classification: Neutral

Prompt: Text: I think his performance was quite good.

Completion: Classification: Positive

Prompt: Text: I think his performance was not great.

Completion: Classification: Negative

In the prompts above, we didn't provide the model with any examples — this is zero-shot capability at work. When zero-shot falls short, it's recommended to provide demonstrations or examples in the prompt. Below we'll discuss the approach called "few-shot prompting."

Few-shot Prompting

Although large language models have demonstrated remarkable zero-shot capabilities, they still have limitations on more complex tasks when using zero-shot settings. To improve on this, we use a technique called few-shot prompting, providing demonstrations in context to guide the model toward better performance. The demonstrations serve as conditioning for subsequent examples, helping the model generate responses.

Prompt: Paris is the capital of France, with famous attractions including the Eiffel Tower, the Louvre, and the Champs-Élysées. New York is a major city in the United States, with famous attractions including the Statue of Liberty, Times Square, and Central Park. Tokyo is the capital of Japan, with famous attractions including Tokyo Skytree, Senso-ji Temple, and Shinjuku Gyoen. Beijing is?

Completion: Beijing is the capital of China, with famous attractions including the Forbidden City, the Great Wall, and the Temple of Heaven.

Prompt: Shanghai is?

Completion: Shanghai is a city in China, one of the country's centers for economy, finance, trade, and transportation, with famous attractions including the Bund, the Oriental Pearl Tower, and Yu Garden.

Prompt: Imagine you are a real estate agent. Please write an advertisement to sell the following house. The house is located in a bustling commercial district, surrounded by multiple shopping centers and food streets, just a 5-minute walk from the subway station. Built in 2015, the house has three bedrooms and two bathrooms, with a spacious living room and an open-concept kitchen with a bar counter. The total area is 120 square meters, with parking space for one car.

Completion: Exquisite three-bedroom residence — the perfect location and meticulous interior design make this house your ideal living space. Located in the heart of the commercial district, with comprehensive surrounding amenities to meet all your shopping and dining needs. Built in 2015, every room features large windows with abundant natural light, creating a warm and bright living atmosphere. You'll enjoy a spacious living room and an open-concept kitchen with a bar counter, providing the perfect venue for your gatherings and entertainment. Additionally, the house comes with one parking space for your convenience. This is an opportunity you can't miss — contact us now to schedule a viewing!

The model somehow learned how to perform the task by being provided with just one example (i.e., 1-shot). For more difficult tasks, we can try increasing the number of demonstrations (e.g., 3-shot, 5-shot, 10-shot, etc.).

According to research findings from Min et al. (2022), here are several optimization tips regarding demonstrations/examples when conducting few-shot learning:

  • "Regardless of whether the labels are correct for individual inputs, the label space specified by the demonstrations and the distribution of input text both matter."
  • Even using random labels, the format you use plays a critical role in performance, which is much better than having no labels at all.
  • Additional results show that selecting random labels from the true label distribution (rather than a uniform distribution) also helps improve performance.

Limitations of Few-shot Prompting

Standard few-shot prompting works for many tasks, but it is still not a perfect technique, especially when dealing with more complex reasoning tasks. Recently, Chain-of-Thought (CoT) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models prompting has gained popularity for addressing more complex arithmetic, commonsense, and symbolic reasoning tasks.

Chain-of-Thought (CoT) Prompting

Chain-of-Thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps. You can combine it with few-shot prompting to achieve better results on more complex tasks that require reasoning before responding.

Prompt: The odd numbers in this group add up to a prime number: 4, 8, 9, 15, 12, 2, 1. A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False. The odd numbers in this group add up to a prime number: 17, 10, 19, 4, 8, 12, 24. A: Adding all the odd numbers (17, 19) gives 36. The answer is False. The odd numbers in this group add up to a prime number: 16, 11, 14, 4, 8, 13, 24. A: Adding all the odd numbers (11, 13) gives 24. The answer is False. The odd numbers in this group add up to a prime number: 17, 9, 10, 12, 13, 4, 2. A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False. The odd numbers in this group add up to a prime number: 15, 32, 5, 13, 82, 7, 1. A:

Completion: Adding all the odd numbers (15, 5, 13, 7, 1) gives 41, which is a prime number. So the answer is True.

Solving basic logic problems through CoT:

Prompt: It takes 2 minutes to boil one egg. How long does it take to boil 5 eggs?

Completion: It takes 2 minutes to boil one egg, so the time required to boil 5 eggs is the sum of the boiling times for 5 eggs. Therefore, boiling 5 eggs takes 5 × 2 = 10 minutes. The answer is 10 minutes.

Prompt: Consider the parallel case: It takes 2 minutes to boil one egg. How long does it take to boil 5 eggs?

Completion: If we want to boil 5 eggs simultaneously, we can use a parallel approach — boiling eggs in different pots at the same time. This way, the time can be shortened because each pot only needs to boil one egg, so the time to boil 5 eggs should equal the time to boil one egg, which is 2 minutes. Therefore, boiling 5 eggs simultaneously takes 2 minutes.

Prompt: One pot can boil a maximum of 5 eggs at once. It takes 2 minutes to boil one egg. With only one pot, how long does it take to boil 10 eggs?

Completion: If we only have one pot, we can boil a maximum of 5 eggs at a time. Therefore, to boil 10 eggs, we need to boil at least twice. First boil 5 eggs, then boil another 5 eggs. Since each batch of 5 eggs takes 2 minutes, the total time for two batches is 2 × 2 = 4 minutes. Therefore, boiling 10 eggs takes 4 minutes.

Still can't handle optimization problems:

Prompt: There are two pots — a large pot that can boil 5 eggs at once and takes 5 minutes per batch, and a small pot that can boil 2 eggs at once and takes 3 minutes per batch. Now there are 8 eggs. What's the shortest time needed to boil all the eggs?

Completion: We can divide the 8 eggs into three groups — one group with 5 eggs, another with 2 eggs, and a third with just 1 egg. Using the large and small pots to boil two groups respectively, the total time needed is: The group of 5 eggs needs 1 large pot batch: 5 minutes The group of 2 eggs needs 1 small pot batch: 3 minutes The remaining 1 egg needs 1 large pot batch: 5 minutes So the total time needed is 5 + 3 + 5 = 13 minutes. The answer is 13 minutes.

Prompt: The large and small pots can boil simultaneously.

Completion: Okay, then we can use the large and small pots at the same time. First use the large pot to boil 5 eggs, which takes 5 minutes. At this point, there are still 3 eggs left to boil in the small pot. Then use the small pot to boil these 3 eggs, which takes 3 minutes. So the total time to boil all the eggs is 5 minutes + 3 minutes = 8 minutes.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — CoT is an emergent ability that appears with sufficiently large language models.

Zero-Shot CoT

The idea behind zero-shot CoT is simple: add "Let's think step by step" to the original prompt.

Prompt: I went to the market and bought 10 apples. I gave 2 apples to my neighbor and 2 to the repairman. Then I went and bought 5 more apples, and ate 1. How many apples do I have left?

Completion: You have 11 apples left. You started with 10 apples, gave 2 to your neighbor and 2 to the repairman, leaving you with 6. Then you bought 5 more, giving you 11 apples. After eating 1, you have 10 left.

Prompt: I went to the market and bought 10 apples. I gave 2 apples to my neighbor and 2 to the repairman. Then I went and bought 5 more apples, and ate 1. How many apples do I have left? Let's think step by step

Completion: I started with 10 apples. Then I gave 2 to my neighbor, so now I have 8 apples. I gave 2 more to the repairman, so now I have 6 apples. Then I bought 5 more apples, so now I have 11 apples total. Finally, I ate 1 apple, so now I have 10 apples left.

This simple prompt is remarkably effective at completing this task. It's especially useful when you don't have many examples to include in your prompt.

Self-Consistency

Of all the techniques for improving generative models, self-consistency is among the more advanced. Proposed in the paper Self-Consistency Improves Chain of Thought Reasoning in Language Models (2022), it aims to "replace the naive greedy decoding approach used in chain-of-thought prompting." The idea is to generate multiple diverse reasoning paths through few-shot chain-of-thought, then use the generated results to select the most consistent answer. This helps improve chain-of-thought prompting performance on tasks involving arithmetic and commonsense reasoning.

Prompt: When I was 6 years old, my sister was half my age. Now I'm 70. How old is my sister?

Completion: Your sister should be 64 years old now, because when you were 6, she was 3, and since you've grown to 70, she should be 6 years younger than you.

Prompt: Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today? A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted. So, they must have planted 21 - 15 = 6 trees. The answer is 6. Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now? A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so in total he has 7 + 2 = 9 toys. The answer is 9. Q: There were nine computers in the server room. Every day, five more computers were installed, from Monday to Thursday. How many computers are in the server room now? A: There are 4 days from Monday to Thursday. 5 computers were added each day. That means 4 * 5 = 20 computers were added in total. There were 9 computers to start with, so now there are 9 + 20 = 29 computers. The answer is 29. Q: Michael had 58 golf balls. On Tuesday, he lost 23 golf balls. On Wednesday, he lost 2 more. How many golf balls did he have at the end of Wednesday? A: Michael started with 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On Wednesday he lost 2 more, so now he has 35 - 2 = 33 balls. The answer is 33. Q: When I was 6 years old, my sister was half my age. Now I'm 70. How old is my sister? A:

Completion: If you are 70 now, your sister should be 35 now. Because when you were 6, she was half your age, which is 3. And from then on, the age gap between you has always remained at 3 years, so when you are 70, your sister should be 70 - 3 = 67. But this answer doesn't seem logical, because from the information provided in the question, the ratio between these two ages is 1:2, but this answer gives a ratio of about 1:1.9. So there may be errors or missing information.

Generate Knowledge Prompting

LLMs — large language models — are constantly being improved, and one popular technique is the ability to integrate knowledge or information to help the model make more accurate predictions.

Using a similar line of thinking, could the model also generate knowledge before making predictions? This is what the paper Generated Knowledge Prompting for Commonsense Reasoning attempts — generating knowledge to include as part of the prompt. Specifically, how much does this help with tasks like commonsense reasoning?

Automatic Prompt Engineer (APE)

Large Language Models Are Human-Level Prompt Engineers proposes Automatic Prompt Engineer (APE), a framework for automatic instruction generation and selection. The instruction generation problem is framed as natural language synthesis, treated as a black-box optimization problem using LLMs to generate and search for candidate solutions.

The first step involves a large language model (acting as an inference model) that, given output demonstrations, generates candidate instructions for the task. These candidate solutions guide the search process. Instructions are executed using the target model, and the most suitable instruction is selected based on computed evaluation scores.

APE discovered zero-shot CoT prompts that outperform the human-designed "Let's think step by step" prompt (Kojima et al., 2022). The prompt "Let's work through this problem step by step to make sure we have the right answer." elicits chain-of-thought reasoning and improves performance on the MultiArith and GSM8K benchmarks:

This touches on an important topic related to prompt engineering: the idea of automatically optimizing prompts. While we won't explore this topic in depth in this guide, here are some key papers if you're interested:

  • AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts — proposes a gradient-guided search method to automatically create prompts for various tasks.
  • Prefix-Tuning: Optimizing Continuous Prompts for Generation — a lightweight alternative to fine-tuning that adds trainable continuous prefixes for NLG tasks.
  • The Power of Scale for Parameter-Efficient Prompt Tuning — proposes a mechanism for learning soft prompts via backpropagation.

5Y Capital finds, supports, and inspires lone entrepreneurs, providing them with support from the spiritual to the operational. We believe that if the "crazy" you in others' eyes starts to be believed in, the world will become a different place.

BEIJING · SHANGHAI · SHENZHEN · HONG KONG

WWW.5YCAP.COM