Bolt's Take | When We Talk About AI Applications, What Exactly Do We Mean?

Linear Capital · June 12, 2024

Over the past three months, we've looked at more than 100 AI-related applications. We'd like to share some of our perspectives on building AI applications. These views are based solely on what we know today — they may well contain errors, and are offered for reference only.

Models

For application developers, the most important thing is to turn advances in base model capabilities into your own advantage. If most of your development time goes into engineering workarounds for today's model limitations, ask yourself: if models get stronger tomorrow, will the work I'm doing still matter? This point is easy to misread. Unless you think about which directions models are most likely to improve in, you will end up concluding that nothing you do matters.
The most capable frontier models won't be open source, and will concentrate in a tiny number of teams. But open-source models are already good enough for the vast majority of applications, and will continue to get better and cheaper. In fact, there's no need for 400B+ models to be open source, because most people can't afford to run them anyway. A common path for application developers: start quickly with cloud-hosted models, then consider migrating to local open-source or customized models once you reach a certain scale.
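One way to keep the later migration cheap is to hide the model behind a thin interface from day one. The sketch below is only an illustration under stated assumptions: `ModelRouter`, `fake_cloud_backend`, and `fake_local_backend` are hypothetical names, and the two backends are stand-ins that return tagged strings rather than calling any real service.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A backend is any callable that maps a prompt string to a completion string.
Backend = Callable[[str], str]


def fake_cloud_backend(prompt: str) -> str:
    """Stand-in for a cloud-hosted model call (hypothetical, no network access)."""
    return f"[cloud] {prompt}"


def fake_local_backend(prompt: str) -> str:
    """Stand-in for a locally served open-source model (hypothetical)."""
    return f"[local] {prompt}"


@dataclass
class ModelRouter:
    """Sends every request to the one backend selected by configuration."""
    backends: Dict[str, Backend]
    active: str

    def complete(self, prompt: str) -> str:
        return self.backends[self.active](prompt)


router = ModelRouter(
    backends={"cloud": fake_cloud_backend, "local": fake_local_backend},
    active="cloud",
)
```

Application code only ever calls `router.complete(...)`, so moving from a hosted model to a local one becomes a configuration change (`router.active = "local"`) rather than a rewrite.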

Data

Data is always at the core of AI. Large models have already seen massive amounts of data during training. So which kinds of data can become a moat for an application developer?

Domain-specific data: This is an old chestnut. Such data often differs greatly from the distribution of large model training data, both in content and form. The key is choosing how to use this data based on different scenarios. Typically, data whose content is distinctive gets used for fine-tuning; data whose form is distinctive gets used for alignment; data meant to ensure accuracy gets used for RAG.
Knowhow data: Process documentation, best practices from experiments, and so on. Previously, developers had to digest and internalize this knowhow to embed it into applications for better results. One change that large models bring is that this data can now be directly or indirectly consumed by models—whether to help the model plan and reason, or to generate the domain-specific data mentioned above.
Feedback data: Record as much feedback as possible within your application. This feedback can be used to continue training models or training agents. In fact, whether an AI application team can build lasting advantages in effectiveness often depends on how efficiently they can utilize this feedback data.
- Explicit feedback: In some scenarios, this is direct results on task success or failure. In others, it comes from user evaluations of model outputs.
- Implicit feedback: Whether the user uses the model's output, or whether they continue. In a storytelling scenario, continuing is positive feedback; in a Q&A scenario, re-asking the question is negative feedback.
- Intermediate feedback: If it's a multi-step process, try to find feedback signals at each intermediate step. Isolating problems always makes them easier to optimize.
Prompts themselves: All prompts, model versions, and corresponding model outputs should be logged and managed. AI applications still follow certain principles of software engineering. Prompts are part of your code, and should live in your version control system and testing processes.
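The logging advice above can be sketched as a single record type that ties each model output to its prompt version, model version, and any feedback collected later. This is a minimal sketch, not a prescribed schema: `InteractionRecord`, `FeedbackEvent`, and all field and signal names are assumptions made for illustration.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional

FEEDBACK_KINDS = {"explicit", "implicit", "intermediate"}


@dataclass
class FeedbackEvent:
    kind: str                   # "explicit", "implicit", or "intermediate"
    signal: str                 # e.g. "task_succeeded", "re_asked_question"
    value: float                # positive for good outcomes, negative for bad
    step: Optional[str] = None  # pipeline step name, for intermediate feedback


@dataclass
class InteractionRecord:
    prompt_template: str
    prompt_version: str
    model_version: str
    rendered_prompt: str
    output: str
    timestamp: float = field(default_factory=time.time)
    feedback: List[FeedbackEvent] = field(default_factory=list)

    @property
    def prompt_hash(self) -> str:
        # Short content hash so a silently edited template is easy to spot.
        digest = hashlib.sha256(self.prompt_template.encode("utf-8"))
        return digest.hexdigest()[:12]

    def add_feedback(self, kind: str, signal: str, value: float,
                     step: Optional[str] = None) -> None:
        if kind not in FEEDBACK_KINDS:
            raise ValueError(f"unknown feedback kind: {kind}")
        self.feedback.append(FeedbackEvent(kind, signal, value, step))

    def to_json(self) -> str:
        # One JSON object per interaction, suitable for an append-only log.
        data = asdict(self)
        data["prompt_hash"] = self.prompt_hash
        return json.dumps(data, ensure_ascii=False)


record = InteractionRecord(
    prompt_template="Answer the question: {question}",
    prompt_version="v3",
    model_version="example-model-2024-06",  # hypothetical identifier
    rendered_prompt="Answer the question: What is RAG?",
    output="RAG combines retrieval with generation.",
)
record.add_feedback("implicit", "re_asked_question", -1.0)
record.add_feedback("intermediate", "retrieval_hit", 1.0, step="retrieve")
```

Keeping the template itself in version control and storing only its version and hash in the log keeps each record small while still making every output traceable to the exact prompt that produced it.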

Human-Machine Interaction (HMI)

HMI always seeks a balance between usability (efficiency) and accuracy (effectiveness). AI's comprehension capabilities have substantially raised the ceiling of this balance. A few years ago, entering personal information meant filling out a complex form; today a photo of your ID card or a standard resume is enough. That is the gain in interaction efficiency that machine comprehension brings.
The fundamental principles of AI application interaction haven't changed. For any given problem, the optimal interaction is always the simplest one that satisfies accuracy requirements.
If accuracy requirements can be met, natural language input is the most attractive input method in this new era—not only because of its extremely low barrier to entry, but because it opens up new application scenarios on new devices.
But returning to the desktop, natural language interaction is often merely supplementary. Sometimes it's for usability reasons—voice input is simply too exhausting. Sometimes it's for accuracy reasons, such as pixel-level image editing.
Often, a substantial leap in interaction itself can redefine an application.

Application-Model Interaction (AMI)

This is a new problem, and an interesting one. Application-model interaction resembles human interaction in many ways. The most important similarity: in this interaction, the model is the subject, and the application software is the object. Therefore, it is the application software (and sometimes the entire IT system) that provides suitable interaction to adapt to the AI. (Enhancing the model itself to gain stronger interaction capabilities is another direction, somewhat analogous to training a person to better use tools.)
Unlike HMI, AMI's primary goal is effectiveness—that is, ensuring the quality of model outputs. Today this may include improving the model's problem-solving capabilities, and ensuring its accuracy and stability. This goal has many measurable objective criteria, and therefore can be continuously optimized.
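To make "measurable objective criteria" concrete, here is a small sketch of an evaluation loop that scores a model on accuracy and on stability across repeated runs. The metric definitions, the `evaluate` function, and the toy model are all assumptions chosen for illustration; real applications would define these per task.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def evaluate(model: Callable[[str], str],
             cases: List[Tuple[str, str]],
             runs: int = 3) -> Dict[str, float]:
    """Score a model on exact-match accuracy and run-to-run stability.

    accuracy:  share of all (case, run) outputs equal to the expected answer.
    stability: per case, the share of runs that agree with the most common
               output, averaged over cases.
    """
    if not cases or runs < 1:
        raise ValueError("need at least one case and one run")
    correct = 0
    stability_sum = 0.0
    for prompt, expected in cases:
        outputs = [model(prompt).strip() for _ in range(runs)]
        correct += sum(1 for out in outputs if out == expected)
        top_count = Counter(outputs).most_common(1)[0][1]
        stability_sum += top_count / runs
    return {
        "accuracy": correct / (len(cases) * runs),
        "stability": stability_sum / len(cases),
    }


def make_toy_model() -> Callable[[str], str]:
    """Deterministic stand-in: stable on arithmetic, alternating otherwise."""
    calls = {"n": 0}

    def model(prompt: str) -> str:
        calls["n"] += 1
        if prompt == "2+2":
            return "4"
        return "Paris" if calls["n"] % 2 == 0 else "Lyon"

    return model


scores = evaluate(
    make_toy_model(),
    cases=[("2+2", "4"), ("capital of France", "Paris")],
    runs=4,
)
```

Tracking both numbers separately matters because a change to the prompt or context can raise average accuracy while making individual answers less consistent.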
From a channel perspective, AI-application interaction is just prompts, but it involves many dimensions. Overall it resembles HMI: what data to give the AI, when to give it, and in what form are choices that can produce vastly different results. On reflection, each of the following closely resembles human interaction.
- Interaction content: Take RAG as an example. Given a question, how much context is most appropriate? Questions like this determine the specific principles for embedding chunking, retrieval, and other operations.
- Interaction format: Much prompt engineering research discusses this. Generally, the format of received content can substantially affect large model performance. Models prefer reading clearly structured content—bullet points, JSON, Markdown, even code. For slightly complex reasoning problems, natural language descriptions may confuse the model, while a segment of pseudocode might be clearly understood. Of course, much of this content cleaning work can itself be done by models.
- Interaction modality: Sometimes the most appropriate input modality isn't text. For a model with visual capabilities, the most suitable input for a web information extraction task might be passing a screenshot of the webpage directly to the model—far more effective than feeding the entire page's HTML code as text.
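The content and format dimensions above can be sketched together: choose how much retrieved context to include, then present it in a clearly structured layout. This is a minimal sketch under stated assumptions. The chunk dictionaries with `source`, `text`, and `score` keys, the character budget (a rough proxy for a token budget), and the Markdown layout are all hypothetical choices, not a recommended standard.

```python
from typing import Dict, List


def select_context(chunks: List[Dict], max_chars: int) -> List[Dict]:
    """Keep the highest-scoring chunks that fit within a character budget."""
    selected: List[Dict] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        if used + len(chunk["text"]) <= max_chars:
            selected.append(chunk)
            used += len(chunk["text"])
    return selected


def build_prompt(question: str, chunks: List[Dict]) -> str:
    """Render context, question, and output format as separate Markdown sections."""
    lines = ["## Context"]
    for i, chunk in enumerate(chunks, start=1):
        lines.append(f"{i}. [{chunk['source']}] {chunk['text']}")
    lines += [
        "",
        "## Question",
        question,
        "",
        "## Output format",
        'Reply with JSON: {"answer": "...", "sources": [1]}',
    ]
    return "\n".join(lines)


retrieved = [
    {"source": "faq.md", "text": "Refunds take 5 days.", "score": 0.9},
    {"source": "blog.md", "text": "x" * 500, "score": 0.8},
    {"source": "policy.md", "text": "Refunds need a receipt.", "score": 0.7},
]
context = select_context(retrieved, max_chars=100)
prompt = build_prompt("How long do refunds take?", context)
```

With a budget of 100 characters the oversized `blog.md` chunk is skipped while the two short chunks are kept, which is the kind of trade-off the budget is meant to make visible and tunable.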
Much of what models receive as input actually comes from external systems, so these external systems need some adaptation to provide suitable information to AI. For example, a model can only choose well among external APIs if they come with detailed descriptions, so providing those descriptions through comments is work that external systems must do to adapt to AI. (Design documents could also be used, but they struggle to stay consistent with actual API implementations.) Many of today's systems weren't designed with AI usage in mind, so there is substantial adaptation work ahead, perhaps with opportunities for AI applications or infrastructure to emerge.
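One way to keep API descriptions consistent with the implementation is to derive them from the code's own signature and comments. The sketch below uses Python's standard `inspect` module for this. The `get_weather` function and the dictionary layout returned by `describe_tool` are hypothetical and do not follow any particular vendor's tool-calling schema.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints


def get_weather(city: str, unit: str = "celsius") -> str:
    """Return the current weather for a city.

    Hypothetical example API used only for illustration.
    """
    return f"Weather in {city} ({unit})"


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a model-facing description from a function's signature and docstring."""
    signature = inspect.signature(func)
    hints = get_type_hints(func)
    parameters: Dict[str, Dict[str, Any]] = {}
    for name, param in signature.parameters.items():
        hint = hints.get(name)
        parameters[name] = {
            # Fall back to "any" when a parameter has no type annotation.
            "type": getattr(hint, "__name__", str(hint)) if hint is not None else "any",
            # A parameter without a default value must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
    }


spec = describe_tool(get_weather)
```

Because the description is generated from the function itself, editing the signature or docstring updates what the model sees, which avoids the drift that separate design documents tend to suffer from.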

This essay has only scattered coverage of a small slice of relevant topics; many more questions deserve future discussion, such as how AI interacts with the external world on existing and future devices (but that's a large topic, perhaps for another time). Finally, this is an era of rapidly advancing technology. Every few months we see breakthrough changes that require revising many views. So we will keep thinking, and hope to share more here with everyone.

Linear Bolt

Bolt is Linear Capital's dedicated investment program for early-stage, globally-oriented AI applications. It upholds Linear's investment philosophy and principles, focusing on technology-driven transformative projects, and aims to help founders find the shortest path to their goals. Whether in speed of action or investment approach, Bolt's commitment is lighter, faster, and more flexible.