LLM Output Is "Human Consensus": Good, but...
As large language models (LLMs) like GPT become more prevalent in our daily lives, it’s essential to understand their nature and limitations. One way to frame their outputs is through the lens of "human consensus": the knowledge and understanding that society widely accepts and that is embedded in the datasets these models are trained on. However, human consensus exists at different levels, and not all forms of knowledge fit neatly into this framework. Recognizing these levels and limitations is crucial for making the best use of LLMs and for understanding where they fall short.
Levels of Human Consensus in LLM Outputs
LLMs can be seen as navigating various "layers" of human consensus, each one representing a deeper or more specialized form of knowledge. In a sense, these layers move from the basics we learn early in life to more advanced reasoning that professionals or researchers acquire. Here’s a breakdown of the consensus levels LLMs typically output, as illustrated in the chart:
Natural Language Consensus (2022): At the most fundamental level, LLMs can understand and produce natural language, following basic grammar, vocabulary, and structures. This level corresponds to what an average person might learn in middle school.
Evaluation Consensus (2023): As LLMs develop, they learn to make basic evaluations or judgments, reflecting general societal values or accepted norms. This ability is rooted in widely shared opinions and cultural standards.
Moral Consensus (2023): LLMs are also trained on datasets that include ethical and moral judgments. Although such judgments are often complex and culturally specific, LLMs can replicate the general principles of right and wrong commonly held in many societies.
Artistic Consensus (2023): By this level, LLMs can discuss art and culture, capturing aesthetic opinions and trends. Though art is subjective, certain popular interpretations and stylistic trends form a cultural consensus that LLMs can mimic.
Introductory Knowledge Consensus (2024): LLMs progress to introductory knowledge within specialized fields, similar to what is covered in undergraduate courses. This knowledge allows them to handle fundamental questions across diverse domains, though without deep expertise.
Professional Knowledge Consensus (2025): Moving to more complex levels, LLMs begin to emulate professional-level insights, drawing on the language and knowledge of experts. While still consensus-driven, this knowledge represents the technical understanding typical of trained professionals.
Logic Consensus (2025): At the highest level, LLMs can engage in logical reasoning, making inferences based on patterns within their training data. This “logic consensus” allows for a form of structured, reasoned output similar to the analytic skills developed at advanced academic levels.
These levels reflect the expanding capabilities of LLMs to replicate what is broadly accepted within human knowledge. They excel in areas where society has established a widely shared understanding, enabling LLMs to generate responses that align with common beliefs and knowledge structures.
Limitations: What Falls Outside Human Consensus?
While LLMs can simulate a broad array of human knowledge, there are areas where consensus doesn’t exist, or where knowledge is too specialized or context-dependent for these models to handle effectively. Here are some examples where LLMs struggle:
Complex Workflows with High Interdependency: Scenarios involving multiple individuals with unique contributions and interdependent roles can be challenging to describe accurately in natural language. For example, planning a large collaborative project with specific tasks and decision points for each participant often requires a visual flowchart rather than a purely textual description.
Technical Engineering Diagrams and Spatial Designs: Fields such as engineering and architecture depend heavily on spatial layouts and symbols. Circuit diagrams, CAD models, and architectural plans are difficult to translate into natural language and require precise, non-verbal representation.
Scientific Simulations and Complex Calculations: Scientific research often relies on simulations, calculations, and precise mathematical models that go beyond LLM capabilities. These tasks require exact algorithms and cannot be fully represented by language alone.
Dynamic Software Architectures: Large-scale software development requires handling intricate architectures, dependencies, and modular interactions that are better represented with code and architectural diagrams than with text-based explanations. LLMs struggle to maintain coherence across complex software structures.
Subjective Sensory and Aesthetic Experiences: Describing sensory details or subjective experiences—like the texture of a fabric or the nuances of a musical composition—is challenging for LLMs. These experiences often have personal or cultural variations that cannot be easily captured through text.
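The first limitation above notes that interdependent workflows are easier to capture in a structured form, such as a flowchart, than in prose. As a minimal sketch of that idea, the workflow below is written as a dependency graph and ordered mechanically into stages. The task names, owners, and the `execution_stages` helper are hypothetical illustrations, and the example assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each task maps to (owner, set of tasks it depends on).
# Task names and owners are illustrative only.
WORKFLOW = {
    "requirements": ("Alice", set()),
    "design":       ("Bob",   {"requirements"}),
    "budget":       ("Carol", {"requirements"}),
    "build":        ("Dave",  {"design", "budget"}),
    "review":       ("Alice", {"build"}),
    "launch":       ("Carol", {"review", "budget"}),
}

def execution_stages(workflow):
    """Group tasks into stages. Every task in a stage has all of its
    dependencies completed in earlier stages, so tasks within one stage
    could be worked on in parallel."""
    graph = {task: deps for task, (_, deps) in workflow.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)  # mark them finished to unlock dependent tasks
    return stages

if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(WORKFLOW), start=1):
        owners = ", ".join(f"{task} ({WORKFLOW[task][0]})" for task in stage)
        print(f"Stage {i}: {owners}")
```

The point of the sketch is not the sorting itself but the representation: once dependencies are explicit data rather than sentences, questions like "what can start now?" or "is there a circular dependency?" have exact answers instead of depending on how a paragraph was phrased.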
Why Recognizing These Boundaries Matters
LLMs are powerful tools for replicating human consensus and reflecting widely shared knowledge. But for tasks that demand a high degree of precision, customization, or interdisciplinary expertise, LLMs reach their limits. Visual tools, specialized software, and expert human intervention remain essential for tasks that don’t fit the consensus model.
Understanding the boundaries of LLMs helps us set realistic expectations and ensures that we use these models where they are most effective. Future advancements might improve LLMs’ abilities in specialized fields, but acknowledging their limitations will enable us to leverage them as powerful assistants rather than all-encompassing solutions.
In short, LLMs offer a remarkable way to access human consensus across knowledge levels, yet they remind us that not all knowledge fits neatly within shared understanding. For now, we must balance LLMs with other tools and human expertise to address the complex, unique, and often ambiguous areas that make human intelligence so distinctive.
"Human consensus" is a term I proposed first; noting it here for the record 😝