When you condition an LLM agent on a personality like "you are highly agreeable and conscientious," or on a rich life story about a cooperative schoolteacher from rural Vermont, does that actually change how it behaves in a social dilemma? Or does it just change how it talks about behaving? (I expand on this question in the notes section, along with my reflections on the Stanford paper Generative Agents: Interactive Simulacra of Human Behavior by Park et al.)
The Say-do gap: the distance between an agent's stated traits (what it "says" it is) and its observed behavior (what it "does") across social contexts. In human psychology, personality traits measured by instruments like the Big Five Inventory are known to predict behavior, but imperfectly, and with heavy moderation by context. The question is whether this holds for LLM agents, and whether the format of personality conditioning (trait scores vs. narrative backstory vs. both) matters for behavioral fidelity.
In this project I built a three-stage pipeline that generates calibrated synthetic populations, runs them through multi-agent social dilemmas, and analyzes whether Big Five traits actually predict cooperative, strategic, and prosocial behavior in simulation.
I. Population Generation
The first stage produces a synthetic population where each agent has both a target personality profile and a rich narrative identity, with measurable calibration between the two.
Big Five Trait Sampling
Each agent is assigned a target trait vector $\mathbf{t} = (O, C, E, A, N)$ over the Big Five dimensions: Openness ($O$), Conscientiousness ($C$), Extraversion ($E$), Agreeableness ($A$), and Neuroticism ($N$). Targets are sampled to cover the trait space (not clustered around population means) so the resulting population spans the full range of personality configurations.
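A minimal sketch of trait-space-covering sampling, assuming a 1-to-5 trait scale and a Latin-hypercube-style stratification; the exact sampling scheme is not specified in the write-up:

```python
import numpy as np

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

def sample_population(n_agents: int, seed: int = 0) -> np.ndarray:
    """Sample target Big Five vectors spread over the trait space.

    Stratifies each dimension independently (one agent per stratum,
    in random order) so every trait covers the full 1-5 range instead
    of clustering around the population mean.
    """
    rng = np.random.default_rng(seed)
    cols = []
    for _ in TRAITS:
        strata = rng.permutation(n_agents)        # which bin each agent falls in
        jitter = rng.uniform(0.0, 1.0, n_agents)  # position within the bin
        cols.append(1.0 + 4.0 * (strata + jitter) / n_agents)  # map to [1, 5)
    return np.column_stack(cols)

targets = sample_population(100)
```

Compared with plain uniform sampling, the stratification guarantees that extreme profiles (e.g., very low agreeableness) are represented even in small populations.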
Life Story Generation
For each target vector $\mathbf{t}$, the LLM generates a persona life story: a paragraph-length narrative biography consistent with those traits. A highly agreeable, low-neuroticism profile might yield a community organizer who mediates neighborhood disputes; a low-agreeableness, high-openness profile might produce an itinerant documentary filmmaker who alienates collaborators.
Calibration via BFI Self-Report
Each generated persona is then administered the standard 44-item BFI inventory: the LLM answers the questionnaire in character, and responses are scored to produce a measured trait vector $\hat{\mathbf{t}}$. The calibration gap $\lVert \mathbf{t} - \hat{\mathbf{t}} \rVert$ quantifies how well the life story actually encodes the intended personality. Personas with large calibration gaps can be regenerated or flagged, which ensures the population entering simulation has known, verified trait profiles.
This matters because it separates two potential failure modes. The LLM might fail to write a story consistent with the target traits, or it might fail to embody a story consistently when acting in character. The calibration step catches the first failure before it contaminates downstream results.
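A minimal sketch of the calibration check, assuming a mean-absolute-difference gap on the 1-5 scale and an illustrative regeneration threshold; the write-up does not fix either choice:

```python
import numpy as np

def calibration_gap(target: np.ndarray, measured: np.ndarray) -> float:
    """Mean absolute gap between target and BFI-measured traits (1-5 scale)."""
    return float(np.mean(np.abs(target - measured)))

def flag_miscalibrated(targets: np.ndarray, measured: np.ndarray,
                       threshold: float = 0.5) -> np.ndarray:
    """Indices of personas whose life story failed to encode the intended
    traits closely enough; these get regenerated before simulation."""
    gaps = np.mean(np.abs(targets - measured), axis=1)
    return np.where(gaps > threshold)[0]
```

The threshold trades off persona diversity against trait fidelity: a stricter cutoff yields a cleaner population but more regeneration calls.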
II. Social Simulation
The second stage runs a factorial experiment: three personality conditioning formats crossed with three social scenarios.
Conditioning Formats
| Format | Description |
|---|---|
| Life story only | The agent's system prompt contains the narrative biography but no explicit trait scores. The agent must derive its behavioral tendencies from the story. |
| BFI only | The system prompt contains the five trait scores (e.g., "Agreeableness: 4.2/5, Neuroticism: 1.8/5") with no narrative context. The agent must interpret abstract numbers as behavioral dispositions. |
| Life story + BFI | Both the narrative and the scores are provided. This tests whether redundant personality information (the same traits expressed in two formats) produces more consistent behavior than either alone. |
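As a concrete illustration, prompt assembly for the three conditions might look like the following; the exact prompt wording and condition names are hypothetical:

```python
def build_system_prompt(story: str, traits: dict, condition: str) -> str:
    """Assemble the personality section of an agent's system prompt
    for one of the three conditioning formats."""
    scores = ", ".join(f"{name.capitalize()}: {v:.1f}/5"
                       for name, v in traits.items())
    if condition == "life_story":
        return f"Your background:\n{story}"
    if condition == "bfi":
        return f"Your personality trait scores:\n{scores}"
    if condition == "life_story_bfi":
        return (f"Your background:\n{story}\n\n"
                f"Your personality trait scores:\n{scores}")
    raise ValueError(f"unknown condition: {condition}")
```

Because only this section of the system prompt varies across conditions, any behavioral difference between cells can be attributed to the conditioning format rather than to other prompt content.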
Social Scenarios
Each scenario is a multi-agent, multi-round interaction designed to elicit different facets of social behavior.
Public goods game: The classic contribution dilemma. Each agent receives an endowment and decides how much to contribute to a shared pool. The pool is multiplied by a factor and split equally. Individual rationality says contribute nothing; collective welfare says contribute everything. The tension between cooperation and free-riding makes this a clean test of agreeableness and prosociality.
Contribution share for agent $i$ in round $t$:

$$s_{i,t} = \frac{c_{i,t}}{e_i}$$

where $c_{i,t}$ is the contribution and $e_i$ is the endowment. Payoff:

$$\pi_{i,t} = e_i - c_{i,t} + \frac{r}{n} \sum_{j=1}^{n} c_{j,t}$$

where $r$ is the pool multiplier and $n$ is the number of agents.
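A minimal sketch of one round's payoffs under these rules; the multiplier value is illustrative:

```python
def public_goods_payoffs(contributions: list, endowments: list,
                         multiplier: float = 1.6) -> list:
    """Payoffs for one round of the public goods game: each agent keeps
    what it did not contribute, plus an equal share of the multiplied pool."""
    n = len(contributions)
    pool_share = multiplier * sum(contributions) / n
    return [e - c + pool_share for c, e in zip(contributions, endowments)]

# Two agents, endowment 10 each: a full contributor vs. a free-rider.
payoffs = public_goods_payoffs([10, 0], [10, 10])
```

With a multiplier below the group size, the free-rider always out-earns the contributor in a single round, which is exactly the tension the scenario is built to probe.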
Negotiation: A fairness and bargaining scenario. Agents must divide a resource under asymmetric information or power. Tests whether stated agreeableness translates into fair offers or whether agents optimize regardless of persona.
Collaboration: A consensus-building task where agents must converge on a shared decision. Tests leadership emergence, constructive engagement, and whether high-extraversion agents actually drive group dynamics or just talk more.
Agent Architecture
Each agent wraps an LLM (Gemini by default, via LiteLLM) with a prompt/memory integration layer. At every step of the process, the agent receives the current game state, its own history of actions and observations, and its personality conditioning. It produces a natural-language reasoning trace and a structured action. Memory accumulates across rounds so agents can develop strategies, build (or lose) trust, and respond to other agents' behavior over time.
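The per-step loop might be sketched as follows; `llm_call`, the `ACTION:` reply protocol, and the ten-entry memory window are illustrative assumptions, not the project's actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    persona_prompt: str                     # personality conditioning
    memory: list = field(default_factory=list)  # accumulated observations

def agent_step(state: AgentState, game_state: str, llm_call):
    """One decision step: condition on persona + memory + current state,
    then split the reply into a reasoning trace and a structured action."""
    prompt = "\n\n".join([
        state.persona_prompt,
        "History:\n" + "\n".join(state.memory[-10:]),  # recent rounds only
        "Current state:\n" + game_state,
        "Reply with your reasoning, then a final line 'ACTION: <amount>'.",
    ])
    reply = llm_call(prompt)
    action = reply.rsplit("ACTION:", 1)[-1].strip()
    state.memory.append(f"state={game_state} action={action}")
    return reply, action
```

Because memory persists across calls, later prompts carry the agent's own track record, which is what lets trust and retaliation dynamics emerge over rounds.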
III. Analysis Pipeline
The third stage extracts behavioral and qualitative metrics from simulation transcripts and tests whether personality predicts behavior.
LLM-as-Judge Evaluation
An independent LLM evaluator scores each agent's behavior on five qualitative dimensions (fairness, cooperative intent, strategic constructiveness, leadership, and toxicity). Bias calibration is applied to control for evaluator tendencies. The judge sees anonymized transcripts with no access to the agent's personality conditioning, preventing trait-label leakage into qualitative scores.
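One simple form of bias calibration is to z-score each judged dimension across agents, so a systematically lenient or harsh evaluator cancels out; this is a sketch of that idea, not necessarily the calibration the project uses:

```python
import numpy as np

def calibrate_scores(raw: np.ndarray) -> np.ndarray:
    """Standardize judge scores per dimension (columns) across all agents,
    removing evaluator-level leniency and dimension-specific scale drift.

    raw: array of shape (n_agents, n_dimensions) of raw judge scores.
    """
    return (raw - raw.mean(axis=0)) / raw.std(axis=0)
```

After standardization, scores are only meaningful relative to the evaluated population, which is exactly what the downstream regressions compare.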
Regression Models
The core analysis fits a sequence of increasingly specified regression models on both behavioral outcomes (contribution share, fairness of offers) and judged outcomes (cooperative intent, leadership scores):
Model 1 (Agreeableness only). A baseline testing the single trait most theoretically linked to cooperation:

$$y_i = \beta_0 + \beta_A A_i + \varepsilon_i$$

Model 2 (Full BFI). All five traits as predictors:

$$y_i = \beta_0 + \beta_O O_i + \beta_C C_i + \beta_E E_i + \beta_A A_i + \beta_N N_i + \varepsilon_i$$

Model 3 (Full specification). BFI traits plus conditioning format indicators and scenario fixed effects, testing whether the way personality is communicated moderates its behavioral impact:

$$y_i = \beta_0 + \sum_{k \in \{O, C, E, A, N\}} \beta_k k_i + \boldsymbol{\gamma}^\top \mathbf{f}_i + \boldsymbol{\delta}^\top \mathbf{s}_i + \varepsilon_i$$

where $\mathbf{f}_i$ and $\mathbf{s}_i$ are indicator vectors for conditioning format and scenario. If conditioning format matters, $\boldsymbol{\gamma}$ will be significant: the same personality produces different behavior depending on whether it was expressed as a story, as numbers, or both.
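With statsmodels, the three models can be fit directly from formulas. The toy DataFrame below stands in for the real simulation output, and the column names are assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

# Toy data: one row per agent run, with trait scores, conditioning
# format, scenario, and a contribution share driven by agreeableness.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({t: rng.uniform(1, 5, n) for t in TRAITS})
df["format"] = rng.choice(["story", "bfi", "both"], n)
df["scenario"] = rng.choice(["public_goods", "negotiation", "collab"], n)
df["contribution_share"] = np.clip(
    0.15 * df["agreeableness"] + rng.normal(0, 0.05, n), 0, 1)

traits = " + ".join(TRAITS)
m1 = smf.ols("contribution_share ~ agreeableness", data=df).fit()
m2 = smf.ols(f"contribution_share ~ {traits}", data=df).fit()
m3 = smf.ols(f"contribution_share ~ {traits} + C(format) + C(scenario)",
             data=df).fit()
```

`C(format)` and `C(scenario)` expand into the indicator vectors of Model 3, so the format coefficients can be read straight out of `m3.params`.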
Behavioral Archetypes
Beyond trait-level analysis, I clustered agents by their full behavioral profiles, using t-SNE for dimensionality reduction and silhouette-score-based model selection for the cluster count. This produces an archetype landscape: emergent behavioral types (e.g., "consistent cooperators," "strategic defectors," "conditional reciprocators") that may or may not align with the Big Five dimensions used to generate the population.
The interesting case is when archetypes cut across trait profiles, i.e., when a high-agreeableness agent and a low-agreeableness agent end up in the same behavioral cluster because the scenario or conditioning format overrode the trait signal.
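A sketch of the clustering step with scikit-learn; using KMeans on the t-SNE embedding is an assumption, since the write-up specifies only t-SNE plus silhouette-based selection:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

def find_archetypes(behavior: np.ndarray, k_range=range(2, 8), seed: int = 0):
    """Embed per-agent behavioral profiles with t-SNE, then pick the
    cluster count in k_range that maximizes the silhouette score."""
    embedded = TSNE(n_components=2, random_state=seed,
                    perplexity=min(30, len(behavior) - 1)).fit_transform(behavior)
    best_k, best_labels, best_score = None, None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(embedded)
        score = silhouette_score(embedded, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels, embedded
```

One caveat worth keeping in mind: t-SNE distorts global distances, so silhouette scores computed in the embedded space reward visual separability rather than separation in the original behavioral space.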
IV. Stack
| Component | Tool |
|---|---|
| LLM access | LiteLLM (Gemini default) |
| Configuration | Hydra (config groups: data, population, agent, simulation, analysis, calibration, output) |
| Agent framework | custom base + LLM agent with prompt/memory integration |
| Evaluation | LLM-as-judge with bias calibration, regression via statsmodels |
| Clustering | t-SNE + silhouette model selection via scikit-learn |