Commonsense Reasoning
Commonsense Reasoning Overview
Commonsense reasoning enables agents to make the natural, obvious inferences about the world that humans make effortlessly. For example, a robot asked to “find the weather outside” should not jump out of a window to check: it needs the commonsense knowledge that jumping from a height causes injury.
Key ideas:
A small set of primitive actions (about 14) organizes a vast amount of knowledge
Primitive actions compose into hierarchies of sub-actions
Actions cause state changes, enabling prediction of consequences
The Problem of Multiple Meanings
Two complementary challenges in understanding verbs:
One word, many meanings (polysemy): “eat” can mean to consume food, to erode (“river ate the riverbank”), to trouble or preoccupy (“eating him”), or to absorb a cost (“eat the losses”)
Many words, one meaning (synonymy): “ate,” “devoured,” “consumed,” “ingested,” “partook,” “dined on” all convey the same basic event
Additionally, sentences with different structures can share a meaning: “Bob shot Bill” and “Bob killed Bill with a gun” can describe the same event.
Primitive Actions
Rather than maintaining separate frames for every verb, we map verbs to a small set of primitive actions that capture deep meaning:
Move-object — physically moving something from one place to another
Propel — applying a force to an object, such as pushing or throwing it
Ingest — taking something into the body
Expel — removing something from the body
Move-body-part — moving a part of one’s own body
Grasp — grabbing/holding an object
Speak — communicating verbally
Attend — focusing sensory attention
Think — cognitive processing
Feel — experiencing emotion
Do — generic action (used when specific primitive is unclear)
(Plus a few more, for a total of about 14)
All synonymous verbs for eating (ate, devoured, consumed, ingested, partook) map to the single primitive action Ingest. The frame for Ingest specifies an agent, an object, an initial state (object outside the body), and a final state (object inside the body). Default values add further expectations to the final state, such as the object being dead and the agent being happy.
Mapping examples:
“John pushed the cart” → Propel (agent in contact with object)
“John took the book from Mary” → Move-object (transfer of possession)
“John ate ice cream with a spoon” → Ingest
“John decided to go to the store” → Think (decision) + Move-object (movement)
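The many-words-to-one-primitive mapping and the Ingest frame can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `VERB_TO_PRIMITIVE`, `IngestFrame`, and `primitive_for` are hypothetical, and the lexicon covers just a handful of verbs from the text.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon (assumption): surface verbs mapped to primitive actions.
VERB_TO_PRIMITIVE = {
    "ate": "Ingest",
    "devoured": "Ingest",
    "consumed": "Ingest",
    "ingested": "Ingest",
    "partook": "Ingest",
    "pushed": "Propel",
}


@dataclass
class IngestFrame:
    """Frame for the Ingest primitive, with default expectations."""
    agent: str
    obj: str
    initial_state: str = "object outside body"
    # Defaults only; context in a longer story could override any of them.
    final_state: dict = field(default_factory=lambda: {
        "object_location": "inside body",
        "object_alive": False,
        "agent_mood": "happy",
    })


def primitive_for(verb: str) -> str:
    """Return the primitive for a verb, falling back to the generic Do."""
    return VERB_TO_PRIMITIVE.get(verb.lower(), "Do")


frame = IngestFrame(agent="John", obj="ice cream")
```

Because every eating verb resolves to the same frame, knowledge about consequences is stored once on the frame rather than once per verb.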
Thematic Roles and Primitive Actions
Processing a sentence like “John pushed the cart”:
Bottom-up: encounter verb “pushed” → probe memory → retrieve frame for Propel
Top-down: frame specifies expected slots (agent, object) with rules for extraction
Rule for agent: “if a concept before the verb is animate → fill agent slot” → John
Rule for object: “if a concept after the verb is inanimate → fill object slot” → cart
The frame acts as a hook — once activated, it generates expectations and guides slot-filling. Processing combines bottom-up (data-driven) initiation with top-down (knowledge-driven) completion.
If the wrong frame is selected, difficulty in filling its slots signals the need to abandon it and try another frame. In longer stories, context from surrounding sentences resolves the ambiguity.
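The bottom-up and top-down steps above can be sketched as a toy slot filler. Everything here is an assumption made for illustration: the word sets, the `Frame` class, and `understand` are hypothetical, and treating any unknown non-stopword as inanimate is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon (assumption): a real system would use a parser
# and a richer knowledge base instead of these hand-written sets.
ANIMATE = {"john", "mary", "bob"}
STOPWORDS = {"the", "a", "an"}
VERB_TO_PRIMITIVE = {"pushed": "Propel", "ate": "Ingest"}


@dataclass
class Frame:
    primitive: str
    agent: Optional[str] = None
    obj: Optional[str] = None

    def is_complete(self) -> bool:
        # An unfilled slot hints that the wrong frame may have been chosen.
        return self.agent is not None and self.obj is not None


def understand(sentence: str) -> Optional[Frame]:
    words = [w.strip(".,!?").lower() for w in sentence.split()]

    # Bottom-up: the first known verb probes memory and retrieves a frame.
    verb_index = next(
        (i for i, w in enumerate(words) if w in VERB_TO_PRIMITIVE), None
    )
    if verb_index is None:
        return None
    frame = Frame(primitive=VERB_TO_PRIMITIVE[words[verb_index]])

    # Top-down: the frame's slot rules now guide what to look for.
    for word in words[:verb_index]:
        if word in ANIMATE:  # animate concept before the verb -> agent
            frame.agent = word
            break
    for word in words[verb_index + 1:]:
        # Simplification: anything not known to be animate counts as inanimate.
        if word not in ANIMATE and word not in STOPWORDS:
            frame.obj = word
            break
    return frame


frame = understand("John pushed the cart")
```

A caller could check `frame.is_complete()` and, when it returns `False`, discard the frame and try an alternative reading of the verb.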
Implied Actions
Some verbs don’t map directly to primitives. “John fertilized the field” implies “John put fertilizer on the field” — the implied action Move-object (put) is the true primitive.
“Bill shot Bob” implies “Bill propelled a bullet into Bob” — the primitive is Propel, with agent=Bill, object=bullet, destination=Bob.
When a verb doesn’t map cleanly, the agent must infer implied actions that do map to primitives. This itself is commonsense reasoning.
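One simple way to picture implied actions is a rewrite table that expands a verb into a primitive plus the object it leaves unstated. The sketch below assumes such a table exists; `IMPLIED_ACTIONS` and `expand_implied` are hypothetical names, and a hand-written table stands in for the inference a real agent would have to perform.

```python
from typing import Optional

# Hypothetical rewrite table (assumption): verb -> (primitive, implied object).
IMPLIED_ACTIONS = {
    "fertilized": ("Move-object", "fertilizer"),
    "shot": ("Propel", "bullet"),
}


def expand_implied(agent: str, verb: str, target: str) -> Optional[dict]:
    """Rewrite 'agent verb target' as a primitive-action frame, if known."""
    entry = IMPLIED_ACTIONS.get(verb.lower())
    if entry is None:
        return None  # no implied action known for this verb
    primitive, implied_object = entry
    return {
        "primitive": primitive,
        "agent": agent,
        "object": implied_object,  # the object the sentence left unstated
        "destination": target,
    }


shot = expand_implied("Bill", "shot", "Bob")
```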
Actions and Sub-actions
Primitive actions can decompose into sub-action sequences. For “Ashok puts the wedge on the block,” the Move-object action decomposes into:
Grasp the wedge
Move-body-part (arm) to position above block
Release (un-grasp) the wedge
This hierarchical decomposition parallels hierarchical planning — complex actions break into simpler constituents.
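The decomposition above can be represented as a small tree whose leaves are the steps to carry out in order. This is a minimal sketch; the `Action` class and `flatten` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    primitive: str
    obj: str
    sub_actions: List["Action"] = field(default_factory=list)


def flatten(action: Action) -> List[Action]:
    """Return the leaf actions in execution order (depth-first)."""
    if not action.sub_actions:
        return [action]
    leaves: List[Action] = []
    for sub in action.sub_actions:
        leaves.extend(flatten(sub))
    return leaves


# "Ashok puts the wedge on the block"
put_wedge = Action("Move-object", "wedge", [
    Action("Grasp", "wedge"),
    Action("Move-body-part", "arm"),
    Action("Release", "wedge"),
])

steps = [a.primitive for a in flatten(put_wedge)]
```

Because `flatten` recurses, any sub-action could itself be decomposed further without changing the caller.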
State Changes
Actions produce state changes. Frame representations capture both the action and resulting states:
“Susan comforted Jing” → Action: Do(Susan, something) → State change: Jing’s mood becomes happy
“Ashok enjoyed eating a frog” → Two frames: Ingest(agent=Ashok, object=frog) + Feel(agent=Ashok, object=enjoyment), connected by a “result” link
For sentences with multiple verbs (e.g., “Maria told Ben to throw the ball”):
Frame 1: Speak(agent=Maria, result=Frame 2)
Frame 2: Propel(agent=Ben, object=ball)
The “result” slot connects action frames, enabling multi-verb sentence understanding.
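The linked frames for “Maria told Ben to throw the ball” can be sketched with a frame type whose `result` slot points at another frame. The class and function names here are hypothetical, and the sketch covers only frame-to-frame links, not state-change results such as a change of mood.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActionFrame:
    primitive: str
    agent: str
    obj: Optional[str] = None
    result: Optional["ActionFrame"] = None  # link to the frame this one leads to


def follow_results(frame: Optional[ActionFrame]) -> List[str]:
    """Walk the result links and list the primitives in order."""
    primitives = []
    while frame is not None:
        primitives.append(frame.primitive)
        frame = frame.result
    return primitives


throw = ActionFrame("Propel", agent="Ben", obj="ball")        # Frame 2
tell = ActionFrame("Speak", agent="Maria", result=throw)      # Frame 1

sequence = follow_results(tell)
```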
Cognitive Connection
Commonsense reasoning is central to cognition but not fully understood. We use goals, context, and world models to decide what actions are reasonable. This extends beyond physical actions to the social world — theory of mind allows us to ascribe goals, beliefs, and desires to others, enabling commonsense social inferences.
Context is critical: it disambiguates polysemous words (one word, many meanings) and helps identify synonymous expressions (many words, one meaning). Representations built from primitive actions and frames give commonsense reasoning a formal structure for interpreting the world, enabling agents to predict the effects of actions and to reason about the causes of observed states.