Vision / Visual Counting

Fruit Counting From Image

In this image task, the model must count the bananas, apples, and oranges in a photo, even where some of the fruit overlap. It has to identify each fruit type and give exact counts rather than a rough description.

Prompt

How many fruits of each type are in the image?

Bananas, apples, and oranges arranged on a wooden table.

Max tokens: 100K
Reasoning effort: High
Execution: Single-shot via API

Fortytwo Prime

Fortytwo

PASS 4 / 4

Based on the image, here is the count of each fruit type:
• Apples: 8
• Bananas: 7 (3 in the bunch on the left, 4 in the bunch on the right)
• Oranges: 5
Total: 20 fruits

Fortytwo identifies apples, bananas, and oranges, including the two banana bunches, and returns the exact answer-key counts.

vs
Grok 4.20

xAI

FAIL 1 / 4

9 apples, 6 bananas, and 6 oranges.

Breakdown:

  • Bananas: 3 in the left bunch + 3 in the right bunch = 6.
  • Apples: 2 in the top row + 4 in the middle row + 3 in the bottom row (including the red-and-yellow one and those with leaves) = 9.
  • Oranges: 2 in the upper-right area + 1 mid-right + 1 lower-center + 1 bottom-right + 2 at bottom-left (one with leaves) = 6.

Grok 4.20 identifies apples, bananas, and oranges, but overcounts apples (9 vs. 8) and oranges (6 vs. 5) while undercounting bananas (6 vs. 7).
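
The comparison against the answer key can be reproduced with a few lines of Python. This is only an illustrative sketch: the answer-key values are taken from the passing response above, the function names are hypothetical, and the rubric behind the "4 / 4" and "1 / 4" scores is not stated on this page, so the code compares raw counts rather than reproducing those scores. It also checks whether each part of Grok 4.20's breakdown sums to the count it states.

```python
# Answer-key counts, as returned by the passing response above.
ANSWER_KEY = {"apples": 8, "bananas": 7, "oranges": 5}

# Grok 4.20's stated counts and the per-fruit breakdown it gave.
GROK_STATED = {"apples": 9, "bananas": 6, "oranges": 6}
GROK_BREAKDOWN = {
    "bananas": [3, 3],
    "apples": [2, 4, 3],
    "oranges": [2, 1, 1, 1, 2],
}


def count_errors(predicted, key):
    """Return predicted minus key per fruit (positive means overcount)."""
    return {fruit: predicted[fruit] - key[fruit] for fruit in key}


def breakdown_mismatches(stated, breakdown):
    """Return fruits whose breakdown parts do not sum to the stated count.

    Each value is a (stated_count, sum_of_parts) pair.
    """
    return {
        fruit: (stated[fruit], sum(parts))
        for fruit, parts in breakdown.items()
        if sum(parts) != stated[fruit]
    }


if __name__ == "__main__":
    print(count_errors(GROK_STATED, ANSWER_KEY))
    # {'apples': 1, 'bananas': -1, 'oranges': 1}
    print(breakdown_mismatches(GROK_STATED, GROK_BREAKDOWN))
    # {'oranges': (6, 7)}
```

The second check shows that the orange breakdown lists seven fruits while the stated count is six, so the response is inconsistent with itself as well as with the answer key.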

Model verdicts