POETIQ

🏆 Poetiq tops ARC-AGI-2 with Gemini variant

Image source: Poetiq

The Rundown: Six-person AI startup Poetiq just officially claimed the top spot on the ARC-AGI-2 reasoning benchmark, beating Google’s Gemini 3 Deep Think at less than half the cost by orchestrating existing models rather than building its own.

The details:

  • Poetiq’s meta-system adapts to new models within hours, achieving the top-ranked results shortly after Gemini 3 launched without any retraining.
  • Using Gemini 3 Pro as a base, Poetiq’s refinement system scored 54% at $30 per task — outpacing Google’s top variant Deep Think at 45% and $77.
  • The result makes Poetiq’s the first system to crack the 50% barrier on ARC-AGI-2; leading models struggled to hit 5% just six months ago.
  • The startup’s open-sourced approach uses LLMs to continuously refine their own outputs, with a built-in self-auditing system to ensure quality solutions.
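
For a rough feel of what "refine, then self-audit" can look like, here is a minimal sketch of a generic propose-audit-refine loop. It is not Poetiq's code: the `refine`, `toy_propose`, and `toy_audit` names are hypothetical, and the two toy functions are deterministic stand-ins for what would be LLM calls in a real system.

```python
from typing import Callable, Optional

def refine(
    task: str,
    propose: Callable[[str, Optional[str], Optional[str]], str],
    audit: Callable[[str, str], Optional[str]],
    max_rounds: int = 5,
) -> tuple[Optional[str], int]:
    """Generic propose-audit-refine loop (illustrative, not Poetiq's method).

    propose(task, previous_answer, feedback) returns a new candidate answer.
    audit(task, answer) returns None if the answer passes, else a critique.
    """
    answer: Optional[str] = None
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        answer = propose(task, answer, feedback)
        feedback = audit(task, answer)
        if feedback is None:  # the audit found nothing to fix, so stop early
            return answer, round_no
    return answer, max_rounds  # budget exhausted; return the last attempt

# Toy stand-ins (assumptions): a real system would call a model here.
def toy_propose(task: str, previous: Optional[str], feedback: Optional[str]) -> str:
    numbers = [int(x) for x in task.split()]
    if previous is None:
        return " ".join(map(str, numbers))      # naive first draft: echo the input
    return " ".join(map(str, sorted(numbers)))  # revised draft after a critique

def toy_audit(task: str, answer: str) -> Optional[str]:
    values = [int(x) for x in answer.split()]
    return None if values == sorted(values) else "output is not in ascending order"

result, rounds = refine("3 1 2", toy_propose, toy_audit)
print(result, rounds)
```

The `max_rounds` cap is what keeps cost per task bounded: each extra round means more model calls, so a loop like this trades spend for accuracy.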

Why it matters: ARC-AGI-2 scores climbing from under 5% to over 50% in a matter of months show how quickly the field is advancing. Poetiq’s result points to a future where AI gains come from two directions at once: frontier model development, and clever orchestration built on top of those models by teams without massive compute budgets.
