🔬 Deep Researcher with Test-Time Diffusion (Google Cloud; 2025) Paper Review

August 22, 2025 6 minutes

An AI Deep Research Agent that, inspired by the human research process, improves its work through diffusion

Test-Time Diffusion Deep Researcher; TTD-DR

Paper Overview

  • Title: Deep Researcher with Test-Time Diffusion
  • Authors: Rujun Han, Yanfei Chen, et al. (Google Cloud AI Research, Google Cloud)
  • Published: July 2025, arXiv:2507.16075v1
  • Field: AI Research Agents, Large Language Models, Test-Time Scaling

Paper Summary

The Google Cloud AI Research team has presented TTD-DR (Test-Time Diffusion Deep Researcher), an AI research agent that mimics the human research process. To overcome the performance limits that existing Deep Research agents show when generating complex long-form research reports, the paper models the human plan-draft-revise cycle as a diffusion process. It combines a denoising process that progressively refines an initial "noisy" draft with a self-evolution algorithm that optimizes each component individually. In the experiments, TTD-DR reaches win rates of 69.1% and 74.5% against OpenAI Deep Research on the two long-form report benchmarks, clearly outperforming existing research agents.

Can AI Really Think Like a Researcher?


Large language models such as ChatGPT and Claude have recently begun to go beyond simple question answering and take on complex research work. The limits are still clear, however. Performance drops sharply on Deep Research tasks in particular, meaning higher-order research that requires multiple stages of information gathering, analysis, and synthesis.

The core of the problem is that existing AI research agents work differently from the way humans actually do research. A human researcher does not write linearly from the first sentence to the last. Instead, they lay out an overall plan, write a draft, and then revise it repeatedly while looking up additional material.

The Google research team focused on exactly this point. Can an AI also do research the way a human does, by "thinking, drafting, and revising"?

Background and Problem Definition

Limitations of Current Deep Research Agents


Existing Deep Research agents such as OpenAI Deep Research, Perplexity Deep Research, and Grok DeepSearch are mostly built by combining test-time scaling techniques such as Chain-of-Thought, Monte Carlo Tree Search, and self-refinement.

This approach has the following problems:

  1. Linear or parallelized processing: plan → search → generate is handled as a fixed sequence
  2. Loss of global context: over a long research process, early information is lost or coherence degrades
  3. Gap from the human cognitive process: it does not reflect how real researchers work, and performance plateaus when generating complex long-form reports

The Human Research Process vs. Diffusion

According to cognitive science research, people writing about a complex topic follow this pattern:

  • High-level planning (designing the overall structure)
  • Draft writing (producing a first draft)
  • Multiple revision cycles (revising repeatedly)
  • Literature search during revision (gathering additional material while revising)

This closely resembles the way a diffusion model progressively refines a noisy image (denoising).

| Diffusion | Human research process |
| --- | --- |
| Noisy initial image | Incomplete initial draft |
| Denoising process | Iterative revision and improvement |
| External conditioning information | Searching for reference material |
| High-quality final image | Completed final research report |

An AI Agent That Researches via Diffusion

The core idea of TTD-DR is to model research report generation as a diffusion process.

“We conceptualize the generation of a complex research report as a diffusion process where an initial, noisy draft is progressively refined into a high-quality final output.”


Two Core Mechanisms

1. Denoising with Retrieval

  • Write an initial research report (based mainly on the LLM's internal knowledge)
  • At each denoising step, strengthen the content with externally retrieved information
  • The draft and the research plan dynamically guide the direction of the next search

2. Self-Evolution

  • Optimize each component (plan, question, answer, report generation) individually
  • Encourage diverse knowledge exploration and mitigate information loss
  • Provide better context to the diffusion process

Test-Time Diffusion Deep Researcher (TTD-DR)

A three-stage backbone model plus two optimization techniques

Backbone Deep Research Agent


The basic structure of TTD-DR consists of three stages:

Stage 1: Research Plan Generation

  • Takes the user query and generates a structured research plan
  • Lists the key areas the final report needs to cover
  • Serves as the initial guideline for the subsequent information gathering

Stage 2: Iterative Search and Synthesis

  • 2a) Search Question Generation: generates a search query from the research plan and the previous context
  • 2b) Answer Searching: searches external sources for relevant documents and returns a summarized answer
  • The loop repeats until the research plan is sufficiently covered or the maximum number of iterations is reached

Stage 3: Final Report Generation

  • Synthesizes the plan from Stage 1 and the question-answer pairs from Stage 2
  • Generates a comprehensive and coherent final report
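The three stages above can be sketched as a small control loop. This is only an illustrative outline, not the authors' implementation: `llm` and `search` are hypothetical text-in/text-out callables, the prompt strings are invented, and the `DONE` stop signal is an assumed way to express "the plan is sufficiently covered".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-ins: any text-in/text-out LLM call and any search call.
LLM = Callable[[str], str]
Search = Callable[[str], str]


@dataclass
class ResearchState:
    query: str
    plan: str = ""
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)

    def history(self) -> str:
        """Render the collected question-answer pairs as plain text."""
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.qa_pairs)


def run_backbone(query: str, llm: LLM, search: Search, max_iters: int = 5) -> str:
    state = ResearchState(query=query)

    # Stage 1: produce a structured research plan from the user query.
    state.plan = llm(f"Write a research plan for: {query}")

    # Stage 2: alternate question generation (2a) and answer searching (2b)
    # until the plan is covered or the iteration budget is used up.
    for _ in range(max_iters):
        question = llm(
            f"Plan:\n{state.plan}\nHistory:\n{state.history()}\n"
            "Next search question, or DONE if the plan is covered:"
        )
        if question.strip() == "DONE":  # assumed stop signal
            break
        state.qa_pairs.append((question, search(question)))

    # Stage 3: synthesize the plan and the Q&A pairs into the final report.
    return llm(
        f"Write the final report.\nPlan:\n{state.plan}\nFindings:\n{state.history()}"
    )
```

In practice `llm` would wrap a model API and `search` a retrieval tool; passing them in as plain callables keeps the control flow testable with stubs.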

Component-wise Self-Evolution


An algorithm that improves the agent's performance at each stage individually:

  1. Initial States: generate several answer variants with different parameters (adjusting temperature and top_k)
  2. Environmental Feedback: rate Helpfulness and Comprehensiveness with an LLM-as-a-judge*
  3. Revision Step: improve each variant based on the feedback
  4. Cross-over: merge the improved variants into a single high-quality output

*LLM-as-a-judge: an LLM that evaluates the quality of text generated by another LLM
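A minimal sketch of the four steps, assuming the generator, judge, reviser, and merger are all supplied as plain callables. The names `generate`, `judge`, `revise`, and `merge`, the default temperatures, and the number of revision rounds are all hypothetical; in a real system each callable would be backed by an LLM call.

```python
from typing import Callable, List, Sequence

# Hypothetical callables; in a real system each would be backed by an LLM call.
Generate = Callable[[str, float], str]  # (prompt, temperature) -> candidate
Judge = Callable[[str], str]            # candidate -> textual feedback
Revise = Callable[[str, str], str]      # (candidate, feedback) -> revised candidate
Merge = Callable[[List[str]], str]      # candidates -> single merged output


def self_evolve(
    prompt: str,
    generate: Generate,
    judge: Judge,
    revise: Revise,
    merge: Merge,
    temperatures: Sequence[float] = (0.3, 0.7, 1.0),
    revision_rounds: int = 2,
) -> str:
    # 1. Initial states: sample diverse variants by varying the temperature.
    variants = [generate(prompt, t) for t in temperatures]

    for _ in range(revision_rounds):
        # 2. Environmental feedback: the judge critiques each variant.
        # 3. Revision step: each variant is rewritten using its own feedback.
        variants = [revise(v, judge(v)) for v in variants]

    # 4. Cross-over: merge the revised variants into one output.
    return merge(variants)
```

Each variant is revised only against its own feedback, so the variants stay diverse until the final merge. That ordering is a choice made for this sketch.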

Report-level Denoising with Retrieval

The core algorithm, inspired by the sampling process of diffusion models

  • The preliminary draft is treated as the "noisy" starting point
  • Quality improves progressively through iterative refinement
  • At each step, a retrieval mechanism dynamically incorporates external information

This forms a continuous feedback loop in which the steadily improving draft guides the search and the search refines the draft. It keeps the report coherent while steering the research in the right direction.
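The feedback loop between draft and search can be sketched as follows. This is an assumption-laden outline rather than the paper's algorithm: `llm` and `search` are hypothetical callables, the prompt layout is invented, and a fixed `num_steps` stands in for whatever stopping rule the real system uses.

```python
from typing import Callable, List

# Hypothetical stand-ins for an LLM call and a search tool.
LLM = Callable[[str], str]
Search = Callable[[str], str]


def denoise_with_retrieval(
    query: str, llm: LLM, search: Search, num_steps: int = 3
) -> List[str]:
    """Return every draft version, from the noisy first draft to the final one."""
    plan = llm(f"PLAN for: {query}")

    # The "noisy" starting point: a draft written from internal knowledge only.
    draft = llm(f"DRAFT from internal knowledge only: {query}")
    drafts = [draft]

    for _ in range(num_steps):
        # The current draft (together with the plan) guides the next search.
        question = llm(f"QUESTION\nPlan:\n{plan}\nCurrent draft:\n{draft}")
        evidence = search(question)

        # The retrieved evidence refines ("denoises") the draft.
        draft = llm(f"REVISE\nCurrent draft:\n{draft}\nNew evidence:\n{evidence}")
        drafts.append(draft)

    return drafts
```

Returning the whole list of drafts rather than only the last one makes it easy to inspect how much each step changed the report.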

Experimental Results

Experimental Setup

Evaluation datasets:

  • LongForm Research: 205 real-world queries from industry domains
  • DeepConsult: Deep Research prompts related to business and consulting
  • HLE-Search: 200 queries selected from Humanity's Last Exam that require search
  • GAIA: a benchmark of multi-hop questions for evaluating real-world AI capabilities

Evaluation methods:

  • Side-by-side comparison: two reports are compared directly to judge which is better
  • Helpfulness & Comprehensiveness: used to evaluate long-form LLM responses
  • Human-calibrated LLM-as-a-judge: the judge's alignment with human raters is compared
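As a small illustration of how a side-by-side comparison turns into a win-rate number, the sketch below aggregates per-query verdicts. The verdict labels (`"A"`, `"B"`, `"tie"`) and the option of counting a tie as half a win are assumptions made for this example; the review does not say how the paper handles ties.

```python
from collections import Counter
from typing import Iterable


def win_rate(verdicts: Iterable[str], ties_count_half: bool = True) -> float:
    """Fraction of comparisons won by system A.

    Each verdict is "A", "B", or "tie" (labels assumed for this sketch).
    Counting a tie as half a win is one common convention, not necessarily
    the one used in the paper.
    """
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts given")
    wins = counts["A"] + (0.5 * counts["tie"] if ties_count_half else 0.0)
    return wins / total


# Two wins, one loss, and one tie over four queries.
print(win_rate(["A", "A", "B", "tie"]))  # 0.625
```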

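Performance Comparison

| System | LongForm Research (win rate vs. OpenAI Deep Research) | DeepConsult (win rate vs. OpenAI Deep Research) | HLE-Search (correctness) | GAIA (correctness) |
| --- | --- | --- | --- | --- |
| TTD-DR (ours) | 69.1% | 74.5% | 33.9% | 69.1% |
| OpenAI Deep Research | - | - | 29.1% | 67.4% |
| Perplexity Deep Research | 21.8% | 32.0% | 14.5% | 54.5% |
| Grok DeeperSearch | 16.1% | 16.0% | 19.3% | 47.9% |
| GPT-Researcher | 18.3% | 9.4% | 2.0% | 37.7% |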

Summary

  1. Performance advantage: on long-form research report generation, a win rate two to three times that of existing systems, or more
  2. Efficient test-time scaling: better performance at a similar latency
  3. Effect of self-evolution: substantially increases the complexity of search questions and answers, making the gathered information richer
  4. Effect of denoising: the first 9 steps alone already incorporate 51.2% of the final report's information

Implications

Academic Significance

TTD-DR is a first attempt to systematically reflect the human cognitive process in the design of an AI system. Its contribution is not a mere combination of existing techniques, but an abstraction of the human writing patterns identified in cognitive science research into a diffusion process.

It also proposes a new paradigm for test-time scaling. Going beyond simple repetition or sampling, it achieves efficient performance gains through a structured feedback loop and component-wise optimization.

Industrial Impact

The work sets a new bar in the AI research assistant market, where OpenAI, Perplexity, Anthropic, and others are currently competing. In particular, it reaches a level that can provide practical help with the complex market analysis, technology trend research, and strategic planning that enterprise settings demand.

This study from Google Cloud AI Research is notable for achieving top-tier results with a search tool alone. While many competitors are moving toward integrating a variety of proprietary tools, it obtains the same results with a more efficient algorithm.

Limitations and Future Research Directions

The authors explicitly acknowledge several limitations in the paper:

  1. Tool constraints: only a search tool is used for now; web browsing and coding tools are not included
  2. No agent tuning: the work focuses only on test-time scaling and does not cover training-based optimization
  3. Computational cost: the multi-step iteration and self-evolution lead to high compute cost

Integrating multimodal capabilities, using a wider range of tools, and combining the method with training-based approaches look like the main challenges for future work.

Conclusion

TTD-DR goes beyond simply improving benchmark numbers and proposes a fundamental paradigm shift in the design of AI research agents. Modeling the human research process as a diffusion process and introducing a draft-centered, iterative refinement scheme is genuinely innovative.

It is especially impressive that it outperforms existing systems through algorithmic improvements alone, without complex additional tools. This suggests that in AI research, "more data, bigger models, more complex tools" is not always the answer. (Still, the scaling laws are hard to resist.)

From the standpoint of real-world commercialization, however, there are still problems to solve. High computational cost and limited tool use are practical constraints. Even so, the direction TTD-DR points to (mimicking the human cognitive process and building a systematic feedback loop) will be an important stepping stone for future AI research agents.

This study from Google Cloud is a meaningful result that shows AI can evolve from a simple question-answering tool into a true research partner.