🪆 Matryoshka Representation Learning (2022) Paper Review (feat. EmbeddingGemma)

September 4, 2025 · embedding · 5 minutes

0/ Matryoshka Representation Learning (MRL)

Recent embedding models such as gemini-embedding-001 (Google, 2025), text-embedding-3-large (OpenAI, 2024), and voyage-context-3 (Voyage AI, 2025) all advertise support for Matryoshka Representation Learning (MRL). So what exactly is MRL?

Matryoshka Representation Learning (MRL) is a representation learning technique that, like Russian matryoshka nesting dolls, packs information at multiple granularities into a single embedding vector, so that it can flexibly adapt to the computational constraints of downstream tasks.

*Sometimes also called "multi embedding"; the paper itself expresses this as multi-objective MRL.
*A downstream task is a follow-up task, such as classification, retrieval, or ranking, that consumes the learned embeddings (a pretrained model is commonly said to be fine-tuned for a downstream task).

[Figure: MRL architecture diagram]

The idea is that a large embedding contains smaller embeddings, each useful on its own, nested inside like layers, and you can pull out whichever one the situation calls for. To use a "reading a book" analogy: with 32 dimensions you skim the cover and table of contents, if you want to read further you open the book up to 128 dimensions, and if that is still not enough you read through to the appendix, i.e., the full dimensionality.

์ผ๋ฐ˜์ ์œผ๋กœ ์ž„๋ฒ ๋”ฉ ์ฐจ์›์ด ๋†’์„์ˆ˜๋ก ์„ฑ๋Šฅ์ด ์˜ฌ๋ผ๊ฐ€์ง€๋งŒ, ๊ทธ๋Ÿด์ˆ˜๋ก ๋น„์šฉ๊ณผ ์†๋„๋„ ํ•จ๊ป˜ ์ฆ๊ฐ€ํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ์ƒํ™ฉ์— ๋งž๊ฒŒ ์œ ์—ฐํ•˜๊ฒŒ ์ €์ฐจ์›, ๊ณ ์ฐจ์›์„ ์„ ํƒํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

Pack information into a single embedding in order, from coarse summary to fine detail!
Then, instead of always using the full dimensionality, use low → mid → high dimensions as the situation demands!
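To make the analogy concrete: with an MRL-trained embedding, "opening the doll" is just taking a longer prefix of the same vector. A minimal sketch (random stand-in values; a real MRL model would supply `z`):

# Prefixes of one MRL-trained vector, each usable on its own (stand-in data)
import numpy as np

z = np.random.randn(768)                      # stand-in for an MRL-trained embedding
for m in (32, 128, 768):                      # cover/TOC -> main text -> appendix
    prefix = z[:m] / np.linalg.norm(z[:m])    # re-normalize the prefix before use
    print(m, prefix.shape)                    # each prefix is a valid embedding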

1/ rigidity โ†” flexibility?

The paper frames MRL in terms of rigidity ↔ flexibility. What does that mean?

Previously, embeddings had a fixed dimension, so whenever a different dimension was needed, the model had to be retrained to match (overhead ↑).

MRL์€ ๊ณ„์‚ฐ ์ž์›๊ณผ ์ƒํ™ฉ์— ๋งž์ถฐ ๋ฐ”๋กœ ํ•˜๋‚˜์˜ ๋ชจ๋ธ & ํ•˜๋‚˜์˜ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋กœ ํ•„์š”ํ•œ ์ฐจ์›์˜ ์ •๋ณด๋ฅผ ์„ ํƒํ•˜์—ฌ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์–ด ์œ ์—ฐํ•˜๋ฉฐ ํšจ์œจ์ ์ด๋‹ค.

A single embedding carries information from coarse to fine, from a broad summary down to specific details. Information is encoded hierarchically across multiple levels of granularity.

2/ Why is it needed?

Inference cost grows linearly with the embedding dimension $d$, the data size $N$, and the number of labels $L$.
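As a rough illustration of that linear growth (hypothetical corpus sizes): brute-force retrieval scores a query against every stored vector by dot product, which costs about $N \times d$ multiply-adds per query, so every dimension saved pays off across the entire corpus:

# Back-of-the-envelope retrieval cost, linear in d (hypothetical numbers)
N = 1_000_000                      # corpus size
for d in (64, 256, 768):           # candidate embedding dimensions
    print(d, f"{N * d:,} multiply-adds per query")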

Gradient-based training tends to diffuse meaningful information across the entire vector, so even tasks that a low-dimensional embedding could handle are forced to pay for the full dimensionality, which is inefficient.

*์—ฌ๊ธฐ์„œ์˜ ์ถ”๋ก  ๋น„์šฉ์€ ๊ณ„์‚ฐ๋œ deep representation(๊ณ ์ฐจ์› ๋ฒกํ„ฐ; ๋ฐ์ดํ„ฐ์˜ ๋ณธ์งˆ์ ์ด๊ณ  ์˜๋ฏธ ์žˆ๋Š” ์ •๋ณด๊ฐ€ ์••์ถ•๋œ ํ˜•ํƒœ๋กœ ํ‘œํ˜„; ๊ณ  ์ˆ˜์ค€ ํ‘œํ˜„)์„ ์‹ค์ œ ๋‹ค์šด์ŠคํŠธ๋ฆผ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜(downstream application)์— ํ™œ์šฉํ•  ๋•Œ ๋ฐœ์ƒํ•˜๋Š” ๋น„์šฉ์„ ๋งํ•œ๋‹ค.

MRL์€ ์ถ”๊ฐ€ ์ถ”๋ก  ๋น„์šฉ ์—†์ด ๊ธฐ์กด ํŒŒ์ดํ”„๋ผ์ธ์„ ์กฐ๊ธˆ ์ˆ˜์ •ํ•ด ์ ์‘ํ˜• ์ž„๋ฒ ๋”ฉ(adaptive embeddings)์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.

3/ The idea - "summary up front, detail in the back" (coarse→fine)

[Figure: MRL architecture diagram]

MRL์€ ๊ณ ์ฐจ์› ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ ์•ˆ์— coarse-to-fine granularity ์ˆ˜์ค€์˜ ์ •๋ณด๋ฅผ ๊ณ„์ธต์ ์œผ๋กœ ์ธ์ฝ”๋”ฉํ•œ๋‹ค.

$O(\log d)$ nesting levels: within a $d$-dimensional vector, fix $\mathcal{M}=\{8,16,32,\dots,d\}$ in advance (consecutive doublings, so $|\mathcal{M}| = O(\log d)$), and train so that for each $m \in \mathcal{M}$ the prefix $z_{1:m}$ of the first $m$ coordinates is useful on its own. For example, if the final embedding vector has 1024 dimensions, its first 512 dimensions alone can serve as a complete, useful embedding for a given purpose.

$$ \min_{\theta_F, \{W^{(m)}\}} \frac{1}{N} \sum_{m \in \mathcal{M}} \sum_{i} c_m \mathcal{L}(W^{(m)} F(x_i; \theta_F)_{1:m}, y_i) $$
  • $F(x_i;\theta_F)$: the network that maps input $x_i$ to the $d$-dimensional embedding vector $z$
  • $F(\cdot)_{1:m}$: the first $m$ dimensions (the **prefix**) of the generated $d$-dimensional embedding
  • $W^{(m)}$: the linear classifier weights for the $m$-dimensional embedding
  • $c_m$: the relative importance weight of the loss at each dimension
  • $\mathcal{M}$: the set of embedding dimensions to optimize, e.g. $\{8, 16, 32, \dots, 256, 512\}$
  • $\mathcal{L}$: the multi-class softmax cross-entropy loss function

Rather than optimizing the loss only at the final dimensionality, the model is trained to optimize the loss at each of the pre-specified nested dimensions simultaneously. This maintains the same accuracy without training a separate, independent model for every intermediate dimension. Since a single model produces every level of the hierarchical representation in one forward pass, inference cost drops substantially.

# PyTorch code for Matryoshka Cross-Entropy Loss
import torch.nn as nn

class Matryoshka_CE_Loss(nn.Module):
    def __init__(self, relative_importance, **kwargs):
        super(Matryoshka_CE_Loss, self).__init__()
        self.criterion = nn.CrossEntropyLoss(**kwargs)
        self.relative_importance = relative_importance  # c_m: one weight per nested dimension

    def forward(self, output, target):
        # `output` is a list of logit tensors, one per nested dimension m in M;
        # the same `target` supervises every granularity.
        loss = 0
        for i in range(len(output)):
            loss += self.relative_importance[i] * self.criterion(output[i], target)
        return loss
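The loss alone does not show where the list of per-granularity logits comes from. Below is a minimal sketch (hypothetical class and dimension names, not the paper's released code) of a head that slices each prefix $z_{1:m}$ and applies its own linear classifier $W^{(m)}$, producing exactly the list the loss above expects:

# Hypothetical MRL classification head: one linear classifier per nested prefix
import torch
import torch.nn as nn

class MRLHead(nn.Module):
    def __init__(self, num_classes=1000, nesting=(8, 16, 32, 64, 128, 256, 512)):
        super().__init__()
        self.nesting = nesting
        # W^(m): one linear classifier per prefix length m
        self.heads = nn.ModuleList([nn.Linear(m, num_classes) for m in nesting])

    def forward(self, z):  # z: (batch, d) embedding from the backbone F
        # Classify each prefix z_{1:m} independently
        return [head(z[:, :m]) for m, head in zip(self.nesting, self.heads)]

# Usage with the loss above
z = torch.randn(4, 512)                                   # stand-in for F(x; theta_F)
logits = MRLHead()(z)                                     # list of 7 logit tensors
loss_fn = Matryoshka_CE_Loss(relative_importance=[1.0] * 7)
loss = loss_fn(logits, torch.randint(0, 1000, (4,)))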

4/ "Just truncating" vs. "PCA" vs. MRL

์ผ๋ฐ˜ ์ž„๋ฒ ๋”ฉ์˜ ์•ž ์ฐจ์›์„ ๊ทธ๋ƒฅ ์ž˜๋ผ๋‚ด ์‚ฌ์šฉํ•˜๋ฉด, ๊ทธ ์•ž ์ฐจ์› ๋ฒกํ„ฐ๋Š” ์›๋ณธ ์ •๋ณด๋ฅผ ์ œ๋Œ€๋กœ ๋‹ด์ง€ ๋ชปํ•  ๊ฐ€๋Šฅ์„ฑ์ด ๋†’๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ ๊ฒฝ์‚ฌ ๊ธฐ๋ฐ˜ ํ•™์Šต ๋ชจ๋ธ์€ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ์˜ ๋ชจ๋“  ์ฐจ์›์— ๊ฑธ์ณ ์ •๋ณด๊ฐ€ ํ™•์‚ฐ(diffuse)๋˜์–ด ์ธ์ฝ”๋”ฉ๋˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ๊ทธ๋ž˜์„œ ์ฒ˜์Œ ๋ช‡ ์ฐจ์›๋งŒ ๋–ผ์–ด๋‚ด ์‚ฌ์šฉํ•˜๋ฉด ์ •๋ณด์˜ ํ’ˆ์งˆ์„ ๋ณด์žฅํ•  ์ˆ˜ ์—†๋‹ค.

Post-hoc compression such as PCA/SVD can even gain a little accuracy when the dimensionality is trimmed slightly, but accuracy drops sharply when it is reduced aggressively. MRL, by contrast, is optimized end-to-end during training, not as a post-processing step, to be usable at multiple dimensions, so accuracy holds up.
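The operational difference is easy to see in code: post-hoc compression must fit a projection after training, while MRL simply slices off the prefix it was trained to make useful. A minimal sketch with random stand-in data (scikit-learn assumed for the PCA step):

# Post-hoc PCA vs. MRL prefix truncation (stand-in data)
import numpy as np
from sklearn.decomposition import PCA

E = np.random.randn(1000, 768)                     # stand-in embedding matrix

# PCA: an extra fitting step after training
E_pca = PCA(n_components=256).fit_transform(E)     # (1000, 256)

# MRL: no extra step; the first 256 dims were optimized to stand alone
E_mrl = E[:, :256]                                 # (1000, 256)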

5/ An MRL example with EmbeddingGemma

[Figure: EmbeddingGemma MRL example]

The EmbeddingGemma model, introduced in September 2025, also supports MRL, and the official documentation explains MRL nicely and concisely.

Python
# MRL test for `google/embeddinggemma-300M`
import os, numpy as np, torch
from sentence_transformers import SentenceTransformer

MODEL_ID = "google/embeddinggemma-300M"
DEVICE   = "cuda" if torch.cuda.is_available() else "cpu"
TOKEN    = os.getenv("HF_TOKEN")  # set if required

# Load Model
model = SentenceTransformer(MODEL_ID, device=DEVICE, token=TOKEN)

data = ["아이폰", "갤럭시", "삼성", "고양이"]  # "iPhone", "Galaxy", "Samsung", "cat"

def l2norm(E):  # row-wise L2 normalization
    return E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)

def cosine_to_anchor(E):
    # Cosine similarity of every item against the anchor (data[0])
    a = E[0]
    sims = []
    for i in range(1, len(E)):
        s = float(np.dot(a, E[i]) / (np.linalg.norm(a)*np.linalg.norm(E[i]) + 1e-12))
        sims.append((data[i], s))
    return sims

def show(title, E):
    sims = cosine_to_anchor(E)
    print(f"\n[{title}] shape={E.shape}")
    for name, s in sims:
        print(f"  {data[0]} vs {name}: {s:.4f}")
    order = [name for name, s in sorted(sims, key=lambda x: x[1], reverse=True)]
    print("  rank:", " > ".join([data[0]] + order))
    return np.array([s for _, s in sims])

def spearman(u, v):  # simple rank-stability metric
    r = lambda x: np.argsort(np.argsort(-x))
    return float(np.corrcoef(r(u), r(v))[0, 1])

# ===== 1/ full embedding =====
emb_full = model.encode(data, convert_to_numpy=True)
D = emb_full.shape[1]
s_full = show("FULL", emb_full)

# ===== 2/ truncate to 512 dims + L2 normalization =====
E512 = l2norm(emb_full[:, :min(512, D)])
s_512 = show("TRUNCATE 512 + L2", E512)

# ===== 3/ truncate to 256 dims + L2 normalization =====
E256 = l2norm(emb_full[:, :min(256, D)])
s_256 = show("TRUNCATE 256 + L2", E256)

# check MRL
print(f"\nSpearman(FULL vs 512) = {spearman(s_full, s_512):.3f}")
print(f"Spearman(FULL vs 256) = {spearman(s_full, s_256):.3f}")
print(f"Base dim = {D}")
Output
[FULL] shape=(4, 768)
  아이폰 vs 갤럭시: 0.9355
  아이폰 vs 삼성: 0.9326
  아이폰 vs 고양이: 0.8970
  rank: 아이폰 > 갤럭시 > 삼성 > 고양이

[TRUNCATE 512 + L2] shape=(4, 512)
  아이폰 vs 갤럭시: 0.9442
  아이폰 vs 삼성: 0.9419
  아이폰 vs 고양이: 0.9133
  rank: 아이폰 > 갤럭시 > 삼성 > 고양이

[TRUNCATE 256 + L2] shape=(4, 256)
  아이폰 vs 갤럭시: 0.9568
  아이폰 vs 삼성: 0.9548
  아이폰 vs 고양이: 0.9270
  rank: 아이폰 > 갤럭시 > 삼성 > 고양이

Spearman(FULL vs 512) = 1.000
Spearman(FULL vs 256) = 1.000
Base dim = 768

Checking whether MRL actually works in EmbeddingGemma, we can confirm that even after shrinking the embedding from its native 768 dimensions down to 256, it still captures the meaning well: the similarity ranking against the anchor is unchanged.
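As a side note, recent versions of sentence-transformers (v2.7+, which added Matryoshka support) can do this truncation for you via the `truncate_dim` argument, so the manual slicing above is only needed for illustration:

# Alternative, reusing MODEL_ID/DEVICE/TOKEN/data from the script above
# (assumes a sentence-transformers version that supports `truncate_dim`)
model_256 = SentenceTransformer(MODEL_ID, device=DEVICE, token=TOKEN, truncate_dim=256)
emb_256 = model_256.encode(data, convert_to_numpy=True)   # shape: (4, 256)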

References