Exploring Text Feature Vectors

by kaizen_bh 2025. 10. 29.
While working on text classification, I kept feeling that the model and training side is largely a black box, which makes it genuinely hard to trace cause and effect.

Of course, there is plenty to work on from the model side: the loss, evaluation metrics, class weights, hyperparameter tuning, and so on. But my understanding is still so shallow that I couldn't even predict what effect changing any one of them would have on the model's performance.

So, while wondering how I could see more at the data level, I got a good suggestion: extract feature vectors through the encoder and analyze them. This post is me trying that out.

๋ชจ๋ธ์˜ ๋ถ„๋ฅ˜ ์„ฑ๋Šฅ ์ž์ฒด๋ฅผ ๋„˜์–ด์„œ, ๋ชจ๋ธ์ด ํ•™์Šตํ•œ ๋ฐ์ดํ„ฐ์˜ ํŠน์ง• ๊ณต๊ฐ„(Feature Space)์„ ๋ถ„์„ํ•˜์—ฌ ๋ฐ์ดํ„ฐ์™€ ๋ชจ๋ธ์˜ ๊ทผ๋ณธ์ ์ธ ๋ฌธ์ œ์ ์„ ํŒŒ์•…ํ•˜๋Š” ๊ฒƒ

 

๐Ÿ‘‰ ๋ชจ๋ธ์˜ feature space(ํŠน์ง• ๊ณต๊ฐ„)๋ฅผ ์ง์ ‘ ๋“ค์—ฌ๋‹ค๋ด์„œ, ๋น„์Šทํ•œ ๊ฐ์ • ์œ ํ˜•์˜ ๋ฌธ์žฅ๋“ค์ด ์‹ค์ œ๋กœ ๋น„์Šทํ•œ ์ž„๋ฒ ๋”ฉ์œผ๋กœ ๋ฝ‘ํžˆ๋Š”์ง€ ํ™•์ธํ•ด๋ณด๋Š” ๊ฒƒ์ด๋‹ค

์ฆ‰, ๋‹จ์ˆœํžˆ ์ •ํ™•๋„๋งŒ ๋ณด๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ ๋ชจ๋ธ์ด ๊ฐ์ • ๊ตฌ์กฐ๋ฅผ “์ž˜ ํ•™์Šตํ•˜๊ณ  ์žˆ๋Š”๊ฐ€”๋ฅผ ์‹œ๊ฐ์ ์œผ๋กœ ๊ฒ€์ฆํ•˜์ž๋Š” ์ ‘๊ทผ

 

 

 

1. Final goal: analyzing the embedding space

  • ์‹œ๋„ ๋‚ด์šฉ: ๋ชจ๋ธ์˜ ๋ถ„๋ฅ˜๊ธฐ(Classifier) ์ž…๋ ฅ ์ง์ „ ๋‹จ๊ณ„์—์„œ ์ถ”์ถœ๋œ ํŠน์ง• ๋ฒกํ„ฐ(Feature Vector)๋“ค์„ ๋ฝ‘์•„๋‚ด์–ด(Embedding), ์ด ๋ฒกํ„ฐ๋“ค์„ ํด๋Ÿฌ์Šคํ„ฐ๋ง(Clustering) ๋ฐ ์‹œ๊ฐํ™”(Visualization) ํ•ด๋ณด๋Š” ๊ฒƒ์ด๋‹ค
  • ๋ชฉ์ : ๋ชจ๋ธ์ด ๋ฐ”๋ผ๋ณด๋Š” ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ์˜ ํŠน์ง• ๊ณต๊ฐ„์—์„œ, ์›๋ž˜์˜ ๋ผ๋ฒจ(๊ธ์ •/๋ถ€์ •, ๊ฐ•ํ•œ/์•ฝํ•œ)์ด ์œ ์‚ฌํ•œ ํ…์ŠคํŠธ๋“ค๋ผ๋ฆฌ ์‹ค์ œ๋กœ ๊ฐ€๊น๊ฒŒ ๋ชจ์—ฌ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์—ฌ, ๋‹ค์Œ ๋‘ ๊ฐ€์ง€๋ฅผ ์ ๊ฒ€ํ•œ๋‹ค
    • ๋ฐ์ดํ„ฐ ๋ฌธ์ œ: ๋ผ๋ฒจ์ด '์ค‘๊ตฌ๋‚œ๋ฐฉ'์ด๋ผ๋Š” ๋ง์ฒ˜๋Ÿผ, ๋ฐ์ดํ„ฐ ์ž์ฒด์˜ ๋ผ๋ฒจ๋ง์ด ๋ชจํ˜ธํ•˜์—ฌ ์ž„๋ฒ ๋”ฉ ๊ณต๊ฐ„์—์„œ ๋šœ๋ ทํ•˜๊ฒŒ ๋ถ„๋ฆฌ๋˜์ง€ ์•Š๋Š”์ง€.
    • ๋ชจ๋ธ ๋ฌธ์ œ: ํŠน์ง• ์ถ”์ถœ๊ธฐ(Feature Extractor)๊ฐ€ ๋ผ๋ฒจ์„ ๊ตฌ๋ถ„ํ•  ๋งŒํผ ์˜๋ฏธ ์žˆ๋Š” ํŠน์ง•์„ ์ž˜ ํ•™์Šตํ•˜์ง€ ๋ชปํ–ˆ๋Š”์ง€.

 

 

 

2. Concrete steps

Step 1. Feature extraction
  • Use only the encoder (feature extractor) part of the NLP model to extract sentence embedding vectors, not the final class predictions.
  • Either use an extractor provided by Hugging Face and friends, or grab the output right before the classifier layer from your existing model class.
  • The single most important point: analyze the feature space, not the classification results.

Step 2. Dimensionality reduction
  • Feature vectors are high-dimensional (hundreds to thousands of dimensions), so reduce them to 2D or 3D with techniques like t-SNE or PCA for visualization.
  • Caution: clustering in the original high-dimensional space first, and reducing dimensions only for visualization, can be the way to minimize information loss.

Step 3. Analysis and verification
  • Cluster the vectors with K-means, DBSCAN, etc., and visualize them with the original labels (strong positive / weak negative, ...) as colors.
  • Key thing to check: do same-label samples group into one cluster, or are different labels mixed together?

Step 4. Label adjustment (re-verification)
  • If the labels are so fine-grained (strong/weak) that they become ambiguous, simplify to a binary classification problem (positive vs. negative) and re-analyze the feature space.
  • An important step for validating whether the data labeling is actually effective.
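Before diving in, here is a minimal, self-contained sketch of the step-3 idea: cluster in the full-dimensional space and score how well the clusters agree with the labels. The `embeddings` and `labels` arrays here are random stand-ins for the real vectors extracted later in this post.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Assumption: embeddings is an (N, 768) float array, labels an (N,) int array.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(1000, 768)).astype(np.float32)  # stand-in data
labels = rng.integers(0, 4, size=1000)                        # stand-in labels

# Cluster in the original high-dimensional space (before any t-SNE/PCA),
# as suggested above, to avoid losing information in the reduction step.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(embeddings)

# Adjusted Rand Index: 1.0 = clusters match labels perfectly, ~0.0 = random.
print("ARI vs. labels:", adjusted_rand_score(labels, cluster_ids))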


1. How to extract feature vectors with Hugging Face

A good reference:

https://chanmuzi.tistory.com/243
([PyTorch] Comparing AutoModel vs AutoModelForSequenceClassification (digging into BERT!!))

์ฃผ ๋ชจ๋ธ์ธ BERT๊ฐ€ ์–ด๋–ค ๋ชจ๋ธ์ธ์ง€๋„ ์•Œ์•„์•ผํ•œ๋‹ค

https://wikidocs.net/115055
(17-02 BERT: Bidirectional Encoder Representations from Transformers)

Feature extraction: using only the feature extractor

  • Hugging Face models are split into a body and a head. You can keep the same body and swap only the head, so each part can be loaded or customized separately.
  • The Hugging Face model used in the baseline (AutoModelForSequenceClassification) is a "feature extractor + classifier" combined.
  • But what we need now is the hidden representation before it reaches the classifier, i.e. the embedding vector that summarizes the meaning of the sentence.
  • So we should load a model without the classifier head, such as AutoModel or AutoModelForMaskedLM.

 

AutoModelForSequenceClassification

“๋ฌธ์žฅ ๋ถ„๋ฅ˜๊นŒ์ง€ ๋˜๋Š” ์™„์„ฑ๋œ ๋ชจ๋ธ”

[Transformer encoder] → [pooler] → [classification head (linear layer + softmax)]

 

  • That is, it contains both the feature extractor and the classifier.
  • During training it directly returns the loss and logits.
  • It comes set up for immediate use on classification tasks (sentiment analysis, spam detection, etc.).
  • The output you get is the per-class scores (logits).
  • Applying softmax to the logits converts them straight into probabilities.
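For instance, a quick sketch of that last step, using the example logits printed further below:

import torch

logits = torch.tensor([[-0.0906, 0.3809, 0.2833, -0.0410]])  # per-class scores
probs = torch.softmax(logits, dim=-1)                        # each row sums to 1
print(probs)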

 

 

AutoModel

"An encoder-only model that extracts features"

[Transformer encoder only]

 

  • That is, it has no classifier head.
  • It computes no loss or logits; it only returns sentence/token-level hidden states.
  • Through outputs.last_hidden_state you get each token's embedding, and you can derive a sentence embedding from the [CLS] token or via mean pooling.
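This post uses the [CLS] vector; as a hedged sketch, the mean-pooling alternative mentioned above could look like this (mask-aware, so padding tokens don't dilute the average):

import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()          # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)       # (batch, hidden_dim)
    counts = mask.sum(dim=1).clamp(min=1e-9)             # avoid divide-by-zero
    return summed / counts

# Usage (assuming `outputs` and `inputs` from the comparison example below):
# sent_emb = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])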


๋‘˜์„ ๋น„๊ตํ•ด์„œ ํ‘œ๋กœ ์ •๋ฆฌํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค

Item         | AutoModelForSequenceClassification | AutoModel
Purpose      | sentence classification            | feature extraction
Structure    | Transformer + classifier head      | Transformer only
Output       | logits (per-class scores)          | hidden states (embeddings)
Used during  | training / inference               | analysis / visualization
Key field    | outputs.logits                     | outputs.last_hidden_state
Training use | can compute classification loss    | for exploring feature representations

 

 

To sum up briefly:

  • AutoModelForSequenceClassification
    → use when training the classification model or making predictions.
  • AutoModel
    → use when extracting the "sentence representation (embedding)" in front of the classifier. The "feature vector" we want lives exactly at this stage.

 

 

Now let's check with an example what the outputs actually look like.

 

Comparison example: AutoModelForSequenceClassification vs AutoModel

from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoModel
import torch

# Model name
MODEL_NAME = "klue/roberta-base"

# Example sentences
sentences = ["์ด ์˜ํ™” ์ง„์งœ ์ตœ๊ณ ๋‹ค", "์Šคํ† ๋ฆฌ๊ฐ€ ๋„ˆ๋ฌด ์ง€๋ฃจํ–ˆ๋‹ค"]

# 1. Tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

# 2. AutoModelForSequenceClassification: classifier included
model_cls = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=4)
outputs_cls = model_cls(**inputs)

print("=== AutoModelForSequenceClassification ===")
print("logits shape:", outputs_cls.logits.shape)  # (batch_size, num_labels)
print("logits example:", outputs_cls.logits)

# 3. AutoModel: feature extractor only
model_feat = AutoModel.from_pretrained(MODEL_NAME)
outputs_feat = model_feat(**inputs)

print("\n=== AutoModel ===")
print("last_hidden_state shape:", outputs_feat.last_hidden_state.shape)  # (batch_size, seq_len, hidden_dim)
# Look at the CLS token embedding only
cls_embedding = outputs_feat.last_hidden_state[:, 0, :]
print("CLS embedding shape:", cls_embedding.shape)  # (batch_size, hidden_dim)
print("CLS embedding example:", cls_embedding[0][:5])  # print only the first 5 dims

 

=== AutoModelForSequenceClassification ===
logits shape: torch.Size([2, 4])
logits example: tensor([[-0.0906,  0.3809,  0.2833, -0.0410],
        [-0.0896,  0.3786,  0.2914, -0.0469]], grad_fn=<AddmmBackward0>)

=== AutoModel ===
last_hidden_state shape: torch.Size([2, 8, 768])
CLS embedding shape: torch.Size([2, 768])
CLS embedding example: tensor([ 0.0730, -0.5276, -0.2339, -0.0581,  0.0356], grad_fn=<SliceBackward0>)

 

  • ๋ผ๋ฒจ์ด 4๊ฐœ๋ผ ๊ฐ€์ •, ๋ฐฐ์น˜๊ฐ€ 2์ธ ๋ฌธ์žฅ ๋ฐ์ดํ„ฐ 2๊ฐœ๋ฅผ ๋„ฃ์—ˆ์„ ๋•Œ 
    • AutoModelForSequenceClassification ์€ 2๊ฐœ ๋ฌธ์žฅ์— ๋Œ€ํ•ด ๊ฐ๊ฐ์˜ ๋ผ๋ฒจ์— ๋Œ€ํ•œ logits ๊ฐ’์„ ์ถœ๋ ฅํ•œ๋‹ค
    • AutoModel ์€ ๋ฌธ์žฅ ๊ฐ๊ฐ์— ๋Œ€ํ•ด 768์ฐจ์›์œผ๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฌธ์žฅ์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ๋‹ด์€ feature vector๋ฅผ ์ถœ๋ ฅํ•œ๋‹ค

 

 

๐Ÿค” A question I had here: why is the CLS embedding vector indexed as [:, 0, :], like below?

outputs_feat.last_hidden_state[:, 0, :]

 

์œ„์—์„œ ํ™•์ธํ–ˆ๋“ฏ์ด last_hidden_state ์˜ shape์€ (batch_size, seq_len, hidden_dim) ํ˜•ํƒœ์ด๋‹ค

 

  • batch_size → how many sentences were fed in at once
  • seq_len → the token length, padded to the longest sentence (number of tokens after tokenization, padding included)
  • hidden_dim → the embedding dimension the Transformer produces for each token (e.g. 768)

 

Here it helps to understand what role the [CLS] token plays.

 

In BERT-family models:

  • a special token [CLS] is prepended to the input sentence,
  • the hidden state of [CLS] is trained to be a vector summarizing the whole sentence,
  • which is why classification models feed only the [CLS] vector into the classifier.

In short, [CLS] is the token in charge of the sentence-level representation.

The posts below explain this in more detail and are worth a read:

 

https://seungseop.tistory.com/35
(How does BERT's [CLS] token come to carry the sentence's information?)

 

https://jimmy-ai.tistory.com/338
(What is the CLS token? / A Python BERT CLS embedding extraction example)

 

 

๋”ฐ๋ผ์„œ ์ธ๋ฑ์‹ฑ [ :, 0, : ] ์€ ๋‹ค์Œ์„ ์˜๋ฏธํ•œ๋‹ค

last_hidden_state shape: torch.Size([2, 8, 768])
cls_embedding = outputs_feat.last_hidden_state[:, 0, :]
  • : → select the whole batch
  • 0 → select the first token in the sequence, [CLS]
  • : → select the whole hidden_dim

 

Result:

cls_embedding.shape = (batch_size, hidden_dim)
  • one feature vector per sentence
  • ready to be used for visualization, similarity computation, clustering, and so on (see the similarity sketch after the diagram below)

 

Input sentence: "์ด ์˜ํ™” ์ง„์งœ ์ตœ๊ณ ๋‹ค"
Tokenized:        [CLS]   ์ด    ์˜ํ™”    ์ง„์งœ    ์ตœ๊ณ ๋‹ค  [SEP]
                   |      |      |       |        |      |
                   v      v      v       v        v      v
Token embeddings: e_CLS  e_์ด  e_์˜ํ™”  e_์ง„์งœ  e_์ตœ๊ณ ๋‹ค  e_SEP
                   |    |    |    |    |    |
                   v    v    v    v    v    v
+---------------- Transformer Encoder Blocks ----------------+
|   Multi-Head Attention + Feed Forward + LayerNorm, repeated |
|   each token vector is updated to reflect its context       |
+-------------------------------------------------------------+
                   |    |    |    |    |    |
                   v    v    v    v    v    v
last_hidden_state (batch_size, seq_len, hidden_dim)
          |
          v
Select CLS: cls_embedding = last_hidden_state[:, 0, :]
          shape -> (batch_size, hidden_dim)
          |
          v
Feature vector / sentence embedding
(usable as classifier input, for visualization, clustering, ...)

Optionally:
cls_embedding → Linear Layer → logits → Softmax → class
(AutoModelForSequenceClassification)
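As a quick example of the "similarity computation" use mentioned above, reusing cls_embedding from the comparison code (purely illustrative):

import torch.nn.functional as F

# Cosine similarity between the two example sentences' CLS embeddings.
# Values near 1.0 mean the model places the sentences close together.
sim = F.cosine_similarity(cls_embedding[0], cls_embedding[1], dim=0)
print("cosine similarity:", sim.item())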


2. Extracting feature embeddings for hard cases

  • "Hard cases" are the samples the model frequently gets wrong or predicts with low confidence.
  • We pass those samples through the feature extractor above to obtain their embeddings.

The resulting hard_cases_embeddings.shape will be [num_samples, hidden_dim],
e.g. [2000, 768].

 

The hard-case sampling itself is covered here:

https://bh-kaizen.tistory.com/61
(Text classification: Hard Example Mining)

 

 

์œ„์˜ ๊ธ€๋Œ€๋กœ ๊ต์ฐจ ๊ฒ€์ฆ์„ ์ด์šฉํ•ด ์ „์ฒด ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด๋ฉด์„œ ์˜ˆ์ธก์ด ํ‹€๋ฆฐ ํ•˜๋“œ ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ 38,713๊ฐœ๋ฅผ ์ €์žฅํ•ด๋‘์—ˆ๋‹ค

์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด ํ•˜๋“œ ์ƒ˜ํ”Œ๋“ค์„ pre-trained๋œ BERT ๋ชจ๋ธ์„ ๊ฑฐ์ณ feature vector๋ฅผ ์ถ”์ถœํ•˜์˜€๋‹ค

 

from transformers import AutoTokenizer, AutoModel
import torch
from torch.utils.data import DataLoader, Dataset

# 1. Model and tokenizer
MODEL_NAME = "kykim/bert-kor-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()  # switch to inference mode

# 2. Hard-case data (a list of strings); hard_df holds the hard samples
#    saved during the cross-validation step above
hard_cases_texts = hard_df['review_normalized'].tolist()  # 38,713 sentences

# 3. Dataset / DataLoader
class TextDataset(Dataset):
    def __init__(self, texts):
        self.texts = texts
    def __len__(self):
        return len(self.texts)
    def __getitem__(self, idx):
        return self.texts[idx]

dataset = TextDataset(hard_cases_texts)
dataloader = DataLoader(dataset, batch_size=16, shuffle=False)

# 4. Feature extraction
all_embeddings = []

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

with torch.no_grad():
    for batch_texts in dataloader:
        # Tokenize the batch
        inputs = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt").to(device)
        outputs = model(**inputs)

        # Extract the CLS vectors
        cls_embeddings = outputs.last_hidden_state[:, 0, :]  # shape: (batch_size, hidden_dim)
        all_embeddings.append(cls_embeddings.cpu())

# 5. Concatenate
hard_cases_embeddings = torch.cat(all_embeddings, dim=0)
print("Hard cases embeddings shape:", hard_cases_embeddings.shape)

 

Hard cases embeddings shape: torch.Size([38713, 768])

 

  • ๊ฐ ๋ฌธ์žฅ๋งˆ๋‹ค 768์ฐจ์›์œผ๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฒกํ„ฐ๋กœ ์ถ”์ถœํ•˜์˜€๋‹ค
  • ์ž„๋ฒ ๋”ฉ ์ฐจ์›์ด 768์ธ ์ด์œ ๋Š” BERT-base ๊ณ„์—ด ๋Œ€๋ถ€๋ถ„์ด hidden_size = 768 ์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ
  • ์•„๋ž˜์ฒ˜๋Ÿผ ๋‚ด๊ฐ€ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ์ด ๋ช‡ ์ฐจ์›์œผ๋กœ ์„ค์ •ํ–ˆ๋Š”์ง€ ํ™•์ธ ๊ฐ€๋Šฅํ•˜๋‹ค
from transformers import AutoModel

model = AutoModel.from_pretrained("kykim/bert-kor-base")
print(model.config.hidden_size)  # 768
=> 768


3. Dimensionality reduction and clustering visualization

  • 768 dimensions can't be looked at directly, so reduce them with something like t-SNE, UMAP, or PCA.
  • After reduction, plot the points in 2D or 3D and see how they group.

 

t-SNE

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
import umap.umap_ as umap  # requires: pip install umap-learn
from sklearn.cluster import DBSCAN  # (imported for optional clustering in the original space; not used below)

# embeddings: torch.Tensor (38713, 768)
embeddings = hard_cases_embeddings.cpu().numpy()
labels = hard_df['label'].values  # shape (38713,)

# Color palette per label
colors = ['#FF6B6B', '#FFD93D', '#6BCB77', '#4D96FF']

# 2D
tsne_2d = TSNE(n_components=2, perplexity=30, n_iter=1000, random_state=42)
emb_2d_tsne = tsne_2d.fit_transform(embeddings)

plt.figure(figsize=(10,8))
for i in np.unique(labels):
    plt.scatter(emb_2d_tsne[labels==i,0], emb_2d_tsne[labels==i,1],
                label=f'Label {i}', s=10, alpha=0.7, c=colors[i])
plt.title("t-SNE (2D)")
plt.legend()
plt.show()

# 3D
tsne_3d = TSNE(n_components=3, perplexity=30, n_iter=1000, random_state=42)
emb_3d_tsne = tsne_3d.fit_transform(embeddings)

fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111, projection='3d')
for i in np.unique(labels):
    ax.scatter(emb_3d_tsne[labels==i,0], emb_3d_tsne[labels==i,1], emb_3d_tsne[labels==i,2],
               label=f'Label {i}', s=10, alpha=0.7)
ax.set_title("t-SNE (3D)")
ax.legend()
plt.show()


PCA

# 2D
pca_2d = PCA(n_components=2, random_state=42)
emb_2d_pca = pca_2d.fit_transform(embeddings)

plt.figure(figsize=(10,8))
for i in np.unique(labels):
    plt.scatter(emb_2d_pca[labels==i,0], emb_2d_pca[labels==i,1],
                label=f'Label {i}', s=10, alpha=0.7, c=colors[i])
plt.title("PCA (2D)")
plt.legend()
plt.show()

# 3D
pca_3d = PCA(n_components=3, random_state=42)
emb_3d_pca = pca_3d.fit_transform(embeddings)

fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111, projection='3d')
for i in np.unique(labels):
    ax.scatter(emb_3d_pca[labels==i,0], emb_3d_pca[labels==i,1], emb_3d_pca[labels==i,2],
               label=f'Label {i}', s=10, alpha=0.7)
ax.set_title("PCA (3D)")
ax.legend()
plt.show()


UMAP

# 2D
umap_2d = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
emb_2d_umap = umap_2d.fit_transform(embeddings)

plt.figure(figsize=(10,8))
for i in np.unique(labels):
    plt.scatter(emb_2d_umap[labels==i,0], emb_2d_umap[labels==i,1],
                label=f'Label {i}', s=10, alpha=0.7, c=colors[i])
plt.title("UMAP (2D)")
plt.legend()
plt.show()

# 3D
umap_3d = umap.UMAP(n_components=3, n_neighbors=15, min_dist=0.1, random_state=42)
emb_3d_umap = umap_3d.fit_transform(embeddings)

fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111, projection='3d')
for i in np.unique(labels):
    ax.scatter(emb_3d_umap[labels==i,0], emb_3d_umap[labels==i,1], emb_3d_umap[labels==i,2],
               label=f'Label {i}', s=10, alpha=0.7)
ax.set_title("UMAP (3D)")
ax.legend()
plt.show()


Hmm... as you can see, there's no label separation to speak of; everything is lumped together as if it were one label.

Maybe if I rotate it in an interactive 3D plotly view, the labels will turn out to be separated into layers...?

 

3D plotly visualization code:
import numpy as np
import plotly.express as px
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
import umap.umap_ as umap

# embeddings: torch.Tensor (38713, 768), extracted above (or reloaded from disk)
embeddings = hard_cases_embeddings.cpu().numpy()
labels = hard_df['label'].values  # shape (38713,)
label_names = {0: "strong negative", 1: "weak negative", 2: "weak positive", 3: "strong positive"}

# Color palette (for Plotly)
colors = ['#FF6B6B', '#FFD93D', '#6BCB77', '#4D96FF']

# --- t-SNE 3D ---
tsne_3d = TSNE(n_components=3, perplexity=30, n_iter=1000, random_state=42)
emb_3d_tsne = tsne_3d.fit_transform(embeddings)

fig_tsne = px.scatter_3d(
    x=emb_3d_tsne[:,0], y=emb_3d_tsne[:,1], z=emb_3d_tsne[:,2],
    color=[label_names[l] for l in labels],
    title="t-SNE (3D) Interactive",
    color_discrete_sequence=colors,
    opacity=0.7
)
fig_tsne.update_traces(marker=dict(size=3))
fig_tsne.show()


# --- PCA 3D ---
pca_3d = PCA(n_components=3, random_state=42)
emb_3d_pca = pca_3d.fit_transform(embeddings)

fig_pca = px.scatter_3d(
    x=emb_3d_pca[:,0], y=emb_3d_pca[:,1], z=emb_3d_pca[:,2],
    color=[label_names[l] for l in labels],
    title="PCA (3D) Interactive",
    color_discrete_sequence=colors,
    opacity=0.7
)
fig_pca.update_traces(marker=dict(size=3))
fig_pca.show()


# --- UMAP 3D ---
umap_3d = umap.UMAP(n_components=3, n_neighbors=15, min_dist=0.1, random_state=42)
emb_3d_umap = umap_3d.fit_transform(embeddings)

fig_umap = px.scatter_3d(
    x=emb_3d_umap[:,0], y=emb_3d_umap[:,1], z=emb_3d_umap[:,2],
    color=[label_names[l] for l in labels],
    title="UMAP (3D) Interactive",
    color_discrete_sequence=colors,
    opacity=0.7
)
fig_umap.update_traces(marker=dict(size=3))
fig_umap.show()

[Figure] t-SNE (3D)

[Figure] PCA (3D)

[Figure] UMAP (3D)

Nope, no separation at all!!

Blue label 3 (strong positive) is spread across the whole space; only green label 2 (weak positive) shows even a little separation.


Then what if, instead of the misclassified data, we use the well-classified samples, the ones whose correct label was predicted with high probability? Would clustering work better?

Out of the ~270K data points, 184,404 had their correct label predicted with 90%+ probability.
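The selection code isn't shown in this post; a minimal sketch, assuming the cross-validation results live in a DataFrame with hypothetical columns label, pred_label, and pred_prob:

import pandas as pd

# Hypothetical columns: 'label' (gold), 'pred_label', 'pred_prob' (max softmax prob)
# df = pd.read_csv("cv_predictions.csv")  # hypothetical file from the CV run

def select_true_cases(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Keep samples whose correct label was predicted with prob >= threshold."""
    mask = (df["pred_label"] == df["label"]) & (df["pred_prob"] >= threshold)
    return df[mask]

# true_df = select_true_cases(df)  # -> 184,404 rows in this run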


Seen this way, unlike the hard cases, the labels have settled into place to some degree.


๊ธฐ๋Œ€ํ–ˆ๋˜ ๊ฒƒ๋งŒํผ ๊น”๋”ํ•˜๊ฒŒ ๋ถ„๋ฆฌ๋˜์ง„ ์•Š์ง€๋งŒ ์‹œ๊ฐ์ ์œผ๋กœ ์œ ์˜๋ฏธํ•œ ์ฐจ์ด๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค

๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ํŒŒ์ธ ํŠœ๋‹์ด ์•„๋‹Œ pre-trained๋กœ ์‚ฌ์šฉํ•ด๋ณด์•˜์œผ๋‚˜ ํŒŒ์ธ ํŠœ๋‹์ด ๋œ ๋ชจ๋ธ๋กœ ๋ดค๋‹ค๋ฉด ์ด ๋ผ๋ฒจ๊ฐ„์˜ ๊ตฌ๋ถ„์„ ๋” ์ •ํ™•ํ•˜๊ฒŒ ํ•ด์ฃผ์ง€ ์•Š์„๊นŒ ์‹ถ์—ˆ๋‹ค

 

์ด๋Ÿฐ ์‹œ๊ฐํ™”๋Š” ํ•™์Šตํ•˜๊ธฐ ์ „ ๋ชจ๋ธ์ด ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ”๋ผ๋ณด๋Š” ์‹œ์•ผ๋ผ๊ณ ๋„ ํ•  ์ˆ˜ ์žˆ๊ฒ ๋‹ค


4. Splitting the labels up for observation

 

 

Instead of looking at all 4 labels (strong negative / weak negative / weak positive / strong positive) at once:

  1. positive vs negative (binary polarity)
  2. weak vs strong (intensity)

Splitting the visualization along these two axes may make the clusters show up more clearly.

 

 

ํ˜„์žฌ ๋ผ๋ฒจ:

  • 0: ๊ฐ•ํ•œ ๋ถ€์ • 1: ์•ฝํ•œ ๋ถ€์ • 2: ์•ฝํ•œ ๊ธ์ • 3: ๊ฐ•ํ•œ ๊ธ์ •

๋‘ ์ถ•์„ ์ •์˜ํ•œ๋‹ค:

  • X์ถ•: ๊ธ์ •(+) ↔ ๋ถ€์ •(-)
  • Y์ถ•: ๊ฐ•(↑) ↔ ์•ฝ(↓)

 

import numpy as np
import matplotlib.pyplot as plt

label_names = {0: "strong negative", 1: "weak negative", 2: "weak positive", 3: "strong positive"}

# y: the (38713,) vector of ground-truth labels, reusing hard_df from earlier
y = hard_df['label'].values

# Sentiment polarity: negative (0,1) -> 0 / positive (2,3) -> 1
polarity = np.where(y < 2, 0, 1)

# Sentiment intensity: weak (1,2) -> 0 / strong (0,3) -> 1
intensity = np.where((y == 0) | (y == 3), 1, 0)


import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# # t-SNE 2D (already computed above, reused here)
# tsne_2d = TSNE(n_components=2, perplexity=30, n_iter=1000, random_state=42)
# emb_2d_tsne = tsne_2d.fit_transform(embeddings)

# ๐ŸŽจ High-saturation colormap (custom)
from matplotlib.colors import LinearSegmentedColormap
bright_cmap = LinearSegmentedColormap.from_list("bright_red_green", [
    (0.0, "#FF0000"),   # pure red
    (0.5, "#FFFF00"),   # bright yellow
    (1.0, "#00FF00")    # pure green
])

# โœ… Polarity plot (negative=red, positive=green)
plt.figure(figsize=(8, 6))
plt.scatter(
    emb_2d_tsne[:, 0], emb_2d_tsne[:, 1],
    c=polarity, cmap=bright_cmap,  # saturated red-yellow-green
    s=10, alpha=0.7, edgecolors='none'
)
plt.title("Polarity (Red=negative, Green=positive)")
plt.xlabel("t-SNE Dim 1")
plt.ylabel("t-SNE Dim 2")
plt.grid(False)
plt.show()


# โœ… Intensity plot (weak -> strong: light -> dark, extra-vivid version)
vivid_viridis = LinearSegmentedColormap.from_list("vivid_viridis", [
    (0.0, "#B7E3FF"),  # light sky blue
    (0.5, "#4D9BE6"),  # strong blue
    (1.0, "#1E3A8A")   # very dark navy
])

plt.figure(figsize=(8, 6))
plt.scatter(
    emb_2d_tsne[:, 0], emb_2d_tsne[:, 1],
    c=intensity, cmap=vivid_viridis,
    s=10, alpha=0.7, edgecolors='none'
)
plt.title("Intensity (Light=weak, Dark=strong)")
plt.xlabel("t-SNE Dim 1")
plt.ylabel("t-SNE Dim 2")
plt.grid(False)
plt.show()

 

 

[Figure] Hard Case (polarity)

[Figure] Hard Case (intensity)

ํ•˜๋“œ ์ผ€์ด์Šค๋“ค์˜ ํ”ผ์ณ ๋ฒกํ„ฐ๋“ค๋กœ ์‚ดํŽด๋ณด๋ฉด ์ด๋Ÿฐ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜จ๋‹ค

์‚ฌ์‹ค์ƒ ๊ธ์ • ๋ถ€์ •๊ณผ ๊ฐ•์•ฝ ๋ชจ๋‘ ๋งŽ์ด ์„ž์—ฌ์žˆ์–ด์„œ ๊ตฌ๋ถ„์ด ์ œ๋Œ€๋กœ ๋˜์ง€ ์•Š๋Š”๋‹ค

๊ทธ๋Ÿผ ์ •๋‹ต ๋ผ๋ฒจ์„ 0.9 ์ด์ƒ์˜ ๋†’์€ ํ™•๋ฅ ๋กœ ๋งž์ถ˜ ๋ฐ์ดํ„ฐ๋“ค์€ ์–ด๋–ป๊ฒŒ ๋‚˜์˜ฌ๊นŒ??


[Figure] True Case (≥0.9, polarity)

[Figure] True Case (≥0.9, intensity)

Oh... the distribution is clearly different.

To compare more precisely, putting the two sets of plots side by side makes the difference much easier to see:

 

[Figure] Hard Case vs True Case (≥0.9, polarity)

[Figure] Hard Case vs True Case (≥0.9, intensity)

 

Even with a 90%+ probability on the correct label, there are still regions, like the clusters on the left and at the top, that aren't clearly separated.
Pulling out the data points in those regions, analyzing them, and then augmenting or preprocessing them looks like it could be an effective approach too.
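The extraction step isn't shown in the post; as a minimal sketch, you could select points inside a region of the 2D map by their coordinates and recover the original texts. The box bounds are made-up placeholders, and the names reuse the earlier code (for the true-case plot you'd swap in the matching frame and coordinates):

import numpy as np

# emb_2d_tsne: (N, 2) reduced coordinates; hard_df: the matching DataFrame.
# Hypothetical bounding box around a suspicious region of the plot.
x_min, x_max, y_min, y_max = -40, -20, 10, 30

in_box = (
    (emb_2d_tsne[:, 0] >= x_min) & (emb_2d_tsne[:, 0] <= x_max) &
    (emb_2d_tsne[:, 1] >= y_min) & (emb_2d_tsne[:, 1] <= y_max)
)
suspicious = hard_df.iloc[np.where(in_box)[0]]
print(len(suspicious), "samples in the region")
print(suspicious['review_normalized'].head())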


So what can we actually learn from these results?

If you just render the plots and stop there, that's not analysis, it's only visualization.

 

 

โ‘  Class separability

  • If the clusters are clearly separated:
    each sentiment class is well distinguished in the model's embedding space.
    → i.e. the model is already learning the boundaries between sentiments well.
  • If the clusters overlap:
    the representational boundary between those sentiments is blurry.
    e.g. if weak positive (2) and weak negative (1) are mixed, the model most likely cannot distinguish the "strong vs. weak" expressions in the sentences.
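Separability can also be put into a number rather than eyeballed; a hedged sketch using scikit-learn's silhouette score on the label assignments (closer to 1 = tighter, better-separated classes):

from sklearn.metrics import silhouette_score

# embeddings: (N, 768) array, labels: (N,) ground-truth labels, as above.
# Scoring the gold labels as if they were cluster assignments measures how
# separable the classes are in the embedding space (-1 bad, +1 good).
score = silhouette_score(embeddings, labels, metric="cosine",
                         sample_size=10000, random_state=42)
print("silhouette score:", score)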

 

๊ทธ๋ž˜ํ”„๋“ค์„ ๋ณด๊ณ ์„œ ํŠนํžˆ ๊ฒฝ๊ณ„๋ฅผ ์ž˜ ๊ตฌ๋ถ„ํ•˜์ง€ ๋ชปํ•˜๋Š” ๋ผ๋ฒจ์— ๋Œ€ํ•ด ์ง‘์ค‘์ ์œผ๋กœ ์ฆ๊ฐ•์„ ์‹œ์ผœ์ฃผ๋Š” ๊ฒƒ์ด ์œ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค

๋˜๋Š” ๊ทธ๋Ÿฐ ๋ผ๋ฒจ๋“ค์˜ ํ…์ŠคํŠธ๋ฅผ ํ™•์ธํ•˜์—ฌ ๋…ธ์ด์ฆˆ๋‚˜ ์ด์ƒ์น˜ ๋“ฑ์„ ํ™•์ธํ•ด ์ œ๊ฑฐํ•ด์ฃผ๋Š” ๋“ฑ์˜ ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ์„ ๊ฐœ์„ ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค

 

 

โ‘ก Intra-class compactness

  • If samples with the same label are densely packed:
    sentences of that class are being encoded consistently.
  • If samples with the same label are scattered:
    that sentiment is expressed in many different ways, or the data quality is uneven.

 

Similar to the above: inspect the scattered samples of a specific class and clean up data noise (label errors, typos, ambiguous expressions).

For example, if "weak positive" is spread far too wide, the criteria for intensity may be ambiguous → re-examine the labeling guidelines. (A compactness sketch follows below.)
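Compactness can be quantified too; a minimal sketch that measures each class's mean distance to its centroid (smaller = more compact):

import numpy as np

# embeddings: (N, 768), labels: (N,), as above.
for c in np.unique(labels):
    class_vecs = embeddings[labels == c]
    centroid = class_vecs.mean(axis=0)
    mean_dist = np.linalg.norm(class_vecs - centroid, axis=1).mean()
    print(f"label {c}: mean distance to centroid = {mean_dist:.3f}")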

 

 

โ‘ข Where the hard cases sit

  • If you overlay the hard samples on the visualization,
    you can see whether they pile up near cluster boundaries or concentrate in a specific region.
  • If they pile up near the boundaries:
    sentences that straddle sentiment boundaries are the main source of errors.
  • If they concentrate in a specific region (e.g. one part of the "weak positive" cluster):
    the model has failed to learn that particular expression type (e.g. "it's good, but with some regrets").

Collect just the hard-case sentences and run data augmentation (back-translation, synonym replacement); see the sketch after this list.

Train the model to strengthen those weak sentiment-boundary expressions.

 

I didn't get this far myself, but visualizing the well-classified samples vs. the hard cases as two overlaid groups seems like it could yield yet another insight.

 

โžก๏ธ For the hard cases the model got completely wrong, roughly 16,000 texts, I generated about 16,000 augmented texts via back-translation and trained the baseline on them, but there was no meaningful performance improvement.

Most likely they are only a small fraction of the full ~270K dataset, so they couldn't move the needle much.


โ‘ฃ Pre-trained vs fine-tuned

  • The embeddings here come from the pre-trained model; redoing the same visualization after fine-tuning should show the clusters separating much more clearly.
  • In other words, you can visually verify that fine-tuning is "a process that rearranges the semantic space by sentiment". (A sketch of extracting the fine-tuned features follows below.)

Comparing the before/after fine-tuning visualizations gives an intuitive evaluation of what training actually did.

You can also check whether "higher cluster separability" actually coincides with higher accuracy.
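To redo the visualization with a fine-tuned checkpoint you don't need a separate AutoModel: a sequence-classification model can return its encoder's hidden states directly. A minimal sketch (the checkpoint path is a placeholder):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder path: your fine-tuned checkpoint directory.
ckpt = "path/to/finetuned-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)
model.eval()

inputs = tokenizer(["์ด ์˜ํ™” ์ง„์งœ ์ตœ๊ณ ๋‹ค"], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[-1] is the last encoder layer: (batch, seq_len, hidden_dim).
cls_after_finetune = outputs.hidden_states[-1][:, 0, :]
print(cls_after_finetune.shape)  # (1, hidden_dim)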

 

์ด ๋ถ€๋ถ„์€ ํ•™์Šต ํ›„ ๋ณด๊ณ ์„œ ์ž‘์„ฑ ๋“ฑ์— ์‹œ๊ฐํ™” ์ž๋ฃŒ๋กœ ๊ฐ™์ด ๋ณด์—ฌ์ฃผ๋Š” ๊ฒƒ์—๋„ ์œ ์šฉํ•ด๋ณด์ธ๋‹ค

๋ช…ํ™•ํ•œ ์„ฑ๋Šฅ ์ง€ํ‘œ์™€ ์ˆ˜์น˜๋“ค, ๊ทธ๋ž˜ํ”„ ๋“ฑ์œผ๋กœ ๋ณด์—ฌ์ค„ ์ˆ˜ ์žˆ์œผ๋‚˜ ๊ฐ ๋ฌธ์žฅ๋“ค์ด ์ขŒํ‘œ์ƒ ์–ด๋””์— ์œ„์น˜ํ–ˆ์œผ๋ฉฐ ๋ชจ๋ธ์ด ์ด๋ฅผ ์ž˜ ํ•™์Šตํ•˜์—ฌ ์–ผ๋งˆ๋‚˜ ์ž˜ ๋ถ„๋ฅ˜ํ–ˆ๋Š”์ง€๋ฅผ ๋” ์ดํ•ดํ•˜๊ธฐ ์‰ฝ๊ฒŒ ๋ณด์—ฌ์ค„ ์ˆ˜ ์žˆ๋‹ค


References

https://wikidocs.net/302359
(07. The main Auto classes)

https://huggingface.co/docs/transformers/ko/model_doc/auto
(Auto Classes)

https://jimmy-ai.tistory.com/539
(A Python UMAP dimensionality reduction and visualization example)