liminal 🦠 on Nostr:
Now that's a statement i can't relate more to 😅
I first encountered LDA on a previous project -> "find k topics in n documents" sounds great until you realize you need to be pretty confident about how many topics there actually are. Not wanting to impose that assumption, I moved to hierarchical LDA, but in both cases I just didn't know how to interpret the clusters.
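A minimal sketch of that "find k topics in n documents" workflow with scikit-learn; the toy corpus and the choice of k=2 are made-up illustrative assumptions, not anything from the original project:

```python
# Sketch: classic LDA topic modeling, where k must be fixed up front.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat with another cat",
    "dogs chase cats around the yard",
    "stocks fell as markets reacted to rates",
    "investors traded bonds and stocks today",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)

# n_components is k -- the number of topics you must commit to in advance,
# which is exactly the assumption discussed above.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

doc_topics = lda.transform(counts)  # shape (n_docs, k): per-document topic mixture
print(doc_topics.shape)
```

Each row of `doc_topics` is a probability distribution over the k topics, and interpreting what each topic "means" from its top words is the part that stays hard.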
Embedding models (BERT, GPT), being trained primarily on natural language, learn to associate co-occurring words. You get a high-dimensional vector, and the idea is that if concepts are similar, they'll be close to each other in that vector space. So you can cluster spatially without needing to assume how many clusters there are 🙂
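The spatial-clustering idea can be sketched with a density-based algorithm like DBSCAN, which discovers the number of clusters instead of taking it as input. The 2-D vectors below are toy stand-ins for real embeddings (a real pipeline would get them from a BERT-style encoder), and the `eps`/`min_samples` values are illustrative assumptions:

```python
# Sketch: cluster embedding vectors spatially without presetting k.
import numpy as np
from sklearn.cluster import DBSCAN

embeddings = np.array([
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],  # one tight group of similar concepts
    [5.0, 5.1], [5.1, 5.0],                # a second, distant group
])

# eps is the neighborhood radius; min_samples the density threshold.
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(embeddings)

n_clusters = len(set(labels) - {-1})  # -1 marks noise points
print(n_clusters)  # → 2: the count emerges from density, not a preset k
```

The trade-off moves from "choose k" to "choose a distance scale", but the distance scale is often easier to reason about when similar concepts genuinely sit close together in the embedding space.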
Published at 2024-06-04 01:50:36