[D] Research on self-supervised fine-tuning of “sentence” embeddings?
Typical transformer models output per-token embeddings; a common approach is to take the mean of all token embeddings within a “sentence” to create a “sentence” embedding that can be used for low-data downstream tasks.
I feel like a lot of information gets lost by just taking the mean.
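(For reference, the masked mean-pooling baseline I mean looks roughly like this; a minimal PyTorch sketch assuming a Hugging Face-style encoder that exposes `last_hidden_state` and an `attention_mask`.)

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # last_hidden_state: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # ignore padding tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return summed / counts                           # (batch, hidden)
```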
Assuming you can’t change your transformer, what are ways of fine-tuning the aggregation operation on a particular dataset (assuming no labels)?
A bonus would be also reducing the dimensionality of the sentence embeddings.
I’m actually interested in non-NLP applications, so looking for general strategies.
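One pattern that seems to fit what I'm asking (and isn't NLP-specific): freeze the transformer, put a small learnable pooling head plus a low-dimensional projection on top of the per-token embeddings, and train only that head with a self-supervised contrastive loss, e.g. SimCSE-style dropout views or any domain-appropriate augmentation. A minimal PyTorch sketch of what I have in mind, where `encode_tokens` is a hypothetical stand-in for whatever frozen model produces per-token embeddings, and the dropout in the head provides the two views:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnPoolHead(nn.Module):
    """Learnable attention pooling + projection to a smaller dimension."""
    def __init__(self, hidden_dim, out_dim=128, dropout=0.1):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)       # per-token importance score
        self.proj = nn.Linear(hidden_dim, out_dim)  # dimensionality reduction
        self.dropout = nn.Dropout(dropout)          # source of the two "views"

    def forward(self, token_emb, mask):
        # token_emb: (B, T, H); mask: (B, T) with 1 for real tokens
        scores = self.score(token_emb).squeeze(-1)            # (B, T)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = scores.softmax(dim=-1).unsqueeze(-1)        # (B, T, 1)
        pooled = (weights * token_emb).sum(dim=1)             # (B, H)
        return self.proj(self.dropout(pooled))                # (B, out_dim)

def info_nce(z1, z2, temperature=0.05):
    """SimCSE-style InfoNCE: matching rows of z1/z2 are positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# --- training loop sketch: the encoder stays frozen, only the head trains ---
# head = AttnPoolHead(hidden_dim=768, out_dim=128)
# opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
# for batch in loader:
#     token_emb, mask = encode_tokens(batch)  # hypothetical frozen-encoder call
#     z1 = head(token_emb, mask)              # two stochastic views of the same
#     z2 = head(token_emb, mask)              # inputs, differing only by dropout
#     loss = info_nce(z1, z2)
#     opt.zero_grad(); loss.backward(); opt.step()
```

For the dimensionality-reduction part, a non-learned baseline worth comparing against is just PCA/whitening on the pooled embeddings. But I'm curious whether there's proper research on learning the aggregation itself this way.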
submitted by /u/LetsTacoooo