2023-11-29 18:27
Was reading https://arxiv.org/abs/2311.17035 and wondered: Is the degree of extractable data an indication of incomplete training?
If the model is outright memorizing large chunks of training data, that seems to indicate that some parameters aren't being used efficiently. That's model capacity spent on verbatim storage rather than on rules/generalization.
Which in turn suggests some sort of modification to the training loss function. Maybe the entropy over the final softmax shouldn't be allowed to get _too_ low?
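Rough sketch of what I mean, as a PyTorch-style loss (just my own illustration, not anything from the paper; `min_entropy` and `penalty_weight` are knobs I made up):

```python
import torch
import torch.nn.functional as F

def entropy_floor_loss(logits, targets, min_entropy=0.5, penalty_weight=0.1):
    """Cross-entropy plus a penalty when the output distribution gets too peaked.

    logits:  (batch, vocab) raw model outputs
    targets: (batch,) target token indices
    min_entropy / penalty_weight are arbitrary placeholders, not tuned values.
    """
    # Standard next-token cross-entropy.
    ce = F.cross_entropy(logits, targets)

    # Entropy of the softmax over the vocabulary, averaged over the batch.
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()

    # Only penalize when entropy drops below the floor; leave it alone otherwise.
    entropy_penalty = F.relu(min_entropy - entropy)

    return ce + penalty_weight * entropy_penalty
```

No idea whether this helps or just hurts perplexity; it's the simplest way I can think of to express "don't let the softmax get too confident".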