In deep learning-based classification tasks, the softmax function’s
temperature parameter $T$ critically influences the output distribution and
overall performance. This study presents a novel theoretical insight that the
optimal temperature $T^*$ is uniquely determined by the dimensionality of the
feature representations, thereby enabling training-free determination of $T^*$.
Despite this theoretical grounding, empirical evidence reveals that $T^*$
fluctuates under practical conditions owing to variations in models, datasets,
and other confounding factors. To address these influences, we propose and
optimize a set of temperature determination coefficients that specify how $T^*$
should be adjusted based on the theoretical relationship to feature
dimensionality. Additionally, we insert a batch normalization layer immediately
before the output layer, effectively stabilizing the feature space. Building on
these coefficients and a suite of large-scale experiments, we develop an
empirical formula that estimates $T^*$ without additional training, and we
introduce a corrective scheme that refines $T^*$ according to the number of
classes and task complexity. Our findings confirm that the derived temperature not only
aligns with the proposed theoretical perspective but also generalizes
effectively across diverse tasks, consistently enhancing classification
performance and offering a practical, training-free solution for determining
$T^*$.
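As context for the abstract, the following minimal sketch shows the temperature-scaled softmax it refers to. The function name `temperature_softmax`, the placeholder value of `T_star`, and the use of PyTorch are illustrative assumptions; the paper's dimensionality-based formula for $T^*$ is not reproduced here.

```python
import torch
import torch.nn.functional as F

def temperature_softmax(logits: torch.Tensor, T: float) -> torch.Tensor:
    """Softmax with temperature T: T > 1 flattens the output distribution,
    T < 1 sharpens it, and T = 1 recovers the standard softmax."""
    return F.softmax(logits / T, dim=-1)

# Hypothetical usage: in the paper, T_star would be obtained without training
# from the feature dimensionality; a placeholder value is used here instead.
logits = torch.randn(4, 10)   # a batch of 4 examples over 10 classes
T_star = 2.0                  # placeholder, not the paper's estimate
probs = temperature_softmax(logits, T_star)
print(probs.sum(dim=-1))      # each row sums to 1
```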
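The abstract also mentions inserting a batch normalization layer immediately before the output layer to stabilize the feature space. The sketch below shows one plausible arrangement of such a head in PyTorch; the class name `ClassifierHead` and the chosen dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Hypothetical classification head with batch normalization placed
    immediately before the output (logit) layer, as the abstract describes."""
    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.bn = nn.BatchNorm1d(feature_dim)         # stabilizes penultimate features
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(self.bn(features))

# Illustrative usage with 512-dimensional features and 100 classes.
head = ClassifierHead(feature_dim=512, num_classes=100)
logits = head(torch.randn(8, 512))   # batch of 8 feature vectors
```

Normalizing the penultimate features in this way keeps their scale consistent across models and datasets, which is the stabilization the abstract attributes to the added batch normalization layer.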