We present a novel reinforcement learning (RL) approach for solving the
classical 2-level atom non-LTE radiative transfer problem by framing it as a
control task in which an RL agent learns a depth-dependent source function
$S(\tau)$ that self-consistently satisfies the equation of statistical
equilibrium (SE). The agent’s policy is optimized entirely via reward-based
interactions with a radiative transfer (RT) engine, without explicit knowledge of
the ground truth. This method bypasses the construction of the approximate
lambda operators ($\Lambda^*$) used in accelerated iterative schemes.
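For context, the accelerated scheme being bypassed can be sketched as follows. This is a minimal illustration on a discretized depth grid; `formal_solve` (a formal solution returning the mean intensity $\bar{J} = \Lambda[S]$) and `lam_diag` (the diagonal of $\Lambda$) are assumed, illustrative names:

```python
import numpy as np

def ali_iterate(S, B, eps, formal_solve, lam_diag, tol=1e-6, max_iter=500):
    """Accelerated lambda iteration with a local (diagonal) approximate
    operator for the 2-level atom, where SE requires
    S = (1 - eps) * Jbar + eps * B  with  Jbar = Lambda[S]."""
    for _ in range(max_iter):
        Jbar = formal_solve(S)                          # formal solution: Jbar = Lambda[S]
        residual = (1.0 - eps) * Jbar + eps * B - S     # SE residual at each depth point
        dS = residual / (1.0 - (1.0 - eps) * lam_diag)  # Lambda*-preconditioned correction
        S = S + dS
        if np.max(np.abs(dS)) / np.max(np.abs(S)) < tol:  # relative change small -> SE
            return S
    return S
```

The RL approach replaces the preconditioned correction, and hence the need for `lam_diag`, with reward-driven updates.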
Additionally, it requires no extensive precomputed labeled datasets to extract
a supervisory signal, and avoids backpropagating gradients through the complex
RT solver itself. Finally, we show experimentally that a simple feedforward
neural network trained greedily cannot solve for SE, possibly due to the
moving-target nature of the problem. Our $\Lambda^*$-free method offers
potential advantages for complex scenarios (e.g., atmospheres with enhanced
velocity fields, multi-dimensional geometries, or complex microphysics) where
$\Lambda^*$ construction or solver differentiability is challenging.
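To make the control-task framing concrete, here is a minimal environment sketch; the state, the multiplicative action on $S(\tau)$, and the residual-based reward are illustrative assumptions (reusing the hypothetical `formal_solve` from above), not necessarily the paper's exact choices:

```python
import numpy as np

class TwoLevelSEEnv:
    """Toy environment: the agent adjusts S(tau) on a depth grid and is
    rewarded for shrinking the statistical-equilibrium residual
    S - [(1 - eps) * Jbar + eps * B], with Jbar supplied by the RT engine."""

    def __init__(self, B, eps, formal_solve):
        self.B, self.eps, self.formal_solve = B, eps, formal_solve
        self.S = np.copy(B)                      # LTE start: S = B

    def step(self, action):
        self.S = self.S * np.exp(action)         # multiplicative correction keeps S > 0
        Jbar = self.formal_solve(self.S)         # black-box RT engine: no gradients needed
        residual = self.S - ((1.0 - self.eps) * Jbar + self.eps * self.B)
        reward = -float(np.linalg.norm(residual / self.B))  # penalize SE violation
        return self.S.copy(), reward
```

Because the engine is queried only for rewards, it can remain a black box: no differentiability or $\Lambda^*$ construction is required.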
Additionally, the agent can be incentivized to find more efficient policies by
adjusting the discount factor, which reprioritizes immediate rewards. If
demonstrated to generalize beyond its training data, this RL framework could
serve as an alternative or accelerated formalism for achieving SE.
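For reference, the quantity reweighted by this choice is the standard discounted return,
$$G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k},$$
so a smaller discount factor $\gamma$ down-weights distant rewards and favors policies that drive the SE residual down in fewer steps.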
To the best of our knowledge, this study represents the first application of
reinforcement learning in solar physics that directly solves for a fundamental
physical constraint.