A type of machine learning in which a model learns to optimize its behavior according to a reward function by interacting with and receiving feedback from an environment.
Sources:
NIST AI 100-2e2025