A recognition system performs actions such as feature extraction or object detection, and combines the results to give the final answer. The actions vary in cost, and different test instances benefit from different actions. For this reason, when a test-time cost budget is given, the selection of the optimal subset of actions must be dynamic (closed-loop). We focus on the Anytime evaluation setting, in which the process may be stopped even before the budget is depleted. Action selection is formulated as a Markov Decision Process (MDP), solved with reinforcement learning methods. The state of the MDP contains the results of the actions taken so far (computed feature values or object detector outputs). We present different ways of combining these values, both for selecting the next action and for giving the final answer. Results are given on an illustrative synthetic problem, on visual scene recognition and object detection tasks, and on a hierarchically-structured classification task. On the latter, we show that Anytime answers can be given at any desired cost budget and level of accuracy.
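The closed-loop, budgeted selection loop described above can be sketched as follows. This is a minimal illustration, not the paper's method: the action names, their costs, and the stand-in value function `expected_gain` (which a learned policy or Q-function would replace) are all invented for the sketch.

```python
# Hypothetical actions with per-action costs; names and values are illustrative.
ACTIONS = {"cheap_feature": 1.0, "scene_feature": 2.0, "object_detector": 5.0}

def expected_gain(action, state):
    # Stand-in for a learned value estimate conditioned on the current state;
    # here it simply prefers cheaper actions.
    return 1.0 / ACTIONS[action]

def run_episode(budget):
    state = {}   # MDP state: results of actions taken so far
    spent = 0.0
    while True:
        # Closed-loop step: only actions that fit in the remaining budget.
        feasible = [a for a in ACTIONS
                    if a not in state and spent + ACTIONS[a] <= budget]
        if not feasible:
            break
        a = max(feasible, key=lambda a: expected_gain(a, state))
        state[a] = 1.0          # placeholder for the action's actual result
        spent += ACTIONS[a]
    # Anytime answer: combine whatever results have been computed so far.
    answer = sum(state.values())
    return state, spent, answer
```

Because the answer is formed from whatever results are available, the loop can be interrupted at any point before the budget is depleted and still return a valid (if less informed) prediction.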