The dwell scheduling problem for a multifunctional radar system is first cast as a corresponding optimization problem. To solve it, the dwell scheduling process within a scheduling interval (SI) is formulated as a Markov decision process (MDP), with the state, action, and reward specified for this scheduling problem. Specifically, the action is defined as placing the task on the left side, on the right side, or in the middle of the radar's idle timeline, which effectively reduces the action space and accelerates training convergence. This formulation yields a model-free reinforcement learning framework. An adaptive dwell scheduling method based on Q-learning is then proposed, in which the converged Q-value table obtained after training guides the scheduling process. Simulation results demonstrate that, compared with existing dwell scheduling algorithms, the proposed method achieves better scheduling performance when the urgency criterion, the importance criterion, and the desired-execution-time criterion are considered jointly. The measured average running time shows that the proposed algorithm meets real-time requirements.
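To make the formulation concrete, the following is a minimal Python sketch of the three-action placement and the one-step Q-learning update described above. The state discretization, the reward (importance minus deviation from the desired start time), the task tuple layout, and the hyperparameters are illustrative assumptions, not the paper's actual design.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "middle", "right")        # placement within an idle gap
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1        # assumed learning hyperparameters

Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def place(gap_start, gap_end, dwell, action):
    """Start time of a dwell placed at the left/middle/right of an idle gap."""
    if action == 0:                           # left edge of the gap
        return gap_start
    if action == 2:                           # right edge of the gap
        return gap_end - dwell
    return gap_start + (gap_end - gap_start - dwell) / 2.0  # middle

def run_episode(tasks, si_length, train=True):
    """Schedule one SI; tasks = [(dwell, desired_time, importance), ...]."""
    gaps = [(0.0, si_length)]                 # idle timeline as a list of gaps
    total_reward = 0.0
    for i, (dwell, desired, w) in enumerate(tasks):
        fit = next((g for g in gaps if g[1] - g[0] >= dwell), None)
        if fit is None:
            continue                          # task dropped: no idle room left
        state = (i, round(fit[1] - fit[0], 1))  # assumed state discretization
        a = (random.randrange(3) if train and random.random() < EPSILON
             else max(range(3), key=lambda k: Q[state][k]))
        start = place(fit[0], fit[1], dwell, a)
        r = w - abs(start - desired)          # assumed reward definition
        gaps.remove(fit)                      # split the gap around the dwell
        gaps += [(fit[0], start), (start + dwell, fit[1])]
        gaps = [g for g in gaps if g[1] - g[0] > 1e-9]
        nxt = (i + 1, round(gaps[0][1] - gaps[0][0], 1) if gaps else 0.0)
        if train:                             # one-step Q-learning update
            Q[state][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[state][a])
        total_reward += r
    return total_reward

# toy usage: train on random task sets ordered importance-first
for _ in range(2000):
    tasks = sorted([(random.uniform(1, 3), random.uniform(0, 20),
                     random.uniform(1, 5)) for _ in range(5)],
                   key=lambda t: -t[2])
    run_episode(tasks, si_length=20.0)
```

Restricting the action set to left/middle/right keeps the Q table small regardless of the SI's temporal resolution, which is consistent with the claim above that this choice shrinks the action space and speeds up convergence.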