The boom of unmanned aerial vehicles (UAVs) is projected to fundamentally shift the paradigms of transportation, logistics, agriculture, and public safety, making UAVs a dominant unmanned application in the coming decades. To process assigned tasks optimally, each UAV requires prompt and ubiquitous information about its varying operating conditions, which makes the base stations (BSs) of existing wireless infrastructure a tractable solution. To receive services from a BS, however, a UAV must stay within that BS's coverage area, which limits the UAV's operating range. This obstacle drives the deployment of a special type of UAV, known as an aerial base station (ABS), to relay signals between a BS and a UAV. Given the different flight paths of the UAVs, an ABS should autonomously decide its own flight trajectory so as to maximize the number of UAVs that can receive wireless services. However, the inherently non-stationary environment makes the optimal autonomous deployment of an ABS a challenging issue. Motivated by the ability of reinforcement learning to learn through interaction with the environment, we propose a reinforcement learning scheme to optimize the flight trajectory of an ABS. Conventional Q-learning raises an engineering concern in ABS deployment: most state-action pairs may not be fully visited. To address this concern, this paper proposes a state-amount-reduction (SAR) k-step Q-learning scheme that maximizes the number of UAVs receiving services from an ABS. Analytical foundations and simulation studies demonstrate that the proposed scheme outperforms conventional reinforcement learning based ABS deployment.