NVIDIA's Graphics Processing Units (GPUs) have been widely adopted in many application domains to shorten execution time through parallel processing, and the Compute Unified Device Architecture (CUDA) platform enables high-performance, many-core parallel programming for NVIDIA GPUs. Various metaheuristic algorithms, which aim to find an acceptably good solution rather than the optimal solution for NP-complete problems, have been studied for parallel execution on GPUs. Simulated annealing (SA) is one such metaheuristic and has been widely used to solve hard problems in many application areas. In general, decreasing the number of iterations shortens the execution time but degrades the solution quality. It is therefore difficult for programmers to choose an appropriate number of iterations when they parallelize a sequential SA algorithm. This paper proposes an approach that optimizes the mapping of the simulated annealing algorithm onto CUDA-enabled GPUs. Unlike previous research, the goal of this work is to parallelize the SA algorithm while keeping the number of iterations equal to that of the sequential version, which results in high speedup and good solution quality.
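The iteration/quality trade-off mentioned above can be illustrated with a minimal sequential SA sketch. This is a toy example minimizing f(x) = x², not the paper's algorithm or its GPU mapping; the function and parameter names are hypothetical, chosen only to show that fewer iterations generally yield a worse (or at best equal) solution:

```python
import math
import random

def simulated_annealing(energy, neighbor, x0, n_iters, t0=10.0, alpha=0.999, seed=0):
    """Generic sequential SA loop (toy sketch, not the paper's implementation)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(n_iters):
        x_new = neighbor(x, rng)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / T) to escape local minima.
        if e_new < e or rng.random() < math.exp((e - e_new) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= alpha  # geometric cooling schedule
    return best_x, best_e

# Toy objective: minimize f(x) = x^2, starting far from the optimum.
f = lambda x: x * x
step = lambda x, rng: x + rng.uniform(-1.0, 1.0)

_, e_short = simulated_annealing(f, step, x0=50.0, n_iters=100)
_, e_long = simulated_annealing(f, step, x0=50.0, n_iters=10000)
print(e_long <= e_short)  # longer run is never worse (same seed, monotone best)
```

With a fixed seed the longer run replays the shorter run's first 100 iterations and then keeps searching, so its best energy can only improve; this is the basic reason programmers hesitate to cut iterations when parallelizing SA.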