Reinforcement learning (RL) algorithms solve sequential decision-making problems using the experience a single agent (decision maker) gathers while interacting with a dynamic environment. An RL algorithm typically estimates an action-value function (Q-function), representing it with a function approximator such as a deep neural network; this function captures how a single action (choice) affects future outcomes, so the agent can determine the best course of action for accomplishing a task in its current state. The success of RL algorithms in executing diverse tasks stems from their ability to learn such temporal relationships between an action selection and a future reward. The RL algorithm has two aims: (1) to train the agent, and (2) to obtain as much reward as possible.

The figure above illustrates an overview of RL. The agent selects the action to perform based on the current situation; it then observes the state describing the environment and receives a reward based on its action. The description of the current situation corresponds to the "state"; the set of all states is the state space, and the state at a given time is denoted St. The choices available to the agent correspond to the term "action"; the set of possible actions is the action space, and the action taken in a specific state is denoted At. The value obtained from an action performed in a given state corresponds to the "reward". The larger the reward, the better, and the agent's goal is to maximize the total reward it accumulates.
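The state-action-reward loop described above can be sketched with tabular Q-learning, the simplest case in which the Q-function is a lookup table rather than a neural network. The following is a minimal illustration, not a method from the text: the corridor environment, its size, and all hyperparameter values are hypothetical assumptions chosen only to make the example self-contained.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 1-D
# corridor of states 0..4. The agent starts at state 0 and receives
# reward +1 upon reaching the terminal state 4.
N_STATES = 5
ACTIONS = [-1, +1]              # action space: move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative hyperparameters

# Q-function represented exactly by a table: Q[state][action_index]
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def step(state, action):
    """Environment dynamics: return the next state and the reward."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def choose(state):
    """Epsilon-greedy selection over the action space in this state."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[state][a])

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        a = choose(s)
        s2, r = step(s, ACTIONS[a])
        # Q-learning update: move Q(St, At) toward the bootstrapped
        # target R + gamma * max_a Q(St+1, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, acting greedily with respect to the learned table (always moving right) maximizes the total reward, illustrating how the agent links an action selection to a delayed future reward.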