Q-Learning Implementation

Basic implementation of Q-learning on a 4x4 grid. The agent starts at (0,0). White cells give a negative reward. An episode ends when the agent reaches the black cell, which gives a positive reward.
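The setup described above can be sketched as a small tabular Q-learning loop. This is a minimal illustration, not necessarily how this repository implements it. The black cell position (`GOAL`), the reward values (`STEP_REWARD`, `GOAL_REWARD`), the hyperparameters (`alpha`, `gamma`, `epsilon`), the episode count, and all function names are assumptions chosen for the example.

```python
import random

SIZE = 4                      # 4x4 grid, as in the description
START = (0, 0)                # agent start, as in the description
GOAL = (3, 3)                 # ASSUMPTION: position of the black cell
STEP_REWARD = -1.0            # ASSUMPTION: reward for entering a white cell
GOAL_REWARD = 10.0            # ASSUMPTION: reward for reaching the black cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action; a move off the grid leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    next_state = (row, col)
    if next_state == GOAL:
        return next_state, GOAL_REWARD, True
    return next_state, STEP_REWARD, False


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration (hypothetical defaults)."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                best = max(q[state])                     # exploit, random tie-break
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: no bootstrap term on the terminal transition
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from START until the goal or max_steps."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    q_table = train()
    print(greedy_path(q_table))
```

Running the script trains the table and prints the cells visited when the agent follows the learned greedy policy from (0,0). The negative reward on white cells is what pushes the agent toward short paths, since every extra step lowers the return.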