How are the planning problems used in the learning process?

Hi all,

I’ve experimented around a bit with the code from the commonroad-rl tutorial.

What I don’t get is how the scenarios actually get used in the training process.
Is a random scenario just selected, and then an RL agent learns a policy on it by trying to go from the start state to the goal states?

I noticed that in each scenario there is only one planning problem. How could I train an agent on a scenario with a lot of different planning problems?

Is it possible to train an agent on just one scenario but with many different planning problems?

Best,
Nando

Hi Nando,

Each time env.reset() is called, a random scenario and planning-problem pair is selected here. The planning problem is then used to set up the initial state of the ego vehicle, and its goal region is used to create a route for the ego vehicle to follow.
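To make the idea concrete, here is a minimal sketch of that reset logic. The class and function names below are illustrative stand-ins, not the actual commonroad-rl API:

```python
import random

# Hypothetical stand-in for a CommonRoad planning problem; the real object
# carries a richer initial state and goal region than shown here.
class PlanningProblem:
    def __init__(self, initial_state, goal_region):
        self.initial_state = initial_state
        self.goal_region = goal_region

def reset(scenario_problem_pairs, rng=random):
    """Pick a random (scenario, planning_problem) pair, roughly as
    env.reset() does, and use the planning problem to initialise the ego."""
    scenario, problem = rng.choice(scenario_problem_pairs)
    ego_state = problem.initial_state   # ego vehicle starts at the problem's initial state
    route_target = problem.goal_region  # goal region defines the route to follow
    return scenario, ego_state, route_target
```

So each episode is tied to exactly one (scenario, planning problem) pair, which is why each scenario file in the tutorial only needs a single planning problem.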

By "a scenario with a lot of different planning problems", do you mean multi-agent planning, or using the different planning problems sequentially?

Extension to multi-agent RL is still under development.

Learning the same scenario with different planning problems sequentially is not possible in the current version, and in my opinion it does not make much sense. You would essentially be training an agent that starts from different initial states and reaches different goals, but always in the exact same traffic, which would cause overfitting to that traffic. It is also not trivial to ensure that the different planning problems you create for the same scenario are feasible/solvable under the given traffic.

Best,
Xiao