How are the scenarios used during training?

Hi,

I am using the PPO2 algorithm to train my agent, and I am wondering exactly how the training process works. I have divided the HighD dataset into 3041 scenarios, each of which has a maximum of 1000 time steps. If I use 70% of these for training, I get a total of 2,104,550 time steps. If I train my agent for 1 million time steps, does that mean my agent will only “see” about half of all the training scenarios once?
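For reference, here is the arithmetic behind my estimate (the totals come from my own dataset split):

```python
train_timesteps = 2_104_550   # total steps over the 70% training scenarios
budget = 1_000_000            # PPO2 training budget

# If every episode ran its scenario to the end, this is the fraction of the
# training data the agent would see once:
print(budget / train_timesteps)   # ≈ 0.475, i.e. roughly half
```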

I also have the problem that my RAM (88 GB) fills up quickly during training. As a result, I can only train for about 70,000 time steps in one run before I have to stop and continue with a subsequent training run. In this case, does my agent always “see” the same scenarios in each subsequent run if I use a constant random seed?

Thanks in advance for the answer! And if you faced similar RAM issues during your training, I would be happy to hear how you dealt with it!

Best regards,
Annabel

Hi Annabel,

Thanks for your post. At initialization, the env loads all scenarios stored in the specified folder into a list. During training, each time the reset method is called, the env randomly selects a scenario from this list, which means a scenario can be selected multiple times. Only if you set the option play=True when initializing the env will it pop the scenario from the list instead of sampling it, so only in play mode is each scenario selected exactly once. Play mode is meant for evaluating a trained model.
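For illustration, here is a minimal sketch of that reset logic (the class and attribute names are illustrative, not the env's actual ones):

```python
import random

class ScenarioEnv:
    """Minimal sketch of the scenario-selection logic described above;
    the names here are placeholders, not the real env's."""

    def __init__(self, scenario_list, play=False):
        # At initialization, every scenario in the folder ends up in a list.
        self.scenarios = list(scenario_list)
        self.play = play

    def reset(self):
        if self.play:
            # Play mode: pop, so each scenario is used exactly once.
            return self.scenarios.pop()
        # Training: sample with replacement, so repeats are possible.
        return random.choice(self.scenarios)
```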

As for the memory issue, I usually scatter the whole dataset across several subfolders using this script. For training I then use multiple threads, with one env created on each thread, and each env only loads scenarios from its own subfolder. This way the RAM consumption is approximately the total size of the whole dataset instead of num_threads * size_of_dataset.
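The script itself isn't reproduced here, but a rough sketch of the overall setup could look like this (ScenarioEnv is a trivial stand-in for the actual env class, and all paths and hyperparameters are placeholders):

```python
import os
import shutil

import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv


class ScenarioEnv(gym.Env):
    """Stand-in for the actual scenario env; only scenario_dir matters for
    this sketch -- the real env would load its scenarios from it here."""

    def __init__(self, scenario_dir):
        self.scenario_dir = scenario_dir  # each env loads only this subfolder
        self.observation_space = spaces.Box(-np.inf, np.inf, (4,), np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, (2,), np.float32)

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        return np.zeros(4, dtype=np.float32), 0.0, True, {}


def split_dataset(src_dir, dst_dir, n_splits):
    """Distribute scenario files round-robin into n_splits subfolders."""
    for i, name in enumerate(sorted(os.listdir(src_dir))):
        sub = os.path.join(dst_dir, "split_%d" % (i % n_splits))
        os.makedirs(sub, exist_ok=True)
        shutil.copy(os.path.join(src_dir, name), sub)


def make_env(subfolder):
    # Each worker process creates one env bound to its own subfolder.
    return lambda: ScenarioEnv(scenario_dir=subfolder)


if __name__ == "__main__":
    n_envs = 8
    split_dataset("data/train", "data/train_splits", n_envs)
    env = SubprocVecEnv([make_env("data/train_splits/split_%d" % i)
                         for i in range(n_envs)])
    model = PPO2("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_000_000)
```

With SubprocVecEnv, each worker runs in its own process, so a scenario file is only ever held in memory by the one worker whose subfolder contains it.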

Hope this helps.

Best,
Xiao

Hi Xiao,

Thanks for your reply!
Have I understood correctly that not all training scenarios are necessarily used during training? E.g., the worse the agent is, the fewer time steps it spends in a scenario, so it moves on to the next one sooner and therefore runs through more scenarios in total for a fixed number of time steps?

As for the memory problem, I have only trained with one env, so the dataset is loaded only once. However, I notice that the RAM consumption increases continuously during training until the process is terminated by the Linux kernel.

Best,
Annabel

Hi Annabel,

Yes, exactly. At the beginning of training, the agent might run through more scenarios, since episodes terminate earlier due to collisions or off-road events.

Does it increase a lot during training? Some data is cached the first time a scenario is seen, in order to reduce runtime, but that shouldn't cause a large increase.
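If you want to check whether the growth really comes from that cache or from something else, one option is to log the process's resident memory once per episode, e.g. with psutil (an extra dependency; where exactly you hook this in is up to you):

```python
import os
import psutil

proc = psutil.Process(os.getpid())

def log_rss(episode):
    """Print the resident set size in MB; call once per episode."""
    rss_mb = proc.memory_info().rss / (1024 ** 2)
    print("episode %d: RSS = %.1f MB" % (episode, rss_mb))

# e.g. inside the rollout loop:
# for episode in range(num_episodes):
#     obs = env.reset()
#     log_rss(episode)
```

If the RSS keeps climbing long after every scenario has been seen once, the cache is probably not the culprit.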

Best,
Xiao