LLMs for Reinforcement Learning: Prompted Policy Search (ProPS)

Developed ProPS and ProPS⁺, which prompt LLMs to generate parameterized RL policies after linguistic and numerical reasoning. The policies improve iteratively through closed-loop feedback to the LLM, and relevant contextual and semantic information about the task is also supplied through prompting. Explored 15 different tasks and compared the results with state-of-the-art RL methods. Currently working on fine-tuning to improve the RL optimization capabilities of smaller LLMs.
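The closed-loop idea can be illustrated with a minimal sketch. This is not the ProPS implementation. The reward function is a toy stand-in for an RL rollout, and `mock_llm` is a hypothetical placeholder that perturbs the best parameters seen so far where a real system would query an LLM. All function names, the prompt wording, and the two-parameter policy are assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Each history entry pairs a parameter vector with the reward it earned.
History = List[Tuple[List[float], float]]


def evaluate(params: List[float]) -> float:
    """Toy stand-in for an RL rollout: reward peaks at params == [1.0, -2.0]."""
    target = [1.0, -2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def build_prompt(task_description: str, history: History) -> str:
    """Format the task context and past (parameters, reward) pairs as text."""
    lines = [task_description, "Previous attempts (params -> reward):"]
    for params, reward in history:
        lines.append(f"{[round(p, 3) for p in params]} -> {reward:.3f}")
    lines.append("Propose new parameters as a comma-separated list.")
    return "\n".join(lines)


def mock_llm(prompt: str, history: History, rng: random.Random) -> str:
    """Hypothetical placeholder for an LLM call.

    It ignores the prompt text and perturbs the best parameters so far,
    then replies in the comma-separated format the prompt asks for.
    """
    best_params, _ = max(history, key=lambda h: h[1])
    proposal = [p + rng.gauss(0.0, 0.5) for p in best_params]
    return ", ".join(f"{p:.4f}" for p in proposal)


def parse_params(reply: str) -> List[float]:
    """Parse a comma-separated reply into a parameter vector."""
    return [float(x) for x in reply.split(",")]


def policy_search(iterations: int = 50, seed: int = 0) -> History:
    """Run the loop: prompt, propose, evaluate, feed the reward back."""
    rng = random.Random(seed)
    init = [0.0, 0.0]
    history: History = [(init, evaluate(init))]
    task = "Task: maximize the episode reward of a 2-parameter linear policy."
    for _ in range(iterations):
        prompt = build_prompt(task, history)
        reply = mock_llm(prompt, history, rng)
        params = parse_params(reply)
        history.append((params, evaluate(params)))
    return history


if __name__ == "__main__":
    history = policy_search()
    best_params, best_reward = max(history, key=lambda h: h[1])
    print("best params:", best_params, "reward:", best_reward)
```

Swapping `mock_llm` for a real model call would leave the loop unchanged: the prompt carries the task description and reward history, and the reply is parsed back into policy parameters.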

References