🦖 RExBench

Nicholas Edwards*, Yukyung Lee*, Audrey Mao, Yulu Qin, Sebastian Schuster† and Najoung Kim†
rexbench꩜googlegroups·com

A benchmark of machine learning research extensions for evaluating coding agents

Make a submission

Instructions

To make a new submission, run your agents on all tasks in the benchmark and generate, for each task, a patch file containing all differences between the modified code and the base code.

Put each patch in a file named agent.patch, in a separate directory for each task. In the same directory, include a file named agent.log that records all the steps the agent took, including the corresponding LLM prompts and responses.
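For example, the following sketch generates one agent.patch per task with git diff. It assumes a hypothetical layout in which each task's base code is a separate git working copy under tasks/, with the agent's edits left uncommitted; adapt the directory names to your own setup.

    import subprocess
    from pathlib import Path

    TASKS_DIR = Path("tasks")            # hypothetical: one git working copy per task
    SUBMISSION_DIR = Path("submission")  # hypothetical: output directory for the submission

    for task_dir in sorted(TASKS_DIR.iterdir()):
        if not task_dir.is_dir():
            continue
        out_dir = SUBMISSION_DIR / task_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        # `git diff` captures all differences between the modified code and the base code.
        result = subprocess.run(
            ["git", "diff"],
            cwd=task_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        (out_dir / "agent.patch").write_text(result.stdout)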

We will execute the modified code based on your patches within a week and add the results to the leaderboard.

Upload a submission

ZIP file requirements:
  • Must contain one directory for each task: checkeval, cogs, entity-tracking-multimodal, ...
  • Each directory must contain: agent.patch and agent.log (a packaging sketch follows this list)
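A minimal packaging sketch, assuming the per-task directories (each holding agent.patch and agent.log) already sit under a hypothetical submission/ directory:

    import zipfile
    from pathlib import Path

    SUBMISSION_DIR = Path("submission")  # hypothetical: contains one directory per task

    with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        for task_dir in sorted(SUBMISSION_DIR.iterdir()):
            if not task_dir.is_dir():
                continue
            for filename in ("agent.patch", "agent.log"):
                # Store each file as <task_name>/<filename>, matching the required layout.
                zf.write(task_dir / filename, arcname=f"{task_dir.name}/{filename}")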
Submission form fields:
  • ZIP file: A ZIP file containing the required directories and files.
  • Submission name: A name for your submission to be displayed on the leaderboard.
  • Email address: Confirmation emails will be sent to this address. (not public)
  • Organization (optional): If you are making this submission on behalf of an organization. (not public)
  • Agent description (optional): Describe the components of your agent and how it works.
  • Base directory (optional): If your agent outputs absolute paths, specify the base directory from which you ran the agent so that we can automatically translate absolute paths to relative paths in your patches. Use the placeholder {task_name} to refer to the name of each task (a sketch of this translation follows the list).
  • Agent URL (optional): URL to your agent implementation or documentation.
  • Total cost (optional): The total cost in USD incurred from running your agents (e.g., API costs).
  • Notes (optional): Anything else you'd like us to know. (not public)
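To illustrate what the base-directory field enables, here is a sketch of the kind of path translation it allows (not our actual implementation); the base directory /home/user/runs/{task_name} is purely hypothetical.

    # Hypothetical base directory; {task_name} stands in for each task's name.
    BASE_DIR_TEMPLATE = "/home/user/runs/{task_name}"

    def to_relative(patch_text: str, task_name: str) -> str:
        """Rewrite absolute paths in a patch as paths relative to the base directory."""
        base = BASE_DIR_TEMPLATE.format(task_name=task_name)
        # Ensure a trailing slash so "/home/user/runs/cogs/src/model.py" becomes "src/model.py".
        return patch_text.replace(base.rstrip("/") + "/", "")

    example = "--- /home/user/runs/cogs/src/model.py\n+++ /home/user/runs/cogs/src/model.py"
    print(to_relative(example, "cogs"))  # prints "--- src/model.py" and "+++ src/model.py"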