Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance

Published in arXiv preprint arXiv:2305.17306, 2023