
Today I came across an interesting repo: the author evaluates LLM capabilities with a DSL.
DSL
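"Write a C program that draws an american flag to stdout." >> LLMRun() >> CRun() >> \
    LLMRun("What flag is shown in this image?") >> \
    (SubstringEvaluator("United States") | SubstringEvaluator("USA") | SubstringEvaluator("America"))

Judging by the names, the pipeline reads left to right. The prompt goes to the model under test, and the C code it writes is compiled and run. The output is then handed to a model that is asked which flag it shows, and the test passes if the answer contains any of the three substrings (| acts as "or").

One of the tests, a Python-to-C conversion (it didn't click for me the first time, haha), is especially interesting: every LLM got it wrong.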
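The Python function:

def foo(x):
    sum = 0
    for i in range(x):
        x += i
        sum += x
    return sum

And a C translation of the kind the models produce:

#include <stdio.h>

int foo(int x) {
    int sum = 0;
    for (int i = 0; i < x; i++) {
        x += i;
        sum += x;
    }
    return sum;
}

int main() {
    int result = foo(5); // Example call, replace 5 with any integer to test with different values
    printf("Result: %d\n", result);
    return 0;
}

The trap: in Python, range(x) is evaluated once, before the loop starts, so the loop runs exactly as many times as the original value of x, no matter how x changes inside the body. foo(5) returns 5 + 6 + 8 + 11 + 15 = 45. In the C version, the condition i < x is re-checked on every iteration against the ever-growing x. For any x >= 2 the loop therefore never stops after x iterations and keeps going until the int arithmetic overflows, which is undefined behavior in C, so the program no longer computes what the Python version computes.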