Nice, I do like the direction. A prolog dialect does seem like a natural choice if we must pick only one kind of intermediate representation, but ideally there could be multiple. For example, I saw your "legal reasoning" example.. did you know about https://catala-lang.org/ ? I think I'd like to see an LLM experiment that only outputs formal specifications, but still supports multiple targets (say prolog, z3, storm, prism, alloy and what have you). After you can output these things you can use them in chain-of-code.
Anyway the basic point being.. it is no wonder LLM reasoning abilities suck when we have no decent intermediate representation for "thinking" in terms of set/probability primitives. And it is no wonder LLMs suck at larger code-gen tasks when we have no decent intermediate representation for "thinking" in terms of abstract specifications. The obsession with natural-language inputs/intermediates has been a surprise to me. LLMs are compilers, and we need to walk with various spec -> spec compilers first so that we can run with spec -> code compilers
Thank you, https://catala-lang.org/ looks very interesting. I've experimented a lot with LLMs producing formal representations of facts and rules. What I've observed is that the resulting systems usually lose a lot of the original generalization capabilities offered by the current generation of LLMs (Finetuning may help in this case, but is often impractical due to missing training data). Together with the usual closed world assumption in e.g. Prolog, this leads to imho overly restrictive applications. So the approach I am taking is to allow the LLM to generate Prolog code that may contain predicates which are interpreted by an LLM.
So one could e.g. have
is_a(dog, animal).
is_a(Item, Category) :- @("This predicate should be true if 'Item' is in the category 'Category'").
In this example, evaluation of the is_a predicate would first try to apply the first rule and if that fails fallback on to the second rule branch which goes into the LLM. That way the system as a whole does not always fail, if the formal knowledge representation is incomplete.
I've also been thinking about the Spec->Spec compilation use case. So the original Spec could be turned into something like:
I am honestly not sure where such an approach might ultimately be most valuable. "Anything-tools" like LLMs make it surprisingly hard to focus on an individual use case.
Anyway the basic point being.. it is no wonder LLM reasoning abilities suck when we have no decent intermediate representation for "thinking" in terms of set/probability primitives. And it is no wonder LLMs suck at larger code-gen tasks when we have no decent intermediate representation for "thinking" in terms of abstract specifications. The obsession with natural-language inputs/intermediates has been a surprise to me. LLMs are compilers, and we need to walk with various spec -> spec compilers first so that we can run with spec -> code compilers