What's the best way to benchmark neuro‑symbolic‑causal AI agents?

1 points by aytuakarlar 293 days ago | 1 comment

I’m building Project Chimera, an open‑source neuro‑symbolic‑causal AI framework. The goal:

Combine LLMs (for hypothesis generation), symbolic rules (for safety & domain constraints), and causal inference (for estimating true impact) into a single decision loop.
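
To make the loop concrete, here is a minimal sketch of one decision step. Everything in it is a hypothetical stand-in and not taken from the Project Chimera code: `propose_actions` fakes the LLM with random candidates, `passes_rules` is a single hard-coded guardrail, `estimate_effect` is a toy structural model instead of a real causal estimator, and the pricing decision variable is assumed purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Action:
    # Hypothetical decision variable: fractional price change (0.1 = +10%).
    price_change: float


def propose_actions(rng, n=5):
    """Stand-in for the LLM hypothesis generator: random candidate actions."""
    return [Action(price_change=rng.uniform(-0.5, 0.5)) for _ in range(n)]


def passes_rules(action, max_change=0.2):
    """Symbolic guardrail (assumed rule): cap the size of any price change."""
    return abs(action.price_change) <= max_change


def estimate_effect(action):
    """Stand-in for a causal estimator: a toy model in which profit rises
    with price while the penalty from lost demand grows quadratically."""
    return 10.0 * action.price_change - 40.0 * action.price_change ** 2


def decide(rng):
    """One pass of the loop: generate, filter by rules, rank by estimated effect."""
    candidates = propose_actions(rng)
    allowed = [a for a in candidates if passes_rules(a)]
    if not allowed:
        # No candidate survived the rules, so fall back to doing nothing.
        return Action(price_change=0.0)
    return max(allowed, key=estimate_effect)


if __name__ == "__main__":
    print(decide(random.Random(0)))
```

The ordering is the main design choice here: rules filter before the causal ranking, so an action that violates a constraint can never be selected no matter how high its estimated effect is.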

In long‑horizon simulations, this approach seems to preserve both profit and trust better than LLM‑only or non‑symbolic agents, but I’m still refining both the architecture and the benchmarks.

I’d love to hear from the HN community:

• If you’ve built agents that reason about cause–effect, what design choices worked best?

• How do you benchmark reasoning quality beyond prediction accuracy?

• Any pitfalls to avoid when mixing symbolic rules with generative models?
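
As one possible starting point for the benchmarking question, a simulator with known ground truth allows scoring an agent on more than prediction accuracy. The sketch below assumes two illustrative metrics that are not from the project: the share of chosen actions that break a rule, and the mean absolute error between estimated and true causal effects. The function names and the example threshold are hypothetical.

```python
def violation_rate(actions, is_allowed):
    """Fraction of chosen actions that break the symbolic rules."""
    if not actions:
        return 0.0
    violations = sum(1 for a in actions if not is_allowed(a))
    return violations / len(actions)


def mean_abs_effect_error(estimated, true):
    """Mean absolute gap between estimated and ground-truth causal effects.

    Only meaningful when the true effects are known, for example because
    the simulator's structural equations are available.
    """
    if len(estimated) != len(true):
        raise ValueError("estimated and true must have the same length")
    if not true:
        return 0.0
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)


if __name__ == "__main__":
    # Hypothetical rule: an action is allowed if its magnitude is at most 0.2.
    chosen = [0.1, 0.3, -0.5, 0.0]
    print(violation_rate(chosen, lambda a: abs(a) <= 0.2))
    print(mean_abs_effect_error([1.0, 2.0], [1.5, 1.0]))
```

Reporting these alongside the outcome metric makes it visible when an agent reaches a good outcome by breaking rules or by relying on a badly estimated effect.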

GitHub (for context): https://github.com/akarlaraytu/Project-Chimera

Thanks in advance. I’ll be around to answer questions and to share any results that come out of this discussion.