Benchmarking LLMs against human expert-curated biomedical knowledge graphs (sciencedirect.com)
> In our case, the manual curation of a proportion of triples revealed that Sherpa was able to extract more triples categorized as correct or partially correct. However, when compared to the manually curated gold standard, the performance of all automated tools remains subpar.
1) weighting of each statement for probability of correctness and
2) citation for each source.
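
As a rough illustration of those two points, here is a minimal sketch of a triple record that carries both a correctness weight and its citations. Everything in it is hypothetical: the `Statement` class, its field names, the `filter_confident` helper, the 0.8 threshold, and the placeholder entities and source IDs are made up for this example and do not come from the paper or from Sherpa.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Statement:
    """One extracted triple plus the two extras: a weight and its sources."""
    subject: str
    predicate: str
    obj: str
    confidence: float              # estimated probability the triple is correct
    citations: Tuple[str, ...]     # identifiers of the supporting sources

    def __post_init__(self) -> None:
        # Reject weights that are not valid probabilities.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        # Require at least one citation per statement.
        if not self.citations:
            raise ValueError("each statement needs at least one citation")


def filter_confident(statements: List[Statement], threshold: float = 0.8) -> List[Statement]:
    """Keep only statements whose weight meets the (arbitrary) threshold."""
    return [s for s in statements if s.confidence >= threshold]


# Placeholder data, not real biomedical claims or real references.
statements = [
    Statement("GeneX", "associated_with", "DiseaseY", 0.92, ("placeholder-source-1",)),
    Statement("DrugZ", "inhibits", "GeneX", 0.41, ("placeholder-source-2", "placeholder-source-3")),
]

for s in filter_confident(statements):
    print(s.subject, s.predicate, s.obj, s.confidence, list(s.citations))
```

With a record like this, low-weight triples could be routed to a human curator rather than dropped, and every retained triple stays traceable to its sources.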