AI dev startups are struggling with one problem and I solved it (with a POC)

*TL;DR:* A little over a month ago I posted about a really hard problem that I "accidentally" solved (https://news.ycombinator.com/item?id=40460084). The problem is resolving cross-file references across multiple programming languages. With that solved, I can generate a graph representation of a codebase.

*Why do you need a graph representation of the codebase?*

- To understand how code references other code
- To track how data is passed around

I generated references for the repo https://github.com/dj-stripe/dj-stripe. Here is a gist: https://gist.githubusercontent.com/kannthu/6e1bdd2781d2e0a6ded30844d61f089e/raw/f1fa4bc0f34891834ce13ac256eec12f6cc671e1/dj-stripe-references.json

The gist is a big JSON blob that contains definitions from the repository. Definitions are:

- top-level functions
- classes
- methods and public properties
- top-level variables
- exports

Each definition contains:

- Snippet, path, and range within the file
- "references": a list of places where the definition is used
- "expressions": a list of resolved references (variables, functions, and classes) that are used within the body of the definition

*How can this data be useful?*

If you are building code generation, code intelligence, or code review products, your product needs to understand codebases written in many programming languages at once. The more accurate the context you feed to an LLM, the better the output you will get, and building this in-house is really expensive and resource-consuming.

Let me know if this is interesting to any of you.
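To give a rough idea of how the definitions described above could be turned into a graph, here is a minimal Python sketch. It does not use the real gist layout: the key names `name` and `path`, the shape of each entry inside "references" and "expressions", the sample data, and both helper functions are assumptions made up for illustration, so check the gist before relying on any of them.

```python
# Hypothetical sample that mirrors the fields described in the post.
# ASSUMPTION: the keys "name" and "path" and the shape of the entries in
# "references" and "expressions" are invented here; the real gist may differ.
SAMPLE_DEFINITIONS = [
    {
        "name": "sync_customer",
        "path": "app/sync.py",
        "references": [{"path": "app/views.py"}],
        "expressions": [{"name": "Customer", "path": "app/models.py"}],
    },
    {
        "name": "Customer",
        "path": "app/models.py",
        "references": [{"path": "app/sync.py"}],
        "expressions": [],
    },
]


def build_dependency_graph(definitions):
    """Map each definition to the set of definitions used inside its body."""
    graph = {}
    for definition in definitions:
        source = (definition["path"], definition["name"])
        # setdefault keeps definitions with no outgoing edges in the graph
        edges = graph.setdefault(source, set())
        for expression in definition.get("expressions", []):
            edges.add((expression["path"], expression["name"]))
    return graph


def files_that_use(definitions, name):
    """Return the sorted list of files where the named definition is referenced."""
    paths = set()
    for definition in definitions:
        if definition["name"] == name:
            for reference in definition.get("references", []):
                paths.add(reference["path"])
    return sorted(paths)


graph = build_dependency_graph(SAMPLE_DEFINITIONS)
```

In this sketch, "expressions" gives the outgoing edges (what a definition depends on) and "references" gives the incoming ones (where it is used), which is the kind of context a code generation or code review tool could hand to an LLM.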