AI dev startups are struggling with one problem and I solved it - with POC

2 points by kannthu 2 years ago | 1 comment

*TL;DR*

Over one month ago I posted about a really hard problem that I "accidentally" solved (https://news.ycombinator.com/item?id=40460084).

The problem is resolving cross-file references across multiple programming languages. With those references resolved, I can generate a graph representation of the codebase.

*Why do you need to have a graph representation of the codebase?*

- To understand how code references other code

- To track how data is passed around

I generated references for the https://github.com/dj-stripe/dj-stripe repo. Here is a gist: https://gist.githubusercontent.com/kannthu/6e1bdd2781d2e0a6ded30844d61f089e/raw/f1fa4bc0f34891834ce13ac256eec12f6cc671e1/dj-stripe-references.json

The gist is a big JSON blob that contains definitions from the repository.

Definitions are:

- top-level functions

- classes

- methods and public properties

- top-level variables

- exports

Each definition contains:

- Snippet, path, and range within the file

- "references" - a list of places where the definition is used

- "expressions" - a list of resolved references (variables, functions, and classes) that are used within the body of the definition
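To make the shape of this data concrete, here is a minimal Python sketch of how definitions like these could be turned into a graph. Only the "references" and "expressions" keys come from the description above. Every other key ("name", "path", "range", "snippet"), the sample entries, and the helper functions are hypothetical, and the real gist may be laid out differently.

```python
from collections import defaultdict

# Hypothetical sample data. The real gist's schema may differ; only the
# "references" and "expressions" keys are taken from the post.
definitions = [
    {
        "name": "charge",
        "path": "app/billing.py",
        "range": {"start": 40, "end": 47},
        "snippet": "def charge(customer_id): ...",
        "references": [],
        "expressions": ["get_customer"],
    },
    {
        "name": "get_customer",
        "path": "app/customers.py",
        "range": {"start": 10, "end": 18},
        "snippet": "def get_customer(customer_id): ...",
        "references": [{"path": "app/billing.py", "line": 42}],
        "expressions": ["Customer"],
    },
    {
        "name": "Customer",
        "path": "app/models.py",
        "range": {"start": 1, "end": 30},
        "snippet": "class Customer: ...",
        "references": [{"path": "app/customers.py", "line": 12}],
        "expressions": [],
    },
]


def build_graph(definitions):
    """Map each definition name to the set of definitions its body uses."""
    known = {d["name"] for d in definitions}
    graph = defaultdict(set)
    for d in definitions:
        graph[d["name"]]  # make sure every definition appears as a node
        for used in d["expressions"]:
            if used in known:  # skip anything defined outside the repo
                graph[d["name"]].add(used)
    return dict(graph)


def reachable(graph, start):
    """Return every definition that `start` uses, directly or transitively."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


graph = build_graph(definitions)
print(graph["charge"])
print(sorted(reachable(graph, "charge")))
```

In this sketch the "expressions" lists give the outgoing edges (what a definition uses), while "references" would give the incoming edges (where it is used), so either one is enough to walk the graph in one direction.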

*How can this data be useful?*

If you are building code generation, code intelligence, or code review products, your product needs to understand codebases in many programming languages at once. The more accurate the context you feed to an LLM, the better the output you will get, and building this in-house is really expensive and resource-consuming.
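As a rough illustration of the "accurate context" point, the sketch below gathers the snippet of one definition plus the snippets of the definitions it uses, which is the kind of text you might put in front of an LLM. It is only an assumption about how the data could be consumed: the "name", "path", and "snippet" keys, the sample entries, and the `build_context` helper are all hypothetical.

```python
def build_context(definitions, target, max_chars=2000):
    """Join the target's snippet with the snippets of definitions it uses.

    Assumes each definition is a dict with hypothetical "name", "path",
    and "snippet" keys, plus the "expressions" list described in the post.
    """
    by_name = {d["name"]: d for d in definitions}
    root = by_name[target]
    parts = [f"# {root['path']}\n{root['snippet']}"]
    for used in root["expressions"]:
        dep = by_name.get(used)
        if dep is None:
            continue  # not defined in this repo, nothing to include
        block = f"# {dep['path']}\n{dep['snippet']}"
        if sum(len(p) for p in parts) + len(block) > max_chars:
            break  # stay within the (character-based) budget
        parts.append(block)
    return "\n\n".join(parts)


# Hypothetical sample data, not taken from the gist.
sample = [
    {
        "name": "charge",
        "path": "app/billing.py",
        "snippet": "def charge(cid):\n    return get_customer(cid).pay()",
        "expressions": ["get_customer", "requests"],
    },
    {
        "name": "get_customer",
        "path": "app/customers.py",
        "snippet": "def get_customer(cid):\n    return Customer(cid)",
        "expressions": ["Customer"],
    },
]

context = build_context(sample, "charge")
print(context)
```

A character budget is used here only to keep the example short. A real product would more likely count tokens and rank dependencies by relevance.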

Let me know if this is interesting to any of you.

kannthu 2 years ago

Clickable links:

- https://news.ycombinator.com/item?id=40460084

- https://github.com/dj-stripe/dj-stripe

- https://gist.githubusercontent.com/kannthu/6e1bdd2781d2e0a6d...