The results will be "bad", which you have acknowledged as a possibility. Why do it then?
LLMs trained on only Wikipedia or only Reddit are probably going to be very limited in capability, since neither gives you enough well-rounded data (esp. Wikipedia). Of course you'll find differences. Reddit is going to contain more profanity, at the very least, so the Reddit-trained LLM is going to swear and use slang more. Beyond generating gibberish and comparing the gibberish, there doesn't seem to be much point to the exercise, unless that's a project you really want to do.
Without knowing how IB scores students' research papers, I can't comment on whether this could earn a reasonable grade. But as I said, unless you really want to do it and can somehow measure that the Reddit model understands slang better and swears more readily, I personally don't see the point, given that the results will likely be, as you mentioned, somewhat "bad".
The thing about bleeding-edge research on LLMs is that nobody really knows what will happen until someone actually tries it out.
FWIW, you generally don't have to do much proper "programming" to train models these days. There are many projects on GitHub with code to train SoTA models, and that code is often just a few hundred to a few thousand lines. The main difficulties are getting the hardware, the OS and the dependencies to work correctly, getting high-quality training data (which isn't a concern for your project), and tuning the hyperparameters (if you're concerned with performance).
So in terms of technical feasibility, yeah, it's doable. But I am kind of concerned that the most likely main result would be that the Reddit model knows more internet slang and swears more than the Wikipedia model, which doesn't seem to mesh well with a high school project :D
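If you do go ahead, here is a rough sketch of how you might put a number on the "swears and uses slang more" difference. It just counts what fraction of words in each model's output fall in a word list. Everything here is an assumption for illustration: the `SLANG_OR_PROFANITY` list is a tiny made-up lexicon, `marker_rate` is a hypothetical helper, and the sample strings are placeholders standing in for text you would actually sample from each model.

```python
import re

# Hypothetical, deliberately tiny word list (an assumption for this sketch);
# a real comparison would need a much larger, carefully chosen lexicon.
SLANG_OR_PROFANITY = {"damn", "hell", "lol", "lmao", "tbh", "imo"}


def marker_rate(samples, lexicon=SLANG_OR_PROFANITY):
    """Return the fraction of word tokens across all samples found in the lexicon."""
    tokens = []
    for text in samples:
        # Lowercase and keep only alphabetic words (digits and punctuation are dropped).
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in lexicon)
    return hits / len(tokens)


# Placeholder strings, not real model output.
reddit_samples = ["lol that is damn funny tbh", "imo this is fine"]
wiki_samples = ["The city is the capital of the region.", "It was founded in 1850."]

print("reddit-style rate:", marker_rate(reddit_samples))
print("wiki-style rate:", marker_rate(wiki_samples))
```

A word-list count like this is crude (it misses context, misspellings and new slang), but it would at least turn the comparison into something you can report as a number rather than an impression.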