Words growing or shrinking in Hacker News titles: a tidy analysis (varianceexplained.org)
"Using VR to train a deep learning neural network on driving and react correctly to unexpected conditions, a bot implemented via a microservices stack using aws as a container and of course connected with cars and related traffic devices via the IoT, logging unexpected events into a blockchain."
With the bigrquery R package (https://github.com/rstats-db/bigrquery), you can access the HN dataset directly from R, using dplyr syntax too (for simple queries at least; you can pass raw SQL for complex queries).
As noted, the resulting dataset of words is large, so mapping the words in BigQuery itself may be more practical (using a combo of SPLIT and UNNEST with standard SQL), although of course you can't do complex operations like logistic regression or splines there.
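For anyone unfamiliar with the SPLIT + UNNEST pattern, here is a rough Python sketch of the shape of the result: SPLIT turns each title into an array of words, and UNNEST flattens that array into one row per word. The sample titles and the `split_unnest` helper are made up for illustration; this is not the query from the article.

```python
from collections import Counter

# Hypothetical sample titles; real data would come from the HN dataset.
titles = [
    "Show HN: A tidy analysis of HN titles",
    "Ask HN: Is deep learning overhyped?",
    "Deep learning for title analysis",
]


def split_unnest(rows):
    """Yield one (row_id, word) pair per word, mimicking SPLIT then UNNEST."""
    for row_id, title in enumerate(rows):
        # SPLIT step: break the lowercased title on single spaces.
        for word in title.lower().split(" "):
            # UNNEST step: emit each non-empty word as its own row.
            if word:
                yield row_id, word


pairs = list(split_unnest(titles))

# Aggregate the flattened rows, similar to a GROUP BY word with COUNT(*).
word_counts = Counter(word for _, word in pairs)
```

Note that splitting on spaces alone leaves punctuation attached ("hn:" and "hn" count as different words), so a real analysis would want a proper tokenizer or a regex cleanup first.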
Any guesses on this one?
Comments are getting longer over time on average (http://minimaxir.com/img/hn-comments/monthly_average_words.p...), and there is a slight positive correlation between comment score and comment length (http://minimaxir.com/img/hn-comments/distribution_comment_po...), but that can't be remade with the BigQuery dataset since comment scores are no longer public.