So I saw a post on LessWrong trying to build intuitions about the size of ML models. Yudkowsky has apparently computed the total information in human DNA to be about 750 megabytes. I thought this was interesting, at least in terms of building intuitions about how complex these models are. I trained three different models in R and measured their size by saving each one as an .rds file with saveRDS(). Anyone could load up these files with readRDS() and use them to make predictions. So, without further ado, here are three ML models and their sizes (unfortunately, WordPress won’t let me upload the files themselves).
The first was a neural net that predicts which of 3 regions a wine originated from, based on a chemical analysis of ~180 different wines. This was 78 kilobytes. I just used the code from R-bloggers.
The second was also a neural net, which predicts the quality of used cars based on 1800 observations. This one was 942 kilobytes and is based on the cars data from the UCI Machine Learning Repository.
The last was a random forest that predicts outcomes for insurance appeals based on ~28,000 cases. This one was ~5.1 megabytes, and I’m planning to use it in some research.
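To put those three sizes next to the 750 megabyte DNA figure, here is a quick bit of arithmetic in R. It is only a sketch: the variable names are made up for illustration, and it assumes decimal units (1 kilobyte = 1000 bytes), which may not match exactly how the file sizes were reported.

```r
# Sizes as quoted in the post, converted to bytes.
# Assumption: decimal units (1 KB = 1000 bytes, 1 MB = 1e6 bytes).
genome_bytes <- 750e6
model_bytes <- c(
  wine_neural_net = 78e3,
  car_neural_net = 942e3,
  insurance_random_forest = 5.1e6
)

# How many copies of each saved model would fit in the DNA estimate?
copies_per_genome <- genome_bytes / model_bytes
print(round(copies_per_genome))

# Each model as a percentage of the DNA estimate.
print(signif(100 * model_bytes / genome_bytes, 2))
```

Under those assumptions, the DNA figure is roughly 9,600 times the wine net, 800 times the car net, and 150 times the random forest.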
FYI, the first neural net trained instantly and the second took a minute, but the random forest took about an hour and a half to train on my poor little laptop.
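If you want to try the same measurement yourself, something like the following should work. This is a minimal sketch and not the code behind the numbers above: it uses a plain linear model on the built-in mtcars data as a stand-in (so it needs no extra packages), and names like `fit` and `new_car` are just placeholders. Note that saveRDS() compresses by default, so the size on disk is usually smaller than the size in memory.

```r
# Stand-in model (an assumption for illustration): linear regression on mtcars.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Save the fitted model to a temporary .rds file.
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)

# Compare the size of the file on disk with the size of the object in memory.
size_on_disk <- file.size(path)
size_in_memory <- as.numeric(object.size(fit))
cat("on disk:  ", size_on_disk, "bytes\n")
cat("in memory:", size_in_memory, "bytes\n")

# Anyone with the file can reload the model and make predictions from it.
reloaded <- readRDS(path)
new_car <- data.frame(wt = 3.0, hp = 110)
pred <- predict(reloaded, newdata = new_car)
print(pred)
```

Swapping in a neural net or random forest only changes the line that fits the model; the saveRDS(), file.size(), and readRDS() steps stay the same.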