CloudFlare doesn't block legit crawlers either. It does cache responses served to crawlers, so if a page hasn't changed and Google crawls it again, the request doesn't burden the origin.
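To make the caching point concrete, here's a rough sketch of the idea, not CloudFlare's actual implementation. The `EdgeCache` class, the `fake_origin` function, and the `/about` URL are all made up for illustration, and a real edge cache would also handle expiry and revalidation.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy edge cache (hypothetical): stores origin responses keyed by URL."""

    def __init__(self, fetch_origin: Callable[[str], str]) -> None:
        self._fetch_origin = fetch_origin
        self._store: Dict[str, str] = {}
        self.origin_hits = 0  # how many requests actually reached the origin

    def get(self, url: str) -> str:
        # Only go to the origin when nothing is cached for this URL.
        if url not in self._store:
            self.origin_hits += 1
            self._store[url] = self._fetch_origin(url)
        return self._store[url]

    def purge(self, url: str) -> None:
        # Called when the page changes, so the next request refetches it.
        self._store.pop(url, None)


def fake_origin(url: str) -> str:
    # Stand-in for the real origin server (assumption for this sketch).
    return f"<html>content of {url}</html>"


cache = EdgeCache(fake_origin)
first_crawl = cache.get("/about")   # first crawl reaches the origin
second_crawl = cache.get("/about")  # repeat crawl is served from the cache
```

Here the second crawl of an unchanged page never touches the origin. Only after a `purge` (the page changed) does the next crawl go back to it.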
What's interesting about CDN in a Box is that they're serving off a single IP. The problem with this strategy is that Google classifies sites for crawl purposes by IP. That means if one site on CDN in a Box falters, all the other sites on CDN in a Box will suffer (e.g., Google turning down crawl velocity or completely removing them from the index). The same problem occurs if any site on that IP is spammy or compromised by malware.
At CloudFlare, we tried the CDN in a Box strategy when we launched more than a year ago. We quickly found it had serious negative impacts on site rankings. We spent considerable time working directly with Google and the other search engine crawl teams on a solution. Today, sites on CloudFlare actually get the highest crawl velocity setting because of this work, which we've seen positively impact site rankings.
I'm curious to hear more about CDN in a Box's plans, discussions with search engine crawler teams, and technologies they've developed to overcome this challenge.