Reverse OCR

albertzeyer 11 years ago |

I was thinking about this: http://www.cs.toronto.edu/~graves/handwriting.html

sah2ed 11 years ago | |

Very impressive!

The author published an open source library, RNNLIB [1], used for his neural network research, but is the actual code for this handwriting demo published somewhere?

[1] http://sourceforge.net/p/rnnl/wiki/Home/

DINKDINK 11 years ago | |

Unfortunately it does not take acute accents, as in the sentence: Ceci n'est pas écriture humaine.

geon 11 years ago | | |

Nor umlauts, like åäö.

amelius 11 years ago | |

I wonder if it also works for voice synthesis.

rogiervd 11 years ago | | |

Not yet, I think, but people are seriously trying. http://research.google.com/pubs/HeigaZen.html

the_cat_kittles 11 years ago | |

that is so impressive, wow. thank you for the link

padho 11 years ago | |

That's nice, thank you for the link!

praptak 11 years ago |

This is similar to the project where images of clouds were fed to face recognition software: http://ssbkyh.com/works/cloud_face/

stefs 11 years ago | |

there's also http://iobound.com/pareidoloop/, a project that uses a genetic algorithm for breeding (random) polygons into a shape with a face detection algorithm as the fitness function.

nihakue 11 years ago | | |

That is totally cool. It feels like watching an oil painter slowly work from something very abstract (layers of brushstrokes) to something very recognizable (a human face).

shangxiao 11 years ago | |

I'm also intrigued by the cat vs human face recognition results!

jparishy 11 years ago |

Not strictly related, but reminded me of the exercise in genetic programming by Roger Alsing: http://rogeralsing.com/2008/12/07/genetic-programming-evolut...

It's a rather cool attempt to draw the Mona Lisa using random, semi-transparent polygons

darkFunction 11 years ago | |

I did this recently, the results were surprisingly good! https://github.com/darkFunction/PolygonPainter

Edit: Roger Alsing's implementation was a single entity population (mutated then reverted if the mutation was no good). I copied this approach in my first implementation, but found that much better results could be achieved with a breeding population of genes.
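
For readers who have not seen the single-individual approach, here is a minimal sketch of "mutate, then revert if the mutation was no good". It is not Roger Alsing's or darkFunction's code: the genome is just a list of numbers standing in for polygon parameters, and `error`, `mutate`, `hill_climb`, and the `step` size are names and values assumed for illustration.

```python
import random

def error(candidate, target):
    """Sum of squared differences; lower is better."""
    return sum((c - t) ** 2 for c, t in zip(candidate, target))

def mutate(candidate, rng, step=0.1):
    """Return a copy with one randomly chosen value nudged (step is an assumed knob)."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-step, step)
    return child

def hill_climb(target, generations=5000, seed=0):
    """Single-individual search: keep a mutation only if it lowers the error."""
    rng = random.Random(seed)
    current = [rng.random() for _ in target]
    current_error = error(current, target)
    for _ in range(generations):
        child = mutate(current, rng)
        child_error = error(child, target)
        if child_error < current_error:
            # Improvement: the child replaces the current individual.
            current, current_error = child, child_error
        # Otherwise the child is dropped, which is the "revert" step.
    return current, current_error

best, best_error = hill_climb([0.2, 0.8, 0.5])
```

In an image version, `error` would compare rendered pixels against the target picture; the accept-or-revert loop stays the same.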

PavlovsCat 11 years ago | | |

> I copied this approach in my first implementation, but found that much better results could be achieved with a breeding population of genes.

Ohhh, interesting, thanks for posting this! I just started playing around with this myself a few days ago in JavaScript (it has no UI so no link yet, but I uploaded some samples [0]), and it also uses the original simple approach. I wondered about an "actual" gene pool and cross-breeding, but shied away from the additional effort for uncertain benefit... so this helps, greatly :)

One thing I intend to try is to get the fittest (in terms of likeness to target image), and then calculate the fitness of the other genomes by taking the difference (in terms of variables that determine the shapes) to that "champion". I see you take the two most fittest as is, maybe this could be useful for picking the second one?

Also, when the Mona Lisa thing was posted on HN, someone suggested marking areas of the target image as "more important", to maybe make facial features etc. more recognizable. I'll also see if making such a mask automatically, e.g. influenced by contrast, helps any.
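
A rough sketch of what such a mask could look like, assuming a one-dimensional row of grayscale pixels to keep it short. The function names and the `floor` and `gain` values are invented for this example; nothing here comes from any of the linked projects.

```python
def contrast_weights(target, floor=1.0, gain=4.0):
    """Give pixels that differ from their neighbours a higher weight.
    `floor` and `gain` are made-up tuning values."""
    weights = []
    for i, value in enumerate(target):
        left = target[i - 1] if i > 0 else value
        right = target[i + 1] if i < len(target) - 1 else value
        contrast = abs(value - left) + abs(value - right)
        weights.append(floor + gain * contrast)
    return weights

def weighted_error(candidate, target, weights):
    """Squared pixel error, scaled per pixel by its importance weight."""
    return sum(w * (c - t) ** 2 for c, t, w in zip(candidate, target, weights))

target = [0.0, 0.0, 1.0, 0.0, 0.0]   # one bright detail on a flat background
weights = contrast_weights(target)
missed_detail = weighted_error([0.0, 0.0, 0.0, 0.0, 0.0], target, weights)
flat_mistake = weighted_error([1.0, 0.0, 1.0, 0.0, 0.0], target, weights)
```

With these weights, missing the bright detail costs more than an equally large mistake on the flat background, which is the effect the mask is meant to have.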

[0] http://imgur.com/a/4duom

darkFunction 11 years ago | | |

> I see you take the two most fittest as is, maybe this could be useful for picking the second one?

Yep, we take the two most fittest unchanged, and breed the rest randomly along a non-uniform distribution tending towards the top.
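
Roughly, in code. This is a sketch and not the PolygonPainter implementation: the exact distribution is not stated above, so taking the smaller of two uniform draws is an assumption, as are the names `pick_parent`, `crossover`, and `next_generation`. Mutation is left out to keep it short.

```python
import random

def pick_parent(ranked, rng):
    """Pick from a best-first list, favouring the top.
    Taking the smaller of two uniform indices is an assumed bias."""
    i = min(rng.randrange(len(ranked)), rng.randrange(len(ranked)))
    return ranked[i]

def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one parent or the other."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def next_generation(population, error, rng, elite=2):
    """Carry the `elite` best genomes over unchanged, then breed the rest."""
    ranked = sorted(population, key=error)
    children = [list(genome) for genome in ranked[:elite]]
    while len(children) < len(population):
        mother = pick_parent(ranked, rng)
        father = pick_parent(ranked, rng)
        children.append(crossover(mother, father, rng))
    return children
```

Here `error` is any function that scores a genome, lower being better, for example the pixel difference between the rendered polygons and the target image.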

> Also, when the Mona Lisa thing was posted on HN, someone suggested marking areas of the target image as "more important", to maybe make facial features etc. more recognizable.

This is a really good idea and would definitely make a difference. In my penguin example, with too few polys, sometimes we reach a local maximum before the eyes (small details) look any good. I combatted this somewhat by encouraging new polys to be (a) small and (b) regular. But I like the idea of a more guided approach.

There's an awesome javascript example you might find helpful: http://alteredqualia.com/visualization/evolve/

Edit: Your images look amazing. How do you make the vector shapes?

PavlovsCat 11 years ago | | |

I already knew the altered qualia link, and actually meant that one when I talked about the post on HN (the link to which I got from the bottom of that page) xD

Thanks! I just use what the HTML canvas has to offer: circles, n-gons, and n-gons made of bezier curves. Those can get drawn both filled and as outline, which in turn aren't drawn with plain colors, but gradients made of 3 HSLA colors (alpha ranges from 0.05 to 0.85, to make sure everything matters at least a bit and doesn't just get covered up or disappear), with the position of the middle color stop, as well as the coordinates that define the gradient direction, being variable.

The linewidth is variable, and it can also use dashed lines with a randomized pattern, which looks funky but I haven't gotten a good result with it yet. Another thing I'll do is add allowing picking a random compositing mode (and in the spirit of the expedition, do that for both fill and outline), but then it will be too unpredictable to look like being made of shapes, I expect.

The background also has a gradient consisting of no less than 6 colors, with 4 variable color stops.. what can I say, I like enums and copy and paste haha. Though of course adding more degrees of freedom willy-nilly might not be the best idea.. I have to improve the whole evolution/mutation stuff a lot first, and then I intend to throw just about anything at it I can find, at least as an option. Right now I added words, though that doesn't seem very promising. But having one basic common function to set up fill and outline gradients for any and all shapes I might come up with makes experimenting with this a breeze.

I kind of feel I should be doing this with WebGL, but then I wouldn't have all those convenient drawing functions.. but still, whenever WebGL 2.0 comes (which will make a lot of compute-ish things a lot easier AFAIK), I want to do at least a polygon/circle/bitmaps thing with it, because I feel a speedup factor of a gazillion might just make up for that :)

darkFunction 11 years ago | | |

Do you have a Github link? Or can you post back here when you're finished? I'd like to see the final result :)

tlarkworthy 11 years ago | | |

Yeah, really good results, conceivably useful for compression. It would be good to know the vertex count in your final penguin images.

darkFunction 11 years ago | | |

100, 6-sided polygons :) Though it looks pretty good with as few as 50

bane 11 years ago |

This could be a cool way to visually "encrypt" messages. They're readable, but only by the correct tool. I wonder how these squiggles might be creatively arranged steganographically in an image and still be "read" by the OCR tool.

SilasX 11 years ago | |

Correct me if I'm wrong here, but that just seems like reinventing crypto with a large key, and requires you to implement a counterparty-provided algorithm, which could be malicious.

bane 11 years ago | | |

Believe it or not, there's still a lot of utility in putting encrypted messages on physical things (pieces of paper). One time pads work well for this, but imagine if the recognition algorithm was altered in different ways effectively acting as a key s.t. you had to have a similarly altered recognizer on your side to see it. Yeah it's symmetric crypto in that sense but you can physically hide all kinds of stuff or divide up the message between different couriers or do other stuff in ways that are a bit unlike digital crypto. The simple fact of a message being the physical object might be enough to confuse an eavesdropper.

Or the message itself could also be encrypted with a more secure system, but then physically presented in an open area so that somebody with a tuned recognizer can get the encrypted data to later decrypt digitally.

sp332 11 years ago | | |

The point of steganography is that no one realizes there's a message.

SilasX 11 years ago | | |

And? It still has key sizes, the enemy still knows the system.

dragontamer 11 years ago | | |

Steganography is solidly a "security through obscurity" thing. Sure, we comp-sci people don't care about that, but spies do.

There was that Russian spy who was transmitting data for years through steganographic pictures on her Facebook account.

http://www.technologyreview.com/view/419833/russian-spies-us...

The FBI didn't know about it until after she was caught. So believe it or not, Steganography _works_. If you're trying to hide the fact that you're a spy, encrypting all of your messages over TOR is a bad idea.

On the other hand, if you pretend to be a normal person and embed secret messages in your Facebook posts, you can be a spy for years and not get caught.

pmoriarty 11 years ago | | |

"believe it or not, Steganography _works_"

I think one of the reasons stego works is because of the sheer amount of data being generated and shared in the modern world.

It's kind of a blessing and a curse for spy agencies. On the one hand, they love to collect data, and the more the better, since with more data to analyze, they can potentially learn more things. But the more data there is, the more computing power they have to throw at it to make sense of it.

So it's really not surprising that data can be hidden from spy agencies (possibly by relatively primitive means even), because they probably don't have the computing power (vast as their computing power is) to effectively run every possible detection algorithm and all their highly sophisticated (and probably computationally expensive) steganalysis software on so much data.

Videos, since they are so huge compared to other media files like text or audio, have always seemed like an ideal medium for stego to me. Of course, it's more difficult to preserve one's hidden data on sites like youtube that re-compress the videos that get uploaded to them, but any site that hosts original videos unmolested should be ripe for stego.

SilasX 11 years ago | | |

>Steganography is solidly a "security through obscurity" thing.

Right, but that means it inherits all the problems of security by obscurity, like it breaking as soon as the public knows the technique, which they do now.

My other point was that this seems to be equivalent to traditional stego solutions but with a key size equal to the algorithm size.

(And I'm not sure why merely asking about the key size problem and obscurity problem hurt the discussion enough to get hammered so hard...)

marcosdumay 11 years ago | | |

> Steganography is solidly a "security through obscurity" thing.

I never really understood why it is so.

Encrypted data must be indistinguishable from random, thus, if you replace any random projection of a file with your data, the result should be completely unrecognizable. It shouldn't really matter if your algorithms are public.

Is the problem that it's hard to get random projections from modern data? If so, why not use older formats?

AngrySkillzz 11 years ago | | |

Wow, that article is pretty bad.

pmoriarty 11 years ago | | |

Disclaimer: [1]

From what I understand, ideally stego would be used in conjunction with encryption.

First, you would encrypt your message, then you would use stego to hide it.

If the stego is good, it would be a computationally intractable problem[2] for your adversary to determine whether there was indeed a message hidden within the data they were analyzing, with greater than 50% accuracy.
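
To make the encrypt-then-hide order concrete, here is a toy sketch. It uses a one-time-pad style XOR and writes the result into the least significant bits of random bytes standing in for image samples. All function names are made up for the example, and this is an illustration only: naive least-significant-bit replacement is known to be detectable by steganalysis, so it would not meet the bar described above.

```python
import secrets

def xor_bytes(data, key):
    """One-time-pad style XOR; the key must be as long as the data."""
    if len(key) != len(data):
        raise ValueError("key and data must have equal length")
    return bytes(d ^ k for d, k in zip(data, key))

def embed(cover, payload):
    """Write the payload bits into the lowest bit of each cover byte."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for payload")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)

def extract(stego, n_bytes):
    """Read n_bytes back out of the lowest bits, most significant bit first."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)

message = b"meet at noon"
key = secrets.token_bytes(len(message))         # pre-shared pad (assumed)
cover = secrets.token_bytes(8 * len(message))   # stand-in for image samples
stego = embed(cover, xor_bytes(message, key))   # step 1: encrypt, step 2: hide
recovered = xor_bytes(extract(stego, len(message)), key)
```

Each cover byte changes by at most one, and the receiver needs both the pad and the knowledge of where to look.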

That said, I'm not sure how practical using an application like this would be for stego. It does not "whiten" the data it tries to hide, so unless the data's already whitened, it could potentially stand out like a sore thumb when subjected to steganalysis. And how would you propose actually using this?

This does present some intriguing possibilities, however, like maybe having Alice and Bob share a tweaked version of an OCR library and having Alice generate random images until her encrypted message has been "encoded" in such a way as to be recognizable by the tweaked OCR library that she shares with Bob. The tweaking of the library's character recognition parameters could be a sort of pre-shared key, and would not be available to Eve (the adversary).

[1] - this post comes from a hobbyist, not from any kind of security researcher, steganalyst, cryptanalyst, etc. So please take what I say with a grain of salt and please correct me if I'm wrong.

[2] - "computationally intractable" being different for different adversaries, of course, which is one reason you need a good threat model.

Jun8 11 years ago | |

This is a good idea! The problem with these squiggles is that they look abnormal and would draw attention. It would be interesting if these can be tweaked somehow so that they are still bot readable but can also be interpreted as patterns by humans.

TheLoneWolfling 11 years ago | | |

...Here's a crazy thought.

Take handwriting, the more illegible the better. Then use a genetic algorithm where the fitness function is trying to find as small a perturbation as possible to the input such that the output is recognized as the letters you want.
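
A toy version of that search, to show the shape of the fitness function. The "recognizer" is a made-up nearest-template classifier over four numbers, not a real OCR engine, and the penalty branch uses the distance to the wanted template as a stand-in for a confidence score. It also uses a single mutated individual instead of a full genetic algorithm, to keep the example small.

```python
import random

# Toy "letters": two four-pixel templates (assumed for the example).
TEMPLATES = {"A": [1.0, 0.0, 1.0, 0.0], "B": [0.0, 1.0, 0.0, 1.0]}

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(image):
    """Stand-in recognizer: the nearest template wins."""
    return min(TEMPLATES, key=lambda label: distance(image, TEMPLATES[label]))

def cost(image, original, wanted):
    """Misread images get a large penalty; among images that read as
    `wanted`, a smaller change from the original is better."""
    if recognize(image) == wanted:
        return distance(image, original)
    return 1000.0 + distance(image, TEMPLATES[wanted])

def perturb(original, wanted, generations=5000, seed=0):
    """Keep a random nudge only when it lowers the cost."""
    rng = random.Random(seed)
    best = list(original)
    best_cost = cost(best, original, wanted)
    for _ in range(generations):
        child = [v + rng.gauss(0, 0.05) for v in best]
        child_cost = cost(child, original, wanted)
        if child_cost < best_cost:
            best, best_cost = child, child_cost
    return best

original = [0.9, 0.2, 0.8, 0.1]      # reads as "A" to the toy recognizer
tweaked = perturb(original, "B")
```

The result stays much closer to the original than the clean "B" template does, while the toy recognizer reads it as "B".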

xerophyte12932 11 years ago | | |

What if... we combine them with normal looking letters to make a captcha? humans see one thing, bots see some more?

Goopplesoft 11 years ago | |

QR code?

kitd 11 years ago |

Could be used for automated printing of doctors' prescriptions ;)

mrtbld 11 years ago |

Perhaps this could lead to a new kind of captcha that only bots can solve. I doubt it would be efficient, though.

bkirwi 11 years ago | |

Startup idea: create a CAPTCHA that humans recognize as one word, but OCR recognizes as a different word.

OscarCunningham 11 years ago | | |

This HN item: https://news.ycombinator.com/item?id=8544911 was about how to do this for neural nets.

medmunds 11 years ago | |

http://ssbkyh.com/works/fadtcha/

pimlottc 11 years ago | | |

Nice, a captcha to block colorblind users!

arjie 11 years ago | | |

That's a subset of the objective. The objective is to detect all humans because they wouldn't spot the 'face' in the coloured circles. Whether or not it's coloured doesn't change that.

Unless you're concerned about the rights of colour-blind computers for some reason.

pimlottc 11 years ago | | |

Ha, oops, I guess I didn't read it closely enough. As someone who's partially colorblind, just seeing those dot patterns pisses me off... :P

dreamlane 11 years ago | | |

How do you know if the individual being tested doesn't see a face? You have to trust them when they say "no face here"... May as well just ask "are you human?"

edwintorok 11 years ago | | |

What if the computer just chooses a random rectangular region, or it always answers 'no face here'?

justaman 11 years ago | |

If this bot can generate scribbles from words, then theoretically, couldn't it work to teach an OCR-bot to effectively recognize scribbles as letters?

marcosdumay 11 years ago | | |

Somebody just modded you down instead of answering...

No. It couldn't be used to teach an OCR. Well, technically, it could, but all the OCR will learn is how to read text from this bot, not how to read text written by people.

rmc 11 years ago | |

If a correct answer is given, presume it's a bot.

_lce0 11 years ago | | |

...simply solved by making bots answer randomly the first time

cryowaffle 11 years ago | |

... that's a pretty interesting idea!

jimmytidey 11 years ago | |

Bot Honeypot

zz1 11 years ago | | |

HoneyBot

carsonreinke 11 years ago |

Looks like he has written tons of very creative bots. They are all very interesting ideas (e.g. http://randomshopper.tumblr.com)

mturmon 11 years ago | |

Agreed:

http://www.bostonglobe.com/ideas/2014/01/24/the-botmaker-who...

sgentle 11 years ago |

It would be pretty interesting to see one degree of abstraction up from this - what sets of lines are close enough to match a certain word?

If you averaged over all those sets, would the resulting blobby heatmap resemble the original word in a legible form? Or something else?
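
The averaging step itself is small. A sketch, assuming each accepted drawing has already been rasterised into an equally sized grid of 0s and 1s (the grids below are invented sample data):

```python
def heatmap(images):
    """Average equally sized 0/1 grids into per-pixel ink frequencies."""
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / len(images) for c in range(cols)]
        for r in range(rows)
    ]

# Two made-up 2x2 drawings that the recognizer supposedly accepted.
drawings = [
    [[1, 1],
     [0, 0]],
    [[1, 0],
     [0, 1]],
]
frequencies = heatmap(drawings)
```

Pixels that are inked in every accepted drawing come out at 1.0, and pixels that are never inked come out at 0.0.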

userbinator 11 years ago |

I can imagine generating a few pages or even an entire book of this, and some future generations attempting to figure out what sort of language it was written in... reminds me of this:

http://en.wikipedia.org/wiki/Voynich_manuscript

cosarara97 11 years ago |

I couldn't get that OCR to read my mouse-written E. It's a nice experiment nevertheless.

amelius 11 years ago | |

Indeed, the underlying OCR seems to need lots of work still. It's no wonder that the "reverse" operation results in such messy line art.

The demo is nice, though.

klausa 11 years ago |

I highly recommend watching the talk Darius Kazemi (author of Reverse OCR) gave at this year's XOXO: http://www.youtube.com/watch?v=l_F9jxsfGCw

emhart 11 years ago |

It has been fantastic watching Darius' myriad experiments over the past few years. His work always has a great mixture of whimsy and serious experimentation.

MrBra 11 years ago |

Nice. Finally computers approached the age of writing. :)

lucb1e 11 years ago |

I can already imagine the innovation:

> Type over this text to prove that you are a computer.

> Human detected. Shoo, shoo!

Aaronneyer 11 years ago |

Looks like my handwriting

z3t4 11 years ago |

I can't believe OCR has not been solved yet. The only one even close is OmniPage.

Tepix 11 years ago | |

Isn't OCR pretty good these days?

driverdan 11 years ago |

Here's the source code on github: https://github.com/dariusk/reverseocr

jostmey 11 years ago |

A generative model, although computationally expensive, would not suffer this problem. Essentially a generative model can run in reverse, which means that if you feed values into the output you get inputs that could explain the output. Check out "Boltzmann Machines" for an example. There are plenty of examples for the MNIST dataset of handwritten digits.

k_sze 11 years ago |

I think one of the problems is that the OCR assumes the images to be (English) letters.

To be really really useful, the OCR would need to consider at least all characters in the Unicode Basic Multilingual Plane. And then it needs to be able to reject an image as containing any word, and then it needs to solve the halting problem.

zwass 11 years ago |

This reminds me of an experiment I played with using random search to "teach" the browser how to draw characters: http://zwass.github.io/Learn2Write/

bmh100 11 years ago |

This actually seems like a great program for automatically generating adversarial examples to improve OCR. A human could rate this text as being illegible or legible. Each example can then be added to the training data to improve its quality.

mng2 11 years ago | |

The Letter Spirit project (a Douglas Hofstadter thing) is sort of in that vein, but less prosaic in its objectives.

eurleif 11 years ago |

It would be neat to see the same thing, except using two OCR libraries instead of just one, and requiring both libraries to be able to read the message. I imagine the letters would start to look a bit less insane.
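
A sketch of the two-reader idea, assuming the bot works by generate-and-test: draw something random, keep it only if the reader sees the target. The readers here are toy functions over a pair of numbers, not real OCR libraries, and `reverse_ocr`, `draw`, `reader_a`, and `reader_b` are names invented for the example.

```python
import random

def reverse_ocr(target, generate, recognizers, max_tries=100000, seed=0):
    """Draw random candidates until every recognizer reads `target`."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        candidate = generate(rng)
        if all(read(candidate) == target for read in recognizers):
            return candidate, attempt
    return None, max_tries

def draw(rng):
    """Toy 'drawing': two random numbers instead of random lines."""
    return (rng.random(), rng.random())

def reader_a(candidate):
    """Toy reader that only looks at the first number."""
    return "o" if candidate[0] > 0.5 else "?"

def reader_b(candidate):
    """Toy reader that only looks at the second number."""
    return "o" if candidate[1] > 0.5 else "?"

one, tries_one = reverse_ocr("o", draw, [reader_a])
both, tries_both = reverse_ocr("o", draw, [reader_a, reader_b])
```

With the same seed, requiring both readers can never take fewer attempts than requiring one, since every candidate that passes both also passes the first.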

shangxiao 11 years ago |

This is pretty cool, although it makes me wonder what the real world applications could be. It does, at the very least, tantalise my curiosity and get me thinking.

achr2 11 years ago |

Could this be used in a pseudo reverse CAPTCHA by showing a series of words, and asking the user to say which is not human readable?

methyl 11 years ago |

I wonder what would happen if you ran this program letter by letter; possibly the readability would increase.