Show HN: 3D-Parallax, labelfree 3D experience from a 2D image using parallax (github.com)

52 points by crou68 5 years ago | 34 comments

xaedes 5 years ago |

"We offer an experience of 3D on a single 2D image using the parallax effect, i.e, the user is able move his real-time tracked face to visualize the depth effect."

Since there are no examples I can't be sure if this is what I think it is, but IF it is:

I want this on a huge monitor for any 3D game instead of clunky VR headgear or tiny smartphone AR.

Months ago I also tested this with a small 3D visualization and very crude head tracking.

The effect is damn awesome! Being able to move around in real space with the rendering adapting to it makes it so immersive, even in my very crude tests.

In my opinion the resulting 3D effect is MUCH better than viewing in stereo with one picture per eye.

Here is an example from someone else from 2007:

https://www.youtube.com/watch?v=Jd3-eiid-Uw

From 2012:

https://www.youtube.com/watch?v=h9kPI7_vhAU

Obviously this works for only one person viewing it, but does that really matter? There are a LOT of use cases where only a single person uses a single monitor for viewing, especially in these times. In fact it is the standard.
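For anyone who wants to try this, here is a minimal sketch of the usual way to make the rendering follow a tracked head: an asymmetric ("off-axis") view frustum computed from the head position. It assumes a flat screen centred at the origin, head coordinates in metres with z pointing from the screen toward the viewer, and a hypothetical function name; it is not taken from the linked repo or the videos.

```python
from dataclasses import dataclass


@dataclass
class Frustum:
    left: float
    right: float
    bottom: float
    top: float
    near: float
    far: float


def off_axis_frustum(head_x, head_y, head_z, screen_w, screen_h,
                     near=0.1, far=100.0):
    """Asymmetric frustum for an eye at (head_x, head_y, head_z).

    Assumed convention: origin at the screen centre, x right, y up,
    z from the screen toward the viewer, all lengths in metres.
    """
    if head_z <= 0:
        raise ValueError("viewer must be in front of the screen")
    # Project the screen edges, as seen from the eye, onto the near plane.
    scale = near / head_z
    return Frustum(
        left=(-screen_w / 2 - head_x) * scale,
        right=(screen_w / 2 - head_x) * scale,
        bottom=(-screen_h / 2 - head_y) * scale,
        top=(screen_h / 2 - head_y) * scale,
        near=near,
        far=far,
    )


# Head centred 60 cm from a 60 cm x 34 cm screen, then moved 10 cm right.
centred = off_axis_frustum(0.0, 0.0, 0.6, 0.60, 0.34)
shifted = off_axis_frustum(0.1, 0.0, 0.6, 0.60, 0.34)
```

The six values are the kind of bounds a glFrustum-style projection takes; the virtual camera also has to be moved to the same head position so that the screen acts like a window into the scene.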

matsemann 5 years ago | |

I remember being awestruck by that 2007 Wii video, and spent a lot of time playing around with the remote after that. That, and the Xbox Kinect, were really cool tools. Why haven't those concepts seen more widespread use outside gaming?

As for 3D, I also remember seeing a TV at Toshiba HQ in 2013 or so that had 3D without glasses, and even worked for multiple people. No idea how.

vanderZwan 5 years ago | | |

> Why haven't those concepts seen more widespread use outside gaming?

I wrote my master's thesis on designing gesture interfaces (of the Xbox Kinect type) back in 2012, here's my two cents:

First of all, gesture detection still isn't reliable enough for serious input. Missing a beat one in a hundred times is acceptable in gaming settings (and even then only in more "casual" environments like party games), but for serious input the input device needs to be practically 100% reliable. A keyboard press is. A mouse click is. Detecting whether your hand is an open palm or a fist? Not so much. Hence the peripherals we typically see for VR games, which help of course. At the same time they also somewhat defeat the purpose.

A second major issue is a lack of haptic feedback. There is no such thing as touch-typing in the air.

Why this is such a big problem needs a bit of explaining: a practical way to think of our ability to manipulate our environment (literally: the "mani" in manipulate comes from the Latin manus, "hand") is to think of our arms as a pair of kinematic chains[0][1]. Essentially this is a chain of ever more fine-grained "motors", going from coarse-grained to fine-grained precision: our shoulders, our elbows, our wrists, and finally the digits of our hands. The ingenuity of this chain is that it allows for extremely fine precision (the sub-millimeter precision of our fingertips) in a large spatial volume (the reach of our arms), and it does so by having each "link" in the chain perform a bit of "error-correction" for the lower resolution of the previous link.

What does this have to do with gesture interfaces? Well, in order for that kinematic chain to work, it needs a precise feedback system to perform said error-correction. We basically have three senses for this: our visual system (that is, seeing where we are putting our hands), our haptic sense (feeling which button we're pressing with our fingertips) and our "spatial sense". The problem with the latter sense is that it is relative: I sense the sub-millimeter location of my fingers relative to my wrist. I sense the millimeter-precision location of my wrist relative to my elbow. I sense the centimeter-precise location of my elbow relative to my shoulder. So if I'm waving my hands in the air without looking, the effective "precision" they have is about as crude as the crudest link in the chain: my shoulder. Of course this spatial sense can be improved with training, but you know what we typically call people who are really good at that? Professional-level dancers. The ceiling of mastering this skill is pretty high, and there's a reason it's basically a profession all by itself (plus a ton of other things obviously, don't want to sell dancers short here).
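To make the "crudest link" point concrete, here is a toy sketch: a planar three-link arm (shoulder, elbow, wrist) where the same small angular error is applied to one joint at a time and the resulting fingertip displacement is measured. The link lengths, the pose and the one-degree error are made-up illustrative numbers, not measurements, and a real arm is of course not planar.

```python
import math


def fingertip(angles, lengths):
    """Forward kinematics of a planar chain; each angle is relative to the previous link."""
    x = y = 0.0
    heading = 0.0
    for angle, length in zip(angles, lengths):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


def fingertip_shift(angles, lengths, joint, error_rad):
    """How far the fingertip moves when one joint angle is off by error_rad."""
    perturbed = list(angles)
    perturbed[joint] += error_rad
    x0, y0 = fingertip(angles, lengths)
    x1, y1 = fingertip(perturbed, lengths)
    return math.hypot(x1 - x0, y1 - y0)


# Assumed values: upper arm, forearm, hand (metres) and a relaxed pose (radians).
LENGTHS = (0.30, 0.25, 0.18)
ANGLES = (0.3, 0.5, 0.2)

# Same one-degree error at the shoulder, the elbow and the wrist in turn.
shifts = [fingertip_shift(ANGLES, LENGTHS, j, math.radians(1.0)) for j in range(3)]
for name, shift in zip(("shoulder", "elbow", "wrist"), shifts):
    print(f"{name}: fingertip moves {shift * 1000:.1f} mm")
```

In this pose the shoulder error moves the fingertip the most and the wrist error the least, because the error is multiplied by the distance from the joint to the fingertip. Without visual or haptic feedback, nothing further down the chain can correct for it.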

Gesture input also will never be as easy on the motor skills as typing: not only does a keyboard provide the haptic feedback from the keys, the precision of my fingers is relative to the wrists that are resting on the desk, not to my shoulders.

Games somewhat get around this by representing a visual avatar to give us feedback, but it's not perfect. On top of that, this feedback is limited by the resolution of the gesture detection, which is ludicrously low compared to the potential precision of our limbs. And if that wasn't enough, it also needs a really low latency to fool our brains and really "feel" like an extension to our senses.

So basically, the fidelity requirements are just brutally high.

And finally, there is only a limited set of use-cases. There are basically just two big ones: "touchless" interfaces (very niche) and pointing and manipulating in 3D space (less niche, with a clear advantage over keyboard or even mouse input, but again having brutally high fidelity requirements). Because of that, as cool as gesture interfaces are, the industry-wide drive to solve all the aforementioned issues just isn't quite as high as we'd like it to be.

[0] http://cogprints.org/625/1/jmb_87.html

[1] https://en.wikipedia.org/wiki/Kinematic_chain

numpad0 5 years ago | | |

Sony has a head-tracking OLED sub-display for VR/3D content creation in office setups, and Looking Glass was recently crowdfunding an apparently very well done lenticular digital photo frame. Those two come to mind off the top of my head.

Lenticular 3D limits resolution to a fraction divided by double the number of viewpoints in an axis, e.g. a 4K by 2K panel with 10 viewpoints along its width has an effective resolution of 400 by 2K. Okay for demos but not practical at the moment.
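The arithmetic of that example can be written down as a tiny sketch. It follows only the worked example (a plain division of the panel resolution by the number of views along each axis) and leaves out any additional factor; the function name is hypothetical.

```python
def effective_resolution(panel_w, panel_h, views_x, views_y=1):
    """Per-view resolution when panel pixels are shared between viewpoints."""
    if views_x < 1 or views_y < 1:
        raise ValueError("need at least one viewpoint per axis")
    return panel_w // views_x, panel_h // views_y


# The example from the comment: 4K by 2K panel, 10 viewpoints along its width.
per_view = effective_resolution(4000, 2000, views_x=10)
print(per_view)
```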

zenir 5 years ago | |

I actually hacked something together like that and it surprisingly looks much better in a video than in real life. I guess the main reason is that we have two eyes and the brain still realizes it is flat (which is weird because it doesn't look flat in a video).

vanderZwan 5 years ago | | |

It works well in the video because in that context we are trying to estimate depth relative to the camera.

In real life, we are trying to estimate depth relative to us. In order to fool our brains we need both a very low latency and a high frame-rate. That was one of the major hurdles to solve for VR as well, leading to John Carmack's famous complaint that he can ping across the Atlantic and back faster than he can send a pixel from his desktop to his screen[0][1].

Anyway, back to the video: basically, when comparing 3D movement relative to the camera our brains seem to be more "forgiving".

[0] https://twitter.com/id_aa_carmack/status/193480622533120001

[1] https://danluu.com/latency-mitigation/

karmakaze 5 years ago | |

Thanks for posting these similar demo links. I have a problem with visual programs/libraries that reference no examples: demo or GTFO. I realize that the repo may have been posted here by anyone but it's a good note to reference a demo if one is/can-be-made available.

fish44 5 years ago | |

looks like it uses this https://shihmengli.github.io/3D-Photo-Inpainting/#example which makes this super cool...

sjs382 5 years ago | |

I first saw that in person on a pinball machine like this: https://www.youtube.com/watch?v=64e7TQ5uj8g

The effect is fantastic.

RobertoG 5 years ago | |

I wonder if this could be used for better videoconferencing.

yunusabd 5 years ago |

Not exactly the same, but I made something a while ago that takes a 2D image and tries to infer a depth map to create a 3D effect: https://awesomealbum.com/depth

It's based on [1] and runs entirely in the browser, although it takes a moment to create the depth map. It's more of a toy project at this point. But I was surprised when I saw that Google is doing the same thing now in Google Photos [2].

[1] https://github.com/FilippoAleotti/mobilePydnet

[2] https://www.theverge.com/2020/12/15/22176313/google-photos-2...
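For readers wondering how a depth map turns into a 3D effect, here is a rough sketch of one simple approach: shift each pixel sideways by an amount proportional to its depth, let nearer pixels win where they overlap, and fill the gaps that open up. It works on a single row in plain Python, assumes depth values in [0, 1] with 1 meaning "near", and is not how the demo above or Google Photos actually does it.

```python
def parallax_row(pixels, depth, view_offset, max_shift=3):
    """Shift one image row according to depth (assumed: 0 = far, 1 = near).

    view_offset is the viewer's sideways offset in [-1, 1]; max_shift is the
    largest shift, in pixels, applied to the nearest pixels.
    """
    width = len(pixels)
    out = [None] * width
    out_depth = [-1.0] * width
    for x, (value, d) in enumerate(zip(pixels, depth)):
        target = x + round(view_offset * max_shift * d)
        # Forward warp with a depth test: nearer pixels cover farther ones.
        if 0 <= target < width and d > out_depth[target]:
            out[target] = value
            out_depth[target] = d
    # Crude hole filling: reuse the closest filled pixel to the left,
    # falling back to the original pixel at the start of the row.
    last = None
    for x in range(width):
        if out[x] is None:
            out[x] = last if last is not None else pixels[x]
        else:
            last = out[x]
    return out


row = list("abcdef")
row_depth = [0, 0, 1, 1, 0, 0]  # "c" and "d" are the foreground
print("".join(parallax_row(row, row_depth, view_offset=1, max_shift=1)))
```

The foreground pixels slide over the background and a duplicated background pixel fills the hole they leave behind. Filling that hole convincingly is the hard part, which is where inpainting approaches come in.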

mfDjB 5 years ago | |

This is great, thanks for sharing!

yunusabd 5 years ago | | |

Glad to hear! Another thing you can try is saving the original and the depth map and creating a 3D photo from them on facebook [1] (you don't actually have to post it to see the effect). They do all kinds of things behind the scenes, so the 3D effect is a bit more pronounced.

[1] https://www.facebook.com/help/414295416095269

slingnow 5 years ago |

It cracks me up that in 2021 people are still posting fundamentally visual tools without so much as a single screenshot to help understand what they do.

fish44 5 years ago |

https://munsocket.github.io/parallax-effect/examples/deepvie... here is a different library with a demo

phoe-krk 5 years ago |

Do you have any examples?

vmception 5 years ago | |

that was a nice way of saying it

phoe-krk 5 years ago | | |

I don't know what's the "it" that you mean. I simply think that this piece of work would greatly benefit from some image or video examples of the technique being applied in practice, especially to help people who are not very acquainted with 2D and 3D image processing.

For instance, I can only imagine right now how this technique works; I'd rather not leave that to my very imperfect imagination.

Moosdijk 5 years ago |

How do you calculate the angle between the person's eyes and the screen, in order to render the parallax effect?

pbhjpbhj 5 years ago | |

They mention how they do that if you RTFA, fwiw.

I only skimmed it, but it looks like they track the midpoint of the eyes (having rejected pupil tracking) and their spacing (to get viewer distance), and they use a webcam to do it.
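As a rough sketch of how eye spacing can give viewer distance, here is the standard pinhole-camera relation, plus a relative variant that only needs a reference frame. The focal length in pixels, the assumed 63 mm interpupillary distance and the function names are my assumptions for illustration; I have not checked that the repo computes it this way.

```python
import math


def eye_midpoint(left_eye, right_eye):
    """Midpoint between the two detected eye positions, in pixels."""
    return ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)


def viewer_distance(left_eye, right_eye, focal_px, ipd_m=0.063):
    """Pinhole estimate: distance = focal length * real spacing / pixel spacing.

    focal_px (webcam focal length in pixels) and ipd_m (a typical adult
    interpupillary distance) are assumed values, not read from the camera.
    """
    spacing_px = math.dist(left_eye, right_eye)
    if spacing_px == 0:
        raise ValueError("eyes must be at distinct pixel positions")
    return focal_px * ipd_m / spacing_px


def rescaled_distance(ref_distance, ref_spacing_px, spacing_px):
    """Relative variant: scale a reference distance by the change in eye spacing."""
    return ref_distance * ref_spacing_px / spacing_px


left, right = (300.0, 240.0), (363.0, 240.0)
print(eye_midpoint(left, right))
print(viewer_distance(left, right, focal_px=600.0))
```

The relative variant sidesteps camera calibration: if the spacing measured on a reference frame is taken to correspond to some reference distance, later distances follow from the ratio of spacings alone.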

Moosdijk 5 years ago | | |

That's what I read too. The reason I asked is because "either the distance on the first frame for the case interface : False or the distance when the set depth button is pressed for the case interface : True" is not entirely clear to me.

When I was working on such a project, I had no way of correctly guessing the distance between the screen and eyes.

dannyw 5 years ago |

Please give us some examples.

chrisseaton 5 years ago |

What does XP mean in this context?

xaedes 5 years ago | |

experience

criddell 5 years ago |

The iPhone and iPad have motion tracking tied to the acceleration of the device. Is this a similar effect except it's based on head movement rather than device movement?

user-the-name 5 years ago | |

iOS can also do head and eye tracking along with the motion detection, and combine both automatically.

nojvek 5 years ago |

Would be great if README had screenshots. I’m not entirely sure what this does.