Voice Assistant for VSCode

125 points by b4rtaz__ 4 years ago | 46 comments

learc83 4 years ago |

This looks pretty cool, but it’s Windows-only and doesn’t work outside of VS Code.

If you want something more full-featured that works everywhere, I’ve used https://talonvoice.com/ for a while now.

GordonS 4 years ago | |

I just had a quick try of Talon on Windows, using wav2letter (I don't have Dragon) and the recommended scripts[0]. It... doesn't work well for me at all.

I was able to get a couple of simple commands to work in Chrome, sometimes, such as "reload" and "show history". In Visual Studio Code, it just spouted a bunch of errors in the console [1], and in JetBrains Rider all it would do is type gobbledygook, like a cat had walked on the keyboard or something. Pretty disappointing :(

The logs also fill up with "WARNING actions: skipped because they have no matching declaration: (user.select_next_token)".

It was a bit confusing to use too (apart from not really working, I mean!), as it wasn't clear if I had to use some kind of command to enable voice commands, or if it was listening all the time. Eventually I figured out that it seems to be the latter, but still, it's not clear what commands it has heard and understood - I found myself speaking and nothing was happening, and I had no idea what it had understood. Similarly, I'd say something like "close tab", and it would type some nonsense like "aa&" into the current file - again, no idea what command it was actually trying to use.

[0] https://github.com/knausj85/knausj_talon

[1] "No such file or directory: 'C:\\Users\\MyUser\\AppData\\Local\\Temp\\vscode-port'"

lunixbochs 4 years ago | | |

I recommend asking about hiccups on the Slack [1]. My basic analysis is you're missing a vscode-side plugin, you might need to restart Talon once after dropping in knausj_talon, and that your statement of "it doesn't really work" comes mostly from you guessing command phrases that don't exist. It's a strict command system - you need to learn/know the commands, you can't say just anything and expect it to work.

It's a tool to be learned and practiced. It's not fully optimized for an out-of-the-box experience (yet); it's currently more optimized for customization and total control by people who have the time and motivation to go hands-free (e.g. due to limited motor function).

This is what it can look like if you practice a bit: [2]

---

Some recommendations:

- say "say hello world"

- say "help alphabet"

- say "help context"

- say "command history"

- say "dictation mode" then speak freely, then say "command mode"

- Try chaosparrot's Talon Practice [3]

[1] https://talonvoice.com/chat

[2] https://twitter.com/lunixbochs/status/1378159234861264896

[3] https://chaosparrot.github.io/talon_practice/lessons/formatt...
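To illustrate the "strict command system" point above, here is a minimal Python sketch. It is not Talon's actual API or grammar; the `COMMANDS` table, the `dispatch` function, and the returned action names are all hypothetical. The idea it shows is only that an utterance either matches a known phrase exactly or is rejected, rather than being guessed at.

```python
from typing import Callable, Dict, Optional

# Hypothetical command table. The phrases echo the recommendations above,
# but the action names are invented for this sketch.
COMMANDS: Dict[str, Callable[[], str]] = {
    "command history": lambda: "show-command-history",
    "help alphabet": lambda: "show-alphabet-help",
    "help context": lambda: "show-context-help",
}


def dispatch(utterance: str) -> Optional[str]:
    """Return the action for a recognized phrase, or None if nothing matches."""
    # Normalize case and whitespace so "Command  History" still matches.
    phrase = " ".join(utterance.lower().split())

    # Assumed convention for this sketch: "say <text>" types the text.
    if phrase.startswith("say "):
        return "type:" + phrase[len("say "):]

    action = COMMANDS.get(phrase)
    if action is None:
        # Strict system: an unknown phrase does nothing instead of being guessed.
        return None
    return action()
```

With this table, `dispatch("close tab")` returns `None` because no such phrase was declared, which is the kind of silent non-match a newcomer guessing at commands would run into.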

xfer 4 years ago | | |

Yeah, my brief experience and impression of these voice-assisted coding tools is that they require a $300 microphone and a quiet room to get an acceptable level of accuracy.

It is probably worth it for physically impaired people (but I fear what 6 hours daily of this will do to their vocal cords). I am more interested in BCI technology, which is where I see the future.

Centigonal 4 years ago | |

Here's a good talk from a senior (now staff) engineer at Fastly who uses Talon daily: https://www.youtube.com/watch?v=YKuRkGkf5HU

deepstack 4 years ago | |

This is supposed to be Free/Libre. Is the source code somewhere? I can't seem to find it. Also, the Homebrew install seems to be broken.

https://talonvoice.com/update/pgUuEYK3vzmYQtF2PMgOyK/appcast...

Xevi 4 years ago | | |

Talon is not open source, but it's free to use.

dwiel 4 years ago | | |

The author of Talon doesn't support Homebrew installation. Other people try to add it, but installing that way doesn't work. Have you tried the installer on the website?

Centigonal 4 years ago | | |

Talon is free/gratis - It's not open source.

melling 4 years ago | |

At least it appears to be high-level.

For years people would always comment “I can type faster”, not realizing that we should also be able to make it smarter than word by word, or character by character.

https://youtu.be/hGPNs5C1Lp0

Notice this guy is also using his “hat” as a pointing device.

tmccrary55 4 years ago | | |

Neat but it seems like your neck would pay a price.

The voice commands are also cool but needing to pause between each one seems like a huge drawback, compared to typing where I can just blaze through.

traspler 4 years ago |

This is pretty nice :)

My first thought was that our eyes and hands do all the work; our mouth and ears are untapped resources in the quest to become true 100x engineers ;)

All joking aside, I am interested in how well this might work outside of a11y use-cases. Speaking is just so natural. It doesn't have to be used exclusively but I do want to find out if there are cases where it's just nicer to say a command during coding than remembering all kinds of keyboard shortcuts. I always wonder if a more hybrid approach of using touch, speaking and typing for various situations could feel better than keyboard all the way.

sumnole 4 years ago | |

Don't forget to use your feet https://news.ycombinator.com/item?id=26430466

croon 4 years ago | | |

Dance Dance Recursion

prawn 4 years ago | |

I've often imagined a voice-command layer for an OS or apps that doesn't steal focus. So, you could be working in a particular app, and be giving instructions by voice to prepare other parts of your workflow.

melling 4 years ago | | |

A voice layer on an iPad could make it a better productivity device.

elasticventures 4 years ago | |

As long as we're limited to reading western/English we're underutilizing our eyes. Try coding with Emoji & Pinyin using http://github.com/elasticdotventures/_b00t_

Our upper primate brains are actually MUCH better at pattern matching than reading!

halsom 4 years ago | |

Would be nice to use for opening files if that could be configured to work in a suitable fashion.

itsbits 4 years ago | |

A hybrid approach should be really handy.

I don't think voice-only coding for able-bodied users is becoming standard anytime soon.

danpalmer 4 years ago |

There was a great talk from 2013 about coding with voice commands: https://www.youtube.com/watch?v=8SkdfdXWYaI

It's by a developer who developed RSI and had to find another way to write code. He uses a combination of Dragon and custom Python scripts to control Emacs.

The fascinating bit for me was the language he created around text navigation and manipulation. Lots of custom short words to optimise the amount of speaking he actually had to do.

Really worth a watch for anyone interested in this. If you want a quick demo, this part of the video is fairly representative: https://youtu.be/8SkdfdXWYaI?t=1034
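As a rough illustration of the short-word command language described above, here is a minimal Python sketch. The spoken words ("hop", "bak", "zap"), the action names, and the `parse` function are made up for this example. They are not taken from the talk or from any real tool. The sketch only shows the general idea: each short word maps to an editor action, and an optional spoken number repeats the preceding action.

```python
from typing import List, Tuple

# Hypothetical vocabulary: short spoken words mapped to editor actions.
VOCAB = {
    "hop": "forward-word",
    "bak": "backward-word",
    "zap": "delete-word",
}

# Spoken repeat counts (kept small for the sketch).
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse(utterance: str) -> List[Tuple[str, int]]:
    """Turn a chain of short words into (action, repeat_count) pairs."""
    actions: List[Tuple[str, int]] = []
    for token in utterance.lower().split():
        if token in VOCAB:
            # A command word starts a new action with a default count of 1.
            actions.append((VOCAB[token], 1))
        elif token in NUMBERS and actions:
            # A number modifies the count of the action just before it.
            name, _ = actions[-1]
            actions[-1] = (name, NUMBERS[token])
        else:
            raise ValueError(f"unrecognized word: {token!r}")
    return actions
```

Saying "hop three zap" would then parse into two actions: move forward three words, then delete one word. Chaining several one-syllable words in a single utterance is what keeps the amount of speaking low.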

therealunreal 4 years ago |

Looks nice! Here's the source for the server component: https://github.com/b4rtaz/voice-assistant-net-server/blob/ma...

doitLP 4 years ago |

If this is interesting, check out https://serenade.ai/

I’ve only taken it for a test run but it seems really good and smooth.

twh270 4 years ago |

Incompatible with "open plan" offices; one or the other will win.

Guess which one I'm rooting for. :-)

lazyresearcher 4 years ago |

Is this using IBM Watson for STT? Are you assuming the costs?

Or does it use some Windows dictation API?

CrazyStat 4 years ago | |

It's using the Windows API.

farbodsaraf 4 years ago |

This makes it more accessible for many!

oaiey 4 years ago |

How is the multilinguality achieved? By code snippets or is there some language server support?

b4rtaz__ 4 years ago | |

Put a voice-assistant.json file with your snippets in the root folder of the project. You can define whatever you want in this file. Check this example: https://github.com/b4rtaz/voice-assistant/blob/master/media/...
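For readers wondering what a per-project snippet file could look like in practice, here is a minimal Python sketch of the general idea. The JSON schema below (a `snippets` list with `phrase` and `body` keys) is an assumption made for illustration; the extension's real voice-assistant.json format may differ, so check the linked example for the actual layout. The helper names `load_snippets` and `snippet_for` are likewise hypothetical.

```python
import json
from typing import Dict, Optional

# Hypothetical file contents. The real voice-assistant.json schema may differ.
EXAMPLE_CONFIG = """
{
  "snippets": [
    {"phrase": "for loop", "body": "for (let i = 0; i < n; i++) {\\n}"},
    {"phrase": "print line", "body": "console.log();"}
  ]
}
"""


def load_snippets(text: str) -> Dict[str, str]:
    """Build a phrase -> snippet body lookup from the JSON text."""
    data = json.loads(text)
    return {entry["phrase"].lower(): entry["body"] for entry in data["snippets"]}


def snippet_for(phrase: str, snippets: Dict[str, str]) -> Optional[str]:
    """Return the snippet to insert for a spoken phrase, or None if undefined."""
    return snippets.get(phrase.strip().lower())
```

Because the lookup only pastes text, it works the same for any programming language: the file author decides what each phrase expands to.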

oaiey 4 years ago | | |

I think the next step is getting a huge catalogue of these actions defined. We already speak tons of "codes" in our lives; learning some more to speak (programming) code will not be too hard, I guess.

Xevi 4 years ago | |

I'm also wondering this. He says that it works with _every_ language, which leads me to believe that all it does is paste in code snippets.

sandoche 4 years ago |

Looks very cool, well done!

r0b05 4 years ago |

I cannot get it to recognise any commands, unfortunately.

MauranKilom 4 years ago |

We have come a long way: https://www.youtube.com/watch?v=MzJ0CytAsec

melling 4 years ago | |

That’s 14 years ago. I would have expected voice programming to be solved by now. The original iPhone was introduced in 2007.

Ray Kurzweil’s predictions are taking longer than expected.

https://singularityhub.com/2015/01/26/ray-kurzweils-mind-bog...