Voice Assistant for VSCode (github.com)
If you want something more fully featured that works everywhere, I’ve used https://talonvoice.com/ for a while now.
I was able to get a couple of simple commands to work in Chrome, sometimes, such as "reload" and "show history". In Visual Studio Code, it just spouted a bunch of errors in the console [1], and in JetBrains Rider all it would do is type gobbledygook, like a cat had walked on the keyboard or something. Pretty disappointing :(
The logs also fill up with "WARNING actions: skipped because they have no matching declaration: (user.select_next_token)".
It was a bit confusing to use too (apart from not really working, I mean!), as it wasn't clear if I had to use some kind of command to enable voice commands, or if it was listening all the time. Eventually I figured out that it seems to be the latter, but still, it's not clear what commands it has heard and understood - I found myself speaking and nothing was happening, and I had no idea what it had understood. Similarly, I'd say something like "close tab", and it would type some nonsense like "aa&" into the current file - again, no idea what command it was actually trying to use.
[0] https://github.com/knausj85/knausj_talon
[1] "No such file or directory: 'C:\\Users\\MyUser\\AppData\\Local\\Temp\\vscode-port'"
It's a tool to be learned and practiced. It's not fully optimized for the out-of-the-box experience (yet); it's currently more optimized for customization and total control by people who have the time and motivation to go hands-free (e.g. due to limited motor function).
This is what it can look like if you practice a bit: [2]
---
Some recommendations:
- say "say hello world"
- say "help alphabet"
- say "help context"
- say "command history"
- say "dictation mode" then speak freely, then say "command mode"
- Try chaosparrot's Talon Practice [3]
[1] https://talonvoice.com/chat
[2] https://twitter.com/lunixbochs/status/1378159234861264896
[3] https://chaosparrot.github.io/talon_practice/lessons/formatt...
It probably is worth it for physically impaired people (but I fear what 6 hours daily of this will do to their vocal cords). I am more interested in BCI technology, which is where I see the future.
https://talonvoice.com/update/pgUuEYK3vzmYQtF2PMgOyK/appcast...
For years people would always comment “I can type faster”, not realizing that we should also be able to make it smarter than word by word, or character by character.
Notice this guy is also using his “hat” as a pointing device
The voice commands are also cool but needing to pause between each one seems like a huge drawback, compared to typing where I can just blaze through.
My first thought was that our eyes and hands do all the work; our mouth and ears are untapped resources in the quest to become true 100x engineers ;)
All joking aside, I am interested in how well this might work outside of a11y use-cases. Speaking is just so natural. It doesn't have to be used exclusively but I do want to find out if there are cases where it's just nicer to say a command during coding than remembering all kinds of keyboard shortcuts. I always wonder if a more hybrid approach of using touch, speaking and typing for various situations could feel better than keyboard all the way.
Our upper primate brains are actually MUCH better at pattern matching than reading!
I don't think voice-only coding will become standard for able-bodied people anytime soon.
It's by a developer who developed RSI and had to find another way to write code. He uses a combination of Dragon and custom Python scripts to control Emacs.
The fascinating bit for me was the language he created around text navigation and manipulation. Lots of custom short words to optimise the amount of speaking he actually had to do.
Really worth a watch for anyone interested in this. If you want a quick demo, this part of the video is fairly representative: https://youtu.be/8SkdfdXWYaI?t=1034
I’ve only taken it for a test run but it seems really good and smooth.
Guess which one I'm rooting for. :-)
Or does it use some windows dictation api?
Ray Kurzweil’s predictions are taking longer than expected
https://singularityhub.com/2015/01/26/ray-kurzweils-mind-bog...
That’s why he was forced to find other solutions
Eye tracking would be cooler, but keyboard/mouse alternatives are slow to appear.
I hope you don't think I'm trying to be mean, it was just a really disappointing first experience. Might be my expectations were misaligned.
I had actually already installed the VSCode plugin and restarted both VSCode and Talon (I don't remember if I saw it in a comment in a .talon file, or if I saw it in the console logs, but somewhere it told me to install a plugin). Similarly, I installed the Rider/IDEA plugin.
I wasn't quite just saying anything :) Tho it wasn't clear what commands were actually accepted; from looking in the .talon files, I didn't see anything like a string literal, more something code-like. I had to guess at what commands were supported, for example "reload" in Chrome (assuming that's actually a correct command, even that simple command only worked some of the time).
I'm willing to give it another go, but is there a getting started guide/tutorial for how to actually use it, and how to see what it's trying to do when it does something? I used the getting started guide on the Talon site, but that only tells me how to install it, not how to use it.
For browser stuff, I'm actually really disappointed with how the "actions implemented in talon files" feature turned out in practice (which is the code-like action(...): syntax you saw), and I'm planning to deprecate it, which should clear that up a bit. Browser commands come from places like generic_browser.talon and tabs.talon. Looks like reload is "reload it".
Besides the tips I gave and the chaosparrot practice, knausj_talon does also have a getting started section in the readme https://github.com/knausj85/knausj_talon#getting-started-wit...
You can get significantly more insight into what's happening by both saying "command history", and opening the repl and running `events.tail()` in it.
There's also a much more accurate speech engine in beta right now, which will be released soon, but I suspect most of the confusion wasn't accuracy related.
Something else I wanted to ask about - does the voice recognition engine (either wav2letter or the new one you mention) adapt/learn according to the individual using it? I have a fairly strong Scottish accent, and would prefer to speak naturally if possible.
I view fully automatic online training as a sort of anti-pattern - Dragon does that and it will randomly forget entire words. Talon may eventually have some kind of process for self-serve model training. I do have some plans for what that might look like.
Even without automatic model training there's already a feature to automatically create a sort of "personal dataset" as you use Talon, which you can use to train speech models (Talon or otherwise) down the line, or even send me to improve the main model.
I also found that the microphone makes a big difference - with my Plantronics Voyager Legend bluetooth headset, it was basically unusable, misunderstanding almost everything I said. But if I used a cheap Logitech USB headset that I've had for a decade, alphabet accuracy was good.
Something else is that it does seem to struggle a bit with my accent. For example, with the alphabet I would say "air", and 75% of the time it would hear `oh`/`near` - however, if I said "air" in an American accent, it heard `air` correctly every time.
Will be interesting to see how your new engine fares when it's released.
The wiki is something I currently try to introduce folks to later in the process, because it's unofficial and historically has had assumptions, inaccuracies, or very outdated information that caused me additional stress/support load. I know the community has been working on improving that.
Bluetooth mics are almost universally worse than cheap wired mics, due to bandwidth/power/compression constraints. If you make a file user/settings.talon containing "settings(): speech.record_all = 1", Talon will record successful utterances to recordings/ adjacent to user/, and you can compare what the mic sounds like to Talon. It's also very likely the mic works better with Conformer.
The alphabet is pretty easy to change. Check out the top of keys.py. There are some words that aren't really the engine's fault when it comes to accent, and some pairs like air/near are more of a configuration issue if your accent doesn't differentiate them.
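To illustrate the kind of change meant here, below is a minimal standalone sketch in plain Python. It is not the actual contents of keys.py: the `SAMPLE_ALPHABET` mapping, the `replace_spoken_word` helper, and the replacement word "arch" are all hypothetical, chosen only to show swapping out a spoken word that an accent makes hard to distinguish from another.

```python
# Illustrative sample only: a few spoken words mapped to the letters they type.
# This is NOT the real word list from keys.py.
SAMPLE_ALPHABET = {"air": "a", "bat": "b", "cap": "c", "near": "n", "odd": "o"}


def replace_spoken_word(alphabet, old_word, new_word):
    """Return a copy of the mapping where new_word types the letter old_word used to type."""
    if old_word not in alphabet:
        raise KeyError(f"{old_word!r} is not in the alphabet")
    if new_word in alphabet:
        raise ValueError(f"{new_word!r} is already used for {alphabet[new_word]!r}")
    updated = dict(alphabet)  # copy, so the original mapping is left untouched
    updated[new_word] = updated.pop(old_word)
    return updated


# Hypothetical fix for an accent where "air" and "near" sound alike:
# pick a different word for the letter "a".
custom = replace_spoken_word(SAMPLE_ALPHABET, "air", "arch")
print(custom)
```

The helper refuses a replacement word that is already taken, since two letters sharing one spoken word would recreate the same ambiguity.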
I'm hoping to release v0.2 with Conformer sometime around July 1