Windows has a lot of secret tools. Okay, maybe they aren't really "secret", but not too many people know them, which makes them an awful lot like secrets. One of them is a built-in speech-to-text tool that you can use to type whatever you say, so you don't have to install any additional software. Imagine you're in class or a meeting: instead of taking notes manually, you could just turn on dictation and Windows will type everything for you.

The nerd-dictation options are described below.

This can be used for user-defined arguments which configuration scripts may read.

Verbosity level, defaults to zero (no output except for errors).
Level 1: report top level actions (dictation started, suspended, etc.).
Level 2: report internal details (may be noisy).

Program used to simulate keystrokes.
XDOTOOL: Compatible with the X server only (default).
DOTOOL: Compatible with all Linux distributions and Wayland.
DOTOOLC: Same as DOTOOL but for use with the dotoold daemon.
YDOTOOL: Compatible with all Linux distributions and Wayland but requires some setup.
STDOUT: Bare stdout with Ctrl-H for backspaces.
For help on setting up ydotool, see readme-ydotool.rst in the nerd-dictation repository.

Method used to output the result of speech to text.
SIMULATE_INPUT: simulate keystrokes (default).
STDOUT: print the result to the standard output. Be sure only to handle text from the standard output, as the standard error may be used for reporting any problems that occur.

Specify the input method to be used for audio recording. See the --pulse-device-name option to use a specific pulse-audio device. For help on setting up sox, see readme-sox.rst in the nerd-dictation repository.

Suppress number suffixes when --numbers-as-digits is specified. For example, this will prevent "first" from becoming "1st".

Minimum value for numbers to convert from whole words to digits. This provides for more formal writing and prevents terms like "no one" from being converted to digits.
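The STDOUT keystroke tool above emits bare stdout with Ctrl-H for backspaces, so anything reading that stream has to apply the backspaces itself. The following is a minimal sketch of one way to do that. The function name apply_backspaces is hypothetical (it is not part of nerd-dictation), and the sketch assumes Ctrl-H arrives as the ASCII backspace character (0x08).

```python
BACKSPACE = "\x08"  # Ctrl-H; assumed to be how corrections appear in the stream


def apply_backspaces(stream: str) -> str:
    """Return the text that remains after applying every backspace in `stream`.

    Hypothetical helper for illustration only, not a nerd-dictation API.
    """
    out = []
    for ch in stream:
        if ch == BACKSPACE:
            if out:  # ignore a backspace when there is nothing left to erase
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # "helo" is corrected by two backspaces followed by "llo world".
    print(apply_backspaces("helo\x08\x08llo world"))
```

Keeping the result in a list and popping on each backspace means a correction costs time proportional to the number of erased characters, which matters when the recognizer rewrites long phrases.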
This is also used to add either a comma or a full stop when dictation is performed under the previous-recording time-out (described below).

Convert numbers into digits instead of using whole words.

The time-out in seconds for detecting the state of dictation from the previous recording. This can be useful so punctuation is added before entering the dictation (zero disables).

Start the process and immediately suspend. Intended for use when nerd-dictation is kept open, where resume/suspend is used for dictation instead of begin/end.

The time to continue running after an end request. This can be useful so "push to talk" setups can be released while you finish speaking.

Time to idle between processing audio from the recording. Setting to zero is the most responsive at the cost of high CPU usage. The default value is 0.1 (processing 10 times a second), which is quite responsive in practice.

Time out recording when no speech is processed for the time in seconds. This can be used to avoid having to explicitly exit (zero disables).

The following option is only used when --defer-output is disabled.
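The idle time and the no-speech time-out described above interact in a simple way: the process wakes once per idle period, and gives up once no speech has been seen for the time-out. The sketch below illustrates that interaction only; it is not nerd-dictation's actual loop. The name run_loop and the chunks argument (one item per poll, None meaning "nothing heard") are hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without audio.

```python
import time
from typing import Callable, Iterable, List, Optional


def run_loop(
    chunks: Iterable[Optional[str]],
    idle_time: float = 0.1,
    timeout: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll once per idle period; stop after `timeout` seconds without speech.

    Each item in `chunks` stands in for one poll of the recognizer:
    a string when speech was recognized, None when nothing was heard.
    """
    heard: List[str] = []
    last_speech = clock()
    for chunk in chunks:
        if chunk is not None:
            heard.append(chunk)
            last_speech = clock()
        elif timeout > 0.0 and clock() - last_speech >= timeout:
            break  # silent for too long; a zero time-out disables this check
        if idle_time > 0.0:
            sleep(idle_time)  # zero skips sleeping: most responsive, most CPU
    return heard


if __name__ == "__main__":
    # No time-out set, so every recognized chunk is kept.
    print(run_loop(["hello", None, "world"], idle_time=0.01))
```

With the default idle time of 0.1 the loop body runs about ten times a second, which matches the "processing 10 times a second" figure quoted above.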
Enable this option when you intend to keep the dictation process enabled for extended periods of time. Without this enabled, the entirety of this dictation session will be processed on every update.

When enabled, output is deferred until exiting. This prevents text being typed during speech (implied with --output=STDOUT).

The sample rate to use for recording (in Hz).

The name of the pulse-audio device to use for recording. See the output of "pactl list sources" to find device names (using the identifier following "Name:").

A grammar restricts the phrases recognized by VOSK for better accuracy. See vosk_recognizer_new_grm in the API reference.
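Finding the device name by eye in the output of "pactl list sources" can be tedious, so the lookup described above can be scripted. The sketch below pulls out the identifier following "Name:" on each line. The function names are hypothetical, and it assumes pactl is installed and prints untranslated English field labels.

```python
import subprocess
from typing import List


def parse_source_names(pactl_output: str) -> List[str]:
    """Return the identifier following "Name:" for each source in the text."""
    names = []
    for line in pactl_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Name:"):
            names.append(stripped[len("Name:"):].strip())
    return names


def list_source_names() -> List[str]:
    """Run `pactl list sources` and return the device names it reports.

    Requires the pactl command to be available on PATH.
    """
    result = subprocess.run(
        ["pactl", "list", "sources"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_source_names(result.stdout)
```

Any name returned by list_source_names() is a candidate value for the --pulse-device-name option.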
Override the file used for the user configuration. Use an empty string to prevent the user's configuration being read.
Location for writing a temporary cookie (this file is monitored to begin/end dictation). This creates the directory used to store internal data, so other commands such as sync can be performed. See nerd-dictation begin --help for details on how to access these options.
Specific features include:

Numbers as Digits: Optional conversion from numbers to digits. So "Three million five hundred and sixty second" becomes "3,000,562nd". A series of numbers (such as reciting a phone number) is also supported.

Time Out: Optionally end speech to text early when no speech is detected for a given number of seconds (without an explicit call to end, which is otherwise required).

Output Type: Output can simulate keystroke events (default) or simply print to the standard output.

User Configuration Script: User configuration is just a Python script which can be used to manipulate text using Python's full feature set.

Suspend/Resume: Initial load time can be an issue for users on slower systems or with some of the larger language models. In this case suspend/resume can be useful. While suspended all data is kept in memory and the process is stopped. Audio recording is stopped and restarted on resume.
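Since the user configuration is an ordinary Python script, a small text-replacement pass shows the idea. This sketch assumes the script exposes a function named nerd_dictation_process(text) that receives the recognized text and returns the text to output. That name comes from the project's examples rather than from the text above, so check it against your installed version. The replacement table is purely hypothetical.

```python
import re

# Hypothetical replacements, chosen only for illustration.
WORD_REPLACEMENTS = {
    "i": "I",
    "api": "API",
}


def nerd_dictation_process(text: str) -> str:
    """Return `text` with whole words swapped according to WORD_REPLACEMENTS."""

    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        # Words missing from the table are returned unchanged.
        return WORD_REPLACEMENTS.get(word, word)

    # Match runs of letters and apostrophes so "i'm" is treated as one word.
    return re.sub(r"[A-Za-z']+", swap, text)


if __name__ == "__main__":
    print(nerd_dictation_process("i think the api works"))
```

Matching whole words rather than substrings avoids rewriting the "i" inside words such as "think".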