Sorry for the back-to-back-to-back posts -- I swear this is the last one. Below is the spec I used to implement the AI module that supplants the very outdated "Readability" toggle. The logic behind those readability formulas is, in my opinion, completely backwards.
It's a clean, high-level sketch of the readability module that would actually make sense in 2025 -- and of what legacy tools completely fail to approximate.
SMART READABILITY MODULE (2025 Spec: LLM-Integrated, Context-Aware)
A replacement for Dale-Chall / Fog / Coleman-Liau / ARI.
This measures quality, precision, and genre-fit, not "grade level."
1. Architecture Overview
Core idea:
A small local model (Phi-4, Qwen2-1.5B, DeepSeek-R1-Distill, etc.) analyzes the open document and returns structured JSON describing its readability features -- but in a genre-aware, audience-aware, purpose-aware way.
Inputs:
- full text buffer
- optional metadata (genre dropdown)
- target audience (expert, general, public-facing)
- tone/precision preferences
- model backend (Ollama server, custom API endpoint, etc.)
Outputs:
- sentence-level annotations
- paragraph-level summaries
- issue flags
- clarity/precision maps
- structural flow map
- scoring by purpose, not by grade level
Everything is returned as structured ranges (start/end char indices) so the editor can color-code regions.
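To make the boundary concrete, here's a rough TypeScript sketch of the request and the returned ranges. Every field name is illustrative, not a finalized contract, and the issue types beyond the ones shown in section 6 are my own placeholders:

// Sketch only -- field names and issue types are illustrative placeholders.
interface BackendConfig {
  baseUrl: string;      // e.g. a local Ollama server or any OpenAI-compatible endpoint
  model: string;
  apiKey?: string;
}

interface AnalysisRequest {
  text: string;                                   // full text buffer
  genre?: "legal" | "academic" | "readme" | "technical" | "conversational" | "scientific" | "grant";
  audience?: "expert" | "semi-expert" | "general" | "public";
  tonePreference?: string;                        // tone/precision preferences, free-form
  backend: BackendConfig;
}

interface Issue {
  type: "complexity" | "precision" | "redundancy" | "consistency" | "flow";
  detail: string;                                 // the model's reason, surfaced in the UI
}

// Structured range: start/end character indices into the buffer,
// so the editor can color-code regions without re-tokenizing anything.
interface AnnotatedRange {
  start: number;
  end: number;
  issues: Issue[];
}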
2. UI Structure
A single toolbar button opens an analysis panel.
Left pane:
- Genre selector (legal, academic, README, technical, conversational, scientific, grant writing)
- Audience selector (expert, semi-expert, general, broad public)
- Model settings (base URL, API key, model ID)
- Toggles: "Highlight clarity," "Highlight precision issues," "Highlight redundancy," etc.
Right pane:
Dynamic heat-map overlay on the document:
- subtle color gradients
- clickable flags
- in-line explanations
Not the neon green/red nonsense -- more like soft overlaid cues.
3. Analysis Dimensions (the useful ones)
A. Clarity Analysis
- flags sentences where clause stacking exceeds expected limits for the selected genre
- detects buried subjects/verbs
- highlights convolution that isn't contributing real precision
- identifies sentences that "meander," even if syntactically valid
Example flags:
- "Clause stacking: 4 levels"
- "Delayed main verb"
- "Ambiguous referent; two possible antecedents"
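One way to wire the genre-awareness in is to parameterize the clarity portion of the prompt. A minimal sketch -- the per-genre limits below are made-up placeholders, not tuned values:

// Placeholder clause-depth limits per genre; real values would need tuning.
const maxClauseDepth: Record<string, number> = {
  legal: 5,
  academic: 4,
  readme: 2,
  conversational: 2,
};

function clarityInstructions(genre: string): string {
  const depth = maxClauseDepth[genre] ?? 3;
  return [
    `Flag sentences whose clause stacking exceeds ${depth} levels for a ${genre} document.`,
    "Flag sentences with buried or delayed subjects and main verbs.",
    "Flag referents with more than one plausible antecedent.",
    "Do not flag complexity that contributes real precision in this genre.",
    "Report each flag with start/end character offsets.",
  ].join("\n");
}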
B. Precision Analysis
Essential for legal/technical use.
- identifies vague terms
- flags softening verbs ("seems," "may," "might")
- flags nominalization bloat
- flags hidden ambiguity
C. Redundancy & Repetition
- checks for repeated phrases
- checks for duplicated content across sections
- checks for argument cycles in briefs
D. Consistency Mapping
- technical term consistency
- defined term mismatches ("the Agreement" vs "this Agreement" vs "the contract")
- capitalization consistency
- tense drift
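Some of this doesn't even need the model. As a purely illustrative sketch, a cheap deterministic pre-pass could bucket defined-term variants before anything is sent to the LLM (the regex and the "the/this/said X" assumption are mine, not part of the spec):

// Naive sketch: bucket capitalized defined terms by head word, so
// "the Agreement" / "this Agreement" / "said Agreement" land in one group.
function collectDefinedTermVariants(text: string): Map<string, Set<string>> {
  const buckets = new Map<string, Set<string>>();
  const pattern = /\b(the|this|said|such)\s+([A-Z][A-Za-z]+)\b/g;
  for (const m of text.matchAll(pattern)) {
    const head = m[2];
    if (!buckets.has(head)) buckets.set(head, new Set());
    buckets.get(head)!.add(`${m[1]} ${head}`);
  }
  // Only heads with more than one surface form are potential mismatches.
  return new Map([...buckets].filter(([, forms]) => forms.size > 1));
}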
E. Structural Flow
A paragraph-level cohesion check:
- does each paragraph deliver one logical unit?
- do transitions match the argument or topic?
- are you introducing concepts twice?
- is the order coherent given the content type?
For README docs:
- checks if steps are sequential
- checks if prerequisites appear before usage
- checks if terminology is consistent
F. Genre Norm Matching
This is where legacy formulas are completely useless.
The module compares the document to genre expectations:
- Legal brief: dense, layered, highly precise; long sentences expected
- README: clarity prioritized, direct imperatives expected
- Academic paper: moderate complexity, cohesive argument structure
- Public summary: low jargon, short sentences, linear logic
The tool scores sentences against genre norms, not kindergarten-level heuristics.
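Those norms can just be data shared between the scorer and the prompt. A sketch of what the table could look like -- every number below is an invented placeholder:

// Invented placeholder norms, not calibrated values.
interface GenreNorms {
  sentenceLengthRange: [number, number];  // rough expected words per sentence
  jargonTolerance: "low" | "medium" | "high";
  note: string;                           // injected into the analysis prompt
}

const genreNorms: Record<string, GenreNorms> = {
  legal:    { sentenceLengthRange: [20, 60], jargonTolerance: "high",
              note: "Dense, layered, highly precise; long sentences expected." },
  readme:   { sentenceLengthRange: [8, 20],  jargonTolerance: "medium",
              note: "Clarity prioritized; direct imperatives expected." },
  academic: { sentenceLengthRange: [15, 35], jargonTolerance: "high",
              note: "Moderate complexity; cohesive argument structure." },
  public:   { sentenceLengthRange: [8, 18],  jargonTolerance: "low",
              note: "Low jargon, short sentences, linear logic." },
};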
4. Color Overlays (purpose-driven)
Instead of red=bad / green=good:
- Blue: unnecessary complexity (genre mismatch)
- Amber: redundant clause or repeated info
- Purple: precision problem
- Grey hatch: low-value filler
- Silver outline: logic jump or missing connective tissue
Each highlight is tied to a reason field in the JSON returned by the LLM.
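On the editor side that can be a dumb lookup from issue type to overlay style (the type names and hex values here are illustrative; the real palette would be theme-aware):

// Illustrative mapping; the "filler" and "flow" type names are my own shorthand.
const overlayStyles: Record<string, { color: string; render: "fill" | "hatch" | "outline" }> = {
  complexity: { color: "#4a90d9", render: "fill" },    // blue: unnecessary complexity / genre mismatch
  redundancy: { color: "#e0a800", render: "fill" },    // amber: redundant clause or repeated info
  precision:  { color: "#8e44ad", render: "fill" },    // purple: precision problem
  filler:     { color: "#999999", render: "hatch" },   // grey hatch: low-value filler
  flow:       { color: "#c0c0c0", render: "outline" }, // silver outline: logic jump / missing connective
};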
5. Backends (Ollama-friendly)
The module exposes:
- Base URL field
- Model ID field
- API key field
- Temperature / max tokens / top_p
For local use:
- base URL: http://localhost:11434
- model: phi4, qwen2.5-3b, etc.
For BYOK:
- accept any OpenAI-compatible API
This avoids forcing cloud usage and keeps everything local.
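Concretely, talking to a local Ollama server is just a standard chat-completions call against its OpenAI-compatible endpoint, and BYOK only swaps the base URL and API key. A minimal sketch (no streaming, no retry, and it assumes the model actually returns valid JSON):

// Minimal sketch: works against any OpenAI-compatible /v1/chat/completions endpoint.
async function analyzeText(
  text: string,
  baseUrl = "http://localhost:11434",
  model = "phi4",
  apiKey?: string,
) {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {}),
    },
    body: JSON.stringify({
      model,
      temperature: 0.2,
      messages: [
        { role: "system", content: "Analyze readability and return annotations as structured JSON." },
        { role: "user", content: text },
      ],
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content); // expected to match the JSON spec in section 6
}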
6. JSON Specification (returned by the model)
Something like:
{
"sentences": [
{
"start": 120,
"end": 184,
"issues": [
{"type": "precision", "detail": "ambiguous referent"},
{"type": "complexity", "detail": "clause stacking: 3 levels"}
]
},
{
"start": 185,
"end": 245,
"issues": [
{"type": "redundancy", "detail": "repeats concept from paragraph 2"}
]
}
],
"flow": {
"paragraphs": [
{"index": 0, "cohesion": "medium", "note": "topic introduced twice"},
{"index": 1, "cohesion": "strong"}
]
}
}
The editor only needs to paint regions.
The intelligence lives in the model.
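Meaning the consumer can stay tiny: flatten the model's annotations into paint instructions and hand them to whatever decoration API the editor has. A sketch (editor-agnostic; the fallback color for unknown issue types is my own choice):

interface PaintInstruction { start: number; end: number; color: string; reason: string; }

// Flatten per-sentence issues into paint instructions; unknown issue
// types fall back to a neutral grey rather than being dropped.
function toPaintInstructions(analysis: {
  sentences: { start: number; end: number; issues: { type: string; detail: string }[] }[];
}): PaintInstruction[] {
  const colors: Record<string, string> = {
    complexity: "#4a90d9", redundancy: "#e0a800", precision: "#8e44ad",
  };
  return analysis.sentences.flatMap(s =>
    s.issues.map(issue => ({
      start: s.start,
      end: s.end,
      color: colors[issue.type] ?? "#bbbbbb",
      reason: issue.detail,
    })),
  );
}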
7. Why this solves the problem
Because this system:
- respects expert domains
- understands legal necessity
- handles README clarity differently than legal precision
- detects real writing issues (ambiguity, redundancy, drift)
- doesn't penalize complexity that is required
- moves beyond 1940s readability formulas
This is the actual evolution of the idea.
The coloring becomes meaningful, not infantilizing.
The module becomes a genuine writing tool instead of "count the syllables and panic."
Done.