hendrik
Indeed, I hadn't thought about it, but word embeddings should do the job indeed (and probably faster 🚀️).

In addition, I found two "AI-powered" open-source search tools in case you might want to take a look at them: Meilisearch and Typesense. They seem to have a decent amount of language integrations.

But simplicity often works out, so maybe a custom word embeddings implementation could work even better, who knows. 😉

Yes, I think simple word embeddings are already miles better than regular search for such instances. The main problem is rather that people always flock to the next cool new thing, which right now is LLM embeddings.

But when it comes to word embeddings … the code is there: nathanlesage/node-sgns

@IgnacioHeredia, @hendrik +1 to the idea of adding word embeddings! I love the idea.

@sensologica's views on AI track my own very closely, but I do have colleagues I respect who come down on the other side of this, so I mostly treat opting out as a personal decision. As someone who teaches about the politics of technology, though, I do very much agree that design/software is a political act, and I appreciate @sensologica's highlighting that the decision to integrate generative AI would be to take a non-ambiguous position here.

Perhaps a middle path would be to double down on Zettlr's capacity to integrate shell scripts — similar to code editor features like "filter through command" or "pipe to terminal." If someone wants to use these tools with a shell script that calls an AI tool, that's their decision. And for everyone else, it opens up the possibility of writing ordinary shell scripts that might perform useful transformations on their text (without introducing a whole plugin architecture that, for reasons @hendrik has expressed previously, might be wisely avoided). The Custom export commands feature is already a start here.
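To make the idea concrete, a "filter through command" hook could be as small as a one-line script. This is a purely hypothetical sketch (the script name and the TODO/DONE substitution are invented for illustration; it assumes GNU sed on Linux and that Zettlr would pass the exported file's path as the first argument):

    #!/bin/bash
    # zet-filter.sh (hypothetical): "$1" is the path of the exported file.
    # Replace every TODO marker with DONE, editing the file in place.
    sed -i 's/TODO/DONE/g' "$1"

Nothing AI-specific about it — which is the point: the same hook serves both camps.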

I appreciate your concerns, and @sensologica’s. I value having a critical community that voices critique and expresses their opinions. However, I believe that this stance is a rather privileged one and one that I personally cannot afford to have.

Essentially — if I understand you correctly — you are saying that for now it’s best to stay by the wayside and let the AI wave ebb away. But there is a counter argument to make to this point.

Since you mentioned that developing always involves political decisions — and I emphatically agree with this — here are the options we have to make a decision for Zettlr.

First: do nothing, as you have suggested. Users who want to integrate genAI into their workflows can already do so. Second: embrace genAI and integrate it into the app in the useful ways that we are discussing here.

The first decision means that we leave the genAI space to greedy corporations like OpenAI. But there are also many smaller startups trying to make a quick buck off the current AI wave. Users will see their advertisements and use their products. We can already see the fallout from this in students cheating on essay assignments and the powerlessness of instructors to rein this in.

The other option, however, would mean something similar to Apple’s approach: don’t just produce roadkill after roadkill by unleashing underbaked technology onto the world, but think about proper ways of making genAI work for us. This would also have an educational aspect, showing users how genAI can actually work beyond chatting with some cool chatbot.

I believe that I have the responsibility to make the latter decision if I want to be true to my values. In addition, by including access to open approaches rather than closed ones, we can support the proper ones (think of why I integrated Zettlr with LanguageTool, but not Grammarly).

I believe that not integrating genAI into Zettlr would be a grave mistake. Other apps already feature it, but often in haphazard ways where genAI can actually cause harm. I aim to include genAI in Zettlr in a way that is educational and supports open AI, rather than leaving the space to actors who, I believe, make no attempt to mold genAI into an actual tool and never leave the sandbox stage.

Lastly, there is certainly a personal bias of mine in here. I work with generative AI every day, both professionally and personally. I observe my field as it embraces genAI, and many researchers do use it in haphazard ways. Yet journals are publishing their work, because they don’t see much wrong with it. I believe the only way to make genAI good is by acting, not by waving from the docks as the ship sets sail.

Does this make sense to you…?

I get what you're saying and I'm not naive about the sophistication or trajectory of generative AI — as I say, I have colleagues that are using it. And while I don't use it in my writing, I tinker with it enough to competently discuss it in the media-industries and technology-and-culture courses that I teach. I don't think it's going to ebb away.

I think the way you've added LanguageTool could be a model for incorporating AI writing tools. The integration is available in the settings menu, but unless the user activates it, it's not part of their experience of the software. As I say, I'm not going to use the tool for my own writing, so I'd prefer not to have AI options cluttering the user interface unless I actually request them in the software preferences.

What I definitely don't want is the Clippy-style pop-up that's appearing in every other app I use, constantly nagging me to try new AI features or start a chat with the AI assistant. If you think Zettlr needs AI options to stay competitive, I can understand that. But I appreciate that Zettlr is flexible enough for each of us to find our own work styles within it, and I wouldn't want AI to become an aggressive part of the interface, such that it becomes less convenient to use the software without engaging with it.

    professed What I definitely don't want is the Clippy-style pop-up that's appearing in every other app I use, constantly nagging me to try new AI features or start a chat with the AI assistant.

    Excuse me, what? 😃 I mean, I certainly do like Clippy, but shoving it into one’s face is really bad.

    I think I now understand your reservations better. And I agree that any AI integration in Zettlr should be exactly as you say: available if users want it, but otherwise invisible. Apologies that I didn’t see that your primary fear was getting it shoved in your face, which I’d never do, because I try to be nice to people 😃


    2 months later

    One AI application I find extremely useful for writing is text-to-speech (TTS). It may be counterintuitive, but it helps to switch roles during the writing process: a step "from authorship to readership" (something I have heard over and over as useful writing advice). That is, by listening to what's written, kind of in third-person mode. It could also be a step towards accessibility, making writing a bit more barrier-free. The case of accessibility could also help to establish a guideline for integrating AI.

    I tried to implement TTS "via the command line". The problem was that it was difficult to pass something from Zettlr to another program. I had to abandon the idea of reading from the clipboard because of Markdown: the TTS model needs plain text, not Markdown-formatted input, even though stripping the formatting loses some subtleties. Some of this is described here on Discord: https://discord.com/channels/609436111860793355/1168975313942888478

    P.S. Thanks so much for opening this forum! I'm done with Discord (:

      mononym I agree it'd be useful to have some TTS functionality, as I often find having a TTS app read my document to me helpful for proofreading.

      If you're looking for an immediate workaround, you can pass a file from Zettlr to a command line application using a custom export command, which you can set up in Preferences > Import and Export > Custom Export Commands.

      As an experiment, in response to your post, I created a custom command named Speak with the command espeak -f, and when I select this export option it indeed uses the espeak command-line application to speak the document open in the active tab. There are nicer command-line TTS apps, like mimic and piper, as well as ones that connect to the APIs of AI TTS engines. So there would seem to be a lot of possibilities.
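      For anyone who wants to reproduce this, here is roughly what that configuration looks like (the display name is my own choice; espeak must be installed, and Zettlr appears to append the exported file's path to the command):

      # Preferences > Import and Export > Custom Export Commands
      #   Display name: Speak
      #   Command:      espeak -f
      # On export, Zettlr then effectively runs something like:
      #   espeak -f /path/to/exported-document.md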

      Obviously, this isn't a perfect workaround, though. There are no playback controls, since the command is launched in the background, and if I want the command-line application to stop speaking, I have to kill the process in a terminal window (e.g., killall espeak), which is less than ideal.

      One possibility would be to have the custom export command launch a terminal window and run the command there. For example, xterm -e espeak -f or gnome-terminal -- espeak -f. These are all kludges, but in the absence of a more integrated feature they could serve as temporary workarounds.

      mononym Update: This is pretty slick, actually. The mimic command-line application uses a pretty decent natural-language voice, so I went with gnome-terminal -- mimic -f. This launches the TTS in a terminal window, so I can hit Ctrl-C to terminate it when I'm done with the playback. I'd just add that mimic, like these other command-line TTS apps, has an option to output to an audio file (the -o flag), so it's possible to do an actual export to an audio file and have it play back in your preferred audio player app. You could probably even chain all these commands, so that it launches your favorite MP3/WAV player on export with the TTS output queued up.

      mononym Voila! Got it fully working on Linux. Here's what I did:

      1. I used mimic as my TTS command-line application and vlc as my GUI audio player.
      2. I wrote the following bash script, which I titled zet-speak.sh:
      #!/bin/bash
      
      mimic -f "$1" -o /tmp/tts-output.wav; vlc /tmp/tts-output.wav
      3. I made the bash script executable (chmod +x /path/to/zet-speak.sh).
      4. I created a custom export command in Zettlr by going to Preferences > Import and Export > Custom Export Commands. For the command's display name, I used Speak, and in the Command field, I entered /absolute/path/to/zet-speak.sh.

      Now, when I want to have a document read aloud, I select Speak from the export menu and it launches VLC with the playback loaded. I can use VLC's playback controls to play, pause, rewind, or adjust the playback speed of the narration.

      Because the exported audio is saved to /tmp, it doesn't take up permanent space on my drive. If I do want to save it for later, I can use VLC's save functionality to make a permanent copy.

      YMMV on Windows or Mac, but I imagine it's possible to set up a very similar workflow there.

        professed This looks fantastic, thanks!
        @hendrik Maybe TTS could be a discussion of its own (in Workflow), apart from the topic of Zettlr & AI?

        I'll try out the snippet with the piper library. Two considerations upfront:

        1. Formatting: what happens with citations, footnotes, comments, etc.?
        2. Chunk size: instead of processing the whole file, it might be more convenient to process a selection of text, maybe from a right-click context menu?

          mononym

          I added a pandoc step to the script in order to convert the markdown export into a plain text file:

          #!/bin/bash
          
          pandoc "$1" --standalone --from markdown --to plain --no-highlight | uv run piper --model ~/apps/Speak/en_GB-cori-high.onnx --output_file /tmp/tts-output.wav; vlc /tmp/tts-output.wav

          N.B.: piper was installed with uv: uv tool install --python 3.10 piper-tts

            mononym Sweet! Will definitely try this out. Converting the file to remove markdown characters from the text prior to speaking is super smart.

            mononym Done! I grabbed a couple of custom voices for Piper that I liked from this repo and set up a custom export for each. Works beautifully, thank you.

              professed

              Finally, I made a second script which converts and reads input from the selected text. The script is called with a custom shortcut (Ctrl+Shift+R) after selecting the part to be read.

              #!/bin/bash
              
              wl-paste --primary | pandoc --standalone --from markdown --to plain --no-highlight | piper --model ~/apps/Speak/en_GB-cori-high.onnx --output_file /tmp/tts-output.wav; vlc /tmp/tts-output.wav

              N.B.: I'm on Wayland and had to install wl-clipboard; the --primary flag gets the selected text (primary buffer). Also, uv tool did in fact install piper to ~/.local/bin/piper.

                mononym This looks super cool. I spend most of my time on Xorg because of some software compatibility considerations, so I'll have to look into an Xorg equivalent. Thank you for this, looks super handy.

                mononym Used your script for reading input from selected text. Was able to simply substitute xsel for wl-paste.

                Also, as another quality of life improvement, I added the --one-instance flag to vlc in all my scripts, so it doesn't spawn a new instance of the player every time I hit the hotkey or do an audio export.
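                Putting those two substitutions together, a sketch of the full Xorg variant might look like this (the model path and /tmp output file are assumptions carried over from the Wayland script; xsel, pandoc, piper, and vlc need to be installed):

                #!/bin/bash
                
                xsel --primary | pandoc --standalone --from markdown --to plain --no-highlight | piper --model ~/apps/Speak/en_GB-cori-high.onnx --output_file /tmp/tts-output.wav; vlc --one-instance /tmp/tts-output.wav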