Support of character sets

As a first try, I decided to open in Zettlr a .txt file I had made with a text editor on Windows. I noticed that all the accented characters became diamonds with a question mark. I exited Zettlr, and it silently saved the result of the editing session, meaning ... diamonds instead of "é" or "à" (I wrote it in French). Any chance this could be reversed? Could Zettlr include automatic recognition of the character encoding (UTF-8 vs. Unicode vs. ...) in a future version?
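To illustrate why the diamonds usually cannot be turned back into the original letters, here is a minimal Python sketch. It assumes the file was saved as Windows-1252 ("ANSI") and that the editor decoded it as UTF-8 with replacement of invalid bytes; what Zettlr does internally is not confirmed here.

```python
# Assumption: the original file was saved as Windows-1252 ("ANSI").
original = "é à".encode("cp1252")  # two single bytes, 0xE9 and 0xE0, around a space

# Those bytes are not valid UTF-8, so a lenient decoder substitutes
# U+FFFD (the "diamond with a question mark") for each of them.
shown = original.decode("utf-8", errors="replace")

# Saving the text as UTF-8 writes the replacement characters themselves.
saved = shown.encode("utf-8")

# Both letters are now the same byte sequence, so nothing records which was which.
first, second = saved.split(b" ")
print(first == second)
```

Under these assumptions the saved file no longer distinguishes "é" from "à", which is why a backup of the original file is the only reliable way back.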

Comments

  • Puhh… there once was a great talk by a Python engineer explaining that you basically can't reliably detect the character set of an input file and have to rely on wild guesses one way or another. I opt for supporting only UTF-8 files.

    Why? First, UTF-8 (sometimes with "mb4" support, as it's called in the context of databases) can encode every character in Unicode, and covering all the world's writing systems is basically the reason Unicode was created in the first place. Second, UTF-8 is now supported in all software suites and operating systems. And third, it's the most compatible choice, as nearly all software falls back to UTF-8 when it has to guess the character set. Interestingly, there is one exception: Windows, which doesn't do this in bad faith either, but simply because Microsoft wants to provide a weird kind of backward compatibility for users who created text files with Windows 95 or something like that. The ANSI and Western character sets are simply outdated, and apart from Windows Notepad, I don't know of any software that doesn't save files as UTF-8 by default.

    So to answer your question in short: No. Zettlr will always, without exception, save files using UTF-8 character encoding. If you have files using a different character set, you'd first need to convert them. In a way, this argument is similar to what I've been repeating over and over on Twitter about why I won't support 32 bit systems: It's simply old, and if you rely on 32 bit, you're not only endangering your data, but it's also not necessary anymore.

    I'm not Apple, so I won't introduce weird formats or anything else that's "trendy" but not supported. But I'll always use the newest/most compatible/best-choice solutions to problems, and only supporting UTF-8 belongs in that realm, just like only supporting 64 bit.

  • Fair enough. I'm not using Notepad but Notepad++, and I hadn't even checked that it was apparently choosing ANSI by default (though it supports UTF-8 as well without issue).
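The conversion step mentioned in the comments can be sketched in Python as follows. This is only an illustration, not something Zettlr provides: the helper name `convert_to_utf8` and the default source encoding `cp1252` (what Windows tools usually mean by "ANSI" on Western systems) are assumptions. Run it on a copy of the original, undamaged file.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    """Rewrite a text file as UTF-8. Returns False if it already was valid UTF-8.

    Hypothetical helper; source_encoding is a guess the caller must supply.
    """
    raw = Path(path).read_bytes()
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        # Not UTF-8, so interpret the bytes in the assumed legacy encoding.
        text = raw.decode(source_encoding)
    # Write bytes directly so line endings are left untouched.
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

The strict UTF-8 check comes first because valid UTF-8 is very unlikely to occur by accident in a legacy-encoded file containing accented letters, whereas almost any byte sequence decodes "successfully" as Windows-1252. If the source encoding is wrong, the output will still be readable UTF-8 but with the wrong characters, so check the result before deleting the original.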
