How to fix garbled UTF-8 documents in Vim on Linux

Vim is a text editor commonly used on Linux. Sometimes it displays UTF-8 documents as garbled text. How can this be solved? This article explains how Vim handles encodings and how to fix garbled UTF-8 documents.

1. Background

Vim has four options related to character encoding: fileencodings, fileencoding, encoding and termencoding. In practice, a wrong value for any one of them can produce garbled characters, so every Vim user should understand what each option means. The sections below describe the meaning and role of these four options in detail.

(1) encoding

encoding is the character encoding that Vim uses internally. Once it is set, all buffers, registers, strings in scripts and so on inside Vim use this encoding. If the text Vim is working with is in a different encoding, Vim first converts it to the internal encoding. Any character that cannot be represented in the internal encoding is lost in that conversion. Therefore, choose an internal encoding with a large enough character repertoire so that normal work is not affected.
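To make the idea of a "large enough" internal encoding concrete, here is a minimal Python sketch. It is not Vim's own code: the helper name `can_represent` and the sample text are made up for illustration, and Python's codecs only stand in for Vim's conversion routines.

```python
def can_represent(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in `encoding`."""
    try:
        text.encode(encoding)  # strict mode: raises on the first unencodable character
        return True
    except UnicodeEncodeError:
        return False


sample = "Vim 中文 한글"  # Latin, Chinese and Korean characters (illustrative sample)
for name in ("latin-1", "gbk", "utf-8"):
    print(name, can_represent(sample, name))
```

Of the three candidates, only utf-8 can hold all three scripts, which is the reason for the recommendation to use it as the internal encoding.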

Because the encoding option determines the internal representation of all characters in Vim, it should be set only once, when Vim starts. Changing encoding while Vim is running can cause many problems. The user manual recommends changing its value only in .vimrc, and in practice that is the only place where changing it makes sense. Unless you have a special reason, always set encoding to utf-8. To avoid garbled menus and system messages on non-UTF-8 systems such as Windows, you can apply these settings at the same time:

set encoding=utf-8

set langmenu=zh_CN.UTF-8

language message zh_CN.UTF-8

(2) termencoding

termencoding is the encoding that Vim uses for on-screen display. When displaying text, Vim converts it from the internal encoding to the terminal encoding and uses the result for output. If the text contains a character that cannot be converted to the terminal encoding, that character is shown as a question mark, but editing it is not affected. If termencoding is not set, Vim uses the value of encoding directly and performs no conversion.

For example, when you log in to a Linux workstation via telnet from Windows, the Windows telnet client uses GBK while Linux uses UTF-8, so text in Vim appears garbled over telnet. There are two ways to eliminate the garbled characters: one is to change Vim's encoding to gbk; the other is to keep encoding as utf-8, set termencoding to gbk, and let Vim transcode when it displays text. With the first method, any characters in the edited file that GBK cannot represent are lost. With the second method, those characters cannot be displayed because of the terminal's limitations, but they are not lost during editing.
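The difference between the two approaches can be sketched in Python. This is only an illustration under assumptions: `render_for_terminal` is a hypothetical helper, and Python's "replace" error handler is used to imitate the question marks that Vim shows.

```python
def render_for_terminal(buffer_text: str, term_encoding: str) -> bytes:
    """Encode the internal text for a terminal, replacing unsupported characters with '?'."""
    return buffer_text.encode(term_encoding, errors="replace")


buffer_text = "中文 한글"                         # internal text (think encoding=utf-8)
shown = render_for_terminal(buffer_text, "gbk")   # what a GBK terminal would receive
print(shown.decode("gbk"))                        # the Korean characters appear as '?'
print(buffer_text)                                # the buffer itself is unchanged
```

Because only the display copy is transcoded, the characters that GBK lacks survive in the buffer and are written back intact when the file is saved.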

For GVim under the graphical interface, its display does not depend on TERM, so termencoding has no meaning for it. In GVim under GTK2, termencoding is always utf-8 and cannot be modified. GVim under Windows ignores the existence of termencoding.

(3) fileencoding

When Vim reads a file from disk, it detects the file's encoding. If that encoding differs from Vim's internal encoding, Vim converts the text. After the conversion, Vim sets the fileencoding option to the encoding of the file. When Vim saves the file, it converts again if encoding and fileencoding differ. Therefore, by setting fileencoding after opening a file and then saving, we can convert the file from one encoding to another. However, fileencoding is set automatically by Vim when the file is opened, and changing it afterwards does not decode the file again. So if the text is already garbled, we cannot repair it by resetting fileencoding after the file has been opened.

In short, fileencoding is the character encoding of the currently edited file in Vim. Vim also saves the file as this character encoding when saving the file (regardless of whether it is a new file or not).
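The read, convert and save cycle described in this section can be sketched in Python. This is a simplified model, not Vim's implementation: `convert_encoding` is a hypothetical helper, and in-memory bytes stand in for a file on disk.

```python
def convert_encoding(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode bytes read from disk, then re-encode them the way a save would."""
    text = data.decode(source_encoding)   # file encoding -> internal representation
    return text.encode(target_encoding)   # internal representation -> new file encoding


gbk_bytes = "中文".encode("gbk")           # stands in for a GBK file on disk

# Correct detection on read: the save produces proper UTF-8.
utf8_bytes = convert_encoding(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes.decode("utf-8"))

# Wrong detection on read: changing the target encoding cannot repair the text.
wrong_bytes = convert_encoding(gbk_bytes, "latin-1", "utf-8")
print(wrong_bytes.decode("utf-8") == "中文")
```

The second call shows why a file that was decoded with the wrong encoding stays garbled no matter which encoding it is saved in: the damage happens at read time.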

(4) fileencodings

Automatic detection of the file encoding is controlled by fileencodings (note the plural form). fileencodings is a comma-separated list in which each item is the name of an encoding. When a file is opened, Vim tries to decode it with each encoding in fileencodings in order. If decoding succeeds, Vim uses that encoding and sets fileencoding to its name. If decoding fails, Vim moves on to the next encoding in the list.
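The trial-and-error procedure can be sketched in a few lines of Python. This is an approximation under assumptions: `detect_encoding` is a hypothetical helper, Python codec names replace Vim's encoding names, and Vim's real detection (for example its BOM handling for ucs-bom) is more involved.

```python
from typing import Optional, Sequence


def detect_encoding(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate that decodes data without error, or None."""
    for name in candidates:
        try:
            data.decode(name)  # strict decoding: any invalid byte sequence raises
            return name
        except UnicodeDecodeError:
            continue           # this candidate failed, try the next one
    return None


candidates = ["utf-8", "gbk", "latin-1"]
print(detect_encoding("中文测试".encode("gbk"), candidates))
print(detect_encoding("中文测试".encode("utf-8"), candidates))
```

With this ordering the GBK bytes are rejected by the strict utf-8 decoder and accepted by gbk, while the UTF-8 bytes are accepted by the first candidate.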

Therefore, when setting fileencodings, we must put the stricter encodings first (those most likely to fail decoding when the file is not actually in that encoding) and put the looser encodings later. For example, latin1 is a very loose encoding: text in any encoding can be decoded as latin1 without a decoding failure, although the decoded result is of course garbled. So if you put latin1 first in fileencodings, every Chinese file you open will naturally be garbled.
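A short Python check illustrates why latin1 is called loose. It assumes that Python's "latin-1" codec behaves like Vim's latin1 in this respect: every possible byte maps to some character, so decoding never fails.

```python
every_byte = bytes(range(256))
decoded = every_byte.decode("latin-1")   # never raises: each byte maps to one character
print(len(decoded))

utf8_bytes = "中文".encode("utf-8")
garbled = utf8_bytes.decode("latin-1")   # "succeeds", but the result is unreadable
print(repr(garbled))
```

Since a latin1 decode always "succeeds", any encoding listed after it would never be tried.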

The following is a fileencodings setting recommended online:

set fileencodings=ucs-bom,utf-8,cp936,gb18030,big5,euc-jp,euc-kr,latin1

Among them, ucs-bom is a very strict check: it is almost impossible for a file in another encoding to be misjudged as ucs-bom, so it is placed first.

utf-8 is also quite strict. Apart from very short files (the classic example is the GBK-encoded word "Unicom" (联通), which is famously misjudged as UTF-8), real-world files are almost never misjudged as utf-8, so it is placed second.

Next are cp936 and gb18030. These two encodings are relatively loose. If they were placed earlier, many files would be misjudged, so they come later in the list. The code space of cp936 is smaller than that of gb18030, so cp936 is placed before gb18030.
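The ordering of these two encodings can be checked with a small Python sketch. It relies on assumptions: Python's "gbk" codec stands in for cp936, and `decodes_as` is a hypothetical helper.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


two_byte = "中文".encode("gbk")        # valid in both gbk and gb18030
four_byte = "한글".encode("gb18030")   # uses gb18030's four-byte sequences

print(decodes_as(two_byte, "gbk"), decodes_as(two_byte, "gb18030"))
print(decodes_as(four_byte, "gbk"), decodes_as(four_byte, "gb18030"))
```

Text that is valid in the smaller encoding is also valid in gb18030, but not the other way round. If gb18030 came first in the list, the smaller encoding would never be selected.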

As for big5, euc-jp and euc-kr, their strictness is similar to that of cp936. Because they come later in the list, files in these encodings will often be misjudged, but this is a limitation of Vim's built-in encoding detection and cannot be solved within it. Since Chinese users rarely need to edit files in these encodings, we put cp936 and gb18030 ahead of them to make sure those two encodings are recognized.

The last is latin1. It is an extremely loose encoding, so it has to go in the last position. Unfortunately, when you do encounter a file that is really encoded in latin1, in most cases Vim never gets the chance to fall back to latin1, because the file is misjudged as one of the earlier encodings. However, as mentioned above, Chinese users seldom come across such files.

If the encoding is misjudged, the decoded result cannot be read by a human, and we say the file is garbled. In that case, if you know the file's correct encoding, you can specify it with ++enc=encoding when opening the file, for example:

:e ++enc=utf-8 myfile.txt

That is how to fix garbled UTF-8 documents in Vim on Linux. When you run into garbled text, you can usually solve it by adjusting fileencodings. I hope this helps.