Viewing file encodings, converting file encodings and converting file name encodings under Linux

  

If you need to work with Windows files on Linux, you will often run into file encoding conversion problems. On Chinese-language Windows the default file encoding is GBK (a superset of GB2312), while Linux generally uses UTF-8. Here's how to view a file's encoding and how to convert files between encodings on Linux.

Viewing file encodings

One way to view a file's encoding on Linux is directly in Vim: run :set fileencoding to display the encoding of the current file. If you just want to view files in other encodings, or want to fix garbled text in Vim, add the following line to your ~/.vimrc file:

set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936

With this setting Vim recognizes the file encoding automatically (it can identify both UTF-8 and GBK files). Vim tries each encoding listed in fileencodings in order and uses the first one that decodes the file without errors. Appending latin1 to the end of the list gives a fallback that can open any file, since every byte sequence is valid latin1.

File encoding conversion

1. Convert the encoding directly in Vim. For example, to convert a file to UTF-8, run :set fileencoding=utf-8 and then save the file.
2. Use iconv. The command format is:

iconv -f from_encoding -t to_encoding inputfile

For example, to convert a GBK-encoded file to UTF-8:

iconv -f GBK -t UTF-8 file1 -o file2

File name encoding conversion

When you copy files from Windows to Linux or from Linux to Windows, Chinese file names are sometimes garbled. This happens because file names from Chinese Windows are GBK-encoded by default, while Linux uses UTF-8 for file names. Because the two encodings don't match, the names appear garbled; to fix this, the file names themselves must be transcoded. Linux provides a dedicated tool, convmv, for converting file name encodings, either from GBK to UTF-8 or from UTF-8 to GBK.
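Both iconv (for file contents) and convmv (for file names) perform the same basic operation: they decode bytes from one encoding and re-encode them in another. The Python sketch below illustrates that operation and the file name garbling described above. It is only an illustration of the concept and does not reproduce either tool's options. `transcode` is a hypothetical helper, and Python's `gbk` codec stands in for GBK.

```python
# Minimal sketch: decode bytes from a source encoding, re-encode to a target.
# Conceptually this is what `iconv -f GBK -t UTF-8` does to file contents and
# what `convmv -f GBK -t UTF-8` does to a file name. `transcode` is hypothetical.

def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert raw bytes from encoding `src` to encoding `dst`.

    Raises UnicodeDecodeError / UnicodeEncodeError on invalid input,
    similar to iconv stopping at an invalid sequence by default.
    """
    return data.decode(src).encode(dst)


# A Chinese file name as stored by a GBK (Chinese Windows) system.
name_gbk = "中文.txt".encode("gbk")

# A UTF-8 system cannot decode these bytes cleanly; errors="replace"
# substitutes U+FFFD for each invalid byte, which is the garbling users see.
garbled = name_gbk.decode("utf-8", errors="replace")
print(garbled)

# Transcoding GBK -> UTF-8 yields bytes that a UTF-8 system displays correctly.
name_utf8 = transcode(name_gbk, "gbk", "utf-8")
print(name_utf8.decode("utf-8"))  # 中文.txt
```

Note that only the bytes of the name change. This matches the point made for convmv below: the file's contents are not touched.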
If convmv is not installed on your system, install it with:

yum -y install convmv

The basic usage of convmv is:

convmv -f source_encoding -t target_encoding [options] filename

Common options:
* -r: process subdirectories recursively.
* --notest: actually rename the files. Note that by default convmv does not rename anything; it only performs a dry run.
* --list: list all supported encodings.
* --unescape: unescape names, for example turning %20 into a space.

For example, to convert a file name from UTF-8 to GBK:

convmv -f UTF-8 -t GBK --notest utf8_encoded_filename

After this command, the name utf8_encoded_filename is GBK-encoded. Only the encoding of the file name changes; the file's contents are untouched.

Vim encoding settings

Like all popular text editors, Vim can easily edit files in a variety of character encodings, including popular Unicode encodings such as UCS-2 and UTF-8. Unfortunately, like much software from the Linux world, this requires some configuration on your part.

Vim has four options related to character encoding: encoding, fileencoding, fileencodings and termencoding (see :help encoding-names in Vim's online help for their possible values). Their meanings are as follows:

* encoding: the character encoding Vim uses internally, for its buffers, menu text, message text and so on. The default is based on your locale. The user manual recommends changing it only in .vimrc, and in fact changing it anywhere else hardly makes sense. You can still edit and save files in other encodings. For example, if your Vim encoding is utf-8 and the file being edited is encoded in cp936, Vim automatically converts the file to utf-8 (the form Vim understands internally) when reading it, and converts it back to cp936 (the file's own encoding) when writing it.
* fileencoding: the character encoding of the file currently being edited. When saving, Vim writes the file in this encoding, whether or not it is a new file.
* fileencodings: the ordered list of encodings Vim tries when detecting fileencoding. When opening a file, Vim tries each encoding in the list in turn and sets fileencoding to the one it finally detects. It is therefore best to put the Unicode encodings at the front of the list and the Latin encoding latin1 at the end.
* termencoding: the character encoding of the terminal Vim runs in (or of the console window on Windows). If it matches Vim's encoding, no setting is needed. Otherwise, set termencoding and Vim converts its output to the terminal's encoding automatically. This option has no effect for gVim in GUI mode on Windows. For console Vim on Windows it is the code page of the Windows console, and it usually doesn't need to be changed.

Having explained these easily confused options, let's look at how Vim's multi-encoding support works:

1. Vim starts and sets the character encoding of its buffers, menu text and message text according to the value of encoding set in .vimrc.
2. Vim reads the file to be edited, tries the encodings listed in fileencodings one by one, and sets fileencoding to the encoding that appears to be correct (Note 1).
3. Vim compares fileencoding with encoding. If they differ, it calls iconv to convert the file contents to the encoding given by encoding and puts the converted content into the buffer opened for this file. Now you can start editing the file. Note that this step requires the external iconv.dll (Note 2).
Make sure iconv.dll is in $VIMRUNTIME or in another directory listed in the PATH environment variable.
4. When you save the file after editing, Vim compares fileencoding with encoding again. If they differ, it calls iconv again to convert the buffer text to the encoding given by fileencoding and writes the result to the file. Again, this requires iconv.dll.

Unicode can represent characters from almost all languages, and UTF-8 is a very cost-effective Unicode encoding (for mostly ASCII text it takes less space than UCS-2). For these reasons, it is recommended to set encoding to utf-8. Another reason is that with encoding set to utf-8, Vim detects file encodings more accurately (perhaps this is the main reason ;). For files edited on Chinese Windows, GB2312/GBK remains the more appropriate file encoding for compatibility with other software. It is therefore recommended to set fileencoding to chinese. chinese is an alias: on Unix it means euc-cn (GB2312), and on Windows it means cp936, the GBK code page.
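The open/edit/save cycle in the four steps above can be modeled in a few lines of Python. This is a simplified sketch, not Vim's actual implementation. detect_encoding, open_buffer and save_buffer are hypothetical names, and a Python str stands in for Vim's internal utf-8 buffer. 'ucs-bom' is approximated by checking only for UTF-8 and UTF-16 byte order marks. The candidate list mirrors the fileencodings setting shown earlier, with latin1 appended as a catch-all.

```python
import codecs

# Simplified model (hypothetical names) of Vim's 'fileencodings' detection
# and the read -> convert -> edit -> convert -> write cycle.
FILEENCODINGS = ["ucs-bom", "utf-8", "cp936", "latin1"]


def detect_encoding(data: bytes, candidates=FILEENCODINGS) -> str:
    """Return the first candidate encoding that decodes `data` without error."""
    for enc in candidates:
        if enc == "ucs-bom":
            # Approximation: only matches when the data starts with a BOM.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8-sig"
            if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
                return "utf-16"
            continue
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    # Unreachable while latin1 is last, because latin1 accepts any bytes.
    raise ValueError("no candidate encoding fits")


def open_buffer(data: bytes):
    """Steps 2-3: detect 'fileencoding', then decode into the internal str."""
    fenc = detect_encoding(data)
    return data.decode(fenc), fenc


def save_buffer(text: str, fenc: str) -> bytes:
    """Step 4: convert the buffer back to 'fileencoding' before writing."""
    return text.encode(fenc)


gbk_file = "你好，世界\n".encode("cp936")
text, fenc = open_buffer(gbk_file)
print(fenc)  # cp936: these bytes are not valid UTF-8
text = text.replace("世界", "Vim")  # edit the buffer in the internal encoding
saved = save_buffer(text, fenc)
print(saved.decode("cp936"))
```

The order of the list matters. latin1 maps every byte to some character, so it "succeeds" on any input. Placed first, it would win on UTF-8 and GBK files too and garble them. Placed last, it only acts as a fallback.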

Copyright © Windows knowledge All Rights Reserved