
How to convert UTF-8 encoded files to GB2312 in Linux

UTF-8 and GB2312 are different character encodings. When a UTF-8 encoded SQL script is imported with sqlplus, the text can come out garbled and cause errors. In that case the files have to be converted from UTF-8 to GB2312, which is tedious to do by hand. This article shows how to batch-convert UTF-8 files to GB2312 in Linux.

Background

I was bulk-importing UTF-8 encoded SQL scripts with Oracle sqlplus and could not work out how to configure sqlplus to recognize UTF-8. The result was garbled text and errors such as broken lines, so the work could not continue. Searching Google turned up nothing useful, so I had to find a way to convert the files' encoding instead.

Because there were many files, converting them manually was too cumbersome, so I decided to batch-convert them with a script. Fortunately, many similar scripts are available online; the only tricky part was handling the UTF-8 BOM (byte order mark).

Script

The code is as follows:

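#!/bin/bash
# Batch-convert every *.sql file under the current directory from UTF-8
# (with or without a BOM) to GB2312. Files that are not valid UTF-8 are
# assumed to be ANSI (GB2312) already and are left as they are.
find . -type f -name '*.sql' -print | while IFS= read -r loop
do
    echo "$loop"
    mv -f "$loop" "$loop.tmp"
    # Normalize CRLF line endings to LF before processing
    dos2unix "$loop.tmp"
    file_check_utf8='file_check_utf8.log'
    # "l" prints the first line with non-ASCII bytes as octal escapes,
    # so a BOM (bytes EF BB BF) shows up as the text \357\273\277
    LC_ALL=C sed -n '1l' "$loop.tmp" > "$file_check_utf8"
    if grep '^\\357\\273\\277' "$file_check_utf8" > /dev/null 2>&1
    then
        echo 'UTF-8 BOM'
        # Drop the three BOM bytes; LC_ALL=C makes each "." match one byte
        LC_ALL=C sed -n -e '1s/^...//' -e 'w intermediate.txt' "$loop.tmp"
        iconv -f UTF-8 -t GB2312 -o "$loop" intermediate.txt
        rm -f intermediate.txt
        rm -f "$loop.tmp"
    elif iconv -f UTF-8 -t GB2312 "$loop.tmp" > /dev/null 2>&1
    then
        # The trial conversion succeeded, so the file is UTF-8 without a BOM
        echo 'UTF-8'
        iconv -f UTF-8 -t GB2312 -o "$loop" "$loop.tmp"
        rm -f "$loop.tmp"
    else
        # Not valid UTF-8: treat it as ANSI (GB2312) and restore it unchanged
        echo 'ANSI'
        mv -f "$loop.tmp" "$loop"
    fi
    rm -f "$file_check_utf8"
    # Simulate unix2dos; requires the last line of the file to end with a newline
    sed -n -e 's/$/\r/g' -e "w $loop.tmp" "$loop"
    mv -f "$loop.tmp" "$loop"
done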


Explanation

1. I did not find an elegant way to handle the UTF-8 BOM, so in the end I detect it with sed + grep: if the first three bytes of the file are \357\273\277 (octal for the bytes EF BB BF), the file is UTF-8 with a BOM, so sed removes those three bytes and the rest is converted.

2. To avoid converting a file twice or skipping one, the script lets iconv decide: for a file without a BOM it attempts a trial conversion. If the conversion succeeds, the file is UTF-8; otherwise it is assumed to be ANSI, that is, GB2312.

3. The last sed command is there because my system has no unix2dos command, so I simulate it to make the converted files easier to view and edit under Windows.
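The BOM check from item 1 can be tried on its own. The sketch below wraps it in two functions; the names has_utf8_bom and strip_utf8_bom are hypothetical, GNU sed is assumed, and strip_utf8_bom is meant to be called only after has_utf8_bom has succeeded, because it removes the first three bytes unconditionally.

```shell
# Hypothetical helpers illustrating the sed + grep BOM handling (GNU sed assumed).

# Succeed if the first line of "$1" starts with a UTF-8 BOM (bytes EF BB BF).
# sed's "l" command prints non-ASCII bytes as octal escapes, so the BOM
# appears as the literal text \357\273\277, which grep then matches.
has_utf8_bom() {
    LC_ALL=C sed -n '1l' "$1" | grep -q '^\\357\\273\\277'
}

# Print "$1" with the first three bytes of line 1 removed.
# Under LC_ALL=C each "." matches exactly one byte; in a UTF-8 locale the
# BOM is a single character and "..." would also eat two real characters.
strip_utf8_bom() {
    LC_ALL=C sed '1s/^...//' "$1"
}

# Usage: build a small file that starts with a BOM, then strip it.
printf '\357\273\277select 1;\n' > bom.sql
if has_utf8_bom bom.sql; then
    strip_utf8_bom bom.sql > nobom.sql
fi
```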
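The trial conversion from item 2 can be checked the same way. This is a minimal sketch assuming glibc's iconv; converts_to_gb2312 is a hypothetical name. One consequence worth knowing: iconv also fails when a valid UTF-8 file contains a character GB2312 cannot represent (an emoji, for example), so such a file falls into the "ANSI" branch and is left unconverted.

```shell
# Hypothetical helper mirroring the script's elif test (glibc iconv assumed).
# Succeeds only if the whole file converts from UTF-8 to GB2312. iconv exits
# non-zero on invalid UTF-8 input (such as a file that is already GB2312)
# and on UTF-8 characters that have no GB2312 equivalent.
converts_to_gb2312() {
    iconv -f UTF-8 -t GB2312 "$1" > /dev/null 2>&1
}

# Usage: the same two Chinese characters (zhong wen) stored in each encoding.
printf 'select \344\270\255\346\226\207;\n' > utf8.sql  # UTF-8 bytes
printf 'select \326\320\316\304;\n' > gb2312.sql        # GB2312 bytes
for f in utf8.sql gb2312.sql; do
    if converts_to_gb2312 "$f"; then echo "$f: UTF-8"; else echo "$f: ANSI"; fi
done
```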
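The unix2dos simulation in item 3 boils down to appending a carriage return to every line. A minimal sketch, assuming GNU sed (which accepts \r in the replacement); to_crlf is a hypothetical name:

```shell
# Hypothetical helper: print "$1" with LF line endings turned into CRLF.
# "$" matches the end of each line, so "\r" is inserted just before the newline.
to_crlf() {
    sed 's/$/\r/' "$1"
}

# Usage: convert a two-line file and inspect the resulting bytes.
printf 'a\nb\n' > lf.txt
to_crlf lf.txt > crlf.txt
od -c crlf.txt
```

Unlike unix2dos, this adds a second \r to lines that already end in CRLF; in the script that cannot happen, because dos2unix has already normalized the file. With GNU sed, a last line that lacks a trailing newline is written back without one and would end in a bare \r, which is presumably why the script's comment asks for a final newline.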

That covers how to batch-convert UTF-8 files to GB2312 under Linux. After conversion, the scripts import into sqlplus without the garbled-text problems.