Thursday, December 6, 2007

Importing Unicode Data into MySQL

Phew! I have finally solved a big unicode problem for importing chinese characters into MySQL. The problem is like this: My client has a huge Access database containing English and Chinese data which I need to migrate into MySQL.

In the past, I used MySQL Migration Toolkit to import from Access into MySQL. However, this time, it doesn't work because the toolkit turns all the Chinese characters into ???. Yup, exactly that! Question marks.

So I tried to export the Access table into CSV. I opened it with Wordpad and it looks great. So after playing around with the CSV to turn it into SQL statements, I am ready to use command line import style to import the data in. The idea is to go to MySQL command prompt and type:
mysql -u root -p database_name < data.csv

This ought to do the job... but it doesn't. It has some very very funny MySQL Syntax error which is like some funny characters. I find it extremely funny so I opened up the file using MySQL Query Analyser instead. What I discovered is that the data is suddenly shortened significantly to just 3 funny characters. My best guess is that wordpad actually compresses the data when you saved it.

Never mind! Chill! I rework the data in MySQL Query Analyser and save it properly. It is neat to know that MySQL Query Analyser will save your sql file in proper UTF encoding. With the proper UTF file, I tried the command line import again.

There shouldn't be any more problem... but there is. Now MySQL told me "Error 2006: SQL Server has gone away". What? Gone away? For a smoking break? OMG! After some searching on the net, I realize that the problem could be with the MySQL server setting to automatically close a connection when there is a large transfer. So I fired up MySQL Administrator GUI, go to "Startup Variables", under "Advanced Networking" tab, I checked and set "Max Packet Size" and "Max Buffer Length" to 5M each.

Back to the command line import, YUP it works! FINALLY! I am writing this out to throw it in the net hoping that it will just help some poor soul who is searching for such a solution.

4 comments:

Shi Heng Cheong said...

So you are blogging actively on Joomla stuff now. Well done!

Anonymous said...

actually with a csv, you can use LOAD DATA INFILE, so you won't need to waste time converting it into sql statements. =)

not sure if this will solve the server gone away error though..

Steven Yap said...

Yup that should works too but probably won't solve the "Sever has gone away" problem...

Thanks for the comments!

Jack said...

Hey,
Nice information in this blog posting.