Character encoding issues

Submitted by marco@flapper on Wed, 2012-02-01 20:20

Hi,
I had some character encoding issues (see http://www.pricetapestry.org/node/82). We found a workaround by omitting utf8 from define('DB_CHARSET', 'utf8'); but this causes problems (error messages, failures creating tables) when I update plugins.

I checked my database and saw that all the tables are of type MyISAM with the collation utf8_general_ci, but the Price Tapestry tables have the latin1_swedish_ci collation.

Could this explain the character encoding issues, and what should I do next?

Submitted by support on Thu, 2012-02-02 08:52

Hi Marco,

It may be worth trying against the products table... The following dbmod.php will change the collation of the products table (run from main Price Tapestry installation folder). After it completes, re-import all feeds in order to apply the new collation and check the display within WordPress...

dbmod.php

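<?php
  // Load the Price Tapestry configuration and database helpers
  require("includes/common.php");

  // Convert the products table and its text columns to utf8
  $sql = "ALTER TABLE `".$config_databaseTablePrefix."products`
            CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";

  database_queryModify($sql,$result);

  print "Done.";
?>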

Cheers,
David.
--
PriceTapestry.com

Submitted by marco@flapper on Thu, 2012-02-02 09:35

A pity, it isn't working.

I ran dbmod.php, checked that the table collation had changed, and then imported two feeds that I know had problems.

The strange characters are still there.

I also set define('DB_COLLATE', ''); to utf8_general_ci, but this didn't work either.

The minute I emptied both define('DB_COLLATE', ''); and define('DB_CHARSET', ''); the strange characters disappeared. I didn't need to re-import.

I also noticed the following: as I started the blog with an empty define('DB_CHARSET', ''); in wp-config.php, the posts with special characters are garbled if I change wp-config.php to define('DB_CHARSET', 'utf8');

Hope you find something.

Submitted by support on Thu, 2012-02-02 10:07

Thanks for the feedback, Marco - I'll try to recreate on my test server and get back to you.

Cheers,
David.
--
PriceTapestry.com

Submitted by marco@flapper on Thu, 2012-02-02 12:35

Hi,
I was checking it a bit further.

I took one example feed and saw that the source XML feed has encoding="UTF-8".
The characters in the feed are OK, but after import the special characters are garbled in the database.

Maybe this is something?
http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html

"Applications that use the database should also configure their connection to the server each time they connect. This can be done by executing a SET NAMES 'utf8' statement after connecting. In the absence of other information, the programs use the compiled-in default character set, usually latin1."

So I guess it is now imported as latin1. If I set wp-config.php to utf8, it displays strangely because WordPress uses utf8; if I leave it empty, it uses latin1 in the absence of other information, so it displays 'properly'.

Can this be the cause?
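
The effect described above can be reproduced without a database. The following is only a minimal sketch of one plausible explanation, not a confirmed diagnosis. It assumes PHP with the mbstring extension, assumes that a latin1 connection writing to a latin1 column stores the feed's bytes unchanged, and uses ISO-8859-1 as a stand-in for MySQL's latin1 (the two agree for the bytes used here). The variable names are illustrative only.

```php
<?php
// "café" as UTF-8 bytes, as it would arrive from a feed declared encoding="UTF-8".
$feedValue = "caf\xC3\xA9";

// Assumption: latin1 connection + latin1 column, so the bytes are stored unchanged.
$storedBytes = $feedValue;

// Read back over a latin1 connection: no conversion, the original UTF-8 bytes
// come back and a UTF-8 web page displays them correctly.
$readViaLatin1 = $storedBytes;

// Read back over a utf8 connection: every stored byte is treated as one latin1
// character and converted to UTF-8, which double-encodes the accented letter.
$readViaUtf8 = mb_convert_encoding($storedBytes, 'UTF-8', 'ISO-8859-1');

echo bin2hex($readViaLatin1), "\n"; // 636166c3a9
echo bin2hex($readViaUtf8), "\n";   // 636166c383c2a9
```

On a UTF-8 page the second value renders as "cafÃ©", which would match garbled output whenever the reading side uses utf8 while the import side used the default.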

Submitted by support on Thu, 2012-02-02 12:41

Hi Marco,

SET NAMES can be added easily; in includes/database.php look for the following code at line 47:

$link = @mysql_connect($config_databaseServer,$config_databaseUsername,$config_databasePassword);

...and REPLACE with:

$link = @mysql_connect($config_databaseServer,$config_databaseUsername,$config_databasePassword);
mysql_query("SET NAMES 'utf8'",$link);

(and then re-import of course)

Something similar can be added to the read side of database.php, but as you're viewing in WordPress, adding this to the modify side will be enough to test...

Cheers,
David.
--
PriceTapestry.com

Submitted by marco@flapper on Thu, 2012-02-02 13:26

ok, I'm more lost now.

I tested it.

I imported it using the modified database.php and define('DB_CHARSET', 'utf8');
-> Special characters are garbled in the database and display garbled on the web page

Then I also changed define('DB_COLLATE', 'utf8_general_ci'); in wp-config.php and re-imported
-> Special characters are garbled in the database and display garbled on the web page

Then I removed the settings in wp-config.php, define('DB_CHARSET', ''); and define('DB_COLLATE', ''); and checked without importing again.
-> Special characters are garbled in the database but display all right on the web page

Then I imported again with the settings in wp-config.php as define('DB_CHARSET', ''); and define('DB_COLLATE', ''); and checked
-> Special characters are garbled in the database but display all right on the web page

Submitted by support on Thu, 2012-02-02 13:31

Hi Marco,

When viewing the database, are you using something like phpMyAdmin? On the page where the characters appear garbled (while they display correctly on the web page), could you check which character set the browser is using (View > Character Encoding) and see whether that is utf-8 or something different?

Cheers,
David.
--
PriceTapestry.com

Submitted by marco@flapper on Thu, 2012-02-02 13:56

I'm viewing it in phpMyAdmin.

It is utf-8.

I experimented a bit by copying the site over. On the copy, the database table pt_products is still latin1_swedish_ci, the import was done without the modified database.php, and wp-config.php has define('DB_CHARSET', 'utf8');. Strangely enough, special characters display differently on the WordPress page and on the Price Tapestry page.

Submitted by support on Thu, 2012-02-02 14:02

Thanks Marco; also received your email.

I'm going to set this up on my test server now.

Cheers,
David.
--
PriceTapestry.com

Submitted by marco@flapper on Thu, 2012-02-02 16:33

Hi,
The modified includes/database.php, which uses mysql_set_charset(), works: the special characters now display correctly.

Are you including this in the next PT version or is this a specific mod I should remember?

Thanks a lot.
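
The modified database.php itself is not reproduced in this thread. As a rough sketch only, a change along these lines could look like the following. It assumes PHP's legacy mysql extension (the mysql_* functions, removed in PHP 7.0), and the helper name pt_set_connection_charset is hypothetical, not part of Price Tapestry.

```php
<?php
// Hypothetical helper (not part of Price Tapestry): sets the character set on
// a link returned by mysql_connect() from PHP's legacy mysql extension.
function pt_set_connection_charset($link, $charset = 'utf8')
{
    if (function_exists('mysql_set_charset')) {
        // Available from PHP 5.2.3; also tells the client library which
        // character set is in use, which mysql_real_escape_string() relies on.
        return mysql_set_charset($charset, $link);
    }

    // Fallback for older PHP versions: same effect on the server side only.
    // $charset must be a trusted constant, as it is placed directly in the SQL.
    return (bool) mysql_query("SET NAMES '".$charset."'", $link);
}
```

Under these assumptions it would be called once, directly after the mysql_connect() line in includes/database.php, for example pt_set_connection_charset($link);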

Submitted by support on Thu, 2012-02-02 16:47

Hi Marco,

Thanks for the update - glad you're up and running. I'm going to work on this further in order to completely understand the technicalities, and will of course update the distribution where necessary.

Cheers,
David.
--
PriceTapestry.com

Submitted by marco@flapper on Mon, 2012-02-06 08:46

Hi,
I want to use this elsewhere on an existing PT installation. Should I use it together with the dbmod.php that changes the collation of the products table to utf8_general_ci, or isn't that necessary?

Submitted by support on Mon, 2012-02-06 09:12

Hi Marco,

It shouldn't be necessary, since latin1_ tables are generally compatible with utf8_ for this purpose and the ultimate fix above was setting the MySQL connection character set, but it wouldn't do any harm if you did want to run it to update the table...

Cheers,
David.
--
PriceTapestry.com