WordPress, Unicode, and ‘?’s

I used to have problems whenever I mixed Unicode with WordPress. Every Unicode character I typed would display as a ‘?’ once the post was published. This post describes how to fix that.

Basically, as far as I can tell, the problem is that the character set WordPress declares for its database connection doesn’t match the one the database tables actually use. So instead of storing the Unicode characters, the database just says, “Heck, just stick a bunch of question marks in there.”
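
Here’s a rough sketch in Python (nothing to do with WordPress itself, just an illustration) of what that substitution looks like. It assumes the storage side only understands Latin-1, and the function name is made up for this example.

```python
def store_as_latin1(text: str) -> str:
    """Simulate saving text into a Latin-1-only store (an assumption
    made for illustration; this is not WordPress or MySQL code)."""
    # Characters that Latin-1 cannot represent are replaced with '?'.
    stored_bytes = text.encode("latin-1", errors="replace")
    # Reading the bytes back shows what a visitor would end up seeing.
    return stored_bytes.decode("latin-1")


if __name__ == "__main__":
    print(store_as_latin1("ĉu vi parolas Esperanton?"))
    print(store_as_latin1("café"))
```

Characters that exist in Latin-1, like ‘é’, survive the trip. Anything outside it, like ‘ĉ’, comes back as a question mark.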

Of course, this can be easily fixed in two steps. All you’ll need is FTP access to your server and a fair comprehension of how to type. So, let’s get started!

  1. Open up ‘wp-config.php’ from the root directory of your WordPress installation.
  2. Add ‘//’ at the very beginning of these two lines:
    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', '');

So that section should now look like this:
//define('DB_CHARSET', 'utf8');
//define('DB_COLLATE', '');

You’re already finished. How easy was that?

Important notes:

  1. Don’t type the quotes around // in step 2. They only mark the part you should insert.
  2. If you’ve meddled with that part of ‘wp-config.php’ before, it may look a bit different. Pay no attention to the differences. Just be sure to add // to the lines containing DB_CHARSET and DB_COLLATE.
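
If you’d rather script the edit than make it by hand, the two steps could be sketched in Python like this. The helper name is hypothetical, it works on the file’s text rather than on the file itself, and it assumes each define() sits on its own line as in a stock ‘wp-config.php’. Back up the file before trying anything like it.

```python
import re

# Matches a line that begins (after optional spaces or tabs) with a
# define() call for DB_CHARSET or DB_COLLATE. Lines already starting
# with '//' do not match, so running the helper twice is harmless.
_DEFINE_LINE = re.compile(
    r"^([ \t]*)(define\(\s*['\"]DB_(?:CHARSET|COLLATE)['\"])",
    re.MULTILINE,
)


def comment_out_db_charset(config_text: str) -> str:
    """Return config_text with '//' added in front of the two lines."""
    return _DEFINE_LINE.sub(r"\1//\2", config_text)


if __name__ == "__main__":
    sample = (
        "define('DB_NAME', 'wordpress');\n"
        "define('DB_CHARSET', 'utf8');\n"
        "define('DB_COLLATE', '');\n"
    )
    print(comment_out_db_charset(sample))
```

Every other line, such as the DB_NAME one in the sample, is left untouched.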

Did you find this article useful? Please leave a comment to let me know. Don’t worry, you don’t need to register for a simple comment.

27 Responses to “WordPress, Unicode, and ‘?’s”


  1. 1 geoffreyking October 13, 2007 at 2:36 pm

    Looks to me like you’re going a long way round here. My computer just does Unicode unless I tell it to do something else. A few programs still come with Latin 3 or even Latin 1 by default, but you can easily change this in ‘Tools’. Or am I missing something?

  2. 2 engel October 13, 2007 at 4:27 pm

    But by default WordPress won’t handle Unicode characters correctly. At least not Esperanto characters, which is what I tested with.

  3. 3 CowDir October 31, 2007 at 7:38 pm

    Pretty awesome article. Thanks! – CowDir

  4. 4 Nolawi December 25, 2007 at 9:26 pm

    thank you so so so much for this… i wasted an hour trying to fix it till i found your post and then voilà… FIXED

  5. 5 Chris January 24, 2008 at 10:59 am

    Are you kidding me?!? After hours of trying to figure out what in the blazes was going on, all it took was a few //s? Incredible. You are a genius in my mind. Thanks!

  6. 6 dinu May 14, 2008 at 4:20 am

    it worked without making this change… for malayalam…

    :)

  7. 7 fairbro June 28, 2008 at 12:00 am

    I was pulling my hair on this one – the source code for the web page says that WordPress IS doing Unicode – but it isn’t. Bill Gates to thank for why this is a problem only on some computers.

    I am so glad to fix this after only one day.

    Actually, you fixed it.

    Thanks!

    (If you think Bush is bad, wait’ll you see what’s next…)

  8. 8 vamana October 2, 2008 at 4:09 am

    Wonderful insight. This made such a huge difference to the effort I was putting to get this working.

    Thank you so much. You are making the blog a wonderful learning and sharing tool.

  9. 9 Vinayak Anivase October 6, 2008 at 9:36 am

    Thanks a lot, it’s so easy and it works.
    Thanks once again. :)

    Keep going!!

  10. 10 David October 26, 2008 at 9:49 am

    Thanks. There must be a reason why Unicode is not enabled by default. I was very puzzled at first because the upper characters displayed properly while I was editing the post. Only later did I find that they were converted to ???s when I saved or published it. That was the clue that led me to your post.

    I’ll be reading more of your site – thanks for documenting your insights!

    David

  11. 11 rithy November 4, 2008 at 8:47 pm

    I want to create a blog in the Khmer language.
    Can you help me with how to do it?

  12. 13 web design December 21, 2008 at 6:54 pm

    Thanks! It was very helpful!

  13. 14 tyson April 27, 2009 at 10:35 am

    Just surfing the web and found your site,I am also involved with people search and background checks.Your site has been really helpful thanks.

  14. 15 ramag June 10, 2009 at 12:26 pm

    Thank you very, very much. I was struggling a lot to fix this problem, and you described it so simply. Thanks a lot.

  15. 16 phaseill September 7, 2009 at 5:26 am

    Awesome, my site is in the Māori language, using macrons etc. Yours is a tip I’ll no doubt use time and time again :) Thanks! (Unless WP fixes it for us?!?!?)

  16. 17 kanishka September 27, 2009 at 9:26 am

    I tried making this change and reviewed it twice to make sure I didn’t make a mistake.
    It’s not working for me. I just started developing a website, and you can see the demo at http://blogprahari.in

    I want to show Hindi Unicode characters, and the same ? ? ????? signs appear.
    Please help me as soon as you can.

  17. 18 kanishka September 27, 2009 at 9:34 am

    Ooh!
    It did work, but only for the newer posts I made.
    It didn’t work for the earlier posts.
    Thanks a lot.

  18. 19 vamshee October 1, 2009 at 11:12 am

    I had this same problem – couldn’t get my new blog to show Unicode characters. Then I figured out where the problem is. I just wanted to share it here for future reference.

    At the time of installing WP, the following settings need to be present in the config file:
    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', 'utf8_general_ci');

    By default the collate setting is left as ''.
    That is the problem. In the MySQL database tables that WordPress creates, all the table fields will be left with the default collation, which is ‘latin1_swedish_ci’.

    This causes an inconsistency. You are writing UTF-8 characters into a Latin collation, so the data gets lost (turns to ‘???’).

    So if you are installing fresh, make sure you set both settings.

    Now, as suggested here, when you comment out the DB_CHARSET setting, you will be using an ASCII and Latin collation combo, which works. (That is because you are no longer storing the data as Unicode. It will be stored as some funky characters, something like à°°à°¾à ±‡…)

    And another point: no matter what you do, you will not be able to get your UTF-8 data back once it’s corrupted (i.e. when the characters have turned to “???” rather than something like “à°° ±‡”).

    Hope this helps.

    -V
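
To illustrate the distinction drawn in vamshee’s comment above, here is a small Python sketch. It only mimics the two outcomes with Python’s own codecs and assumes Latin-1 on the storage side. It is not MySQL’s actual conversion code, and the function names are made up for the example.

```python
def to_mojibake(text: str) -> str:
    """UTF-8 bytes read back as Latin-1: looks funky, loses nothing."""
    return text.encode("utf-8").decode("latin-1")


def from_mojibake(text: str) -> str:
    """Undo to_mojibake by reversing the two steps."""
    return text.encode("latin-1").decode("utf-8")


def to_question_marks(text: str) -> str:
    """Characters converted into Latin-1: unrepresentable ones become '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


if __name__ == "__main__":
    original = "\u0c30\u0c3e"  # two Telugu code points
    funky = to_mojibake(original)
    print(funky)                             # six Latin-1 characters
    print(from_mojibake(funky) == original)  # True: recoverable
    print(to_question_marks(original))       # ??: original bytes are gone
```

The funky version still carries every original byte, so it can be turned back into the real text. The question marks carry nothing, which is why posts that have already turned into ‘???’ stay that way.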

