Charset - French accents in particular

I uploaded my website from 000webhost to the actual domain.

On 000webhost the text with accents (particularly in French) was completely as it should be, but now on the actual domain this isn’t correct: it shows question marks - or empty squares in IE.

The MySQL database is identical to the one on 000webhost: it shows the text with the correct accents, and the tables still have utf8_unicode_ci as collation.

In the meta-tags I still mention:
<!DOCTYPE html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Hi, I see you are already using HTML5 standards.
Put this piece of code at the top of each page. It works fine for me, so you can give it a try.

<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">

Downgrading the doctype is unlikely to be the solution, @Ayu. XHTML really is dead, any way you cut it.

I would say the question author would be better off trying to create a new page containing UTF-8 characters not from the database, to ensure that works, and then to retrieve material from the database, again in a blank and unstyled page. It is possible their connection code is changing the default character set.
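
A minimal sketch of such a test page, assuming PHP 7.0 or later (for the `\u{...}` escapes). The function name `buildTestPage` is hypothetical. The accented characters are written as escape sequences, so the script file itself is pure ASCII and its own encoding cannot affect the result:

```php
<?php
// Build the page from Unicode escape sequences so the output does not
// depend on how this file was saved or transferred.
function buildTestPage(): string
{
    // Produces the UTF-8 bytes for: père, élève, français
    $sample = "p\u{E8}re, \u{E9}l\u{E8}ve, fran\u{E7}ais";

    return "<!DOCTYPE html>\n"
        . "<html><head><meta charset=\"utf-8\"><title>UTF-8 test</title></head>\n"
        . "<body><p>{$sample}</p></body></html>\n";
}

// Declare the charset in the HTTP header as well as in the meta tag.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo buildTestPage();
```

If this page shows the accents correctly on the new host, the server and browser handle UTF-8 properly, and the remaining suspects are the file transfer and the database connection.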


If the code doesn’t work then follow this tutorial and it will work fine

I have a template for other languages: in there I use for example p&egrave;re and it shows père correctly. So only the text from the db gives problems.

I just noticed that if I add utf8_encode(string), the accents are shown correctly too.
But would it be best practice to add this function to every string that needs it?
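
For context, utf8_encode() converts ISO-8859-1 (latin1) bytes to UTF-8, so the fact that it helps suggests the strings arrive from the database as latin1. The sketch below illustrates that under the assumption that this is what happens. The sample bytes are made up, and the mbstring extension is assumed to be available:

```php
<?php
// Assumption: the database connection hands back ISO-8859-1 (latin1) bytes.
// "p\xE8re" is "père" in latin1: the è is the single byte 0xE8.
$fromDb = "p\xE8re";

// A 0xE8 byte followed by plain ASCII is not a valid UTF-8 sequence,
// so a page declared as UTF-8 cannot render it.
$isValidUtf8 = mb_check_encoding($fromDb, 'UTF-8');

// This is the same conversion utf8_encode() performs. mb_convert_encoding()
// is used here because utf8_encode() is deprecated as of PHP 8.2.
$converted = mb_convert_encoding($fromDb, 'UTF-8', 'ISO-8859-1');

var_dump($isValidUtf8);                 // bool(false)
var_dump($converted === "p\xC3\xA8re"); // bool(true)
```

Converting every string by hand treats the symptom. If the connection really is delivering latin1, it is usually cleaner to fix the connection charset once.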

That sounds like a file encoding issue. Make sure:

  • Your editor on your dev machine is saving in UTF-8
  • Your FTP client is sending files in UTF-8. I found that FileZilla had to be set to “force” UTF-8 mode in preferences, since the 000webhost FTP server does not correctly declare the charsets it will accept.

Following your instant message, it is ideal to keep the assistance public:

  • “Your editor” means a program that you use to edit PHP scripts
  • “dev machine” is short for “development machine”, i.e. your PC or laptop that you use for development

You said that you created the site on 000webhost. You’ve not said how, but I assume you mean a dynamic web builder system of some kind. If you downloaded the HTML files using FTP (to your computer) and then uploaded to the server (from your computer) then I suspect this is where the problem lies - the charset has become corrupted in transit.

That leads me to the suggestion that you need to change your charset in your FTP program. Let us know what FTP program you are using, and someone will help from there. Are you using FileZilla?

I used an old version of leechftp (on the pc of a friend), didn’t think it would matter.
Should I download the php-files again with Filezilla? And upload them again?

Yep, delete your files from the webspace, and upload them again. Make sure you back them up first, of course.

LeechFTP might be OK, but if you are in doubt, Filezilla will work - I used it the other day. Don’t forget to switch the charset to UTF-8 in Filezilla preferences. I won’t explain how to do this step-by-step, since the instructions will be on the web, and are merely a search-engine search away. Good luck!

What I did:

  1. download every PHP file from 000webhost to my computer with FileZilla (with ‘force UTF-8’ enabled in the Charset tab)
  2. delete every PHP file from the webspace it has to be on
  3. upload every downloaded PHP file to the webspace with FileZilla (with ‘force UTF-8’ enabled in the Charset tab)

Still the same result as before… (I didn’t change a single thing in them)

Don’t know if it matters but I noticed the file size was slightly different every time: index.php for example
on webspace before deleting: 10593
downloaded from 000webhost: 10804
now on webspace: 10330

If I change the collation in the mysql-db, it still doesn’t show properly: not in a table with utf8_unicode_ci, or a table with latin1_general_ci.
Server connection collation is utf8_unicode_ci

@anon9480751

It’s weird. May I ask what you did after downloading those files? Did you open them in a text editor?

And I’m sure uploading isn’t the cause.

Do you generate that HTML content with MySQL queries or not?

Finally, could you share your website address too?

Good spot. So that means the files are being corrupted between one computer and another. It also means your database is not at fault, so don’t bother changing anything there.

The transfer could be changing the line endings (which should not cause this problem) or the charset (the likely cause).

It is also possible that the originals are not in UTF-8, in which case they will never show the characters correctly.
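
One way to check both possibilities is to look at the raw bytes of a file. The following is only a sketch: `inspectBytes` is a hypothetical helper, and the mbstring extension is assumed to be available. It counts the two kinds of line ending and reports whether the content is valid UTF-8. A size difference of roughly one byte per line between two copies would be consistent with line-ending conversion rather than a charset change:

```php
<?php
// Hypothetical helper: summarise line endings and UTF-8 validity of raw bytes.
function inspectBytes(string $bytes): array
{
    $crlf = substr_count($bytes, "\r\n");

    return [
        'size'       => strlen($bytes),                     // size in bytes
        'crlf'       => $crlf,                              // Windows-style line endings
        'lf_only'    => substr_count($bytes, "\n") - $crlf, // Unix-style line endings
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
    ];
}

// Usage with a real file (the path is a placeholder):
// print_r(inspectBytes(file_get_contents('index.php')));
```

Running it on the same file in each location shows whether only the line endings differ or whether the bytes of the accented characters changed as well.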

Next, I would suggest using your 000webhost File Manager to create a new HTML file in your web space, put some UTF-8 characters in there, and view the file on-screen using your browser. The editor in 000webhost uses UTF-8 automatically.

If that works, then try copying the file using FileZilla, first to your computer, and then to a new location in your web space. Does it still work? If that does work, then it suggests the files you have already moved are at fault.

@Supun

After downloading those files from 000webhost I did not even open them: download, upload, done.

The content that is giving trouble, I get from mysql-db with SELECT mysql-queries in my php-files indeed.

Site is:
http://testgeortesting.000webhostapp.com/

And so you can find the address of the actual webspace on which the characters are not showing properly.
(I’d rather not mention the actual URL here because I don’t want it to appear in search engine results afterwards.)

To be absolutely clear, the way the php-files were created in the first place was mostly:

  • create the basic webdesign with Visual Studio Code
  • copy the text to a newly created file on the standard File Manager of 000webhost
  • adjust some things like mysql-queries
  • other pages were after that mostly created by:
    • either copying a file (and renaming it) in the standard 000webhost file manager
    • or copying and pasting the text of another file into a newly created file (again in the standard 000webhost file manager)

@halfer

So I understand correctly: I create a random html/php-file in the standard 000webhost file manager (with utf8 not coming from the database) and check it on http://testgeortesting.000webhostapp.com/ ?
(Or do you mean I can somehow use the standard 000webhost file manager to create files directly on my actual webspace (so not 000webhost)?)

It seems MySQL is the source of the error.

There are two things you can try.

  1. Change the collation of your table to a utf8mb4 one.
  2. If that didn’t work, set the charset of the connection to utf8mb4 after connecting to MySQL with PHP:

OOP: $mysqli->set_charset('utf8mb4');
PROCEDURAL: mysqli_set_charset($mysqli, 'utf8mb4');

Note: $mysqli is the connection variable.

This call must come after connecting to MySQL and before running the query.
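
To make the ordering concrete, here is a minimal sketch using mysqli. The function name `openUtf8Connection`, the credentials, and the table and column names in the usage comments are all placeholders, not taken from the actual site:

```php
<?php
// Hypothetical helper: open a connection and set its charset before any query.
function openUtf8Connection(string $host, string $user, string $password, string $db): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $password, $db);

    // Must run after connecting and before the first query.
    $mysqli->set_charset('utf8mb4');

    return $mysqli;
}

// Usage (placeholder values):
// $mysqli = openUtf8Connection('localhost', 'db_user', 'db_password', 'db_name');
// echo $mysqli->character_set_name();   // charset the connection now uses
// $result = $mysqli->query('SELECT title FROM pages');
```

Printing character_set_name() before and after the set_charset() call is a quick way to see what the connection was using by default on each host.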


Any utf8mb4? utf8mb4_unicode_ci didn’t work for example.

mysqli_set_charset worked!
I tried it with setting it to utf8mb4 and to utf8, both work. Is there a reason why I should use the one and not the other?


The charset of the MySQL connection is determined by the client library (libmysql). As far as I know, it is latin1 by default. latin1 only supports Western European letters, digits and a little more, and it encodes each accented letter as a single byte that is not valid UTF-8, which is why a page declared as UTF-8 shows question marks. To get other characters through correctly, set the connection charset to an appropriate one.

Once, I had a problem storing and retrieving emoji in the database, and I found the utf8mb4 solution. However, you only need utf8mb4 if you store characters that take four bytes in UTF-8, such as emoji. I recommend using utf8 if you only use the characters of your language (because utf8mb4 is heavier).
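
The difference comes down to bytes per character: MySQL’s utf8 is an alias for utf8mb3, which stores at most three bytes per character, while utf8mb4 allows four. A small sketch (assuming PHP 7.0 or later for the `\u{...}` escapes) shows how many bytes a few sample characters need in UTF-8:

```php
<?php
// strlen() counts bytes, not characters, so it reveals the UTF-8 width.
$samples = [
    'e'       => 'e',         // plain ASCII letter
    'e-grave' => "\u{E8}",    // è, as used in French
    'euro'    => "\u{20AC}",  // the euro sign
    'emoji'   => "\u{1F600}", // grinning face emoji
];

foreach ($samples as $name => $char) {
    printf("%-8s %d byte(s)\n", $name, strlen($char));
}
// e        1 byte(s)
// e-grave  2 byte(s)
// euro     3 byte(s)
// emoji    4 byte(s)
```

French accented letters fit in either charset. Note that the MySQL documentation marks utf8mb3 as deprecated, which is an argument for choosing utf8mb4 even when no emoji are stored.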

If you are thinking of a smarter solution to make every connection with utf8 you can extend the mysqli class to do that.

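class myDB extends mysqli {
    // Same parameter order as mysqli::__construct(): host, username, password, database, port, socket
    public function __construct($host = null, $username = null, $password = null, $dbname = null, $port = null, $socket = null) {
        parent::__construct($host, $username, $password, $dbname, $port, $socket);
        // Set the connection charset once, for every instance
        $this->set_charset("utf8");
    }
}

Then in your script,

$mysqli = new myDB(HOST, USER, PASSWORD, DB);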

So you don’t need to call set_charset("utf8") every time.


Great! Found a solution, and a bit of an explanation, always nice! Thanks for the effort