Mojibake is often seen with text data that has been tagged with the wrong encoding; the data may not be tagged at all, but simply moved between computers with different default encodings. Because mojibake arises from a mismatch between the encoded data and its declared encoding, it can be produced either by manipulating the data itself or merely by relabeling it. To correctly reproduce the original text, the correspondence between the encoded data and the notion of its encoding must be preserved.
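The relabeling case can be illustrated with a minimal Python sketch. It assumes UTF-8 as the real encoding and Latin-1 (ISO 8859-1) as the wrong label; the sample string and variable names are illustrative only.

```python
# The bytes never change; only the encoding label applied to them does.
text = "café"
data = text.encode("utf-8")          # the data as stored or transmitted

# Relabeling: the same bytes are read as Latin-1 instead of UTF-8.
mislabeled = data.decode("latin-1")

# Because Latin-1 maps every byte to exactly one character, no information
# was lost, and restoring the correct label recovers the original text.
restored = mislabeled.encode("latin-1").decode("utf-8")

print(data)        # b'caf\xc3\xa9'
print(mislabeled)  # cafÃ©
print(restored)    # café
```

The round trip works here only because the wrong encoding happened to accept every byte. When the wrong decoder rejects or replaces bytes, the original text may not be recoverable from the garbled result.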
The word is composed of 文字 (moji), "character", and 化け (bake), "transform". Mojibake means "character transformation" in Japanese.
Mojibake (Japanese: 文字化け) is the garbled text that results from text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system. This display may include the generic replacement character ("�") in places where the binary representation is considered invalid. A replacement can also involve multiple consecutive symbols, as viewed in one encoding, when the same binary code constitutes one symbol in the other encoding. This happens either because the two encodings use different fixed lengths (as in Asian 16-bit encodings versus European 8-bit encodings) or because one of them is a variable-length encoding (notably UTF-8 and UTF-16).

Failed rendering of glyphs, due to either missing fonts or missing glyphs in a font, is a different issue that is not to be confused with mojibake. Symptoms of this failed rendering include blocks with the code point displayed in hexadecimal, or the generic replacement character. Importantly, these replacements are valid and are the result of correct error handling by the software.
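Two of the decoding symptoms described above, one symbol turning into several and invalid bytes turning into the replacement character, can be reproduced with a short Python sketch. It assumes UTF-8 and Latin-1 as the mismatched pair; the sample characters and variable names are illustrative only.

```python
# Symptom 1: one symbol becomes several.
# "é" occupies two bytes in UTF-8, and Latin-1 reads each byte as its own character.
one_symbol = "é"
as_latin1 = one_symbol.encode("utf-8").decode("latin-1")
print(len(one_symbol), len(as_latin1))  # 1 2

# Symptom 2: the generic replacement character.
# In Latin-1, "é" is the single byte 0xE9, which is not a complete UTF-8 sequence.
latin1_bytes = "café".encode("latin-1")
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced == "caf\ufffd")  # True: the invalid byte became U+FFFD

# A strict decoder reports the invalid byte instead of substituting for it.
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # True
```

Whether a program substitutes U+FFFD or raises an error is a choice of error handling. Both are correct responses to bytes that are invalid in the assumed encoding.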