- Mac Text Editor For Python
- Mac Text Editor For Coding
- Mac Text Editor For Encoding Chinese Characters
- Mac Text Editor For Html
- Download Text Editor For Mac
With your subtitle file open in the correct character encoding, now go to the menu File → Save as. And change the character encoding option (again, at the bottom of the window) to UTF-8 and save the file (possibly with a new name, for safety). It's Windows Latin 1. I pasted the Chinese text as UTF-8 into BBEDIT (a text editor for Mac) and re-opened the file as Windows Latin 1 and bang, the exact diacritics appeared.
I've tried googling around but wasn't able to find what charset that this text below belongs to:
具有éœé›»ç”¢ç”Ÿè£ç½®ä¹‹å½±åƒè¼¸å…¥è£ç½®
But putting <meta http-equiv='Content-Type'>
and keeping that string into an HTML file, I was able to view the Chinese characters properly:
具有靜電產生裝置之影像輸入裝置
So my question is:
What tools can I use to detect the character set of this text?
And how do I convert/encode/decode them properly in C#?
Updates:For completion sake, i've updated this test.
Thanks.
5 Answers
What is happening when you save the 'bad' string in a text file with a meta tag declaring the correct encoding is that your text editor is saving the file with Windows-1252 encoding, but the browser is reading the file and interpreting it as UTF-8. Since the 'bad' string is incorrectly decoded UTF-8 bytes with the Windows-1252 encoding, you are reversing the process by encoding the file as Windows-1252 and decoding as UTF-8.
Here's an example:
Even with correct decoding, you'll still need a font that supports the characters being displayed. If your default font doesn't support Chinese, you still might not see the correct characters.
The correct thing to do is figure out why the string you have was decoded as Windows-1252 in the first place. Sometimes, though, data in a database is stored incorrectly to begin with and you have to resort to these games to fix the problem.
Mark TolonenMac Text Editor For Python
Mark TolonenI'm not really sure what you mean, but I'm guessing you want to convert between a string in a certain encoding in byte array form and a string. Let's assume the character encoding is called 'FooBar':
This is how you encode and decode:
You can learn more about the Encoding class over at MSDN.
lesderidlesderidMac Text Editor For Coding
Answering your question at the end of your post:
If you want to determine the text encoding on runtime you should look at that: http://code.google.com/p/ude/
for converting character sets you can use http://msdn.microsoft.com/en-us/library/system.text.encoding.convert(v=vs.100).aspx
It's Windows Latin 1. I pasted the Chinese text as UTF-8 into BBEDIT (a text editor for Mac) and re-opened the file as Windows Latin 1 and bang, the exact diacritics appeared.
dda