MultiByteToWideChar and AToW appear to return incorrect results

Peter Fay

2005-12-13 08:58:19 UTC

MultiByteToWideChar appears to give incorrect results under WinXP for the
Cyrillic code page (866).
This is the same whether I use Michael Kaplan's AToW function or pass byte
arrays directly to MultiByteToWideChar.

The snippet below uses unadulterated codepage.bas by Michael Kaplan.

Any pointers to where I am going astray?

Peter J Fay

##########################################################
Dim T As String
Dim i As Integer
Dim msg As String

'Cyrillic code page = 866
'Upper case Cyrillic Be in this character set = 193
'Lower case Cyrillic Be in this character set = 225

For i = 192 To 255
    'Convert the single byte value i using code page 866
    T = AToW(Chr(i), 866)
    Debug.Print i & vbTab & Hex(AscW(T))
Next

'This returns 2534 (hexadecimal) for upper case Cyrillic Be in this character set (193)
'The correct Unicode for this character is 411 (hexadecimal)
'This returns 441 (hexadecimal) for lower case Cyrillic Be in this character set (225)
'The correct Unicode for this character is 431 (hexadecimal)


Michael (michka) Kaplan [MS]

2005-12-13 12:52:18 UTC

Actually, if you look at:

http://www.microsoft.com/globaldev/reference/oem/866.mspx

the right code points are the graphic characters the function returns, not
the ones you are suggesting. What is the source you were using that assumed
they would be Cyrillic script letters?
--
MichKa [Microsoft]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure, Fonts, and Tools
Blog: http://blogs.msdn.com/michkap

This posting is provided "AS IS" with
no warranties, and confers no rights.
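
For readers without VB at hand, the lookup in the original loop can be sketched in Python. This assumes Python's built-in cp866 codec matches the table MultiByteToWideChar uses for code page 866; the helper name cp866_hex is made up for illustration and is not the AToW function from codepage.bas.

```python
# Decode single bytes as OEM code page 866, mirroring the loop in the
# original post. Assumption: Python's "cp866" codec matches the Windows table.
def cp866_hex(byte_value: int) -> str:
    """Return the Unicode code point (upper-case hex) for one CP866 byte."""
    return format(ord(bytes([byte_value]).decode("cp866")), "X")


for i in (129, 193, 225):
    print(i, cp866_hex(i))
```

Under that assumption, byte 193 is a box-drawing character in code page 866, and the Cyrillic capital Be sits at byte 129 instead.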

Peter Fay

2005-12-13 13:53:45 UTC

Thanks Michael,

I am now aware I have been confusing codepages and character sets.
(...back to your book!)
When the character set is Cyrillic and you key in Alt+0193, you get a
capital BE [or if you run Chr(193)].

Peter

Peter Fay

2005-12-13 17:56:16 UTC

Michael,

I am trying to do 3 tasks.

1. Given a character displayed in a VB Flex grid (with charset set
appropriately), I want to "read" this character and display both its Unicode
value and its Asc() value.
I can read its Asc value no problem, but passing single-byte characters to
MultiByteToWideChar leads to problems.
This appears to be because Asc numbering within a charset is not identical
to code page numbering for values 128 to 255.
E.g. the Russian upper case BE (like a small English b with a horizontal
bar on top) is Chr(193) in the Cyrillic charset, but code point &H81 = 129
in code page 866.
However, MultiByteToWideChar does work well in this way for multibyte
characters and for characters 0 to 127 in a single-byte set.
Is there any way of simply converting from charset numbering to code page
numbering?

2. I want to read from a Unicode text file, convert to ANSI using a given
charset and write to a VB text box, again with the charset property set.
Reading the correct bytes from the file into a byte array is no problem.
Converting and writing using WideCharToMultiByte is no problem for Japanese.
However, converting and writing Cyrillic text results in inappropriate
characters.

3. I want to read Unicode text from the clipboard, convert to ANSI using a
given charset and write to a VB text box, again with the charset property set.
Reading the correct bytes from the clipboard into a byte array is no problem
using API functions.
Converting and writing using WideCharToMultiByte is no problem for Japanese.
Again, trying to convert and write Cyrillic text results in inappropriate
characters.

I suspect all these difficulties are due to differences between code page
numbering and charset numbering.
Having re-read the relevant chapters in your book I am no wiser.

Am I on the right track?
If so, how do I convert from charset to code page and vice versa?

Any pointers in the right direction would be welcome.

Regards

Peter
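
The two numberings in task 1 (193 versus &H81 = 129) can be related by going through Unicode. A minimal Python sketch, assuming the "Cyrillic charset" numbering is Windows-1251 and using Python's cp1251 and cp866 codecs as stand-ins for the Windows tables; recode_byte is a hypothetical helper name:

```python
def recode_byte(value: int, source: str, target: str) -> int:
    """Map one single-byte code from `source` to `target` via Unicode."""
    return bytes([value]).decode(source).encode(target)[0]


# Capital Be: 193 in Windows-1251, 129 (&H81) in OEM 866.
print(recode_byte(193, "cp1251", "cp866"))
print(recode_byte(129, "cp866", "cp1251"))
```

The mapping is not total: a byte whose character has no counterpart in the target code page (for instance a CP866 box-drawing character going to 1251) raises UnicodeEncodeError in this sketch.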

Peter Fay

2005-12-13 19:07:55 UTC

I think I have the solution:
Setting the charset property to RUSSIAN_CHARSET = 204 actually results in
code page 1251 being used (not 866).

Regards

Peter
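
The charset-to-code-page relationship found here can be written down as a small lookup table. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the values are the commonly documented GDI charset identifiers and their Windows code pages. On Windows itself, the TranslateCharsetInfo API is the usual way to obtain this mapping.

```python
# GDI charset identifiers mapped to Windows (ANSI) code pages.
# Names are illustrative; this table is a sketch, not an exhaustive list.
CHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI_CHARSET
    128: 932,   # SHIFTJIS_CHARSET
    129: 949,   # HANGUL_CHARSET
    134: 936,   # GB2312_CHARSET
    136: 950,   # CHINESEBIG5_CHARSET
    161: 1253,  # GREEK_CHARSET
    162: 1254,  # TURKISH_CHARSET
    177: 1255,  # HEBREW_CHARSET
    178: 1256,  # ARABIC_CHARSET
    186: 1257,  # BALTIC_CHARSET
    204: 1251,  # RUSSIAN_CHARSET
    222: 874,   # THAI_CHARSET
    238: 1250,  # EASTEUROPE_CHARSET
}


def codepage_for_charset(charset: int) -> int:
    """Return the Windows code page for a GDI charset id (KeyError if unknown)."""
    return CHARSET_TO_CODEPAGE[charset]


print(codepage_for_charset(204))
```

Note that none of these charsets maps to an OEM code page such as 866.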

Michael (michka) Kaplan [MS]

2005-12-13 21:53:01 UTC

Inline...

Michael (michka) Kaplan [MS]

2005-12-13 21:57:53 UTC

Inline....

Post by Peter Fay
Setting charset property to RUSSIAN_CHARSET = 204 actually results in
codepage 1251 being used (not 866)

Yes, the Windows code page is used, not the OEM one.

Post by Peter Fay
1. Given a character displayed in a VB Flex grid (with charset set
appropriately), I want to "read" this character and display both its
Unicode value and its Asc() value.
I can read its Asc value no problem, but passing single-byte characters to
MultiByteToWideChar leads to problems.
This appears to be because Asc numbering within a charset is not
identical to code page numbering for values 128 to 255.
E.g. the Russian upper case BE (like a small English b with a
horizontal bar on top) is Chr(193) in the Cyrillic charset, but code point
&H81 = 129 in code page 866.
However, MultiByteToWideChar does work well in this way for multibyte
characters and for characters 0 to 127 in a single-byte set.
Is there any way of simply converting from charset numbering to code page
numbering?

Remember that the strings in VB are already Unicode -- so why convert them?
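
The point can be illustrated outside VB: once a character is held as Unicode, its code point is available directly, and a byte value only appears when a specific code page is chosen. A hedged Python sketch, where ord() plays roughly the role of AscW and the encode calls stand in for what Asc would report under a given code page (variable names are illustrative):

```python
ch = "\u0411"  # CYRILLIC CAPITAL LETTER BE

unicode_value = ord(ch)              # code point, no code page involved
ansi_value = ch.encode("cp1251")[0]  # byte value under Windows-1251
oem_value = ch.encode("cp866")[0]    # byte value under OEM 866

print(hex(unicode_value), ansi_value, oem_value)
```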

Post by Peter Fay
2. I want to read from a Unicode text file, convert to ANSI using a given
charset and write to a VB text box, again with the charset property set.
Reading the correct bytes from the file into a byte array is no problem.
Converting and writing using WideCharToMultiByte is no problem for Japanese.
However, converting and writing Cyrillic text results in inappropriate
characters.

No idea without more info -- probably the wrong code page?

Post by Peter Fay
3. I want to read Unicode text from the clipboard, convert to ANSI using a
given charset and write to a VB text box, again with the charset property set.
Reading the correct bytes from the clipboard into a byte array is no problem
using API functions.
Converting and writing using WideCharToMultiByte is no problem for Japanese.
Again, trying to convert and write Cyrillic text results in inappropriate
characters.

You could also leave it Unicode. :-)

If you have to convert then you must use the right code page.
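
What "the right code page" means in practice can be sketched in Python, with the cp1251 and cp866 codecs standing in for WideCharToMultiByte called with 1251 and 866. The sample word and variable names are assumptions for illustration; the UTF-16 bytes stand in for the contents of a Unicode text file or clipboard buffer.

```python
text = "\u041f\u0440\u0438\u0432\u0435\u0442"  # the Russian word "Privet"

utf16_bytes = text.encode("utf-16-le")   # stand-in for bytes read from a file
decoded = utf16_bytes.decode("utf-16-le")

ansi = decoded.encode("cp1251")  # bytes a control using RUSSIAN_CHARSET expects
oem = decoded.encode("cp866")    # bytes produced when 866 is used by mistake

# Interpreting the OEM bytes as Windows-1251 yields different characters.
wrong = oem.decode("cp1251")
print(ansi.hex(), oem.hex(), wrong == text)
```

The same Unicode text gives two different byte sequences, and only the 1251 bytes display correctly in a control that assumes the Windows code page.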

Post by Peter Fay
I suspect all these difficulties are due to differences between code page
numbering and charset numbering.
Having re-read the relevant chapters in your book I am no wiser.

Actually, it is that you are not using the right code page.

Post by Peter Fay
Am I on the right track?

Almost....

Post by Peter Fay
If so, how do I convert from charset to code page and vice versa?

There are functions in the book that will do this conversion, as well as
help you get the info for the locale.

Peter Fay

2005-12-13 22:12:48 UTC

Michael,

The initial error was using the LOCALE_IDEFAULTCODEPAGE constant rather than
LOCALE_IDEFAULTANSICODEPAGE when calling GetLocaleInfo.
Now that I've realised this, everything is falling into place. :-)

Thank you
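
The difference between the two constants can be sketched with ctypes. This is an untested-here sketch under stated assumptions: it only calls the API on Windows, it uses the Russian LCID &H419, and the helper name locale_code_page is hypothetical. LOCALE_IDEFAULTCODEPAGE asks for the OEM code page, LOCALE_IDEFAULTANSICODEPAGE for the Windows (ANSI) one.

```python
import ctypes
import sys

LOCALE_IDEFAULTCODEPAGE = 0x000B      # OEM code page for the locale
LOCALE_IDEFAULTANSICODEPAGE = 0x1004  # ANSI (Windows) code page for the locale
LCID_RUSSIAN = 0x0419                 # Russian (Russia)


def locale_code_page(lcid: int, lctype: int):
    """Return the code page number for a locale, or None off Windows/on failure."""
    if sys.platform != "win32":
        return None
    buf = ctypes.create_unicode_buffer(16)
    # GetLocaleInfoW returns the number of characters written, 0 on failure.
    written = ctypes.windll.kernel32.GetLocaleInfoW(lcid, lctype, buf, len(buf))
    return int(buf.value) if written else None


print(locale_code_page(LCID_RUSSIAN, LOCALE_IDEFAULTCODEPAGE))
print(locale_code_page(LCID_RUSSIAN, LOCALE_IDEFAULTANSICODEPAGE))
```

On a Windows machine the two calls would be expected to report 866 and 1251 respectively, which matches the mix-up described in this thread.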

v***@gmail.com

2013-07-05 16:11:20 UTC
