2021-07-18

24: 'std::codecvt<char16_t,char,struct _Mbstatet>' Unresolved Error

<The previous article in this series | The table of contents of this series | The next article in this series>

A bug that has existed in Visual C++ since 2015, which Microsoft had refused to fix (until 2019?) for an unconvincing reason, and how to end-run the problem.

Topics


About: C++
About: Visual C++

The table of contents of this article


Starting Context


  • The reader has a basic knowledge of C++.

Target Context


  • The reader will know that the 'std::codecvt<char16_t,char,struct _Mbstatet>' unresolved error is caused by a bug that has existed since 2015, which Microsoft had refused to fix (until 2019?), and how to end-run the problem.

Orientation


There are some more articles on Visual C++ peculiarities (an "unable to match function definition to an existing declaration" error, exporting the symbols from a DLL, explicitly instantiating constructor templates).


Main Body

Stage Direction
Hypothesizer 7 soliloquies.


1: Encountering a 'std::codecvt<char16_t,char,struct _Mbstatet>' Unresolved Error


Hypothesizer 7
This code causes a link error, "error LNK2001: unresolved external symbol "__declspec(dllimport) public: static class std::locale::id std::codecvt<char16_t,char,struct _Mbstatet>::id"", in Visual C++ 2017, while it builds fine with GCC.

@C++ Source Code
#include <codecvt>
#include <locale>

				wstring_convert <codecvt_utf8_utf16 <char16_t>, char16_t> l_wstringConverter;


2: The Cause Is an Old Bug That Has Existed Since 2015, Which Microsoft Had Refused to Fix (Until 2019?)


Hypothesizer 7
It has turned out to be an old bug in Visual C++, one that has existed since Visual C++ 2015 . . .

I say that it is too old; I understand that some bugs can sneak into a product, but any bug should be fixed as soon as possible.

I do not agree that the reason cited in the page, preserving binary compatibility, justifies not fixing the bug. While preserving binary compatibility is desirable in itself, is continuing to break compatibility with the C++ standard library fine? . . . If there were a bug that entailed a security vulnerability, would they preserve the bug in order to preserve binary compatibility with a buggy old version? . . .

A bad choice, I say.

As a note, the bug seems to have been fixed at some point (in 2019? I do not know exactly when), but I keep this article (originally written before the fix, although polished and republished later) as a record, partly because someone may still be using an old version.


3: The Suggested Remedy, Which Does Not Solve the Problem


Hypothesizer 7
They seem to suggest just using 'wchar_t' instead of 'char16_t' as a remedy.

Does that solve the problem? . . . No. I want a function that converts any UTF-8 string to a 'u16string', like this.

@C++ Source Code
#include <codecvt>
#include <locale>

			u16string getUtf16String (string const & a_utf8String) {
				wstring_convert <codecvt_utf8_utf16 <char16_t>, char16_t> l_wstringConverter;
				return l_wstringConverter.from_bytes (a_utf8String.data ());
			}

While changing 'char16_t' to 'wchar_t' changes the return type of 'from_bytes' to 'wstring', I do not want a 'wstring' instance, but a 'u16string' instance . . . Note that a 'wstring' instance cannot just be cast to the 'u16string' type.


4: An End Run


Hypothesizer 7
After all, I have to create a 'u16string' instance from the 'wstring' instance, like this.

@C++ Source Code
				wstring l_wstring = l_wstringConverter.from_bytes (a_utf8String.data ());
				return u16string (l_wstring.begin (), l_wstring.end ());

As that code does not work on Linux (where 'wchar_t' differs in length from 'char16_t': 4 bytes instead of 2), I have to write code like this (supposing that building on Linux means using GCC; note that GCC predefines the '__GNUC__' macro, not any 'GCC' one).

@C++ Source Code
			u16string getUtf16String (string const & a_utf8String) {
#ifdef __GNUC__
				wstring_convert <codecvt_utf8_utf16 <char16_t>, char16_t> l_wstringConverter;
				return l_wstringConverter.from_bytes (a_utf8String.data ());
#else
				wstring_convert <codecvt_utf8_utf16 <wchar_t>, wchar_t> l_wstringConverter;
				wstring l_wstring = l_wstringConverter.from_bytes (a_utf8String.data ());
				return u16string (l_wstring.begin (), l_wstring.end ());
#endif
			}

I am not happy that I have to do such a thing, though.



<The previous article in this series | The table of contents of this series | The next article in this series>