Closed
9 changes: 6 additions & 3 deletions Doc/glossary.rst
@@ -705,12 +705,15 @@ Glossary
This issue can be solved with locks or by using the EAFP approach.

locale encoding
On Unix, it is the encoding of the LC_CTYPE locale. It can be set with
``locale.setlocale(locale.LC_CTYPE, new_locale)``.
On Unix, it is the encoding of the :const:`LC_CTYPE <locale.LC_CTYPE>`
locale. It can be set with ``locale.setlocale(locale.LC_CTYPE,
new_locale)``.

On Windows, it is the ANSI code page (ex: ``cp1252``).

``locale.getpreferredencoding(False)`` can be used to get the locale
Use :func:`locale.getpreferredencoding(False)
<locale.getpreferredencoding>` to get the locale encoding and
:func:`locale.get_current_locale_encoding` to get the *current* locale
encoding.

Python uses the :term:`filesystem encoding and error handler` to convert
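The glossary entry above describes the locale encoding in terms of ``locale.setlocale(locale.LC_CTYPE, new_locale)`` and ``locale.getpreferredencoding(False)``. A minimal sketch using only those existing functions (not the ``get_current_locale_encoding()`` function proposed in this change); ``locale_encoding_for()`` is a hypothetical helper name:

```python
import locale


def locale_encoding_for(ctype_locale: str) -> str:
    """Hypothetical helper: temporarily switch LC_CTYPE and report its encoding."""
    # Calling setlocale() with only a category queries the current value
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, ctype_locale)
        # False means "do not call setlocale() again": read the encoding of the
        # LC_CTYPE locale now in effect (the ANSI code page on Windows)
        return locale.getpreferredencoding(False)
    finally:
        # Restore the previous LC_CTYPE locale for the rest of the process
        locale.setlocale(locale.LC_CTYPE, previous)


print(locale_encoding_for("C"))
```

The printed name depends on the platform's C library. When the Python UTF-8 Mode is enabled, ``getpreferredencoding(False)`` returns ``UTF-8`` regardless of the locale.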
21 changes: 21 additions & 0 deletions Doc/library/locale.rst
@@ -302,6 +302,24 @@ The :mod:`locale` module defines the following exception and functions:
determined.


.. function:: get_current_locale_encoding()

Get the current :term:`locale encoding`:

* On Windows, return the current ANSI code page (ex: ``"cp1252"``) for the
operating system.
* Return ``"UTF-8"`` if ``nl_langinfo(CODESET)`` returns an empty string.
* Otherwise, return the ``nl_langinfo(CODESET)`` result.

On Unix, the current locale encoding is the encoding of the
:const:`LC_CTYPE` locale.

Use :func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`
to get the locale encoding.

.. versionadded:: 3.10


.. function:: getlocale(category=LC_CTYPE)

Returns the current setting for the given locale category as a sequence containing
@@ -331,6 +349,9 @@ The :mod:`locale` module defines the following exception and functions:
The :ref:`Python preinitialization <c-preinit>` configures the LC_CTYPE
locale. See also the :term:`filesystem encoding and error handler`.

Use :func:`locale.get_current_locale_encoding` to get the *current* locale
encoding.

.. versionchanged:: 3.7
The function now always returns ``UTF-8`` on Android or if the
:ref:`Python UTF-8 Mode <utf8-mode>` is enabled.
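The rules documented for ``get_current_locale_encoding()`` can be approximated in pure Python with existing APIs. This is a sketch under assumptions, not the implementation in this change: ``current_locale_encoding()`` is a hypothetical name, and the Windows branch reads the ANSI code page through ``ctypes`` and the Win32 ``GetACP()`` function rather than through the C helper.

```python
import locale
import sys


def current_locale_encoding() -> str:
    """Hypothetical sketch of the documented rules (not the CPython code)."""
    if sys.platform == "win32":
        import ctypes

        # GetACP() returns the ANSI code page number of the operating system,
        # e.g. 1252, which Python spells as the codec name "cp1252"
        return f"cp{ctypes.windll.kernel32.GetACP()}"
    # On Unix, nl_langinfo(CODESET) names the codeset of the LC_CTYPE locale
    codeset = locale.nl_langinfo(locale.CODESET)
    # An empty result (possible on macOS with an invalid setting) means UTF-8
    return codeset or "UTF-8"


print(current_locale_encoding())
```

Unlike ``locale.getpreferredencoding(False)``, this sketch deliberately ignores the Python UTF-8 Mode, which is the distinction the *current* locale encoding is meant to capture.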
2 changes: 1 addition & 1 deletion Doc/library/sys.rst
@@ -614,7 +614,7 @@ always available.
.. function:: getdefaultencoding()

Return the name of the current default string encoding used by the Unicode
implementation.
implementation: ``"utf-8"``.


.. function:: getdlopenflags()
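The ``sys.getdefaultencoding()`` change documents a fixed value that does not depend on the locale encoding. A small sketch contrasting the two with standard functions; ``encoding_summary()`` is a hypothetical helper:

```python
import locale
import sys


def encoding_summary() -> dict:
    """Hypothetical helper contrasting the fixed default and the locale encoding."""
    return {
        # Fixed by the interpreter: "utf-8" in Python 3, whatever the locale
        "default": sys.getdefaultencoding(),
        # Follows the LC_CTYPE locale on Unix or the ANSI code page on Windows
        # (or "UTF-8" when the Python UTF-8 Mode is enabled)
        "locale": locale.getpreferredencoding(False),
    }


print(encoding_summary())
```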
7 changes: 7 additions & 0 deletions Doc/whatsnew/3.10.rst
@@ -728,6 +728,13 @@ linecache
When a module does not define ``__loader__``, fall back to ``__spec__.loader``.
(Contributed by Brett Cannon in :issue:`42133`.)

locale
------

Added :func:`locale.get_current_locale_encoding` to get the current
:term:`locale encoding`.
(Contributed by Victor Stinner in :issue:`43552`.)

os
--

1 change: 1 addition & 0 deletions Include/internal/pycore_fileutils.h
@@ -51,6 +51,7 @@ PyAPI_FUNC(int) _Py_GetLocaleconvNumeric(
PyAPI_FUNC(void) _Py_closerange(int first, int last);

PyAPI_FUNC(wchar_t*) _Py_GetLocaleEncoding(void);
PyAPI_FUNC(wchar_t*) _Py_GetCurrentLocaleEncoding(void);
PyAPI_FUNC(PyObject*) _Py_GetLocaleEncodingObject(void);

#ifdef __cplusplus
35 changes: 29 additions & 6 deletions Lib/locale.py
@@ -620,21 +620,44 @@ def resetlocale(category=LC_ALL):
_setlocale(category, _build_localename(getdefaultlocale()))


try:
from _locale import get_current_locale_encoding
except ImportError:
try:
from _locale import nl_langinfo, CODESET

# nl_langinfo(CODESET) implementation
def get_current_locale_encoding():
result = nl_langinfo(CODESET)
if not result:
# On macOS, nl_langinfo(CODESET) can return an empty string
# when the setting has an invalid value. Default to UTF-8 in
# that case because UTF-8 is the default charset on macOS and
# the caller expects a non-empty string.
result = 'UTF-8'
return result
except ImportError:
# getdefaultlocale() implementation.
# On Windows, _locale.getdefaultlocale()[1] is the ANSI code page.
def get_current_locale_encoding():
encoding = getdefaultlocale()[1]
if encoding is None:
# LANG not set, default conservatively to ASCII
encoding = 'ascii'
return encoding

try:
from _locale import _get_locale_encoding
except ImportError:
def _get_locale_encoding():
if hasattr(sys, 'getandroidapilevel'):
if hasattr(sys, 'getandroidapilevel') or sys.platform == 'vxworks':
# On Android langinfo.h and CODESET are missing, and UTF-8 is
# always used in mbstowcs() and wcstombs().
# Always use UTF-8 on VxWorks.
return 'UTF-8'
if sys.flags.utf8_mode:
return 'UTF-8'
encoding = getdefaultlocale()[1]
if encoding is None:
# LANG not set, default conservatively to ASCII
encoding = 'ascii'
return encoding
return get_current_locale_encoding()

try:
CODESET
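The fallback ``_get_locale_encoding()`` above layers two decisions: platforms and modes that force UTF-8 come first, and only then is the *current* locale encoding consulted. A minimal sketch of that layering; ``locale_encoding()`` and its ``current`` parameter are hypothetical, with ``current`` injected so the logic can be exercised without touching the real locale:

```python
import sys
from typing import Callable


def locale_encoding(current: Callable[[], str]) -> str:
    """Hypothetical sketch: forced UTF-8 cases first, then the current encoding."""
    # Android (no langinfo.h/CODESET) and VxWorks always use UTF-8
    if hasattr(sys, "getandroidapilevel") or sys.platform == "vxworks":
        return "UTF-8"
    # The Python UTF-8 Mode overrides whatever the locale says
    if sys.flags.utf8_mode:
        return "UTF-8"
    # Otherwise defer to the "current" locale encoding (injected for testing)
    return current()


print(locale_encoding(lambda: "ISO-8859-15"))
```

With this stub, the call prints ``ISO-8859-15`` on a regular build and ``UTF-8`` when the UTF-8 Mode is enabled (for example with ``python -X utf8``).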
6 changes: 6 additions & 0 deletions Lib/test/test_locale.py
@@ -536,6 +536,12 @@ def test_getpreferredencoding(self):
# If encoding non-empty, make sure it is valid
codecs.lookup(enc)

def test_get_current_locale_encoding(self):
encoding = locale.get_current_locale_encoding()
self.assertIsInstance(encoding, str)
self.assertGreater(len(encoding), 0, encoding)
codecs.lookup(encoding)

def test_strcoll_3303(self):
# test crasher from bug #3303
self.assertRaises(TypeError, locale.strcoll, "a", None)
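The new test asserts that the result is a non-empty ``str`` and that ``codecs.lookup()`` accepts it. A self-contained sketch applying the same checks to the existing ``locale.getpreferredencoding(False)``, so it runs on interpreters that lack the proposed function; the class and method names are hypothetical:

```python
import codecs
import locale
import unittest


class LocaleEncodingSketchTest(unittest.TestCase):
    def test_preferred_encoding_is_known_codec(self):
        encoding = locale.getpreferredencoding(False)
        self.assertIsInstance(encoding, str)
        # The encoding name doubles as the failure message for easier debugging
        self.assertGreater(len(encoding), 0, encoding)
        # Raises LookupError if Python has no codec registered under this name
        codecs.lookup(encoding)
```

Run it with ``python -m unittest`` against the file that contains it.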
@@ -0,0 +1,2 @@
Added :func:`locale.get_current_locale_encoding` to get the current
:term:`locale encoding`. Patch by Victor Stinner.
37 changes: 35 additions & 2 deletions Modules/_localemodule.c
@@ -772,15 +772,47 @@ _locale_bind_textdomain_codeset_impl(PyObject *module, const char *domain,
#endif // HAVE_LIBINTL_H


/*[clinic input]
_locale.get_current_locale_encoding

Get the current locale encoding:

* On Windows, return the current ANSI code page (ex: ``"cp1252"``)
for the operating system.
* Return "UTF-8" if nl_langinfo(CODESET) returns an empty string.
* Otherwise, return nl_langinfo(CODESET) result.
[clinic start generated code]*/

static PyObject *
_locale_get_current_locale_encoding_impl(PyObject *module)
/*[clinic end generated code: output=fce82957b117a446 input=2c7a800e7cf93287]*/
{
wchar_t *encoding = _Py_GetCurrentLocaleEncoding();
if (encoding == NULL) {
PyErr_NoMemory();
return NULL;
}

PyObject *str = PyUnicode_FromWideChar(encoding, -1);
PyMem_RawFree(encoding);
return str;
}


/*[clinic input]
_locale._get_locale_encoding

Get the current locale encoding.
Get the locale encoding:

* "UTF-8" on Android and VxWorks;
* "UTF-8" if the Python UTF-8 Mode is enabled;
* ANSI code page on Windows;
* nl_langinfo(CODESET) otherwise.
[clinic start generated code]*/

static PyObject *
_locale__get_locale_encoding_impl(PyObject *module)
/*[clinic end generated code: output=e8e2f6f6f184591a input=513d9961d2f45c76]*/
/*[clinic end generated code: output=e8e2f6f6f184591a input=4d3ed54cd5278cf2]*/
{
return _Py_GetLocaleEncodingObject();
}
@@ -812,6 +844,7 @@ static struct PyMethodDef PyLocale_Methods[] = {
#endif
#endif
_LOCALE__GET_LOCALE_ENCODING_METHODDEF
_LOCALE_GET_CURRENT_LOCALE_ENCODING_METHODDEF
{NULL, NULL}
};

32 changes: 30 additions & 2 deletions Modules/clinic/_localemodule.c.h

53 changes: 35 additions & 18 deletions Python/fileutils.c
@@ -857,30 +857,19 @@ _Py_EncodeLocaleEx(const wchar_t *text, char **str,
}


// Get the current locale encoding name:
// Get the current locale encoding:
//
// - Return "UTF-8" if _Py_FORCE_UTF8_LOCALE macro is defined (ex: on Android)
// - Return "UTF-8" if the UTF-8 Mode is enabled
// - On Windows, return the ANSI code page (ex: "cp1250")
// - Return "UTF-8" if nl_langinfo(CODESET) returns an empty string.
// - Otherwise, return nl_langinfo(CODESET).
// * On Windows, return the current ANSI code page (ex: ``"cp1252"``)
// for the operating system.
// * Return "UTF-8" if nl_langinfo(CODESET) returns an empty string.
// * Otherwise, return the nl_langinfo(CODESET) result.
//
// Return NULL on memory allocation failure.
//
// See also config_get_locale_encoding()
// Result must be freed by PyMem_RawFree().
wchar_t*
_Py_GetLocaleEncoding(void)
_Py_GetCurrentLocaleEncoding(void)
{
#ifdef _Py_FORCE_UTF8_LOCALE
// On Android langinfo.h and CODESET are missing,
// and UTF-8 is always used in mbstowcs() and wcstombs().
return _PyMem_RawWcsdup(L"UTF-8");
#else
const PyPreConfig *preconfig = &_PyRuntime.preconfig;
if (preconfig->utf8_mode) {
return _PyMem_RawWcsdup(L"UTF-8");
}

#ifdef MS_WINDOWS
wchar_t encoding[23];
unsigned int ansi_codepage = GetACP();
@@ -904,6 +893,34 @@ _Py_GetLocaleEncoding(void)
return wstr;
#endif // !MS_WINDOWS

}


// Get the locale encoding:
//
// - Return "UTF-8" if _Py_FORCE_UTF8_LOCALE macro is defined (ex: on Android)
// - Return "UTF-8" if the UTF-8 Mode is enabled
// - Return _Py_GetCurrentLocaleEncoding() otherwise.
//
// Return NULL on memory allocation failure.
//
// Result must be freed by PyMem_RawFree().
//
// See also config_get_locale_encoding()
wchar_t*
_Py_GetLocaleEncoding(void)
{
#ifdef _Py_FORCE_UTF8_LOCALE
// On Android langinfo.h and CODESET are missing,
// and UTF-8 is always used in mbstowcs() and wcstombs().
return _PyMem_RawWcsdup(L"UTF-8");
#else
const PyPreConfig *preconfig = &_PyRuntime.preconfig;
if (preconfig->utf8_mode) {
return _PyMem_RawWcsdup(L"UTF-8");
}

return _Py_GetCurrentLocaleEncoding();
#endif // !_Py_FORCE_UTF8_LOCALE
}
