Unicode Support
===============

.. currentmodule:: click

Click has to take extra care to support Unicode text in different
environments.

* The command line in Unix is traditionally bytes, not Unicode. While
  there are encoding hints, they can be wrong or missing in some
  situations. The most common one is an SSH connection to a machine
  with a different locale.

  Misconfigured environments can cause a wide range of Unicode
  problems due to the lack of support for roundtripping surrogate
  escapes. This will not be fixed in Click itself!

* Standard input and output are opened in text mode by default. Click
  has to reopen the stream in binary mode in certain situations.
  Because there is no standard way to do this, it might not always
  work. This is primarily a problem when testing command-line
  applications.

  This is not supported, because ``io.StringIO`` has no underlying
  binary buffer::

      import io
      import sys
      sys.stdin = io.StringIO('Input here')
      sys.stdout = io.StringIO()

  Instead you need to do this::

      import io
      import sys

      # Back each text stream with a BytesIO so a binary layer exists.
      text = 'Input here'
      in_stream = io.BytesIO(text.encode('utf-8'))
      sys.stdin = io.TextIOWrapper(in_stream, encoding='utf-8')
      out_stream = io.BytesIO()
      sys.stdout = io.TextIOWrapper(out_stream, encoding='utf-8')

  Remember that in this case you need to call ``out_stream.getvalue()``,
  not ``sys.stdout.getvalue()``, to access the buffer contents, because
  the wrapper does not forward that method. Flush ``sys.stdout`` first
  so that buffered text reaches ``out_stream``.

* ``sys.stdin``, ``sys.stdout`` and ``sys.stderr`` are by default
  text-based. When Click needs a binary stream, it attempts to
  discover the underlying binary stream.

* ``sys.argv`` is always text. This means that the native type for
  input values to the types in Click is Unicode, not bytes.

  This causes problems if the terminal is incorrectly configured and
  Python cannot determine the encoding. In that case, the Unicode
  string will contain the undecodable bytes encoded as surrogate
  escapes.

* When dealing with files, Click will always use the Unicode file
  system API by using the operating system's reported or guessed
  filesystem encoding. Surrogates are supported for filenames, so it
  should be possible to open files through the :class:`File` type even
  if the environment is misconfigured.


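
The surrogate escapes mentioned in the list above can be reproduced with
the standard library alone. The following is a minimal sketch, not Click
code; the file name is a made-up example of Latin-1 bytes arriving in a
UTF-8 environment::

    # Hypothetical input: "é" encoded as Latin-1 (0xE9) is not valid
    # UTF-8 on its own.
    raw = b"caf\xe9.txt"

    # "surrogateescape" maps the undecodable byte 0xE9 to the lone
    # surrogate U+DCE9 instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="surrogateescape")

    # Encoding with the same error handler restores the original bytes.
    restored = text.encode("utf-8", errors="surrogateescape")

    # A strict encode, as used by a default text stream, fails.
    try:
        text.encode("utf-8")
        strict_ok = True
    except UnicodeEncodeError:
        strict_ok = False

Because a lone surrogate is not valid UTF-8, writing such a value to a
text stream with the default error handler raises
``UnicodeEncodeError``, even though the value can still be used to open
the file it names.
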
Surrogate Handling
------------------

Click relies on the standard library for all Unicode handling and is
subject to its behavior. Extra care is required because encoding
detection is done in the interpreter, and on Linux and certain other
operating systems that detection is problematic.

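
To see what the interpreter detected, the relevant values can be
queried from the standard library. This is a small sketch; the helper
name ``detected_encodings`` is hypothetical, while the calls it makes
are standard::

    import locale
    import sys


    def detected_encodings():
        """Return the encodings Python picked for this process."""
        return {
            # On POSIX, used to decode sys.argv and file names.
            "filesystem": sys.getfilesystemencoding(),
            # The locale's preferred encoding for text data.
            "preferred": locale.getpreferredencoding(False),
            # May be None if stdout was replaced by another object.
            "stdout": getattr(sys.stdout, "encoding", None),
        }


    if __name__ == "__main__":
        for name, value in detected_encodings().items():
            print(f"{name}: {value}")

On Linux, a process started without a locale typically reports
``ANSI_X3.4-1968``, an alias for ASCII, as the preferred encoding on
Python versions before 3.7.
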
The biggest source of frustration is that Click scripts invoked by init
systems, deployment tools, or cron jobs will refuse to work unless a
Unicode locale is exported.

If Click encounters such an environment, it will prevent further
execution to force you to set a locale. This is done because, once it
is invoked, Click cannot know the state of the system or restore the
values that were in place before Python's Unicode handling kicked in.

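
The kind of check described above can be approximated as follows. This
is an illustrative sketch rather than Click's actual code; the function
names ``is_ascii_encoding`` and ``require_unicode_locale`` and the
error text are hypothetical::

    import codecs
    import locale


    def is_ascii_encoding(name):
        """Return True if ``name`` is ASCII under any of its aliases."""
        try:
            return codecs.lookup(name).name == "ascii"
        except LookupError:
            # Unknown encoding names are not treated as ASCII.
            return False


    def require_unicode_locale():
        """Raise RuntimeError if the preferred encoding is ASCII."""
        if is_ascii_encoding(locale.getpreferredencoding(False)):
            raise RuntimeError(
                "ASCII locale detected; export a UTF-8 locale."
            )

Normalizing the name through ``codecs.lookup`` matters because the C
locale often reports ASCII under an alias such as ``ANSI_X3.4-1968``.
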
An error like the following means that Python thinks you are
restricted to ASCII data::

    Traceback (most recent call last):
    ...
    RuntimeError: Click will abort further execution because Python was
    configured to use ASCII as encoding for the environment. Consult
    https://click.palletsprojects.com/unicode-support/ for mitigation
    steps.

The solution depends on which locale your computer is running in.

For instance, if you have a German Linux machine, you can fix the
problem by exporting the locale to ``de_DE.utf-8``::

    export LC_ALL=de_DE.utf-8
    export LANG=de_DE.utf-8

If you are on a US machine, ``en_US.utf-8`` is the locale of choice.
On some newer Linux systems, you could also try ``C.UTF-8`` as the
locale::

    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8

On some systems it was reported that ``UTF-8`` has to be written as
``UTF8`` and vice versa. To see which locales are supported, you can
invoke ``locale -a``.

You need to export the values before you invoke your Python script.

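
When a script is launched from another Python process, such as a
deployment tool, the same variables can be set for the child before it
starts. This is a minimal sketch; the helper names ``with_locale`` and
``run_with_locale`` are hypothetical, and it assumes the chosen locale
is one of those listed by ``locale -a``::

    import os
    import subprocess


    def with_locale(env, locale_name):
        """Return a copy of ``env`` with LC_ALL and LANG set."""
        updated = dict(env)
        updated["LC_ALL"] = locale_name
        updated["LANG"] = locale_name
        return updated


    def run_with_locale(argv, locale_name="C.UTF-8"):
        """Run ``argv`` with the locale exported before it starts."""
        env = with_locale(os.environ, locale_name)
        return subprocess.run(argv, env=env, check=False)

For example, ``run_with_locale(["python", "cli.py"])`` would start a
script, here the placeholder ``cli.py``, with the locale already in
place. Changing ``os.environ`` inside the script itself is too late,
because the interpreter has already chosen its encodings by then.
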
In Python 3.7 and later you will no longer get a ``RuntimeError`` in
many cases thanks to :pep:`538` and :pep:`540`, which changed the
default assumption in unconfigured environments. This doesn't change the
general issue that your locale may be misconfigured.

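
Whether the UTF-8 mode from :pep:`540` is active can be checked at
runtime. This is a small sketch, assuming Python 3.7 or later; the
helper name ``utf8_mode_enabled`` is hypothetical::

    import sys


    def utf8_mode_enabled():
        """Return True if the interpreter runs in UTF-8 mode."""
        # sys.flags.utf8_mode is an integer flag added in Python 3.7.
        return bool(sys.flags.utf8_mode)

UTF-8 mode can also be requested explicitly with ``python -X utf8`` or
by setting ``PYTHONUTF8=1`` in the environment.
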