mirror of
https://github.com/rclone/rclone.git
synced 2025-01-22 06:09:21 +01:00
docs: add faq section explaining why rclone changes fullwidth characters in file names
This commit is contained in:
parent
33bf9b4923
commit
91cdaffcc1
@ -209,3 +209,21 @@ The most common cause of rclone using lots of memory is a single
|
||||
directory with thousands or millions of files in. Rclone has to load
|
||||
this entirely into memory as rclone objects. Each rclone object takes
|
||||
0.5k-1k of memory.
|
||||
|
||||
### Rclone changes fullwidth Unicode punctuation marks in file names
|
||||
|
||||
For example: On a Windows system, you have a file with name `Test:1.jpg`,
|
||||
where `:` is the Unicode fullwidth colon symbol. When using rclone
|
||||
to copy this to your Google Drive, you will notice that the file
|
||||
gets renamed to `Test:1.jpg`, where `:` is the regular (halfwidth) colon.
|
||||
|
||||
The reason for such renames is the way rclone handles different
|
||||
[restricted filenames](/overview/#restricted-filenames) on different
|
||||
cloud storage systems. It tries to avoid ambiguous file names as
|
||||
much and allow moving files between many cloud storage systems
|
||||
transparently, by replacing invalid characters with similar looking
|
||||
Unicode characters when transferring to one storage system, and replacing
|
||||
back again when transferring to a different storage system where the
|
||||
original characters are supported. When the same Unicode characters
|
||||
are intentionally used in file names, this replacement strategy leads
|
||||
to unwanted renames. Read more [here](/overview/#restricted-filenames-caveats).
|
||||
|
@ -46,6 +46,10 @@ Local file system at .: Replacing invalid UTF-8 characters in "gro\xdf"
|
||||
|
||||
#### Restricted characters
|
||||
|
||||
With the local backend, restrictions on the characters that are usable in
|
||||
file or directory names depend on the operating system. To check what
|
||||
rclone will replace by default on your system, run `rclone help flags local-encoding`.
|
||||
|
||||
On non Windows platforms the following characters are replaced when
|
||||
handling file names.
|
||||
|
||||
|
@ -138,7 +138,8 @@ Some cloud storage systems might have restrictions on the characters
|
||||
that are usable in file or directory names.
|
||||
When `rclone` detects such a name during a file upload, it will
|
||||
transparently replace the restricted characters with similar looking
|
||||
Unicode characters.
|
||||
Unicode characters. To handle the different sets of restricted characters
|
||||
for different backends, rclone uses something it calls [encoding](#encoding).
|
||||
|
||||
This process is designed to avoid ambiguous file names as much as
|
||||
possible and allow to move files between many cloud storage systems
|
||||
@ -150,14 +151,60 @@ to ensure correct formatting and not necessarily the actual name used
|
||||
on the cloud storage.
|
||||
|
||||
This transformation is reversed when downloading a file or parsing
|
||||
`rclone` arguments.
|
||||
For example, when uploading a file named `my file?.txt` to Onedrive
|
||||
will be displayed as `my file?.txt` on the console, but stored as
|
||||
`my file?.txt` (the `?` gets replaced by the similar looking `?`
|
||||
character) to Onedrive.
|
||||
The reverse transformation allows to read a file`unusual/name.txt`
|
||||
from Google Drive, by passing the name `unusual/name.txt` (the `/` needs
|
||||
to be replaced by the similar looking `/` character) on the command line.
|
||||
`rclone` arguments. For example, when uploading a file named `my file?.txt`
|
||||
to Onedrive, it will be displayed as `my file?.txt` on the console, but
|
||||
stored as `my file?.txt` to Onedrive (the `?` gets replaced by the similar
|
||||
looking `?` character, the so-called "fullwidth question mark").
|
||||
The reverse transformation allows to read a file `unusual/name.txt`
|
||||
from Google Drive, by passing the name `unusual/name.txt` on the command line
|
||||
(the `/` needs to be replaced by the similar looking `/` character).
|
||||
|
||||
#### Caveats {#restricted-filenames-caveats}
|
||||
|
||||
The filename encoding system works well in most cases, at least
|
||||
where file names are written in English or similar languages.
|
||||
You might not even notice it: It just works. In some cases it may
|
||||
lead to issues, though. E.g. when file names are written in Chinese,
|
||||
or Japanese, where it is always the Unicode fullwidth variants of the
|
||||
punctuation marks that are used.
|
||||
|
||||
On Windows, the characters `:`, `*` and `?` are examples of restricted
|
||||
characters. If these are used in filenames on a remote that supports it,
|
||||
Rclone will transparently convert them to their fullwidth Unicode
|
||||
variants `*`, `?` and `:` when downloading to Windows, and back again
|
||||
when uploading. This way files with names that are not allowed on Windows
|
||||
can still be stored.
|
||||
|
||||
However, if you have files on your Windows system originally with these same
|
||||
Unicode characters in their names, they will be included in the same conversion
|
||||
process. E.g. if you create a file in your Windows filesystem with name
|
||||
`Test:1.jpg`, where `:` is the Unicode fullwidth colon symbol, and use
|
||||
rclone to upload it to Google Drive, which supports regular `:` (halfwidth
|
||||
question mark), rclone will replace the fullwidth `:` with the
|
||||
halfwidth `:` and store the file as `Test:1.jpg` in Google Drive. Since
|
||||
both Windows and Google Drive allows the name `Test:1.jpg`, it would
|
||||
probably be better if rclone just kept the name as is in this case.
|
||||
|
||||
With the opposite situation; if you have a file named `Test:1.jpg`,
|
||||
in your Google Drive, e.g. uploaded from a Linux system where `:` is valid
|
||||
in file names. Then later use rclone to copy this file to your Windows
|
||||
computer you will notice that on your local disk it gets renamed
|
||||
to `Test:1.jpg`. The original filename is not legal on Windows, due to
|
||||
the `:`, and rclone therefore renames it to make the copy possible.
|
||||
That is all good. However, this can also lead to an issue: If you already
|
||||
had a *different* file named `Test:1.jpg` on Windows, and then use rclone
|
||||
to copy either way. Rclone will then treat the file originally named
|
||||
`Test:1.jpg` on Google Drive and the file originally named `Test:1.jpg`
|
||||
on Windows as the same file, and replace the contents from one with the other.
|
||||
|
||||
Its virtually impossible to handle all cases like these correctly in all
|
||||
situations, but by customizing the [encoding option](#encoding), changing the
|
||||
set of characters that rclone should convert, you should be able to
|
||||
create a configuration that works well for your specific situation.
|
||||
See also the [example](/overview/#encoding-example-windows) below.
|
||||
|
||||
(Windows was used as an example of a file system with many restricted
|
||||
characters, and Google drive a storage system with few.)
|
||||
|
||||
#### Default restricted characters {#restricted-characters}
|
||||
|
||||
@ -230,7 +277,7 @@ names in a different encoding than UTF-8 or UTF-16, like latin1. See the
|
||||
|
||||
#### Encoding option {#encoding}
|
||||
|
||||
Most backends have an encoding options, specified as a flag
|
||||
Most backends have an encoding option, specified as a flag
|
||||
`--backend-encoding` where `backend` is the name of the backend, or as
|
||||
a config parameter `encoding` (you'll need to select the Advanced
|
||||
config in `rclone config` to see it).
|
||||
@ -240,17 +287,17 @@ such a way as to preserve the maximum number of characters (see
|
||||
above).
|
||||
|
||||
However this can be incorrect in some scenarios, for example if you
|
||||
have a Windows file system with characters such as `*` and `?` that
|
||||
you want to remain as those characters on the remote rather than being
|
||||
translated to `*` and `?`.
|
||||
have a Windows file system with Unicode fullwidth characters
|
||||
`*`, `?` or `:`, that you want to remain as those characters on the
|
||||
remote rather than being translated to regular (halfwidth) `*`, `?` and `:`.
|
||||
|
||||
The `--backend-encoding` flags allow you to change that. You can
|
||||
disable the encoding completely with `--backend-encoding None` or set
|
||||
`encoding = None` in the config file.
|
||||
|
||||
Encoding takes a comma separated list of encodings. You can see the
|
||||
list of all available characters by passing an invalid value to this
|
||||
flag, e.g. `--local-encoding "help"` and `rclone help flags encoding`
|
||||
list of all possible values by passing an invalid value to this
|
||||
flag, e.g. `--local-encoding "help"`. The command `rclone help flags encoding`
|
||||
will show you the defaults for the backends.
|
||||
|
||||
| Encoding | Characters |
|
||||
@ -263,7 +310,7 @@ will show you the defaults for the backends.
|
||||
| Ctl | All control characters 0x00-0x1F |
|
||||
| Del | DEL 0x7F |
|
||||
| Dollar | `$` |
|
||||
| Dot | `.` |
|
||||
| Dot | `.` or `..` as entire string |
|
||||
| DoubleQuote | `"` |
|
||||
| Hash | `#` |
|
||||
| InvalidUtf8 | An invalid UTF-8 character (e.g. latin1) |
|
||||
@ -283,6 +330,8 @@ will show you the defaults for the backends.
|
||||
| Slash | `/` |
|
||||
| SquareBracket | `[`, `]` |
|
||||
|
||||
##### Encoding example: FTP
|
||||
|
||||
To take a specific example, the FTP backend's default encoding is
|
||||
|
||||
--ftp-encoding "Slash,Del,Ctl,RightSpace,Dot"
|
||||
@ -300,14 +349,42 @@ to the existing ones, giving:
|
||||
|
||||
This can be specified using the `--ftp-encoding` flag or using an `encoding` parameter in the config file.
|
||||
|
||||
Or let's say you have a Windows server but you want to preserve `*`
|
||||
and `?`, you would then have this as the encoding (the Windows
|
||||
encoding minus `Asterisk` and `Question`).
|
||||
##### Encoding example: Windows
|
||||
|
||||
Slash,LtGt,DoubleQuote,Colon,Pipe,BackSlash,Ctl,RightSpace,RightPeriod,InvalidUtf8,Dot
|
||||
As a nother example, take a Windows system where there is a file with
|
||||
name `Test:1.jpg`, where `:` is the Unicode fullwidth colon symbol.
|
||||
When using rclone to copy this to a remote which supports `:`,
|
||||
the regular (halfwidth) colon (such as Google Drive), you will notice
|
||||
that the file gets renamed to `Test:1.jpg`.
|
||||
|
||||
This can be specified using the `--local-encoding` flag or using an
|
||||
`encoding` parameter in the config file.
|
||||
To avoid this you can change the set of characters rclone should convert
|
||||
for the local filesystem, using command-line argument `--local-encoding`.
|
||||
Rclone's default behavior on Windows corresponds to
|
||||
|
||||
```
|
||||
--local-encoding "Slash,LtGt,DoubleQuote,Colon,Question,Asterisk,Pipe,BackSlash,Ctl,RightSpace,RightPeriod,InvalidUtf8,Dot"
|
||||
```
|
||||
|
||||
If you want to use fullwidth characters `:`, `*` and `?` in your filenames
|
||||
without rclone changing them when uploading to a remote, then set the same as
|
||||
the default value but without `Colon,Question,Asterisk`:
|
||||
|
||||
```
|
||||
--local-encoding "Slash,LtGt,DoubleQuote,Pipe,BackSlash,Ctl,RightSpace,RightPeriod,InvalidUtf8,Dot"
|
||||
```
|
||||
|
||||
Alternatively, you can disable the conversion of any characters with `--local-encoding None`.
|
||||
|
||||
Instead of using command-line argument `--local-encoding`, you may also set it
|
||||
as [environment variable](/docs/#environment-variables) `RCLONE_LOCAL_ENCODING`,
|
||||
or [configure](/docs/#configure) a remote of type `local` in your config,
|
||||
and set the `encoding` option there.
|
||||
|
||||
The risk by doing this is that if you have a filename with the regular (halfwidth)
|
||||
`:`, `*` and `?` in your cloud storage, and you try to download
|
||||
it to your Windows filesystem, this will fail. These characters are not
|
||||
valid in filenames on Windows, and you have told rclone not to work around
|
||||
this by converting them to valid fullwidth variants.
|
||||
|
||||
### MIME Type ###
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user