rclone/docs/content/hdfs.md
Yury Stankevich 71edc75ca6 HDFS (Hadoop Distributed File System) implementation - #42
This includes an HDFS docker image to use with the integration tests.

Co-authored-by: Ivan Andreev <ivandeex@gmail.com>
Co-authored-by: Nick Craig-Wood <nick@craig-wood.com>
2021-01-07 09:48:51 +00:00

200 lines
4.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "HDFS Remote"
description: "Remote for Hadoop Distributed Filesystem"
---
{{< icon "fa fa-globe" >}} HDFS
-------------------------------------------------
[HDFS](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) is a
distributed file-system, part of the [Apache Hadoop](https://hadoop.apache.org/) framework.
Paths are specified as `remote:` or `remote:path/to/dir`.
Here is an example of how to make a remote called `remote`. First run:
rclone config
This will guide you through an interactive setup process:
```
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
[skip]
XX / Hadoop distributed file system
\ "hdfs"
[skip]
Storage> hdfs
** See help for hdfs backend at: https://rclone.org/hdfs/ **
hadoop name node and port
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Connect to host namenode at port 8020
\ "namenode:8020"
namenode> namenode.hadoop:8020
hadoop user name
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Connect to hdfs as root
\ "root"
username> root
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[remote]
type = hdfs
namenode = namenode.hadoop:8020
username = root
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:
Name Type
==== ====
hadoop hdfs
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
```
This remote is called `remote` and can now be used like this
See all the top level directories
rclone lsd remote:
List the contents of a directory
rclone ls remote:directory
Sync the remote `directory` to `/home/local/directory`, deleting any excess files.
rclone sync -i remote:directory /home/local/directory
### Setting up your own HDFS instance for testing
You may start with a [manual setup](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html)
or use the docker image from the tests:
If you want to build the docker image
```
git clone https://github.com/rclone/rclone.git
cd rclone/fstest/testserver/images/test-hdfs
docker build --rm -t rclone/test-hdfs .
```
Or you can just use the latest one pushed
```
docker run --rm --name "rclone-hdfs" -p 127.0.0.1:9866:9866 -p 127.0.0.1:8020:8020 --hostname "rclone-hdfs" rclone/test-hdfs
```
**NB** it need few seconds to startup.
For this docker image the remote needs to be configured like this:
```
[remote]
type = hdfs
namenode = 127.0.0.1:8020
username = root
```
You can stop this image with `docker kill rclone-hdfs` (**NB** it does not use volumes, so all data
uploaded will be lost.)
### Modified time
Time accurate to 1 second is stored.
### Checksum
No checksums are implemented.
### Usage information
You can use the `rclone about remote:` command which will display filesystem size and current usage.
### Restricted filename characters
In addition to the [default restricted characters set](/overview/#restricted-characters)
the following characters are also replaced:
| Character | Value | Replacement |
| --------- |:-----:|:-----------:|
| : | 0x3A | |
Invalid UTF-8 bytes will also be [replaced](/overview/#invalid-utf8).
### Limitations
- No server-side `Move` or `DirMove`.
- Checksums not implemented.
{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/hdfs/hdfs.go then run make backenddocs" >}}
### Standard Options
Here are the standard options specific to hdfs (Hadoop distributed file system).
#### --hdfs-namenode
hadoop name node and port
- Config: namenode
- Env Var: RCLONE_HDFS_NAMENODE
- Type: string
- Default: ""
- Examples:
- "namenode:8020"
- Connect to host namenode at port 8020
#### --hdfs-username
hadoop user name
- Config: username
- Env Var: RCLONE_HDFS_USERNAME
- Type: string
- Default: ""
- Examples:
- "root"
- Connect to hdfs as root
### Advanced Options
Here are the advanced options specific to hdfs (Hadoop distributed file system).
#### --hdfs-encoding
This sets the encoding for the backend.
See: the [encoding section in the overview](/overview/#encoding) for more info.
- Config: encoding
- Env Var: RCLONE_HDFS_ENCODING
- Type: MultiEncoder
- Default: Slash,Colon,Del,Ctl,InvalidUtf8,Dot
{{< rem autogenerated options stop >}}