Virtual File Systems

Description

Apache Virtual File System (VFS) is part of the Apache Commons project. Through VFS, Hop users can access various files from different sources such as files on the local disk, on a HTTP(S) server, inside a ZIP archive and so through a url format.

Apache Hop makes fervent usage of VFS. Beyond the standard VFS file system types, we have added a number which are present in the various technology stacks supported by Hop. Like the standard file systems each has its own unique name scheme which you can use.

Apache Hop VFS File Systems

The table below provides a quick overview of the VFS file systems supported by Apache Hop. Click the File system name to access more detailed file system documentation.

File SystemDescriptionURI Format

AWS S3

Provides access to Amazon S3 Buckets

s3://

Azure Blob Storage

Provides access to Azure Blob Storage

azure://

Dropbox

Provides access to Dropbox

dropbox://

Google Cloud Storage

Provides access to Google Cloud Storage buckets

gs://

Google Drive

Provides access to Google Drive folders

googledrive://

Apache VFS File System Types

The table below lists the file system types provided by the default Apache VFS implementation.

Check the Apache VFS file system types for more information on the supported functionality per files system.

File SystemDescriptionURI Format

BZIP2

Provides read-only access to the contents of gzip and bzip2 files.

URI Format

gz:// compressed-file-uri

bz2:// compressed-file-uri

Where compressed-file-uri refers to a file of any supported type. There is no need to add a ! part to the URI if you read the content of the file you always will get the uncompressed version.

Examples

  • gz:/my/gz/file.gz

CIFS

File

Provides access to the files on the local physical file system.

URI Format

[file://] absolute-path

Where absolute-path is a valid absolute file name for the local platform. UNC names are supported under Windows.

Examples

  • file:///home/someuser/somedir

  • file:///C:/Documents and Settings

  • file://///somehost/someshare/afile.txt

  • /home/someuser/somedir

  • c:\program files\some dir

  • c:/program files/some dir

FTP

Provides access to the files on an FTP server.

URI Format

tp://[ username[: password]@] hostname[: port][ relative-path]

Examples

ftp://myusername:mypassword@somehost/pub/downloads/somefile.tgz

By default, the path is relative to the user’s home directory. This can be changed with:

FtpFileSystemConfigBuilder.getInstance().setUserDirIsRoot(options, false);

FTPS

Provides access to the files on an FTP server over SSL.

URI Format

ftps://[ username[: password]@] hostname[: port][ absolute-path]

Examples

ftps://myusername:mypassword@somehost/pub/downloads/somefile.tgz

GZIP

see ‘bzip2’

HDFS

Provides access to files in an Apache Hadoop File System (HDFS). On Windows the integration test is disabled by default, as it requires binaries.

URI Format

hdfs:// hostname[: port][ absolute-path]

Examples

  • hdfs://somehost:8080/downloads/some_dir

  • hdfs://somehost:8080/downloads/some_file.ext

HTTP(S)

Provides access to files on an HTTP server.

URI Format

http://[ username[: password]@] hostname[: port][ absolute-path]

https://[ username[: password]@] hostname[: port][ absolute-path]

File System Options

  • proxyHost The proxy host to connect through.

  • proxyPort The proxy port to use.

  • proxyScheme The proxy scheme (http/https) to use.

  • cookies An array of Cookies to add to the request.

  • maxConnectionsPerHost The maximum number of connections allowed to a specific host and port. The default is 5.

  • maxTotalConnections The maximum number of connections allowed to all hosts. The default is 50.

  • keystoreFile The keystore file for SSL connections.

  • keystorePass The keystore password.

  • keystoreType The keystore type.

Examples

Jar, Zip and Tar

Provides read-only access to the contents of Zip, Jar and Tar files.

URI Format

zip:// arch-file-uri[! absolute-path]

jar:// arch-file-uri[! absolute-path]

tar:// arch-file-uri[! absolute-path]

tgz:// arch-file-uri[! absolute-path]

tbz2:// arch-file-uri[! absolute-path]

Where arch-file-uri refers to a file of any supported type, including other zip files. Note: if you would like to use the ! as normal character it must be escaped using %21. tgz and tbz2 are convenience for tar:gz and tar:bz2.

Examples

mime

This (sandbox) filesystem can read mails and its attachements like archives. If a part in the parsed mail has no name, a dummy name will be generated. The dummy name is: _body_part_X where X will be replaced by the part number.

URI Format

mime:// mime-file-uri[! absolute-path]

Examples

  • mime:file:///your/path/mail/anymail.mime!/

  • mime:file:///your/path/mail/anymail.mime!/filename.pdf

  • mime:file:///your/path/mail/anymail.mime!/_body_part_0

RAM

A filesystem which stores all the data in memory (one byte array for each file content).

URI Format

ram://[ path]

File System Options

  • maxsize Maximum filesystem size (total bytes of all file contents).

Examples

  • ram:///any/path/to/file.txt

RES

This is not really a filesystem, it just tries to lookup a resource using javas ClassLoader.getResource() and creates a VFS url for further processing.

URI Format

res://[ path]

Examples

  • +res://path/in/classpath/image.png might result in jar:file://my/path/to/images.jar!/path/in/classpath/image.png+

SFTP

Provides access to the files on an SFTP server (that is, an SSH or SCP server).

URI Format

sftp://[ username[: password]@] hostname[: port][ relative-path]

Examples

  • sftp://myusername:mypassword@somehost/pub/downloads/somefile.tgz

Tar

see ‘jar’

Temp

Provides access to a temporary file system, or scratchpad, that is deleted when Commons VFS shuts down. The temporary file system is backed by a local file system.

URI Format

tmp://[ absolute-path]

Examples

  • tmp://dir/somefile.txt

WebDAV

Provides access to files on a WebDAV server through the modules commons-vfs2-jackrabbit1 and commons-vfs2-jackrabbit2.

URI Format

webdav://[ username[: password]@] hostname[: port][ absolute-path]

File System Options

  • versioning true if versioning should be enabled

  • creatorName the user name to be identified with changes to a file. If not set the user name used to authenticate will be used.

Examples

  • webdav://somehost:8080/dist

Zip

see ‘jar’

*) VFS file system type in development