Updated for version 1.21pl78.

Introduction

AFT provides a secure and flexible file transfer infrastructure for heterogeneous environments that work in a cyclic regime and need to automate their operations.

The links are TCP sockets and by default the transmitted data is encrypted with TLS.

Basic Concepts

AFT works in two basic roles: hub or agent. The hubs take control of a number of agents. The hubs implement file transfers in three modes:

  1. GET for extracting files from a "source agent"

  2. PUT for pushing files into a "destination agent"

  3. A2A (agent to agent) for extracting files from a "source agent" (to a temporary directory) and then pushing those files into a "destination agent"

The hubs and the agents allow the configuration of user defined transfers (in any working mode) identified by a simple label or "id".

Hub transfers in GET mode are identified by [hub-get:id] configuration sections, those in PUT mode by [hub-put:id] sections, and those in A2A mode by [hub-a2a:id] sections.

For a "GET hub transfer" there must be a corresponding [agent-src:id] section in the agent configuration; for a "PUT hub transfer" there must be a corresponding [agent-dst:id] section; finally, for an "agent to agent hub transfer" two agents, one with [agent-src:id] and one with [agent-dst:id], must be configured.

AGENT-SRC  =====>    HUB-GET

                     HUB-PUT    =====>    AGENT-DST

AGENT-SRC  =====>    HUB-A2A     =====>  AGENT-DST

The "physical" TCP network connection may be established from the hub to the agents:

Hub starts the connection to agents (hub is a TCP client)

Or from agents to the hub:

Agents start the connection to the hub (hub is a TCP server)

In both cases, the hub may GET files from the agents, PUT files into agents, or combine both actions with the A2A mode.

The communication is encrypted by default using TLS.

AFT was designed with the expectation that a hub instance communicates with several associated agents; but nothing prevents the user from setting up several pairs of hub/agent instances for independent file transfers, or agent instances serving more than one hub, even simultaneously.

Running AFT

Step 1

AFT requires Java 8 or higher.

Step 2

A configuration file named aft.cfg (in the etc subdirectory) must be provided by the user. The rest of this document deals with its construction.

Step 3

Start with:

cd bin
./run-aft.sh

On Windows systems, just double-click the bin\run-aft.bat batch file, or execute it manually:

cd bin
run-aft.bat

In order to stop AFT, the stop-aft.sh and stop-aft.bat scripts are provided.

Configuration

The aft.cfg file is used in order to configure the hubs and the agents. This is a plain text file with sections delimited by headers.

The hubs define a number of "programmed file transfers" by [hub-a2a:name], [hub-get:name] and [hub-put:name] sections; optionally a [hub] section may be used to set some global parameters. The transfers are implemented by the hub "controlling" the agents by means of a TCP socket.

The agents are defined by a number of configured file transfers which start with an [agent-src:name] or an [agent-dst:name] section; optionally, an [agent] section may be present in order to set some common parameters.

Note that the configuration file may be modified at any time; it is reloaded about every minute, except when a transfer is in progress.
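
As a rough illustration of the section layout described above, the following sketch pairs a hub and an agent through a shared transfer id. The id "reports", the host name, the port and the directories are made up for this example; each node is assumed to read its own aft.cfg file.

# aft.cfg on the hub node (hypothetical values)
[hub-get:reports]
src-host=agent1.example.com
src-port=20111
dst-dir=/var/aft/reports

# aft.cfg on the agent node (hypothetical values)
[agent]
agent-port=20111

[agent-src:reports]
dir=/home/app/reports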

Agent Configuration

AFT promotes minimal agent configuration. The mandatory parameters are in place in order to avoid giving the hub excessive power over the agent computer; for example, we force the agent’s participating directories to be locally specified to prevent malicious access to sensitive paths from a compromised hub.

The agent configuration does allow for an optional [agent] section, for which the following settings are provided:

agent-port

TCP listen port. A mandatory setting for listening agents.

max-children

An optional setting for the maximum number of simultaneous connected hubs. Defaults to 5.

Example:

[agent]
agent-port = 20111

Agent File Transfers

Next, a number of "file transfers" are defined for the agent side, which allow the hub to extract or push files, from or into the agent host. Both cases correspond to [agent-src:name] and [agent-dst:name] sections respectively.

dir

The directory from which the files are to be extracted (in agent-src configurations) or into which the files will be written (in agent-dst configurations.)

Example:

[agent]
agent-port = 20111

[agent-src:bravo]
dir = /home/bravo/x-files

exec-after-transfer

A command to be executed via Java’s ProcessBuilder after every file is transferred to/from the agent. The file name is added to the command. If the execution fails, the failure is logged but the transfer is not stopped.

Warning: the AFT agent process may be blocked indefinitely if the command hangs.

The command must be an existing executable, which is executed via Java’s ProcessBuilder with the file name as its single argument. No extra arguments are allowed for the command.
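
A minimal sketch of this setting follows. The transfer id, the directory and the script path are hypothetical; the script would receive the transferred file name as its only argument and should return promptly, given the warning above.

[agent]
agent-port = 20111

[agent-dst:incoming]
dir = /home/bravo/incoming
# hypothetical script, called once per received file
exec-after-transfer = /usr/local/bin/on-file-received.sh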

rename-destination

A pattern-oriented replacement for renaming the files being transferred (valid for [agent-dst].) The syntax is rename-destination=search-pattern=>replacement-pattern. The search-pattern follows the semantics of Java’s Matcher#matches() method, while the replacement-pattern is applied with Java’s Matcher#replaceAll() method. Several rename-destination directives may be declared in the configuration, allowing for distinct replacements.

Example:

[agent]
agent-port=55123

[agent-dst:tsr1]
dir=/tempo/agent-2
rename-destination=file(\d).txt=>file$1.dat

This will rename files like file1.txt, file2.txt, ... into file1.dat, file2.dat, etc.
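
Several directives can be combined in one section. The sketch below uses two hypothetical patterns that never match the same file name, because this document does not state how overlapping patterns are resolved. Since the search pattern must match the whole file name, the capture group (.*) carries the base name into the replacement.

[agent-dst:tsr1]
dir=/tempo/agent-2
# any .txt file becomes a .dat file
rename-destination=(.*)\.txt=>$1.dat
# daily-20240517.log becomes archive-20240517.log
rename-destination=daily-(\d{8})\.log=>archive-$1.log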

Agent in Client Mode

When the agents initiate the TCP connection (TCP client), the following settings must be set:

hub-host

The hub hostname or IP address.

hub-port

The hub listening port number.

hub-connect-check

When to connect to the hub in order to transfer pending files. Valid forms are delay:# (retry at a fixed interval, given as a number of seconds), cron:expr (where expr is a crontab expression), none (don’t attempt to connect; the hub will initiate the connection) and once (attempt the connection, do the work and shut down the AFT process.) Defaults to none.

Transfers defined with the once mode are activated only when AFT is started with the -agent=name command line argument; otherwise, they are ignored.

Note that the agents may initiate or wait for the TCP connections with the hub, but this is totally independent of the agent-src/agent-dst transfer modes.

The cron expression follows the semantics of the Spring Framework.

Example:

# a single client mode agent transfer
# no [agent] nor agent-port needed at all

[agent-src:bravo]
dir = /home/bravo/x-files
hub-host = 192.1.4.51
hub-port = 6001
hub-connect-check = delay:3600
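
A possible variant with a cron expression is sketched below. The transfer id and directory are hypothetical. Spring cron expressions have six fields (second, minute, hour, day of month, month, day of week), so this one fires every day at 02:30.

# a client mode agent receiving files once per night
[agent-dst:charlie]
dir = /home/charlie/inbox
hub-host = 192.1.4.51
hub-port = 6001
hub-connect-check = cron:0 30 2 * * *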

Hub Configuration

The hub configuration global [hub] section is optional, but is needed when the hub acts as a TCP server.

hub-port

The listening port. It must be set only if the hub will receive incoming connections from agents.

max-children

An optional setting for the maximum number of simultaneous connected "client" agents. Defaults to 100.

Example:

[hub]
hub-port = 6001
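
On the hub side, the counterpart of a client mode agent could look like the sketch below. It assumes the agent transfer is named bravo, as in the earlier client mode example; the dst-dir path and the max-children value are made up. The transfer settings used here are described in the next section.

[hub]
hub-port = 6001
max-children = 20

# the agent connects to the hub, so no src-host/src-port here
[hub-get:bravo]
src-connect-check = none
dst-dir = /var/aft/incoming/bravo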

Hub File Transfers

The hub configuration has a number of programmed file transfers.

Example:

In the following configuration, my-transfer is a transfer which moves files from the host 23.45.122.50 to the host 23.45.121.33 using an intermediate queue directory /home/aft/queue1 located on the hub host.

[hub-a2a:my-transfer]
src-host=23.45.122.50
src-port=14551
queue-dir=/home/aft/queue1
dst-host=23.45.121.33
dst-port=21001

The following settings are provided:

src-host

Source agent hostname or IP address where the hub will attempt a connection in order to extract files. Valid for "A2A" and "GET" modes. If not present, then it is assumed that the source agent will initiate the connection.

src-port

Port number of the source agent in order to establish the communication.

dst-dir

Final location (directory) for the files on the destination host. Only valid for GET file transfers. AFT does not create this directory.

Example:

# a GET file transfer: extract files every 12 hours
[hub-get:t2]
src-host=fx8320.ana.com.uy
src-port=41232
src-connect-check=delay:43200
dst-dir=/tmp/hub-destination

dst-host

Destination agent hostname or IP address where the hub will attempt a connection in order to push files. Valid for "A2A" and "PUT" modes. If not present, then it is assumed that the destination agent will initiate the connection.

dst-port

Port number of the destination agent in order to establish the communication.

src-dir

Location (directory) of the files to be extracted. Only valid for PUT file transfers. AFT does not create this directory.

Example:

# a PUT file transfer
[hub-put:t1]
src-dir=/tmp/hub-dir
dst-host=fx8320
dst-port=55123

queue-dir

Directory on the hub computer for temporary storage of extracted files before they are sent to the destination host. The directory must exist and allow the creation of files; AFT does not create it. A queue directory can’t be shared by two or more transfers. Only valid for "A2A" mode.

no-change-check-seconds

When a file is about to be read, AFT checks its modification time in order to avoid transmitting an incomplete file that is still being written. This setting configures the minimum age (in seconds) the file must have in order to reasonably guarantee that its writing process has finished. When the file is "too new", another attempt is made after the number of seconds defined by the op-timeout-seconds setting. A value of zero disables this check. Only valid for "PUT" mode.

no-change-retry-times

How many times to retry the extraction if a file is too new as per no-change-check-seconds.
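
The two settings can be combined as in the following sketch, which reuses the earlier PUT example; the values 120 and 5 are arbitrary. A file would have to be unmodified for at least two minutes before being sent, and AFT would retry up to five times while it is still too new.

[hub-put:t1]
src-dir=/tmp/hub-dir
dst-host=fx8320
dst-port=55123
no-change-check-seconds=120
no-change-retry-times=5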

src-connect-check

When to connect to the source agent in order to extract pending files. Valid forms are delay:# (retry at a fixed interval, given as a number of seconds), cron:expr (where expr is a crontab expression) and none (don’t attempt to connect; the agent will initiate the connection.) If no src-host is defined, defaults to none; if src-host is defined, defaults to delay:900, which means an extraction attempt every 15 minutes.

files

Which files are to be extracted and transferred. Valid formats are all, list:name,…, ereg:expr, and tereg:expr; defaults to all (all the files in the directory.) The list mode takes a comma-separated list of exact file names; the ereg mode takes a regular expression pattern that the file names of interest must match. Finally, tereg is a two-step process: first, the subexpressions enclosed between %…% are used as Java’s SimpleDateFormat patterns and formatted with the current time (in order to build a time-dependent pattern); then the result is used as a regular expression (as in the ereg case.) Two consecutive percent signs generate a single percent sign.

Example:

# only extract the three specified files
[hub-get:t2]
src-host=fx8320.ana.com.uy
src-port=41232
dst-dir=/tmp/hub-destination
files=list:file2.txt,file3.txt,file4.txt

The same result may be obtained with:

# only extract the three specified files
[hub-get:t2]
src-host=fx8320.ana.com.uy
src-port=41232
dst-dir=/tmp/hub-destination
files=ereg:file[234]\.txt

A time-related expression using the tereg: format:

# only extract the files for today
[hub-get:t2]
src-host=fx8320.ana.com.uy
src-port=41232
dst-dir=/tmp/hub-destination
files=tereg:%yyyyMMdd%\.txt
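
The date pattern between the percent signs is not limited to a full date. As a sketch, assuming files named like report-2024-05-17.csv (a hypothetical naming scheme, as is the transfer id t3), the following would select the files of the current month: during May 2024 the expression becomes the regular expression report-2024-05-\d\d\.csv.

# only extract the report files for the current month
[hub-get:t3]
src-host=fx8320.ana.com.uy
src-port=41232
dst-dir=/tmp/hub-destination
files=tereg:report-%yyyy-MM%-\d\d\.csv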

The list:, ereg: and tereg: forms also admit a negated form with the corresponding !list:, !ereg: and !tereg: prefixes. For example:

# extract any YYYYMMDD.txt file except today's
[hub-get:t2]
src-host=fx8320.ana.com.uy
src-port=41232
dst-dir=/tmp/hub-destination
files=!tereg:%yyyyMMdd%\.txt

recursive

Whether to transfer, in addition to the source files, the contents of the subdirectories of the source directory. Defaults to false. This is an experimental setting, so use it with caution.

dst-connect-check

When to connect to the destination agent in order to push pending files. See src-connect-check for the syntax and default value.

dst-connect-after-transfer

For A2A mode, whether to try to connect to the destination agent immediately after the extraction from the source agent, in addition to the programmed transfer regime of dst-connect-check. Set to true or false. Defaults to false.
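
As an illustration, the earlier my-transfer example could be extended as follows. The hourly dst-connect-check value is arbitrary: files would be pushed right after each extraction, while the hourly connection would pick up anything left in the queue.

[hub-a2a:my-transfer]
src-host=23.45.122.50
src-port=14551
queue-dir=/home/aft/queue1
dst-host=23.45.121.33
dst-port=21001
dst-connect-check=delay:3600
dst-connect-after-transfer=true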

src-exec-after-transfer

After the successful extraction of a file from the source agent, a user-defined program may be executed; this setting specifies its executable name. The name of the just-extracted file is passed as an argument to this program. The program is executed on the hub host. If the execution fails, the failure is logged but the transfer is not stopped. This setting is valid only for "A2A" and "GET" modes.

dst-exec-after-transfer

After the successful delivery of a file to its destination agent, a user-defined program may be executed; this setting specifies its executable name. The name of the just-transferred file is passed as an argument to this program. The program is executed on the hub host. If the execution fails, the failure is logged but the transfer is not stopped. This setting is valid only for "A2A" and "PUT" modes.
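
Both hooks may appear in the same A2A transfer, as in this sketch. The two script paths are hypothetical; each script would run on the hub host and receive the file name as its argument.

[hub-a2a:my-transfer]
src-host=23.45.122.50
src-port=14551
queue-dir=/home/aft/queue1
dst-host=23.45.121.33
dst-port=21001
# hypothetical scripts on the hub host
src-exec-after-transfer=/usr/local/bin/after-extract.sh
dst-exec-after-transfer=/usr/local/bin/after-deliver.sh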

cleanup-mode

What to do with the source file after successful transfer. Valid options are remove, truncate, and none; defaults to remove.

Note: this action is carried out after any configured command execution.

Warning: the default remove may be unexpected to some (most?) users. The rationale is that the information is not lost at all since the transfer was effectively done. Be careful!
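
To avoid the default removal, the mode can be set explicitly, as in this sketch based on the earlier GET example:

# keep the source files in place after the transfer
[hub-get:t2]
src-host=fx8320.ana.com.uy
src-port=41232
dst-dir=/tmp/hub-destination
cleanup-mode=none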

compress-mode

Compression operation mode. Valid settings are:

  1. none to avoid any compression

  2. network to compress data in transit in order to reduce network traffic

  3. gzip to compress data in transit and store the files in Gzip compressed format

Defaults to network. The none mode is useful when transmitting already compressed files or any kind of non-compressible files (like encrypted ones.) It avoids the CPU consumption required by the compression/decompression operations.

In GET transfers, this parameter is used by the source agents.

compress-level

An integer in the 1-9 range, from 1 (best speed) to 9 (best compression.) Defaults to 1.

In GET transfers, this parameter is used by the source agents.
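
The following sketch shows two PUT transfers with different compression choices. The transfer ids and source directories are hypothetical, and PUT mode is used so that the settings belong in the hub configuration.

# files are already compressed: skip compression
[hub-put:archives]
src-dir=/data/archives
dst-host=fx8320
dst-port=55123
compress-mode=none

# text logs over a slow link: favor compression ratio over speed
[hub-put:logs]
src-dir=/data/logs
dst-host=fx8320
dst-port=55123
compress-mode=network
compress-level=9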

write-mode

Valid settings are simple and tmp. The simple mode just opens the file for writing under its final file name and writes the data as it is being transferred. The tmp mode opens a temporary file (in the same destination directory) and, only when all the data has been transferred, renames the temporary file to the final file name. Defaults to simple.

op-timeout-seconds

The timeout for connection setup and reply arrival. Defaults to 15 seconds. In very congested networks this could be increased.
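
A sketch combining the two previous settings follows. It assumes that write-mode applies to the node that writes the files (here the hub, in a GET transfer), which this document does not state explicitly; the 60 second timeout is an arbitrary value for a congested network.

[hub-get:t2]
src-host=fx8320.ana.com.uy
src-port=41232
dst-dir=/tmp/hub-destination
# files appear under their final name only when complete
write-mode=tmp
op-timeout-seconds=60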

When a transfer is initiated by an agent, the src/dst host and port settings must not be set. Also, the src/dst connection check setting must be set to none. For example, consider an A2A file transfer where the files are extracted from an agent which connects to the hub, and later sent to an agent which waits for the hub connection. The following configuration is in order:

# extract from "client" agent, push to "server" agent
[hub]
hub-port = 5122

[hub-a2a:the-transfer]
src-connect-check=none
queue-dir=/home/aft/queue4
dst-host=fx8320
dst-port=55123
# try to send every hour from 9AM to 5PM, Monday to Friday only
dst-connect-check=cron:0 0 9-17 * * MON-FRI

rename-destination

Valid for [hub-get]. Same behavior as for the [agent-dst] section (see above.)

read-buffer-size

The source file read buffer size in bytes. Defaults to 131072 (i.e. 128 kilobytes.) Calibrating this parameter may improve the throughput for some network infrastructures (testing is in order.)

In GET transfers, this parameter is used by the source agents.
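
For instance, the earlier PUT example could be tuned as sketched below. The value is arbitrary, and whether it helps depends on the network, so it should be measured.

[hub-put:t1]
src-dir=/tmp/hub-dir
dst-host=fx8320
dst-port=55123
# 512 kilobytes instead of the default 128
read-buffer-size=524288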

TLS Configuration

By default, AFT uses a built-in self-signed certificate for the file transfers, which provides data encryption but does not prevent unauthorized parties (also running AFT) from interacting with the participating nodes.

AFT allows the "mutual authentication" of the interconnections, relying on digital certificates which may be created by external tools or by the built-in aft-cert wizard provided in the AFT distribution.

The wizard allows the creation of a self-signed root certificate authority (CA) in order to issue the per-node certificates. The root certificate file must be transferred to the deployed nodes and referenced with the tls-root-cert setting in the [tls] section.

Each node must have a TLS certificate and its corresponding private key files, referenced by the tls-node-cert and tls-node-pk settings of the same section. A typical configuration looks like:

[tls]
tls-root-cert=etc/root-ca.crt
tls-node-cert=etc/NODE1.crt
tls-node-pk=etc/NODE1.key

Note that the private key must not be encrypted (else a password would be necessary in the configuration file which defeats the original purpose.) The private key file must be protected with the operating system permissions.

tls-enabled-protocols

A comma separated list of TLS protocols which will be enabled. This may be used to force a protocol level range. For example:

[tls]
tls-enabled-protocols=TLSv1.1,TLSv1.2

tls-enabled-cipher-suites

A comma separated list of TLS "cipher suites" which will be enabled. This may be used to force some encryption algorithms and parameters. Usually, this parameter must be configured in both peers. For example:

[tls]
tls-enabled-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
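
Put together, a complete [tls] section might look like the following sketch. It reuses the file names of the earlier example, restricts the node to TLS 1.2, and assumes that the node certificates use RSA keys, which this cipher suite requires.

[tls]
tls-root-cert=etc/root-ca.crt
tls-node-cert=etc/NODE1.crt
tls-node-pk=etc/NODE1.key
tls-enabled-protocols=TLSv1.2
tls-enabled-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384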

Mutual authentication

The parties may be configured to authenticate the peer node of the interconnection (inbound or outbound) by specifying the peer’s certificate common name (CN.) The tls-peer setting is used for that purpose.

Note that the peer authentication is optional, but strongly recommended.

For nodes working in TCP-client mode, the peer authentication must be configured per transfer (agent or hub.) For example:

# the peer (agent) node must present a valid certificate with CN=FX8320
[hub-get:t2]
src-host=fx8320.ana.com.uy
src-port=41232
dst-dir=/tmp/hub-destination
files=ereg:file[234]\.txt
tls-peer=FX8320

For nodes operating in TCP-server mode, the peer authentication must be configured alongside the server port setting. For example:

# the peer (agent) node must present a valid certificate with CN=NODE-777
[hub]
hub-port = 6001
tls-peer=NODE-777

That is, the peers can’t be configured per transfer. The reason is that the TLS handshake and authentication happen before the identification of the actual file transfer.
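
The same rule would apply to an agent that listens for hub connections. As a sketch, the agent counterpart of the earlier t2 transfer might authenticate the hub as follows. The common name HUB-1 and the directory are hypothetical, and placing tls-peer in the [agent] section is inferred from the rule above rather than stated explicitly.

# the peer (hub) node must present a valid certificate with CN=HUB-1
[agent]
agent-port = 41232
tls-peer = HUB-1

[agent-src:t2]
dir = /home/fx8320/outgoing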

Disabling TLS

TLS may be disabled node-wide by setting the tls-disabled parameter to true (defaults to false.)

# disable TLS in the node
[tls]
tls-disabled=true