SYNOPSIS
parallel processes
ssh_parallel [-h] [-d TIME] [-o OPTION] [-s TIME] [-n NICE] [-v] direc-
tory machine1 processes1 [machine2 processes2 [...]]
DESCRIPTION
parallel and ssh_parallel run lines passed in on the standard input in
a BASH(1) shell, in parallel. parallel runs the commands on the local
machine, and ssh_parallel runs the commands using SSH on any number of
machines. Passwordless SSH must be set up for ssh_parallel to operate.
The program terminates when the final child process terminates.
PARALLEL
Command line arguments:
processes
number of processes to run in parallel
If fork(2) fails, parallel will not lose the line: it will keep
retrying.
SSH_PARALLEL
Command line arguments:
-h print a help message
-d TIME
Mark a machine as dead if it is unresponsive for TIME seconds.
This will cause the job to be rerun. The test is performed
using the ssh(1) options "ServerAliveInterval 10" and
"ServerAliveCountMax (5+TIME)/10", so the granularity cannot be
finer than 10 seconds. There is no default: if this option is
not specified, these options are not passed to ssh(1) and
ssh's own defaults are used.
-o OPTION
pass OPTION to ssh(1)
-s TIME
set sleep timeout to TIME seconds
-n NICE
run processes with priority NICE
-v verbose operation
machine2 processes2 ...
Additional machine to run processes on.
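The arithmetic behind the -d option can be sketched as follows. This is
only an illustration: the helper name alive_count_max is hypothetical,
and integer (truncating) division is an assumption, since the text above
does not say how (5+TIME)/10 is rounded.

```shell
# Hypothetical helper: the ServerAliveCountMax value implied by -d TIME.
# Assumes truncating integer division, as in shell arithmetic.
alive_count_max() {
    echo $(( (5 + $1) / 10 ))
}

# Show the ssh options that would result for a few TIME values.
for t in 10 30 60; do
    echo "TIME=$t: ServerAliveInterval 10, ServerAliveCountMax $(alive_count_max "$t")"
done
```

Under that assumption, TIME values from 5 to 14 all give the same count,
which is why the granularity is no finer than 10 seconds.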
There are several retryable errors which can occur. If ssh(1), process
spawning, or the change to the working directory on the remote machine
fails, then the line will be saved and re-executed. These conditions
are indicated by the return codes 255, 254 and 253 respectively. Since
ssh(1) returns the return code of the remote executing script, if the
remote script returns any of these error codes, it will be rerun, and
the corresponding error reported.
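The mapping from return code to retryable condition described above can
be written out as a small shell function. The function name and the
message strings are hypothetical and are not taken from ssh_parallel
itself; only the three codes and their meanings come from the text.

```shell
# Hypothetical helper: name the condition that a return code indicates.
classify_status() {
    case "$1" in
        255) echo "retry: ssh failed" ;;
        254) echo "retry: process spawning failed" ;;
        253) echo "retry: change to working directory failed" ;;
        *)   echo "finished: status $1" ;;
    esac
}

classify_status 255
classify_status 0
```

A practical consequence is that a remote script should avoid exiting
with 253, 254 or 255 itself unless it is meant to be rerun.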
Note that some versions of ssh(1) do not follow the documentation and
return 1, not 255, on certain errors (connection refused, no route to
host). If ssh_parallel is built on a machine with such a broken ssh(1),
then a return code of 1 is also treated as a failure of ssh(1).
A failure of ssh(1) or cd can indicate an intermittent network problem.
As a result, the machine which failed is put to sleep for the number of
seconds given by -s TIME (default 10). Each available process on a
machine counts as a different machine.
ssh(1) is executed as follows:
ssh -n -o "PasswordAuthentication no" user@host commandline
The PasswordAuthentication setting causes ssh to fail instead of
prompting for the password. Since this can be due to intermittent
network problems, it is a retryable error. The -n option also
disconnects ssh from stdin, so remote processes cannot read from stdin
either.
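Because remote processes cannot read from stdin, a job that needs input
has to name its input explicitly on its own command line. The sketch
below generates such lines; the function make_jobs, the sort command and
the file names are all hypothetical and serve only as an illustration.

```shell
# Hypothetical job generator: each emitted line redirects its input from
# a named file instead of relying on stdin.
make_jobs() {
    for f in "$@"; do
        printf 'sort < %s > %s.sorted\n' "$f" "$f"
    done
}

# Print the job lines; these could be piped to parallel or ssh_parallel.
make_jobs a.txt b.txt
```

For ssh_parallel, this assumes the named files exist in the working
directory on every remote machine.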
EXAMPLES
Convert all PNG files to JPEG files using 4 simultaneous processes:
ls | grep '\.png$' | sed -e 's/\(.*\)png/convert & \1jpg/' |
parallel 4
Perform the same operation on bigserver and hugeserver, using more pro-
cesses:
ls | grep '\.png$' | sed -e 's/\(.*\)png/convert & \1jpg/' |
ssh_parallel $PWD bigserver 8 hugeserver 16
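Before running either example, it can be useful to check which command
lines the pipeline generates. The sketch below wraps the grep and sed
stages from the examples above in a hypothetical function, preview_jobs,
and feeds it made-up file names instead of the output of ls.

```shell
# Hypothetical helper: the filter stages of the examples above, without
# the final parallel or ssh_parallel stage.
preview_jobs() {
    grep '\.png$' | sed -e 's/\(.*\)png/convert & \1jpg/'
}

# Made-up file names; only those ending in .png produce a job line.
printf '%s\n' cat.png notes.txt dog.png | preview_jobs
```

Each line printed is exactly what parallel or ssh_parallel would receive
on its standard input and run in a shell.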
AUTHOR
Edward Rosten