UHASH

Section: UHASH (1)
Updated: 13 July 2006

 

NAME

uhash - uber-hash

 

SYNOPSIS

uhash [options] [file...]

 

DESCRIPTION

uhash may be used to calculate multiple hashes (a.k.a. checksums or message digests) of the given files "simultaneously", that is: each of the selected hash functions is applied to the data, one after another, without reading the same data more than once. This makes it easy to hash pipes and similar mechanisms that pass data in real time. Using several hash algorithms may further protect the integrity of data, as it is harder to find a meaningful collision for every one of the algorithms used at once. To understand the security implications, see SECURITY CONSIDERATIONS below.
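The single-pass idea can be sketched in Python as follows. This is an illustration built on the standard hashlib module, not uhash's own implementation; the function name multi_hash and the 64 KiB chunk size are assumptions. Each chunk is read once and fed to every hash object in turn, so a pipe can be hashed without buffering the whole stream. Note that hashlib only guarantees md5, sha1 and the SHA-2 family; names such as ripemd160 or md4 depend on the underlying OpenSSL build.

```python
import hashlib
from typing import BinaryIO, Dict, Iterable

def multi_hash(stream: BinaryIO, names: Iterable[str],
               chunk_size: int = 65536) -> Dict[str, str]:
    """Compute several digests of a stream while reading it only once."""
    # One hash object per requested algorithm.
    hashers = {name: hashlib.new(name) for name in names}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of file (or of the pipe)
            break
        # Apply each hash function to the same chunk, one after another.
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, multi_hash(sys.stdin.buffer, ["md5", "sha1", "sha256"]) hashes a pipe once with three algorithms.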

Uhash has been designed to be both fast and secure, using mmap() to minimize I/O when possible. It is secure by default: every temporary buffer is wiped after use. If security is not a concern, the -f option makes the program run even faster. Note that other sensitive resources, such as keys, are always wiped and handled with care, whether or not the -f option is specified.
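A minimal Python sketch of the two I/O paths and of buffer wiping, assuming the default mmap rule quoted under the -M option (map only files smaller than twice the page size). The function hash_file and its parameters are hypothetical stand-ins for -f, -m and -M, not uhash's API. Wiping in Python is best-effort only, since the interpreter may keep copies of data elsewhere; a C program has much tighter control over this.

```python
import hashlib
import mmap
import os
import stat
from typing import Optional

def hash_file(path: str, name: str = "sha256", fast: bool = False,
              use_mmap: Optional[bool] = None, chunk_size: int = 65536) -> str:
    """Hash one file, loosely mirroring the -f, -m and -M switches.

    use_mmap=None applies the default rule quoted under -M (map only files
    smaller than twice the page size); True is like -M, False is like -m.
    """
    h = hashlib.new(name)
    with open(path, "rb") as f:
        st = os.fstat(f.fileno())
        # Only non-empty regular files can be mapped (not pipes, for instance).
        mappable = stat.S_ISREG(st.st_mode) and st.st_size > 0
        if use_mmap is None:
            use_mmap = st.st_size < 2 * mmap.PAGESIZE
        if use_mmap and mappable:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                h.update(m)  # hash straight from the mapping
            return h.hexdigest()
        buf = bytearray(chunk_size)  # one reusable temporary buffer
        view = memoryview(buf)
        try:
            while True:
                n = f.readinto(buf)
                if not n:
                    break
                h.update(view[:n])  # slice of the view: no extra copy
        finally:
            view.release()
            if not fast:
                buf[:] = bytes(len(buf))  # wipe, unless running "fast" (-f)
    return h.hexdigest()
```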

Currently supported algorithms are:

  Algorithm    Output size (bits)
  MD5          128
  MD4          128
  RIPEMD-160   160
  SHA-1        160
  SHA-224      224
  SHA-256      256
  SHA-384      384
  SHA-512      512
  TIGER        192
  GOST         256
  WHIRLPOOL    512

HMAC (RFC-2104) is supported on them all.
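HMAC is short enough to sketch. The Python code below builds HMAC as described in RFC 2104, H((K XOR opad) || H((K XOR ipad) || text)), on top of hashlib; key_from_hex mirrors what the -X option describes (a hexadecimal key converted to binary). Both function names are hypothetical, and Python's standard hmac module produces the same results.

```python
import hashlib

IPAD, OPAD = 0x36, 0x5C  # inner/outer pad bytes from RFC 2104

def hmac_rfc2104(key: bytes, text: bytes, name: str = "md5") -> str:
    """HMAC(K, text) = H((K ^ opad) || H((K ^ ipad) || text)), as hex."""
    block = hashlib.new(name).block_size  # B in RFC 2104 (64 bytes for MD5/SHA-1)
    if len(key) > block:
        key = hashlib.new(name, key).digest()  # long keys are hashed first
    key = key.ljust(block, b"\x00")            # then zero-padded to B bytes
    inner = hashlib.new(name, bytes(b ^ IPAD for b in key) + text).digest()
    return hashlib.new(name, bytes(b ^ OPAD for b in key) + inner).hexdigest()

def key_from_hex(hex_key: str) -> bytes:
    """Like -X: interpret the key as an ASCII hexadecimal string."""
    return bytes.fromhex(hex_key)
```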

 

OPTIONS

-a algorithms
Specify the algorithms to use (see ALGORITHM STRING below).
-B directory
Hash the files existing in the given base directory.
-F format
Use the given output format string (see FORMAT STRING below).
-H
Apply HMAC as per RFC-2104. You must specify a key (see the -k and -X options below).
-X
The key is an ASCII hexadecimal string that must be converted to binary.
-k key
Use the given key for HMAC. This option is insecure, since the key may be visible to other users (e.g., in the process list) and recorded in your shell history. It's better to let uhash ask for it interactively.
-c file
Check digests against the list in the given file. The file must be the output of this program, of GNU md5sum(1) or sha1sum(1), of OpenSSL's dgst(1), or of BSD's md5(1), rmd160(1), sha1(1) or sha256(1). Use the -n option to report only the files that didn't match, and the -v option at least once to report both the computed hash and the expected one.
-e file
Append error messages to the given file. You may specify "" to suppress them entirely (i.e., much like shell redirection to /dev/null).
-E file
Write error messages to the given file (overwrite). See the -e option above.
-i file
Hash the files specified in the given file.
-o file
Append output to the given file.
-O file
Write output to the given file (overwrite).
-f
Fast, insecure: temporary buffers won't be zeroed.
-h
Print comprehensive help.
-L
Used with the -R or -r options, process symbolic links to regular files as well.
-m
Do NOT use mmap(). There are cases where mmap(2) is slower than read(2). Uhash tries to detect whether a file can be mmapped or not (pipes, for example, cannot).
-M
Use mmap(). By default, only files smaller than twice the page size are mmap()ed. See getpagesize().
-N
Report files that don't exist (either directly or referenced by symbolic links). Used with the -c, -i, -R and -r options.
-n
Report only those files that didn't match their stored digests. Used with the -c option.
-p
Echo stdin to stdout and append the checksum.
-q
Quiet: only the checksum is printed out.
-r
Recursive (don't descend into other filesystems). Only regular files are processed, and no symbolic links are followed unless they are specified in the command line. Use the -L option if you want to process symbolic links to regular files as well.
-R
Deep recursion (descend into other filesystems). Only regular files are processed, and no symbolic links are followed unless they are specified in the command line. This is the best approach when you want to scan your root directory. Use the -L option if you want to process symbolic links to regular files as well.
-s string
Print a checksum of the given string.
-x hexstring
Print a checksum of the given hexadecimal string.
-v
Verbose operation. This option may be specified multiple times.
-z
Print file size. Useful with FreeBSD's /usr/ports to store the size of files along with the computed checksums. In the future, more information from the stat structure may be printed as well, but we're not Tripwire.
-T
Perform a time test trial.
-V
Show version of this program as well as the name and version of the OS, date and time of compilation, machine information (model, endianness and word-size), algorithms supported and version of the OpenSSL library if any. This flag, if specified more than once, gives additional information on the site's OpenSSL configuration such as time of compilation, compiler flags, etc.
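The -c option accepts several check-file layouts. The Python sketch below shows one way such lines could be recognized; it is not uhash's parser, and the names Entry and parse_check_line are hypothetical. It handles the BSD style "MD5 (file) = hex", OpenSSL's "MD5(file)= hex", and GNU's "hex  file" (or "hex *file" for binary mode). GNU lines do not name the algorithm, so a checker must infer it some other way (the digest length alone is ambiguous, e.g. MD5 versus MD4).

```python
import re
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    algorithm: Optional[str]  # None when the line format does not name it
    path: str
    digest: str

# BSD md5(1)/sha1(1): "MD5 (file) = hex"; OpenSSL dgst(1): "MD5(file)= hex"
BSD_RE = re.compile(r"^([A-Za-z0-9-]+) ?\((.*)\) ?= ?([0-9A-Fa-f]+)$")
# GNU md5sum(1)/sha1sum(1): "hex  file" (text mode) or "hex *file" (binary mode)
GNU_RE = re.compile(r"^([0-9A-Fa-f]+) [ *](.*)$")

def parse_check_line(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None if the line matches no known layout."""
    line = line.rstrip("\r\n")
    m = BSD_RE.match(line)
    if m:
        return Entry(m.group(1).upper(), m.group(2), m.group(3).lower())
    m = GNU_RE.match(line)
    if m:
        return Entry(None, m.group(2), m.group(1).lower())
    return None
```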

 

ALGORITHM STRING

[ , ][ - | + ] keyword [ , [ - | + ] keyword ] ...

keyword: [ hmac_ | hmac- ] keywords

keywords: all, default, none, ssl, algorithm, bitsize

algorithms: (use the -V option to get an accurate list)
sha512, sha384, sha256, sha224, sha1, rmd160, tiger, md5, md4, gost, whirlpool

bitsize: decimal_number [ - | + | # | = ]

shortcuts: ll, hmac

ll: makes it easy to use "-all" instead of "-a all"

hmac: equal to "hmac-all"

Description: Keywords are separated by commas. The algorithm string is read from left to right, building a set of algorithms that is initially empty. A leading '-' (minus) subtracts elements from this set, and an optional leading '+' (plus) adds elements to it. The special keywords all and none are pretty much self-explanatory. If uhash was built with OpenSSL support, the ssl keyword may be used to reference the algorithms provided by the OpenSSL library. The special keyword default is the set of algorithms used when no algorithm string is specified (either through the -a option or the UHASH_HASHES environment variable). A bitsize may be specified as well; it is compared against the output size, in bits, of each supported algorithm. By default it tests for equality, but a trailing modifier selects another test: a trailing '+' matches any algorithm with an output size greater than the given bitsize, a trailing '-' matches those with an output size smaller than it, and a trailing '#' (pound) tests for inequality; for the sake of completeness, an optional trailing '=' (equal) may be used too. This format is flexible enough to express "greater than or equal": for example, both "160,160+" and "159+" refer to the algorithms with an output size of at least 160 bits.
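The left-to-right set evaluation can be modeled in a few lines of Python. This is a simplified sketch, not uhash's parser: it covers all, none, exact algorithm names and bitsizes with their modifiers, but omits ssl, default, the hmac- prefix and abbreviations, whose meaning depends on how uhash was built. Treating none as the empty set is an assumption. The output sizes come from the table under DESCRIPTION.

```python
from typing import Dict, Set

# Output sizes in bits, taken from the table under DESCRIPTION.
SIZES: Dict[str, int] = {
    "md5": 128, "md4": 128, "rmd160": 160, "sha1": 160, "sha224": 224,
    "sha256": 256, "sha384": 384, "sha512": 512, "tiger": 192,
    "gost": 256, "whirlpool": 512,
}

def match_bitsize(term: str) -> Set[str]:
    """'256'/'256=' equal, '256+' greater, '256-' smaller, '256#' not equal."""
    op = term[-1] if term[-1] in "+-#=" else "="
    n = int(term.rstrip("+-#="))
    tests = {
        "=": lambda bits: bits == n,
        "+": lambda bits: bits > n,
        "-": lambda bits: bits < n,
        "#": lambda bits: bits != n,
    }
    return {name for name, bits in SIZES.items() if tests[op](bits)}

def evaluate(spec: str) -> Set[str]:
    """Evaluate an algorithm string left to right, starting from the empty set."""
    selected: Set[str] = set()
    for term in spec.lower().split(","):
        sign = "+"
        if term[:1] in ("+", "-"):
            sign, term = term[0], term[1:]
        if not term:
            continue  # e.g. the optional leading comma
        if term == "all":
            items = set(SIZES)
        elif term == "none":
            items = set()  # assumption: "none" names the empty set
        elif term[0].isdigit():
            items = match_bitsize(term)
        else:
            items = {term} & set(SIZES)  # exact names only; no abbreviations here
        selected = selected | items if sign == "+" else selected - items
    return selected
```

With this model, evaluate("160,160+") and evaluate("159+") both yield every algorithm except MD5 and MD4, matching the "greater than or equal" example above.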

Note: The algorithm string is case-insensitive, and it must not contain whitespace. Algorithms may be abbreviated, so you may use "whirl" or "w" instead of the much longer "whirlpool". An abbreviation may match several algorithms: "sha" selects all the SHA algorithms that were compiled in. See EXAMPLES below.
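One plausible reading of abbreviations is case-insensitive prefix matching, sketched below in Python. The prefix rule is an assumption inferred from the examples ("w", "whirl", "sha"); uhash's actual matching may differ, and its real list of names depends on the build (see the -V option). Keywords such as all would have to be checked before this expansion.

```python
from typing import List

# Algorithm names as listed under ALGORITHM STRING.
NAMES: List[str] = ["sha512", "sha384", "sha256", "sha224", "sha1", "rmd160",
                    "tiger", "md5", "md4", "gost", "whirlpool"]

def expand(abbrev: str) -> List[str]:
    """Return every algorithm whose name starts with the abbreviation, ignoring case."""
    prefix = abbrev.lower()
    return [name for name in NAMES if name.startswith(prefix)]
```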

 

FORMAT STRING

The format string is very much like printf(), but simpler (no fancy modifiers or precision digits, yet). The default format string is the one used by BSD's md5(1), sha1(1), etc.: "%H (%f) = %x\n". For example:

MD5 (/etc/fstab) = da98e437f145e6db5f2ca36aedba0f17

Currently supported modifiers are:

%H
Name of the algorithm in uppercase (e.g., "MD5", "HMAC-SHA1", etc.)
%h
Name of the algorithm in lowercase (e.g., "md5", "sha512", "hmac-rmd160", etc.)
%f
Pathname of the file being hashed. It may be either relative or absolute.
%F
The filename portion of the file being hashed. See basename(3).
%s
The string being hashed. Note: %f and %F may be used too, but they are treated as %s when no file is being hashed (basename(3) is not applied, even if the string contains slashes).
%X
The message digest, ASCII hexadecimal representation in uppercase.
%x
Same as above, but in lowercase.
%%
A literal '%'.

The usual C backslash escapes listed below are recognized as well; any other backslash escapes and % sequences are copied literally.

\a \b \f \n \r \t \v \' \" \\ \? \xXX (hexadecimal) and \OOO (octal)
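The rules above can be illustrated with a small Python expander. It is a sketch, not uhash's code: expand_format and its parameters are hypothetical, %s simply prints the subject (an assumption for the file case, which the page does not spell out), \x takes at most two hex digits and \OOO at most three octal digits, and anything unrecognized is copied literally.

```python
import os

SIMPLE_ESCAPES = {"a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r",
                  "t": "\t", "v": "\v", "'": "'", '"': '"', "\\": "\\", "?": "?"}
HEX = "0123456789abcdefABCDEF"
OCT = "01234567"

def expand_format(fmt: str, algo: str, digest: bytes,
                  subject: str, is_file: bool = True) -> str:
    """Expand a uhash-style format string (illustrative sketch only)."""
    values = {
        "H": algo.upper(),
        "h": algo.lower(),
        "f": subject,
        # No basename(3) for strings, even if they contain slashes.
        "F": os.path.basename(subject) if is_file else subject,
        "s": subject,  # assumption: %s simply prints the subject
        "X": digest.hex().upper(),
        "x": digest.hex(),
        "%": "%",
    }
    out = []
    i, n = 0, len(fmt)
    while i < n:
        c = fmt[i]
        nxt = fmt[i + 1] if i + 1 < n else ""
        if c == "%" and nxt and nxt in values:
            out.append(values[nxt])
            i += 2
        elif c == "\\" and nxt and nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif c == "\\" and nxt == "x" and i + 2 < n and fmt[i + 2] in HEX:
            j = i + 2
            while j < n and j < i + 4 and fmt[j] in HEX:  # at most two hex digits
                j += 1
            out.append(chr(int(fmt[i + 2:j], 16)))
            i = j
        elif c == "\\" and nxt and nxt in OCT:
            j = i + 1
            while j < n and j < i + 4 and fmt[j] in OCT:  # at most three octal digits
                j += 1
            out.append(chr(int(fmt[i + 1:j], 8)))
            i = j
        else:
            out.append(c)  # anything else is copied literally
            i += 1
    return "".join(out)
```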

 

ENVIRONMENT VARIABLES

UHASH_HASHES
Algorithm string to use in case none is specified in the command line with the -a option. See ALGORITHM STRING above.
UHASH_FORMAT
Format string to use in case none is specified in the command line with the -F option. See FORMAT STRING above.
UHASH_VERBOSE
Non-negative integer specifying desired verbosity. See the -v option above.
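The precedence described above (command line first, then the environment variable, then the built-in default) can be sketched as follows. The helper names are hypothetical; the fallbacks used in the usage note are the default keyword and the documented default format string. How UHASH_VERBOSE combines with repeated -v options is not specified here, so the sketch only validates the variable.

```python
import os
from typing import Mapping, Optional

DEFAULT_FORMAT = "%H (%f) = %x\\n"  # documented default, backslash escape unexpanded

def resolve(cli_value: Optional[str], env_name: str, fallback: str,
            env: Mapping[str, str] = os.environ) -> str:
    """Command-line value first, then the environment variable, then the fallback."""
    if cli_value is not None:
        return cli_value
    return env.get(env_name, fallback)

def parse_verbosity(value: str) -> int:
    """UHASH_VERBOSE must hold a non-negative integer."""
    level = int(value)
    if level < 0:
        raise ValueError("UHASH_VERBOSE must be non-negative")
    return level
```

For example, resolve(None, "UHASH_HASHES", "default") returns the value of UHASH_HASHES if it is set, and the keyword default otherwise.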

 

EXAMPLES

1. To quickly check the tarballs (distfiles) of your FreeBSD ports:

find /usr/ports -name distinfo | xargs uhash -B/usr/ports/distfiles/ -c

2. Hash a file with all supported algorithms, minus SHA512 and MD5:

uhash -a all,-sha512,-md5 /etc/fstab

3. Hash a very large file with all available SHA algorithms and RIPEMD-160:

uhash -vvv -a sha,rmd160 ~/downloads/cdimage.iso

4. Check hashes and report only those that didn't match:

uhash -vvv -nc ~/hashes.txt

 

KNOWN ISSUES

1. The -R and -r options only process regular files (and symbolic links pointing to regular files, with the -L option). Otherwise, no symbolic links are followed unless they are specified in the command line. The -R option performs deep recursion across filesystems, so there is a good reason not to follow symlinks when you scan the root filesystem, for example. Use find(1) when you want fine-grained control over the files to be processed, preferably to generate a list of files that you may feed to uhash via the -i option. The slower alternative of using find(1)'s -exec option involves a lot of fork() overhead, and xargs(1) has its functionality bounded by the maximum number of bytes that may be passed through the exec*() calls, usually ARG_MAX in <limits.h>.

2. Special files such as block and char devices, pipes, Unix domain sockets, etc., are processed only when they are specified in the command line. Processing them while recursively scanning a directory would halt
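The traversal rules in item 1 can be approximated with a short Python generator, shown here as a sketch under stated assumptions: walk_regular_files and its flags are hypothetical analogues of -r, -R and -L, a filesystem boundary is detected by comparing st_dev with that of the starting directory, and directory symlinks are never followed. Devices, FIFOs and sockets are skipped, in line with item 2.

```python
import os
import stat
from typing import Iterator, List

def walk_regular_files(top: str, cross_devices: bool = False,
                       follow_links: bool = False) -> Iterator[str]:
    """Yield regular files below top.

    cross_devices=False resembles -r, True resembles -R; follow_links
    resembles -L (symlinks to regular files only, never to directories).
    """
    root_dev = os.stat(top).st_dev  # top itself may be a symlink given by the user
    stack: List[str] = [top]
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as it:
                entries = sorted(it, key=lambda e: e.name)
        except OSError:
            continue  # unreadable directory: skip it
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)  # lstat(): do not follow
            except OSError:
                continue
            if stat.S_ISDIR(st.st_mode):
                if cross_devices or st.st_dev == root_dev:
                    stack.append(entry.path)
            elif stat.S_ISREG(st.st_mode):
                yield entry.path
            elif stat.S_ISLNK(st.st_mode) and follow_links:
                try:
                    target = os.stat(entry.path)
                except OSError:
                    continue  # dangling link
                if stat.S_ISREG(target.st_mode):
                    yield entry.path
            # Devices, FIFOs and sockets are skipped, so reading cannot block on them.
```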

 

SECURITY CONSIDERATIONS

1. Never use hash functions to "protect" sensitive data, as they are only meant to verify integrity. Be aware that if you use multiple hash functions to "secure" data in this way, the weakest (least secure and/or fastest) algorithm could be attacked to find probable collisions, i.e., a plaintext attack. Use encryption instead, and then hash the ciphertext. Do not just blindly encrypt and hash (or HMAC) the same plaintext data, as the above consideration applies.

2. In addition to integrity checking and confidentiality (features of hash functions and encryption algorithms respectively), HMAC may provide authentication (see RFC-2104 for details). Search the Internet for information on the "encrypt then authenticate", "encrypt and authenticate" and "authenticate then encrypt" schemes, and why the first (EtA) is the most secure.
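The authentication half of "encrypt then authenticate" can be sketched in Python with the standard hmac module. The sketch assumes the ciphertext was produced by some cipher outside its scope (Python's standard library provides none) and that a separate key is used for the MAC; the function names are hypothetical. The key point is that the tag covers the ciphertext and is checked, in constant time, before any decryption is attempted.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256

def authenticate(ciphertext: bytes, mac_key: bytes) -> bytes:
    """Encrypt-then-authenticate: the MAC covers the ciphertext, not the plaintext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def verify(blob: bytes, mac_key: bytes) -> bytes:
    """Check the tag before any decryption; return the ciphertext if it is authentic."""
    if len(blob) < TAG_LEN:
        raise ValueError("message too short")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed")
    return ciphertext
```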

 

SEE ALSO

dgst(1), md5sum(1), sha1sum(1)

 

AUTHORS

uhash was written by Ighighi: ighighi [at] gmail.com. Bug reports, patches and suggestions are welcome.


This document was created by man2html, using the manual pages.
Time: 11:39:49 GMT, July 13, 2006