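This article introduces hck, a command-line tool written in Rust.
For setting up a Rust environment and installing the tool, see the following.
hck is a close to drop-in replacement for the cut command. In addition to letting you specify the order of output columns with the same column selection syntax as cut, it accepts a regular expression as the delimiter instead of a fixed string.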
GitHub - sstadick/hck: A sharp cut(1) clone.
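No single feature of hck, taken on its own, makes it stand out over tools such as awk, cut, or xsv. Where hck excels is in making common tasks easy, such as reordering output fields or splitting records on an unusual delimiter.
Example: extracting fields (a case that cut does not handle well)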
% ps aux | hck -f1-3,5 | head -n4
USER PID %CPU VSZ
_windowserver 168 23.8 37042900
goichiiisaka 564 13.1 35055280
goichiiisaka 580 12.1 34518972
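To see why cut struggles with this kind of input: `cut -d' '` treats every single space as a separate delimiter, so the runs of spaces that `ps aux` uses for column alignment turn into empty fields. The sketch below uses a made-up sample row (not real `ps` output), and uses awk purely as a stand-in for splitting on runs of whitespace, which is what hck's default `\s+` delimiter does according to its help text.

```shell
# Made-up row with runs of spaces, shaped like one line of `ps aux` output.
line='alice      564  13.1  1.6 35055280'

# cut splits on every single space, so the padding after "alice"
# yields empty fields and field 2 comes back empty.
cut_field2=$(printf '%s\n' "$line" | cut -d' ' -f2)

# Splitting on runs of whitespace returns the PID as field 2.
# awk is only a stand-in here for hck's default `\s+` delimiter.
ws_field2=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut: [%s]\nwhitespace split: [%s]\n' "$cut_field2" "$ws_field2"
```

With cut you would first have to squeeze the spaces (for example with `tr -s ' '`), which is the extra step hck is meant to remove.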
Example: reordering fields
% ps aux | hck -f2,1,3,5 | head -n4
PID USER %CPU VSZ
168 _windowserver 21.4 37042900
564 goichiiisaka 12.6 35059024
66184 goichiiisaka 7.8 68339520
Example: excluding fields
% ps aux | hck -e3,5- | head -n4
USER PID %MEM
_windowserver 168 0.5
goichiiisaka 66184 2.7
goichiiisaka 564 1.6
The hck help message
% hck --help
* `delimiter` is a regex by default and a fixed substring with `-L`
* `header-fields` allows for specifying a literal or a regex to match header names to select columns
* both `header-fields` and `fields` order dictate the order of the output columns
* input files (not stdin) are automatically decompressed
* the output delimiter can be specified with `-D`
## Selection by headers
Instead of specifying fields to output by index ranges (i.e `1-2,4-`), you can specify a regex or string literal to select a headered column to output with the `-F` option. By default `-F` options are treated as string literals. To treat them as regexs add the `-r` flag.
## Ordering of outputs
*Values are written only once*. So for a `fields` value of `4-,1,5-8`, which translates to "print columns 4 through the end and then column 1 and then columns 5 through 8", columns 5-8 won't be printed again because they were already consumed by the `4-` range.
If `field-headers` is used as a regex then the headers will be grouped together in groups that all matched the same regex, and in the order of the regex as specified on the CLI.
Usage: hck [OPTIONS] [INPUT]...
Arguments:
[INPUT]...
Input files to parse, defaults to stdin.
If a file has a recognizable file extension indicating that it is compressed, and a local binary to perform decompression is found, decompression will occur automagically. This requires the `-z` flag.
Options:
-o, --output <OUTPUT>
Output file to write to, defaults to stdout
-d, --delimiter <DELIMITER>
Delimiter to use on input files, this is a regex by default. To treat it as a literal add the `-L` flag
[default: \s+]
-L, --delim-is-literal
Treat the delimiter as a string literal. This can significantly improve performance, especially for single byte delimiters
-I, --use-input-delim
Use the input delimiter as the output delimiter if the input is literal and no other output delimiter has been set
-D, --output-delimiter <OUTPUT_DELIMITER>
Delimiter string to use on outputs
[default: "\t"]
-f, --fields <FIELDS>
Fields to keep in the output, ex: 1,2-,-5,2-5. Fields are 1-based and inclusive
-e, --exclude <EXCLUDE>
Fields to exclude from the output, ex: 3,9-11,15-. Exclude fields are 1 based and inclusive. Exclude fields take precedence over `fields`
-E, --exclude-header <EXCLUDE_HEADER>
Headers to exclude from the output, ex: '^badfield.*$'. This is a string literal by default. Add the `-r` flag to treat as a regex
-F, --header-field <HEADER_FIELD>
A string literal or regex to select headers, ex: '^is_.*$'. This is a string literal by default. Add the `-r` flag to treat it as a regex
-r, --header-is-regex
Treat the header_fields as regexs instead of string literals
-z, --try-decompress
Try to find the correct decompression method based on the file extensions
-Z, --try-compress
Try to gzip compress the output
-t, --compression-threads <COMPRESSION_THREADS>
Threads to use for compression, 0 will result in `hck` staying single threaded
[default: 4]
-l, --compression-level <COMPRESSION_LEVEL>
Compression level
[default: 6]
--no-mmap
Disallow the possibility of using mmap
--crlf
Support CRLF newlines
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
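The "Ordering of outputs" rule in the help text above (each value is written only once, in the order the ranges are given) can be illustrated with a small emulation. This is a sketch of the rule as the help text describes it, not hck's actual implementation. The function name `select_once` is hypothetical, and awk is used only to keep the example self-contained.

```shell
# Hypothetical helper: emulate the "values are written only once" rule.
# $1 is a field spec such as "4-,1,5-8"; input lines are split on whitespace.
select_once() {
  awk -v spec="$1" '{
    n = split(spec, parts, ",")
    out = ""
    split("", seen)                       # reset the per-line "already printed" set
    for (i = 1; i <= n; i++) {
      lo = parts[i]; hi = parts[i]        # single field such as "1"
      if (index(parts[i], "-") > 0) {     # range such as "4-", "-5" or "5-8"
        split(parts[i], r, "-")
        lo = (r[1] == "") ? 1 : r[1]      # open start means first column
        hi = (r[2] == "") ? NF : r[2]     # open end means last column
      }
      for (f = lo + 0; f <= hi + 0 && f <= NF; f++)
        if (!(f in seen)) {               # skip columns that were already written
          seen[f] = 1
          out = out (out == "" ? "" : "\t") $f
        }
    }
    print out
  }'
}

# Columns 4-8 come first, then column 1; the 5-8 range adds nothing new.
printf 'a b c d e f g h\n' | select_once '4-,1,5-8'
```

With the spec `4-,1,5-8` on an eight-column line, the emulation emits columns 4 through 8 followed by column 1, because the `5-8` range refers only to columns already consumed by `4-`.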