Line Length Distribution in Files
Original post is here eklausmeier.goip.de/blog/2022/11-12-line-length-distribution-in-files.
When processing input files I have to check whether they share a common record format. To do this I compute the length of each line (record) in the input file.
1. Perl solution. The program below reads the input file and prints a histogram: each line length together with its frequency.
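#!/bin/perl -W
# Histogram of line lengths
use strict;
my %H;    # line length => number of lines with that length
while (<>) {
    $H{length($_)} += 1;    # length includes the trailing newline
}
# Print the lengths in ascending numeric order, each with its count
for (sort {$a <=> $b} keys %H) {
    printf("%5d\t%d\n",$_,$H{$_});
}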
2. Perl one-liner. A simple Perl program can often be converted into a Perl one-liner. See for example Introduction to Perl one-liners, written by Peteris Krumins. Also see Useful One-Line Scripts for Perl.
perl -ne '$H{length($_)} += 1; END { printf("%5d\t%d\n",$_,$H{$_}) for (sort {$a <=> $b} keys %H); }' <yourFile>
Example usage:
printf "\n\na\nab\nabc\n" | perl -ne '$H{length($_)} += 1; END { printf("%5d\t%d\n",$_,$H{$_}) for (sort {$a <=> $b} keys %H); }'
gives the following output, where each length includes the trailing newline character:
1 2
2 1
3 1
4 1
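3. Awk solution. If Perl is not available, then hopefully Awk is installed. The Awk program below accomplishes much the same. Two differences: Awk's length($0) does not count the trailing newline, and the program lists every length from 0 up to the maximum, not only the lengths that occur.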
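#!/bin/awk -f
# Histogram of line lengths; length($0) excludes the trailing newline
function max(a,b) {
    return a>b ? a : b
}
{ m = max(length($0),m); x[length($0)] += 1 }
END {
    # x[i]+0 prints 0 instead of an empty field for lengths that never occur
    for (i=0; i<=m; ++i)
        print i, x[i]+0
}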
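To come back to the opening question, the histogram can be reduced to a yes/no answer. The sketch below assumes that "common record format" simply means that all lines have the same length. The function name has_common_record_length is hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: exit status 0 if all input lines have the same
# length (or the input is empty), exit status 1 otherwise.
has_common_record_length() {
    awk '{ x[length($0)] += 1 }
         END {
             n = 0
             for (k in x) n++    # n = number of distinct line lengths
             exit (n > 1)        # status 1 when the lengths differ
         }' "$@"
}
```

It can then be used in a condition, for example: if has_common_record_length yourFile; then echo "fixed-length records"; fi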