Release of dump package
Jun. 18th, 2009 | 01:49 pm
It supports versions two through four of the Linux kernel's extended file system (ext2, ext3, ext4). Technically, it is considered beta software, but I have reason to believe there are a lot of people who use it in production systems. You can also use it for disk-based backups. You don't need to have a tape jukebox to use it, although it is designed for use with tape backups.
Shell hack: Files with some DOS lines
May. 19th, 2009 | 11:22 am
I came across a project whose source code contains both DOS text files and Unix text files. Some of the Unix files contain a few lines with carriage return line endings. Or perhaps they were DOS files with a few Unix line endings! I wanted to suggest converting those files with mixed line endings to Unix.
Sometimes, the file command is helpful for showing what files have a mixed end of line style, but not always. For example, the file command will say "ASCII C program text, with CRLF, LF line terminators". That's perfect. However, sometimes the command just says, "PHP script text".
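I wrote this find expression to list files that contain DOS carriage returns but are not entirely DOS files. (The grep pattern uses Bash's $'\r' quoting for a literal carriage return; at an interactive prompt you can also type one with Ctrl-V Ctrl-M.)
$ find -type f -execdir grep -q $'\r$' {} \; \
! -execdir awk 'BEGIN{is_dos=1;}!/\r$/{is_dos=0}END{exit(!is_dos);}' {} \; \
-print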
The above doesn't work, since many DOS files don't end with a final line terminator (neither carriage return nor newline) the way Unix text files do.
Awk still treats that last, unterminated line as a line, but since it has no carriage return, the logic I've written decides the file is not a DOS file. That is a false negative for the DOS test, so an entirely DOS file gets listed as mixed.
This change to the Awk script makes this hack work as it should.
$ find -type f -execdir grep -q $'\r$' {} \; \
! -execdir awk 'BEGIN{is_dos=1;}
!/\r$/ && is_dos{is_dos=0;n=NR}
END{exit(!is_dos && n != NR);}' {} \; \
-print
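The same test can be sketched as a stand-alone function, which is easier to try on a single file. This is only an illustration under my own assumptions: the name classify_eol is made up, and like the Awk script above it cannot tell whether the final line had a newline, so a lone final line without a carriage return is tolerated in an otherwise DOS file.

```shell
#!/bin/sh
# classify_eol FILE  (hypothetical helper name, not from the original post)
# Prints "unix", "dos", or "mixed" depending on FILE's line endings.
classify_eol() {
    awk '
        /\r$/ { crlf++; next }      # line ends in CR before the newline
              { lf++; last = NR }   # line has no trailing CR
        END {
            # Tolerate a single CR-less line if it is the very last one,
            # mirroring the n != NR check in the find expression.
            if (crlf > 0 && lf == 1 && last == NR) lf = 0
            if (crlf > 0 && lf > 0) print "mixed"
            else if (crlf > 0)      print "dos"
            else                    print "unix"
        }' "$1"
}
```

Running classify_eol on each file reported by find is a quick way to double-check the list before converting anything.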
Shell hack: Avoiding built-ins
May. 1st, 2009 | 04:40 am
To avoid using a builtin command of a Bourne or Bash shell in a shell script, one can use the full path of the executable command. For example, rather than
$ echo Hello, World\!
Hello, World!
you could use
$ /bin/echo Hello, World\!
Hello, World!
Here's a way to show the difference--and make fun of the GNU coding standards at the same time.
$ echo --version
--version
$ /bin/echo --version
echo (GNU coreutils) 6.12
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Brian Fox and Chet Ramey.
I prefer using exec to using the full path of a command, so that the PATH environment variable is used and the script keeps working should the full path to a binary change some day.
Unfortunately, a consequence of exec is that it runs the command in the current process and therefore will exit on completion, thus cutting short the life of your shell script. To avoid that, just wrap an exec statement in a sub-shell by using parens:
$ ( exec echo --version )
echo (GNU coreutils) 6.12
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Brian Fox and Chet Ramey.
I have never seen this written in a script before. Perhaps there's another, more canonical way to do this. The construct reads as redundant and contradictory: "exec something in the current shell, but also in a sub-shell". Further, it's almost always right to opt for the shell built-in. There are few to no cases where you want to avoid it; my only scenarios involve timing processes in the shell.
According to the Limitations of Shell Builtins section of the GNU Autoconf manual,
When it is desired to avoid a regular shell built-in, the workaround is to use some other forwarding command, such as env or nice, that will ensure a path search:
$ pdksh -c 'exec true --version' | head -n1
$ pdksh -c 'nice true --version' | head -n1
true (GNU coreutils) 6.10
$ pdksh -c 'env true --version' | head -n1
true (GNU coreutils) 6.10
That manual has everything in it. I guess I'll go with env. It doesn't sound as nice as "exec", but it's a good mnemonic, since it uses the environment's PATH variable to run the command.
$ env echo --version
echo (GNU coreutils) 6.12
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Brian Fox and Chet Ramey.
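For use inside a script, the env trick could be wrapped in a small function. This is just a sketch: run_external and demo are names I made up. It shadows echo with a shell function rather than relying on a built-in's behavior, since functions are also found before the PATH search. One caveat of env: a first argument containing "=" would be read as a variable assignment, not a command name.

```shell
#!/bin/sh
# run_external CMD [ARG...]  (hypothetical helper name)
# Runs CMD through a PATH search via env, bypassing any shell built-in
# or function of the same name, without replacing the current shell.
run_external() {
    env "$@"
}

# demo  (hypothetical) shows the difference inside a sub-shell, so the
# shadowing function does not leak into the rest of the script.
demo() {
    (
        echo() { printf 'shadowed\n'; }   # shadow the name "echo"
        echo hello                # the function runs: prints "shadowed"
        run_external echo hello   # the echo on PATH runs: prints "hello"
    )
}
```

Unlike the ( exec ... ) construct, this costs no extra sub-shell of its own; env simply executes the program it finds.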
Shell hack: Date work
Apr. 26th, 2009 | 01:21 am
Needed to make some Apache redirects for some links on a Unix users' group Web site I maintain. The new site is based on a wiki, and a member of the group moved all the pages with meeting announcements by hand, using more readable page names. The old pages had the date as a four-digit year and two-digit month followed by the two-digit day (for example, 20061219). The new pages have the spelled-out version of the week day and month (for example, Tuesday, December 19, 2006).
Here's a sample of what I needed for the .htaccess file.
Redirect /group/meeting-20061219.html http://host.org/group/wiki/index.php/Tuesday,_December_19,_2006
Redirect /group/meeting-20070417.html http://host.org/group/wiki/index.php/Tuesday,_April_17,_2007
Redirect /group/meeting-20070515.html http://host.org/group/wiki/index.php/Tuesday,_May_15,_2007
Redirect /group/meeting-20070717.html http://host.org/group/wiki/index.php/Tuesday,_July_17,_2007
Redirect /group/meeting-20071128.html http://host.org/group/wiki/index.php/Wednesday,_November_28,_2007
Redirect /group/meeting-20080618.html http://host.org/group/wiki/index.php/Wednesday,_June_18,_2008
I could do this by hand, but I'd rather get a shell script to do it right the first time. I found it easy to do with an extended grep expression, a Perl substitution, awk and the date command that comes with GNU coreutils.
$ ls -1 \
| grep -Ee '[0-9]{8}\.html$' \
| perl -pe 's/([0-9]{4})([0-9]{2})([0-9]{2})\.html$/$&\t$1-$2-$3/' \
| awk '{printf $1 "\t";
system("date +\"%A,_%B_%e,_%Y\" -d " $2);}' \
| awk '{print "Redirect", "/group/" $1,
"http://host.org/group/wiki/index.php/" $2;}'
I'm thankful I consistently used a file naming convention with the old site.
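For a single file name, the same conversion can be sketched as a plain shell function. The name redirect_line is hypothetical; the sketch assumes GNU date (for -d and the %-d format, which drops the padding that %e adds to single-digit days) and forces the C locale so the day and month names come out in English.

```shell
#!/bin/sh
# redirect_line FILE  (hypothetical helper; assumes GNU date)
# Prints one Redirect line for a file named like meeting-YYYYMMDD.html.
redirect_line() {
    # Turn the trailing eight digits into YYYY-MM-DD; empty if no match.
    ymd=$(printf '%s\n' "$1" |
        sed -n 's/.*\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\.html$/\1-\2-\3/p')
    [ -n "$ymd" ] || return 1
    printf 'Redirect /group/%s http://host.org/group/wiki/index.php/%s\n' \
        "$1" "$(LC_ALL=C date -d "$ymd" +'%A,_%B_%-d,_%Y')"
}

redirect_line meeting-20061219.html
```

Looping over the directory listing with this function would give the same .htaccess lines as the pipeline, one date call per file.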
Shell hack: Min function
Mar. 27th, 2009 | 03:06 pm
I couldn't find a minimum function for my shell scripting, nor a utility on GNU/Linux, so I am using this function.
###
# min NUM ...
#
# Find smallest value of NUMs.
##
function min()
{
    echo "$@" | tr '[:space:]' '\n' \
    | grep -Ee '^-?[[:digit:],]+(\.[[:digit:]]*)?$' \
    | sort -n | sed 1q
} ## end min
It supports floating point numbers with decimal notation, but does not support exponential notation or other formats.
Example:
min 1 0.2 1,023.56 -0 -1.3
Gives: -1.3
A max function is the same thing, except for making one of these changes:
- sort -nr in place of sort -n
- sed '$p;d' in place of sed 1q
- sed -n '$p' in place of sed 1q
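Spelled out, the last variant might look like the sketch below. It is written for plain sh, and it makes two assumptions of my own: the sort runs in the C locale so the decimal point is always ".", and comma-grouped numbers are left out, because how sort -n treats them depends on the locale.

```shell
#!/bin/sh
###
# max NUM ...
#
# Find largest value of NUMs (sketch; no comma-grouped numbers).
##
max() {
    printf '%s\n' "$@" | tr '[:space:]' '\n' \
    | grep -E '^-?[[:digit:]]+(\.[[:digit:]]*)?$' \
    | LC_ALL=C sort -n | sed -n '$p'
}

max 1 0.2 -0 -1.3 7.5    # prints 7.5
```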
Load all meta data of files into PostgreSQL
Dec. 23rd, 2008 | 03:56 pm
In the previous installment, I quickly showed how to use GNU Findutils to load file system meta information into a PostgreSQL database. I did so using a comma separated value (CSV) file generated from a tab delimited file. The use of tabs limited the data set to files without newlines or tabs in their names. Here I will show how to load any file name.
The find command can output all possible file names by separating the fields in the output with nulls, and the records by double nulls.
Here's the new -printf statement that outputs nulls between fields, and two nulls between records.
$ find / -printf '%i\0%f\0%p\0%h\0%y\0%u\0%U\0%g\0%G\0%M\0%m\0%s\0%b\0%k\0%l\0%n\0%A
This Perl scriptlet will convert the output to CSV.
$ perl -mText::CSV_XS -e 'my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ }); my $n = 21; my @c = (); local $/ = "\0\0";' -ne '$_ .= "\n"; push(@c, split(/\0/)); pop(@c); if ($#c + 1 < $n) {next;} elsif ($#c + 1 > $n) {pop; if ($csv->combine(@c[0 .. $n - 1])) {print $csv->string;} else {printf STDERR $csv->error_input;} @c = @c[$n .. $#_];}' -e 'if (@c > 0) {printf STDERR ("Extra fields at the end\n");}' finddb.txt > finddb.csv
As it turns out, double nulls don't help delimit records in the output, since they could be confused with an empty field. So the script above keeps an internal tally of the fields in a record, with the count hard-coded as 21. Once that many null-delimited fields have been read, the record is written out and the next one begins.
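The idea of counting fields instead of trusting a record separator can be tried out in the shell on a toy input. This is only a sketch and not what find-csv.pl does: it assumes an xargs that supports -0 (GNU and BSD versions do), uses three made-up fields per record rather than 21, and the function name demo_records is hypothetical.

```shell
#!/bin/sh
# demo_records  (hypothetical) groups NUL-delimited fields into records
# purely by counting them, three fields (inode, name, size) at a time.
demo_records() {
    # Six NUL-terminated fields make two records; the first name
    # contains a newline on purpose.
    printf '%s\0' 101 'odd
name' 12 102 plain 34 |
        xargs -0 -n 3 sh -c 'printf "inode=%s size=%s\n" "$1" "$3"' sh
}

demo_records
```

The newline inside the first name never splits a record, because only the nulls and the field count decide where a record ends.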
With the resulting CSV file, loading into PostgreSQL is as easy as the following command.
$ psql -c '\copy finddb from STDIN CSV FORCE NOT NULL path, symlink' < finddb.csv
I put the CSV file generation bits together in a Perl script, find-csv.pl. It tries to maintain the consistency between GNU findutils -printf formatting fields, the names of database columns, and the Perl code for generating CSV files.
In a follow-up, I will give more tastings on possible database queries that can be made of this file system information.
Thanks to James Youngman for reading a previous version of this article. This post is the result of just one of his generous comments.
Loading file system information into PostgreSQL
Dec. 18th, 2008 | 11:03 am
GNU findutils is extremely useful. Unfortunately, the find command doesn't always scale well. For instance, running find on the entire machine can take a long time. And should you learn something that requires modifying your find expression, you have to start the command all over again from the beginning.
The sister program, locate, helps by being much faster. However, it is missing the expressiveness of find. The locate command should support all the features of find -- save for maybe the -exec expression for security reasons.
Even so, the query syntax of find is not scalable or very user-friendly either.
I always wanted to import file system meta information into a database, and use SQL queries to find information about the file system. SQL has its own set of problems, but it would make asking questions about the files on a computer much more worthwhile and maybe even a bit exciting. Very interesting queries could be made, and they would be answered very quickly -- without having to wait.
So, I finally gave it a try. I describe here how I was able to load the results of find into PostgreSQL.
The -printf expression of findutils was great for this task. It can generate the information about the file system. And it can have its output formatted to a tab-delimited file.
The following use of the find command makes a text file with entries for every file in your user directory.
$ find ~ -printf '%i\t%f\t%p\t%h\t%y\t%u\t%U\t%g\t%G\t%M\t%m\t%s\t%b\t%k\t%l\t%n\t%A
It's a pretty hideous command. These are all the fields it produces in the output:
- inode
- name
- wholename
- path
- type (file, link, directory, device, ...)
- user (symbolic)
- user_id (number)
- group
- group_id
- perm (e.g. "-rw-rw-r--")
- perm_octal (e.g. "0664")
- bytes
- blocks (512-byte blocks used of disk)
- kblocks (1k-blocks)
- symlink
- links (number of hard links)
- atime (last access time)
- mtime (last modification)
- ctime (last time modified or status changed)
- fstype (e.g. "ext3")
- dev_id (device number)
These represent the header fields in the output. Later, they will represent the table columns in the database.
Unfortunately, the output of find is one-file-per-line and tab-delimited. That means files with newline or tab characters in their names won't cooperate. A suboptimal solution is to just ignore those files on the system. That's easy to do with the find command.
$ find / ! -regex ".*[$(echo -ne '\n\t')].*" -printf '%i\t%f\t%p\t%h\t%y\t%u\t%U\t%g\t%G\t%M\t%m\t%s\t%b\t%k\t%l\t%n\t%A
Better yet, complain to the user on standard error (STDERR) every time the find command comes across one of these rare files.
$ find / \( -regex ".*[$(echo -ne '\n\t')].*" -exec sh -c 'echo >&2 "$0": File name has tab or newline' '{}' \; \) -o -printf '%i\t%f\t%p\t%h\t%y\t%u\t%U\t%g\t%G\t%M\t%m\t%s\t%b\t%k\t%l\t%n\t%A
If you want to know if your system has these wickedly named files, run the following locate command.
$ locate -r ".*[$(echo -ne '\n\t')].*"
To convert the file to a comma-separated value (CSV) file, I like to use Perl.
$ perl -mText::CSV_XS -e 'my $csv = Text::CSV_XS->new({ binary => 1, eol => $/ });' -ne 'chomp; my @f = split(/\t/, $_, -1); if ($csv->combine(@f)) {print $csv->string;} else {print STDERR $csv->error_input;}' finddb.txt > finddb.csv
This is a table definition for PostgreSQL that can be loaded with the CSV or tab-delimited text file.
CREATE TABLE finddb (
inode bigint NOT NULL,
name text DEFAULT '' NOT NULL,
wholename text DEFAULT '' NOT NULL,
PRIMARY KEY (inode, wholename),
path text DEFAULT '' NOT NULL,
type character varying(1) NOT NULL,
"user" text DEFAULT '' NOT NULL,
user_id integer NOT NULL,
"group" text DEFAULT '' NOT NULL,
group_id integer NOT NULL,
perm character varying(10) DEFAULT '' NOT NULL,
perm_octal character varying(6) DEFAULT '' NOT NULL,
bytes bigint DEFAULT 0 NOT NULL,
blocks bigint DEFAULT 0 NOT NULL,
kblocks bigint DEFAULT 0 NOT NULL,
symlink text DEFAULT '' NOT NULL,
links integer DEFAULT 0 NOT NULL,
atime timestamp without time zone
DEFAULT '1970-01-01 00:00:00' NOT NULL,
mtime timestamp without time zone
DEFAULT '1970-01-01 00:00:00' NOT NULL,
ctime timestamp without time zone
DEFAULT '1970-01-01 00:00:00' NOT NULL,
fstype text DEFAULT '' NOT NULL,
dev_id integer NOT NULL
);
Loading the text file into PostgreSQL is as easy as:
$ psql -c '\copy finddb from STDIN' < finddb.txt
Or, for the CSV file:
$ psql -c '\copy finddb from STDIN CSV FORCE NOT NULL path, symlink' < finddb.csv
I did come across a few names on a file system that -- I believe -- Postgres would complain about, because of improperly encoded characters. Postgres on my system expects everything to be UTF-8 encoded. According to James Youngman, the maintainer of GNU findutils,
Character encoding is of course a significant problem. The Unix file system API offers no way to record the character encoding in effect at the time the file is created/renamed, so files on a file system will often have differing encodings.
After the load is completed, here's an example query and a returned row.
=> SELECT * FROM finddb WHERE wholename = '/home/aaronh/.emacs';
-[ RECORD 1 ]--------------------
inode      | 18842194
name       | .emacs
wholename  | /home/aaronh/.emacs
path       | /home/aaronh
type       | f
user       | aaronh
user_id    | 500
group      | aaronh
group_id   | 500
perm       | -rw-rw-r--
perm_octal | 664
bytes      | 2884
blocks     | 8
kblocks    | 4
symlink    |
links      | 1
atime      | 2008-11-04 12:26:46
mtime      | 2008-10-24 13:10:24
ctime      | 2008-10-24 13:10:24
fstype     | ext3
The following are some more examples of queries on this database table.
This query finds the 5 largest graphic files that were last modified in 2007, but ignores the auxiliary files of many a version control system.
SELECT wholename, bytes, mtime
FROM finddb
WHERE "type" = 'f' AND "name" ~ '.jpe?g'
AND path not like '%/.svn/%'
AND path not like '%/.git/%'
AND path not like '%/.hg/%'
AND path not like '%/.bzr/%'
AND path not like '%/{arch}/%'
AND path not like E'%/\\_darcs/%'
AND mtime >= TIMESTAMP '2007-01-01 00:00:00'
AND mtime <= TIMESTAMP '2007-12-31 23:59:59'
ORDER BY bytes DESC
LIMIT 5;
This query shows every user owning a file on the system, with the total megabytes used, and with the biggest users first in the list.
SELECT "user", SUM(kblocks) / 1000.0 AS "mbytes" FROM finddb GROUP BY "user" ORDER BY SUM(bytes) DESC;
This query tries to mimic the output of the ls -l / command.
SELECT perm, links, "user", "group", bytes, mtime, name
FROM finddb
WHERE path = '' AND name NOT LIKE '.%' ORDER BY name;
perm | links | user | group | bytes | mtime | name
------------+-------+------+-------+-------+---------------------+------------
drwxr-xr-x | 3 | root | root | 4096 | 2008-09-22 05:17:44 | backup
drwxr-xr-x | 2 | root | root | 4096 | 2008-10-29 18:30:19 | bin
drwxr-xr-x | 5 | root | root | 1024 | 2008-10-28 12:43:47 | boot
drwxr-xr-x | 2 | root | root | 4096 | 2008-09-10 04:11:52 | cdrom
drwxr-xr-x | 13 | root | root | 4460 | 2008-11-12 21:58:07 | dev
drwxr-xr-x | 117 | root | root | 8192 | 2008-11-12 21:57:53 | etc
drwxr-xr-x | 5 | root | root | 4096 | 2008-11-07 16:20:54 | home
drwxr-xr-x | 16 | root | root | 8192 | 2008-10-29 18:29:59 | lib
drwx------ | 2 | root | root | 16384 | 2008-09-10 03:05:56 | lost+found
drwxr-xr-x | 2 | root | root | 4096 | 2008-11-12 21:57:28 | media
drwxr-xr-x | 2 | root | root | 4096 | 2008-04-07 17:44:40 | mnt
drwxr-xr-x | 2 | root | root | 4096 | 2008-04-07 17:44:40 | opt
dr-xr-xr-x | 107 | root | root | 0 | 2008-11-12 21:55:52 | proc
drwxr-x--- | 6 | root | root | 4096 | 2008-11-12 20:02:29 | root
drwxr-xr-x | 2 | root | root | 8192 | 2008-10-29 18:30:18 | sbin
drwxr-xr-x | 7 | root | root | 0 | 2008-11-12 21:55:52 | selinux
drwxr-xr-x | 2 | root | root | 4096 | 2008-04-07 17:44:40 | srv
drwxr-xr-x | 11 | root | root | 0 | 2008-11-12 21:55:52 | sys
drwxrwxrwt | 74 | root | root | 4096 | 2008-11-13 00:56:28 | tmp
drwxr-xr-x | 13 | root | root | 4096 | 2008-09-10 03:15:15 | usr
drwxr-xr-x | 21 | root | root | 4096 | 2008-09-24 06:53:23 | var
Sending queries against the data is loads of fun, but it really needs some improvements to match the strength of findutils matching expressions -- for example, the permissions matching rules and the -empty predicate. Some new tables with alternative perspectives on the data could accommodate better queries.
In a follow-up, I will present how to handle all possible file names by using a null-delimited file rather than a tab-delimited one. There may be a piece on loading into MySQL. And in the last piece, I'll give more tastings of the queries that can be made of this file system information and provide the script that helps me manage the database loads from find.
Samsung's YP-S2 flash player
Dec. 15th, 2008 | 12:46 pm
(Fits in the palm of your hand.)
Marketed as "the pebble", it is small and light. Pictures on the Web don't really communicate its compactness until you experience it in person. I assumed it was the size of a hockey puck, but it fits in the middle of your hand. For better photos of the S2, see the review of the S2 at anything but ipod.com. The sound quality is surprisingly good. Samsung is trying to compete with Apple's iPod Shuffle on price, but the S2 is the better alternative since it has out-of-the-box Ogg Vorbis support. The iPod Shuffle is not compared here.
The S2 comes with a pair of earphones, a USB adapter and a tiny installation CD. It requires being charged by USB when you take it out of the box. The manual suggests charging for 2 hours. The subtle and translucent LED light turns from red to green when the battery is fully charged. While waiting for the S2 to charge, you should be transferring your files (see below). The transfer rates are slow for USB (less than 1MB per second), but you can fill the S2 with 25 to 30 albums of music in 15 minutes. (Copying from the drive is at speeds around 4MB per second.)
The earphones have a special adapter on the plug to clasp the pebble tightly. Combined with a nylon neck strap on the headphones, the S2 hangs on your chest quite comfortably. The lanyard provides another nickname for the device, "pendant". There's no clothing clip for the S2, but securing it would only be necessary if you were bouncing during aerobic activity.
The USB adapter for the S2 plugs into the same hole as the headphones. After connecting the USB adapter to a GNU/Linux machine, the player will be mounted as a USB mass storage device called "S2". There are 3 folders on the S2 -- music, playlist and system. I removed the demo mp3 files in the music folder and the one playlist file. I copied my collection of Ogg files from the command-line with the command cp -a ~/music/* /mnt/S2/Music.
It's not clear if any of the default folders of the S2 are necessary. They don't take up much space. Should removing them disable the player, there's a system reset button available in a pin hole that advertises to re-initialize the system when the S2 "won't play music, or isn't recognized by your computer when you connect it."
Features like upgrading the firmware and programming playlists are available through the proprietary software that only runs on Microsoft Windows. The device also enforces Digital Restrictions Management -- by not playing restricted files. Fortunately, you can avoid these drawbacks by using Ogg Vorbis, and not downloading DRM-encumbered files. You can manage the playlists from the player itself, but the button-pressing to do this is gruesome and playlists are limited to only 30 songs, anyway. Since there isn't a way to navigate albums (folders), a playlist with a song from each album on the player would be useful, however. According to the Portable players page at Xiph, there are no techniques to override the S2's firmware, yet.
The battery is advertised to last 13 hours, though that's hard to believe. The ability to disable the LED light (see below) would probably not lengthen the battery's charge. Turning down the volume would probably work better. The battery does not appear to be replaceable, either. The battery is likely covered by Samsung's one year warranty, however.
The S2's LED indicator is the only visual display -- if you can call it that. There is a beeping system as well. In addition to the battery status (see above), the light is blue when the player is turned on. The LED is blue during regular playback, and red if playing a playlist. If in shuffle mode, it cycles between all the colors (including green). It signals red when the power is low, too. Should you get sick of the light show, you can disable it by pausing the music, then holding the Smart button. Doing the same again turns it back on.
(The S2's LED is glowing blue)
If you don't have a USB adapter for your car stereo, then you'll need to purchase either an FM transmitter or get a standard 3.5 mm (1/8") audio cable (TRS). Unfortunately, there's no way to charge the device while it plays, but it could be charged in a USB car adapter that plugs into the cigarette lighter.
Besides the S2's proprietary software support, the other major design flaw is the location of the headphone jack. The instinct is to hold the headphone wire at the bottom of one's hand and use the thumb to hit buttons. Instead, the jack is at the top of the device. So when you grab the S2 upside down, adjust the volume slowly. You may be cranking it up when you mean to turn it down. Although not person-oriented, this configuration is natural for when the device is plugged into something else, like a stereo.
Samsung will likely be releasing a 2GB version of the player. This might be handy, but you'll run up against battery capacity before playing the first gigabyte of songs. I'd prefer owning two 1GB players for that reason. With two you get more battery, and on a really long car trip you could have one charging while the other is playing.
Samsung should do more to support free software operating systems with the S2. But this is no surprise, since Samsung is a deal-maker and collaborator with Microsoft against GNU/Linux. Regardless of their hostility, the player is a great way to support the Play Ogg! movement.
Kickstarting a QEMU image with Fedora
Nov. 16th, 2008 | 03:31 pm
Turn on, build everything, shut off.
rpmbuild -tb tarball
Nov. 14th, 2008 | 11:24 am
$ rpmbuild -bb package.spec
This presumes that all the source files for the RPM are already copied to the SOURCES directory accessible by RPM (%_topdir/SOURCES).
It's possible to build a source RPM (SRPM) from a spec file.
$ rpmbuild -bs package.spec
The benefit of an SRPM is that it contains all the source files necessary to rebuild the RPM.
$ rpmbuild --rebuild package-1.0-1.src.rpm
Using RPM to build an SRPM guarantees those files are included, and will put files in your SOURCES directory for you.
RPM has an additional feature where it can build an RPM from a tarball. Only a spec file needs to exist in the archive for it to work.
$ rpmbuild -tb package-1.0.tar.gz
This isn't a popular feature, nor is it very well documented. It is likely a relic of another time, when RPMs were not maintained by a distribution, but software maintainers would try and have their source packages install using RPM. This feature of RPM is made more and more obsolete with the success of large RPM-based distributions with a large and vibrant posse of packagers.
In its tarball mode, rpmbuild will find a spec file even if it's not in the top directory of the package. For example, it's not uncommon to put a spec file inside pkg/fedora.
The tarball mode of rpmbuild presumes you're in the SOURCES directory (More proof that RPM's tarball mode is probably a legacy feature). So copy the tarball to the SOURCES directory and run rpmbuild on it from there.
Most software packages simply need to be built from their source archive and don't need any additional files. However, it's not uncommon for packages to need to be specially configured by RPM on some systems by including particular files. On this chance, the included RPM spec file will name other source files or patch files besides the tarball (Source1, Source2, Patch0, Patch1 and so on). These files will need to be copied to the SOURCES directory as well. (Did I mention you need to run rpmbuild -tb in your SOURCES directory?) I presumed there was some way to tell RPM where to find these SOURCE files in the tarball. For example, if you put these extra source files in the same place as the spec file, pkg/fedora, then RPM would find them. Unfortunately, RPM's tarball mode doesn't know to copy anything to the SOURCES directory for you. However, it should be easy to modify the spec file to have it copy the source files in pkg/fedora to the SOURCES directory.
Add the following tar command to the %prep section of the RPM spec file to copy the source files to the SOURCES directory.
tar -C %{name}-%{version}/pkg/fedora -cf - . | tar -C %{_sourcedir} -xf -
Alternatively, a single tar command on the actual tarball could extract the files into the SOURCES directory.
tar --strip=3 -C %{_sourcedir} -zxf %{SOURCE0} %{name}-%{version}/pkg/fedora/\*
The latter uses only a single execution of the tar command. The former may be more reliable should GNU tar not be available.
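The tar-to-tar pipe can be tried outside of a spec file to see what it does. Below is a small sketch with a made-up function name, copy_tree, that copies everything under one directory into another existing directory; inside %prep the two directories would be the pkg/fedora directory and %{_sourcedir}.

```shell
#!/bin/sh
# copy_tree SRC DST  (hypothetical helper name)
# Copies the contents of directory SRC into the existing directory DST,
# preserving sub-directories, using one tar to pack and one to unpack.
copy_tree() {
    tar -C "$1" -cf - . | tar -C "$2" -xf -
}
```

Only the -C, -c, -x and -f options are used, which is why this form does not depend on GNU tar's --strip option.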
With that line inserted, a tar archive with such a spec file can bootstrap its own RPM. The rpmbuild -ta command will build both the binary and source RPMs.
Unfortunately, the rpmbuild -ts command will not work in this scenario until the source files are present. You can copy the files yourself for it to work, or run the %prep stage of rpmbuild to get the task done.
$ rpmbuild -tp
$ rpmbuild -ts
And one final word of warning: don't make changes to the tarball's source files in the SOURCES directory. Since the source files are extracted again on each build, any changes to these files will be overwritten, unless you "short-circuit" the rpmbuild. Short-circuiting in RPM, however, will not allow you to actually build the package.
Being able to build an RPM from the tarball source package is something for software maintainers to advertise to their users, but isn't a reliable way to develop RPM packages.
Logging the changes
Nov. 3rd, 2008 | 12:28 pm
Most software projects also use a version control system, which, in addition to tracking the line-by-line changes to a file, requires a developer to add a description for each change, much like the entries of a ChangeLog file.
But even with version control software -- like CVS, Subversion and Git -- ChangeLog files will probably not go away. The public can easily access the revision history of many a free software project. Regardless, I predict separately maintaining the change history of a source package and shipping the ChangeLog file with a source tar-ball will not go away anytime soon.
There should always be a low hurdle for users to determine what has changed in a new release. Further, contributions to software are often made outside the software project's coterie. The request is often made of contributors, "Please include a ChangeLog entry with your submitted patch." ChangeLog files also stick around because people rely on them. The format of ChangeLog files is really easy to read once you get used to it.
The problem is that the relation between version control software and ChangeLog files isn't as automatic as it should be. There can never be a complete and binding relationship between version control log entries and the ChangeLog file. For example, some version control entries should not and do not translate well to a ChangeLog file -- and that's ok. This is why the batch-operating commands -- like rcs2log, cvs2cl, svn2cl -- don't work well enough. After running these commands, the ChangeLog file still usually needs to be hand-edited.
In Emacs, ChangeLog support is quite successful. There is a ChangeLog Mode that supports the unique syntax well, and can also automatically generate entries in the closest ChangeLog file. It formats the entry based on either the location in a source file or a diff file. You really have to see it to believe it.
Unfortunately, Emacs doesn't coordinate its ChangeLog features with its equally impressive version control features. It does have the batch-oriented commands that I previously mentioned.
Eric Ludlam, of CEDET fame, has come up with an Emacs command for populating the log entry buffer with entries palatable for a ChangeLog. It's an interesting solution to the problem.
Unfortunately, on the projects I work on, I usually edit the ChangeLog file before I make the commit. So, I'd prefer to go the other direction: edit my ChangeLog file and have it populate the revision control logs.
However, perhaps Ludlam's solution is the way it should be.
Free software maintainer manual
Oct. 24th, 2008 | 04:53 pm
It was updated as recently as August of 2008. However, the booklet covers topics that have lasting currency.
RPM macro includes
Oct. 1st, 2008 | 11:50 am
At my work, we use RPM to modify various system configuration files on Fedora GNU/Linux. We do a few tricks with RPM scripting in our RPM spec files to get this accomplished the way we'd like. Unfortunately, this results in a lot of SPEC files having the same block of 100 or more lines of shell script.
To our surprise, we've had to already fix a few bugs in the RPM script used across these files. The fixes can usually be propagated across the RPM spec files by applying a patch with the latest fix to the code. However, it was discovered you can create a macro file that contains the duplicated code and put it in /etc/rpm. My Fedora 9 system has the following macro files:
$ ls -1 /etc/rpm/
macros.dist
macros.jpackage
macros.pear
macros.texlive
platform
Fortunately, the macros are expanded and inserted in the spec file at compile-time (when the RPM is built), rather than run-time (when the RPM is installed or uninstalled). So we need to make sure that our custom RPM macro files are on the build machine, but they are not needed on any of the machines where the RPMs get installed. If we find a bug, then the RPMs need to be rebuilt and released as an update.
One could imagine it might be helpful to have the macro expanded at run-time, so you could fix bugs in the RPM scripts by just updating the macro file, and propagating the macro file through an update. However, this could have less than useful side-effects, ones that aren't confirmed by the original RPM-builder's intentions.
HTTP/1.1 request with telnet
Sep. 26th, 2008 | 04:15 pm
Here's how I make an HTTP/1.1 request from the command-line using telnet.
$ ( echo "GET / HTTP/1.1";
echo "Host: www.yahoo.com";
echo "User-Agent: $(bash --version | head -n 1)";
echo "Connection: close";
echo; echo;
sleep 1 ) | telnet www.microsoft.com 80
Sometimes you need to increase the sleep time if the Web server is taking longer to return the response, and you want to keep telnet from closing the connection prematurely.
The output sent to the Web server can be shown by simply removing the pipe to telnet from above.
$ ( echo "GET / HTTP/1.1";
echo "Host: www.yahoo.com";
echo "User-Agent: $(bash --version | head -n 1)";
echo "Connection: close";
echo; echo;
sleep 1 )
GET / HTTP/1.1
Host: www.yahoo.com
User-Agent: GNU bash, version 3.2.33(1)-release (i386-redhat-linux-gnu)
Connection: close
[2 empty lines]
I really enjoy reporting my User-Agent as the GNU Bash shell.
Yes, the domain names used in this example are only a humorous mention of Carl Icahn's latest proxy battle.
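Strictly speaking, HTTP/1.1 ends each header line with a carriage return plus a line feed, which echo does not emit. Most servers seem to tolerate a bare line feed, but here is a minimal sketch that sends the exact bytes with printf. The function name build_request and the host example.org are placeholders of mine, not part of the commands above.

```shell
# Print an HTTP/1.1 request with CRLF line endings.
# $1: host name, $2: request path (both supplied by the caller)
build_request() {
    printf 'GET %s HTTP/1.1\r\n' "$2"
    printf 'Host: %s\r\n' "$1"
    printf 'Connection: close\r\n'
    printf '\r\n'   # the empty line that ends the headers
}

# Hypothetical use, mirroring the telnet pipeline above:
#   ( build_request example.org /; sleep 1 ) | telnet example.org 80
```

Piping the function through od -c instead of telnet is an easy way to confirm that every line ends in \r \n.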
Telnet is a bit of a pain; you might as well just use GNU Wget if you can.
$ wget -S -O - http://www.gnewsense.org/
That's much shorter to type.
Feeding entropy to GnuPG on Fedora
Aug. 28th, 2008 | 03:08 pm
In a previous post, I mentioned we are putting together an RPM build server at work. The RPMs that are built are signed by an encryption key and uploaded to the Yum server. The GnuPG (GPG) signing will give us confidence that the RPMs were from the build server and weren't tampered with since they were built and copied to the Yum repository.
At this point, the security of the signing key is not important. I say this confidently even after the recent package signing compromise at Fedora and Red Hat. We want to have automated package signing and we're only building packages for distribution inside the office.
One nice feature of GnuPG is its automatic key generation. The RPM build server generates its own key, preferably as non-interactively as possible. Unfortunately, this requires entropy to work consistently.
For information about automatically generating keys with GPG see the section "Unattended key generation" in the DETAILS file that comes with GnuPG. That documentation can be found on a GNU/Linux system with the following command.
$ less -p "^Unattended" /usr/share/doc/gnupg-*/DETAILS
As the summary says:
This feature allows unattended generation of keys controlled by a parameter file. To use this feature, you use --gen-key together with --batch and feed the parameters either from stdin or from a file given on the command line [sic].
Here's an example of automatically generating a secret GPG key.
$ cat gpg-key.conf
%echo Generating a package signing key
Key-Type: DSA
Key-Length: 1024
Subkey-Type: ELG-E
Subkey-Length: 2048
Name-Real: Build Server
Name-Email: builds@site.org
Expire-Date: 0
Passphrase: Does not ex1st!
%commit
%echo Done
$ gpg --batch --gen-key gpg-key.conf \
> gpg-keygen.log \
2> gpg-keygen_error.log
Those familiar with generating keys know that it is an extremely interactive process. Not just for entering the details about the key, but because you need to inject entropy.
If you see no progress during key generation you should start some other activities such as moving the mouse or hitting the CTRL and SHIFT keys. Generate a key only on a machine where you have direct physical access - don't do it over the network or on a machine also used by others, especially if you have no access to the root account. (original emphasis)
This becomes a problem on servers that don't have mice or keyboards attached. One would typically see the following message from GnuPG complaining about not having enough entropy.
$ gpg --batch --gen-key gpg-key.conf
gpg: Generating a package signing key
.++++++++++++++++++++...+++++..++++++++++++++++++++++++++++++++++++++++++++++++
+++++++.+++++++++++++++++++++++++++++++++++++++++++++++++++++++..>+++++...+++++
Not enough random bytes available. Please do some other work to give
the OS a chance to collect more entropy! (Need 123 more bytes)
gpg: Interrupt caught ... exiting
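The shortage GnuPG complains about can be watched from the shell. This is a small sketch, assuming a Linux kernel that exposes its entropy estimate under /proc; the function name entropy_avail is my own.

```shell
# Print the kernel's current entropy estimate (in bits), if available.
entropy_avail() {
    f=/proc/sys/kernel/random/entropy_avail
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "no entropy estimate on this system" >&2
        return 1
    fi
}

# Hypothetical use: compare the value before and during key generation.
#   entropy_avail; gpg --batch --gen-key gpg-key.conf
```

On a headless machine of that era I would expect the number to sit low and barely move while gpg waits. Newer kernels may report a fixed value instead, so treat the figure as a hint only.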
As a sidebar, the "Key generation" section of the DETAILS file explains all those special characters spit to the screen when the key is generated.
Key generation shows progress by printing different characters to
stderr:
"." Last 10 Miller-Rabin tests failed
"+" Miller-Rabin test succeeded
"!" Reloading the pool with fresh prime numbers
"^" Checking a new value for the generator
"<" Size of one factor decreased
">" Size of one factor increased
I tried various complicated strategies for creating entropy on a headless system, without success. One of them was piping the output of /dev/random into /dev/urandom and vice versa. Let's see if I can rehash it here.
$ b=2048; \
future=$(date -d'+6 seconds' +'%s' ); \
while [ ${future} -gt $(date +'%s') ]; do \
head -c ${b} /dev/random > /dev/urandom; \
head -c ${b} /dev/urandom > /dev/random; \
done &
$ gpg --batch --gen-key gpg-key.conf
Anyway, it didn't work.
Running this does, though.
# rngd -r /dev/urandom
The rngd service provides "true random number generation" (RNG). It comes as part of the rng-tools package.
According to the documentation in the Linux kernel:
The hw_random framework is software that makes use of a special hardware feature on your CPU or motherboard, a Random Number Generator (RNG). The software has two parts: a core providing the /dev/hw_random character device and its sysfs support, plus a hardware-specific driver that plugs into that core.
In Fedora, this package can be installed with Yum.
# yum install rng-utils
I've arrived on Planet Fedora. Planet Fedora is an aggregation of article feeds from members of the Fedora Project -- a community project affiliated with Red Hat that distributes the GNU/Linux operating system.
Shell hack: md5sum file name output
Jul. 6th, 2008 | 08:40 am
I noticed an interesting thing about the md5sum command that comes with GNU Coreutils. If there is a newline or backslash in the filename, md5sum leads the output with a backslash.
I wasn't able to find this in the user manual. Although, I may have been searching for the keyword "null" (to understand this read on) rather than a more appropriate term like "slash". Instead, I tried to verify it myself.
$ touch foo-bar
$ md5sum foo-bar
d41d8cd98f00b204e9800998ecf8427e  foo-bar
$ touch foo\\bar
$ md5sum foo\\bar
\d41d8cd98f00b204e9800998ecf8427e  foo\\bar
I eventually did find where the feature is documented in the manual:
If FILE contains a backslash or newline, the line is started with a backslash, and each problematic character in the file name is escaped with a backslash, making the output unambiguous even in the presence of arbitrary file names.
I even found in the md5sum source code where this is intentionally done:
/* Output a leading backslash if the file name contains
a newline or backslash. */
if (strchr (file, '\n') || strchr (file, '\\'))
putchar ('\\');
Clearly, the motivation for this is to handle arbitrarily named files. However, it departs from the GNU convention of using the null character to delimit file names; see "NUL Terminated File Names" in the GNU tar manual.
If you're using the output of md5sum as part of a process with your shell programming, this behavior of md5sum becomes less than helpful. No other command uses this format for representing arbitrary file names. Fortunately, it is easy enough to convert the output of md5sum to the null-terminated line format -- understood by many GNU programs -- and carry on with your work. Here's one solution using GNU gawk that converts md5sum output to the null-terminated format:
$ touch foo\\bar 'foo^V^Jbar'
$ md5sum foo\\bar 'foo^V^Jbar'
$ cat md5sum2null.awk
#!/usr/bin/gawk -f
/^\\/ {
gsub(/^\\/, "");
gsub(/\\\\/, "\\");
gsub(/\\n/, "\n");
}
{
printf "%s\0", $0;
}
$ md5sum foo\\bar 'foo^V^Jbar' | gawk -f md5sum2null.awk
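One corner case is worth a sketch: a file name that contains a real backslash followed by the letter n is escaped as \\n, and undoing the escapes with two separate gsub calls turns it into a newline. The variant below decodes each escape in a single left-to-right pass instead. The function name md5sum_unescape is made up for illustration, and it prints ordinary newline-terminated lines rather than null-terminated ones to keep the example easy to check.

```shell
# Undo md5sum's escaping, one escape sequence at a time.
# Reads md5sum output on stdin, writes decoded lines to stdout.
md5sum_unescape() {
    awk '
    /^\\/ {
        line = substr($0, 2)          # drop the leading marker backslash
        out = ""
        while ((i = index(line, "\\")) > 0) {
            out = out substr(line, 1, i - 1)
            c = substr(line, i + 1, 1)  # character after the backslash
            out = out (c == "n" ? "\n" : c)
            line = substr(line, i + 2)
        }
        $0 = out line
    }
    { print }
    '
}
```

Lines that do not start with a backslash pass through untouched. Swapping the final print for the printf "%s\0" used in the gawk script above should give the null-terminated form again.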
Now back to your regularly scheduled programming.
Shell hack: Random password generator
Jun. 19th, 2008 | 12:29 pm
In a previous post about shell hacking, I wrote:
"The command tools in unix shell programming are general enough to do pretty monumental tasks with just using a small number of commands -- in both breadth and length. Even a little bit of properly written complex shell programming can allow you to write a pretty full-proof command -- as a proof-of-concept or as temporary solution until you discover a shortcoming. [...] Although rare, if the shell doesn't have what you need, then you're using the wrong tool."
In that spirit, I wanted to see how the shell and its sister tools in unix-land could handle generating random passwords.
After searching around a bit, I was able to find some good strategies for generating random passwords with the shell, but nothing I was entirely pleased with. The following explains my approach to this problem.
The best way to get an unlimited number of random bits on a unix system is with the system device /dev/urandom. For shell programming, it can handily spit out random characters for you. I don't want every character possible, however. For my purpose of generating random passwords, the alpha-numeric and punctuation characters would be enough and the more randomness the better. I don't need the passwords to be human-readable or memorable.
The tr command can filter to those characters you want, and the head command can limit the number of characters you want. To get 6 random characters that are either alpha-numeric or punctuation you can use the following command in GNU Bash.
$ tr -dc "[:alnum:][:punct:]" < /dev/urandom | head -c 6 && echo
S>t^V`
The echo inserts a newline for display purposes after the characters are printed, since none of tr, /dev/urandom, or head adds a trailing newline for you.
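The same idea can be wrapped in a function. This is only a sketch: gen_password is a name I made up, it reads /dev/urandom in fixed-size chunks so that tr always sees the end of its input, and it forces the C locale so that tr treats the device as plain bytes.

```shell
# Print a random password of exactly $1 characters drawn from the
# alpha-numeric and punctuation classes.
gen_password() (
    len=$1
    pw=
    # Keep collecting filtered characters until there are enough.
    while [ "${#pw}" -lt "$len" ]; do
        chunk=$(head -c 64 /dev/urandom | LC_ALL=C tr -dc '[:alnum:][:punct:]')
        pw=$pw$chunk
    done
    # Trim the surplus and end with a newline.
    printf '%s\n' "$pw" | cut -c "1-$len"
)

# Hypothetical use:
#   gen_password 12
```

The function body runs in a subshell, so len, pw and chunk do not leak into the calling shell.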
According to the GNU Grep user manual, there are 32 punctuation characters. Added to the 62 alpha-numeric characters, that means there are a total of 94 distinct characters available to a random password here. If we generated just 8-character passwords, that would give -- 94 to the power of 8 (94^8) -- 6,095,689,385,410,816 (6.1e15) different possibilities, roughly 2 to the power of 52 (2^52).
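The arithmetic is easy to reproduce in the shell. The helper names pow and password_bits below are mine; pow sticks to plain integer arithmetic, which is fine for 94^8 but would overflow 64 bits somewhere past 94^9, and password_bits leans on awk for the logarithm.

```shell
# Integer power by repeated multiplication: pow BASE EXPONENT
pow() (
    base=$1
    exp=$2
    result=1
    while [ "$exp" -gt 0 ]; do
        result=$((result * base))
        exp=$((exp - 1))
    done
    echo "$result"
)

# Equivalent strength in bits: password_bits ALPHABET_SIZE LENGTH
password_bits() {
    LC_ALL=C awk -v n="$1" -v len="$2" \
        'BEGIN { printf "%.1f\n", len * log(n) / log(2) }'
}

pow 94 8             # 6095689385410816
password_bits 94 8   # 52.4
```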
The other desirable characteristic of a random password generator is to have variable password lengths. How do you generate a random integer in the shell? Many Bourne-style shells -- including GNU Bash -- have a built-in RANDOM shell variable that returns a random number.
$ echo $RANDOM
6472
To generate a random number between 8 and 16 -- inclusive:
$ ( min_length=8; \
max_length=16; \
echo $(( $RANDOM % ($max_length - $min_length + 1) + $min_length )) )
15
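Not every Bourne-style shell provides RANDOM. As a hedged alternative, the sketch below reads two bytes from /dev/urandom with od and applies the same modulo arithmetic. The name rand_between is hypothetical, and like the $RANDOM version it carries a slight modulo bias, which I consider harmless for picking a length.

```shell
# Print a random integer between $1 and $2, inclusive.
rand_between() (
    min=$1
    max=$2
    # Two random bytes read as one unsigned integer in 0..65535.
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( n % (max - min + 1) + min ))
)

# Hypothetical use:
#   rand_between 8 16
```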
Combine this all together.
password=$(tr -dc "[:alnum:][:punct:]" < /dev/urandom \
| head -c $( RANDOM=$$; echo $(( $RANDOM % (8 + 1) + 8 )) ) )
echo "${password}";
P50.6kw41
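Note that I seed the random number generator with the current process number -- RANDOM=$$ -- even though most shells already initialize it properly. Be aware that $$ stays the same for the life of a shell, so reseeding this way produces the same password length on every run from that shell; drop the assignment if the length should vary.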
According to my handy Emacs calculator, the sum of the series of 94 to the power of k where k ranges from 8 to 16 gives 37,556,971,331,618,802,283,689,774,779,1
I needed this to automatically reset a system password for accounts at my workplace. You can take the result of the shell random password generator and send it to the passwd command.
# ( password=$(tr -dc "[:alnum:][:punct:]" < /dev/urandom \
| head -c $( RANDOM=$$; \
echo $(( $RANDOM % (8 + 1) + 8 )) ) ); \
echo "${password}"; echo "${password}"; ) | passwd USERNAME
New UNIX password: Retype new UNIX password: passwd: password updated successfully
More appropriately, password administration on most GNU/Linux systems can be done with chpasswd.
# ( user=warehouse; \
password=$(tr -dc "[:alnum:][:punct:]" < /dev/urandom \
| head -c $( RANDOM=$$; \
echo $(( $RANDOM % (8 + 1) + 8 )) ) ); \
echo "${user}:${password}" ) | chpasswd
Admittedly, this scriptlet is about as good as the pwgen command with the -s and -y options.
For further reading, see pwgen.sh where I have accumulated all of this together into a shell script.
Timing processes in the shell
Jun. 11th, 2008 | 03:20 pm
I use the time command in my work every so often. Here's a primitive example.
$ time sleep 2
real    0m2.012s
user    0m0.000s
sys     0m0.000s
The output is quite friendly, or I suppose I'm just used to it.
Sometimes I want to track the performance of something that's later in a command pipeline.
$ echo 1 | time sleep 2
0.00user 0.00system 0:02.02elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (136major+14minor)pagefaults 0swaps
What happened? Why does the output format change for the time command?
The answer is that a different time command runs in the second example.
The first version of time is built into the GNU Bash shell, which recognizes it only at the start of a pipeline.
The second version, which runs after the pipe, is Bash's cousin, GNU time.
$ which time
/usr/bin/time
$ /usr/bin/time sleep 2
0.00user 0.00system 0:02.07elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
56inputs+0outputs (1major+154minor)pagefaults 0swaps
$ /usr/bin/time --version
GNU time 1.7
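The which command only looks at the PATH, so it never mentions Bash's own version. A quick sketch for asking Bash directly, assuming bash is installed; the function name time_kind is made up for this example.

```shell
# Ask Bash how it classifies the word "time".
# type -t prints a single word such as keyword, builtin or file.
time_kind() {
    bash -c 'type -t time'
}

time_kind   # keyword
```

Because it is a keyword rather than an ordinary command, Bash should also accept it at the start of a brace group, so something like echo 1 | { time sleep 2; } typed at a Bash prompt ought to report in the Bash format without a separate bash -c.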
Both of these versions support a POSIX format with -p.
$ /usr/bin/time -p sleep 2
real 2.00
user 0.00
sys 0.00
$ time -p sleep 2
real 2.00
user 0.00
sys 0.00
However, it's not good enough. I'm often wrapping the command I want to clock in a bash sub-process.
$ echo 1 | bash -c "time sleep 2"
real    0m2.001s
user    0m0.000s
sys     0m0.000s
I don't expect to lose either habit: preferring the default Bash format, or using bash -c.
GNU time does have the ability to set the output format, either with the -f option or with the TIME environment variable.
$ /usr/bin/time -f "real\t%es\nuser\t%Us\nsys\t%Ss\n" sleep 1
real    1.00s
user    0.00s
sys     0.00s
$ export TIME="real\t%es\nuser\t%Us\nsys\t%Ss\n"
$ echo 1 | time sleep 1
real    1.00s
user    0.00s
sys     0.00s
This would be a good compromise, but I'm too staunch a user of Bash's defaults and try to avoid ever changing my settings in my .bashrc file.
Finding "leaf" nodes on a file system
Jun. 2nd, 2008 | 02:13 pm
After making some significant changes to the organization of files, you often want to be a complete geek and show the results of your work. Listing all the resulting files is easy enough on GNU/Linux with findutils:
$ find . -type f
If you had some symlinks, you can show them, too.
$ find . -type f -o -type l
If you want to be more general you can do this.
$ find . ! -type d
But often when you propose a new file system hierarchy you may have left some empty directories as placeholders. Here's a simple example.
$ mkdir a a/b a/b/c
$ touch a/b/2
$ find a
a
a/b
a/b/2
a/b/c
My first approach to the problem was to output the files through Awk -- trusting the order of listed files by find, and only print names that are not part of previous names.
$ find a -depth | awk '{if (!match(last, $0)) { print $0 } last = $0}'
a/b/2
a/b/c
Kind of ugly. Since it compares strings instead of node information, a file whose name is a prefix of a sibling's name (another file or a subdirectory in the same directory) gets dropped, and you get a false negative.
$ touch a/b/c2
$ find a -depth | awk '{if (!match(last, $0)) { print $0 } last = $0}'
a/b/2
a/b/c2
The output should be
a/b/2
a/b/c2
a/b/c
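The prefix problem can be avoided while staying with strings. Here is a sketch of the same idea, with the hypothetical function name leaves: a name is skipped only when the previous line starts with that name plus a slash, which means the previous line really sits inside it. It still assumes that find -depth lists a directory right after its contents and that no file name contains a newline.

```shell
# Print the leaf nodes (files and empty directories) under $1.
leaves() {
    find "$1" -depth | awk '{
        # Skip a name only when the previous line lies inside it,
        # i.e. the previous line starts with this name plus "/".
        if (index(last, $0 "/") != 1) print
        last = $0
    }'
}

# Hypothetical use, with the example tree from above:
#   leaves a
```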
I figured the tree command could probably do this, but didn't see such an option.
In unix file system theory, directories and files are linked nodes in a tree structure. A file is linked from the directory it resides in, and every directory is linked from 1) the directory it resides in, 2) its own special entry ".", and 3) the ".." entry of each subdirectory it contains. Regular files do not add to a directory's link count. In isolation, a file can be considered to have only 1 link, while every directory must have at least 2 links. If a directory has exactly 2 links, then it contains no subdirectories and is a "leaf" directory! Empty subdirectories are technically "leaves" -- in the sense that they terminate the hierarchical structure of a file system.
So the following can usually print all the leaf nodes under the current working directory of a file system.
$ find . -links -3
This won't work if there are files with extra hard links, since their link count can reach 3 or more. It also prints directories that hold only files. If that were the case I'd probably go back to my Awk script.
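Link counts are easy to inspect by hand, since they are the second column of ls -l. A small sketch, with nlinks as a made-up helper name:

```shell
# Print the link count of the file or directory named by $1.
nlinks() {
    ls -ld -- "$1" | awk '{ print $2 }'
}

# Hypothetical session in a scratch directory:
#   touch f;   nlinks f     # a fresh file has 1 link
#   ln f g;    nlinks f     # the hard link raises it to 2
#   mkdir d;   nlinks d     # typically 2 on traditional file systems
```

The directory figure is an assumption about the file system in use; some file systems report directory link counts differently, which is another reason to treat the -links test as a heuristic.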
However, my co-worker brought the -empty option of find to my attention. I knew the option could find empty files, but I didn't know it knew the definition of an empty directory, too.
So here is how to show the leaf nodes of the current working directory.
$ find . ! -type d -o -empty
GNU findutils wins again!
Unfortunately, the theoretical solution doesn't work should you have an alternate definition for "empty". For example, revision control software and other various systems will put auxiliary files or directories in an empty directory. For example, with Subversion every directory in a checked-out version of the repository has a .svn subdirectory. In this scenario, you can either go with an alternate definition of links -- -links -4, or duplicate the entire repository, delete the auxiliary directories and try with the -empty option again.
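A third option is to teach find the alternate definition of "empty". The sketch below prunes the auxiliary directories and treats a directory as empty when nothing else is left in it. The function name leaf_nodes_ignoring is hypothetical, and running ls once per directory makes it slow on large trees.

```shell
# List leaf nodes under $1, treating directories named $2 (for example
# .svn) as if they were not there. Both arguments come from the caller.
leaf_nodes_ignoring() {
    find "$1" -name "$2" -prune -o \
        ! -type d -print -o \
        -exec sh -c '! ls -A "$1" | grep -v -x -F -e "$2" | grep -q .' sh {} "$2" \; -print
}

# Hypothetical use in a Subversion working copy:
#   leaf_nodes_ignoring . .svn
```

The first branch skips the auxiliary directories and everything below them, the second prints anything that is not a directory, and the third prints a directory only when its listing, minus the ignored name, comes up empty.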