Monday, April 29, 2013

dmesg -H is sexy

dmesg(1) from util-linux 2.23 provides some small usability improvements, and all of them can be enabled with the -H, --human command line option:
  • enable pager (you know this for example from 'git log')
  • enable relative timestamps to display local time and delta in human readable format
  • enable colors for timestamp prefix, subsystem prefix and message body (different colors for different log levels)

Hint: add

    alias dmesg="dmesg --human" 

somewhere to your /etc/profile.d/ directory.
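
On systems where an older dmesg may still be installed, the alias can be guarded. The following is only a sketch: the file name and the helper function dmesg_supports_human are made up for illustration, and the check simply looks for the option in the help text.

```shell
# Hypothetical file: /etc/profile.d/dmesg-human.sh

# Succeeds if the help text on stdin mentions the --human option.
dmesg_supports_human() {
    grep -q -- '--human'
}

# Define the alias only when the installed dmesg knows the option.
if dmesg --help 2>/dev/null | dmesg_supports_human; then
    alias dmesg='dmesg --human'
fi
```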

Tuesday, April 23, 2013

umount(8), mount(8) and nsenter(1)

umount(8) from util-linux 2.23 (now -rc2) supports new command line options --recursive and --all-targets. The new command nsenter(1) opens doors to namespaces.

 # umount /mnt/A
 umount: /mnt/A: target is busy.

This is a pretty common situation, and the problem is obvious:

 # findmnt -R /mnt/A
 TARGET     SOURCE    FSTYPE OPTIONS
 /mnt/A     /dev/sdb1 ext2   rw,relatime,stripe=32
 └─/mnt/A/B /dev/sdb2 ext2   rw,relatime,stripe=32

so you have to unmount /mnt/A/B before /mnt/A. In some cases, especially in scripts, it can be a little bit tricky to unmount everything in the right order. The user-friendly solution is --recursive:

 # umount --recursive /mnt/A

Note that this solution is not atomic and possible umount options (like --lazy) are applied to all umount(2) calls.
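
To illustrate what "the right order" means, here is a rough sketch of what a script has to do by hand without --recursive. The function name umount_order is made up, and a plain reverse sort is only an approximation: it assumes that no two filesystems are stacked on the same directory. umount(8) itself works with the real mount hierarchy.

```shell
# Print mountpoints from stdin so that children come before their parents.
# Assumption: no two filesystems are stacked on the same directory.
umount_order() {
    LC_ALL=C sort -r
}

# Hypothetical usage (needs root, not executed here):
#   findmnt -R -n -r -o TARGET /mnt/A | umount_order |
#       while read -r t; do umount "$t"; done
```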

Another improvement is the option --all-targets. It unmounts all mountpoints of the given filesystem (device) in the current namespace. This option is useful when the same device is mounted in more than one place. The option --all-targets can be used together with the option --recursive.

 # findmnt -R /dev/sdb1
 TARGET     SOURCE    FSTYPE OPTIONS
 /mnt/A     /dev/sdb1 ext2   rw,relatime,stripe=32
 └─/mnt/A/B /dev/sdb2 ext2   rw,relatime,stripe=32
 /mnt/B     /dev/sdb1 ext2   rw,relatime,stripe=32
 /mnt/C     /dev/sdb1 ext2   rw,relatime,stripe=32

Verbose mode provides more details about the umount order:
 
 # umount --recursive --all-targets --verbose /dev/sdb1
 umount: /mnt/C (/dev/sdb1) unmounted
 umount: /mnt/B (/dev/sdb1) unmounted
 umount: /mnt/A/B (/dev/sdb2) unmounted
 umount: /mnt/A (/dev/sdb1) unmounted

Note that /proc/self/mountinfo contains information about mountpoints hierarchy as well as chronological order.
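
Each mountinfo line starts with the mount ID and the ID of the parent mount, and the fifth field is the mountpoint, so the hierarchy can be read directly from the file. A small sketch (the function name mountinfo_tree is made up):

```shell
# Print "mount-ID parent-ID mountpoint" for each mountinfo line on stdin.
# Fields 1, 2 and 5 of /proc/self/mountinfo hold these values.
mountinfo_tree() {
    awk '{ print $1, $2, $5 }'
}

# Usage: mountinfo_tree < /proc/self/mountinfo
```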

All these umount(8) improvements have a small limitation -- umount(8) works with the current namespace only. It means that if you want to be really paranoid, then you should not expect the device to be completely unmounted after --all-targets.

Fortunately, we have a new command nsenter(1) to enter the namespaces of other processes. Let's create a session with an unshared mount namespace:
 
 # mount --bind --make-private /mnt/test /mnt/test
 
 # unshare --mount
 
 # mkdir /mnt/test/foo
 # mount /dev/sdb1 /mnt/test/foo
 
 # findmnt -R -o +PROPAGATION /mnt/test
 /mnt/test       /dev/sda4[/mnt/test] ext4   rw,relatime,data=ordered private
 └─/mnt/test/foo /dev/sdb1            ext2   rw,relatime,stripe=32    private
 
 # echo $$
 28008

Note that --make-private is necessary if the parent is mounted as shared (this is the default, for example, on Fedora). Another session (shell):

 # findmnt -R /mnt/test
 TARGET    SOURCE               FSTYPE OPTIONS
 /mnt/test /dev/sda4[/mnt/test] ext4   rw,relatime,data=ordered

/mnt/test/foo is invisible in this namespace, but we can enter the other namespace with the nsenter(1) command and make changes there:
 
 # nsenter --mount --target 28008
 
 # mkdir /mnt/test/bar
 # mount /dev/sdb2 /mnt/test/bar
 
 # findmnt -R /mnt/test
 TARGET          SOURCE               FSTYPE OPTIONS
 /mnt/test       /dev/sda4[/mnt/test] ext4   rw,relatime,data=ordered
 ├─/mnt/test/foo /dev/sdb1            ext2   rw,relatime,stripe=32
 └─/mnt/test/bar /dev/sdb2            ext2   rw,relatime,stripe=32
 
 # echo $$
 29886

It means we have two sessions (shells with PIDs 28008 and 29886) that share the same mount namespace.
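
One way to verify this is to compare the namespace links in /proc. This is a sketch which assumes that the kernel exposes /proc/<pid>/ns/mnt (Linux 3.8 or newer) and that you have permission to read the links of both processes; the function name same_mntns is made up.

```shell
# Succeeds if both PIDs refer to the same mount namespace.
# Assumption: the kernel exposes /proc/<pid>/ns/mnt (Linux 3.8 or newer).
same_mntns() {
    a=$(readlink "/proc/$1/ns/mnt") || return 2
    b=$(readlink "/proc/$2/ns/mnt") || return 2
    [ "$a" = "$b" ]
}

# Hypothetical usage with the PIDs from the example above:
#   same_mntns 28008 29886 && echo "shared mount namespace"
```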

Another example is to unmount a directory in all namespaces:
 
 for p in $(pidof bash); do
  nsenter --mount --target $p -- umount --recursive /mnt/test
 done
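
The loop above enters the same namespace once for every bash process that lives in it. A possible refinement, sketched under the assumption that /proc/<pid>/ns/mnt is available, is to pick one PID per namespace first. The helper name one_pid_per_ns is made up.

```shell
# Read "PID NAMESPACE" pairs on stdin and print the first PID seen for
# each distinct namespace.
one_pid_per_ns() {
    awk '!seen[$2]++ { print $1 }'
}

# Hypothetical usage (needs root, not executed here):
#   for p in $(pidof bash); do
#       echo "$p $(readlink /proc/$p/ns/mnt)"
#   done | one_pid_per_ns | while read -r p; do
#       nsenter --mount --target "$p" -- umount --recursive /mnt/test
#   done
```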

The command nsenter(1) (as well as unshare(1)) supports mount, uts, IPC, net, PID and user namespaces. Note that you can use
 
  findmnt --task $pid

to list the mounts in another mount namespace. nsenter(1) is necessary only if you want to make changes in the namespace.

2.23 highlights:
  • mount(8) allows propagation flags to be combined with other mount operations (e.g. mount --make-private /dev/sda1 /mnt)
  • mount(8) allows propagation flags to be specified in /etc/fstab as mount options (private, slave, ...)
  • mount(8) supports the new option x-mount.mkdir to create mountpoint directories
  • findmnt(8) lists propagation flags (e.g. findmnt -o +PROPAGATION)
  • unshare(1) and nsenter(1) execute a shell if no program is specified

Note that the Linux kernel still does not allow propagation flags to be combined with other mount operations. Everything is implemented in userspace by additional mount(2) calls -- one call per propagation flag, see the strace output:

 # strace -e mount  mount --bind  --make-private /mnt/test /mnt/test
 mount("/mnt/test", "/mnt/test", 0x1886f10, MS_MGC_VAL|MS_BIND, NULL) = 0
 mount("none", "/mnt/test", NULL, MS_PRIVATE, NULL) = 0


Wednesday, February 20, 2013

local yum repository for koji builds

It takes days to see new packages on the official Fedora updates-testing mirrors, and I'm so impatient... And sometimes it's necessary to test unofficial builds. The solution is pretty simple with the "koji download-build" command.

Let's create a /usr/local/bin/local-repo script:
  #!/bin/bash
  REPODIR=/tmp/local-repo

  mkdir -p "$REPODIR"
  # download into the repository directory, not into the current one
  cd "$REPODIR" || exit 1
  koji download-build "$1"
  createrepo -d .
And the local repository config file /etc/yum.repos.d/local-updates-testing.repo:
  [local-updates-testing]
  name=Local $releasever - $basearch - Test Updates
  failovermethod=priority
  baseurl=file:///tmp/local-repo/
  enabled=0
  gpgcheck=0
  metadata_expire=10
Now if you want to download the latest package:
  # local-repo pkgname-ver-rel
  # yum --enablerepo=local-updates-testing update
Note that if you want to use the repository for something large, then /tmp is probably not the best place :-)
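
The script expects a full name-version-release string. A script could check the argument before calling koji; the sketch below is one way to do it, and the function name split_nvr is made up. It relies on the fact that RPM versions and releases cannot contain a dash, while package names can.

```shell
# Split "name-version-release" into its three parts.
# The name may itself contain dashes; version and release may not.
split_nvr() {
    nvr=$1
    release=${nvr##*-}
    rest=${nvr%-*}
    version=${rest##*-}
    name=${rest%-*}
    # Reject input that does not contain at least two dashes
    [ "$rest" != "$nvr" ] && [ "$name" != "$rest" ] || return 1
    echo "$name $version $release"
}

split_nvr util-linux-2.22.2-6.fc18
```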

Monday, February 18, 2013

non-recursive automake

You probably know the article Recursive Make Considered Harmful. The simple way to implement a non-recursive build-system is to use one top-level Makefile. Well, autotools are the best, so we will write Makefile.am rather than a Makefile directly :-)

For more details about the basic non-recursive make see Flameeyes's autotools-mythbuster. I'd like to talk about something more advanced.

The problem is that maintaining everything in one huge Makefile.am is pretty painful, especially if your project uses many subdirectories. From my point of view it's better to maintain make rules in the same place (the same directory) as the code.

Fortunately, automake is smart enough to generate one huge Makefile from many .am files. The solution is the automake "include" directive.

For example (top level Makefile.am):
  include foo/Makemodule.am
  include bar/Makemodule.am
where foo/ and bar/ are sub-directories and Makemodule.am is sub-directory specific automake stuff. You can use another name if you don't like Makemodule.
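
For a project with many subdirectories the include lines do not have to be typed by hand. The following sketch prints them for every Makemodule.am found below a source directory; the function name list_includes is made up, and the output is meant to be pasted into the top level Makefile.am.

```shell
# Print one automake "include" line per Makemodule.am below directory $1.
list_includes() {
    (cd "$1" && find . -name Makemodule.am | LC_ALL=C sort |
        sed 's|^\./|include |')
}

# Usage: list_includes /path/to/project
```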

The important thing to understand is that in this case the result will be one Makefile. It means that things like usrbin_PROGRAMS are interpreted as global variables. The ideal solution is to define all the global variables before you include Makemodule.am and to use the += operator in the Makemodule.am files. For example:
top level Makefile.am:
   usrbin_PROGRAMS =
   man_MANS =
 
   include foo/Makemodule.am
   include bar/Makemodule.am
foo/Makemodule.am:
   usrbin_PROGRAMS += myprog
   man_MANS += foo/myprog.8
   myprog_SOURCES = foo/myprog.c \
                    foo/myprog-utils.c
Don't forget to use complete paths for all your project files (e.g. foo/myprog.c). The compiled binaries (final programs) will be stored in the top-level directory; things like .o files will be stored in the sub-directories.

Well, setting some variables is pretty simple, but what about real make rules and automake hooks? Let's say that the program "abc" requires a special "make install" rule to create an "ABC" symlink:

top level Makefile.am:
    INSTALL_EXEC_HOOKS =

    include abc/Makemodule.am

    install-exec-hook: $(INSTALL_EXEC_HOOKS)
abc/Makemodule.am:
    install-exec-hook-abc::
         cd $(DESTDIR)$(usrsbin_dir) && ln -sf abc ABC

    INSTALL_EXEC_HOOKS += install-exec-hook-abc
The trick is that you define a global variable INSTALL_EXEC_HOOKS that lists all your sub-directory specific rules, and the real automake "install-exec-hook" depends on this variable.

Note that you can use automake conditionals, for example:
  if LINUX
  ... 
  endif
for all the stuff.

The result is a build-system with small, readable subdir/Makemodule.am files and one maintainable top-level Makefile.am.

The final top-level Makefile generated from your Makefile.am will almost always be faster. For example util-linux (make -j):
           recursive | non-recursive
           ----------+--------------
   2 cores: 14.5 sec | 13.2 sec
  16 cores:  9.5 sec |  4.3 sec
... but the numbers are not so important. After more than 6 months with the non-recursive build-system I have to say that the subdir/Makemodule.am based solution is better, because:
  • all variables are shared, everything is initialized in one place in the top-level Makefile.am
  • dependencies between programs and libs (yeah, I use libtool) work as expected without extra make rules
  • because everything is interpreted from the top-level directory I don't have to care about the correct $(srcdir) and $(top_srcdir) within sub-directories
  • all final binaries are in one place, just type "make prog; ./prog" to test the program without caring about a sub-directory. For a recursive build-system you have to use "make -C subdir prog; ./subdir/prog".
That's all.

Thursday, January 24, 2013

gummiboot

Maybe it's not obvious, but a boot loader can be pretty simple. It's really not necessary to use a Turing-complete language in config files or extra filesystem drivers -- all this is overkill with UEFI firmware.

If you have a machine with UEFI, then I have good news for you:
  • the firmware is able to read the GPT partition table (by the way, it's the best partition table format of all)
  • it's able to read data from the FAT32 filesystem on your system partition (GPT uses partition type specific UUIDs to identify partitions)
  • it's possible to use more than one boot loader (it means that you can try another boot loader and your original boot method won't be affected)
  • you can modify your UEFI boot settings from the Linux command line (reboot -> bios -> reboot is unnecessary)
  • all your configured boot loaders are visible in your bios boot menu (e.g. F12 for ThinkPad)
I guess that for Fedora 19 we will have rpm packages and some nice way to integrate gummiboot into the distribution (don't forget that with UEFI you can use more boot loaders, so the official distribution boot loader as well as alternative methods may be supported).

My How-to:

Install gnu-efi library and EFI boot manager (for example for Fedora):
  # yum install gnu-efi efibootmgr
compile gummiboot:
  $ git clone git://anongit.freedesktop.org/gummiboot
  $ cd gummiboot
  $ make
check your partition table; /dev/sda1 is usually the system partition (from the EFI point of view). Use partx(8) to see more details:
  # partx -n 1 /dev/sda
  NR START     END SECTORS  SIZE NAME                 UUID
   1  2048 2050047 2048000 1000M EFI System Partition 623b2882-c50b-48af-b3f8-f19e8639b02b
the partition with FAT filesystem is mounted on /boot/efi, use findmnt(8) to see more details:
  # findmnt /dev/sda1
  TARGET    SOURCE    FSTYPE OPTIONS
  /boot/efi /dev/sda1 vfat   rw,relatime
if you have a UEFI machine, then the partition is probably already initialized by your distribution.
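
If you are not sure whether the system was booted through UEFI at all, the kernel creates the /sys/firmware/efi directory only in that case. A small sketch (the function name booted_via_efi and its optional argument are made up, the argument exists only so the check can be tried against another directory):

```shell
# Succeeds if the system was booted through (U)EFI firmware.
# The optional argument is a hypothetical alternative sysfs root.
booted_via_efi() {
    [ -d "${1:-/sys}/firmware/efi" ]
}

if booted_via_efi; then
    echo "EFI boot detected"
else
    echo "no /sys/firmware/efi, probably a BIOS/legacy boot"
fi
```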

Now install the boot loader:
   # mkdir -p /boot/efi/EFI/gummiboot
   # cp gummiboot.efi /boot/efi/EFI/gummiboot/
inform your UEFI about the new boot loader:
   # efibootmgr -c -L "Gummiboot" -l '\EFI\gummiboot\gummiboot.efi'
now there will be a new entry "Gummiboot" in your bios boot menu. Note that UEFI uses Windows-like paths, so '\' is correct. efibootmgr(8) is a command line utility to manipulate EFI boot variables; you need a kernel with /sys/firmware/efi/vars support (for example the standard Fedora kernel).

Use
   # efibootmgr -v
to see more details. You can also change the boot order and other things with efibootmgr.

The last step is to add an entry (kernel) to gummiboot. There is a nice script in the gummiboot repository:
  loader-postinst.sh kernel-version path-to-kernel  
you can run this script manually or you can add the script into /etc/kernel/postinst.d (this directory does not exist by default on Fedora, mkdir -p is your friend...), then the script will be automatically executed after new kernel install.

The script copies the kernel to /boot/efi/distroname/machine-id where distroname comes from /etc/os-release and machine-id comes from /etc/machine-id.
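
The target directory can be derived like this. It is only a sketch and not the code from loader-postinst.sh: the function name kernel_dir is made up, and it assumes that "distroname" is the ID= field of os-release.

```shell
# Print the directory where the kernel would be copied.
# The arguments default to the real files; they are parameters only so
# that the function can be tried on sample data.
kernel_dir() {
    os_release=${1:-/etc/os-release}
    machine_id_file=${2:-/etc/machine-id}
    # Assumption: "distroname" is the ID= field of os-release
    distro=$(. "$os_release" && echo "$ID") || return 1
    machine_id=$(cat "$machine_id_file") || return 1
    echo "/boot/efi/$distro/$machine_id"
}

# Usage: kernel_dir
```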

The script also creates a /boot/efi/loader/entries/*.conf file with the kernel command line from /etc/kernel/cmdline, paths to the kernel and initrd, etc.
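
Such an entry file is just a list of "key value" lines. The sketch below prints one; the function name loader_entry, the sample values and the file names "linux" and "initrd" inside the kernel directory are assumptions for illustration, and the real script may write different content.

```shell
# Print a loader entry for one kernel.
# Arguments: title, version, machine-id, kernel directory (relative to
# the EFI system partition), kernel command line
loader_entry() {
    cat <<EOF
title      $1
version    $2
machine-id $3
options    $5
linux      $4/linux
initrd     $4/initrd
EOF
}

# Hypothetical values, for illustration only
loader_entry "Fedora" 3.8.0 abc123 /fedora/abc123/3.8.0 "root=/dev/sda2 ro"
```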

The result:

  • we have the kernel in a boot loader independent directory, so you can use another boot loader (UEFI Shell, elilo, etc.) to read the kernel images
  • everything is accessible to the UEFI firmware on a FAT filesystem (the boot loader does not have to support, for example, the ext4 filesystem)
  • separate config files for each kernel in the /boot/efi/loader/entries/ directory
  • the /etc/kernel/postinst.d based solution is extendable and open to alternative boot loaders.
Yes, gummiboot does not provide a console to resolve possible boot problems. It seems unnecessary, because you can install UEFI Shell. For more details see the Arch Wiki.

Reboot :-)

Monday, September 17, 2012

util-linux 2.22

I released the latest util-linux version 2.22 one week ago. As usual, the list of changes in our ReleaseNotes is huge (it would be nice to release more often ;-)).

For me, the most important thing is that the number of project contributors is growing, that we are able to coordinate our changes with other upstream projects like coreutils, procps-ng or systemd, and that maintainers from distributions contribute to the project.

Commands like mount(8), umount(8) or swapon(8) support the new tags PARTLABEL= and PARTUUID=. It means that you can address partitions by name or UUID independently of the filesystem on the device. You don't have to care about your fstab after mkfs or mkswap. The setting with PARTUUID= will always be valid.

Finally, we have dmesg --follow to print new kernel messages (like tail -f). This feature depends on readable /dev/kmsg (since kernel 3.5.0). I have also implemented a new dmesg output format --reltime (suggested by Linus on lkml):

$ dmesg --reltime
...
[Aug26 10:58] scsi_debug: host protection
[  +0.000004] scsi84 : scsi_debug, version 1.82 [20100324], dev_size_mb=50, opts=0x0
[  +0.000546] scsi 84:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
[  +0.000173] sd 84:0:0:0: Attached scsi generic sg1 type 0
[  +0.000356] sd 84:0:0:0: [sdb] 102400 512-byte logical blocks: (52.4 MB/50.0 MiB)
[  +0.000988] sd 84:0:0:0: [sdb] Write Protect is off
[  +0.000004] sd 84:0:0:0: [sdb] Mode Sense: 73 00 10 08
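
The delta column is simply the difference between two consecutive kernel timestamps. The following awk sketch shows the idea on the default "[seconds] message" format; the function name reltime is made up, and unlike the real dmesg it does not print a wall-clock time after longer pauses.

```shell
# Read "[  seconds] message" lines on stdin and replace every timestamp
# except the first one with the delta to the previous line.
reltime() {
    awk '{
        # the timestamp is the text between the leading "[" and the first "]"
        close_pos = index($0, "]")
        ts = substr($0, 2, close_pos - 2) + 0
        msg = substr($0, close_pos + 1)
        if (NR == 1)
            printf "[%12.6f]%s\n", ts, msg
        else
            printf "[  %+9.6f]%s\n", ts - prev, msg
        prev = ts
    }'
}

# Usage: dmesg | reltime
```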

The low-level userspace tools consolidation continues:
  • sulogin(1) and utmpdump(1) from sysvinit were merged into util-linux (the goal is to remove all init independent utils from the sysvinit package)
  • eject(1) was reimplemented to use /proc and /sys information and libmount, and moved to util-linux
  • new command lslocks(8) as a replacement for the dead lslk(8)
The command findmnt(8) has initial support for per-process mount tables (namespaces), for example findmnt --task <pid> prints the mount table for the given PID. In the next release I'd like to have a new option --unshared-tasks to print all processes with unshared mounts.

The tool lsblk(8) supports reverse trees. It means that you can see the whole stack of block devices from top to bottom:

$ lsblk -s /dev/mapper/luks-10d813de-fa82-4f67-a86c-23d5d0e7c30e
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
luks-10d813de-fa82-4f67-a86c-23d5d0e7c30e (dm-0)
        253:0    0  39.1G  0 crypt /home/kzak
└─sda6    8:6    0  39.1G  0 disk
  └─sda   8:0    0 149.1G  0 disk

The most invasive change is a new non-recursive build-system (just for the record: autotools are the best :-)). The result is a faster build, binaries are in one top-level build directory rather than in many subdirectories, make distcheck calls our regression tests, etc.

Another big change is the fdisk refactoring. This is slow and painful work, but the result should be GPT support in release 2.23 (patches from Davidlohr Bueso are already on the mailing list). I hope that one day the default fdisk will be a nice, readable (colored?) low-level tool without obsolete junk like CHS.

Note that in the next util-linux release 2.23 we're going to remove cryptoloop support. Yes, cryptoloop is bad and dead, use dm-crypt. (Note that util-linux upstream has never supported loop-AES.)

Thursday, August 2, 2012

lslocks(8)

The command lslk(8) has been unmaintained since 2001 and it seems that there is no usable replacement for it in distributions.

So, we (Davidlohr Bueso and I) decided to write a new implementation from scratch. The new implementation is based on the same concept as findmnt(8), lsblk(8) etc. It means that you can control the output columns and the output format.

# lslocks
COMMAND           PID  TYPE SIZE MODE  M      START                 END PATH
iscsiuio         1043 POSIX   5B WRITE 0          0                   0 /run/iscsiuio.pid
iscsid           1051 POSIX   5B WRITE 0          0                   0 /run/iscsid.pid
crond            1076 POSIX   5B WRITE 0          0                   0 /run/crond.pid
libvirtd         1264 POSIX   4B WRITE 0          0                   0 /run/libvirtd.pid
(unknown)        1304 FLOCK   0B WRITE 0          0                   0 /run
tracker-store    2088 POSIX 6.7M READ  0 1073741826          1073742335 /home/kzak/.cache/tracker/meta.db
tracker-store    2088 POSIX  32K READ  0        128                 128 /home/kzak/.cache/tracker/meta.db-shm
tracker-miner-f  2089 POSIX 6.7M READ  0 1073741826          1073742335 /home/kzak/.cache/tracker/meta.db
tracker-miner-f  2089 POSIX  32K READ  0        128                 128 /home/kzak/.cache/tracker/meta.db-shm
firefox         17151 POSIX   0B WRITE 0          0                   0 /home/kzak/.mozilla/firefox/zf4j57bz.de
firefox         17151 POSIX 416K READ  0 1073741826          1073742335 /home/kzak/.mozilla/firefox/zf4j57bz.de
firefox         17151 POSIX  20M READ  0 1073741826          1073742335 /home/kzak/.mozilla/firefox/zf4j57bz.de
firefox         17151 POSIX  32K READ  0        128                 128 /home/kzak/.mozilla/firefox/zf4j57bz.de
firefox         17151 POSIX 1.5M READ  0 1073741826          1073742335 /home/kzak/.mozilla/firefox/zf4j57bz.de
firefox         17151 POSIX  32K READ  0        128                 128 /home/kzak/.mozilla/firefox/zf4j57bz.de
firefox         17151 POSIX 416K WRITE 0 1073741824          1073742335 /home/kzak/.mozilla/firefox/zf4j57bz.de
java            25348 POSIX   0B WRITE 0          0                   0 /opt/xmind/Commons/configuration/org.ec
java            25348 POSIX   0B WRITE 0          0                   0 /opt/xmind/Commons/configuration/org.ec
java            25348 POSIX   0B WRITE 0          0                   0 /opt/xmind/Commons/configuration/org.ec
java            25348 POSIX   0B WRITE 0          0 9223372036854775806 /opt/xmind/Commons/data/workspace-cathy
atd              3348 POSIX   5B WRITE 0          0                   0 /run/atd.pid
sendmail         3400 POSIX  33B WRITE 0          0                   0 /run/sendmail.pid
sendmail         3419 POSIX  49B WRITE 0          0                   0 /run/sm-client.pid
Select a process:
# lslocks --pid $(pidof crond) 
COMMAND   PID  TYPE SIZE MODE  M START END PATH
crond    1076 POSIX   5B WRITE 0     0   0 /run/crond.pid
Or use it in scripts:
 for x in $(lslocks -rn -o PID); do kill $x; done
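
A script is usually interested in the holders of one particular lock rather than in all of them. A sketch which filters the raw output by path (the function name lock_holders is made up, and it assumes that the path contains no whitespace):

```shell
# Read "PID PATH" lines (as printed by lslocks -rn -o PID,PATH) on stdin
# and print every PID that holds a lock on the given path, each only once.
lock_holders() {
    awk -v path="$1" '$2 == path && !seen[$1]++ { print $1 }'
}

# Usage: lslocks -rn -o PID,PATH | lock_holders /run/crond.pid
```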

lslocks(8) will be available in util-linux 2.22 (now -rc1).