Karel Zak's blog: 2013

Tuesday, October 15, 2013

util-linux v2.24 - fdisk(8)

This release is mostly about small incremental changes; the one notable exception is fdisk(8).

I have finished the fdisk(8) refactoring, a task that Davidlohr Bueso, Petr Uzel and I started two releases ago. The result is code that is easy to extend, and the disklabel-specific code is no longer maintained together as one ball of spaghetti.

The goal is to have one library (libfdisk) shared between all the fdisks (sfdisk, cfdisk, ...). For now it's used by fdisk only. I hope that one day we'll have a shared library with a stable public API.

The most visible changes:

fdisk dialogs and output unification, more verbose messages
colors for warnings
new (but backwardly compatible) menus
all operations for all disklabels based on sectors by default
man page cleanup
improved GPT disklabel support (rename partition, partition and disk UUID modification)

The last missing piece is support for manually modifying hybrid GPTs. This is planned for the next release, v2.25.

My laptop with GPT:

# fdisk -l /dev/sda

Disk /dev/sda: 149.1 GiB, 160041885696 bytes, 312581808 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3549290F-417C-4941-8503-F7835109B821

Device           Start          End   Size Type
/dev/sda1         2048      2050047  1000M EFI System
/dev/sda2      2050048      6146047     2G Microsoft basic data
/dev/sda3      6146048     26462207   9.7G Linux swap
/dev/sda4     26462208     98142207  34.2G Microsoft basic data
/dev/sda5     98142208    230662143  63.2G Microsoft basic data
/dev/sda6    230662144    312580095  39.1G Microsoft basic data
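The Size column is simply (End - Start + 1) sectors multiplied by the 512-byte sector size, printed with binary (1024-based) suffixes. Here is a small Python sketch that reproduces the column from the Start/End values above. The rounding rule (one decimal, trailing ".0" dropped) is an assumption that happens to match this listing; it is not taken from fdisk's source.

```python
SECTOR_SIZE = 512  # from "Units: sectors of 1 * 512 = 512 bytes"

def partition_bytes(start, end, sector_size=SECTOR_SIZE):
    """Start and End are inclusive sector numbers, hence the + 1."""
    return (end - start + 1) * sector_size

def human_size(nbytes):
    """Binary (1024-based) suffix with one decimal; a trailing '.0' is dropped."""
    value = float(nbytes)
    for suffix in ("B", "K", "M", "G", "T", "P"):
        if value < 1024 or suffix == "P":
            break
        value /= 1024
    text = "%.1f" % value
    if text.endswith(".0"):
        text = text[:-2]
    return text + suffix

# (device, start, end) copied from the listing above
PARTITIONS = [
    ("/dev/sda1", 2048, 2050047),
    ("/dev/sda2", 2050048, 6146047),
    ("/dev/sda3", 6146048, 26462207),
    ("/dev/sda4", 26462208, 98142207),
    ("/dev/sda5", 98142208, 230662143),
    ("/dev/sda6", 230662144, 312580095),
]

for dev, start, end in PARTITIONS:
    # device, number of sectors, human-readable size
    print(dev, end - start + 1, human_size(partition_bytes(start, end)))
```

The same arithmetic explains the header: 312581808 sectors * 512 bytes = 160041885696 bytes, which is about 149.1 GiB.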

disklabel hex dump (command 'D')

This is a nice feature for advanced users. You don't have to fight with hexdump or search for the right offsets; fdisk(8) locates all relevant disklabel parts on the device and prints them in hex. (Don't forget that many things don't live at static offsets, for example the GPT partition entry array or MBR extended partitions.)

# echo -e 'x\nD\nq\n' | fdisk  /dev/sdb
 
Welcome to fdisk (util-linux 2.24.rc2-22-1d8c-dirty).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
 
Command (m for help): 
Expert command (m for help): 
MBR: offset = 0, size = 512 bytes. 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
000001b0  00 00 00 00 00 00 00 00  14 92 2b 9e 00 00 00 20
000001c0  21 00 83 df 13 0c 00 08  00 00 00 20 03 00 00 df
000001d0  14 0c 05 7c 30 4c 00 28  03 00 00 98 0f 00 00 00
000001e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa
 
EBR: offset = 105906176, size = 512 bytes.
06500000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
065001c0  34 0d 83 bf 26 19 00 08  00 00 00 20 03 00 00 bf
065001d0  27 19 05 9f 39 26 00 28  03 00 00 28 03 00 00 00
065001e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
065001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa
 
EBR: offset = 211812352, size = 512 bytes.
0ca00000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
0ca001b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 e0
0ca001c0  08 19 83 9f 39 26 00 08  00 00 00 20 03 00 00 00
0ca001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
0ca001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa
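To see what fdisk had to follow, you can decode the dump by hand. The minimal Python sketch below uses hypothetical helper names and is not fdisk's code. It unpacks the four 16-byte partition entries that start at offset 0x1BE, using bytes copied from the MBR dump. The extended partition (type 0x05) starts at sector 206848, and 206848 * 512 = 105906176 is exactly where the first EBR sits. Inside an EBR, the link entry is relative to the start of the extended partition, so (206848 + 206848) * 512 = 211812352 gives the second EBR. This is why extended partitions have no static offsets.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 0x1BE
# One little-endian 16-byte entry: boot flag, CHS of first sector,
# partition type, CHS of last sector, LBA start, number of sectors.
ENTRY = struct.Struct("<B3sB3sII")
EXTENDED_TYPES = (0x05, 0x0F, 0x85)

def parse_entries(sector):
    """Return (type, start, size) for each used entry of an MBR or EBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 55 aa boot signature")
    entries = []
    for i in range(4):
        _boot, _chs_first, ptype, _chs_last, start, size = ENTRY.unpack_from(
            sector, TABLE_OFFSET + i * ENTRY.size)
        if ptype != 0:  # type 0x00 marks an unused slot
            entries.append((ptype, start, size))
    return entries

def extended_start(mbr_entries):
    """Start sector of the extended partition; the first EBR lives there."""
    for ptype, start, _size in mbr_entries:
        if ptype in EXTENDED_TYPES:
            return start
    return None

def next_ebr_offset(ext_start, ebr_entries):
    """Byte offset of the next EBR; the EBR link is relative to ext_start."""
    for ptype, start, _size in ebr_entries:
        if ptype in EXTENDED_TYPES:
            return (ext_start + start) * SECTOR_SIZE
    return None  # last EBR in the chain

# Bytes 0x1b8..0x1ff of the MBR dump above; everything before is zero.
MBR_HEX = ("14 92 2b 9e 00 00 00 20"
           " 21 00 83 df 13 0c 00 08 00 00 00 20 03 00 00 df"
           " 14 0c 05 7c 30 4c 00 28 03 00 00 98 0f 00 00 00"
           + " 00" * 30 + " 55 aa")
mbr = bytearray(SECTOR_SIZE)
mbr[0x1B8:] = bytes.fromhex(MBR_HEX)

for ptype, start, size in parse_entries(mbr):
    print("type 0x%02x start %d size %d" % (ptype, start, size))
print("first EBR at byte", extended_start(parse_entries(mbr)) * SECTOR_SIZE)
```

Reading a real device the same way (the first 512 bytes of /dev/sdb) requires root. Also note that the CHS fields are ignored here, as they are meaningless on modern disks.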

Thursday, August 22, 2013

libmount Python binding

This week I merged the Python libmount binding into util-linux. The original prototype was written by Ondrej Oprala (thanks!).

The binding provides the same functionality as libmount in C; the currently supported objects are:

Fs - filesystem description
Table - tab files parser and container for filesystems
Context - high-level API to mount/umount.

Examples:

$ python
>>> import libmount
>>> tb = libmount.Table('/proc/self/mountinfo')
>>> fs = tb.find_target('/home/kzak')
>>> print(fs.source)
/dev/mapper/luks-10d813de-fa82-4f67-a86c-23d5d0e7c30e

# python
>>> import libmount
>>> cx = libmount.Context()
>>> cx.target = '/mnt/backup'
>>> cx.mount()
>>> fs = cx.mtab.find_target('/mnt/backup')
>>> print(fs.fstype)
nfs4

# python -c "import libmount; libmount.Context(target='/mnt/backup').umount()"
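To clarify what a lookup such as Table.find_target() works with, here is a pure-Python sketch that parses /proc/self/mountinfo directly. This is *not* the libmount binding's API, which handles much more. The function names are hypothetical. The sketch relies on the documented mountinfo layout: mount ID, parent ID, major:minor, root, mount point, options, optional fields terminated by a lone "-", then fstype and source. Paths are octal-escaped (for example, a space is written as \040).

```python
import re

def _unescape(field):
    # mountinfo escapes space, tab, newline and backslash as \ooo (octal)
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)

def parse_mountinfo(text):
    """Return one dict per mountinfo line, in file order."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        sep = fields.index("-")  # the optional fields end at a lone "-"
        entries.append({
            "id": int(fields[0]),
            "parent": int(fields[1]),
            "root": _unescape(fields[3]),
            "target": _unescape(fields[4]),
            "options": fields[5],
            "fstype": fields[sep + 1],
            "source": _unescape(fields[sep + 2]),
        })
    return entries

def find_target(entries, path):
    """Last entry mounted on `path`; a later mount hides earlier ones there."""
    match = None
    for fs in entries:
        if fs["target"] == path:
            match = fs
    return match

def load_mountinfo(path="/proc/self/mountinfo"):
    with open(path) as f:
        return parse_mountinfo(f.read())
```

For example, find_target(load_mountinfo(), '/home/kzak')["source"] corresponds to the first interactive example.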

A side effect of the work on the Python binding is that libmount now uses reference counting for libmnt_{fs,table,cache} objects, so we're ready for more complicated scenarios.

... and use ./configure --without-python if you don't like it ;-)

Monday, April 29, 2013

dmesg -H is sexy

dmesg(1) from util-linux 2.23 provides some small usability improvements, and all of them can be enabled with the -H, --human command line option:

enable a pager (you know this from 'git log', for example)
enable relative timestamps that display the local time and the delta in human-readable format
enable colors for the timestamp prefix, subsystem prefix and message body (different colors for different log levels)
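Kernel messages are timestamped in seconds since boot, so a "local time" view has to add those seconds to an estimated boot time. The following Python sketch shows that arithmetic. It assumes the boot time is "now minus the uptime from /proc/uptime", and the output format is made up rather than copied from dmesg. The dmesg(1) man page warns that such wall-clock stamps can be inaccurate after suspend/resume. The first message gets a wall-clock stamp, and later ones get the delta to the previous message.

```python
from datetime import datetime, timedelta

def boot_time_from_uptime(uptime_text, now):
    """Estimate boot time from /proc/uptime contents (first field = seconds up)."""
    return now - timedelta(seconds=float(uptime_text.split()[0]))

def humanize(records, boot_time):
    """Turn (seconds_since_boot, text) pairs into human-readable lines.

    The first line shows local time (boot time + kernel timestamp); the
    following lines show the delta since the previous message.
    """
    lines = []
    prev = None
    for ts, text in records:
        if prev is None:
            stamp = (boot_time + timedelta(seconds=ts)).strftime("%b%d %H:%M")
        else:
            stamp = "+%.6f" % (ts - prev)
        lines.append("[%s] %s" % (stamp, text))
        prev = ts
    return lines
```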

Hint: add

alias dmesg="dmesg --human"

to a script in your /etc/profile.d/ directory.

Tuesday, April 23, 2013

umount(8), mount(8) and nsenter(1)

umount(8) from util-linux 2.23 (now at -rc2) supports the new command line options --recursive and --all-targets. The new command nsenter(1) opens the door to namespaces.

 # umount /mnt/A
 umount: /mnt/A: target is busy.

This is a pretty common situation, and the problem is obvious:

 # findmnt -R /mnt/A
 TARGET     SOURCE    FSTYPE OPTIONS
 /mnt/A     /dev/sdb1 ext2   rw,relatime,stripe=32
 └─/mnt/A/B /dev/sdb2 ext2   rw,relatime,stripe=32

so you have to unmount /mnt/A/B before /mnt/A. In some cases, especially in scripts, it can be a little tricky to unmount everything in the right order. The user-friendly solution is --recursive:

 # umount --recursive /mnt/A

Note that this is not atomic, and any umount options (like --lazy) are applied to all the umount(2) calls.

Another improvement is the --all-targets option. It unmounts all mountpoints of the given filesystem (device) in the current namespace, which is useful when the same device is mounted in several places. The --all-targets option can be combined with --recursive.

 # findmnt -R /dev/sdb1
 TARGET     SOURCE    FSTYPE OPTIONS
 /mnt/A     /dev/sdb1 ext2   rw,relatime,stripe=32
 └─/mnt/A/B /dev/sdb2 ext2   rw,relatime,stripe=32
 /mnt/B     /dev/sdb1 ext2   rw,relatime,stripe=32
 /mnt/C     /dev/sdb1 ext2   rw,relatime,stripe=32

verbose mode provides more details about the umount order:

 
 # umount --recursive --all-targets --verbose /dev/sdb1
 umount: /mnt/C (/dev/sdb1) unmounted
 umount: /mnt/B (/dev/sdb1) unmounted
 umount: /mnt/A/B (/dev/sdb2) unmounted
 umount: /mnt/A (/dev/sdb1) unmounted

Note that /proc/self/mountinfo contains information about the mountpoint hierarchy as well as the chronological order of the mounts.
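Those two pieces of information are enough to derive an unmount order. The following Python sketch does not reflect how umount(8) is implemented, and its helper names are hypothetical. It collects the selected mounts plus everything below them through the parent IDs, then walks the mountinfo list backwards so that the most recently mounted (and therefore child) mounts come first. That ordering step assumes, as stated above, that mountinfo is chronological. The mount IDs in the example data are made up. The targets and sources mirror the findmnt -R /dev/sdb1 example.

```python
def subtree(entries, roots):
    """Mount IDs of `roots` plus everything mounted below them."""
    selected = set(roots)
    changed = True
    while changed:  # repeat until no new child mount is found
        changed = False
        for mount_id, parent_id, _target, _source in entries:
            if parent_id in selected and mount_id not in selected:
                selected.add(mount_id)
                changed = True
    return selected

def umount_order(entries, roots):
    """Targets to unmount, most recently mounted first (children first)."""
    selected = subtree(entries, roots)
    return [target for mount_id, _p, target, _s in reversed(entries)
            if mount_id in selected]

def mounts_of_device(entries, source):
    """--all-targets: every mount ID whose source is the given device."""
    return [mount_id for mount_id, _p, _t, src in entries if src == source]

# (mount ID, parent ID, target, source) in mountinfo (chronological) order
ENTRIES = [
    (1, 0, "/", "/dev/sda4"),
    (21, 1, "/mnt/A", "/dev/sdb1"),
    (22, 21, "/mnt/A/B", "/dev/sdb2"),
    (23, 1, "/mnt/B", "/dev/sdb1"),
    (24, 1, "/mnt/C", "/dev/sdb1"),
]

# --recursive --all-targets /dev/sdb1
print(umount_order(ENTRIES, mounts_of_device(ENTRIES, "/dev/sdb1")))
```

The printed order (/mnt/C, /mnt/B, /mnt/A/B, /mnt/A) matches the verbose umount output above.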

All these umount(8) improvements have a small limitation: umount(8) works with the current namespace only. This means that if you want to be really paranoid, you should not assume the device is completely unmounted after --all-targets.

Fortunately we have the new command nsenter(1) to enter the namespaces of other processes. Let's create a session with an unshared mount namespace:

 
 # mount --bind --make-private /mnt/test /mnt/test
 
 # unshare --mount
 
 # mkdir /mnt/test/foo
 # mount /dev/sdb1 /mnt/test/foo
 
 # findmnt -R -o +PROPAGATION /mnt/test
 /mnt/test       /dev/sda4[/mnt/test] ext4   rw,relatime,data=ordered private
 └─/mnt/test/foo /dev/sdb1            ext2   rw,relatime,stripe=32    private
 
 # echo $$
 28008

Note that --make-private is necessary if the parent is mounted as shared (this is the default on Fedora, for example). Another session (shell):

 # findmnt -R /mnt/test
 TARGET    SOURCE               FSTYPE OPTIONS
 /mnt/test /dev/sda4[/mnt/test] ext4   rw,relatime,data=ordered

The /mnt/test/foo mount is invisible in this namespace, but we can enter the namespace with the nsenter(1) command and make changes there:

 
 # nsenter --mount --target 28008

 # mkdir /mnt/test/bar
 # mount /dev/sdb2 /mnt/test/bar
 
 # findmnt -R /mnt/test
 TARGET          SOURCE               FSTYPE OPTIONS
 /mnt/test       /dev/sda4[/mnt/test] ext4   rw,relatime,data=ordered
 ├─/mnt/test/foo /dev/sdb1            ext2   rw,relatime,stripe=32
 └─/mnt/test/bar /dev/sdb2            ext2   rw,relatime,stripe=32
 
 # echo $$
 29886

It means we have two sessions (shells with PIDs 28008 and 29886) that share the same mount namespace.

Another example is unmounting a directory in all namespaces:

 
 for p in $(pidof bash); do
  nsenter --mount --target $p -- umount --recursive /mnt/test
 done
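The loop above enters the same namespace once per bash process and ignores processes that are not bash. A refinement is to pick one PID per distinct mount namespace. Each /proc/PID/ns/mnt is a symlink such as "mnt:[4026531840]", and processes with the same link text share a mount namespace. The Python sketch below is a hypothetical helper, not part of util-linux. It builds the nsenter command lines instead of running them, because reading other processes' ns links and entering them typically requires root.

```python
import os

def mount_ns(pid, proc="/proc"):
    """Mount namespace identifier of a process, e.g. 'mnt:[4026531840]'."""
    return os.readlink(os.path.join(proc, str(pid), "ns", "mnt"))

def all_pids(proc="/proc"):
    return sorted(int(d) for d in os.listdir(proc) if d.isdigit())

def one_pid_per_namespace(pid_ns_pairs):
    """Keep the first PID seen for each distinct namespace identifier."""
    first = {}
    for pid, ns in pid_ns_pairs:
        first.setdefault(ns, pid)
    return sorted(first.values())

def collect(pids, proc="/proc"):
    pairs = []
    for pid in pids:
        try:
            pairs.append((pid, mount_ns(pid, proc)))
        except OSError:
            continue  # the process exited meanwhile, or we lack permission
    return one_pid_per_namespace(pairs)

def umount_commands(pids, target):
    """argv lists equivalent to the shell loop, one per namespace."""
    return [["nsenter", "--mount", "--target", str(pid), "--",
             "umount", "--recursive", target] for pid in pids]
```

As root, each command from umount_commands(collect(all_pids()), "/mnt/test") could then be run, for example with subprocess.run().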

The command nsenter(1) (as well as unshare(1)) supports mount, uts, IPC, net, PID and user namespaces. Note that you can use

 
  findmnt --task $pid

to list other mount namespaces; nsenter(1) is necessary only if you want to make changes in the namespaces.

2.23 highlights:

mount(8) allows propagation flags to be used together with other mount operations (e.g. mount --make-private /dev/sda1 /mnt)
mount(8) allows propagation flags to be specified in /etc/fstab as mount options (private, slave, ...)
mount(8) supports the new option x-mount.mkdir to create mountpoint directories
findmnt(8) lists propagation flags (e.g. findmnt -o +PROPAGATION)
unshare(1) and nsenter(1) execute a shell if no program is specified

Note that the Linux kernel still does not allow propagation flags to be combined with other mount operations. Everything is implemented in userspace by additional mount(2) calls, one call per propagation flag; see the strace output:

# strace -e mount mount --bind --make-private /mnt/test /mnt/test
mount("/mnt/test", "/mnt/test", 0x1886f10, MS_MGC_VAL|MS_BIND, NULL) = 0
mount("none", "/mnt/test", NULL, MS_PRIVATE, NULL) = 0
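The flag arithmetic behind that strace output can be modeled in a few lines. The Python helper below is hypothetical and only illustrates the "one extra mount(2) call per propagation flag" rule. It is not libmount code, and it handles only the bind option plus the propagation options. The flag values are the ones from the kernel headers (linux/fs.h). MS_MGC_VAL is the legacy magic number that shows up in the first call.

```python
MS_BIND = 4096
MS_REC = 16384
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20
MS_MGC_VAL = 0xC0ED0000  # legacy magic value, visible in the strace output

PROPAGATION = {"shared": MS_SHARED, "slave": MS_SLAVE,
               "private": MS_PRIVATE, "unbindable": MS_UNBINDABLE}

def plan_mount_calls(source, target, options):
    """Return (source, target, flags) for each mount(2) call: the main
    operation first, then one call per propagation flag."""
    main_flags = MS_MGC_VAL
    extra = []
    for opt in options:
        recursive = opt.startswith("r") and opt[1:] in PROPAGATION  # rprivate, ...
        name = opt[1:] if recursive else opt
        if opt == "bind":
            main_flags |= MS_BIND
        elif name in PROPAGATION:
            extra.append(PROPAGATION[name] | (MS_REC if recursive else 0))
        else:
            raise ValueError("option not handled by this sketch: %r" % opt)
    calls = [(source, target, main_flags)]
    calls += [("none", target, flags) for flags in extra]
    return calls

for call in plan_mount_calls("/mnt/test", "/mnt/test", ["bind", "private"]):
    print("%s %s 0x%x" % call)
```

For --bind --make-private, this gives the same two calls as the strace output: MS_MGC_VAL|MS_BIND first, then a separate MS_PRIVATE call on the target.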

Wednesday, February 20, 2013

local yum repository for koji builds

It takes days to see new packages on the official Fedora updates-testing mirrors, and I'm so impatient... And sometimes it's necessary to test unofficial builds. The solution is pretty simple with the "koji download-build" command.

Let's create a /usr/local/bin/local-repo script:

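  #!/bin/bash
  # download a koji build into $REPODIR and (re)generate the repo metadata
  REPODIR=/tmp/local-repo

  mkdir -p "$REPODIR"
  cd "$REPODIR" || exit 1
  koji download-build "$1"
  createrepo -d .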

And a local repository config file, /etc/yum.repos.d/local-updates-testing.repo:

  [local-updates-testing]
  name=Local $releasever - $basearch - Test Updates
  failovermethod=priority
  baseurl=file:///tmp/local-repo/
  enabled=0
  gpgcheck=0
  metadata_expire=10

Now if you want to download the latest package:

  # local-repo pkgname-ver-rel
  # yum --enablerepo=local-updates-testing update

Note that if you want to use the repository for something larger, /tmp is probably not the best place :-)

Monday, February 18, 2013

non-recursive automake

You probably know the "Recursive Make Considered Harmful" article. The simple way to implement a non-recursive build system is to use one top-level Makefile. Well, autotools are the best, so we will write a Makefile.am rather than a Makefile directly :-)

For more details about the basic non-recursive make see Flameeyes's autotools-mythbuster. I'd like to talk about something more advanced.

The problem is that maintaining everything in one huge Makefile.am is pretty painful, especially if your project uses many subdirectories. From my point of view it's better to keep the make rules in the same place (same directory) as the code.

Fortunately automake is smart enough to generate one huge Makefile from many .am files. The solution is the automake "include" directive.

For example (top level Makefile.am):

  include foo/Makemodule.am
  include bar/Makemodule.am

where foo/ and bar/ are subdirectories and Makemodule.am contains the subdirectory-specific automake stuff. You can use another name if you don't like Makemodule.

The important thing to understand is that in this case the result is one Makefile. This means that things like usrbin_PROGRAMS are interpreted as global variables. The ideal solution is to define all the global variables before you include the Makemodule.am files and to use the += operator in the Makemodule.am files. For example:
top level Makefile.am:

   usrbin_PROGRAMS =
   man_MANS =
 
   include foo/Makemodule.am
   include bar/Makemodule.am

foo/Makemodule.am:

   usrbin_PROGRAMS += myprog
   man_MANS += foo/myprog.8
   myprog_SOURCES = foo/myprog.c \
                    foo/myprog-utils.c

Don't forget to use complete paths for all your project files (e.g. foo/myprog.c). The compiled binaries (final programs) are stored in the top-level directory, while things like .o files are stored in the subdirectories.

Well, setting some variables is pretty simple, but what about real make rules and automake hooks? Let's say that program "abc" requires a special "make install" rule to create an "ABC" symlink:

top level Makefile.am:

    INSTALL_EXEC_HOOKS =

    include abc/Makemodule.am

    install-exec-hook: $(INSTALL_EXEC_HOOKS)

abc/Makemodule.am:

    install-exec-hook-abc::
         cd $(DESTDIR)$(usrsbin_dir) && ln -sf abc ABC

    INSTALL_EXEC_HOOKS += install-exec-hook-abc

The trick is to define an INSTALL_EXEC_HOOKS global variable that lists all your subdirectory-specific rules, and to make the real automake "install-exec-hook" depend on this variable.

Note that you can use automake conditionals, for example:

  if LINUX
  ... 
  endif

for all the stuff.

The result is a build system with small, readable subdir/Makemodule.am files and one maintainable top-level Makefile.am.

The final top-level Makefile generated from your Makefile.am will almost always be faster. For example, util-linux with make -j:

           recursive | non-recursive
           ----------+--------------
   2 cores: 14.5 sec | 13.2 sec
  16 cores:  9.5 sec |  4.3 sec

... but the numbers are not so important. After more than 6 months with a non-recursive build system, I have to say that the subdir/Makemodule.am based solution is better, because:

all variables are shared, and everything is initialized in one place, the top-level Makefile.am
dependencies between programs and libs (yeah, I use libtool) work as expected without extra make rules
because everything is interpreted from the top-level directory, I don't have to care about correct $(srcdir) and $(top_srcdir) within subdirectories
all final binaries are in one place: just type "make prog; ./prog" to test the program without caring about subdirectories. With a recursive build system you have to use "make -C subdir prog; ./subdir/prog".

That's all.

Thursday, January 24, 2013

gummiboot

Maybe it's not obvious, but a boot loader can be pretty simple. With UEFI firmware there is really no need for a Turing-complete config file language or extra filesystem drivers; all of this is overkill.

If you have a machine with UEFI, then I have good news for you:

the firmware is able to read the GPT partition table (by the way, it's the best partition table format of all)
it's able to read data from the FAT32 filesystem on your system partition (GPT uses partition-type-specific UUIDs to identify partitions)
it's possible to use more than one boot loader (you can try another boot loader and your original boot method won't be affected)
you can modify your UEFI boot settings from the Linux command line (reboot -> BIOS -> reboot is unnecessary)
all your configured boot loaders are visible in your BIOS boot menu (e.g. F12 on a ThinkPad)

I guess that for Fedora 19 we will have rpm packages and some nice way to integrate gummiboot into the distribution (don't forget that with UEFI you can use more boot loaders, so the official distribution boot loader as well as alternative methods may be supported).

My How-to:

Install the gnu-efi library and the EFI boot manager (for example on Fedora):

  # yum install gnu-efi efibootmgr

compile gummiboot:

  $ git clone git://anongit.freedesktop.org/gummiboot
  $ cd gummiboot
  $ make

check your partition table; /dev/sda1 is usually the system partition (from the EFI point of view). Use partx(8) to see more details:

  # partx -n 1 /dev/sda
  NR START     END SECTORS  SIZE NAME                 UUID
   1  2048 2050047 2048000 1000M EFI System Partition 623b2882-c50b-48af-b3f8-f19e8639b02b

the partition with FAT filesystem is mounted on /boot/efi, use findmnt(8) to see more details:

  # findmnt /dev/sda1
  TARGET    SOURCE    FSTYPE OPTIONS
  /boot/efi /dev/sda1 vfat   rw,relatime

if you have a UEFI machine, then the partition has probably already been initialized by your distribution.

Now install the boot loader:

   # mkdir -p /boot/efi/EFI/gummiboot
   # cp gummiboot.efi /boot/efi/EFI/gummiboot/

inform your UEFI about the new boot loader:

   # efibootmgr -c -L "Gummiboot" -l '\EFI\gummiboot\gummiboot.efi'

now your BIOS boot menu will contain a new entry, "Gummiboot". Note that UEFI uses Windows-like paths, so '\' is correct. efibootmgr(8) is a command line utility to manipulate EFI boot variables; you need a kernel with /sys/firmware/efi/vars support (for example the standard Fedora kernel).

Use

   # efibootmgr -v

to see more details; you can also change the boot order and other things with efibootmgr.

The last step is to add an entry (kernel) to gummiboot; there is a nice script for this in the gummiboot repository:

  loader-postinst.sh kernel-version path-to-kernel

you can run this script manually, or you can add it to /etc/kernel/postinst.d (this directory does not exist by default on Fedora, mkdir -p is your friend...); then the script will be executed automatically after each new kernel installation.

The script copies the kernel to /boot/efi/distroname/machine-id, where distroname comes from /etc/os-release and machine-id from /etc/machine-id.

The script also creates a /boot/efi/loader/entries/*.conf file with the kernel command line from /etc/kernel/cmdline, the paths to the kernel and initrd, etc.
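The following Python sketch shows roughly how those pieces fit together. It is a hedged illustration, not loader-postinst.sh itself. Three things are assumptions: that "distroname" is the ID field of /etc/os-release, that the entry keys are title, linux, initrd and options, and the file names used in the example. The helper names are hypothetical. The sketch renders strings only and does not touch /boot.

```python
import shlex

def parse_os_release(text):
    """Parse os-release style KEY=value lines; values may be shell-quoted."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = " ".join(shlex.split(value))
    return info

def kernel_dir(os_release_text, machine_id_text, esp="/boot/efi"):
    # Assumption: "distroname" is the os-release ID field.
    distro = parse_os_release(os_release_text).get("ID", "linux")
    return "%s/%s/%s" % (esp, distro, machine_id_text.strip())

def loader_entry(title, linux, initrd, cmdline):
    """Render a loader entry; linux/initrd are paths relative to the ESP root.
    cmdline is the raw /etc/kernel/cmdline content (whitespace is normalized)."""
    return ("title   %s\nlinux   %s\ninitrd  %s\noptions %s\n"
            % (title, linux, initrd, " ".join(cmdline.split())))
```

Because the kernel, the initrd and the entry all live on the FAT system partition, any UEFI-capable loader can read them.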

The result:

the kernel is in a boot-loader-independent directory, so you can use another boot loader (UEFI Shell, elilo, etc.) to read the kernel images
everything is accessible to the UEFI firmware on a FAT filesystem (the boot loader does not have to support ext4, for example)
separate config files for each kernel in the /boot/efi/loader/entries/ directory
the /etc/kernel/postinst.d based solution is extensible and open to alternative boot loaders.

Yes, gummiboot does not provide a console to resolve possible boot problems. That seems unnecessary; you can install the UEFI Shell. For more details see the Arch Wiki.

Reboot :-)