Wednesday, February 15, 2012

libblkid maintainer's brain dump

This article is about the low-level probing code in libblkid, and it really is just a dump, nothing more ;-)

High and Low level

The library contains two APIs.
  • high-level - this is the original library code from e2fsprogs. All results are cached in the file /etc/blkid.tab (or /run/blkid/blkid.tab). The advantage is that information about LABELs and UUIDs is accessible to non-root users and the cache has a positive impact on performance.

    This advantage no longer holds on many systems, where all the necessary information is stored in the udev db and things like LABEL and UUID are accessible through the /dev/disk/by-* udev symlinks.

    This is why the blkid_evaluate_* functions, which are able to use the udev symlinks as well as the original libblkid cache, are recommended for newly written programs. This functionality is also available from the command line via the blkid -L|-U command.

  • low-level - this part of the API completely bypasses the cache and allows you to work directly with the library probing functions. The rest of this article is about the low-level part of the library.
The library contains three chains of probing functions:
  1. superblocks
  2. partitions
  3. topology
Superblock probing is enabled by default. The command "blkid -p -o udev" (or the built-in code in udevd) enables the partitions chain too.

There are two basic probing methods:
  • safeprobe - the recommended method. It checks for collisions between filesystems, RAIDs and partition tables.
  • fullprobe - does not check for conflicts; used for example by wipefs(8)
For superblocks only the NAME=value based API is available. For topology and partitions a binary interface is available too. See the docs link below.
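A simplified model of the difference between the two methods, written as plain C rather than libblkid code: safeprobe refuses to pick a winner when more than one signature is found. The struct and function names are hypothetical; only the return values (0, 1 and -2) follow the convention documented for blkid_do_safeprobe(), and real libblkid applies more rules than this.

```c
#include <assert.h>
#include <stddef.h>

/* One detected signature, e.g. "ext4", "swap" or "linux_raid_member". */
struct detected {
	const char *name;
};

/*
 * Simplified model of safeprobe for one chain (hypothetical, not libblkid
 * code). Return values follow blkid_do_safeprobe()'s documented
 * convention: 0 = exactly one result, 1 = nothing detected,
 * -2 = ambivalent result (colliding signatures).
 */
int model_safeprobe(const struct detected *res, size_t n, const char **winner)
{
	assert(winner != NULL);
	*winner = NULL;

	if (n == 0)
		return 1;
	if (n > 1)
		return -2;	/* collision: better to report nothing than a wrong type */
	*winner = res[0].name;
	return 0;
}
```

A fullprobe-style caller would instead walk over all n results, which is what a tool like wipefs(8) needs.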

Superblocks
  • four basic "usage" groups: filesystems, raids, crypto and others
  • RAIDs (MD, LVM, ...) are probed before filesystems
  • don't check for filesystems when a RAID signature is detected
  • don't check for RAIDs or others (swap-area) on CD-ROMs
  • don't check for RAIDs on tiny devices (< 1 MiB)
  • don't read whole FAT root directory (to lookup LABEL) on tiny devices (< 1 MiB)
exceptions / extra cases:
  • MD RAID is ignored if detected within a valid partition during whole-disk probing

    [use case: partitioned disk, the last partition is used as a RAID member and the RAID has its metadata at the end of the last partition (so at the end of the disk)]

  • LVM signature is ignored if another signature is detected within the first 8 KiB of the device (LVM wipes this area, so there should not be any filesystem superblock)

    [use case: disk with LVM, the user stops using LVM and creates a new partition table with fdisk; the result is an MBR and an obsolete LVM signature on the same device]
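The basic rules above can be expressed as a small decision function. This is an illustrative sketch with hypothetical flag names, not libblkid's internal code; it covers only the CD-ROM, tiny-device and RAID-first rules, not the MD and LVM exceptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags for the "usage" groups described above. */
enum {
	PROBE_RAID   = 1 << 0,
	PROBE_FS     = 1 << 1,
	PROBE_CRYPTO = 1 << 2,
	PROBE_OTHER  = 1 << 3,	/* e.g. swap area */
};

#define TINY_DEVICE_SIZE (1024 * 1024)	/* 1 MiB */

/* Which groups are worth probing at all on this device. */
unsigned int groups_to_probe(uint64_t size, bool is_cdrom)
{
	unsigned int groups = PROBE_RAID | PROBE_FS | PROBE_CRYPTO | PROBE_OTHER;

	if (is_cdrom)
		groups &= ~(PROBE_RAID | PROBE_OTHER);	/* no RAIDs or swap on CD-ROMs */
	if (size < TINY_DEVICE_SIZE)
		groups &= ~PROBE_RAID;			/* no RAIDs on tiny devices */
	return groups;
}

/* RAIDs are probed first; a detected RAID signature disables filesystem probing. */
unsigned int groups_after_raid_check(unsigned int groups, bool raid_found)
{
	if (raid_found)
		groups &= ~PROBE_FS;
	return groups;
}
```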
Partitions
  • disabled by default, enabled for udev (see ID_PART_ENTRY_* in udev db)
  • parse partition tables (aix, minix, bsd, mbr, gpt, mac, sgi, solaris, sun, ultrix and unixware)
  • detect nested partition tables (e.g. BSD) within partitions
  • if the given device is a partition (e.g. sda1), open the whole disk (e.g. sda) to read details about the partition from the partition table. This feature has to be enabled by the BLKID_PARTS_ENTRY_DETAILS flag.
  • a partition table is ignored if a valid RAID superblock is detected at the end of the device

    [use case: partitioned RAID1 (mirror) -- the partition table is visible from the underlying devices]
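For a feel of what parsing a partition table involves, here is a sketch that decodes a classic MBR from the first sector: the 0x55 0xAA signature at offset 510 and four 16-byte entries starting at offset 446. The names are hypothetical, and extended partitions and all the sanity checks a real prober performs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One primary partition entry as stored in a classic DOS/MBR table. */
struct mbr_part {
	uint8_t  type;		/* partition type, e.g. 0x83 = Linux */
	uint32_t start;		/* first sector (LBA) */
	uint32_t size;		/* number of sectors */
};

static uint32_t le32(const unsigned char *p)
{
	return (uint32_t) p[0] | (uint32_t) p[1] << 8 |
	       (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/*
 * Returns the number of used primary entries copied to out[], or -1 if
 * the sector has no MBR boot signature.
 */
int parse_mbr(const unsigned char *sector, size_t len, struct mbr_part out[4])
{
	int n = 0;

	assert(sector != NULL && out != NULL);
	if (len < 512 || sector[510] != 0x55 || sector[511] != 0xAA)
		return -1;

	for (int i = 0; i < 4; i++) {
		const unsigned char *e = sector + 446 + 16 * i;

		if (e[4] == 0)		/* type 0 = unused slot */
			continue;
		out[n].type = e[4];
		out[n].start = le32(e + 8);	/* little-endian on disk */
		out[n].size = le32(e + 12);
		n++;
	}
	return n;
}
```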
Topology
  • rarely used
  • designed for mkfs-like or fdisk-like programs to get info about I/O topology
  • for kernels >= 2.6.3x, uses ioctls or sysfs
  • as a fallback for old kernels, uses code originally from xfsprogs

Tips for users

  • please use wipefs(8) before fdisk, mkfs or mkswap. The latest version is able to remove really all signatures, including backup signatures, partition tables and things invisible at first glance. Don't rely on mkfs developers :-)
  • think twice before you start using complex setups (for example partitioned RAIDs), to avoid misinterpretation by the kernel or system tools.
  • don't forget that blkid without -p might return cached results
Tips for developers

.... I'll try to keep these notes updated.

Friday, February 10, 2012

login(1) changes

I have completely refactored login(1). The new login(1) merges features from the Suse login(1) back into the util-linux version and is more compatible with login(1) from shadow-utils. I believe that we now have a login(1) implementation which is usable in all mainstream Linux distributions.

The original util-linux login(1) code is derived from 4.3 BSD (so it is older than the Linux kernel).

Changes:
  • PAM only

    It's obvious that PAM is the de facto standard for user authentication in all mainstream distributions, and maintaining any non-PAM methods in login(1) is a waste of time.

    If you don't like this change (really?) then you can use login(1) from shadow-utils.

  • support /etc/login.defs(5) config file

    Supported options: MOTD_FILE, LOGIN_TIMEOUT, LOGIN_RETRIES, FAIL_DELAY, TTYPERM, TTYGROUP, HUSHLOGIN_FILE, DEFAULT_HOME, LOG_UNKFAIL_ENAB, ENV_PATH, ENV_ROOTPATH, ENV_SUPATH

  • print the hostname in the login prompt; the default prompt is compatible with the initial prompt from agetty

  • add -H for compatibility with the Suse version. This option suppresses hostname printing in the login prompt.

  • global hush mode for all accounts (enabled if /etc/hushlogins exists but is empty). The global hush mode allows PAM to be used for the "Last login" message.
More details: http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/4866

Wednesday, January 25, 2012

prlimit(1)

prlimit(1) is a new util that will be available in util-linux-2.21 (now -rc1). It is a very nice and flexible command-line interface to the prlimit(2) Linux syscall (supported since Linux 2.6.36).

prlimit(1) allows you to get or set one or more resource limits of the process with the given PID. When a command is given instead of a PID, prlimit(1) runs this command with the given limits.

The output is as flexible as the output from lsblk(8) or findmnt(8): you can define the output columns, use parsable output, etc.

See the default output:
  $ prlimit --pid $$
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                         0 unlimited blocks
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited blocks
LOCKS      max number of file locks held      unlimited unlimited
MEMLOCK    max locked-in-memory address space     65536     65536 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0
NOFILE     max number of open files                1024      4096
NPROC      max number of processes                 1024     62809
RSS        max resident set size              unlimited unlimited pages
RTPRIO     max real-time priority                     0         0
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals          62809     62809
STACK      max stack size                       8388608 unlimited bytes
or redefine the output and ask for the max number of open files only:
  $ prlimit --nofile --output RESOURCE,SOFT,HARD --pid $$
RESOURCE SOFT HARD
NOFILE   1024 4096
and now let's modify the soft limits of the maximal core file size and the maximal number of open files:
  $ prlimit --core=1000000: --nofile=100: --pid $$
The notation used for the limits is:
  soft:hard    specify both limits
  soft:        specify only the soft limit
  :hard        specify only the hard limit
  value        set both the soft and hard limits to the same value
and check the result:
  $ prlimit --nofile --core --pid $$
RESOURCE DESCRIPTION                 SOFT      HARD UNITS
NOFILE   max number of open files     100      1024
CORE     max core file size       1000000 unlimited blocks
and revert the core file soft limit:
  $ prlimit --core=unlimited: --pid $$
  $ prlimit --core --pid $$
RESOURCE DESCRIPTION             SOFT      HARD UNITS
CORE     max core file size unlimited unlimited blocks
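The soft:hard notation described above can be parsed in a few lines of C. This is a hypothetical parser written for illustration, not the code in prlimit(1); it also accepts the "unlimited" keyword used in the example and maps it to RLIM_INFINITY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Parse one number (or "unlimited") of length len starting at s. */
static int parse_one(const char *s, size_t len, rlim_t *val)
{
	char buf[32], *end;
	unsigned long long v;

	if (len == 0 || len >= sizeof(buf))
		return -1;
	memcpy(buf, s, len);
	buf[len] = '\0';

	if (strcmp(buf, "unlimited") == 0) {
		*val = RLIM_INFINITY;
		return 0;
	}
	errno = 0;
	v = strtoull(buf, &end, 10);
	if (errno || *end || buf[0] == '-')
		return -1;
	*val = (rlim_t) v;
	return 0;
}

/* Returns 0 on success; *set_soft / *set_hard tell which limits were given. */
int parse_limit(const char *str, struct rlimit *lim, bool *set_soft, bool *set_hard)
{
	const char *colon;

	assert(str && lim && set_soft && set_hard);
	*set_soft = *set_hard = false;
	colon = strchr(str, ':');

	if (!colon) {				/* "value": both limits */
		if (parse_one(str, strlen(str), &lim->rlim_cur) != 0)
			return -1;
		lim->rlim_max = lim->rlim_cur;
		*set_soft = *set_hard = true;
		return 0;
	}
	if (colon > str) {			/* "soft:" or "soft:hard" */
		if (parse_one(str, colon - str, &lim->rlim_cur) != 0)
			return -1;
		*set_soft = true;
	}
	if (colon[1] != '\0') {			/* ":hard" or "soft:hard" */
		if (parse_one(colon + 1, strlen(colon + 1), &lim->rlim_max) != 0)
			return -1;
		*set_hard = true;
	}
	return (*set_soft || *set_hard) ? 0 : -1;	/* ":" alone is invalid */
}
```

When only one side is given, the caller fills in the other one from the current limits, as in the prlimit(2) sketch of a set-soft-only change.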
To restrict the CPU time of a given command (sort(1) in this example):
   $ prlimit --cpu=10 sort -u hugefile
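Roughly the same effect can be had from C by setting RLIMIT_CPU in a child process just before exec(). In this sketch run_with_cpu_limit() is a hypothetical helper, and the fork() is an assumption made to keep the caller alive; it is not necessarily how prlimit(1) runs the command.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with both CPU limits set to `seconds` (like --cpu=SECONDS).
 * Returns the command's exit status, or -1 on error or if the command
 * was killed by a signal (SIGXCPU at the soft limit, SIGKILL at the hard one).
 */
int run_with_cpu_limit(rlim_t seconds, char *const argv[])
{
	struct rlimit lim = { .rlim_cur = seconds, .rlim_max = seconds };
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* child: the limit survives exec and applies to the command */
		if (setrlimit(RLIMIT_CPU, &lim) != 0)
			_exit(127);
		execvp(argv[0], argv);
		_exit(127);		/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the example above this would be called with argv = { "sort", "-u", "hugefile", NULL }.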
I think prlimit(1) is much better than the shell built-in command ulimit.

Thanks to Davidlohr Bueso, who found time to implement prlimit(1) for util-linux 2.21.
-- In memory of Dennis M. Ritchie

Monday, January 9, 2012

frustrating gnome-control-center network

The "gnome-control-center network" is a pretty incomplete application. Unfortunately, this application is called by gnome-shell if you want to modify your network settings. And all of this is the default in Fedora 16. Grrrr...

The most stupid thing is that you cannot configure wireless connections if your wireless is disabled. And if you enable wireless, you are immediately affected by your broken configuration...

Fortunately, the good old nm-connection-editor works as expected. So, all you need is to kick the idiotic network stuff out of gnome-shell and start the good old Network Manager Applet:
 mv /usr/share/gnome-shell/js/ui/status/network.js \
/usr/share/gnome-shell/js/ui/status/network.disabled

and restart gnome-shell (Alt+F2, command 'r').

Thursday, January 5, 2012

gnome-shell is not so bad

It seems that I will be able to use GNOME 3. I love the minimalism provided by gnome-shell. It's gnome, so it's tricky to fully customize the desktop, but it's possible.

My requirements:
  • no animations (gnome "switch workspace" is pure hell...)
  • tabbing -- at least for terminals (fluxbox supports this for arbitrary applications, gnome at least has gnome-terminal with tabs)
  • minimalistic window decorations
  • no window title for maximized windows
  • tiny window title for normal and modal windows
  • no max/min/close buttons for windows (I have a keyboard...)
  • clearly visible focused window (e.g. green border)
  • only one tiny panel
  • WM has to remember the workspace for applications (e.g. firefox = 2nd workspace)
We all love screenshots, right? ;-) My good old fluxbox is here, and the "same" with GNOME 3 is here.

The first step is to install some gnome-shell extensions:
  • Native Window Placement Extension
  • Auto move windows extension
  • Disable Window Animation Extension
  • Remove User Name Extension
  • windowNavigator Extension
The next step is to make window decorations more minimalistic. This step is more tricky, because you have to modify the window manager theme (for more details see gnome bug 594879). My theme is available on my home page.

The next step is to customize the desktop files of some applications; for example, I want to start gnome-terminal with the --hide-menubar option, so
cp /usr/share/applications/gnome-terminal.desktop \
~/.local/share/applications/myterm.desktop
and modify the Exec and Name entries in the file. Then you can add the application to your gnome-shell Favorites.

The last step is to customize keyboard shortcuts, which is simple (see "Keyboard" in gnome-control-center).

Note that many things, like info about CPU temperature, do not have to waste any space on your desktop. IMHO it's better to use keyboard shortcuts and print the necessary information on the screen on demand. For example, I use osd_cat:
#!/bin/bash
# Collect battery, AC adapter, temperature and fan info and show it as an
# on-screen display for 4 seconds (bind this script to a keyboard shortcut).

BATT=$(acpitool -b | awk -F ':' '/Battery/ { print $2 }' | sed 's/ //')
AC=$(acpitool -a | awk -F ':' '/AC/ { print $2 }' | sed 's/ //g')
TEMPE=$(sensors | awk '/temp/ { print $2 }' | sed 's/ //g')
FAN=$(sensors | awk '/fan/ { print $2 }' | sed 's/ //g')

# right-aligned labels; the whole block is piped to osd_cat
(printf "    Battery: %-20s\n" "$BATT"
 printf "         AC: %-20s\n" "$AC"
 printf "Temperature: %-20s\n" "$TEMPE"
 printf "        Fan: %-20s\n" "$FAN") | osd_cat --delay 4 --pos bottom \
        --align right --offset 45 --indent 10 \
        --color green --font "-misc-fixed-*-*-*-*-20-*-*-*-*-*-*-*"

Monday, December 5, 2011

Monitor a list of currently mounted filesystems

As you know, /proc/mounts and /proc/self/mountinfo contain the list of currently mounted filesystems. These files can be monitored with poll(2) or select(2). The findmnt(8) utility exports this functionality to the command line.

Examples:
session A (monitor):
   findmnt --poll
session B (event):
   mount /home/fs-images/ext2.img /mnt/test

session A (findmnt output after event):
ACTION TARGET    SOURCE     FSTYPE OPTIONS
mount  /mnt/test /dev/loop0 ext2   rw,relatime,user_xattr,acl,barrier=1
Another example: wait until /mnt/test is unmounted:
   findmnt --poll=umount --first-only /mnt/test
Inform me about all ext2, ext3 and ext4 remounts to read-only mode:
   findmnt --poll=remount --types ext2,ext3,ext4 --options ro
You can also define the output columns, for example if you want to know more about "mount --move" operations:
   # findmnt --poll=move -o OLD-TARGET,TARGET,SOURCE
OLD-TARGET TARGET   SOURCE
/mnt/test  /mnt/foo /dev/loop0
The event in this example was generated by the "mount --move /mnt/test /mnt/foo" command.

And if you want info about the old and new options after a remount:
   # findmnt --poll=remount -o TARGET,OLD-OPTIONS,OPTIONS
TARGET   OLD-OPTIONS                          OPTIONS
/mnt/foo ro,relatime,user_xattr,acl,barrier=1 rw,user_xattr,acl,barrier=1
This event was generated by the "mount -o remount /mnt/foo -o rw,strictatime" command.

All this is available in util-linux 2.20 (e.g. Fedora 16).

Friday, November 25, 2011

wipefs(8) improvements

I finally found time to improve the wipefs(8) command. The most visible change is support for partition tables. You can use wipefs(8) to remove MBR as well as GPT and many other partition tables.

Another important change (well... bugfix) is that "wipefs -a" really erases everything that libblkid (blkid(8)) is able to detect.

Now it calls the libblkid detection code again after erasing a magic string, to ensure that nothing else can be found on the device. This is important for things like GPT, which has a backup table at another place on the disk, or for filesystems like FAT, where there are more ways to detect the superblock.
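The probe, erase, probe-again idea can be shown with a toy model on an in-memory image. Everything here is hypothetical and heavily simplified: the table holds just three well-known magics (MBR boot signature, ext2/3/4 superblock magic, swap v1 with an assumed 4 KiB page size), while libblkid knows many more and also handles backup copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct signature {
	const char *name;
	size_t offset;		/* byte offset of the magic string */
	size_t len;
	const char *magic;
};

static const struct signature sigs[] = {
	{ "dos",  510,  2,  "\x55\xAA" },	/* MBR boot signature */
	{ "ext4", 1080, 2,  "\x53\xEF" },	/* s_magic 0xEF53 at 1024 + 56, little-endian */
	{ "swap", 4086, 10, "SWAPSPACE2" },	/* page size - 10, assuming 4 KiB pages */
};

/* Return the index of the first signature found in img, or -1. */
static int probe(const unsigned char *img, size_t size)
{
	for (size_t i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
		if (sigs[i].offset + sigs[i].len <= size &&
		    memcmp(img + sigs[i].offset, sigs[i].magic, sigs[i].len) == 0)
			return (int) i;
	return -1;
}

/* Erase magic strings until probe() finds nothing; returns how many were wiped. */
int wipe_all_signatures(unsigned char *img, size_t size)
{
	int idx, wiped = 0;

	assert(img != NULL);
	while ((idx = probe(img, size)) >= 0) {
		/* zero the magic string, roughly what blkid_do_wipe() does per signature */
		memset(img + sigs[idx].offset, 0, sigs[idx].len);
		wiped++;
	}
	return wiped;
}
```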

The last important change is a new command-line option "-t". Now you can specify a filesystem, RAID or partition table name. For example
       wipefs -a -t ext4
will erase 'ext4' only. The option is interpreted in the same way as -t for mount(8) or findmnt(8), so you can specify more filesystems, and you can prefix all or selected filesystems with 'no', for example:
       wipefs -a -t noext4,ext3,ext2
Here all but ext4, ext3 and ext2 filesystems will be erased.
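The "-t" list semantics can be sketched as follows. This is an illustration of the behaviour described above, not the libmount or wipefs source; in this sketch a leading "no" negates the whole list and a "no" prefix on a later item excludes only that name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Does the list item [item, item + len) equal type exactly? */
static bool item_is(const char *item, size_t len, const char *type)
{
	return strlen(type) == len && strncmp(item, type, len) == 0;
}

bool type_matches(const char *type, const char *pattern)
{
	bool negated = false;
	const char *p = pattern;

	assert(type != NULL && pattern != NULL);
	if (strncmp(p, "no", 2) == 0) {		/* "noext4,ext3": negate the whole list */
		negated = true;
		p += 2;
	}
	for (;;) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t) (comma - p) : strlen(p);

		if (len > 2 && strncmp(p, "no", 2) == 0 && item_is(p + 2, len - 2, type))
			return false;		/* explicitly excluded item */
		if (item_is(p, len, type))
			return !negated;
		if (!comma)
			break;
		p = comma + 1;
	}
	return negated;			/* not listed: matches only a negated list */
}
```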

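If you are a developer of filesystem tools (e.g. mkfs.type), you should know that libblkid now contains a new function, blkid_do_wipe():

#include <fcntl.h>
#include <unistd.h>
#include <blkid/blkid.h>

int fd = open("/dev/sda1", O_RDWR);     /* blkid_do_wipe() writes to the device */
blkid_probe pr = blkid_new_probe();

blkid_probe_set_device(pr, fd, 0, 0);
blkid_probe_enable_superblocks(pr, 1);
blkid_probe_set_superblocks_flags(pr, BLKID_SUBLKS_MAGIC);

/* erase the detected signature, then probe again until nothing is found */
while (blkid_do_probe(pr) == 0)
        blkid_do_wipe(pr, 0);

blkid_free_probe(pr);
close(fd);
and all superblocks become undetectable...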

By the way, it's also a good idea to call wipefs -a from system installers to avoid unexpected problems. I have seen many bug reports from people with a mess on their disks (an unexpected mix of partition tables and RAID superblocks, swap and LUKS, or ReiserFS ...etc.).

The changes will be available in the next util-linux release 2.21 (beta planned next month).