Linux for little systems
an area Linux mainstream has been moving away from since Linus got a real job." To this end, he has released a tree called 2.6.0-test11-tiny which incorporates a large set of patches aimed at slimming down the kernel. It's worth a look as an expression of just what needs to be done if you want to run Linux on small systems.
So what's required? The -tiny patch includes, among others, the following:
- Building the kernel with the -Os compiler option, which
instructs gcc to optimize for size. This option results in a smaller
kernel; interestingly, there have also been reports that -Os
yields better performance on large systems as well, since the
resulting executable has better cache behavior.
- The 4k kernel stack patch halves the kernel stack allocated to each
process (from the usual 8KB on 32-bit x86), cutting runtime per-process
memory use.
- Various patches shrink internal data structures to their minimum
sizes. Target structures include the block and char device names hash
tables, the maximum number of swapfiles, the maximum number of
processes, the futex hash table, CRC lookup tables, and many others.
- For truly daring users, the -tiny kernel has an option to remove
printk() from the kernel entirely, along with its associated
buffers and most of the strings passed to printk(). The
space savings will be considerable; you just have to hope that the
kernel has nothing important to tell you. Strings for BUG()
and panic() calls can also be removed.
- Various subsystems which are not normally optional become so. With
the -tiny kernel, it is possible to configure out sysfs (which can
take a lot of run-time memory), asynchronous I/O,
/proc/kcore, ethtool support, core dump support, etc.
- Inline functions are heavily used in the kernel; they can improve performance, and, in some situations, the use of inline code is mandatory. Excessive use of inline functions can bloat the size of the kernel considerably, however. The -tiny kernel includes a patch which makes the compiler complain about the use of inline functions, allowing a size-conscious developer to find which ones are invoked most often.
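To make the printk() item above concrete, here is a minimal user-space sketch of how a logging call can be compiled out so that neither the call nor its format string reaches the binary. It is not taken from the -tiny patch set: TINY_NO_PRINTK, tiny_printk(), log_message(), and probe_device() are all hypothetical names invented for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical switch: 1 compiles messages out, 0 keeps them. */
#ifndef TINY_NO_PRINTK
#define TINY_NO_PRINTK 1
#endif

/* Number of messages that actually reached the logger. */
int messages_emitted;

/* Stand-in for the real logging path: formats to stderr and counts the call. */
int log_message(const char *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);
    written = vfprintf(stderr, fmt, args);
    va_end(args);
    messages_emitted++;
    return written;
}

#if TINY_NO_PRINTK
/* The call and its arguments vanish at preprocessing time, so the
 * format string is never compiled into the object file. */
#define tiny_printk(...) do { } while (0)
#else
#define tiny_printk(...) log_message(__VA_ARGS__)
#endif

/* Example caller: the source is identical in both configurations. */
int probe_device(int id)
{
    tiny_printk("probing device %d\n", id);
    return id >= 0;
}
```

Building with -DTINY_NO_PRINTK=0 restores the messages without touching any caller. One caveat of this style of macro: arguments with side effects are no longer evaluated once the call is compiled out.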
There are almost 80 separate patches in all. Matt claims that his kernel,
when configured with a full networking stack, fits "comfortably" on a 4MB
box, which is, indeed, considered small these days. Matt has some
ambitious future plans, including cutting functionality out of the console
subsystem and (an idea that is sure to raise some eyebrows) making parts of
the kernel pageable. It remains to be seen whether things will get that
far, but there is no doubt that making Linux work on small systems is a
worthy goal.
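The size-versus-speed trade behind items like the CRC lookup tables can be seen in a small example. The sketch below, which is an illustration and not code from the -tiny tree, computes the standard CRC-32 two ways: bit by bit with no table, and with a 1KB table (256 32-bit entries) built at run time. The function names are invented here.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0xEDB88320u  /* reflected CRC-32 polynomial */

/* Advance the CRC register by one bit. */
static uint32_t crc32_step(uint32_t crc)
{
    if (crc & 1u)
        return (crc >> 1) ^ CRC32_POLY;
    return crc >> 1;
}

/* Table-free variant: no data memory, but eight steps per input byte. */
uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = crc32_step(crc);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Table-driven variant: 256 * 4 bytes = 1KB of lookup table. */
static uint32_t crc_table[256];

void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;

        for (int bit = 0; bit < 8; bit++)
            c = crc32_step(c);
        crc_table[n] = c;
    }
}

/* One table lookup per input byte; crc32_init_table() must run first. */
uint32_t crc32_tabled(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

Both functions return the same checksum for the same input; the table-free version simply spends more cycles per byte in exchange for the memory saved. Intermediate designs, such as a 16-entry table indexed by four bits at a time, sit between the two.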
Posted Dec 18, 2003 2:34 UTC (Thu)
by flewellyn (subscriber, #5047)
[Link] (3 responses)
> Building the kernel with the -Os compiler option, which instructs gcc to optimize for size. This option results in a smaller kernel; interestingly, there have also been reports that -Os yields better performance on large systems as well, since the resulting executable has better cache behavior.
Actually, this makes some sense. With the wide disparity between modern CPU speeds (blazingly fast) and memory bus speeds (rather slow), anything which helps improve cache coherence is going to improve performance greatly. It may even outweigh the improvements from "speed" optimizations such as inlining, loop unrolling, etc. Some benchmarks in this area alone would be interesting.
Posted Dec 18, 2003 9:35 UTC (Thu)
by gnb (subscriber, #5132)
[Link] (2 responses)
> Some benchmarks in this area alone would be interesting.
Yes, provided they were for a system very like the one you cared about. The trouble
is there is no one right answer for a kernel expected to run on everything from
an ARM with 8k + 8k of L1 cache and nothing else to a Xeon that can probably
get the whole kernel into L2.
Posted Dec 18, 2003 9:49 UTC (Thu)
by phip (guest, #1715)
[Link]
Or a PA-RISC that has 1.5M + 750K L1 (D & I respectively) cache and nothing else...
Posted Dec 30, 2003 21:30 UTC (Tue)
by joern_engel (guest, #4663)
[Link]
Actually, cache size shouldn't matter too much, as it is de facto zero anyway.
Unless things go horribly wrong, most CPU time is spent in userspace, not in the kernel. Therefore, userspace will flush most kernel instructions out of the cache before switching to kernelspace again. Therefore, the cache is always cold when it comes to the kernel.
With a cold cache, smaller code is also faster code. Your bottleneck is the memory bus.