
Linux Kernel Debugging
By :

I'll start off by saying this: debugging is both a science and an art, refined by experience – the mundane hands-on slogging through to reproduce and identify a bug and its root cause, and (possibly) fix it. I'm of the opinion that the following few debug tips are really nothing new; that said, we do tend to get caught up in the moment and often miss the obvious. The hope is that you'll find these tips useful and return to them time and again!
Churchill famously said, "Never, never, never, give up". We say "Never, never, never, make assumptions".
Assumptions are, very often, the root cause behind many, many bugs and defects. Think back, re-read the Software bugs – a few actual cases section!
In fact (hey, I am partially joking here), just look at the word assume: it just begs saying, "Don't make an ASS out of U and ME"!
Using assertions in your code is a great way to catch assumptions. The userspace way is to use the assert()
macro. It's well documented in the man page. (We cover more on using macros within the kernel in Chapter 12, A Few More Kernel Debugging Approaches, in the Assertions, warnings and BUG() macros section).
At times, we do get lost in the twisted mazes of complex code paths. In these circumstances, it's really easy to lose sight of the big idea, the objective of the code. Try and zoom out and think of the bigger picture. It often helps spot the faulty assumption(s) that led to the error(s). Well-written documentation can be a lifesaver.
When faced with a difficult bug, try this: build/configure/get the smallest possible version of your problem (statement) to execute causing the issue or bug you're currently facing to surface. This often helps you track down the root cause of the problem. In fact, very often (in my own experience), the mere act of doing this – or even just the detailed jotting down of the problem you face – triggers you seeing the actual issue and its solution in your mind!
This paraphrased quote is by Brian Kernighan in the book The Elements of Programming Style. So, should we not use our full brainpower while writing code? Ha, of course you should... But, debugging is typically harder than writing code. The real point is this: take the trouble to first carefully do your groundwork: write a brief very high-level design document and write what you expect the code to do, at a high level of abstraction. Then move on to the specifics (with a so-called low-level design doc). Good documentation will save you one day (and blessings shall be showered upon you!).
That reminds me of another quote: An ounce of design is worth a pound of refactoring – Karl Wiegers.
Sometimes, the code can become too complex (spaghetti-like; it just smells). In many cases, just giving up and starting from scratch again, if viable, is perhaps the best thing to do.
This Zen-Beginner's Mind state also implies that we at least temporarily stop our (perhaps over-egotistical) thought patterns (I wrote this so well, how can it be wrong!?) and look at the situation from the point of view of somebody completely new to it. It is, in fact, one key reason why a colleague reviewing your code can spot bugs you'd never see! Plus, a good night's rest can do wonders.
I recall a Q&A on Quora revealing that the hardest thing a programmer does is name variables well! This is truer than it might appear at first glance. Variable names stick; choose yours carefully. As with commenting, don't go overboard either: a local variable for a loop index? int i
is just fine (int theloopindex
is just painful). The same goes for comments: they're there to explain the rationale, the design behind the code, what it's designed and implemented to achieve, not how the code works. Any competent programmer can figure that out.
It's self-evident perhaps, but we can often miss the obvious when under pressure... Carefully checking kernel (and even app) logs often reveals the source of the issue you might be facing. Logs are usually able to be displayed in reverse-chronological order and give you a view of what actually occurred; Linux's systemd journalctl(1)
utility is powerful; learn how to leverage it!
A truism, unfortunately. Still, testing and QA is simply one of the most critical parts of the software process; ignore it at your peril! The time and the trouble taken to write exhaustive test cases – both positive and negative – pays off in large dividends in the long run, helping make the product or project a grand success. Negative test cases and fuzzing are critical for exposing (and subsequently fixing) security vulnerabilities in the code base. Then again, runtime testing only tests the portions of code actually executed. Take the trouble to perform code coverage analysis; 100% code coverage – and runtime testing it is the objective! (Again, we cover more on these key points in Chapter 12, A Few More Kernel Debugging Approaches, in the An introduction to kernel code coverage tools and testing frameworks section).
Every now and then, you realize deep down that though what you've coded works, it's not been done well enough (perhaps there still exist corner cases that will trigger bugs or undefined behavior); that nagging feeling that perhaps this design and implementation simply isn't the best. The temptation to quickly check it in and hope for the best can be high, especially as deadlines loom! Please don't; there is really a thing called technical debt. It will come and get you.
If I had a penny for each time I've made really silly mistakes when developing code, I'd be a rich man! For instance, I once spent nearly half a day racking my head about why my C program would just refuse to work correctly until I realized I was editing the correct code but compiling an old version of it – performing the build in the wrong directory! (I am certain you've faced your share of such pesky frustrations.) Often, a break, a good night's sleep, can do wonders.
The word empirical means to validate something (anything) by actual and direct observation or experience rather than relying on theory.
Figure 1.6 – Be empirical!
So, don't believe the book (this one is an exception of course!), don't believe the tutorial, the article, blog, tutor, or author: be empirical – try it out and see for yourself!
Years (decades, actually) back, on my very first day of work at a company I joined, a colleague emailed me a document that I still hold dear: The Ten Commandments for C Programmers, by Henry Spencer (https://www.electronicsweekly.com/open-source-engineering/linux/the-ten-commandments-for-c-programmers-2009-04/). Do check it out. In a similar, albeit clumsier, manner, I present a quick checklist for you.
Very important! Did you remember to do the following?:
-Wall
and possibly -Wextra
or even -Werror
; yes, treating warnings as errors is going to make its way into the kernel!); eliminate all warnings as far as is possible.checksec
, lynis
, and several others), fuzzers, code coverage tools, fault injection frameworks, and so on. Don't ignore security!We shall elaborate on several of these points in the coming material.
Change the font size
Change margin width
Change background colour