Testing

Here are some thoughts about krb5’s regression test suite.  This is based mostly on my experience with krb5, Subversion, and other projects, and I’m sure these ideas could be greatly refined through research in the field.

For the most part, we have two different kinds of tests in the krb5 tree: unit tests and system tests.  The unit tests are typically in the form of C source files beginning with “t_”, which are compiled and (usually) executed when you run “make check”.  Sometimes the test program is self-contained; sometimes it produces output which is compared to an expected output file.  In a few cases, test programs are not executed (sometimes because they are merely tools to facilitate manual testing), or are executed but produce output which is not verified.
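
As a rough illustration of the compare-against-expected-output pattern described above, here is a minimal Python sketch.  It is not how “make check” is implemented; the function name run_and_compare and the stand-in test program are assumptions made up for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_compare(command, expected_file):
    """Run a test program and compare its stdout to an expected-output file.

    Returns (passed, actual_output).  The test passes only if the program
    exits with status 0 and its output matches the file exactly.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    expected = Path(expected_file).read_text()
    passed = result.returncode == 0 and result.stdout == expected
    return passed, result.stdout


if __name__ == "__main__":
    # Hypothetical stand-in for a compiled t_* program: a one-liner that
    # prints a single line.
    fake_test = [sys.executable, "-c", "print('principal ok')"]
    with tempfile.TemporaryDirectory() as tmp:
        expected = Path(tmp) / "expected.out"
        expected.write_text("principal ok\n")
        passed, output = run_and_compare(fake_test, expected)
        print("PASS" if passed else "FAIL")
```

A runner built this way could also record each result and keep going, rather than stopping at the first mismatch.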

Partly because the framework for unit testing is so ad hoc, these unit tests are easy to write, and are popular among krb5 developers for that reason.  The primary challenges for unit testing in krb5 are isolation, coverage, and organization:

  • By isolation, I mean the difficulty of testing components which talk to the network, to a database, or to something else much more complicated than the component itself.  In the Java web programming world, it is common to use “inversion of control” to facilitate unit testing.  Instead of referring to lower-level modules directly, classes are constructed with references to the network object, the database object, or whatever else they depend on.  During unit testing, the classes can be constructed with dummy versions of those dependencies which are rigged to produce the desired results, or even to yield fake errors to exercise failure paths.  That’s a bit harder to do in C, unfortunately, so in krb5 a lot of code is bypassed in the unit tests and tested only by system tests.
  • By coverage, I partly mean the amount of code not covered by unit tests, and partly mean the difficulty in measuring what is covered.  I made some progress on measuring coverage by bringing back partial support for static linking, which allows the use of gcov.
  • By organization, I mean that the ad hoc nature of our unit tests makes them inflexible.  There’s no easy way to run unit tests without system tests, or to run all the unit tests and produce a report of which succeeded and which failed (instead, “make check” simply aborts on the first unit test failure).  There is also no way to identify a particular unit test.  I’m not sure how important this problem really is.
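
To make the inversion-of-control idea from the isolation point concrete, here is a small sketch in Python (chosen only for brevity; it is not krb5 code).  All of the names in it, including TicketClient, FakeTransport, and NetworkError, are hypothetical.  The component under test is handed its transport rather than creating one, so a test can substitute a dummy that returns canned replies or raises a rigged error.

```python
class NetworkError(Exception):
    """Raised by a transport when the peer cannot be reached."""


class FakeTransport:
    """Dummy dependency: returns a canned reply or raises a rigged error."""

    def __init__(self, reply=None, error=None):
        self.reply = reply
        self.error = error
        self.sent = []  # records requests so tests can inspect them

    def send_and_receive(self, request):
        self.sent.append(request)
        if self.error is not None:
            raise self.error
        return self.reply


class TicketClient:
    """Component under test; it receives its transport instead of creating one."""

    def __init__(self, transport):
        self.transport = transport

    def get_ticket(self, principal):
        try:
            reply = self.transport.send_and_receive(b"REQ " + principal.encode())
        except NetworkError:
            return None  # failure path that is hard to reach with a real network
        return reply.decode()


if __name__ == "__main__":
    # Success path: the fake hands back a canned reply.
    good = FakeTransport(reply=b"TICKET")
    assert TicketClient(good).get_ticket("alice") == "TICKET"
    assert good.sent == [b"REQ alice"]

    # Failure path: the fake is rigged to raise an error.
    bad = FakeTransport(error=NetworkError("unreachable"))
    assert TicketClient(bad).get_ticket("alice") is None
    print("both paths exercised")
```

In C the closest equivalent is probably a struct of function pointers passed to the component, which is part of why this is more awkward there.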

Unit testing is great where it exists, because it allows code to be improved with confidence.  It’s no fun to stare at a grotty function using outdated infrastructure and idioms, knowing that if you bring it up to date you might introduce some subtle bug because the code has no tests.

Whereas unit tests exercise small isolated pieces of code, system tests exercise complete programs.  Most of our system tests are implemented in tcl and run in the dejagnu framework.  Neither expect nor dejagnu receives a lot of love, either can be buggy on any given machine, and there aren’t very many developers who are excited to learn more about tcl in order to write more krb5 system tests.  When I think about replacement infrastructure for dejagnu, I think about the following challenges:

  • Ease of setup and teardown: krb5 programs operate in an environment consisting of a KDC, a client, and (in many cases) a server, and in more complicated tests there may be multiple KDCs.  Test cases need to be able to construct environments to run programs in with minimal boilerplate.  This basically means that the testing infrastructure needs to be extensible with library functions, as dejagnu is through tcl.
  • Program interaction: expect automates the footwork of testing programs which interact with the user via the tty.  Either our replacement infrastructure needs to duplicate this functionality, or we need to structure our programs to avoid the need for tty interaction in test cases.  (That’s probably easier now that we are unbundling the rlogin/telnet/ftp applications and their system tests.)
  • Output usefulness: our current dejagnu test suites output files named krb.log and dbg.log.  I have not been blown away by the accessibility of this information.  Hopefully any replacement infrastructure would be able to produce tidier and more useful debugging output.
  • Debuggability of test failures: when a system test fails, what does a developer have to do in order to execute the relevant code inside a debugger?  For our current dejagnu tests, the answer varies from slightly annoying for the tests/dejagnu tests (add “spawn_shell” to the test case, figure out the exact command being executed, and execute it by hand under gdb in the spawned shell) to downright aggravating for the kadmin tests (add a sentinel loop to the appropriate part of the test case, gdb attach to the tcl interpreter in which the test is being run via bindings, set a breakpoint, touch a file to deactivate the sentinel loop, and continue the interpreter).  Any replacement infrastructure should have a decent answer to this question.
  • Performance: because of the amount of setup and teardown involved with each test case, system testing can be expensive.  In our case, because our software was originally designed to run on Vaxes, the actual setup and teardown costs are minimal, but the test suite can be slow because of sleep() statements peppered around the test suite, the delays from which are multiplied by multiple test passes.  We need to avoid these.
  • Barriers to entry: any reasonable system testing infrastructure necessarily involves a lot of locally built infrastructure to handle all the problems mentioned above.  How hard will it be for developers to come up to speed on all this machinery in order to write new tests?  The answer depends mostly on the quality of internal documentation.
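
Two of these challenges, setup/teardown and the sleep() delays, lend themselves to a short sketch.  The Python below is only an illustration under assumptions of my own: scratch_environment and wait_until are hypothetical helper names, and the “daemon” is a stand-in one-liner rather than a real KDC.  The context manager keeps environment construction down to one line per test, and the polling helper waits for a readiness condition instead of sleeping for a fixed interval.

```python
import contextlib
import os
import shutil
import subprocess
import sys
import tempfile
import time


def wait_until(condition, timeout=5.0, interval=0.01):
    """Poll condition() until it is true or the timeout expires.

    Returns True as soon as the condition holds, so a fast machine does not
    pay for a worst-case fixed sleep.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()


@contextlib.contextmanager
def scratch_environment(daemon_command):
    """Create a scratch directory, start one daemon in it, and clean up after."""
    workdir = tempfile.mkdtemp(prefix="testenv-")
    proc = subprocess.Popen(daemon_command, cwd=workdir)
    try:
        yield workdir, proc
    finally:
        # Teardown runs even if the test body raises.
        proc.terminate()
        proc.wait()
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    # Hypothetical stand-in daemon: creates a "ready" file, then idles.
    daemon = [sys.executable, "-c",
              "import time; open('ready', 'w').close(); time.sleep(30)"]
    with scratch_environment(daemon) as (workdir, proc):
        ready_file = os.path.join(workdir, "ready")
        ready = wait_until(lambda: os.path.exists(ready_file), timeout=10.0)
        print("daemon ready:", ready)
```

A real harness would need more than this (multiple KDCs, configuration files, log capture), but the same shape extends to those cases with library functions layered on top.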

I don’t have ready solutions to these challenges in mind.  Our preferred scripting language currently appears to be Python, so future developments for the test suite infrastructure will probably lean in that direction.
