on OpenSSL and documentation …

I think OpenSSL needs a documentation project. My first week of GSoC coding was dedicated to transport-tls, so I started with establishing a TLS connection and accessing different parts of the X.509 certificates to check them. I would have thought these were basic tasks for every TLS-enabled application, and yet I found them unexpectedly difficult.

A typical task (like “how do I get the commonName from this?”) starts with fgrep -r xyz ./openssl, looking for likely function names in the .h files. Since many functions are not documented in man pages, the next step is grepping all the #defines for the name (preprocessor macros are used extensively) and then reading the source of the real implementation to figure out whether it does what I need. Sometimes the step after that is even copying or reimplementing the function I found, because there are often functions that output a structure to a BIO (the I/O abstraction layer) but none that copy its data into a string I can work with.

Several factors contribute to that:

  • There is more than one way to do it. You can use BIOs for everything or manage sockets and file descriptors yourself; or you can call (e.g.) bio_print_NID(x) or get_data_by_NID(x) or get_data_by_OBJ(NID2OBJ(x)). While it is good that every object has a fairly complete set of methods/functions, this makes it hard to reuse existing code or to follow any one piece of documentation.
  • Documentation structure. Most parts of OpenSSL are object-oriented. It seems to me that Javadoc-like documentation is better suited to that, because it maps the class/object structure and lets you see what data is held and which methods exist to access it. The man page structure seems more appropriate for things like system calls, where the functions are mostly independent and functionally orthogonal. OpenSSL shows this: although the existing man pages are pretty good, those written for a single function give too little information for the larger task, and many man pages describe 5-10 functions at once because they all access the same object.
  • Knowledge not written down. The OpenSSL mailing lists probably contain more information than the man pages. Searching the lists answers many questions, because they were asked before and quickly answered by the developers. Their answers just never made it into the documentation.
  • APIs that are not always clear. Some technically unnecessary access methods are not implemented, probably because the developers knew that they could just access the underlying struct. I also found several code snippets (some even given as advice on the mailing list) that implicitly assume certain data types are identical or that reach directly into certain structure fields. This makes it harder to find the right methods.

Of course I know how an open source project works, and I am not one to complain about missing functions, because I do not have the time to implement them either (although I do think it is time to finally IPv6-enable the network BIOs ;)

But OpenSSL seems too important to be left with so little good documentation on how to use it correctly (and securely). This all reminds me of Cyrus SASL, which has similar problems: we all have to rely on it, but very few people really understand it :-(
