How to create OpenDocument texts with PHP

The problem: take a PHP/SQL web application and add a function to export form letters. Because they might have to be edited later, the letters are not to be exported as PDFs but as ODT files for word processing.

When I implemented this, I could not find any documentation on the subject, so I wrote this small how-to.

First create a template file (I use OpenOffice) with filler text. Include all images, fields, tables, paragraph styles, etc. you want to use later. Save the file in ODT format.

An .odt file is a zip archive containing all document components as XML files (as well as all embedded files such as pictures). Use unzip to extract the content into a subfolder.

[mschuett@dagny] ~> mkdir newdoc && unzip -d newdoc "Untitled 1.odt"
Archive:  Untitled 1.odt
 extracting: newdoc/mimetype
   creating: newdoc/Configurations2/statusbar/
  inflating: newdoc/Configurations2/accelerator/current.xml
   creating: newdoc/Configurations2/floater/
   creating: newdoc/Configurations2/popupmenu/
   creating: newdoc/Configurations2/progressbar/
   creating: newdoc/Configurations2/menubar/
   creating: newdoc/Configurations2/toolbar/
   creating: newdoc/Configurations2/images/Bitmaps/
 extracting: newdoc/Pictures/1000000000000181000000AD6DC45DA5.png
  inflating: newdoc/content.xml
  inflating: newdoc/styles.xml
 extracting: newdoc/meta.xml
  inflating: newdoc/Thumbnails/thumbnail.png
  inflating: newdoc/settings.xml
  inflating: newdoc/META-INF/manifest.xml
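
The same inspection can be done from PHP. This is a minimal sketch, assuming the zip extension is installed; the function name list_odt_entries() is made up for this example and is not part of any library.

```php
<?php
// List the entry names of an ODT package (an ODT file is a plain zip archive).
// Assumes PHP's zip extension (class ZipArchive) is available.
function list_odt_entries(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }

    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();

    return $names;
}
```

Calling list_odt_entries('Untitled 1.odt') should return the same names that unzip prints above, in archive order.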

The subdirectories Configurations2 and Thumbnails as well as the file settings.xml are not required and can be deleted. The file META-INF/manifest.xml contains a list of all files, so the corresponding entries should be removed from this list.
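
Editing the manifest by hand is fine for a single template, but it can also be scripted. The following is only a sketch, assuming PHP's DOM extension and the standard ODF manifest namespace; strip_manifest_entries() and the prefix list are names invented for this example.

```php
<?php
// Namespace used by META-INF/manifest.xml in ODF packages.
const MANIFEST_NS = 'urn:oasis:names:tc:opendocument:xmlns:manifest:1.0';

// Remove every <manifest:file-entry> whose full-path starts with one of
// the given prefixes and return the modified manifest as a string.
function strip_manifest_entries(string $manifestXml, array $prefixes): string
{
    $doc = new DOMDocument();
    $doc->loadXML($manifestXml);

    // Copy the live node list first: removing nodes while iterating
    // over it directly would skip entries.
    $entries = iterator_to_array(
        $doc->getElementsByTagNameNS(MANIFEST_NS, 'file-entry')
    );

    foreach ($entries as $entry) {
        $path = $entry->getAttributeNS(MANIFEST_NS, 'full-path');
        foreach ($prefixes as $prefix) {
            if (strncmp($path, $prefix, strlen($prefix)) === 0) {
                $entry->parentNode->removeChild($entry);
                break;
            }
        }
    }

    return $doc->saveXML();
}
```

For the template above, the call would look like strip_manifest_entries($xml, array('Configurations2/', 'Thumbnails/', 'settings.xml')), with $xml holding the content of META-INF/manifest.xml.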

That leaves us with three significant files: meta.xml for document metadata, styles.xml for styles and page elements (headers, footers), and content.xml for the main text. Use xmllint --format (part of libxml2) to make the XML somewhat readable, then open it in an editor to look at the general structure and to locate your filler texts. More information about the files can be found in the free book “OASIS OpenDocument Essentials”.

After a modification, the files can be zipped again to verify that the resulting ODT file can still be opened:

[mschuett@dagny] ~> cd newdoc && zip -r ../newdoc.odt * && cd ..
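
One detail worth knowing: the ODF package format asks for the file mimetype to be the first entry of the archive and to be stored uncompressed, which a plain zip -r with a wildcard does not guarantee. Word processors tend to be forgiving about this, but a validator may complain. The sketch below shows one way to repack the directory from PHP with that ordering. It assumes the zip extension and PHP 7.0 or later (for setCompressionName()); pack_odt() is a name invented for this example.

```php
<?php
// Pack an unpacked ODT directory back into a single .odt file,
// writing "mimetype" first and without compression.
function pack_odt(string $sourceDir, string $targetFile): void
{
    $sourceDir = rtrim($sourceDir, '/\\');

    $zip = new ZipArchive();
    if ($zip->open($targetFile, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $targetFile");
    }

    // First entry: the mimetype file, stored as-is.
    $zip->addFile("$sourceDir/mimetype", 'mimetype');
    $zip->setCompressionName('mimetype', ZipArchive::CM_STORE);

    // Then every other file below the source directory.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($sourceDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Path relative to the source directory, with forward slashes.
        $relative = substr($file->getPathname(), strlen($sourceDir) + 1);
        $relative = str_replace('\\', '/', $relative);
        if ($relative !== 'mimetype') {
            $zip->addFile($file->getPathname(), $relative);
        }
    }

    $zip->close();
}
```

With the directory from above, pack_odt('newdoc', 'newdoc.odt') would produce a file equivalent to the one created by the zip command.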

Now it’s up to you to write your application data into all the right places. I simply use str_replace() to substitute a number of variables: I keep the text of content.xml in a PHP string, with placeholders like this:

 <text:p text:style-name="P2">%(FirstName) %(LastName)</text:p>
 <text:p text:style-name="P1">%(AddrComplement)</text:p>
 <text:p text:style-name="P1">%(Street)</text:p>
 <text:p text:style-name="P1">%(ZIP) %(City)</text:p>
 <text:p text:style-name="P1">%(Country)</text:p>
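
Before filling in a template it can help to list which placeholders it actually contains, for example to catch a misspelled name. This is a small sketch that assumes the %(Name) convention shown above, with names made of letters, digits and underscores; find_placeholders() is a helper invented here.

```php
<?php
// Return the distinct placeholder names of the form %(Name)
// found in a template string, in order of first appearance.
function find_placeholders(string $xml): array
{
    preg_match_all('/%\(([A-Za-z0-9_]+)\)/', $xml, $matches);
    return array_values(array_unique($matches[1]));
}
```

Running it on the prepared content.xml string gives the list of names that the replacement arrays need to cover.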

The replacement uses two arrays: one with all placeholders and one with the strings to replace them with. (Note: Because the database is still in Latin-1 but the XML files declare a UTF-8 encoding, all strings have to be converted. The call to utf8() is just shorthand for a call to mb_convert_encoding(). Values containing &, < or > must also be XML-escaped, otherwise the resulting content.xml is no longer well-formed.)

$replace_from = array(
    "%(FirstName)",
    "%(LastName)",
    // etc.
);
$replace_to = array(
    utf8($dbentry['firstname']),
    utf8($dbentry['lastname']),
    // etc.
);
$content_xml = str_replace($replace_from, $replace_to, $ODT['content_xml']);
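
The utf8() helper itself is not shown here. A possible variant that converts and escapes in one step might look like the following. This is a sketch, not the original helper: the name utf8_xml() is invented, it assumes the mbstring extension, and it assumes the database strings really are ISO-8859-1.

```php
<?php
// Convert a Latin-1 string to UTF-8 and escape the characters that are
// special in XML, so the value can be placed inside content.xml.
function utf8_xml(string $latin1): string
{
    $utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    return htmlspecialchars($utf8, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Example with made-up data in place of a database row.
$dbentry = array('firstname' => "J\xFCrgen", 'lastname' => 'Smith & Sons');
$template = '<text:p text:style-name="P2">%(FirstName) %(LastName)</text:p>';

$content_xml = str_replace(
    array('%(FirstName)', '%(LastName)'),
    array(utf8_xml($dbentry['firstname']), utf8_xml($dbentry['lastname'])),
    $template
);
```

Without the escaping step, the ampersand in the last name would produce a content.xml that is not well-formed.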

Once the document’s content is prepared, the files have to be zipped and sent to the user. In this example I change only content.xml, so I zip all other files beforehand. (Because my files contain pictures I assume this is more efficient than creating everything on the fly.)

[mschuett@dagny] ~> cd newdoc && zip -r ../template.zip * --exclude content.xml && cd ..

Finally this is the output function at the end of the PHP file:

// headers for download and correct MIME type
header('Content-disposition: attachment; filename="letter.odt"');
header('Content-Type: application/vnd.oasis.opendocument.text');
 
// this is slightly magic
File_Archive::extract(
    File_Archive::readMulti(    // read sources
        array(
            // a) an existing archive
            File_Archive::read("template.zip/"),
            // b) our memory buffers to be named
            File_Archive::readMemory($content_xml, "content.xml"),
        )
    ),
    File_Archive::toArchive(  // combine sources into one output
        'letter.odt',
        File_Archive::toOutput(false),  // send to stdout = browser
        'zip'                           // archive format
    )
);

First the HTTP headers are sent, so the browser knows it will receive a file in OpenDocument format (it should then ask the user whether to save the file or open it immediately in a word processor). The PEAR package File_Archive is used for all zipfile actions: first the prepared zipfile is read, then the string $content_xml is added under the file name "content.xml", and finally the new zipfile is sent to the user. And that’s it.

If necessary, the other files can be customized in the same way. If the ODT files are not only used for printing but are archived or sent to an end user, then you probably want to set a correct document creation date, generator, author information, etc. in meta.xml.
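
For meta.xml, placeholders and str_replace() work just as well, but the values can also be set through the DOM. The sketch below assumes the DOM extension and that the template's meta.xml already contains the elements meta:generator, meta:initial-creator and meta:creation-date; elements that are missing are simply left alone. update_meta() is a name invented for this example.

```php
<?php
// Set generator, initial creator and creation date in a meta.xml string
// and return the modified XML.
function update_meta(
    string $metaXml,
    string $generator,
    string $author,
    DateTimeInterface $created
): string {
    $doc = new DOMDocument();
    $doc->loadXML($metaXml);

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('office', 'urn:oasis:names:tc:opendocument:xmlns:office:1.0');
    $xpath->registerNamespace('meta', 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0');

    $values = array(
        'meta:generator'       => $generator,
        'meta:initial-creator' => $author,
        'meta:creation-date'   => $created->format('Y-m-d\TH:i:s'),
    );

    foreach ($values as $name => $value) {
        foreach ($xpath->query("/office:document-meta/office:meta/$name") as $node) {
            // Replace the element's content with a text node; the DOM
            // takes care of escaping when the document is serialized.
            while ($node->firstChild) {
                $node->removeChild($node->firstChild);
            }
            $node->appendChild($doc->createTextNode($value));
        }
    }

    return $doc->saveXML();
}
```

The returned string can then be added to the archive in the same way as $content_xml, under the file name "meta.xml".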

If you are looking for PHP library support for OpenDocument files, take a look at the PEAR OpenDocument class. Like all other projects I have found, it is in early alpha status, but since Christian Weiske recently took over as maintainer it seems to be the only one under active development.

Last but not least: you can use the ODF Validator to validate your documents against the ODF Specification (without having to rely on the “it is correct if OOo can open it” approach).
