Actually this task appeared to be not that hard as I first thought. To make the task more real, I was working with Russian language, i.e. with CP1251.
So at first I tried to open the documents with simple Lister. The screenshot of the experiment is provided in the attachments. Both files, odt and docx, look like binary files from this point of view. But the most important detail to notice (and it's quite small I would say) is th letters "PK" in the beginning of the data. That actually means that both files are, somewhere deep in their soul, just a zip-archive, which extension was renamed either to odt or to docx.
If we open any of the files in Total Commander using Ctrl+PageDown (Open element under the cursor) we will get a structured content with some folders and XML documents. The screenshot of the experiment is provided in the attachments.
The content that we need is situated in the file content.xml (in ODT) and word/document.xml (in DOCX).
So, in order to extract textual information from ODT ot DOCX formats, we will have to use the standard ZipArchive class and some functions to work with it.
The source code is provided in the attachments. The solution works under PHP 5.2+ and requires php_zip.dll for Windows or --enable-zip key for Linux. In case of unavailability of ZipArchive (like old PHP or lack of libraries) one could use PclZip (http://www.phpconcept.net/pclzip/index.en.php) .
References:
www.msdn.microsoft.com/en-us/library/aa338205.aspx
http://www.i-rs.ru/content/download/1447/8162/file/OpenDocument-v1.0-os.pdf