2019-07-07

18: Optimally Use LibreOffice as a Files Converter (a C++ Implementation)

<The previous article in this series | The table of contents of this series | The next article in this series>

Converting between OpenDocument, Microsoft Office, and other formats, and to PDF or CSV, quickly and versatilely, with the document optionally tweaked. A workable sample.

Topics


About: UNO (Universal Network Objects)
About: LibreOffice
About: Apache OpenOffice
About: C++


Starting Context



Target Context


  • The reader will understand how to implement file conversion directly using the UNO API in C++, and will have a workable sample.

Orientation


There is an article on how to build any sample program of this series.

There is an article on how to create a LibreOffice or Apache OpenOffice daemon.

There is an article on how to create a LibreOffice or Apache OpenOffice Microsoft Windows service.

There is an article on encrypting an Office file with any opening password with UNO.

There is an article on setting any editing password into an OpenDocument file with UNO.

There is an article on setting any editing password into a binary Word/Excel file with UNO.

There is an article on setting any editing password into an Office Open XML file with UNO.

There is an article on setting the size of any page of any word processor document with UNO.

There is an article on setting the size of any page of any spread sheets document with UNO.

There is an article on setting any page property of any word processor document with UNO.

There is an article on setting any page property of any spread sheets document with UNO.

There is an article on exporting any office document into a PDF file as per full specifications with UNO.

There is an article on writing any spread sheet to a CSV file in any format with UNO.

There is an article on writing all the spread sheets of any spread sheets document to CSV files with UNO.


Main Body

Stage Direction
Here are Special-Student-7, Morris (a C++ programmer), and Chloe (a C++ programmer) in front of a computer.


0: A Note: 'soffice --convert-to' Will Never Be Used


Special-Student-7
The concept part of this article is clearly supposed to have been read first. Still, I suspect that a good portion of readers have not read it and will not, and many people are fixated on 'soffice --convert-to'. So I think it appropriate to remind you once more that 'soffice --convert-to' will never be used here.

The reason is elaborated in the concept part.


1: An Implementation of the File Conversion Logic with the UNO API


Special-Student-7
Supposing that you already know what you need to know from the concept part of this article, let us immediately see a C++ implementation of the file conversion logic with the UNO API, where 'l_underlyingRemoteUnoObjectsContextInXComponentContext' is a UNO objects context to the LibreOffice or Apache OpenOffice instance, 'l_originalFileUrl' is the URL of the original file, 'l_targetFileUrl' is the URL of the target file, and 'l_fileStoringFilterName' is the file-storing-filter name.

@C++ Source Code
// header Start
#ifndef __theBiasPlanet_unoUtilitiesTests_fileConvertingTest1_Test1Test_hpp__
	#define __theBiasPlanet_unoUtilitiesTests_fileConvertingTest1_Test1Test_hpp__
	
	#include <optional>
	#include <regex>
	#include <string>
	#include <com/sun/star/util/URL.hpp>
	
	using namespace ::std;
	
	namespace theBiasPlanet {
		namespace unoUtilitiesTests {
			namespace fileConvertingTest1 {
				class Test1Test {
					private:
						static regex const c_urlRegularExpression;
						static ::com::sun::star::util::URL getUrlInURL (string const & a_url);
					public:
						static int main (int const & a_argumentsNumber, char const * const a_argumentsArray []);
				};
			}
		}
	}
#endif

// header End

// source source Start
#include "theBiasPlanet/unoUtilitiesTests/fileConvertingTest1/Test1Test.hpp"
#include <iostream>
#include <com/sun/star/beans/PropertyState.hpp>
#include <com/sun/star/beans/PropertyValue.hpp>
#include <com/sun/star/frame/XDesktop.hpp>
#include <com/sun/star/frame/XDispatchProvider.hpp>
#include <com/sun/star/frame/XStorable2.hpp>
#include <com/sun/star/frame/XSynchronousDispatch.hpp>
#include <com/sun/star/uno/Any.hxx>
#include <com/sun/star/uno/Reference.hxx>
#include <com/sun/star/util/XCloseable.hpp>
~
#include "theBiasPlanet/unoUtilities/stringsHandling/UnoExtendedStringHandler.hpp"
~

using namespace ::std;
using namespace ::com::sun::star::beans;
using namespace ::com::sun::star::frame;
using namespace ::com::sun::star::lang;
using namespace ::com::sun::star::uno;
using namespace ::com::sun::star::util;
~
using namespace ::theBiasPlanet::unoUtilities::stringsHandling;
~

namespace theBiasPlanet {
	namespace unoUtilitiesTests {
		namespace fileConvertingTest1 {
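			// a hypothetical pattern, as the original definition is not shown: group 1 is the scheme with its colon, and group 2 is the rest
			regex const Test1Test::c_urlRegularExpression ("^([^:/?#]+:)(.*)$");
			
			// The 'com.sun.star.util.URLTransformer' UNO service could be used instead; here the URL is parsed manually.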
			::com::sun::star::util::URL Test1Test::getUrlInURL (string const & a_url) {
				OUString const l_urlInOUString (UnoExtendedStringHandler::getOustring (a_url));
				OUString const l_emptyInOUString;
				int l_portNumber = 0;
				string::const_iterator l_urlIteratorAtStart (a_url.cbegin ());
				string::const_iterator l_urlIteratorAtEnd (a_url.cend ());
				smatch l_regularExpressionMatcher;
				if (regex_search (l_urlIteratorAtStart, l_urlIteratorAtEnd, l_regularExpressionMatcher, c_urlRegularExpression)) {
					OUString const l_protocolInOUString (UnoExtendedStringHandler::getOustring (l_regularExpressionMatcher [1]));
					OUString const l_pathInOUString (UnoExtendedStringHandler::getOustring (l_regularExpressionMatcher [2]));
					return ::com::sun::star::util::URL (l_urlInOUString, l_urlInOUString, l_protocolInOUString, l_emptyInOUString, l_emptyInOUString, l_emptyInOUString, l_portNumber, l_pathInOUString, l_emptyInOUString, l_emptyInOUString, l_emptyInOUString);
				}
				return ::com::sun::star::util::URL (l_urlInOUString, l_urlInOUString, l_emptyInOUString, l_emptyInOUString, l_emptyInOUString, l_emptyInOUString, l_portNumber, l_emptyInOUString, l_emptyInOUString, l_emptyInOUString, l_emptyInOUString);
			}
			
			int Test1Test::main (int const & a_argumentsNumber, char const * const a_argumentsArray []) {
				~
						string l_originalFileUrl (a_argumentsArray [3]);
						string l_targetFileUrl (a_argumentsArray [4]);
						string l_fileStoringFilterName (a_argumentsArray [5]);
						
						string l_com_sun_star_frame_theDesktopSingletonUrl ("/singletons/com.sun.star.frame.theDesktop");
						// 'Reference' can query an interface directly out of the 'Any' returned by 'getValueByName'
						Reference <XDispatchProvider> l_underlyingUnoDesktopInXDispatchProvider (l_underlyingRemoteUnoObjectsContextInXComponentContext->getValueByName (UnoExtendedStringHandler::getOustring (l_com_sun_star_frame_theDesktopSingletonUrl)), UNO_QUERY);
						// the file-opening dispatcher; it can be reused for every file
						Reference <XSynchronousDispatch> l_underlyingFileOpeningUnoDispatcherInXSynchronousDispatch (l_underlyingUnoDesktopInXDispatchProvider->queryDispatch (getUrlInURL ("file:///"), UnoExtendedStringHandler::getOustring (string ("_blank")), -1), UNO_QUERY);
						
						::com::sun::star::util::URL l_originalFileUrlInURL (getUrlInURL (l_originalFileUrl));
						Sequence <PropertyValue> l_unoDocumentOpeningPropertiesSequence (4);
						l_unoDocumentOpeningPropertiesSequence [0] = PropertyValue (UnoExtendedStringHandler::getOustring (string ("ReadOnly")), -1, Any (true), PropertyState_DIRECT_VALUE);
						l_unoDocumentOpeningPropertiesSequence [1] = PropertyValue (UnoExtendedStringHandler::getOustring (string ("Hidden")), -1, Any (true), PropertyState_DIRECT_VALUE);
						l_unoDocumentOpeningPropertiesSequence [2] = PropertyValue (UnoExtendedStringHandler::getOustring (string ("OpenNewView")), -1, Any (true), PropertyState_DIRECT_VALUE);
						l_unoDocumentOpeningPropertiesSequence [3] = PropertyValue (UnoExtendedStringHandler::getOustring (string ("Silent")), -1, Any (true), PropertyState_DIRECT_VALUE);
						Sequence <PropertyValue> l_unoDocumentStoringPropertiesSequence (2);
						l_unoDocumentStoringPropertiesSequence [0] = PropertyValue (UnoExtendedStringHandler::getOustring (string ("FilterName")), -1, Any (UnoExtendedStringHandler::getOustring (l_fileStoringFilterName)), PropertyState_DIRECT_VALUE);
						l_unoDocumentStoringPropertiesSequence [1] = PropertyValue (UnoExtendedStringHandler::getOustring (string ("Overwrite")), -1, Any (true), PropertyState_DIRECT_VALUE);
						::com::sun::star::uno::Any l_underlyingOriginalUnoDocumentInAny (l_underlyingFileOpeningUnoDispatcherInXSynchronousDispatch->dispatchWithReturnValue (l_originalFileUrlInURL, l_unoDocumentOpeningPropertiesSequence));
						bool l_hasSucceeded (false);
						if (l_underlyingOriginalUnoDocumentInAny.hasValue ()) {
							// 'Reference' can query an interface directly out of an 'Any'
							Reference <XStorable2> l_underlyingOriginalUnoDocumentInXStorable2 (l_underlyingOriginalUnoDocumentInAny, UNO_QUERY);
							try {
								l_underlyingOriginalUnoDocumentInXStorable2->storeToURL (UnoExtendedStringHandler::getOustring (l_targetFileUrl), l_unoDocumentStoringPropertiesSequence);
								l_hasSucceeded = true;
							}
							// UNO exceptions do not derive from 'std::exception', so catch everything, close the document, and rethrow the original exception unchanged
							catch (...) {
								Reference <XCloseable> (l_underlyingOriginalUnoDocumentInXStorable2, UNO_QUERY)->close (false);
								throw;
							}
							Reference <XCloseable> (l_underlyingOriginalUnoDocumentInXStorable2, UNO_QUERY)->close (false);
						}
						else {
							cout << string ("### The original file: '") << l_originalFileUrl << string ("' cannot be opened.") << endl << flush;
						}
				~
			}
		}
	}
}

// source source End

Chloe
. . . Hmm, that is simple enough.

Morris
I agree; if you have any trouble with that simple code, you can hardly call yourself a C++ programmer.

Special-Student-7
Note that 'UnoExtendedStringHandler::getOustring' is a utility method of mine that creates an instance of '::rtl::OUString', the C++ mapping of the UNO string type. I basically avoid such utility classes in explanation code, but this method is an exception, because creating a '::rtl::OUString' instance is too tedious to show in every piece of explanation code (tedious because I have to do the UTF-8-to-UTF-16 conversion). You do not have to use my utility method; if you want to, the utility class is included in the sample introduced later.
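To show what that "tedious" conversion involves, here is a minimal standard-C++ sketch of decoding UTF-8 into UTF-16 code units, which is what an '::rtl::OUString' holds. It is not my utility method. The name 'utf8ToUtf16' is hypothetical, and the sketch deliberately simplifies error handling: a malformed sequence becomes U+FFFD, and overlong forms are not rejected. With the real SDK, the 'OUString (char const *, sal_Int32, rtl_TextEncoding)' constructor with 'RTL_TEXTENCODING_UTF8' does this job for you.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 into UTF-16 code units, the representation '::rtl::OUString' holds.
// Simplified: a malformed or truncated sequence yields U+FFFD for its lead byte;
// overlong forms and encoded surrogates are not rejected.
std::u16string utf8ToUtf16 (std::string const & a_utf8String) {
	std::u16string l_utf16String;
	std::size_t l_index = 0;
	while (l_index < a_utf8String.size ()) {
		unsigned char const l_leadByte = static_cast <unsigned char> (a_utf8String [l_index]);
		char32_t l_codePoint = 0;
		std::size_t l_continuationBytesNumber = 0;
		if (l_leadByte < 0x80) {
			l_codePoint = l_leadByte;
		}
		else if ((l_leadByte & 0xE0) == 0xC0) {
			l_codePoint = l_leadByte & 0x1F;
			l_continuationBytesNumber = 1;
		}
		else if ((l_leadByte & 0xF0) == 0xE0) {
			l_codePoint = l_leadByte & 0x0F;
			l_continuationBytesNumber = 2;
		}
		else if ((l_leadByte & 0xF8) == 0xF0) {
			l_codePoint = l_leadByte & 0x07;
			l_continuationBytesNumber = 3;
		}
		// a stray continuation byte or an invalid lead byte is treated as malformed
		bool l_isWellFormed = (l_leadByte < 0x80 || l_continuationBytesNumber > 0) && l_index + l_continuationBytesNumber < a_utf8String.size ();
		for (std::size_t l_offset = 1; l_isWellFormed && l_offset <= l_continuationBytesNumber; ++ l_offset) {
			unsigned char const l_byte = static_cast <unsigned char> (a_utf8String [l_index + l_offset]);
			l_isWellFormed = (l_byte & 0xC0) == 0x80;
			l_codePoint = (l_codePoint << 6) | (l_byte & 0x3F);
		}
		if (! l_isWellFormed || l_codePoint > 0x10FFFF) {
			l_utf16String.push_back (u'\uFFFD');
			++ l_index;
			continue;
		}
		l_index += l_continuationBytesNumber + 1;
		if (l_codePoint < 0x10000) {
			l_utf16String.push_back (static_cast <char16_t> (l_codePoint));
		}
		else {
			// a code point beyond the Basic Multilingual Plane becomes a surrogate pair
			l_codePoint -= 0x10000;
			l_utf16String.push_back (static_cast <char16_t> (0xD800 + (l_codePoint >> 10)));
			l_utf16String.push_back (static_cast <char16_t> (0xDC00 + (l_codePoint & 0x3FF)));
		}
	}
	return l_utf16String;
}
```

For example, 'utf8ToUtf16 ("\xC3\xA9")' yields the single code unit U+00E9, while a 4-byte sequence such as an emoji yields two code units (a surrogate pair). That is why the length of an 'OUString' can differ from the number of characters a user sees.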

Morris
We have already heard that in a previous article.

Special-Student-7
If somebody has trouble, it will probably be with getting a UNO objects context, which is explained in an article (either the simple way or the scrupulous way).

The above code is doing exactly what I said in a section of the concept part of this article: open the original file, store the opened document in the target format, and close the opened document.

'dispatchWithReturnValue' is opening the original file, 'storeToURL' is storing the opened document, and 'close' is closing the opened document.

'l_underlyingFileOpeningUnoDispatcherInXSynchronousDispatch' is a UNO dispatcher, which does not need to be obtained per file.
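The shape of that open, store, and close sequence can be sketched in plain C++ to highlight two points: the dispatcher is obtained once and reused for every file, and the opened document is closed even when storing throws. 'Document', 'Dispatcher', and 'FileConverter' below are hypothetical stand-ins, not UNO types. In the real code they correspond to the document returned by 'dispatchWithReturnValue', the 'XSynchronousDispatch' instance, and the conversion logic, respectively.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an opened document (the real one is a UNO document object).
struct Document {
	std::string url;
	bool closed = false;
	bool failOnStore = false;
	void storeToUrl (std::string const & a_targetUrl) {
		if (failOnStore) {
			throw std::runtime_error ("storing failed: " + a_targetUrl);
		}
	}
	void close () {
		closed = true;
	}
};

// Hypothetical stand-in for the file-opening dispatcher; it returns nullptr when a file cannot be opened.
struct Dispatcher {
	std::function <std::shared_ptr <Document> (std::string const &)> open;
};

class FileConverter {
	private:
		Dispatcher m_dispatcher; // obtained once, not per file
	public:
		explicit FileConverter (Dispatcher a_dispatcher) : m_dispatcher (std::move (a_dispatcher)) {}
		// Opens, stores, and closes; returns false if the original file cannot be opened.
		bool convert (std::string const & a_originalUrl, std::string const & a_targetUrl) {
			std::shared_ptr <Document> l_document = m_dispatcher.open (a_originalUrl);
			if (! l_document) {
				return false;
			}
			// closes the document when this scope is left, normally or via an exception
			struct Closer {
				Document & document;
				~Closer () {
					document.close ();
				}
			} l_closer {*l_document};
			l_document->storeToUrl (a_targetUrl);
			return true;
		}
};
```

The scope guard plays the same role as the 'catch (...)' block in the listing above. Which one you prefer is a matter of taste, but either way a failed conversion should not leave a hidden document open in the office instance.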

Possible filter names are cited in the concept part of this article.
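As an illustration only, a filter name can be chosen from the kind of the opened document and the target extension. The filter names below are ones I believe LibreOffice registers ('writer_pdf_Export', 'calc_pdf_Export', 'MS Word 2007 XML', 'Calc MS Excel 2007 XML', 'Text - txt - csv (StarCalc)', and so on), but the list should be checked against the names cited in the concept part. The 'DocumentKind' enumeration and the 'getFilterName' function are hypothetical.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical classification of the opened document.
enum class DocumentKind { wordProcessor, spreadSheets, presentation };

// Returns the file-storing-filter name for a document kind and a target extension, if known.
std::optional <std::string> getFilterName (DocumentKind a_documentKind, std::string const & a_targetExtension) {
	static std::map <std::pair <DocumentKind, std::string>, std::string> const s_filterNames {
		{{DocumentKind::wordProcessor, "pdf"}, "writer_pdf_Export"},
		{{DocumentKind::wordProcessor, "odt"}, "writer8"},
		{{DocumentKind::wordProcessor, "docx"}, "MS Word 2007 XML"},
		{{DocumentKind::wordProcessor, "doc"}, "MS Word 97"},
		{{DocumentKind::spreadSheets, "pdf"}, "calc_pdf_Export"},
		{{DocumentKind::spreadSheets, "ods"}, "calc8"},
		{{DocumentKind::spreadSheets, "xlsx"}, "Calc MS Excel 2007 XML"},
		{{DocumentKind::spreadSheets, "xls"}, "MS Excel 97"},
		{{DocumentKind::spreadSheets, "csv"}, "Text - txt - csv (StarCalc)"},
		{{DocumentKind::presentation, "pdf"}, "impress_pdf_Export"},
		{{DocumentKind::presentation, "odp"}, "impress8"},
		{{DocumentKind::presentation, "pptx"}, "Impress MS PowerPoint 2007 XML"},
	};
	auto l_iterator = s_filterNames.find ({a_documentKind, a_targetExtension});
	if (l_iterator == s_filterNames.end ()) {
		return std::nullopt;
	}
	return l_iterator->second;
}
```

Note that the filter depends on the document kind, not only on the target format. A word processor document and a spread sheets document going to PDF need different filters.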


2: On Properties for Opening or Storing


Special-Student-7
As seen in the above code, you can set some properties for opening the file. These are the specifications of the possible properties, where the datum types are given as C++ types (instead of UNO types). Note that each datum has to be wrapped in a '::com::sun::star::uno::Any' instance.

Name | Datum Type | Description
ReadOnly | bool | whether the opened document cannot be written back into the original file: 'true' -> cannot, 'false' -> can
Hidden | bool | whether the opened document is hidden: 'true' -> hidden, 'false' -> not hidden
OpenNewView | bool | whether the document is opened in a new view: 'true' -> in a new view, 'false' -> not in a new view
Silent | bool | whether the document is opened silently: 'true' -> silently, 'false' -> not silently
Password | ::rtl::OUString | the opening password

A property you may be interested in is the opening password.

Chloe
So I can convert files that are protected by opening passwords, all right.

Special-Student-7
Please note that the UNO 'string' type does not officially accept 'null', so if there is no opening password, the property itself should be omitted, although 'null' might happen to be accepted.

And please note that 'ReadOnly' does not mean that the opened document cannot be modified. It means that the document cannot be written back into the original file, so it does not need to be 'false' for the document to be tweaked and written into a new file.
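The rule of omitting 'Password' entirely, rather than passing an empty or null value, can be sketched like this. 'Property' stands in for 'PropertyValue', with the value held in a 'std::variant' instead of an 'Any'. 'buildOpeningProperties' is a hypothetical helper. Treating an empty string as "no password" is my assumption, which is convenient when the password comes from an optional JSON field.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Stand-in for '::com::sun::star::beans::PropertyValue'; a real value would be wrapped in an 'Any'.
struct Property {
	std::string name;
	std::variant <bool, std::string> value;
};

// Builds the document-opening properties used above; 'Password' is appended only when a non-empty
// password is given (an assumption), so that no empty or null password value is ever passed.
std::vector <Property> buildOpeningProperties (std::optional <std::string> const & a_openingPassword) {
	std::vector <Property> l_properties {
		{"ReadOnly", true},
		{"Hidden", true},
		{"OpenNewView", true},
		{"Silent", true},
	};
	if (a_openingPassword.has_value () && ! a_openingPassword->empty ()) {
		l_properties.push_back ({"Password", *a_openingPassword});
	}
	return l_properties;
}
```

In the real code, the resulting list would be copied into a 'Sequence <PropertyValue>' sized to the actual number of properties, rather than a fixed size of 4 or 5.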

You can also set some properties for storing the document. These are the specifications of the possible properties, where again the datum types are given as C++ types (instead of UNO types) and each datum has to be wrapped in a '::com::sun::star::uno::Any' instance.

Name | Datum Type | Description
FilterName | ::rtl::OUString | the filter name
FilterData | depends on the case | the filter-specific data for some filters: not always but often an array of 'com.sun.star.beans.PropertyValue'
FilterOptions | ::rtl::OUString | the filter-specific data for some filters
AsTemplate | bool | whether the document is stored as a template: 'true' -> as a template, 'false' -> not as a template
Author | ::rtl::OUString | the author name
DocumentTitle | ::rtl::OUString | the document title
EncryptionData | com.sun.star.beans.NamedValue [] | the encryption data for some filters
Password | ::rtl::OUString | the opening password for some filters
CharacterSet | ::rtl::OUString | the character set
Version | short | the version number
Comment | ::rtl::OUString | the version description
Overwrite | bool | whether an existing file is overwritten: 'true' -> overwritten, 'false' -> not overwritten
ComponentData | depends on the case | some filter-specific data: not always but often an array of 'com.sun.star.beans.NamedValue'
ModifyPasswordInfo | depends on the case | the editing password data for some filters

The opening password is really nothing but encryption data. Both 'EncryptionData' and 'Password' exist because some filters use one and other filters use the other; for details, please refer to another article.

The situation concerning setting an editing password is rather complicated: the editing password has to be hashed by a format-specific algorithm, and the hash has to be set into a format-specific location, which is not necessarily inside the file-storing properties. . . . The details are explained in separate articles for OpenDocument files, for Microsoft binary files, and for Office Open XML files.

Morris
Well, I personally don't deem an editing password worth such trouble. It is not so much a security measure against malicious tampering as a guard against editing the file by accident, and I don't sympathize with people who are that careless.

Special-Student-7
That is a viable policy, I think, but anyway, my workable sample has implemented the logic, so anyone who wants it can use it.

As for what exactly can be set into filter-specific properties like 'FilterData' and 'FilterOptions', that cannot be explained in this article about the general logic of file conversion. Please refer to other articles, such as those for PDF files and for CSV files.


3: On Tweaking the Document


Special-Student-7
The target file may not be satisfactorily controlled only via the file-storing properties.

Then, you can tweak the opened document.

Chloe
Hm, I understand that the opened document can be tweaked via that "l_underlyingOriginalUnoDocumentInXStorable2" UNO object, theoretically speaking, I mean.

Special-Student-7
How exactly the document can be tweaked is, of course, document-specific and requirement-specific. Please refer to other articles, such as those on tweaking the page properties of word processor documents (here and here) and of spread sheets documents (here and here). I intend to cover more later in this series.


4: Here Is a Workable Sample


Special-Student-7
A workable sample can be downloaded from here.

The sample program takes as an argument the path of a JSON file (in fact, it can take multiple JSON file paths) that contains file conversion orders, and fulfills the orders.

The specifications of the JSON file are as follows.

@JSON Source Code
[
	{Disabled: %whether this record is disabled or not -> 'true' or 'false'%, 
	OriginalFileUrl: %the original file URL, in which any operating system environment variable can be used like '%{HOME}'%, 
	TargetFileUrl: %the target file URL, in which any operating system environment variable can be used like '%{HOME}'%, 
	FileStoringFilterName: %the file-storing-filter name%, 
	OpeningPassword: %the opening password for the original file%, 
	Title: %the target file title%, 
	EncryptingPassword: %the encrypting password for the target file%, 
	EditingPassword: %the editing password for the target file%, 
	CertificateIssuerName: %the certificate issuer name for signing the target file%, 
	CertificatePassword: %the certificate password for signing the target file%, 
	SignatureTimeStampAuthorityUrl: %the time stamp authority URL for signing the target file%, 
	DocumentTailorNames: [%a document tailor name%, . . .], 
	AllSpreadSheetsAreExported: %whether all the spread sheets are exported or not: 'true' or 'false'%, 
	TargetFileNamingRule: %the naming rule for target CSV files: '0'-> each CSV file is named with the sheet index, '1'-> each CSV file is named with the sheet name%, 
	CsvItemsDelimiterCharacterCode: %the character code of the CSV items delimiter: '44'-> ',' for example%, 
	CsvTextItemQuotationCharacterCode: %the character code of the CSV text item quotation: '34'-> '"' for example%, 
	CsvCharactersEncodingCode: %the CSV characters encoding code: '76' -> UTF-8, '65535' -> UCS-2, '65534' -> UCS-4, '11' -> US-ASCII, '69' -> EUC_JP, '64' -> SHIFT_JIS%, 
	CsvAllTextItemsAreQuoted: %whether all the CSV text items are quoted or not: 'true' or 'false'%, 
	CsvContentsAreExportedAsShown: %whether the CSV contents are exported as shown on the sheets: 'true' or 'false'%, 
	CsvFormulaeThemselvesAreExported: %whether the sheet cell formulae themselves are exported into the CSV files: 'true' or 'false'%, 
	HiddenSpreadSheetsAreExported: %whether also the hidden spread sheets are exported%}, 
	. . .
	{. . .}
]
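'CsvCharactersEncodingCode' takes a numeric code, so when orders are generated by a program rather than written by hand, a small lookup from encoding names to the codes listed above may be handy. The codes appear to be LibreOffice's 'rtl_TextEncoding' values (for example, 'RTL_TEXTENCODING_UTF8' is 76). The function name 'getCsvCharactersEncodingCode' is hypothetical, and only the encodings listed in the specification are covered.

```cpp
#include <map>
#include <optional>
#include <string>

// Maps an encoding name to the numeric code expected by 'CsvCharactersEncodingCode'
// (only the codes listed in the specification above).
std::optional <int> getCsvCharactersEncodingCode (std::string const & a_encodingName) {
	static std::map <std::string, int> const s_encodingCodes {
		{"UTF-8", 76},
		{"UCS-2", 65535},
		{"UCS-4", 65534},
		{"US-ASCII", 11},
		{"EUC_JP", 69},
		{"SHIFT_JIS", 64},
	};
	auto l_iterator = s_encodingCodes.find (a_encodingName);
	if (l_iterator == s_encodingCodes.end ()) {
		return std::nullopt;
	}
	return l_iterator->second;
}
```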

Morris
"tailor"? I don't remember such a term.

Special-Student-7
"tailor" is a term I have invented and means an object tasked with tweaking the document in a specific way.

Morris
So I am meant to create some myself.

Special-Student-7
I have prepared a few sample tailors, but yes, you will have to create your own if you have specific needs.

Morris
I'll see.

Chloe
Am I supposed to specify the class name?

Special-Student-7
No, the object name: there may be multiple tailors of one class, constructed with different parameters.

If the orders come not from such a file but from, for example, Web requests, each order can be constructed into an in-memory JSON record, which can be passed into the 'executeConversionOrder' method.

If you need additional capabilities in the JSON record, you can modify the 'executeConversionOrder' method together with the 'FileConversionOrderExtendedJsonDatumParseEventsHandler' JSON parse-events-handler.

I have made the sample accept multiple orders files, which are processed concurrently, in order to demonstrate that conversions can be done concurrently in this optimal way.
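The client-side idea of concurrent processing can be sketched with 'std::async': one task per orders file, with each task working through its own orders in sequence. 'processOrdersFilesConcurrently' and the 'a_processOrdersFile' callback are hypothetical, and the sample's actual threading code is not reproduced here.

```cpp
#include <functional>
#include <future>
#include <string>
#include <vector>

// Processes each orders file in its own asynchronous task and returns, per file, the value returned by the callback.
// 'a_processOrdersFile' is a hypothetical callback that fulfills all the orders of one file (for example, returning how many succeeded).
std::vector <int> processOrdersFilesConcurrently (std::vector <std::string> const & a_ordersFilePaths, std::function <int (std::string const &)> const & a_processOrdersFile) {
	std::vector <std::future <int>> l_futures;
	for (std::string const & l_ordersFilePath : a_ordersFilePaths) {
		// std::launch::async runs each orders file on its own thread
		l_futures.push_back (std::async (std::launch::async, a_processOrdersFile, l_ordersFilePath));
	}
	std::vector <int> l_results;
	for (std::future <int> & l_future : l_futures) {
		// get () waits for the task and rethrows any exception it threw
		l_results.push_back (l_future.get ());
	}
	return l_results;
}
```

The results come back in the order of the given paths, regardless of which task finishes first.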


4-1: Building the Sample Project


Special-Student-7
How to build the sample project is explained in an article.

In particular, note that the projects also contain implementations in some other programming languages, which you, a C++ programmer, will probably not need; how to ignore such unnecessary code is explained there.

The main project is 'filesConverter'.


4-2: Executing the Sample Program


Special-Student-7
Before we execute the sample program, we have to have started a LibreOffice or Apache OpenOffice instance so that it accepts connections from clients, as we have learned how in a previous article.

Stage Direction
Special-Student-7 starts a LibreOffice instance with the port number 2002.

Special-Student-7
We need a conversion orders JSON file (in the format specified above). We will use the sample JSON file included in the archive file as 'filesConverter/data/FileConversionOrders.json', but at least the file URLs inside it have to be adjusted.

We can execute the sample program like this with the current directory positioned at the sample project directory.

@cmd Source Code
gradle i_executeCplusplusExecutableFileTask -Pi_commandLineArguments="\"socket,host=localhost,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext\" \"data/FileConversionOrders.json\""

Stage Direction
Special-Student-7 executes the command in the terminal with the current directory positioned at the sample project directory.

Special-Student-7
If the LibreOffice or Apache OpenOffice instance is on a remote host, we can just replace 'localhost' with that host name, provided, of course, that no firewall blocks the communication.

And you can specify multiple JSON files like this.

@cmd Source Code
gradle i_executeCplusplusExecutableFileTask -Pi_commandLineArguments="\"socket,host=localhost,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext\" \"data/FileConversionOrders.json\" \"data/FileConversionOrders2.json\""
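For a remote instance, only the host part of the first argument changes. If you prefer to build that argument programmatically, here is a trivial helper that assembles it in the same format as the commands above. The name 'getConnectionDescription' is hypothetical.

```cpp
#include <string>

// Builds the UNO connection description passed as the first program argument in the commands above.
std::string getConnectionDescription (std::string const & a_hostName, int a_portNumber) {
	return "socket,host=" + a_hostName + ",port=" + std::to_string (a_portNumber) + ",tcpNoDelay=1;urp;StarOffice.ComponentContext";
}
```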


4-3: Understanding the Sample Code


Special-Student-7
Note that the sample code uses my utility projects ('coreUtilitiesToBeDisclosed' and 'unoUtilitiesToBeDisclosed'), which include code that is not directly related to the work of file conversion; please understand that it is not so easy to excise only irrelevant parts of the utility projects, because the code is intertwined.

The file conversion logic is in the 'convertFile' method (for all targets except CSV files) or the 'convertSpreadSheetsDocumentFileToCsvFiles' method (for CSV files) of the '::theBiasPlanet::unoUtilities::filesConverting::FilesConverter' class. Each of these methods takes, among some other things, the document tailors and the file-storing properties.

In order to install a new document tailor, you have to make the class a child of '::theBiasPlanet::unoUtilities::documentsHandling::UnoDocumentTailor' and register an instance of it under a unique name in the 'l_unoDocumentTailorNameToUnoDocumentTailorMap' variable of the '::theBiasPlanet::filesConverter::programs::FilesConverterConsoleProgram' class.
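The registration idea can be sketched in plain C++. The sketch shows a tailor base class and one class instantiated twice with different parameters under different names, and it applies the tailors in the order in which their names are listed in 'DocumentTailorNames'. 'DocumentTailor', 'PageMarginTailor', 'FakeDocument', and 'applyDocumentTailors' are all hypothetical. The real base class is 'UnoDocumentTailor', whose exact interface is in the sample.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the opened document.
struct FakeDocument {
	int pageMargin = 0;
};

// Hypothetical base class; the sample's real base is 'UnoDocumentTailor'.
class DocumentTailor {
	public:
		virtual ~DocumentTailor () = default;
		virtual void tailor (FakeDocument & a_document) const = 0;
};

// One parameterized class, so several named instances can coexist.
class PageMarginTailor : public DocumentTailor {
	private:
		int m_margin;
	public:
		explicit PageMarginTailor (int a_margin) : m_margin (a_margin) {}
		void tailor (FakeDocument & a_document) const override {
			a_document.pageMargin = m_margin;
		}
};

using DocumentTailorNameToDocumentTailorMap = std::map <std::string, std::shared_ptr <DocumentTailor const>>;

// Applies the named tailors in the listed order; an unknown name is reported as an error.
void applyDocumentTailors (FakeDocument & a_document, std::vector <std::string> const & a_tailorNames, DocumentTailorNameToDocumentTailorMap const & a_tailors) {
	for (std::string const & l_tailorName : a_tailorNames) {
		auto l_iterator = a_tailors.find (l_tailorName);
		if (l_iterator == a_tailors.end ()) {
			throw std::invalid_argument ("unknown document tailor: " + l_tailorName);
		}
		l_iterator->second->tailor (a_document);
	}
}
```

Registering objects rather than classes is what allows 'narrowMargins' and 'wideMargins' to be two instances of the same 'PageMarginTailor' class, which is the point made in the conversation above.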

