2019-12-29

33: Optimally Use LibreOffice as a Files Converter (a Basic Implementation)


Converts between the OpenDocument, Microsoft Office, and other formats, and to the PDF format, fast and versatilely. The file can also be tweaked. Applicable also to Apache OpenOffice.

Topics


About: UNO (Universal Network Objects)
About: LibreOffice
About: Apache OpenOffice
About: LibreOffice Basic
About: Apache OpenOffice Basic


Notation

  • 'Office' means LibreOffice or Apache OpenOffice.

Starting Context



Target Context


  • The reader will have and understand an Office Basic implementation for converting (or tweaking) any valid file directly using the UNO API.

Stage Direction
Here are Hypothesizer 7, Objector 33A, and Objector 33B in front of a computer.


Orientation


Hypothesizer 7
In this part of this article, we will see and understand an Office Basic implementation for optimally converting any valid file.

Objector 33A
You mean, I can call it as a macro in LibreOffice?

Hypothesizer 7
Yes, sir.

Objector 33A
You mean, it can be called only from inside a LibreOffice instance?

Hypothesizer 7
Actually, any Office macro can be easily invoked from outside any Office instance.

Objector 33A
Oh? How?

Hypothesizer 7
Well, if you are willing to create a small program in Java, C++, a Microsoft .NET Framework programming language (probably, C# or Visual Basic.NET), or Python, we will see how in a future article.

Objector 33A
I am not, actually.

Hypothesizer 7
If not, you can invoke any macro subroutine via an 'soffice' command, although it has to be a subroutine, not any function, and the way is somewhat inefficient and restrictive.

Objector 33A
. . . Any subroutine will not do, as I have to specify the converted-from file and the converted-to file at least. Use your imagination!

Hypothesizer 7
My imagination says that such parameters can be passed in one of some ways, for example, via a CSV file.

Objector 33A
"a CSV file"? What do you mean?

Hypothesizer 7
For example, a line in the CSV file is a list of converted-from file URL, converted-to file URL, and filter name; you put some such lines into the CSV file and execute the 'soffice' command; the subroutine reads the CSV file and performs the conversions.

Objector 33A
. . . What if I try to add a line when the subroutine is reading the CSV file?

Hypothesizer 7
Such a thing will not happen if you just sequentially prepare the CSV file, execute the 'soffice' command, and wait until the subroutine execution is finished (as the 'soffice' command execution returns asynchronously, the subroutine has to notify you of the completion of its own execution via a result file).

Objector 33A
I want to run multiple macro executions simultaneously.

Hypothesizer 7
You cannot do so anyway, because Office Basic cannot run multiple executions simultaneously.

Objector 33A
. . . I may want to prepare the CSV file for the next round while the 'soffice' command is taking a long time. Use your imagination!

Hypothesizer 7
My imagination says that you can just prepare the CSV file for the next round in another file and rename the file just before you call the 'soffice' command.

Objector 33A
. . . Anyway, I have to check the completion of the macro execution via a result file. . . . Implementing such logic is certainly tiresome!

Hypothesizer 7
I do not deem it particularly tiresome, but certainly, the logic is not efficient. So, I would create a program that calls the macro synchronously.

Objector 33B
If I call 'soffice' anyway, why won't I just call 'soffice --convert-to'?

Hypothesizer 7
Madam, the functionality of 'soffice --convert-to' is limited; in order to do what 'soffice --convert-to' cannot do, you have to directly use the UNO API.


Main Body


1: Getting and Registering a UNO Utilities Basic Library UNO Extension and a Files Converter Sample Basic Library UNO Extension


Hypothesizer 7
Here is a ZIP file that contains a files converter sample Basic library UNO extension.

Actually, the ZIP file also includes a UNO utilities Basic library UNO extension, which includes common UNO utility code that is used also in other libraries that will be introduced in some future articles.

Any UNO extension file ('.oxt' file) is really a ZIP file, which can be expanded with any ZIP tool. So, you can (and, I think, should) look inside it to be sure that there is nothing suspicious there.

Stage Direction
Hypothesizer 7 expands the 2 UNO extensions with 'unzip' in the Linux computer. The objectors look into the extracted files.

Objector 33B
Well, I'm not sure whether a file is suspicious or not . . .

Hypothesizer 7
At least, you can see that all the files are text files, cannot you?

Objector 33B
. . . Yes, actually, all the files seem to be XML files.

Hypothesizer 7
The '.xba' files are Basic source files, and the other files are configuration files.

Objector 33B
Yeah, the contents of this '.xba' file look like Basic source code, but . . . look a little odd.

Hypothesizer 7
That is because some characters like '<' and '>' are XML escaped. You know, as the file is an XML file, some characters have to be XML escaped.

Objector 33B
. . . Ah--ha, certainly, this is Basic code, although I cannot understand what it does, in an instant.

Hypothesizer 7
Please take your time to examine them to your heart's content.

Objector 33B
. . .

Hypothesizer 7
The 2 UNO extensions can be registered into Office like this: shut down the Office instance (if one exists) and execute these commands in a terminal, with the directory paths adjusted as necessary.

@ Source Code
/usr/lib/libreoffice/program/unopkg add -f -v ${HOME}/myData/development/theBiasPlanet.basicUnoUtilities.unoExtension.oxt
/usr/lib/libreoffice/program/unopkg add -f -v ${HOME}/myData/development/theBiasPlanet.basicFilesConverter.unoExtension.oxt

Stage Direction
Hypothesizer 7 shuts down the existing LibreOffice instance, opens a terminal, and executes the commands in the terminal.

Objector 33B
Hmm.

Hypothesizer 7
The Basic code is registered in 3 libraries, which you can confirm as follows.

Stage Direction
Hypothesizer 7 starts up a LibreOffice instance; in the LibreOffice instance, he clicks 'Tools' -> 'Macros' -> 'Organize Macros' -> 'LibreOffice Basic...'; in the dialog that has appeared, he clicks 'Edit'. The Basic IDE window opens.

Hypothesizer 7
The 3 libraries, 'theBiasPlanet_coreUtilities', 'theBiasPlanet_unoUtilities', and 'theBiasPlanet_filesConverter', are in 'My Macros & Dialogs'.

Objector 33B
Hmm.

Hypothesizer 7
As the preparation for executing the files conversion subroutine, we have to set 'FilesConverterMacroProgram_s_conversionOrdersFilePath' in the 'programs' module in the 'theBiasPlanet_filesConverter' library to the path of the CSV file that will contain some lines, each of which specifies the converted-from file URL, the converted-to file URL, and the document-storing filter name of a conversion.

And if we want to specify some specific document-storing properties, we will have to set them in the 'programs' module.

Objector 33B
"some specific document-storing properties"?

Hypothesizer 7
Yes.

Objector 33B
. . . I'm asking what they mean.

Hypothesizer 7
Converting any file is, in fact, just opening the converted-from file, storing the opened document into the converted-to file, and closing the document. The document-storing filter determines the target file format; the other document-storing properties determine the specifics for the target file format.
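
As a minimal sketch of those 3 steps (a condensed illustration, not the code in the library: it assumes the stock 'StarDesktop.loadComponentFromURL' route instead of the UNO dispatch command route the sample takes, and the 'convertFileSketch' name and the URLs in the usage note are hypothetical), the flow could look like this.

@Office Basic Source Code
```basic
' A sketch under the stated assumptions: open, store through a filter, close.
Sub convertFileSketch (a_fromFileUrl As String, a_toFileUrl As String, a_filterName As String)
	' Open the converted-from file without showing any window.
	Dim l_openingProperties (0) As New com.sun.star.beans.PropertyValue
	l_openingProperties (0).Name = "Hidden"
	l_openingProperties (0).Value = True
	Dim l_document As Object
	l_document = StarDesktop.loadComponentFromURL (a_fromFileUrl, "_blank", 0, l_openingProperties ())
	If IsNull (l_document) Then
		Exit Sub
	End If
	' The document-storing filter determines the target file format.
	Dim l_storingProperties (0) As New com.sun.star.beans.PropertyValue
	l_storingProperties (0).Name = "FilterName"
	l_storingProperties (0).Value = a_filterName
	l_document.storeToURL (a_toFileUrl, l_storingProperties ())
	' Close the document.
	l_document.close (False)
End Sub
```

It would be called like 'convertFileSketch ("file:///tmp/Sample.odt", "file:///tmp/Sample.pdf", "writer_pdf_Export")', where 'writer_pdf_Export' is the PDF filter for word processor documents.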

Objector 33B
"the specifics"?

Hypothesizer 7
Yes.

Objector 33B
. . . I'm asking what they are.

Hypothesizer 7
Specifics are . . . specifics, for example, for the CSV format, the characters encoding, the items delimiter, whether all the text items are quoted, whether the formulae are written or the values are written, . . .
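
For the CSV format, such specifics go into a single 'FilterOptions' string. The following is a minimal sketch, assuming the token order as I read it in the Office CSV filter documentation (items delimiter character code, text delimiter character code, characters encoding code, first line number, column formats, language ID, whether all the text items are quoted, whether the contents are stored as shown, whether the formulae are written); please check the order against your Office version. 'storeAsCsvSketch' is a hypothetical name.

@Office Basic Source Code
```basic
' A sketch: stores an already opened spread sheets document as a CSV file.
Sub storeAsCsvSketch (a_unoDocument As Object, a_csvFileUrl As String)
	Dim l_storingProperties (1) As New com.sun.star.beans.PropertyValue
	l_storingProperties (0).Name = "FilterName"
	l_storingProperties (0).Value = "Text - txt - csv (StarCalc)"
	l_storingProperties (1).Name = "FilterOptions"
	' 44 -> ',', 34 -> '"', 76 -> UTF-8, 1 -> from the 1st line, no column formats,
	' 0 -> the default language, true -> quote all the text items,
	' true -> store the contents as shown, false -> write the values, not the formulae
	' (the token meanings are my reading of the documentation, an assumption).
	l_storingProperties (1).Value = "44,34,76,1,,0,true,true,false"
	a_unoDocument.storeToURL (a_csvFileUrl, l_storingProperties ())
End Sub
```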

Objector 33B
I understand.

Hypothesizer 7
In the 'FilesConverterMacroProgram_main' subroutine in the 'programs' module, the 'l_documentStoringPropertyNamesArray' array variable and the 'l_documentStoringPropertyValuesArray' array variable contain the names and the values of the document-storing properties, respectively.

Some filters take a properties array as the 'FilterData' (the value of 'UnoDocumentStoringEnumerablePropertyNamesSet_c_filterData_any') property value; for them, the 'l_documentStoringFilterDataInPropertiesArrayPropertyNamesArray' array variable and the 'l_documentStoringFilterDataInPropertiesArrayPropertyValuesArray' array variable contain the names and the values of the properties in that properties array, respectively.

Some filters take a string as the 'FilterOptions' (the value of 'UnoDocumentStoringEnumerablePropertyNamesSet_c_filterData_string') property value.

Please note that the elements in any property values array have to be ordered corresponding to the order of the elements in the corresponding property names array, while the order of the elements in any property names array is arbitrary.
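
Why the orders have to correspond becomes obvious from how such a pair of arrays is typically turned into a properties array. The following is a minimal sketch, not the library's own code: 'buildPropertiesArraySketch' is a hypothetical name, and it assumes that both arrays are zero-based and of the same length.

@Office Basic Source Code
```basic
' A sketch: pairs the i-th name with the i-th value.
Function buildPropertiesArraySketch (a_names As Variant, a_values As Variant) As Variant
	Dim l_lastIndex As Integer
	l_lastIndex = UBound (a_names)
	Dim l_properties (l_lastIndex) As New com.sun.star.beans.PropertyValue
	Dim l_index As Integer
	For l_index = 0 To l_lastIndex
		' The same index is used for both the arrays, so a misordered values array
		' would silently attach a value to the wrong name.
		l_properties (l_index).Name = a_names (l_index)
		l_properties (l_index).Value = a_values (l_index)
	Next l_index
	buildPropertiesArraySketch = l_properties ()
End Function
```

For example, 'buildPropertiesArraySketch (Array ("FilterName", "Overwrite"), Array ("calc_pdf_Export", True))' would yield the same properties as 'buildPropertiesArraySketch (Array ("Overwrite", "FilterName"), Array (True, "calc_pdf_Export"))'.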

Objector 33B
Does the PDF filter take a 'FilterData' property value?

Hypothesizer 7
Yes.

Objector 33B
. . . You are supposed to tell me what properties it takes.

Hypothesizer 7
I cannot delve into it here (I will in a future article), but all the parameters you can set from the GUI 'File' -> 'Export as PDF...' can be set there.
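
Just to give the flavor, here is a minimal sketch of passing a 'FilterData' properties array to the PDF filter for a word processor document; 'storeAsPdfSketch' is a hypothetical name, and the 2 properties ('PageRange' and 'UseLosslessCompression') are only illustrative picks among the PDF export options.

@Office Basic Source Code
```basic
' A sketch: stores an already opened word processor document as a PDF file.
Sub storeAsPdfSketch (a_unoDocument As Object, a_pdfFileUrl As String)
	' The filter-specific properties array.
	Dim l_filterData (1) As New com.sun.star.beans.PropertyValue
	l_filterData (0).Name = "PageRange"
	l_filterData (0).Value = "1-2"
	l_filterData (1).Name = "UseLosslessCompression"
	l_filterData (1).Value = True
	' The document-storing properties array, which contains the above array as 'FilterData'.
	Dim l_storingProperties (1) As New com.sun.star.beans.PropertyValue
	l_storingProperties (0).Name = "FilterName"
	l_storingProperties (0).Value = "writer_pdf_Export"
	l_storingProperties (1).Name = "FilterData"
	l_storingProperties (1).Value = l_filterData ()
	a_unoDocument.storeToURL (a_pdfFileUrl, l_storingProperties ())
End Sub
```

For a spread sheets document, the filter name would be 'calc_pdf_Export' instead.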

Objector 33B
. . .


2: Executing the Sample Program


Hypothesizer 7
The entrance to the sample program is the 'FilesConverterMacroProgram_main' subroutine in the 'programs' module in the 'theBiasPlanet_filesConverter' library.

We leave 'FilesConverterMacroProgram_s_conversionOrdersFilePath' as it is and use the 'filesConverter/execution/FileConversionOrders.csv' file included in the ZIP file (at least the file URLs have to be adjusted).

Although the tab-separated CSV file already contains a line as a template, we have to edit the line because the URLs are incomplete, containing the placeholder "%the user name%".

Stage Direction
Hypothesizer 7 changes "%the user name%" to the real user name.

Objector 33B
. . . "tab-separated CSV" is an oxymoron . . .

Hypothesizer 7
Certainly. I have adopted it because it is easier to handle than a comma-separated CSV that would require the items quoted because they may contain some commas.
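
How easy it is can be seen in a minimal sketch like the following, which is not the sample's own reading code: 'readConversionOrdersSketch' is a hypothetical name, and it assumes that each line consists of the converted-from file URL, the converted-to file URL, and the filter name, separated by tabs.

@Office Basic Source Code
```basic
' A sketch: reads the orders file and returns a summary of the lines.
Function readConversionOrdersSketch (a_ordersFilePath As String) As String
	Dim l_summary As String
	l_summary = ""
	Dim l_fileNumber As Integer
	l_fileNumber = FreeFile
	Open a_ordersFilePath For Input As #l_fileNumber
	Dim l_line As String
	Dim l_items As Variant
	Dim l_lineIndex As Integer
	l_lineIndex = 0
	Do While Not EOF (l_fileNumber)
		Line Input #l_fileNumber, l_line
		' Splitting by the tab character requires no quotation handling.
		l_items = Split (l_line, Chr (9))
		If UBound (l_items) >= 2 Then
			l_summary = l_summary & l_lineIndex & ": " & l_items (0) & " -> " & l_items (1) & " via " & l_items (2) & Chr (10)
		Else
			l_summary = l_summary & l_lineIndex & ": malformed" & Chr (10)
		End If
		l_lineIndex = l_lineIndex + 1
	Loop
	Close #l_fileNumber
	readConversionOrdersSketch = l_summary
End Function
```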

Objector 33B
I have guessed so.

Hypothesizer 7
Then, we start up an Office instance with the 'filesConverter' directory as the current directory, because of the 'FilesConverterMacroProgram_s_conversionOrdersFilePath' setting; more specifically, we open a terminal, change the current directory to the 'filesConverter' directory, and execute 'soffice' on Linux or 'soffice.bin' on Windows, like this.

Stage Direction
Hypothesizer 7 opens a terminal, changes the current directory to the 'filesConverter' directory, and executes 'soffice&'.

Objector 33B
Why 'soffice.bin' for Windows?

Hypothesizer 7
Because 'soffice' in Windows forcefully changes the current directory to its own directory.

Objector 33B
Ah, anyway, I don't need to worry about such a thing if I set 'FilesConverterMacroProgram_s_conversionOrdersFilePath' at an absolute path, do I?

Hypothesizer 7
You do not, but still, note that the log file is written in the current directory (so, the Office instance has to have the file-writing permission there), unless you change the behavior.

Objector 33B
Ah.

Hypothesizer 7
Then, we can invoke the file conversion like this, for example.

@bash or cmd Source Code
/usr/lib/libreoffice/program/soffice "vnd.sun.star.script:theBiasPlanet_filesConverter.programs.FilesConverterMacroProgram_main?language=Basic&location=application"

Stage Direction
In the previously opened terminal, Hypothesizer 7 executes the command, which just ends immediately.

Objector 33B
. . . Was that a success?

Hypothesizer 7
You need to check the log file, which is 'Basic.log' in the current directory.

Stage Direction
In the previously opened terminal, Hypothesizer 7 shows the contents of the log file.

Objector 33B
"the indices of the failed conversion orders are ''"? Does that mean a success?

Hypothesizer 7
Yes. "''" means that no line has failed; otherwise, the indices of the failed lines will be shown there.

Objector 33B
Ah.

Hypothesizer 7
Note that as the 'soffice' command invokes the Basic subroutine asynchronously, you have to check the log file for the completion of the subroutine.


3: Understanding the Sample Program



3-1: The Libraries Structure


Hypothesizer 7
As you have noticed, there are three libraries: 'theBiasPlanet_coreUtilities', 'theBiasPlanet_unoUtilities', and 'theBiasPlanet_filesConverter'. 'theBiasPlanet_coreUtilities' contains utility code pieces that are supposed to be commonly used in all kinds of libraries; 'theBiasPlanet_unoUtilities' contains utility code pieces that are supposed to be commonly used in UNO program libraries; 'theBiasPlanet_filesConverter' is the files conversion sample library.

The files conversion functionality is contained in the 'filesConverting.FilesConverter' pseudo class in the 'theBiasPlanet_unoUtilities' library, and the 'theBiasPlanet_filesConverter' library just uses the pseudo class.

Objector 33B
"pseudo class"?

Hypothesizer 7
Yes.

Objector 33B
. . .

Hypothesizer 7
. . . I know you are demanding an explanation. Actually, it is explained in this article.

Objector 33B
. . . Are you telling me to read such a thing?

Hypothesizer 7
If you do not want to, please understand just that the 'FilesConverter_convertFile' function is the conversion function and the first argument takes a 'FilesConverter' instance that is returned by the 'FilesConverter_FilesConverter' function.

Objector 33B
If you say so . . .


3-2: Opening the Converted-from File


Hypothesizer 7
In order to open the converted-from file, first, we have to prepare a 'com.sun.star.util.URL' object.

That is done in the 'createUrlInURL (a_url As String)' method of the 'theBiasPlanet_unoUtilities.connectionsHandling.UnoObjectsContext' pseudo class.

Objector 33B
. . .

Hypothesizer 7
If you do not understand what I mean, please understand that I am talking about the 'UnoObjectsContext_createUrlInURL (a_this As UnoObjectsContext, a_url As String)' function.

Objector 33B
. . . OK.

Hypothesizer 7
In it, an instance of the 'com.sun.star.util.URLTransformer' global UNO service is created if it has not been created yet, and the 'parseStrict' method of the 'com.sun.star.util.XURLTransformer' UNO interface implemented by the instance is called with an instance of 'com.sun.star.util.URL' specified as the argument.

The reason why the 'createUrlInURL (a_url As String)' method is defined there is that only a single 'com.sun.star.util.URLTransformer' global UNO service instance is required per UNO objects context (it would be inefficient to get a new UNO service instance each time we have to prepare a URL object).
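
The idea can be sketched as follows; this is a condensed illustration, not the code in the library: 'createUrlSketch' and 's_urlTransformerSketch' are hypothetical names, and the caching is done in a module-level variable here instead of in a UNO objects context pseudo class instance.

@Office Basic Source Code
```basic
' The single URL transformer instance, created on the first use.
Private s_urlTransformerSketch As Object

' A sketch: builds a parsed 'com.sun.star.util.URL' from a URL string.
Function createUrlSketch (a_url As String) As com.sun.star.util.URL
	If IsNull (s_urlTransformerSketch) Then
		s_urlTransformerSketch = CreateUnoService ("com.sun.star.util.URLTransformer")
	End If
	Dim l_url As New com.sun.star.util.URL
	l_url.Complete = a_url
	' 'parseStrict' fills the other members of the structure from 'Complete'.
	s_urlTransformerSketch.parseStrict (l_url)
	createUrlSketch = l_url
End Function
```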

Now that we have prepared a 'com.sun.star.util.URL' object, the converted-from file is opened by a UNO dispatch command with the dispatch command URL set at the file URL and some file-opening properties specified.

How to call any UNO dispatch command is described in a previous article, but here, it is done slightly differently. While we have to get a dispatcher, it is done in the 'getFileOpeningUnoDispatcher ()' method of the 'theBiasPlanet_unoUtilities.connectionsHandling.UnoObjectsContext' pseudo class (because only a single such dispatcher is required per UNO objects context). The dispatcher can be obtained by calling the 'queryDispatch' method of the 'com.sun.star.frame.XDispatchProvider' UNO interface implemented by the UNO desktop UNO object with the URL set at 'file:///' and the special frame name set at '_blank' (the value of 'UnoSpecialFrameNamesConstantsGroup_c_new', meaning that a new frame will be created).

Then, we call the 'dispatchWithReturnValue' method of the 'com.sun.star.frame.XSynchronousDispatch' UNO interface implemented by the dispatcher with the URL set at the converted-from file URL and some file-opening properties specified.

Objector 33A
Hmm, "UnoPropertiesHandler_buildPropertiesArray". . . . That returns an array of 'PropertyValue's from the array of property names and the array of property values . . .

Hypothesizer 7
Yes. In short, we are opening the file in the read-only, hidden, opened-in-a-new-view, silent mode.

Objector 33A
Hmm . . .

Hypothesizer 7
Please note that being read-only does not mean that the opened document cannot be modified, but means that the document cannot be stored into the original file.

Objector 33A
I see.

Hypothesizer 7
Anyway, we get the document UNO object as the return of the UNO dispatch command execution.
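
The opening steps described so far can be condensed into a sketch like this; it is not the code in the library: 'openDocumentSketch' is a hypothetical name, the URL transformer and the dispatcher are created on each call instead of being cached per UNO objects context, and the 'StarDesktop' global of Office Basic is assumed as the UNO desktop UNO object.

@Office Basic Source Code
```basic
' A sketch: opens a file via the file-opening dispatcher and returns the document.
Function openDocumentSketch (a_fileUrl As String) As Object
	Dim l_urlTransformer As Object
	l_urlTransformer = CreateUnoService ("com.sun.star.util.URLTransformer")
	' Get the dispatcher with the URL set at 'file:///' and the frame name set at '_blank'.
	Dim l_baseUrl As New com.sun.star.util.URL
	l_baseUrl.Complete = "file:///"
	l_urlTransformer.parseStrict (l_baseUrl)
	Dim l_dispatcher As Object
	l_dispatcher = StarDesktop.queryDispatch (l_baseUrl, "_blank", 0)
	' Prepare the converted-from file URL.
	Dim l_fileUrl As New com.sun.star.util.URL
	l_fileUrl.Complete = a_fileUrl
	l_urlTransformer.parseStrict (l_fileUrl)
	' The read-only, hidden, opened-in-a-new-view, silent mode.
	Dim l_openingProperties (3) As New com.sun.star.beans.PropertyValue
	l_openingProperties (0).Name = "ReadOnly"
	l_openingProperties (0).Value = True
	l_openingProperties (1).Name = "Hidden"
	l_openingProperties (1).Value = True
	l_openingProperties (2).Name = "OpenNewView"
	l_openingProperties (2).Value = True
	l_openingProperties (3).Name = "Silent"
	l_openingProperties (3).Value = True
	' The return of the dispatch is the document UNO object.
	openDocumentSketch = l_dispatcher.dispatchWithReturnValue (l_fileUrl, l_openingProperties ())
End Function
```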

Objector 33A
Hmm.

Hypothesizer 7
In fact, the file does not have to be opened by way of that UNO dispatch command, but I have followed the way that 'soffice --convert-to' internally uses.


3-3: Tailoring the Opened Document, If You Will


Hypothesizer 7
If you want to tailor the opened document, you can do so, as you have the document UNO object.

For example, you can move a specific spread sheet to the first position in the spread sheets document, like this, where 'a_unoDocument' and 'a_tailoringArguments (0)' are the document UNO object and the index of the specific spread sheet, respectively.

@Office Basic Source Code
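		' The index of the spread sheet to be moved is passed as the first tailoring argument.
		Dim l_targetSpreadSheetIndex As Integer
		l_targetSpreadSheetIndex = a_tailoringArguments (0)
		' Only a spread sheets document has sheets to move.
		If Not HasUnoInterfaces (a_unoDocument, "com.sun.star.sheet.XSpreadsheetDocument") Then
			MsgBox ("The document is not any spread sheet.")
			Exit Function
		Else
			Dim l_spreadSheetsDocument As Object
			l_spreadSheetsDocument = a_unoDocument
			Dim l_spreadSheets As Object
			l_spreadSheets = l_spreadSheetsDocument.getSheets ()
			' Any error raised from here on (an out-of-range index, for example) jumps to 'Catch1'.
			On Error GoTo Catch1
				Dim l_spreadSheet As Object
				l_spreadSheet = l_spreadSheets.getByIndex (l_targetSpreadSheetIndex)
				' 'moveByName' takes the sheet name and the destination index.
				l_spreadSheets.moveByName (l_spreadSheet.getName (), GeneralConstantsConstantsGroup_c_iterationStartingNumber)
				Exit Function
			Catch1:
				' '&' concatenates as strings whatever the operand types are ('Erl' is a number).
				MsgBox ("Error: line number -> " & Erl & ", " & Error)
				Exit Function
		End If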

If you want to specify the sheet by the name, you will be able to easily infer how from the above code.
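
For the record, a minimal sketch of that by-name variation could look like this; 'moveSheetTo1stPositionByNameSketch' is a hypothetical name, and the literal 0 stands in for the first position.

@Office Basic Source Code
```basic
' A sketch: moves the sheet of the specified name to the first position.
' Returns True if the sheet has been moved.
Function moveSheetTo1stPositionByNameSketch (a_unoDocument As Object, a_sheetName As String) As Boolean
	moveSheetTo1stPositionByNameSketch = False
	If Not HasUnoInterfaces (a_unoDocument, "com.sun.star.sheet.XSpreadsheetDocument") Then
		Exit Function
	End If
	Dim l_spreadSheets As Object
	l_spreadSheets = a_unoDocument.getSheets ()
	' The sheets collection can be asked whether the name exists.
	If Not l_spreadSheets.hasByName (a_sheetName) Then
		Exit Function
	End If
	l_spreadSheets.moveByName (a_sheetName, 0)
	moveSheetTo1stPositionByNameSketch = True
End Function
```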


3-4: Storing the Opened Document in the Specified Format


Hypothesizer 7
The opened document can be stored in the specified format by storing the document through the corresponding filter.

The filter can be specified as a document-storing property.

The document-storing properties array is set in the variable, 'l_documentStoringPropertiesArray', in the 'main' method of the 'theBiasPlanet_filesConverter.programs.FilesConverterMacroProgram' pseudo class ('FilesConverterMacroProgram_main' subroutine).

As you can see, the filter name is set in the 'FilterName' (the value of 'UnoDocumentStoringEnumerablePropertyNamesSet_c_filterName_string') property.

Although there is the 'Password' (the value of 'UnoDocumentStoringEnumerablePropertyNamesSet_c_password_string') property, it is not valid for PDF files, Microsoft Word Open XML ('docx') files, Microsoft Excel Open XML ('xlsx') files, etc.; it is valid for OpenDocument Text ('odt') files, OpenDocument Spreadsheet ('ods') files, Microsoft Word '.doc' files, Microsoft Excel '.xls' files, etc.

The password for any PDF file, etc. can be set in a filter-specific property, although we will not delve into such filter-specific particulars here (please see them in some future articles (here for Office Open XML files)).

Anyway, the opened document can be stored by calling the 'storeToURL' method of the 'com.sun.star.frame.XStorable2' UNO interface implemented by the document UNO object, with the file URL and the properties array specified.
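
As a minimal sketch of such a call with the filter name and the 'Password' property for an OpenDocument Text file ('storeWithPasswordSketch' is a hypothetical name; 'writer8' is the OpenDocument Text filter name):

@Office Basic Source Code
```basic
' A sketch: stores an already opened word processor document as a password-protected 'odt' file.
Sub storeWithPasswordSketch (a_unoDocument As Object, a_odtFileUrl As String, a_password As String)
	Dim l_storingProperties (1) As New com.sun.star.beans.PropertyValue
	l_storingProperties (0).Name = "FilterName"
	l_storingProperties (0).Value = "writer8"
	' Valid for the OpenDocument formats, not for PDF or the Office Open XML formats.
	l_storingProperties (1).Name = "Password"
	l_storingProperties (1).Value = a_password
	a_unoDocument.storeToURL (a_odtFileUrl, l_storingProperties ())
End Sub
```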


3-5: Closing the Opened Document


Hypothesizer 7
The opened document can be closed by calling the 'close' method of the 'com.sun.star.util.XCloseable' UNO interface implemented by the document UNO object, with 'false' specified (there will be no necessity to explain the meaning of 'false' here).


3-6: How to Use the Files Converter Pseudo Class


Hypothesizer 7
Only a single instance of the files converter pseudo class ('theBiasPlanet_unoUtilities.filesConverting.FilesConverter') is meant to exist per UNO objects context, although creating an instance per file conversion would not be any problem, except efficiency-wise.

As the 'convertFile' method accepts a document tailor name and the arguments for the document tailor, the opened document can be tailored before it is stored. . . . In another programming language sample, the 'convertFile' method accepts the tailoring logic by accepting a document tailor subclass instance, but that mechanism does not work for this Basic sample because Basic does not support subclassing, or classes themselves in fact. . . . So, we have to directly modify the 'tailor (a_unoDocument As Object, a_tailorName As String, a_tailoringArguments () As Variant)' method of the 'UnoDocumentTailor' pseudo class to add the logic for each specific need; the logic to be executed is selected by the 'a_tailorName' argument.

In fact, the 'UnoSpreadSheetsDocumentMoveSpecifiedSheetTo1stPositionTailor' tailor name specified in the sample program corresponds to the logic that moves the specified spread sheet to the first position.
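
The selection by the tailor name can be imagined as a sketch like this; it is not the actual 'tailor' method: 'tailorSketch' is a hypothetical name, and only the branching structure is the point.

@Office Basic Source Code
```basic
' A sketch: executes the tailoring logic selected by the tailor name.
' Returns True if a logic has been executed.
Function tailorSketch (a_unoDocument As Object, a_tailorName As String, a_tailoringArguments As Variant) As Boolean
	tailorSketch = False
	Select Case a_tailorName
		Case "UnoSpreadSheetsDocumentMoveSpecifiedSheetTo1stPositionTailor"
			If HasUnoInterfaces (a_unoDocument, "com.sun.star.sheet.XSpreadsheetDocument") Then
				Dim l_spreadSheets As Object
				l_spreadSheets = a_unoDocument.getSheets ()
				Dim l_spreadSheet As Object
				' The first tailoring argument is the index of the sheet to be moved.
				l_spreadSheet = l_spreadSheets.getByIndex (a_tailoringArguments (0))
				l_spreadSheets.moveByName (l_spreadSheet.getName (), 0)
				tailorSketch = True
			End If
		Case Else
			' Any logic for another need would be added as another 'Case' above.
	End Select
End Function
```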


4: The Conclusion and Beyond


Hypothesizer 7
Now, we understand how to convert any valid file directly using the UNO API in Office Basic.

As we have gained access to the document UNO object, we can even tailor the document before we store the document in the desired format, if we know how to manipulate such documents (to set the size of any page of any word processor document, to set the size of any page of any spread sheets document, to set any editing password into an Office document file (for the Office Open XML formats), etc.).

If we want to tweak the file rather than convert it, we can just store the tailored document over the original file in the original file format (a file-opening property will have to be tweaked to be not read-only, of course).
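
A minimal sketch of such tweaking, assuming the stock 'StarDesktop.loadComponentFromURL' route and the 'store' method of the document UNO object ('tweakFileInPlaceSketch' is a hypothetical name, and renaming the first spread sheet is just a placeholder for whatever tailoring is wanted):

@Office Basic Source Code
```basic
' A sketch: opens a file writably, tailors the document, and stores it over the original file.
Sub tweakFileInPlaceSketch (a_fileUrl As String, a_newFirstSheetName As String)
	' 'ReadOnly' is not set, so the document can be stored into the original file.
	Dim l_openingProperties (0) As New com.sun.star.beans.PropertyValue
	l_openingProperties (0).Name = "Hidden"
	l_openingProperties (0).Value = True
	Dim l_document As Object
	l_document = StarDesktop.loadComponentFromURL (a_fileUrl, "_blank", 0, l_openingProperties ())
	If IsNull (l_document) Then
		Exit Sub
	End If
	' The placeholder tailoring: rename the first sheet if the document is a spread sheets document.
	If HasUnoInterfaces (l_document, "com.sun.star.sheet.XSpreadsheetDocument") Then
		l_document.getSheets ().getByIndex (0).setName (a_newFirstSheetName)
	End If
	' Store over the original file, and close the document.
	l_document.store ()
	l_document.close (False)
End Sub
```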

It is so easy that we do not need to use any third-party tool or endure the inconveniences of 'soffice --convert-to'. Directly using the UNO API is the most efficient, least restrictive way.

