2019-12-01

30: Dismiss 'soffice --convert-to' and Directly Use the UNO API

<The previous article in this series | The table of contents of this series | The next article in this series>

'soffice --convert-to' is good for only some limited cases; here is why, with some diagrams. Fortunately, we can easily use the UNO API directly instead.

Topics


About: UNO (Universal Network Objects)
About: LibreOffice
About: Apache OpenOffice
About: Java programming language
About: C++
About: Microsoft .NET Framework
About: Python programming language


Notation

  • 'Office' means LibreOffice or Apache OpenOffice.

Starting Context


  • The reader is a programmer (not necessarily a professional one).
  • The reader knows what 'soffice --convert-to' is (in fact, he or she does not need to know it at all, and can just go directly to the UNO API).

Target Context


  • The reader will know why 'soffice --convert-to' is good for only some limited cases and that he or she can easily use the UNO API directly instead.

Stage Direction

Here are Hypothesizer 7, Objector 30A, and Objector 30B in front of a computer.


Orientation


Hypothesizer 7
In this article, we will learn why 'soffice --convert-to' is good for only some limited cases and that we can easily use the UNO API directly instead.

Many people try to use 'soffice --convert-to' from their programs and report that it fails with some errors or does not do what they want (for example, it does not create a CSV file for each of the sheets); probably, they should not use 'soffice --convert-to' at all.

Objector 30A
What's wrong with 'soffice --convert-to'?! Or what's wrong with you?!

Hypothesizer 7
. . . Sir, in fact, I do not say that 'soffice --convert-to' is wrong per se, but it is for makeshift, not-so-particular, odd jobs executed on a terminal, not for being called from any serious program.

Objector 30A
You said "Dismiss 'soffice --convert-to'" in your title!

Hypothesizer 7
Ah, that is actually a regrettable 'figure of title': expressing things accurately is one of my prime directives, but any accurate title for this article would be inevitably lengthy and would be brutally cut by major search engines (the mutilated title would not fulfill its role). . . . What can I do instead? . . . More accurately, of course, you can legitimately use 'soffice --convert-to' in some limited cases.

Objector 30A
You are a complainer!

Hypothesizer 7
It does not matter who I am . . .

Objector 30B
You said "a programmer" in 'Starting Context', but of what programming language? It's not that any programming language will do, is it?

Hypothesizer 7
Madam, it is a righteous question. In fact, the UNO API can be directly used in Java, C++, any .NET Framework language (probably, C# and Visual Basic.NET), Python, Office Basic, BeanShell, and JavaScript, but . . .

Objector 30B
That doesn't include my programming language, PHP.

Hypothesizer 7
. . . but that does not mean that any other programming language cannot gain the benefits of the UNO API, because it may be able to call some modules written in one of the above programming languages. As for PHP, you should be able to create PHP extensions in C++.

Objector 30B
"you should be"? Oh, my! How am I, who have not written a single line of C++, supposed to be able to create a PHP extension in C++?

Hypothesizer 7
Well, if you cannot, I intend to create one, although I have not written a single line of PHP, actually.

Objector 30B
When?

Hypothesizer 7
When a significant number of people are convinced that they should not use 'soffice --convert-to', perhaps.

Objector 30B
. . .


Main Body


1: How 'soffice --convert-to' Works


Hypothesizer 7
This is a diagram that illustrates how 'soffice --convert-to' works if it is just called.


Objector 30A
What do you mean by "just called"?

Hypothesizer 7
It means that no Office full instance has been started beforehand.

Objector 30A
Of course. Why would I start it beforehand?

Hypothesizer 7
. . . Let us look at the diagram. An Office full instance is started up and shut down per your 'soffice --convert-to' call, as is shown by the wave arrows. In fact, those wave arrows represent very heavy actions.

Objector 30A
Do they?

Hypothesizer 7
Yes, they do. The Office instance has to become a full-fledged Office instance in order to be able to do the file conversion, and becoming the full-fledged Office instance takes time.

Objector 30A
I specify the '--headless' option.

Hypothesizer 7
Ah, many people explicitly add that option together with the '--convert-to' option, but actually, that is not meaningful at all, practically speaking, because the '--headless' option is automatically implicitly added when the '--convert-to' option is used.

Objector 30A
Is it?

Hypothesizer 7
Yes, it is. A full-fledged Office instance is required anyway, and starting it up and shutting it down is very heavy. Please look at the results of my tests for confirmation.
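
As a rough sketch of what "just called" looks like from a program's point of view, a Python caller typically spawns one 'soffice' process per file, as follows. The function names and the file names are only assumptions for illustration; '--convert-to' and '--outdir' are the actual options, and the time measurement is there only to make the per-call cost visible.

```python
import subprocess
import time


def build_convert_command(source_path, target_format, output_directory, soffice="soffice"):
    """Return the argument list for one 'soffice --convert-to' call."""
    # '--headless' is omitted on purpose: '--convert-to' implies it.
    return [soffice, "--convert-to", target_format, "--outdir", output_directory, source_path]


def convert_once(source_path, target_format, output_directory):
    """Run one conversion and return the elapsed seconds (needs an installed Office)."""
    command = build_convert_command(source_path, target_format, output_directory)
    started = time.perf_counter()
    # Every call spawns a new 'soffice' process; if no Office full instance is
    # already running, that process has to become one before it can convert.
    subprocess.run(command, check=True)
    return time.perf_counter() - started
```

Calling 'convert_once("report.odt", "pdf", "out")' in a loop over many files repeats the start-up and the shut-down for every file, which is the cost that the wave arrows stand for.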

Objector 30A
. . . What if it is not "just called"?

Hypothesizer 7
This is a diagram that illustrates how 'soffice --convert-to' works if an Office full instance has been started up beforehand.


Objector 30A
. . . "Office client"?

Hypothesizer 7
Yes. The 'soffice' call starts a slim client that connects to the started-up-beforehand Office full instance using a named pipe.

Objector 30A
And that is not any heavy action?

Hypothesizer 7
That is not so heavy; actually, it is far lighter than starting up and shutting down an Office full instance.

Objector 30A
However, that means that I have to create a daemon, doesn't it?

Hypothesizer 7
You do not necessarily have to create a daemon, but if you want to, it is quite easy to create a daemon or a Windows service. In fact, I have introduced an Office daemon and an Office Windows service.

Objector 30A
. . . So, I don't have to ditch 'soffice --convert-to', but can just start up an Office full instance beforehand, after all.

Hypothesizer 7
However, any 'soffice --convert-to' Office client can connect only to an Office full instance of the same operating system user on the same computer.

Objector 30A
So?

Hypothesizer 7
So, you would not be able to seclude the file conversions workload, the files, or the permissions into a file conversions server.

Objector 30A
"a file conversions server"? Do I need that?

Hypothesizer 7
That depends on your requirements of course, but without the file conversions server, the file conversions workload could slow down the whole of your application, for example.

Objector 30A
I can just transfer any 'soffice --convert-to' command to the file conversions server via SSH or something.

Hypothesizer 7
You are right; in fact, I have tested the performance of calling 'soffice --convert-to' via SSH.

Objector 30A
Hmm . . .

Hypothesizer 7
However, a more critical matter is that you have to endure in a confinement with 'soffice --convert-to'.

Objector 30A
What do you mean by "confinement"?

Hypothesizer 7
Let us look at this diagram.


Converting any file is really just opening the converted-from file, storing the document with some document-storing properties (including the filter name, which determines the converted-to format), and closing the document.

Objector 30A
Is it?

Hypothesizer 7
Yes, it is. But you cannot fully specify the document-storing properties through 'soffice --convert-to', because some of them are not simple data like strings or numbers, but more complex data like UNO objects and sequences (how can you specify a UNO object in a command line?).

Objector 30A
Don't ask me "how" . . .

Hypothesizer 7
Besides, you will have to tailor the document if you want to accomplish some things.

Objector 30A
"some things"?

Hypothesizer 7
For example, you may want to convert an arbitrary sheet (not the first sheet) or each of the sheets in a spreadsheet document to a CSV file.

Objector 30A
I do, actually.

Hypothesizer 7
Or you may want to change the size (for example to A4 landscape) of a page in a document.

Objector 30A
Can I do that?

Hypothesizer 7
Not with 'soffice --convert-to', but yes, easily, with the UNO API.
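
To give a feeling of how small such tailoring can be, here is a minimal Python sketch that sets every page style of an already-opened document to A4 landscape. It assumes that 'document' is a UNO proxy of a document that has a "PageStyles" style family; the function name and the constants are my own choices for illustration, and the sizes are in 1/100 mm.

```python
A4_LONG_SIDE = 29700   # in 1/100 mm
A4_SHORT_SIDE = 21000  # in 1/100 mm


def set_pages_to_a4_landscape(document):
    """Set every page style of the document to A4 landscape; return how many were changed."""
    page_styles = document.getStyleFamilies().getByName("PageStyles")
    style_names = page_styles.getElementNames()
    for style_name in style_names:
        page_style = page_styles.getByName(style_name)
        # 'IsLandscape', 'Width', and 'Height' are page style properties.
        page_style.setPropertyValue("IsLandscape", True)
        page_style.setPropertyValue("Width", A4_LONG_SIDE)
        page_style.setPropertyValue("Height", A4_SHORT_SIDE)
    return len(style_names)
```

The function touches all the page styles because the name of the default page style differs among document types and Office versions; a real program may prefer to change only the page styles that are actually used.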

Objector 30A
Why not with 'soffice --convert-to'?

Hypothesizer 7
Please look at the above diagram; the 'soffice --convert-to' function is a fixed function; there is no room to put such tailoring procedures in it.

Objector 30A
. . .


2: How File Conversion with Direct Usage of the UNO API Works


Hypothesizer 7
This is a diagram that illustrates how file conversion with direct usage of the UNO API works.


As you can see, there is no starting up and shutting down of any process or making any connection, per file conversion.

Objector 30A
So, you claim that directly using the UNO API is faster, huh?

Hypothesizer 7
It is claimed by me and has been demonstrated by my tests.

Objector 30A
And you claim that I don't have to "endure in a confinement", huh?

Hypothesizer 7
Yes, I do. Let us look at this diagram.


As it is your program, you can freely insert the "tailor the document" part and specify all the possible document-storing properties.
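
The flow in the diagram can be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: 'desktop' is a UNO proxy of the Office desktop that the program has already obtained, the function and parameter names ('convert', 'tailor', 'new_property') are hypothetical, and "Hidden" and "FilterName" are just two of the possible properties. It is not the code of any sample of this site.

```python
def make_property(name, value):
    """Build a com.sun.star.beans.PropertyValue (needs the 'uno' module shipped with Office)."""
    import uno  # noqa: F401  (importing it enables the 'com.sun.star' imports)
    from com.sun.star.beans import PropertyValue

    property_value = PropertyValue()
    property_value.Name = name
    property_value.Value = value
    return property_value


def convert(desktop, source_url, target_url, filter_name, tailor=None, new_property=make_property):
    """Open source_url, optionally tailor the document, store it to target_url, and close it."""
    load_properties = (new_property("Hidden", True),)
    document = desktop.loadComponentFromURL(source_url, "_blank", 0, load_properties)
    if document is None:
        raise RuntimeError("The file could not be opened: " + source_url)
    try:
        if tailor is not None:
            tailor(document)  # the part that 'soffice --convert-to' has no room for
        # Any further document-storing properties would be added to this tuple.
        store_properties = (new_property("FilterName", filter_name),)
        document.storeToURL(target_url, store_properties)
    finally:
        document.close(True)
```

For example, 'convert(desktop, "file:///tmp/a.odt", "file:///tmp/a.pdf", "writer_pdf_Export")' would store a PDF file; passing a function as 'tailor' lets the program change the document between the opening and the storing.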


3: How Difficult Is Directly Using the UNO API?


Objector 30A
I understand that directly using the UNO API is the most efficient and the least restrictive way, but you know, I don't want to go to it because it is difficult.

Hypothesizer 7
In fact, it is not difficult at all, although I can guess why such a myth prevails.

Objector 30A
Then, guess!

Hypothesizer 7
To state succinctly, the official documentation is sloppy.

Objector 30A
. . . So, what should I do?

Hypothesizer 7
At least, let us understand the very basics of UNO by looking at this diagram.


Objector 30A
. . . "UNO proxy"? "UNO interface"?

Hypothesizer 7
First, let us understand that the converted-from file is not loaded into your program, but into the Office full instance.

Objector 30A
Um? Hmm, so, my program just asks the Office full instance to load the file.

Hypothesizer 7
So, the document is not handled in your program, but in the Office full instance.

Objector 30A
So, my program just asks the Office full instance to handle the document.

Hypothesizer 7
Your program asks the Office full instance to handle the document via its UNO proxies. This is a tedious (although not difficult at all) aspect of UNO: any UNO proxy corresponds to a UNO interface, not the whole UNO object.

Objector 30A
Ah--ha.

Hypothesizer 7
For example, your program first gets an AAA UNO proxy to the document UNO object and handles (asks the Office full instance to handle, to be exact, but let me be less exact hereafter) the document UNO object in some ways, but if your program wants to handle the document UNO object in a way that is covered by the CCC UNO interface, your program will have to get a CCC UNO proxy to the document UNO object.

Objector 30A
It is tedious, certainly.

Hypothesizer 7
It is straightforward, but I cannot deny that it is tedious.

A tedious point is that each of some typical UNO objects implements many UNO interfaces (some tens).

Objector 30A
"some tens". . . . I expected only several.

Hypothesizer 7
However, a more tedious point is that finding out what UNO interfaces a UNO object implements is tedious.

Objector 30A
Huh?

Hypothesizer 7
When you get an EEE UNO proxy as a return of a method, you know only that it is a UNO proxy of whatever UNO object implements the EEE UNO interface, generally speaking.

Objector 30A
Then, how can I know that the UNO object implements also the DDD UNO interface?

Hypothesizer 7
Actually, there is not any 100% sure way of knowing that, but in most cases, we can guess that based on the UNO API reference or know that programmatically.
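
As a minimal sketch of the "programmatically" part in Python, the UNO object can be asked what UNO interfaces it reports, provided that it implements 'com.sun.star.lang.XTypeProvider' (which is an assumption about the particular UNO object, not a guarantee). The function names are hypothetical. Note also that the Python bridge lets a single proxy answer for all the interfaces of the UNO object, so the per-interface proxies discussed here are felt mostly in languages like Java and C++.

```python
def list_interface_names(uno_object):
    """Return the sorted names of the UNO types that the object reports about itself."""
    # getTypes() belongs to com.sun.star.lang.XTypeProvider; a UNO object that
    # does not implement that interface cannot be inspected this way.
    return sorted(uno_type.typeName for uno_type in uno_object.getTypes())


def implements(uno_object, interface_name):
    """Tell whether the object reports the named UNO interface."""
    return interface_name in list_interface_names(uno_object)
```

For example, 'implements(document, "com.sun.star.frame.XStorable")' tells whether the document can be asked to store itself; the answer is only as reliable as what the UNO object reports.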

Objector 30A
. . . Isn't there a reference that explains which UNO object implements what UNO interfaces?

Hypothesizer 7
There is no comprehensive reference on UNO components (please do not be scared away by the fact that I have suddenly used the term 'UNO component'; I am using the terms distinctly), while the UNO API reference is on UNO interfaces.

Objector 30A
. . .

Hypothesizer 7
Anyway, as some workable samples are introduced in this site, you do not have to write your UNO API programs from scratch; if you want to save effort, you can just set your document-storing properties and document tailor based on a sample.

Objector 30A
. . .


4: How About Third-Party Tools?


Objector 30A
I can use a third-party tool, instead, can't I?

Hypothesizer 7
As I understand, any third-party tool is a wrapper of 'soffice --convert-to' or a wrapper of the UNO API.

I dismiss the former because it inevitably has the problems cited in the first section.

As for the latter, if the tool is about wrapping the core logic of file conversion, I do not see any necessity for such a wrapper.

Objector 30A
What do you mean by "the core logic"?

Hypothesizer 7
The core logic is connecting to the Office instance, opening the file, storing the document, closing the document, and disconnecting from the Office instance. It is simple enough, has been already provided in any decent working sample, and does not need to be changed at all; you can just copy the logic from the decent working sample into your code base.
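
Just to show how small the connecting part can be, here is a minimal Python sketch that assumes that the Office instance has been started with a socket acceptor (for example, with '--accept="socket,host=localhost,port=2002;urp;"'). The host, the port, and the function names are assumptions for illustration; the samples of this site may connect differently, and this sketch leaves the disconnecting to the end of the process.

```python
def build_uno_url(host="localhost", port=2002):
    """Return the UNO URL of an Office instance that accepts socket connections."""
    return "uno:socket,host={},port={};urp;StarOffice.ComponentContext".format(host, port)


def connect_to_desktop(host="localhost", port=2002):
    """Return a UNO proxy of the desktop of the running Office instance."""
    import uno  # shipped with Office; imported here so that the module loads without it

    local_context = uno.getComponentContext()
    resolver = local_context.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_context
    )
    # The resolver returns the component context of the remote Office instance.
    remote_context = resolver.resolve(build_uno_url(host, port))
    return remote_context.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", remote_context
    )
```

As the connection is a socket connection, the host does not have to be the local computer, which is what makes a separate file conversions server feasible.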

Objector 30A
I don't think copying code is a good practice.

Hypothesizer 7
That is about copying code around inside your code base, in my opinion. Why does the fact that someone somewhere in the world has created a library mean that all the others in the world are obliged to blindly use the library as a black box?

Objector 30A
. . . Copying code is a bad practice!

Hypothesizer 7
In fact, it does not matter whether you use such a library or not: as the core logic is just a fixed routine, you can just promptly incorporate it in your program once and for all somehow, and move on to your specific concern.

Objector 30A
"specific concern"? What is my "specific concern"?

Hypothesizer 7
Your specific concern should be specifying the right document-storing properties and the right document tailor for your requirements.


5: The Conclusion and Beyond


Hypothesizer 7
Plainly speaking, 'soffice --convert-to' is not for being called from any serious program, but for makeshift, not-so-particular, odd jobs executed on a terminal. That is because 'soffice --convert-to' is inefficient and restrictive.

Any third-party tool that wraps 'soffice --convert-to' should have inherited the same traits.

If you have to do what 'soffice --convert-to' cannot do, it is probable that you can do so by directly using the UNO API.

Using the UNO API is not difficult at all, but there is an obstacle that there is no comprehensive reference on UNO components.

In fact, the core logic of file conversion is simple enough to require no third-party wrapper, and our current concern is what document-storing properties and document tailor we should specify for each of our specific needs (for example, setting a file-opening password on any docx, xlsx, etc. file, setting any CSV format, setting any set of detailed PDF parameters, writing each sheet in a spreadsheet document to a CSV file, setting the page sizes of any word processor document, setting the page sizes of any spreadsheet document, etc.), which we will look into in some other articles.

