2020-01-19

34: Write All the Spreadsheets to CSV Files with LibreOffice UNO

<The previous article in this series | The table of contents of this series | The next article in this series>

All the sheets in a spread sheets document, not just the 1st sheet. Applicable also to Apache OpenOffice.

Topics


About: UNO (Universal Network Objects)
About: LibreOffice
About: Apache OpenOffice
About: Java programming language
About: C++
About: Microsoft .NET Framework
About: Python programming language
About: LibreOffice Basic
About: Apache OpenOffice Basic
About: BeanShell
About: JavaScript

The table of contents of this article


Starting Context



Target Context


  • The reader will know how to write all the sheets in any spread sheets document (that can be opened by LibreOffice or Apache OpenOffice) to CSV files from his or her program.
Stage Direction
Here are Hypothesizer 7, Objector 34A, and Objector 34B in front of a computer.


Orientation


Hypothesizer 7
In this article, we will know how to write all the sheets in any spread sheets document (that can be opened by LibreOffice or Apache OpenOffice) to CSV files from our program.

In fact, the knowledge is almost self-evident from a previous article on file conversion (the concept, a Java implementation, a C++ implementation, a C# implementation, a Python implementation, and a LibreOffice or Apache OpenOffice Basic implementation) and a previous article on specifying the CSV format, but I have decided to go an extra yard.

Objector 34A
Honestly, I don't want any annoying explanation; I want a function that I can just call.

Hypothesizer 7
Sir, I am not any believer of such one-fits-all, black-box, just-blindly-use-this functions; as one may have some special needs, it is best that he or she creates his or her own functions, in my opinion. So, my intention is to offer a material that he or she can freely process, and my explanations are for facilitating such free processing.

Objector 34A
Stop your tedious lecture, and tell me whether you have a function!

Hypothesizer 7
Well, I have a function, actually, although I cannot declare that it will certainly fit your needs.

Objector 34A
. . . How special can my needs be so that your function does not fit my needs?

Hypothesizer 7
For example, your needs may require a different CSV format for each sheet.

Objector 34A
Ah, they don't, actually.

Hypothesizer 7
Your needs may require a different CSV file directory for each sheet.

Objector 34A
They don't.

Hypothesizer 7
Your needs may require a CSV file naming rule that my function does not support.

Objector 34A
What rule does your function support?

Hypothesizer 7
Actually, it supports 2 rules: 1) the specified base name + '_' + the sheet index + '.' + the specified extension and 2) the specified base name + '_' + the sheet name + '.' + the specified extension.

Objector 34A
For example?

Hypothesizer 7
When the specified base name and the specified extension are 'Abc' and 'csv', respectively, 'Abc_0.csv', 'Abc_1.csv', . . . and 'Abc_Sheet1.csv', 'Abc_Sheet2.csv', . . ., respectively, where 'Sheet1' and 'Sheet2' are sheet names.

Objector 34A
Do the indexs have to begin at '0'?

Hypothesizer 7
Unfortunately, they do for my function.

Objector 34A
You can modify your function so that I can specify the beginning index number, can't you?

Hypothesizer 7
Of course, I could, but such requests can be quite many: someone may want the indexs in 2 increments; someone may want them in 3 digits; someone may want them in hexadecimal; someone may want them before or even inside the specified base name; someone may want '-' instead of '_', . . .

Objector 34A
Actually, I want to dispose the '_' part.

Hypothesizer 7
. . . so, I think that you should create your own function according to your needs, than expect a ready-made function to fit your needs.

Objector 34A
It doesn't matter what someone may want; you can just implement what I want!

Stage Direction
Hypothesizer 7 sighs deeply and shakes his head.

Objector 34B
Your CSV file naming rules seem fine for me. So, can I just call the function without reading your annoying explanation?

Hypothesizer 7
. . . Madam, at least, you will need to build a development environment and do some preparations, and there are some explanations on them (for building a Linux environment, for building a Windows environment, for getting a UNO objects context to a LibreOffice or Apache OpenOffice instance for any external console Java, C++, C#, or Python program, for making any LibreOffice or Apache OpenOffice instance a UNO server, for creating an Office daemon, for creating an Office Windows service, etc.).

Objector 34B
. . . Why don't you just provide an executable, which can be called as an operating system command?

Hypothesizer 7
In fact, the source files of such an executable (in fact, there are a Java version, a C++ version, a C# version, a Python version, and a LibreOffice Basic version) are included in my offer, although you will have to build the executable (except the Python version and the LibreOffice Basic version). However, anyway, an environment and some preparations are required.

Objector 34B
. . . It seems that any BeanShell or JavaScript version will not be introduced while you falsely cite them as a part of your topics.

Hypothesizer 7
Although any BeanShell or JavaScript version will not be introduced here, it can be implemented by you based on this article, which is why BeanShell and JavaScript are not out of topics of this article.

Objector 34B
. . .


Main Body


1: Getting and Building a Files Converter Sample Programs Project


Hypothesizer 7
Here is a ZIP file that contains a files converter sample programs (a Java version, a C++ version, a C# version, and a Python version are included) project.

In order to build only some of the versions, please configure a Gradle script properly according to a previous article, or delete unnecessary source directories.

In fact, the project is the project that was introduced (but has been modified since) in the previous article on files conversion (a Java implementation, a C++ implementation, a C# implementation, and a Python implementation).

How to build your development environment is explained in a previous article (here for Linux and here for Windows); how to build any sample project of this series is described in a previous article.

In fact, there is also a LibreOffice or Apache OpenOffice Basic version here.

The ZIP file contains a UNO extension, which can be registered into LibreOffice or Apache OpenOffice according to the instruction described in a previous article.

Stage Direction
Hypothesizer 7 downloads the 2 ZIP files, expands them, opens a terminal, builds the sample project, and registers the UNO extension.


2: Executing the Sample Programs


Hypothesizer 7
Before we execute the sample programs, we have to have started a LibreOffice or Apache OpenOffice instance so that it accepts connections from clients, as we can know how in a previous article.

Stage Direction
Hypothesizer 7 starts a LibreOffice instance with the port number 2002.

Hypothesizer 7
We prepare a conversion orders CSV file (tab-separated and without-text-item-quotation), each of whose lines contains these items for converting a spread sheets document to CSV files.

The converted file URL
The converted-to file URL
The document-storing filter name: 'Text - txt - csv (StarCalc)' for storing to CSV files
The converted file opening password (optional)
The document tailor name (optional)
All the sheets are written: 'true' -> all the sheets are written, 'false' -> only the 1st sheet is written (optional with the default, 'false')
The CSV files naming rule for when all the sheets are written; '0' -> the sheet indexes are used, '1' -> the sheet names are used (optional with the default, '0')
The items separator character code (optional with the default, '44', meaning the comma character)
The text item quotation character code (optional with the default, '34', meaning the double quotation character)
The encoding code (optional with the default, '76', meaning UTF-8)
All the text items are quoted: 'true' -> all the text items are quoted, 'false' -> only the has-to-be-quoted (because includes the items-separator code) text items are quoted (optional with the default, 'false')
The items are written as shown: 'true' -> the items are written as shown, 'false' -> the items are written in certain fixed formats (optional with the default, 'false')
The formulae themselves are written: 'true' -> the formulae themselves are written, 'false' -> the values of the formulae are written (optional with the default, 'false')
The hidden sheets are written: 'true' -> the hidden sheets are written, 'false' -> the hidden sheets are not written (optional with the default, 'false')

Objector 34B
"optional" means . . .

Hypothesizer 7
It means that the item itself and all the succeeding items can be omitted or the item can be empty-valued.

Objector 34B
I cannot omit only an item in the middle, of course?

Hypothesizer 7
You cannot, of course, because of the nature of CSV file: if you omit an item, you will have to omit also all the succeeding items, although you can make an item empty-valued and have some succeeding items.

Objector 34B
Ah, of course.

Hypothesizer 7
In fact, a sample conversion orders CSV file exists as the 'filesConverter/execution/ConversionOrders.csv' file in each of the ZIP files.

How to execute each version is described in the previous article (a Java implementation, a C++ implementation, a C# implementation, a Python implementation, and a LibreOffice or Apache OpenOffice Basic implementation).

Stage Direction
Hypothesizer 7 executes each version in the terminal with the current directory positioned at the sample project directory.

Objector 34B
You know, invoking Gradle incurs significant overheads; how can I invoke the program without via Gradle?

Hypothesizer 7
The Gradle invocation should have displayed the commands it has executed; please look at the last command, which is the one you should use (just, please be aware that some adjustments have to be done to the displayed one because some double quotation marks are not displayed there).

Objector 34B
. . . Well, it is setting some environment variables before it executes the program.

Hypothesizer 7
Yes, that is the point.


3: Understanding the Sample Program



3-1: The Projects Structure or the Basic Libraries Structure


Hypothesizer 7
As for the projects structure or the Basic libraries structure, please refer to the descriptions of the previous article (a Java implementation, a C++ implementation, a C# implementation, a Python implementation, or a LibreOffice or Apache OpenOffice Basic implementation).


3-2: The Convert-Each-of-All-the-Sheets-to-a-CSV-File Method


Hypothesizer 7
The 'theBiasPlanet.unoUtilities.filesConverting.FilesConverter.convertSpreadSheetsDocumentFileToCsvFiles' (or its equivalent depending on the programming language) method is the concerned convert-each-of-all-the-sheets-to-a-CSV-file function.

The method takes these arguments: the converted file URL, the converted file password, the converted to file URL base (a value like 'file:///home/${HOME}/myData/development/filesConverter/execution/FilesConverterTestConvertedTo.csv' results in converted-to file URLs like 'file:///home/${HOME}/myData/development/filesConverter/execution/FilesConverterTestConvertedTo_0.csv'), the document storing properties, the document tailor, the converted-to files naming rule code, whether the hidden sheets are written.

I assume that the method is clear enough, but I will make some annotations.

The sheets are accessed through in the "for" loop, and if you want to skip some sheets based on some criteria, you can just skip calling 'storeToURL' on the corresponding turns.

The CSV file URL for each turn is set into the 'l_augmentedTargetFileUrl' variable, and if you want your own file naming rule, you can just tweak the logic to set the variable.

Although I used a document tailor in order to write a specific sheet to a CSV file in a previous article (a Java implementation, etc.), that document tailor does not have to be used for this method for that purpose (obviously); the document tailor for this method is meant to be for other purposes.


3-3: The Create-Storing-to-CSV-File-Properties Method


Hypothesizer 7
The method, 'theBiasPlanet.unoUtilities.filesConverting.FilesConverter.createCsvFileStoringPropertiesArray' (or its equivalent depending on the programming language), is a static method that creates an array or list of storing-to-CSV-file properties, from the specified CSV format attributes (the items separator character, etc.).


4: The Conclusion and Beyond


Hypothesizer 7
Now, we know how to write all the sheets in any spread sheets document (that can be opened by LibreOffice or Apache OpenOffice) to CSV files from our program.

If you want to skip some sheets or adopt an original CSV file naming rule, you can do so by modifying the introduced function.

If you want to tweak a spread sheets document before being written to CSV files, you can do so by specifying a document tailor.

It is because we are directly using the UNO API in converting files that we can freely do such things.


References


<The previous article in this series | The table of contents of this series | The next article in this series>