Writing in a 'tab-separated value', 'fixed-length record', 'text-value-quoted', and/or so on format, using LibreOffice or Apache OpenOffice and UNO.
Topics
About: UNO (Universal Network Objects)
About: LibreOffice
About: Apache OpenOffice
About: the Java programming language
About: C++
About: Microsoft .NET Framework
About: the Python programming language
About: LibreOffice Basic
About: Apache OpenOffice Basic
About: BeanShell
About: JavaScript
The table of contents of this article
- Starting Context
- Target Context
- Orientation
- Main Body
- 1: A Note
- 2: The Filter Name for Storing Any Spread Sheets Document to a CSV File
- 3: The File-Storing Property to Be Set for Specifying Any CSV File Format
- 4: The Conclusion and Beyond
Starting Context
- The reader has basic knowledge on one of the UNO-supported programming languages (Java, C++, the Microsoft .NET Framework programming languages (C# and Visual Basic.NET, in most cases), Python, LibreOffice or Apache OpenOffice Basic, BeanShell, and JavaScript).
- The reader has knowledge on what UNO is and how it is related to LibreOffice or Apache OpenOffice.
- The reader has knowledge on basic elements of UNO and the terminology used for them in this series.
- The reader has knowledge on how to get a UNO objects context to any LibreOffice or Apache OpenOffice instance (there are some articles on how to do it for any external console Java program, C++ program, and C# program; an article on how to do it for any external console Python program will come later; there is an article on how to do it for any external GUI Java program; some articles on how to do it for any external GUI C++ program, C# program, and Python program will come later).
- The reader has knowledge on how to access the intended document and how to store the document into a file, using UNO (the article on how to use LibreOffice or Apache OpenOffice as a files converter (the concept, a Java implementation, a C++ implementation, a C# implementation, a Python implementation, a LibreOffce Basic implementation will come later) contains a major part of the information).
Target Context
- The reader will know how to write any spread sheet (that is of any spread sheets document that can be opened by LibreOffice or Apache OpenOffice) to a CSV file in any format (that is allowed by LibreOffice or Apache OpenOffice) from his or her program.
Here are Hypothesizer 7, Objector 27A, and Objector 27B in front of a computer.
Orientation
Hypothesizer 7
In this article, we will know how to write any spread sheet to a CSV file in any format from our program.
Note that "any spread sheet" here means 'any spread sheet that is of any spread sheets document that can be opened by LibreOffice or Apache OpenOffice'. . . . Although LibreOffice or Apache OpenOffice can cope with most of the typical spread sheets document formats, there may be some formats that are not applicable to this article.
Also note that "any format" here means 'any format that is allowed by LibreOffice or Apache OpenOffice'. . . . Although LibreOffice or Apache OpenOffice allows it in 'the-items-are-separated-by-any-arbitrary-character (usual delimiters are 'comma' and 'tab')' or 'the-data-lengths-are-fixed-per-column', 'all-the-text-items-are-quoted-by-any-arbitrary-character (usual quotation marks are 'double quotation' and 'single quotation')' or 'only-the-indispensable-quotations-are-added (ditto as for the quotation mark)', 'the-formula-values-are-written' or 'the-formulas-themselves-are-written', 'the-data-are-written-as-shown' or 'the-data-are-written-as-stored', and 'any arbitrary characters-encoding', if you have a novelty format in your mind, that format may be not applicable to this article.
Objector 27A
What do you mean by "the-data-are-written-as-shown" and "the-data-are-written-as-stored"?
Hypothesizer 7
Sir, for example, a cell stores a decimal fraction, '1.234', in itself and is shown as '1.23'.
Objector 27A
Do you mean that the cell format is set so?
Hypothesizer 7
Yes. Then, they are about whether '1.23' is written or '1.234' is.
Objector 27A
So, tell me what parameters I should specify to 'soffice --convert-to' in order to write a spread sheet to a CSV file in the 'the-items-are-separated-by-tab', 'all-the-text-items-are-quoted-by-double-quotation', 'the-formulas-themselves-are-written', 'the-data-are-written-as-stored', 'UTF8' format.
Hypothesizer 7
Well, I cannot do so . . .
Objector 27A
Huh? Are you making fun of me?
Hypothesizer 7
No, sir, I would never . . .
Objector 27A
You said "we will know" and now say "I cannot", which is making fun of me!
Hypothesizer 7
. . . Please calm down, sir. I did not say that we would use 'soffice --convert-to'.
Objector 27A
Huh? Of course, we will.
Hypothesizer 7
Actually, we will not.
Objector 27A
When we convert files, we use 'soffice --convert-to', OK?
Hypothesizer 7
Ah, obviously, "we" by you is different from "we" by me. . . . Certainly, "we" by you can use 'soffice --convert-to' at your own discretion.
Objector 27A
How?
Stage Direction
Hypothesizer 7 sighs.
Hypothesizer 7
I do not know, and am not even interested, honestly. . . . You know it, do you not?
Objector 27A
. . . Why do you think I would know?
Hypothesizer 7
You are promoting 'soffice --convert-to' because you know that it can do what you want, are you not?
Objector 27A
. . . I'm not particularly promoting it, but it is being promoted to me.
Hypothesizer 7
"is being promoted"? . . . I see. You are right. I should humbly leave the task of answering your question to whoever is promoting it.
Objector 27A
. . .
Objector 27B
I want to write each of all the spread sheets in a spread sheets document to a CSV file in a specific format from my program.
Hypothesizer 7
Madam, it can be easily done.
Objector 27B
Really? What parameters should I pass to 'soffice --convert-to'?
Stage Direction
Hypothesizer 7's shoulders droop.
Hypothesizer 7
I do not know, madam.
Objector 27B
You said "it can be easily done"!
Hypothesizer 7
Yes, I said so, and it is true.
Objector 27B
So, what are the parameters to 'soffice --convert-to'?
Hypothesizer 7
. . . If you do not need to use 'soffice --convert-to', there is an easy solution (an implementation for Java, C++, C#, Python, or LibreOffice Basic; OK, they are not exactly writing all the spread sheets (P.S. here is an article for writing all the spread sheets), but an arbitrary spread sheet, but the solution will be clear), but if you are under a moral, religious, or whatever obligation to use 'soffice --convert-to', I should humbly leave the task of answering your question to whoever is promoting 'soffice --convert-to'.
Objector 27B
No kidding! Everybody uses 'soffice --convert-to'!
Hypothesizer 7
. . .
Main Body
1: A Note
Hypothesizer 7
In this article, I introduce only the document-storing properties to be set. I mean, I do not delve into how to store any document to any file using the document-storing properties. The previous articles on converting document files (the concept, a Java implementation, a C++ implementation, a C# implementation, a Python implementation, a LibreOffice Basic implementation) contain the necessary information.
2: The Filter Name for Storing Any Spread Sheets Document to a CSV File
Hypothesizer 7
The filter name for storing any spread sheets document to a CSV file is 'Text - txt - csv (StarCalc)'.
You should set the filter name to the 'FilterName' property.
3: The File-Storing Property to Be Set for Specifying Any CSV File Format
Hypothesizer 7
The document-storing property to be set for specifying any CSV file format is 'FilterOptions', which takes a string value.
Objector 27A
"string value"?
Hypothesizer 7
For example, '44,34,76,1,,0,true,true,true,false,false'.
Objector 27A
Huh? What, the hell, does that mean?
Hypothesizer 7
The format is '%the items delimiter character code%,%the text items quotation mark character code%,%the encoding code%,1,,0,%whether all the text items are quoted%,false,%whether the data are written as shown%,%whether the formula themselves are written%,false'.
Objector 27A
. . . What are those constants, "1", "", "0", "false", "false"?
Hypothesizer 7
They seem to be dummies, as far as I know. I mean, changing any of the "false"s to 'true' does not seem to bring any effect, as far as I notice, for example.
Objector 27A
So, what does "the encoding code" mean?
Hypothesizer 7
That means the code that identifies the encoding to be used, for example, '76' for UTF-8, '65535' for UCS-2, '65534' for UCS-4, '11' for US-ASCII, etc.
Objector 27A
Is UCS-2 realy UCS-2, not UTF-16?
Hypothesizer 7
Actually, I do not know.
Objector 27A
If the encoding is UCS-2 (if it really is UCS-2) what value should I specify as the items delimiter character code for an arbitrary character?
Hypothesizer 7
You can just specify the UCS-2 (if it really is UCS-2) code: for example, '12354' for 'U+3042'.
Objector 27A
Ah.
Objector 27B
Can't I specify a format to a column data, for example, specifying to thousands-separate each number datum?
Hypothesizer 7
You can set the format to the concerned cells, and then store the spread sheet to a CSV file with the 'the-data-are-written-as-shown' switch on.
Objector 27B
I don't want to modify the spread sheet itself!
Hypothesizer 7
It is only temporarily; you do not need to store the modifications to the original file.
Objector 27B
. . . How can I set the format from my program?
Hypothesizer 7
Although I have not yet written any article specifically on the theme, this 4 parts article should be helpful to figure that out ('l_integerExpressionFormatString' can be like '#,##0').
Objector 27B
. . . How can I exclude the header line from the CSV file?
Hypothesizer 7
By removing the header row from the spread sheet, temporarily; how to do so is described in an article.
Objector 27B
. . .
Hypothesizer 7
A major merit of directly using UNO, not 'soffice --convert-to' or any third-party tool, is that you can freely tweak the document as you like.
Objector 27B
. . . I assume that the column lengths in any 'the-data-lengths-are-fixed-per-column' format are determined automatically, but how can I know the lengths? I have to know the lengths in order to read the CSV file.
Hypothesizer 7
Ah, . . . a good question. . . . If the header row is retained, the column lengths should be able to be known by analyzing the header row.
Objector 27B
Sure. If not?
Hypothesizer 7
If not, . . . you may abandon the scheme of eliminating the header row.
Objector 27B
I may not!
Hypothesizer 7
Then, you may know the lengths by surveying the spread sheet in your program.
Objector 27B
. . . How?
Hypothesizer 7
It is easy to read the contents of any cell using UNO.
4: The Conclusion and Beyond
Hypothesizer 7
Now, we know how to write any spread sheet (that is of any spread sheets document that can be opened by LibreOffice or Apache OpenOffice) to a CSV file in any format (that is allowed by LibreOffice or Apache OpenOffice) from our program.
All we have to do is to set the appropriate document-storing properties appropriately.
If the data have to be tweaked somehow (the numbers have to be shown in a specific format, the header row has to be eliminated or added, some columns have to be eliminated, etc), they can be freely done so on the spread sheet, through UNO.
Any PDF file format can be also specified from our program, but we will see how in a future article.