2020-03-22

43: Export Office Document into a PDF as per Full Specs with UNO


Exported sections, image quality, watermark, tagging, form fields format, whether to export bookmarks or comments, encrypting, restrictions, signing, etc.

Topics


About: UNO (Universal Network Objects)
About: LibreOffice
About: Apache OpenOffice
About: The Java programming language
About: C++
About: Microsoft .NET Framework
About: The Python programming language
About: LibreOffice Basic
About: Apache OpenOffice Basic
About: BeanShell
About: JavaScript


Starting Context



Target Context


  • The reader will know how to export any Office (word processor, spreadsheet, or presentation) document into a PDF file as per detailed specifications (exported sections, image quality, watermark, tagging, exporting form fields in a specific format, exporting bookmarks, exporting comments, encrypting, setting restrictions, signing, etc.) from his or her program, using LibreOffice or Apache OpenOffice and UNO.


Stage Direction


Here are Hypothesizer 7, Objector 43A, and Objector 43B in front of a computer.


Orientation


Hypothesizer 7
In this article, we will learn how to export any Office document into a PDF file as per detailed specifications from our programs, using LibreOffice or Apache OpenOffice and UNO.

Objector 43A
Is it really "any Office document"?

Hypothesizer 7
Sir, actually, it is any LibreOffice or Apache OpenOffice Writer, Calc, or Impress document, regardless of the format or even the existence of the file associated with the document.

Objector 43A
Huh? "regardless of" what?

Hypothesizer 7
We export a document that has been loaded into LibreOffice or Apache OpenOffice Writer, Calc, or Impress; the format of the file from which the document has been loaded, if there is such a file at all, does not matter.

Objector 43A
There may not be such a file?

Hypothesizer 7
Right: if the document is a new document that has not been stored yet, there is no such file.

Objector 43A
Ah. Of course. So, the file can be a Microsoft Word file, a Microsoft Excel file, a plain text file, a CSV file, or whatever?

Hypothesizer 7
Whatever LibreOffice or Apache OpenOffice can load as a Writer, Calc, or Impress document, yes.

Objector 43B
What detailed specifications can I specify?

Hypothesizer 7
Madam, that is what you will learn in this article, but in short, all the specifications you can specify in the LibreOffice or Apache OpenOffice GUI.

Objector 43B
So, can I even sign the PDF file?

Hypothesizer 7
Yes, you can.


Main Body


1: The Mechanism


Hypothesizer 7
The mechanism for exporting any document into a PDF file is exactly what is described in a previous article (the concept, a Java implementation, a C++ implementation, a C# implementation, a Python implementation, and a LibreOffice or Apache OpenOffice Basic implementation), and I will not repeat it here.

The theme of this article is the document-storing property to be set for detailed PDF specifications.


2: The Document-Storing Property to Be Set


Hypothesizer 7
The document-storing property to be set is 'FilterData', which takes a 'sequence' of '::com::sun::star::beans::PropertyValue's.

Objector 43B
What is "'sequence'", exactly?

Hypothesizer 7
By 'sequence', I mean a UNO datum type, not 'sequence in general' (which is why I have quoted the term).

Objector 43B
So, is there a specific class that represents the datum type?

Hypothesizer 7
For C++, yes; for Java and C#, it is mapped to an array; for Python, it is mapped to 'tuple'.
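For Java, the nesting can be sketched as follows: 'FilterData' is an array of properties that is itself the value of one property in the outer document-storing properties array. This is only a sketch under assumptions: the 'PropertyValue' class here is a minimal stand-in for 'com.sun.star.beans.PropertyValue' so that the code compiles without the UNO jars, the class and method names are hypothetical, and the filter name 'writer_pdf_Export' (the usual PDF filter for Writer documents) is not stated in this article.

```java
// A minimal stand-in for 'com.sun.star.beans.PropertyValue' (an assumption for this sketch);
// in a real program, import the UNO class instead of declaring this one.
class PropertyValue {
	public String Name;
	public Object Value;
}

// The class and method names are hypothetical.
class PdfStoringPropertiesSketch {
	static PropertyValue createProperty (String a_name, Object a_value) {
		PropertyValue l_property = new PropertyValue ();
		l_property.Name = a_name;
		l_property.Value = a_value;
		return l_property;
	}
	
	// Builds the outer document-storing properties; the 'FilterData' value is the inner array itself.
	static PropertyValue [] createStoringProperties (String a_filterName, PropertyValue [] a_filterData) {
		return new PropertyValue [] {
			createProperty ("FilterName", a_filterName),
			createProperty ("FilterData", a_filterData)
		};
	}
	
	public static void main (String [] a_arguments) {
		// A UNO 'short' is passed as a Java 'Short', a UNO 'boolean' as a Java 'Boolean'.
		PropertyValue [] l_filterData = new PropertyValue [] {
			createProperty ("PageRange", "1-2,4"),
			createProperty ("UseLosslessCompression", Boolean.FALSE),
			createProperty ("Quality", Short.valueOf ((short) 90))
		};
		PropertyValue [] l_storingProperties = createStoringProperties ("writer_pdf_Export", l_filterData);
		for (PropertyValue l_storingProperty: l_storingProperties) {
			System.out.println (l_storingProperty.Name);
		}
	}
}
```

The outer array is what would be passed to the document-storing call described in the previous article.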

Objector 43B
Whatever.

Hypothesizer 7
These are the properties that can be included in the 'sequence', where the types except "any" are UNO datum types ('any' is not any datum type, but means that any UNO datum is accepted).

Name | Type | Value
PageRange | string | the pages range: '' -> all the pages, '1-2,4', etc.
Selection | any | the exported sections: the selection gotten by 'XSelectionSupplier.getSelection ()' of the controller of the document, or an arbitrary object (for example, a cells range)
UseLosslessCompression | boolean | lossless compression is used: 'false' -> a JPEG compression is used
Quality | short | the JPEG quality in %
ReduceImageResolution | boolean | the image resolutions are reduced
MaxImageResolution | short | the maximum image resolution in DPI: 75, 150, 300, 600, or 1200
Watermark | string | the watermark string
IsAddStream | boolean | the original document is embedded
SelectPdfVersion | short | the PDF version: '0' -> PDF 1.4, '1' -> PDF/A-1
UseTaggedPDF | boolean | the tagged PDF is used
ExportFormFields | boolean | the form fields are exported
FormsType | short | the forms type: '0' -> FDF, '1' -> PDF, '2' -> HTML, '3' -> XML
AllowDuplicateFieldNames | boolean | some duplicate field names are allowed
ExportBookmarks | boolean | the bookmarks are exported
ExportPlaceholders | boolean | the placeholders are exported
ExportNotes | boolean | the comments are exported
IsSkipEmptyPages | boolean | the automatically inserted empty pages are skipped
UseReferenceXObject | boolean | the Form XObjects are used
ViewPDFAfterExport | boolean | the target file is shown after the exporting has been completed
InitialView | short | for the after-the-exporting-showing of the target file, the initial view style: '0' -> pages only, '1' -> bookmarks and pages, '2' -> thumbnails and pages
InitialPage | short | for the after-the-exporting-showing of the target file, the initial page number
Magnification | short | for the after-the-exporting-showing of the target file, the initial magnification: '0' -> default, '1' -> the whole page is fitted into the window, '2' -> the page width is fitted into the window, '3' -> the page contents are supposed to be fitted into the window, but not necessarily are so
Zoom | short | for the after-the-exporting-showing of the target file, the initial zooming factor in %
PageLayout | short | for the after-the-exporting-showing of the target file, the initial page layout style: '0' -> default, '1' -> single page, '2' -> continuous, '3' -> each 2 facing pages are shown abreast and those pairs are shown continuously connected vertically
FirstPageOnLeft | boolean | for the after-the-exporting-showing of the target file, the first page is on left: valid only when 'PageLayout' is '3'
ResizeWindowToInitialPage | boolean | for the after-the-exporting-showing of the target file, the window is resized to the initial page
CenterWindow | boolean | for the after-the-exporting-showing of the target file, the window is centered on the screen
OpenInFullScreenMode | boolean | for the after-the-exporting-showing of the target file, the window is opened in the full screen mode
DisplayPDFDocumentTitle | boolean | for the after-the-exporting-showing of the target file, the document title is shown
UseTransitionEffects | boolean | for the after-the-exporting-showing of the target file, the transition effects are used
HideViewerMenubar | boolean | for the after-the-exporting-showing of the target file, the menubar is hidden
HideViewerToolbar | boolean | for the after-the-exporting-showing of the target file, the toolbar is hidden
HideViewerWindowControls | boolean | for the after-the-exporting-showing of the target file, the window controls are hidden
OpenBookmarkLevels | short | for the after-the-exporting-showing of the target file, the opened bookmark levels: '-1' -> all
ExportBookmarksToPDFDestination | boolean | the bookmarks are exported as named destinations
ConvertOOoTargetToPDFTarget | boolean | the document links are converted to PDF targets
ExportLinksRelativeFsys | boolean | the relative file links are exported
PDFViewSelection | short | the cross documents links viewer: '0' -> the default viewer, '1' -> a PDF reader application, '2' -> an internet browser
EncryptFile | boolean | the file is encrypted
DocumentOpenPassword | string | the opening password
RestrictPermissions | boolean | some actions are restricted
PermissionPassword | string | the restricted actions password
Printing | short | the printing restriction: '0' -> not permitted, '1' -> only in a low resolution (150 DPI), '2' -> also in high resolutions
Changes | short | the changing restriction: '0' -> not permitted, '1' -> inserting, deleting, and rotating pages, '2' -> filling in form fields, '3' -> commenting and filling in form fields, '4' -> any except extracting pages
EnableCopyingOfContent | boolean | copying of contents is allowed
EnableTextAccessForAccessibilityTools | boolean | text access for accessibility tools is allowed
SignPDF | boolean | the file is signed
SignatureCertificate | ::com::sun::star::security::XCertificate | the signature certificate
SignaturePassword | string | the signature certificate password
SignatureLocation | string | the signature location
SignatureReason | string | the signature reason
SignatureContactInfo | string | the signature contact information
SignatureTSA | string | the signature time stamp authority URL

Objector 43B
Um? Hmm . . .

Hypothesizer 7
The properties correspond to the settings in the PDF exporting dialog, as you will be able to easily see.
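As one concrete combination, the encryption and restriction settings of the dialog could be collected like this. It is only a sketch: the class and method names are hypothetical, the name-value pairs are gathered in a plain map (each entry is meant to become one '::com::sun::star::beans::PropertyValue' in 'FilterData'), and the accepted ranges for 'Printing' and 'Changes' are simply the codes listed in the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// The class and method names are hypothetical; only the property names and codes come from the table.
class PdfRestrictionsSketch {
	// Collects the settings as name-value pairs in insertion order.
	// A null password means that the corresponding feature is not requested.
	static Map <String, Object> createRestrictionSettings (String a_openPassword, String a_permissionPassword, short a_printing, short a_changes) {
		if (a_printing < 0 || a_printing > 2) {
			throw new IllegalArgumentException ("'Printing' has to be 0, 1, or 2.");
		}
		if (a_changes < 0 || a_changes > 4) {
			throw new IllegalArgumentException ("'Changes' has to be 0 through 4.");
		}
		Map <String, Object> l_settings = new LinkedHashMap <String, Object> ();
		if (a_openPassword != null) {
			l_settings.put ("EncryptFile", Boolean.TRUE);
			l_settings.put ("DocumentOpenPassword", a_openPassword);
		}
		if (a_permissionPassword != null) {
			l_settings.put ("RestrictPermissions", Boolean.TRUE);
			l_settings.put ("PermissionPassword", a_permissionPassword);
			// UNO 'short' values are passed as Java 'Short' objects.
			l_settings.put ("Printing", Short.valueOf (a_printing));
			l_settings.put ("Changes", Short.valueOf (a_changes));
		}
		return l_settings;
	}
	
	public static void main (String [] a_arguments) {
		// Low resolution printing only, filling in form fields only.
		Map <String, Object> l_settings = createRestrictionSettings ("open-secret", "permission-secret", (short) 1, (short) 2);
		for (Map.Entry <String, Object> l_setting: l_settings.entrySet ()) {
			System.out.println (l_setting.getKey () + " -> " + l_setting.getValue ());
		}
	}
}
```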

Objector 43B
I can see it, certainly, but I don't understand some settings in the dialog.

Hypothesizer 7
Ah.

Objector 43B
For example, what does "Fit in window" mean? I mean, "Fit width" seems to mean that the page width is fitted into the window; in "Fit in window", what is fitted into the window?

Hypothesizer 7
That expression is inarticulate because it misses the point: anyway, something is fitted into the window (in fact, where else can it be fitted?) and the point is 'what' is fitted there, but "Fit in window" does not state 'what' at all. . . . An understandable expression would be 'fit the whole page (into the window)'.

Objector 43B
And what the hell does "fit visible" mean? . . . Is "visible" fitted into the window? "visible" what? . . . That expression seems nonsensical because nothing is "visible" until the magnification has been determined, so the magnification cannot be determined by fitting "visible" into the window because, you know, "visible" doesn't exist yet. . . . How am I supposed to understand such an expression?

Hypothesizer 7
Ah, in fact, that is not any fault of LibreOffice or Apache OpenOffice, but a fault of Adobe Acrobat Reader, from which the expression has come.

Objector 43B
Oh . . ., I see, but anyway, what does that mean?

Hypothesizer 7
In fact, "visible" seems to mean 'the contents'.

Objector 43B
Huh?

Hypothesizer 7
For example, the characters on the page are the contents.

Objector 43B
. . . I think I can also see the margins, as white space.

Hypothesizer 7
In their terminology, they seem to be called 'invisible'.

Objector 43B
. . . So?

Hypothesizer 7
So, basically, "Fit visible" seems to mean 'the area that contains the contents is fitted into the window'.

Objector 43B
"basically"?

Hypothesizer 7
In fact, I do not understand the exact intention: as far as I have tried, it is not that the minimum rectangle that contains the whole page contents is snugly fitted into the window.

Objector 43B
. . . Also "continuous facing" seems inarticulate, although I can guess what it is supposed to mean.

Hypothesizer 7
That means that every 2 facing pages are shown side by side and those pairs are shown continuously connected vertically.
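Because the dialog labels are that hard to read, it may help to give the numeric codes of 'Magnification' and 'PageLayout' more descriptive names in one's own program. The following is only a sketch: the class, enum, and constant names are hypothetical, and only the numeric codes are taken from the table above.

```java
// The class, enum, and constant names are hypothetical; only the numeric codes come from the table.
class PdfInitialViewCodes {
	enum Magnification {
		DEFAULT ((short) 0),
		FIT_WHOLE_PAGE ((short) 1), // "Fit in window" in the dialog
		FIT_PAGE_WIDTH ((short) 2), // "Fit width" in the dialog
		FIT_VISIBLE_CONTENTS ((short) 3); // "Fit visible" in the dialog
		
		final short i_code;
		
		Magnification (short a_code) {
			i_code = a_code;
		}
		
		// Returns the value to be set into the 'Magnification' property (a UNO 'short').
		Short toPropertyValue () {
			return Short.valueOf (i_code);
		}
	}
	
	enum PageLayout {
		DEFAULT ((short) 0),
		SINGLE_PAGE ((short) 1),
		CONTINUOUS ((short) 2),
		CONTINUOUS_FACING ((short) 3); // pairs of facing pages, connected vertically
		
		final short i_code;
		
		PageLayout (short a_code) {
			i_code = a_code;
		}
		
		// Returns the value to be set into the 'PageLayout' property (a UNO 'short').
		Short toPropertyValue () {
			return Short.valueOf (i_code);
		}
	}
	
	public static void main (String [] a_arguments) {
		System.out.println ("Magnification -> " + Magnification.FIT_VISIBLE_CONTENTS.toPropertyValue ());
		System.out.println ("PageLayout -> " + PageLayout.CONTINUOUS_FACING.toPropertyValue ());
	}
}
```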

Objector 43A
How can I get the certificate object of the type, '::com::sun::star::security::XCertificate'?

Hypothesizer 7
As a preliminary, you have to have registered a Network Security Services database with LibreOffice or Apache OpenOffice.

Objector 43A
"Network Security Services"? . . . Anyway, how can I register the database into LibreOffice?

Hypothesizer 7
On the LibreOffice or Apache OpenOffice menu, you can click 'Tools' -> 'Options...'; in the dialog that appears, you can select 'LibreOffice' -> 'Security' in the left pane and click 'Certificate...' in the right pane; in the dialog that appears, you can click 'Add...'; in the dialog that appears, you can select the database directory and click 'OK'; then you can click 'OK' two more times.

Objector 43A
Hmm . . ., and?

Hypothesizer 7
This is a piece of Java code that creates the certificate object of a certificate in the database. 'a_certificateIssuerName' and 'a_certificateSerialNumber' are the certificate issuer name and the serial number, like 'CN=Tanichida,C=JP' and '{(byte) 0x00, (byte) 0xAA, (byte) 0xDD, (byte) 0xDF, (byte) 0xBF}', respectively, and 'getServiceInstance' is a function that gets a global UNO service instance.

@Java Source Code
~
import com.sun.star.security.XCertificate;
import com.sun.star.xml.crypto.XSEInitializer;
import com.sun.star.xml.crypto.XSecurityEnvironment;
import com.sun.star.xml.crypto.XXMLSecurityContext;
~
public class UnoObjectsContext implements XComponentContext {
	~
	private XSecurityEnvironment i_networkSecurityServicesSecurityEnvironmentInXSecurityEnvironment;
	
	~
	
	public Object getServiceInstance (String a_serviceName, Class a_targetClass, List <Object> a_arguments) throws com.sun.star.uno.Exception {
		~
	}
	
	~
	
	public XCertificate getNetworkSecurityServicesCertificateInXCertificate (String a_certificateIssuerName, byte [] a_certificateSerialNumber) throws Exception {
		if (i_networkSecurityServicesSecurityEnvironmentInXSecurityEnvironment == null) {
			XSEInitializer l_networkSecurityServicesSecurityEnvironmentInitializerInXSEInitializer = (XSEInitializer) getServiceInstance ("com.sun.star.xml.crypto.SEInitializer", XSEInitializer.class, null);
			XXMLSecurityContext l_networkSecurityServicesSecurityContextInXXMLSecurityContext = l_networkSecurityServicesSecurityEnvironmentInitializerInXSEInitializer.createSecurityContext ("NetworkSecurityServices");
			if (l_networkSecurityServicesSecurityContextInXXMLSecurityContext.getSecurityEnvironmentNumber () > 0) {
				i_networkSecurityServicesSecurityEnvironmentInXSecurityEnvironment = l_networkSecurityServicesSecurityContextInXXMLSecurityContext.getSecurityEnvironmentByIndex (0);
			}
		}
		if (i_networkSecurityServicesSecurityEnvironmentInXSecurityEnvironment != null) {
			XCertificate [] l_networkSecurityServicesCertificatesArrayInXCertificate = i_networkSecurityServicesSecurityEnvironmentInXSecurityEnvironment.getPersonalCertificates ();
			return selectCertificateFromCertificatesArray (l_networkSecurityServicesCertificatesArrayInXCertificate, a_certificateIssuerName, a_certificateSerialNumber);
		}
		return null;
	}
	
	~
	
	private XCertificate selectCertificateFromCertificatesArray (XCertificate [] a_certificatesArrayInXCertificate, String a_certificateIssuerName, byte [] a_certificateSerialNumber) {
		if (a_certificatesArrayInXCertificate != null) {
			for (XCertificate l_certificateInXCertificate: a_certificatesArrayInXCertificate) {
				if (a_certificateIssuerName.equals (l_certificateInXCertificate.getIssuerName ())) {
					if (a_certificateSerialNumber == null) {
						return l_certificateInXCertificate;
					}
					else {
						// The serial numbers match only if they have the same length and the same bytes.
						if (java.util.Arrays.equals (a_certificateSerialNumber, l_certificateInXCertificate.getSerialNumber ())) {
							return l_certificateInXCertificate;
						}
					}
				}
			}
		}
		return null;
	}
}

Objector 43A
. . . How can I know "the serial number"?

Hypothesizer 7
You can see it in the LibreOffice or Apache OpenOffice PDF exporting dialog (select the 'Digital Signatures' tab; click 'Select...'; in the dialog that appears, select the certificate and click 'View Certificate...'; in the dialog that appears, select the 'Details' tab).
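The serial number read from the dialog then has to be turned into the byte array that is compared with the result of 'getSerialNumber ()'. A minimal sketch of such a conversion follows, assuming the number has been copied as hexadecimal digit pairs, optionally separated by colons or white spaces (the exact display format of the dialog is an assumption here, and the class and method names are hypothetical).

```java
// The class and method names are hypothetical.
class CertificateSerialNumberSketch {
	// Converts a string like "00:AA:DD:DF:BF" into the corresponding byte array.
	static byte [] parseHexadecimalSerialNumber (String a_hexadecimalString) {
		// Drop the assumed separators (colons and white spaces).
		String l_digits = a_hexadecimalString.replaceAll ("[:\\s]", "");
		if (!l_digits.matches ("[0-9A-Fa-f]*") || l_digits.length () % 2 != 0) {
			throw new IllegalArgumentException ("The string has to consist of hexadecimal digit pairs.");
		}
		byte [] l_bytes = new byte [l_digits.length () / 2];
		for (int l_byteIndex = 0; l_byteIndex < l_bytes.length; l_byteIndex ++) {
			// Parse each pair as an unsigned value and narrow it to a (signed) Java byte.
			l_bytes [l_byteIndex] = (byte) Integer.parseInt (l_digits.substring (2 * l_byteIndex, 2 * l_byteIndex + 2), 16);
		}
		return l_bytes;
	}
	
	public static void main (String [] a_arguments) {
		byte [] l_serialNumber = parseHexadecimalSerialNumber ("00:AA:DD:DF:BF");
		System.out.println (l_serialNumber.length);
	}
}
```

As the comparison in the code above requires the same length, any leading zero byte has to be kept or dropped exactly as the certificate object reports it.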

Objector 43A
Hmm.

Hypothesizer 7
Note that if the database has a password, the LibreOffice or Apache OpenOffice instance requires the password once.

Objector 43A
"once"?

Hypothesizer 7
I mean, after the instance is up, when the above code is first called, the instance shows a dialog that requires the password.

Objector 43A
But the instance is in the headless mode!

Hypothesizer 7
Then, it will have to be in a non-headless mode.

Objector 43A
. . . That is inconvenient.

Hypothesizer 7
I know, but unfortunately, I do not know how to set the password programmatically.

Objector 43A
. . . Isn't that 'SignaturePassword' the one to be used?

Hypothesizer 7
That is the password of the certificate, not of the database.

Objector 43A
Hmm . . .

Hypothesizer 7
An option would be to eliminate the password from the database, if that is permissible.

Objector 43A
Well . . .

Hypothesizer 7
I think that setting a password is not the only way to protect the database: for example, you can limit the access to the database files to only a single operating system user by means of operating system file permissions.
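On a POSIX file system, such a limitation could be applied from Java like this. It is only a sketch under assumptions: the class and method names are hypothetical, the directory is assumed to contain only the database files, and the call fails with an 'UnsupportedOperationException' on file systems without POSIX permissions (for example, on Windows).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// The class and method names are hypothetical.
class DatabaseDirectoryProtectionSketch {
	// Makes the directory and the regular files directly in it accessible only to the owner.
	static void restrictToOwner (Path a_databaseDirectory) throws IOException {
		Set <PosixFilePermission> l_directoryPermissions = PosixFilePermissions.fromString ("rwx------");
		Set <PosixFilePermission> l_filePermissions = PosixFilePermissions.fromString ("rw-------");
		Files.setPosixFilePermissions (a_databaseDirectory, l_directoryPermissions);
		try (DirectoryStream <Path> l_entries = Files.newDirectoryStream (a_databaseDirectory)) {
			for (Path l_entry: l_entries) {
				if (Files.isRegularFile (l_entry)) {
					Files.setPosixFilePermissions (l_entry, l_filePermissions);
				}
			}
		}
	}
}
```

The owner has to be the operating system user that runs the LibreOffice or Apache OpenOffice instance, or the instance itself cannot read the database.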


3: The Conclusion and Beyond


Hypothesizer 7
Now, we know how to export any Office (word processor, spreadsheet, or presentation) document into a PDF file as per detailed specifications from our programs, using LibreOffice or Apache OpenOffice and UNO.

The basis of that technique is the way of converting any file directly using UNO (the concept, a Java implementation, a C++ implementation, a C# implementation, a Python implementation, and a LibreOffice or Apache OpenOffice Basic implementation). Once we have adopted the way, we can have many benefits efficiency-wise and functionality-wise.

What cannot be controlled by specifying document-storing properties alone can be controlled by tweaking the document in the many ways that have been introduced or will be introduced in this series.


References


  • Apache OpenOffice Wiki. (2014/01/02). Apache OpenOffice Developer's Guide. Retrieved from https://wiki.openoffice.org/wiki/Documentation/DevGuide/OpenOffice.org_Developers_Guide