Documentation generated by XL CLAIRE v3.3.37 at Fri, 24 Nov 2006
Introduction: What is PDF?
PDF stands for Portable Document Format. As the name implies, it is a data format that can be used to describe documents. Adobe, the developer of PDF, markets software to create, edit and view PDF files. Because the specification of the file format is publicly available, many other companies develop software for PDF as well. In prepress, PDF is used more and more as a format to exchange data between applications. For authenticity purposes, a PDF document can be digitally signed and/or encrypted.
This module creates documents according to the "PDF Reference, third edition" (version 1.4).
Introduction: Pdf module design
In addition to traditional HTML elements, we introduce some custom elements intended to handle special (interactive) PDF features. They are described under "Special HTML elements".
Introduction: Rectangles
A small class is introduced to handle rectangle objects, which hold the position of a box in a page coordinate system with the same orientation as the page's rectangle. Some APIs require a rectangle as a parameter, created as follows:

    rect :: Pdf/rectangle!(100.,  // left border's X
                           200.,  // top border's Y
                           200.,  // right border's X
                           100.)  // bottom border's Y

A rectangle may also be created from a width and a height, in which case it is centered on the origin:

    rect :: Pdf/rectangle!(100., 100.)
    (assert(rect.Pdf/left = -50.))
    (assert(rect.Pdf/right = 50.))
    (assert(rect.Pdf/top = 50.))
    (assert(rect.Pdf/bottom = -50.))
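The two constructor forms above can be mimicked outside of CLAIRE to check the arithmetic. The following Python sketch is only an illustration: the `Rect` class, its `centered` constructor and the `width`/`height` helpers are hypothetical names, not part of the Pdf module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for a Pdf rectangle (y axis pointing up)."""
    left: float
    top: float
    right: float
    bottom: float

    @classmethod
    def centered(cls, width, height):
        # Mirrors the two-argument form: a box centered on the origin.
        return cls(-width / 2.0, height / 2.0, width / 2.0, -height / 2.0)

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.top - self.bottom


box = Rect.centered(100.0, 100.0)
explicit = Rect(100.0, 200.0, 200.0, 100.0)  # left, top, right, bottom
```

With this convention the top border has a larger Y than the bottom border, which matches the assertions of the centered example.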
Writer side - low level: Document creation
The whole writer API relies on a document object of class pdf_document. To create a new PDF document, call document! with a page format, an orientation and a margin:

    doc :: Pdf/document!("A4",   // format
                         false,  // landscape?
                         5)      // margin, as a percentage of the format

The document is then written to a port or to a file:

    (Pdf/print_in_port(doc, stdout))
    (Pdf/print_in_file(doc, "test.pdf"))
Writer side - low level: Sections and pages
In this implementation the pdf_document object is organized in named sections, each with its own index of pages; a document has at least one section and one page. Once a document is created, we have to create a new section (e.g. section "body") and a new page in order to have a valid page target for all operations performed on the document:

    (Pdf/new_section(doc, "body")) // may be omitted
    (Pdf/new_page(doc))            // page 1 of section "body"

All operations performed with the low-level API take place on the current page of the current section. The current section and page may be selected by hand with the following methods:

    (Pdf/set_current_section(doc, "some section"))
    (Pdf/set_current_page(doc, 2))
    (assert(Pdf/get_current_section(doc) = "some section"))
    (assert(Pdf/get_current_page(doc) = 2))

Each section has its own notion of the current page, so changing the current section also restores the current page of that section.
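The bookkeeping described above (one current page per section, restored when the section is selected again) can be sketched as follows. This is a Python model for illustration only: the `Document` class is hypothetical, and the assumption that a newly created page becomes the current page is ours, not a statement about the module.

```python
class Document:
    """Hypothetical model of per-section current-page bookkeeping."""

    def __init__(self):
        self.page_count = {}    # section name -> number of pages
        self.current_page = {}  # section name -> current page (1-based)
        self.section = None

    def new_section(self, name):
        self.page_count.setdefault(name, 0)
        self.current_page.setdefault(name, 0)
        self.section = name

    def new_page(self):
        # Assumption: a new page becomes the current page of its section.
        self.page_count[self.section] += 1
        self.current_page[self.section] = self.page_count[self.section]

    def set_current_section(self, name):
        if name not in self.page_count:
            raise KeyError(name)
        self.section = name

    def set_current_page(self, index):
        if not 1 <= index <= self.page_count[self.section]:
            raise IndexError(index)
        self.current_page[self.section] = index

    def get_current_page(self):
        return self.current_page[self.section]


doc = Document()
doc.new_section("body")
for _ in range(3):
    doc.new_page()
doc.set_current_page(2)
doc.new_section("annex")
doc.new_page()
page_in_annex = doc.get_current_page()
doc.set_current_section("body")
page_in_body = doc.get_current_page()
```

Switching back to "body" yields page 2 again, because the current page is stored per section rather than per document.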
Writer side - low level: Graphic state
Graphic operations operate with a graphic state that drives their rendering: a transformation matrix handles positioning, and styling options cover colors, line width, etc. Notice that in this implementation the transformation matrix and the graphic state are handled together in a general drawing state. These states are organized in a stack, and any modification of the graphic state applies to the state on top of the stack, which holds the innermost transformation:

    (Pdf/push_state(doc))
    // modification of the graphic state
    (Pdf/pop_state(doc)) // restore the previous state
    (Pdf/set_matrix(doc, a, b, c, d, e, f))

The six values define the matrix applied to coordinates, following the PDF convention:

                              | a  b  0 |
    [x' y' 1]  =  [x y 1]  *  | c  d  0 |
                              | e  f  1 |

    (Pdf/move(doc, dx, dy))          // translate
    (Pdf/scale(doc, sx, sy))
    (Pdf/scale(doc, x, y, sx, sy))   // scale at (x,y)
    (Pdf/rotate(doc, a))
    (Pdf/rotate(doc, x, y, a))       // rotate at (x,y)
    (Pdf/skew(doc, a, b))
    (Pdf/line_width(doc, 4.)) // set line width to 4 points
    (Pdf/line_join(doc, Pdf/ROUND_JOIN))
    (Pdf/line_cap(doc, Pdf/ROUND_CAP))

    (Pdf/color(doc, 0.0, 0.0, 0.0))        // sets line color and font color to black
    (Pdf/stroke_color(doc, 0.7, 0.2, 0.0)) // stroke color: red with a little green
    (Pdf/alpha(doc, 0.7))                  // sets the opacity to 70%
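To make the effect of the transformation helpers concrete, here is a small Python sketch of the underlying arithmetic. It assumes the row-vector convention of the PDF Reference, where a matrix is written (a, b, c, d, e, f) and a new transformation is multiplied in front of the current matrix. All function names are hypothetical and only model what calls such as move, scale and rotate compute; they are not the module's implementation.

```python
import math


def multiply(m, n):
    """Return the product m x n of two matrices stored as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def apply(m, x, y):
    """Transform the point (x, y): [x' y' 1] = [x y 1] * M."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def translate(dx, dy):
    return (1.0, 0.0, 0.0, 1.0, dx, dy)


def scale(sx, sy):
    return (sx, 0.0, 0.0, sy, 0.0, 0.0)


def rotate(angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c, s, -s, c, 0.0, 0.0)


def rotate_about(x, y, angle):
    """Rotate around (x, y): shift the center to the origin, rotate, shift back."""
    return multiply(multiply(translate(-x, -y), rotate(angle)), translate(x, y))


quarter_turn = rotate_about(10.0, 10.0, math.pi / 2)
moved = apply(quarter_turn, 11.0, 10.0)   # close to (10.0, 11.0)
fixed = apply(quarter_turn, 10.0, 10.0)   # the center does not move
```

The "at (x,y)" variants differ from the plain ones only by the pair of translations wrapped around the basic matrix.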
Writer side - low level: Path construction
A path object is made of an ordered list of points connected by lines or Bézier curves. A path may be stroked, filled or both using the current graphic state. To initialize a path object, call a begin_path restriction:

    (Pdf/begin_path(doc))       // starts a path at (0., 0.)
    (Pdf/begin_path(doc, x, y)) // starts a path at (x,y)

Lines are appended with lineto:

    (Pdf/lineto(doc, 100., 100.)) // insert a line from (0,0) to (100., 100.)

Curves are appended with curveto:

    (Pdf/curveto(doc, 0., 100., 100., 0., 100., 100.))

For instance, a square made of four lines:

    (Pdf/begin_path(doc))
    (Pdf/lineto(doc, 0., 100.))   // left border
    (Pdf/lineto(doc, 100., 100.)) // top border
    (Pdf/lineto(doc, 100., 0.))   // right border
    (Pdf/lineto(doc, 0., 0.))     // bottom border

A path may hold several subpaths, each new one being started with moveto:

    (Pdf/begin_path(doc))
    (Pdf/lineto(doc,...))           // insert component of the first subpath
    (Pdf/curveto(doc,...))
    (Pdf/moveto(doc, 200., 200.))   // start a new subpath at (200., 200.)
    (Pdf/lineto(doc,...))           // insert component of a second subpath
    (Pdf/curveto(doc,...))

A path is terminated by one of the methods of the following family:

    (Pdf/[close_][fill_][stroke_]path(doc))
Pdf also defines variants of begin_path that initialize a new path and insert the lines or curves of a predefined shape:

    // rectangles are represented using four lines
    (Pdf/begin_path_rect(doc, rect))

    // a quad is like a rectangle but with rounded corners
    // quads are represented using curves
    (Pdf/begin_path_quad(doc, rect))

    // circles are approximations, they are defined by a given number of segments
    // all segments are represented with a curve
    (Pdf/begin_path_circle(doc, rect))     // defaults to 16 segments
    (Pdf/begin_path_circle(doc, rect, 32))

    // triangles are represented using three lines
    // one edge on the left border of the given rectangle
    // and a node in the middle of the right border of the given rectangle
    (Pdf/begin_path_triangle(doc, rect))
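Approximating a circle with a fixed number of curve segments is a classic construction. The Python sketch below shows one common way to do it with cubic Bézier segments, using the control-point distance (4/3)·tan(θ/4)·r for an arc of angle θ. It is an assumption that the module uses this exact construction; the function names are hypothetical, and the sketch takes a center and a radius instead of a rectangle.

```python
import math


def circle_segments(cx, cy, r, n=16):
    """Return n cubic Bezier segments (p0, c1, c2, p3) approximating a circle."""
    step = 2.0 * math.pi / n
    k = 4.0 / 3.0 * math.tan(step / 4.0) * r  # distance from end point to control point
    segments = []
    for i in range(n):
        a0, a1 = i * step, (i + 1) * step
        p0 = (cx + r * math.cos(a0), cy + r * math.sin(a0))
        p3 = (cx + r * math.cos(a1), cy + r * math.sin(a1))
        # Control points lie on the tangents at both ends of the arc.
        c1 = (p0[0] - k * math.sin(a0), p0[1] + k * math.cos(a0))
        c2 = (p3[0] + k * math.sin(a1), p3[1] - k * math.cos(a1))
        segments.append((p0, c1, c2, p3))
    return segments


def bezier_point(p0, c1, c2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    w0, w1, w2, w3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (w0 * p0[0] + w1 * c1[0] + w2 * c2[0] + w3 * p3[0],
            w0 * p0[1] + w1 * c1[1] + w2 * c2[1] + w3 * p3[1])


segments = circle_segments(50.0, 50.0, 50.0)      # 16 segments by default
mid = bezier_point(*segments[0], 0.5)
radius_at_mid = math.hypot(mid[0] - 50.0, mid[1] - 50.0)
```

With this choice of control points the middle of each segment lies exactly on the circle, and the error elsewhere shrinks quickly as the number of segments grows.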
Writer side - low level: Rectangles, circles, quads and triangles
Pdf also comes with various methods to insert predefined shapes in a single step. When the method name contains 'stroke', the method takes a w argument used as the line width. All methods take a color argument, supplied as a string that identifies a named color. A named color is either an X11 color name or a hexadecimal representation:

    (Pdf/stroked_rectangle(doc, rect, 4., "red"))
    (Pdf/filled_circle(doc, rect, "#FF0000")) // also red
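As an illustration of how a named color can be reduced to the three components used by the color methods seen earlier, here is a minimal Python sketch. The function names and the two-entry name table are hypothetical; a real X11 table holds several hundred names.

```python
# Hypothetical excerpt of an X11 color-name table.
X11_COLORS = {"red": "#FF0000", "blue": "#0000FF"}


def parse_hex_color(text):
    """Convert '#RRGGBB' into an (r, g, b) tuple of floats in [0, 1]."""
    if len(text) != 7 or text[0] != "#":
        raise ValueError("expected #RRGGBB, got %r" % text)
    return tuple(int(text[i:i + 2], 16) / 255.0 for i in (1, 3, 5))


def resolve_color(name):
    """Accept either an X11 name or a hexadecimal representation."""
    return parse_hex_color(X11_COLORS.get(name.lower(), name))


red_by_name = resolve_color("red")
red_by_hex = resolve_color("#FF0000")
```

Both spellings end up as the same tuple, which is why the two calls in the example above draw in the same color.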
Writer side - low level: Fonts and AFM file metrics
In order to properly handle text objects we need font metrics information; this information is used to calculate the dimensions of a text object as the sum of its glyph dimensions. Pdf supports the AFM (Adobe Font Metrics) file format. The Pdf module distribution comes with AFM files for the default PDF fonts, those that are supposed to be handled by any reader application. The Pdf module has to be told where to find AFM files in order to load a font metrics description:

    (Pdf/set_afm_path("/path/to/AFM/files/folder"))

    (Pdf/set_serialized_afm_path("/path/to/serialized/AFM/files/folder"))

In this implementation, fonts are handled globally: each loaded font file is associated with a unique font descriptor id. A pdf_document object uses its own selection of system fonts. The method get_font selects a given font for a supplied document; if that font does not exist in the system, an attempt is made to load the font metrics from the repository:

    fontid :: Pdf/get_font(doc, "Helvetica", false, false) // Helvetica normal (not bold, not italic)
Writer side - low level: Text layout
In order to properly arrange text boxes, a layout facility is provided to get various metrics and calculate the circumscribed rectangle of a given text (these methods are heavily used by the HTML renderer). This relies on the metrics of a font (defined in the associated AFM file), so each layout API requires a font id (as returned by get_font) and the font size that would be used if the text was actually rendered. These layouts are always given in the identity matrix (untransformed).
When laying out a text (with get_text_width or get_text_box), newline characters are handled like any other character, that is by accounting for the width of each glyph. The height of the laid-out text is set to the height metrics of the font scaled by the font size. Notice that get_text_box returns a rectangle expressed relative to the baseline of the font, as illustrated by the method below, which shows various layout information for a given text and font:

    show_text_box_info(self:pdf_document,
                       txt:string,      // the text string to lay out
                       fontid:integer,  // id of a font, as returned by get_font
                       fsize:float) ->  // a font size
      let rect := Pdf/get_text_box(self, txt, fontid, fsize),
          space_width := Pdf/get_text_width(self, " ", fontid, fsize),
          (underpos, thickness) := get_underline_metrics(self, fontid, fsize),
          xheight := get_xheight(self, fontid, fsize)
      in (printf("text box of [~A] ~S characters\n", txt, length(txt)),
          printf("font size: ~Spt\n", fsize),
          printf("space width: ~Spt long\n", space_width),
          printf("height of lower 'x': ~Spt\n", xheight),
          printf("underline position: ~Spt above baseline\n", underpos),
          printf("underline thickness: ~Spt wide\n", thickness),
          assert(rect.Pdf/left = 0.),
          printf("text ascender: ~Spt\n", rect.Pdf/top),
          printf("text descender: ~Spt\n", -(rect.Pdf/bottom)),
          printf("text width: ~Spt\n", rect.Pdf/right))
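The width computation mentioned above (a sum of glyph widths) can be sketched in a few lines of Python. AFM files express glyph widths in thousandths of the font size; the small width table below is an illustrative excerpt whose values are assumed for the example, and `text_width` is a hypothetical helper, not the module's get_text_width.

```python
# Illustrative glyph widths, in 1/1000 of the font size (assumed values).
SAMPLE_WIDTHS = {" ": 278, "H": 722, "e": 556, "l": 222, "o": 556}


def text_width(text, widths, font_size):
    """Width of a text in points: sum of glyph widths scaled by the font size."""
    return sum(widths[ch] for ch in text) * font_size / 1000.0


hello_width = text_width("Hello", SAMPLE_WIDTHS, 10.0)
space_width = text_width(" ", SAMPLE_WIDTHS, 10.0)
```

Because the width is linear in the font size, a layout computed once for a given size can be rescaled without reading the metrics again.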
Writer side - low level: Text objects
Mimicking the path construction API, a begin_text method initializes a text object. Notice that texts, like paths, are drawn using the current transformation matrix:

    // create a new text object at (0.,0.) in the current transformation matrix
    (Pdf/begin_text(doc))

    // create a new text object at (100.,100.) in the current transformation matrix
    (Pdf/begin_text(doc, 100., 100.))

    // create a new text object at (100.,100.) in the current transformation matrix
    // the text would be drawn with the given angle of 0.3 radian
    (Pdf/begin_text(doc, 100., 100., 0.3))

A font is selected with select_font:

    // select 12pt Helvetica, normal
    (Pdf/select_font(doc, 12., "Helvetica", false, false))

    // select 12pt Helvetica, bold
    (Pdf/select_font(doc, "Helvetica", 12., true, false))

    // select 14pt Helvetica, italic
    (Pdf/select_font(doc, "Helvetica", 14., false, true))

Text is appended to the current line with show_text:

    (Pdf/show_text(doc, "Hello world!"))

A new line is started with new_text_line:

    (Pdf/show_text(doc, "first line"))
    (Pdf/new_text_line(doc, 15.))
    (Pdf/show_text(doc, "A second line ..."))
    (Pdf/show_text(doc, "... second line continued"))
    (Pdf/new_text_line(doc, 15.))
    (Pdf/show_text(doc, "A third line"))
    ...

The text object is closed with end_text:

    (Pdf/end_text(doc))
Pdf also comes with a special method that handles multi-line text in a single step. It is an arrangement of the above methods and a good illustration of the text routines; it can also serve as a base for a particular implementation:
- The following code is part of the source -

    show_multilined_text(self:Pdf/pdf_document, // target document
                         txt:string,            // string of text (may contain \n)
                         f:integer,             // font id (see get_font)
                         fs:float,              // font size in point
                         il:float,              // space between two lines
                         x:float, y:float) ->   // upper left corner
      let lines := explode(txt,"\n"),
          line_count := length(lines)
      in (Pdf/begin_text(self, x, y),
          Pdf/select_font(self, fs, f),
          for i in (1 .. line_count)
            let line := lines[i],
                rect := Pdf/get_text_box(self, line, f, fs)
            in (Pdf/show_text(self, line),
                Pdf/new_text_line(self, rect.top - rect.bottom - il)),
          Pdf/end_text(self))

In practice, the user will probably prefer to use show_html_box instead.
Writer side - low level: Images
This module comes with its own handling of PNG images (without external dependencies); a PNG image can be inserted on a page using the current transformation matrix. An image is supplied as a PNG file path, and the generated PDF document always embeds the image data so as to avoid dependencies on external resources. A position within the current transformation matrix and/or a size constraint may also be given at insertion time:

    (Pdf/show_image(doc, "car.png",
                    50., 50.,     // bottom left corner at (50., 50.) (in pt)
                    100., 100.))  // scaled to a 100. by 100. box (in pt)

    (Pdf/show_image(doc, "sun.png",
                    Pdf/rectangle!(50., 600., 100., 550.))) // scaled to the given rectangle
Writer side - low level: Invisible attachments
The PDF file format defines a way to embed file contents that can be extracted by a reader application. This is called an attachment. Two methods are provided to add a new attachment to the document. Both add an invisible attachment, i.e. one without a visual appearance. When an attachment is submitted we have to specify a MIME type for the attached file (e.g. "text/plain"):

    (Pdf/add_attachment(doc, "data.xml", "text/xml"))

    Pdf/fill_attachment(usrdata:{1}, f:port) -> printf(f, "Hello world!")
    (Pdf/add_attachment(doc, "hello.txt", "text/plain", 1))
Writer side - low level: Invisible digital signatures
The PDF reference also specifies a general way to append a digital signature to a document. A reader application should have a signature handler able to verify a given signature format. In this implementation we support both the x509.rsa_sha1 and pkcs7.sha1 formats. The method sign appends a digital signature object to the given document. The actual signature value is computed when the method print_in_file or print_in_port is called. The value of a signature is computed using the Openssl module, and we need a signer certificate and a private key to complete the call to sign:

    // create a CA (Certificate Authority)
    ca_key :: Openssl/rsa!(512)
    ca :: Openssl/X509!(ca_key)
    (Openssl/add_subject_entry(ca, "CN","CARoot"))
    (Openssl/add_subject_entry(ca, "O","expert-solutions"))
    (Openssl/add_subject_entry(ca, "C","FR"))
    (Openssl/set_issuer(ca,ca)) // self issued
    (Openssl/set_serial(ca,0))
    (Openssl/set_not_before(ca, -1))
    (Openssl/set_not_after(ca, 30))
    (Openssl/set_basic_constraints(ca, "critical,CA:true,pathlen:0"))
    (Openssl/set_subject_key_identifier(ca, "hash"))
    (Openssl/set_authority_key_identifier(ca, "keyid:always,issuer:always"))
    (Openssl/set_key_usage(ca, "critical,keyCertSign,cRLSign"))

    // create a user certificate
    cert_key :: Openssl/rsa!(512)
    cert :: Openssl/X509!(cert_key)
    (Openssl/add_subject_entry(cert, "CN","bob"))
    (Openssl/add_subject_entry(cert, "O","expert-solutions"))
    (Openssl/add_subject_entry(cert, "C","FR"))
    (Openssl/set_issuer(cert,ca)) // issued by ca
    (Openssl/set_serial(cert,2))
    (Openssl/set_not_before(cert, -1))
    (Openssl/set_not_after(cert, 30))
    (Openssl/set_key_usage(cert, "critical,digitalSignature,nonRepudiation,keyEncipherment"))
    (Openssl/set_subject_key_identifier(cert, "hash"))
    (Openssl/set_authority_key_identifier(cert, "keyid:always,issuer:always"))

    (Pdf/sign(doc, cert, list(ca), cert_key))

Notice that a reader application should complain about the validity of the above signature, since the supplied certificate can't be verified (unless you actually define the CA certificate as trusted from the reader application's point of view). But you'll probably use a different certificate for signing...
Writer side - HTML/CSS renderer: Design consideration
This module embeds an HTML/CSS engine used to submit content in a stream-oriented way. Only a subset of the specified HTML elements is implemented, and some unspecified ones have been added for convenience in describing interactive PDF features such as annotations or signatures. The goal here is to use this Pdf module in combination with the Wcl syntax, so as to describe any layout/style information in the HTML/CSS languages, which have proven to be very concise.
The conversion (HTML > PDF) is achieved with an automatic layout algorithm based on RFC1942. Computed boxes are converted into simple PDF graphic operations assigned to PDF pages using a page-break algorithm (as a web browser would do when processing an HTML page for a printer device).
The dictionary of understood HTML elements may be extended by defining substitution handlers for new elements. New elements are described in HTML and rely on simpler elements.
A scale factor may be applied implicitly to elements that overflow the parent element's box. A screen device may handle such an overflow by inflating the parent box and adding scroll bars, but of course there is no scroll bar on a printed document! The renderer instead applies a scale to elements that are too wide, which avoids the artifacts generated by arbitrarily wide elements. In theory only root elements should need to be scaled. This assumption usually fails because of the recursive implementation of the layout processing, which produces floating-point errors at various levels of the recursion; nested scalings are then sometimes applied to fix this issue.
Writer side - HTML/CSS renderer: CSS support
This module comes with an implementation of CSS 2. The table below lists the properties with their inheritance, default value and range:
| property | inheritance | default | range |
|---|---|---|---|
| margin-top | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| margin-right | | 0.0 | ((css_relative_length U float) U {"auto", "inherit"}) |
| margin-bottom | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| margin-left | | 0.0 | ((css_relative_length U float) U {"auto", "inherit"}) |
| padding-top | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| padding-right | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| padding-bottom | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| padding-left | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| border-top-color | | tuple(0.0, 0.0, 0.0) | (tuple(float, float, float) U {"inherit"}) |
| border-right-color | | tuple(0.0, 0.0, 0.0) | (tuple(float, float, float) U {"inherit"}) |
| border-bottom-color | | tuple(0.0, 0.0, 0.0) | (tuple(float, float, float) U {"inherit"}) |
| border-left-color | | tuple(0.0, 0.0, 0.0) | (tuple(float, float, float) U {"inherit"}) |
| border-top-style | | "none" | {"none", "solid", "dashed", "dotted", "inherit"} |
| border-right-style | | "none" | {"none", "solid", "dashed", "dotted", "inherit"} |
| border-bottom-style | | "none" | {"none", "solid", "dashed", "dotted", "inherit"} |
| border-left-style | | "none" | {"none", "solid", "dashed", "dotted", "inherit"} |
| border-top-width | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| border-right-width | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| border-bottom-width | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| border-left-width | | 0.0 | ((css_font_relative_length U float) U {"inherit"}) |
| width | | "auto" | ((css_relative_length U float) U {"auto", "inherit"}) |
| height | | "auto" | ((css_relative_length U float) U {"auto", "inherit"}) |
| vertical-align | | "baseline" | {"baseline", "super", "sub", "top", "middle", "bottom", "inherit"} |
| page-break-before | | "auto" | {"always", "avoid", "auto", "inherit"} |
| page-break-after | | "auto" | {"always", "avoid", "auto", "inherit"} |
| page-break-inside | inherit | "auto" | {"avoid", "auto", "inherit"} |
| color | inherit | tuple(0.0, 0.0, 0.0) | (tuple(float, float, float) U {"inherit"}) |
| background-color | | tuple(1.0, 1.0, 1.0) | (tuple(float, float, float) U {"inherit"}) |
| background-image | | "none" | string |
| font-family | inherit | "Helvetica" | (string U list) |
| font-style | inherit | "normal" | {"normal", "italic", "oblique", "inherit"} |
| font-weight | inherit | "normal" | {"normal", "bold", "inherit"} |
| font-size | inherit | 12.0 | ((css_relative_length U float) U {"inherit"}) |
| text-align | inherit | "left" | {"left", "right", "center", "justify", "inherit"} |
| text-indent | inherit | 0.0 | ((css_relative_length U float) U {"auto", "inherit"}) |
| text-decoration | | "none" | {"none", "underline", "inherit"} |
| text-transform | | "none" | {"none", "capitalize", "uppercase", "lowercase", "inherit"} |
| letter-spacing | inherit | "normal" | ((css_font_relative_length U float) U {"inherit"}) |
| word-spacing | inherit | "normal" | ((css_font_relative_length U float) U {"inherit"}) |
| line-height | inherit | "normal" | ((css_font_relative_length U float) U {"normal", "inherit"}) |
| white-space | inherit | "normal" | {"normal", "pre", "nowrap", "inherit"} |
| border-collapse | inherit | "separate" | {"collapse", "separate", "inherit"} |
| border-spacing | inherit | 0.2em | ((css_font_relative_length U float) U {"inherit"}) |
| content | | "none" | (list U {"none", "inherit"}) |
| counter-reset | | "none" | (list U {"none", "inherit"}) |
| counter-increment | | "" | (list U {"none", "inherit"}) |
| debug | | "no" | {"yes", "no", "inherit"} |
Writer side - HTML/CSS renderer: HTML redirection and Wcl syntax
In order to use standard printing methods and the Wcl syntax, this module comes with HTML redirection support. A redirection scope is introduced by a call to print_in_html and terminated by a call to a restriction of the end_of_html* method family:

    (Pdf/print_in_html(DOC)) // starts HTML redirection
    ( ?><p> Hello world! </p><? )
    (Pdf/end_of_html*(DOC))  // ends redirection and applies a constructor
Writer side - HTML/CSS renderer: XObject and simple HTML formatted boxes
The HTML renderer can be used in a simple manner: render HTML in a given box. This takes place on the current page of the current section and does not use the auto page-break algorithm. If an overflow occurs during the process (i.e. the rendered HTML is wider than the supplied box), a scale is applied so that, in any case, an arbitrarily big HTML stream fits the given box.
The autofit? flag, when true, applies a final scale so that the rendered HTML exactly fits the supplied box (when the rendered box is smaller than the supplied box).
show_html_box is used to submit an HTML stream represented by a string:

    // autofit? true by default : auto-scaled to the box
    (Pdf/show_html_box(doc, "toto <i>titi</i>", Pdf/rectangle!(200., 300., 300., 200.)))

    // autofit? false : hopefully unscaled, unless an overflow occurs
    (Pdf/show_html_box(doc, "toto <i>titi</i>", Pdf/rectangle!(200., 600., 300., 400.), false))

The same box can be filled through HTML redirection, using end_of_html_box:

    (Pdf/print_in_html(doc))
    ( ?>toto <i>titi</i><? )
    (Pdf/end_of_html_box(doc, Pdf/rectangle!(200., 300., 300., 200.)))
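The overflow and autofit? rules described above boil down to choosing one scale factor. The Python sketch below is a simplified model under two assumptions of ours: the scale is uniform (same factor on both axes), and both width and height are taken into account. `fit_scale` is a hypothetical name, not a function of the module.

```python
def fit_scale(content_w, content_h, box_w, box_h, autofit=True):
    """Uniform scale applied to content of the given size rendered in a box."""
    scale = min(box_w / content_w, box_h / content_h)
    if scale < 1.0:
        return scale   # overflow: the content is always shrunk to fit
    if autofit:
        return scale   # smaller content is enlarged to fill the box
    return 1.0         # no overflow and no autofit: left unscaled


shrunk = fit_scale(200.0, 50.0, 100.0, 100.0)
enlarged = fit_scale(50.0, 50.0, 100.0, 100.0)
unscaled = fit_scale(50.0, 50.0, 100.0, 100.0, autofit=False)
```

The only case where autofit? changes the outcome is content that is smaller than the box, which is why the second call of the example is described as "hopefully unscaled".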
XObject is a nice feature of PDF since it can hold a set of drawing routines, and the same XObject can be drawn multiple times anywhere in the document, each time with a particular transformation matrix (like an image).
XObjects are named objects bound to the document so that they can be referenced by any page of any section (their value defaults to "" when unspecified). We may define our own XObject by hand, which is handled by the end_of_html_xobject constructor family. width, when given, specifies the width of the virtual rendering window; otherwise the width of the current page is taken:

    (Pdf/new_html_xobject(doc, "<p align=center>My XObject!</p>", "my_first_object", "a_value", 100.))

    (Pdf/print_in_html(doc))
    ( ?><p align=center>My XObject!</p><? )
    (Pdf/end_of_html_xobject(doc, "my_first_object", "a_value")) // page width rendering

    (Pdf/print_in_html(doc))
    ( ?><p align=center>My XObject!</p><? )
    (Pdf/end_of_html_xobject(doc, "my_second_object", 100.))     // 100pt width rendering

An XObject is then drawn with show_xobject:

    (Pdf/show_xobject(doc, "my_first_object", "a_value", Pdf/rectangle!(200., 300., 300., 200.)))
- The following code is part of the source -
    [show_html_box(self:pdf_document, html:string, rect:rectangle, autofit?:boolean) ->
      (print_in_html(self),
       princ(html),
       let xname := uid() // unique name for the internal XObject
       in (end_of_html_xobject(self, xname, width(rect)),
           show_xobject(self, xname, rect, autofit?)))]

Last, here is a complete sample that shows a custom XObject creation and its use in a transformed matrix. This sample creates a single-page document with formatted HTML rendered in a rotated box centered in the middle of the page. A stroked red rectangle is added to show the unrotated box:

    // creates a new document, a section and a page
    DOC :: Pdf/document!()
    (Pdf/new_page(DOC))

    // creates an XObject
    (Pdf/new_html_xobject(DOC, "<p border=1>toto <i>titi</i></p>", "xobj", 100.))

    // the page rect
    pgrect :: Pdf/get_page_full_rect(DOC)

    // a 100pt by 100pt rectangle centered on 0.0
    boxrect :: Pdf/rectangle!(100., 100.)

    // move to the center of the page
    (Pdf/move(DOC, width(pgrect) / 2., height(pgrect) / 2.))

    // shows the HTML box unrotated in red
    (Pdf/stroked_rectangle(DOC, boxrect, 1., "red"))

    // rotation of 0.3 radians
    (Pdf/rotate(DOC, 0.3))

    // shows our XObject in the rotated matrix
    (Pdf/show_xobject(DOC, "xobj", "", boxrect, true))

    // save the document in file test.pdf
    (Pdf/print_in_file(DOC,"test.pdf"))
Writer side - HTML/CSS renderer: Main document HTML stream
HTML streams may also be submitted for an entire section of a document without taking care of page creation. This solution relies on an auto page-break algorithm that automatically creates pages as required. Given a current section, we use print_in_html/end_of_html to submit a new HTML chunk. print_in_html/end_of_html are intended to be used multiple times; each time, a new chunk is appended to the section's stream:

    (printf("Appending HTML chunks to section ~A\n", Pdf/get_current_section(doc)))

    (Pdf/print_in_html(doc))
    ( ?> first HTML chunk <? )
    (Pdf/end_of_html(doc))

    (Pdf/print_in_html(doc))
    ( ?> next HTML chunk <? )
    (Pdf/end_of_html(doc))
Writer side - HTML/CSS renderer: Headers and footer
We may also define a header and a footer for the current section, both using HTML formatting:

    (printf("Initialize HTML header for section ~A\n", Pdf/get_current_section(doc)))
    (Pdf/print_in_html(doc))
    ( ?>Page header<? )
    (Pdf/end_of_html_header(doc))

    (printf("Initialize HTML footer for section ~A\n", Pdf/get_current_section(doc)))
    (Pdf/print_in_html(doc))
    ( ?>Page footer<? )
    (Pdf/end_of_html_footer(doc))

We may call end_of_html_header (resp. end_of_html_footer) multiple times; each call defines a different header (resp. footer) for the following pages:

    (Pdf/print_in_html(doc))
    ( ?>Header first page<? )
    (Pdf/end_of_html_header(doc))

    (Pdf/print_in_html(doc))
    ( ?>Header second page<? )
    (Pdf/end_of_html_header(doc))

    (Pdf/print_in_html(doc))
    ( ?>Header third page and following<? )
    (Pdf/end_of_html_header(doc))

    (print_in_css(DOC)
    ?> headerbody[section=body][page=1] {background-color: blue} <?
    end_of_css(DOC))

Notice that a header/footer may use the special elements pagenum and pagecount, which are substituted by their actual value at the time of document generation.
Writer side - HTML/CSS renderer: Driving page break algorithm
The auto page-break algorithm may be driven by a set of attributes allowed for any HTML element. These attributes are inspired by CSS. The goal of the auto page-break algorithm is to maximize the number of elements rendered on a single page. Notice that the page-break algorithm is only applied to the main HTML stream, the one which is inserted with print_in_html/end_of_html. The page-break-before, page-break-inside and page-break-after policies are used to constrain the breaking of an element.
Unless stated otherwise, elements have the "auto" policy by default for each of the before/inside/after attributes.
For instance, if we had a paragraph that explains something, followed by a table that illustrates it, we may need to avoid page breaks inside and between the two elements. We would write:

    (print_in_html(doc)
    ?><p style='page-break-inside: avoid; page-break-after: avoid'>
      This paragraph shows key aspect concerning the following results...
    </p>
    <table border=1 style='page-break-inside: avoid'>
      <tr> <th>implementation<th>CPU<th>overhead
      <tr> <td>C++<td>12<td>4
      ...
    </table><?
    end_of_html(doc))

Notice that the before policy and the after policy are somehow related for a pair of contiguous elements, and a contradiction happens when an element avoids page breaks after it while the following one requires a page break before it (or vice versa). In case of such a contradiction, the avoid policy takes precedence over the always one.
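The precedence rule stated above can be written down as a small decision function. This Python sketch only models the rule for the boundary between two contiguous elements; `break_between` is a hypothetical name and the real algorithm also has to weigh the inside policies and the remaining room on the page.

```python
def break_between(after_policy, before_policy):
    """Combine the 'after' policy of an element with the 'before' policy of the next.

    Policies are "always", "avoid" or "auto"; "avoid" wins over "always".
    """
    policies = {after_policy, before_policy}
    if "avoid" in policies:
        return "avoid"
    if "always" in policies:
        return "always"
    return "auto"


conflict = break_between("avoid", "always")
forced = break_between("auto", "always")
free = break_between("auto", "auto")
```

In the paragraph/table example, the paragraph's page-break-after: avoid would therefore keep the table on the same page even if the table asked for a break before it.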
Special HTML elements: User area element
Our HTML dialect understands the special area element. An area element is defined by either a (name, value) tuple or a user object. Areas are used as owner-draw boxes: a callback (a fill_area restriction) is responsible for drawing the area content:

    (print_in_html(doc)
    ?><area name="my_area" value="my_area_value" /><?
    end_of_html(doc))

An area may also carry a user object:

    my_data_class <: ephemeral_object()
    DATA :: my_data_class()

    (print_in_html(doc)
    ?><area userdata="<?oid DATA ?>" /><?
    end_of_html(doc))

The matching fill_area restrictions are:

    Pdf/fill_area(self:pdf_document, name:{"my_area"}, val:{"my_area_value"}, wdth:float, hght:float) -> ...

    Pdf/fill_area(self:pdf_document, userdata:my_data_class, wdth:float, hght:float) -> ...

For instance, the following callback strokes a blue rectangle around the area:

    Pdf/fill_area(self:pdf_document, userdata:my_data_class, wdth:float, hght:float) ->
      stroked_rectangle(self, rectangle!(0., hght, wdth, 0.), 1., "blue")

Here is a complete sample that draws an area two times in boxes that will be laid out differently:

    // creates a new document, a section and a page
    doc :: Pdf/document!()

    // userdata used for the fill_area callback
    my_data_class <: ephemeral_object()
    DATA :: my_data_class()

    Pdf/fill_area(self:pdf_document, userdata:my_data_class, wdth:float, hght:float) ->
      filled_rectangle(self, rectangle!(0., hght, wdth, 0.), "blue")

    // use our area in the main HTML stream
    (print_in_html(doc)
    ?><area userdata="<?oid DATA ?>" />
    <table border=1>
      <tr> <td>cell <td>cell
      <tr> <td>cell <td> <area userdata="<?oid DATA ?>" />
    </table><?
    end_of_html(doc))

    // save the document in file test.pdf
    (Pdf/print_in_file(doc,"test.pdf"))
Special HTML elements: XObject element
As seen above, we can create XObjects (graphic objects that can be referenced by any page of any section). These objects may be referenced in an HTML stream using the special element xobject:

    (Pdf/print_in_html(doc)
    ?><xobject name="my_first_object" value="a_value" /><?
    Pdf/end_of_html(doc))
Special HTML elements: pagenum/pagecount elements
A PDF document can be seen as a page rendering device (also called 'page media' in CSS). This module uses a page-break algorithm to render an arbitrary HTML stream in multiple page. The actual page where would take place a given piece of HTML is a priori unknown until a document is generated, so does the amount of page this document will contain. The page index or page amount can however be inserted with the right value provided special lazy elements pagenum and pagecount. A good place to use these elements is in a header or a footer, for instance the following code will add a footer to the current section with the current page :
| (Pdf/print_in_html(doc) ?> <table width=100%> <tr> <td align=right> <pagenum>/<pagecount> </table> <? Pdf/end_of_html_footer(doc)) |
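The lazy resolution of pagenum and pagecount can be pictured with a small Python model. This is only a sketch of the idea (lay out the body first, substitute the counters afterwards), not the module's page-break algorithm; the function names paginate and render, and the fixed number of blocks per page, are assumptions made for the illustration.

```python
def paginate(blocks, per_page):
    """Split a list of text blocks into pages of at most per_page blocks."""
    return [blocks[i:i + per_page] for i in range(0, len(blocks), per_page)]


def render(blocks, footer, per_page=2):
    """Lay out the body first, then resolve the lazy footer elements."""
    pages = paginate(blocks, per_page)
    count = len(pages)  # only known once pagination is finished
    out = []
    for num, page in enumerate(pages, start=1):
        # The placeholders are replaced per page, after the layout pass.
        resolved = footer.replace("<pagenum>", str(num))
        resolved = resolved.replace("<pagecount>", str(count))
        out.append("\n".join(page + [resolved]))
    return out


pages = render(["a", "b", "c", "d", "e"], "<pagenum>/<pagecount>")
```

The footer text is stored unresolved and only turned into concrete numbers once the total is known, which is why the elements are described as lazy.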
| pagenum/pagecount elements | categories Special HTML elements Attachment element |
Digital signature element |
An attachment file may be added to the document with the special attachment element. The file content is either inline (directly in the HTML stream), supplied by the fill_attachment callback, or given as the path of a file. In the first two cases we have to supply a content attribute that gives the name of the embedded file. For an inline attachment the content has to be escaped in order to handle the special characters '<', '&' and '>'.
The MIME type of the attached file is 'text/plain' by default, or as specified by the mimetype attribute.
Such an attachment (as opposed to the invisible attachment created by add_attachment) has a visual appearance that defaults to "Paperclip" and can be customized through the appearance attribute with one of the values "Graph", "PushPin", "Paperclip" or "Tag" (case sensitive).
For instance, the following lines define an attachment named hello.txt that has inline data and defaults to the Paperclip appearance. The two samples that follow it give the content as a file path and through the fill_attachment callback:
| (Pdf/print_in_html(doc) ?><attachment content="hello.txt"> <data>Hello world!</data> </attachment><? Pdf/end_of_html(doc)) |
| (Pdf/print_in_html(doc) ?><attachment filepath="/the/path/of/a/file" mimetype="mime/type" /><? Pdf/end_of_html(doc)) |
| my_attachment_data <: ephemeral_object() DATA :: my_attachment_data() fill_attachment(self:pdf_document, data:my_attachment_data, p:port) -> printf(p, "Hello world!") (Pdf/print_in_html(doc) ?><attachment content="hello.txt" mimetype="text/plain" userdata="<?oid DATA ?>" /><? Pdf/end_of_html(doc)) |
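For the inline case, the escaping of '<', '&' and '>' mentioned above can be prepared ahead of time. The Python sketch below builds the element text with the standard html.escape function; the helper name inline_attachment is hypothetical, and it is an assumption that the module decodes the usual HTML entities when it reads the data element.

```python
from html import escape


def inline_attachment(name, data):
    """Build an <attachment> element whose inline data is escaped."""
    # quote=False escapes only '&', '<' and '>' in the file content.
    body = escape(data, quote=False)
    # The file name sits inside an attribute, so quotes are escaped too.
    return '<attachment content="%s"><data>%s</data></attachment>' % (
        escape(name, quote=True), body)


element = inline_attachment("hello.txt", "if a < b && c > d")
```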
A user-defined appearance is also supported for the attachment. Attachments are interactive features in the sense that their appearance may change depending on user interaction. Three appearances may be defined, normal, rollover and down:
| (Pdf/print_in_html(doc) ?><attachment content="hello.txt"> <data>Hello world!</data> <normal>Put your mouse here <i>(normal appearance)</i></normal> <rollover>Click here to select <i>(rollover appearance)</i></rollover> <down>Selected, click here to deselect <i>(down appearance)</i></down> </attachment><? Pdf/end_of_html(doc)) |
| Attachment element | categories Special HTML elements Digital signature element |
Index generator and outliner |
As for attachments, signatures may be defined with a visual appearance. The following sample assumes that certificates have been created as in the sample of the "Invisible signature" category:
| (print_in_html(doc) ?><signature certificate=<?oid cert ?> key=<?oid cert_key ?> chain=<?oid list<Openssl/X509>(ca) ?> reason="I'm the author" location=my_city contact-info=0102030405> <normal>normal</normal> <rollover>rollover</rollover> </signature><? end_of_html(doc)) |
| Digital signature element | categories Special HTML elements Index generator and outliner |
Extensibility User defined HTML elements |
HTML titles (h1 .. h6) may be used to generate both an index and outlines for a given section. The generated index is appended to the current section, so we generally create a new section specially intended for the index. The upto_level parameter sets how deep the index goes: only titles with a level lower than or equal to upto_level are part of the index. Here is a complete sample:
| doc :: Pdf/document!() // fill a document with titles... (print_in_html(doc) ?><h1>Introduction</h1> <h2>What is PDF ?</h2> <p>blablabla...</p> <h2>Design consideration</h2> <p>blablabla...</p> <? end_of_html(doc)) (print_in_html(doc) ?><h1>Writer side</h1> <h2>Low level API</h2> <p>blablabla...</p> <h2>HTML renderer</h2> <p>blablabla...</p> <? end_of_html(doc)) // create a new section for the index, // place this section before the body section (Pdf/new_section_before(doc, "toc", "body")) // insert the 'Index' title in the toc section (print_in_html(doc) ?><h1>Index</h1><? end_of_html(doc)) // generate the index of section body in section toc (generate_toc(doc, "body", 2)) // save the document in file test.pdf (Pdf/print_in_file(doc,"test.pdf")) |
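The effect of upto_level can be modelled in a few lines of Python. This is a sketch under assumptions, not the module's generate_toc: the titles are taken as already collected (level, text, page) tuples, and the indentation and dotted layout of the index lines are invented for the illustration.

```python
def build_toc(titles, upto_level):
    """Return index lines for the titles whose level is <= upto_level.

    titles is a list of (level, text, page) tuples, level 1 standing for h1.
    """
    lines = []
    for level, text, page in titles:
        if level <= upto_level:
            # Deeper titles are indented further; layout is illustrative.
            lines.append("  " * (level - 1) + text + " ... " + str(page))
    return lines


titles = [
    (1, "Introduction", 1),
    (2, "What is PDF ?", 1),
    (3, "A deeper detail", 2),
    (1, "Writer side", 3),
]
toc = build_toc(titles, 2)
```

With upto_level set to 2, as in the sample above, the h3 title is left out of the index.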
| Special HTML elements Index generator and outliner |
categories Extensibility User defined HTML elements |
Blockquote implementation |
The dictionary of handled elements may be extended. For a given element name we may define a substitution handler that describes the stream substituted in place of the element, so that such an element always relies on core elements. This is a convenient way to add new element definitions that are smoothly handled by the auto-layout and auto page-break engines. Substitution is used to implement block-quotes, lists and bullets, and soon your own definitions.
The substitution is achieved inside the general HTML element handlers, which act as the default for all unknown elements. When a candidate restriction of the substitution handlers (begin_substitution/end_substitution) is found for the requested element name, it is applied with an additional context argument that may be handled with the above methods. Contexts are organized in a chained list made only of substituted elements. This hides from the user the hierarchy of elements added by the substitution handler: from the context's point of view, the parent element is the first (if any) actual parent element that has itself been substituted.
The context also contains the table of attributes, so that the substituted element may have its own attribute handling. The bracket notation may be used to get an attribute value. In some situations it is also necessary to transmit a value to a related sub-element; in such situations one may use the context to store per-context userdata (see the lists implementation for a sample usage).
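The context chain can be pictured with the following Python model. It is a hypothetical sketch of the idea only: the class Context, its fields and the handler begin_item are invented names, not the module's API. It shows a bracket lookup on the attribute table and a list item that numbers itself through the userdata stored on its parent context.

```python
class Context:
    """One link of the chain; only substituted elements get a context."""

    def __init__(self, name, attrs=None, parent=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.parent = parent    # nearest substituted ancestor, or None
        self.userdata = None    # free slot shared with sub-elements

    def __getitem__(self, attr):
        # Bracket notation returns the attribute value, or None if absent.
        return self.attrs.get(attr)


def begin_item(ctx):
    """Hypothetical item handler: count items on the parent list context."""
    lst = ctx.parent
    lst.userdata = (lst.userdata or 0) + 1
    return "%d. " % lst.userdata


ol = Context("ol", {"type": "1"})
first = begin_item(Context("li", parent=ol))
second = begin_item(Context("li", parent=ol))
```

Keeping the counter on the list's context rather than on each item is what lets sibling items see a value set by an earlier one.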
| User defined HTML elements | categories Extensibility Blockquote implementation |
Bullet implementation |
| Blockquote implementation | categories Extensibility Bullet implementation |
List implementation |
| Bullet implementation | categories Extensibility List implementation |
Reader side Loading and inspecting a PDF document |
| Extensibility List implementation |
categories Reader side Loading and inspecting a PDF document |
Attachment extraction |
| Loading and inspecting a PDF document | categories Reader side Attachment extraction |
Digital signature verification |
| Attachment extraction | categories Reader side Digital signature verification |
| categories | Rectangles | ephemeral | Pdf class |
rectangle is a simple class that represents a rectangular box with the same orientation as the page. It holds the position of the four borders of the rectangle box.
| categories | Graphic state | inline | Pdf method |
alpha(self, _a) defines the current opacity as a percentage.
| categories | Graphic state | inline | Kernel method |
color(self, c) is equivalent to color(self, c[1], c[2], c[3]).
| categories | Graphic state | inline | Kernel method |
color(self, _r, _g, _b) sets the current color as an RGB value used for the fill operation.
| categories | Rectangles | inline | Pdf method |
deflate(r, d) deflates the supplied rectangle. Each border is moved by a distance d so that the rectangle area decreases.
| categories | Rectangles | inline | Pdf method |
deflate%(r, d%) deflates the supplied rectangle. The vertical borders are moved by a distance d = d% * r.width / 100. so that the rectangle area decreases. The horizontal borders are moved likewise, proportionally to the height.
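The arithmetic of the two deflate variants can be checked with a short Python model. It is a sketch under assumptions: the Rect class and the name deflate_percent are stand-ins for the CLAIRE rectangle and deflate%, the top border is taken to have a larger Y than the bottom border (as in the rectangle! sample), and the functions return a new rectangle instead of modifying the supplied one.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.top - self.bottom


def deflate(r, d):
    """Move every border inwards by the same distance d."""
    return Rect(r.left + d, r.top - d, r.right - d, r.bottom + d)


def deflate_percent(r, pct):
    """Move vertical borders by pct% of the width, horizontal by pct% of the height."""
    dx = pct * r.width / 100.0
    dy = pct * r.height / 100.0
    return Rect(r.left + dx, r.top - dy, r.right - dx, r.bottom + dy)


box = deflate(Rect(100.0, 200.0, 200.0, 100.0), 10.0)
```

Inflating is the mirror operation: the same distances with the opposite sign.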
| categories | Document creation | inline | Pdf method |
document!() creates a document instance with the format "A4" and no margins.
| categories | Document creation | inline | Pdf method |
document!(s, lndscp?) creates a document instance with the format s, with the landscape orientation when lndscp? is true and with no margins.
| categories | Document creation | inline | Pdf method |
document!(c, s, lndscp?) creates a document instance with the format s, with the landscape orientation when lndscp? is true and with no margins. c is a class that should inherit from pdf_document.
| categories | Sections and pages | inline | Pdf method |
get_current_page(self) returns the current page index in the current section of the given document.
| categories | Fonts and AFM file metrics | normal dispatch | Pdf method |
get_font_bold(self, num, b?) changes the current font and selects the font that has the same face as the font with id num and a bold attribute that is set (when b? is true) or not. If no such font exists in the system, an attempt is made to load a font metrics file. The returned value is the id of the selected font.
| categories | Fonts and AFM file metrics | normal dispatch | Pdf method |
get_font_face(self, num, face) changes the current font and selects the font that has the same bold and italic attributes as the font with id num but a different face. If no such font exists in the system, an attempt is made to load a font metrics file. The returned value is the id of the selected font.
| categories | Fonts and AFM file metrics | normal dispatch | Pdf method |
get_font_italic(self, num, b?) changes the current font and selects the font that has the same face as the font with id num and an italic attribute that is set (when b? is true) or not. If no such font exists in the system, an attempt is made to load a font metrics file. The returned value is the id of the selected font.
| categories | Sections and pages | inline | Pdf method |
get_page_count(self) returns the number of pages in the current section of the given document.
| categories | Sections and pages | normal dispatch | Pdf method |
get_page_rect(self) returns the rectangle that corresponds to the given document's page format (not taking the document's margins into account).
| categories | Sections and pages | normal dispatch | Pdf method |
get_page_rect(self) returns the rectangle that corresponds to the given document's page format, taking the document's margins into account.
| categories | Sections and pages | normal dispatch | Pdf method |
get_section_names(self) returns the list of section names that are currently defined in the given document.
| categories | Rectangles | inline | Pdf method |
height(r:rectangle) returns the height of a rectangle (i.e. r.top - r.bottom).
| categories | Rectangles | inline | Pdf method |
inflate(r, d) inflates the supplied rectangle. Each border is moved by a distance d so that the rectangle area increases.
| categories | Rectangles | inline | Pdf method |
inflate%(r, d%) inflates the supplied rectangle. The vertical borders are moved by a distance d = d% * r.width / 100. so that the rectangle area increases. The horizontal borders are moved likewise, proportionally to the height.
| categories | Sections and pages | normal dispatch | Pdf method |
insert_page_after(self, pid) creates a new page and moves it just after the page with index pid.
| categories | Sections and pages | normal dispatch | Pdf method |
insert_page_before(self, pid) creates a new page and moves it just before the page with index pid in the current section of the given document.
| categories | Graphic state | inline | Pdf method |
line_cap(self, _m) sets the current line cap mode for the path drawing operation.
| categories | Graphic state | inline | Pdf method |
line_dash(self, lonoff) specifies a dash template as a list of alternating ON and OFF lengths, with no initial phase.
| categories | Graphic state | inline | Pdf method |
line_dash(self, non, noff) is equivalent to line_dash(self, non, noff, 0) (no phase).
| categories | Graphic state | inline | Pdf method |
line_dash(self, lonoff, ph) specifies a dash template as a list of alternating ON and OFF lengths, with an initial phase of ph.
| categories | Graphic state | inline | Pdf method |
line_dash(self, non, noff, ph) sets the current line dash style for the path drawing operation. non specifies the length of an ON dash and noff the length of an OFF dash. ph specifies the phase.
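The role of the phase is easiest to see by computing which parts of a line end up drawn. The Python sketch below follows the convention of the PDF Reference, where the phase is the distance into the dash pattern at which the line starts; it is an assumption that the module passes non, noff and ph to the PDF dash operator unchanged, and the function name dash_segments is hypothetical.

```python
def dash_segments(length, non, noff, ph=0.0):
    """Return the (start, end) ON intervals along a line of the given length."""
    period = non + noff
    if period <= 0:
        return [(0.0, length)]  # degenerate pattern: treat as a solid line
    segments = []
    # Origin of the dash cycle that contains position 0 of the line.
    start = -(ph % period)
    while start < length:
        on_start = max(0.0, start)
        on_end = min(start + non, length)
        if on_end > on_start:
            segments.append((on_start, on_end))
        start += period
    return segments


no_phase = dash_segments(10, 3, 2)
with_phase = dash_segments(10, 3, 2, 1)
```

With a phase of 1 the first ON dash is shortened by one unit and every later dash starts one unit earlier.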
| categories | Graphic state | inline | Pdf method |
line_join(self, _w) sets the current line join mode for the path drawing operation.
| categories | Graphic state | inline | Pdf method |
line_width(self, _w) sets the current line width for the path drawing operation.
| categories | Graphic state | normal dispatch | Pdf method |
move(self, dx, dy) applies a translation with vector (dx,dy) on the current matrix.
| categories | Sections and pages | normal dispatch | Pdf method |
new_page(self) is either called by hand or indirectly by another page-creating API, and calls the page_created callback once the page is properly inserted. The new page is always appended at the end of the document, and its new page index is returned. On return, the index equals the number of pages in the document.
| categories | Sections and pages | normal dispatch | Pdf method |
new_section_after(self, secname, aft) creates a new section with name secname that is inserted into the list of doc's sections just after the section with name aft.
| categories | Sections and pages | normal dispatch | Pdf method |
new_section_before(self, secname, bef) creates a new section with name secname that is inserted into the list of doc's sections just before the section with name bef.
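The ordering produced by these two calls can be modelled on a plain Python list of section names. This is only an illustration of the insertion positions; the functions below reuse the method names for readability but operate on a list, not on a pdf_document, and how the module reacts to an unknown section name is not covered.

```python
def new_section_before(sections, secname, bef):
    """Insert secname just before the section named bef."""
    sections.insert(sections.index(bef), secname)


def new_section_after(sections, secname, aft):
    """Insert secname just after the section named aft."""
    sections.insert(sections.index(aft) + 1, secname)


sections = ["body"]
new_section_before(sections, "toc", "body")
new_section_after(sections, "annex", "body")
```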
| categories | Document creation | normal dispatch | Pdf method |
print_in_file(self, f) renders the current state of the given document self in the PDF format and saves it in the file f.
| categories | Document creation | normal dispatch | Pdf method |
print_in_port(self, f) renders the current state of the given document self in the PDF format and writes it to the port f.
| categories | Graphic state | normal dispatch | Pdf method |
rotate(self, a) rotates the current matrix by an angle a.
| categories | Graphic state | normal dispatch | Pdf method |
rotate(self, x, y, a) rotates the current matrix with an angle a around the origin (x,y).
| categories | Graphic state | normal dispatch | Pdf method |
scale(self, sx, sy) applies a scale factor on the current matrix. sx and sy represent the scale factor (1.0 for identity) in the direction X and Y of the current matrix.
| categories | Graphic state | normal dispatch | Pdf method |
scale(self, x, y, sx, sy) applies a scale factor on the current matrix at the position (x,y). sx and sy represent the scale factor (1.0 for identity) in the direction X and Y of the current matrix.
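The variants of rotate and scale that take a position (x,y) keep that point fixed. The Python sketch below shows the effect on a single point rather than on the current matrix. The function names are hypothetical, and the angle unit (radians) and the counterclockwise direction are assumptions borrowed from the usual PDF convention, not confirmed for this module.

```python
import math


def scale_at(px, py, x, y, sx, sy):
    """Scale the point (px, py) by (sx, sy), keeping (x, y) fixed."""
    return (x + sx * (px - x), y + sy * (py - y))


def rotate_at(px, py, x, y, a):
    """Rotate the point (px, py) by a radians, counterclockwise, around (x, y)."""
    dx, dy = px - x, py - y
    c, s = math.cos(a), math.sin(a)
    return (x + c * dx - s * dy, y + s * dx + c * dy)


scaled = scale_at(3.0, 4.0, 1.0, 1.0, 2.0, 2.0)
rotated = rotate_at(2.0, 1.0, 1.0, 1.0, math.pi / 2)
```

Both operations amount to translating (x,y) to the origin, applying the plain transformation, and translating back.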
| categories | Sections and pages | normal dispatch | Pdf method |
set_current_page(self, pid) selects the page with index pid in the current section of the given document.
| categories | Graphic state | normal dispatch | Pdf method |
skew(self, a, b) skews the X axis by an angle a and the Y axis by an angle b (in radians).
| categories | Graphic state | normal dispatch | Pdf method |
skew(self, x, y, a, b) skews the X axis by an angle a and the Y axis by an angle b (in radians) from the origin (x,y).
| categories | Graphic state | inline | Pdf method |
stroke_color(self, c) is equivalent to stroke_color(self, c[1], c[2], c[3]).
| categories | Graphic state | inline | Pdf method |
stroke_color(self, _r, _g, _b) sets the current color as an RGB value used for the stroke operation.
| categories | Rectangles | inline | Pdf method |
width(r:rectangle) returns the width of a rectangle (i.e. r.right - r.left).