Creating a table in PDF

Or maybe a story about fonts...

The goal was simple: I wanted to print Bible verses in a table so I could fit multiple handouts on a single page. It would look something like the image below.

The idea struck me on the way to church Sunday morning: use PostScript to create the PDF. In my mind I could easily accomplish it in a couple of hours that afternoon. After wolfing down my lunch I hurried to my computer to begin coding. The first issue I encountered was that I couldn't remember the name PostScript, and the only thing I could think to type was typescript. After searching on Bing for a while I found this video about PostScript by "John's Basement." I quickly learned I needed an interpreter for the language. I chose Ghostscript because it was the first result on Bing.

Armed with the documentation and software I needed, I was ready to start coding. But then I thought, "Hey, wouldn't it be great if I could put this on my website?" Now, if I put it on my website, I would want to create a preview of what the PDF would look like. I found a great blog showing how to use the HTML canvas element to render PostScript in 'real time'. In reality the blog demonstrates JavaScript functions which perfectly imitate PostScript statements. (I do not link it here, because last I checked it was compromised by an injection attack.) But then, if I had a preview, I would need to be able to scale it so it would fit on my screen. But then I would need to be able to change the scale dynamically, because some users have much larger screens. By now I should have noticed the feature creep, but without another thought I began to code.

I quickly began implementing my plans. My web server runs the PHP Apache Docker container, which uses Debian as the OS. I installed Ghostscript in the container and called it from PHP using shell_exec. I copied some PostScript examples and verified they worked. I also copied some HTML canvas examples. Now, there were a few steps I needed to do.

Measure the width and height of a cell

Calculate how many cells I can fit in a page

Measure the width and height of a line

Measure how many words can fit in a line and lines in a cell

Cell Size and Placement

Understand that in HTML the canvas element has a 2d context, which I call cvc (canvas context). This is the object that is used to draw on an HTML canvas. I include the code below to give some context as to where it comes from, but all that is necessary to know is that cvc contains the functions I call to draw on the canvas.

var cv = document.getElementById("myCanvasId");
var cvc = cv.getContext("2d");

To draw a rectangle with cvc, strokeRect takes the x,y coordinate of the rectangle's top-left corner plus its width and height.

cvc.strokeRect(x,y,width,height);

In PostScript, drawing a rectangle is slightly different. First I "moveto" a location to start drawing. Once there, rlineto adds a line relative to the current position. So "300 0 rlineto" draws a line 300 to the right and 0 on the vertical. Remember that at this point the "cursor" has moved from 6 to 306 on the x axis. Next "0 262.5 rlineto" moves 262.5 up the y axis, and now the "cursor" is at (306, 264.75). After the remaining two sides, "closepath" closes the shape back to its starting point and then "stroke" draws it.

%!PS
6 2.25 moveto
300 0 rlineto
0 262.5 rlineto
-300 0 rlineto
0 -262.5 rlineto
closepath
stroke
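The path above follows directly from an x, y, width, and height. Here is a minimal sketch of that idea in Python (the language of the helper scripts later in this post, not the JavaScript the site itself uses). The name ps_rect is hypothetical, and the sketch lets closepath draw the fourth side instead of issuing a fourth rlineto.

```python
def ps_rect(x, y, width, height):
    """Return PostScript that strokes a rectangle with lower-left corner (x, y).

    All values are in points. PostScript's origin is the bottom-left of the
    page with y growing upward, unlike the canvas, where y grows downward.
    """
    return "\n".join([
        f"{x:g} {y:g} moveto",
        f"{width:g} 0 rlineto",      # bottom edge, left to right
        f"0 {height:g} rlineto",     # right edge, upward
        f"{-width:g} 0 rlineto",     # top edge, right to left
        "closepath",                 # left edge, back to the start
        "stroke",
    ])


print(ps_rect(6, 2.25, 300, 262.5))
```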

So I wrapped this up into a function that calls strokeRect and then returns a string of PostScript code. Since I was lazy I added a textarea element and just appended the plaintext of the PostScript there. The canvas rendered as expected. Then I copied the PostScript and rendered it with Ghostscript, only to find the boxes did not line up correctly. The issue was fonts. The HTML canvas element measures distance in pixels, but PostScript measures in points (pt). Yes, the same points you use to define font sizes. Fortunately the W3C's CSS standards define the conversion between pixels and points (96 px and 72 pt to the inch, so 1 pt is 4/3 px), so I wrote some conversion functions.

function pt2px(pt) {
    // conversion based on W3C
    return pt * (4/3);
}
function px2pt(px) {
    // conversion based on W3C
    return px / (4/3);
}

Now when producing the postscript plaintext all I needed to do was call these conversion functions. And with that I could easily define the width and height of my boxes. From there it was some basic math and for loops to add as many cells as possible.
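The "basic math" can be sketched as follows, in Python for brevity. This is an illustration under assumptions, not the code from the site: the function names are hypothetical, and centering the grid on the page is a guess. With a US Letter page (612 × 792 pt) and the 300 × 262.5 pt cell from the rectangle example, it gives 2 columns and 3 rows starting at (6, 2.25), which agrees with the moveto in the PostScript above.

```python
import math


def cell_grid(page_w, page_h, cell_w, cell_h):
    """Return (columns, rows, x0, y0) for a grid of cells centered on the page.

    All values are in points; (x0, y0) is the lower-left corner of the grid.
    """
    cols = math.floor(page_w / cell_w)
    rows = math.floor(page_h / cell_h)
    x0 = (page_w - cols * cell_w) / 2   # split the leftover width evenly
    y0 = (page_h - rows * cell_h) / 2   # split the leftover height evenly
    return cols, rows, x0, y0


def cell_origins(page_w, page_h, cell_w, cell_h):
    """Return the lower-left corner of every cell, row by row from the bottom."""
    cols, rows, x0, y0 = cell_grid(page_w, page_h, cell_w, cell_h)
    return [(x0 + c * cell_w, y0 + r * cell_h)
            for r in range(rows)
            for c in range(cols)]


print(cell_grid(612, 792, 300, 262.5))
```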

Words and Line Sizes

While I am sure there are more efficient solutions, my plan to fit the text into the cells was simple. I would split the input into words, then keep adding words to a line until its length was greater than the width of the cell. At that point I would start another line. Rather than stopping lines from running out of the bottom of the cell, I would let the user enter more lines than fit. When users see their text going past the bottom of the cell, I believe it is a very clear and succinct indication that their input is too large.

There was a problem. My plan didn't work; the words consistently went off the sides of the cell. My measurement of words was off. I took the font size in pixels (px) and multiplied it by the number of letters. The issue was fonts. The size of a font is its height, not its width. The width of each letter must be measured. My mind was spinning with the complexity of getting a unique measurement for every character. Fortunately, the HTML canvas context has a function to do this for me.

cvc.measureText(text).width;

Now the fonts were wrapping correctly and I was much closer to my goal.
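The wrapping plan from this section can be sketched like this, in Python for brevity. It is a rough illustration rather than the site's JavaScript: wrap_words is a hypothetical name, and measure is a stand-in callback for cvc.measureText(text).width. Here a word moves to the next line as soon as adding it would exceed the cell width, and a single word wider than the cell is left to overflow on its own line.

```python
def wrap_words(text, max_width, measure):
    """Greedily pack words into lines no wider than max_width.

    measure(string) must return the rendered width of the string in the
    same unit as max_width.
    """
    lines = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            lines.append(current)   # the line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


# Toy measurement: every character is 10 units wide. Real fonts are not
# fixed-width, which is exactly why a measuring function is needed.
print(wrap_words("In the beginning God created", 120, lambda s: 10 * len(s)))
```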

Multiple Pages

Next I wanted to add multiple pages. I leave out the details of the JavaScript because it is both complex and uneventful; nothing exciting happened until I tried to create a PDF from the PostScript plaintext. When creating multiple pages the cells simply never lined up. I checked my Ghostscript version and it was very out of date. So I tried to install it from the Debian package repository, which reported I had the latest version. I then checked the version of my Debian container. It had not been updated in seven years. One week later, on an updated Docker container and an updated Ghostscript version, the issue persisted. The problem was that I didn't specify the page size in the PostScript code.

%!PS
<< /PageSize [612 792] >> setpagedevice % Set page size to 8.5x11 
%%Page: 1 1
6 2.25 moveto
300 0 rlineto
0 262.5 rlineto
-300 0 rlineto
0 -262.5 rlineto
closepath
stroke
...
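Assembling a multi-page document around that header might look like the sketch below, written in Python for brevity. The function name build_document is hypothetical. The elided part of the listing above is not shown in this post, so ending each page with showpage (the standard PostScript operator that outputs the current page) is an assumption about what belongs there.

```python
def build_document(page_bodies, width=612, height=792):
    """Join per-page PostScript bodies into one multi-page document.

    width and height are the page size in points (612 x 792 is 8.5x11 in).
    """
    lines = [
        "%!PS",
        f"<< /PageSize [{width} {height}] >> setpagedevice",
    ]
    for number, body in enumerate(page_bodies, start=1):
        lines.append(f"%%Page: {number} {number}")
        lines.append(body)
        lines.append("showpage")   # output this page and start a blank one
    return "\n".join(lines) + "\n"


page = "6 2.25 moveto\n300 0 rlineto\n0 262.5 rlineto\n-300 0 rlineto\nclosepath\nstroke"
print(build_document([page, page]))
```

The resulting text can be saved to a file and converted with a command such as gs -sDEVICE=pdfwrite -o out.pdf in.ps, which may or may not match the command this site runs.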

And that's it, I was done, everything worked. Until it didn't.

The Problem is Fonts

I foolishly added a drop-down selection box to allow users to choose fonts. To determine which fonts to add, I ran a command to get Ghostscript to write a list of supported fonts to a file. I then copied this into JavaScript to create a selection of a lot of different fonts. I noticed the preview didn't work for all of those fonts, but I knew they were supported. When the canvas is given a font it does not recognize, it uses the default font. So I shrugged it off and added a note which said "Preview may not support all fonts." But this had a problem. Remember earlier how I measured the width of a line?

cvc.measureText(text).width;

If you haven't pieced it together yet, my code uses the preview to measure the length of a line. So when Ghostscript renders the PDF with the correct font, the alignment is wrong. I needed to come up with a list of fonts that both Ghostscript and the browser would support. But that has a problem: not all browsers support the same fonts. Now I was looking at having different lists for different browsers. Not satisfied with that answer, I spent the following week scouring Bing and Google for any solution.

Google Web Fonts

Google Fonts is a free service from Google that solves this problem: its fonts can be included in any webpage. The Google Fonts website will generate the HTML required to include a font in an HTML page for you. I provide an example here.

<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=ABeeZee&family=Roboto&display=swap" rel="stylesheet">

But the work isn't done just yet, because that is just two fonts. And while I want many fonts, the provided solution loads the fonts when the page is loaded. I only want to load a font if the user selects it. To restate that, I need a list of Google fonts and a list of urls to fetch those fonts. Glossing over many hours of hopeless searching, I found a GitHub repository, google-fonts-complete, with JSON lists of exactly what I wanted. Well, not exactly what I wanted. That repository has two lists, api-response.json and google-fonts.json. google-fonts.json is 11 MB, where api-response.json is much smaller. The api-response.json does not have urls, so if I used it I would have to construct them. But that was simple enough. I took the first example from Stack Overflow of how to dynamically load a font in JavaScript, and threw it in a function that is called whenever the user changes the font.

// family example: "ABeeZee"
function createFontLink(family) {
    var link = document.createElement("link");
    link.href = "https://fonts.googleapis.com/css2?family=" + family + "&display=block";
    link.rel = "stylesheet";
    document.head.appendChild(link);
}

This worked, kind of. The font was pulled from Google, but the preview didn't change fonts. If I changed the size of the font (which redraws the preview) three times, the preview would change. If I changed away from and back to the font three times, it would change. I learned that there are two ways to fetch a font in JavaScript and I chose the wrong one. I needed to use FontFace, but there is a problem with that. FontFace does not use the nice Google Fonts API. It requires a url that is not easy to construct, a url to a binary font file. The url of ABeeZee looks like this:
https://fonts.gstatic.com/s/abeezee/v22/esDT31xSG-6AGleN2tCUkp8D.woff2
This means I have to resort to the 11 MB google-fonts.json. Using Python I trimmed out all of the data I didn't need, only capturing the woff2 file. I then replaced my old JavaScript with the new FontFace constructor.

// family example: "ABeeZee"
// url example: "https://fonts.gstatic.com/s/abeezee/v22/esDT31xSG-6AGleN2tCUkp8D.woff2"
// page example: 1
function createFontLink(family, url, page) {
    var newFont = new FontFace('"' + family + '"', "url(" + url + ")");
    newFont.load().then((font) => {
        document.fonts.add(font);
        LOADED_FONTS.push(family); // I store loaded fonts so I don't load them twice.
        addWords(page); // This redraws the page
    });
}

And just like that the fonts in the preview are changing! Now that the preview works, I need to install google fonts on my Debian system and get ghostscript to recognize them.

Font Files

With fonts now loading I gleefully showed it off to my friends. They told me the font wasn't changing. My heart sank as I realized I had only tested it in Firefox. In fact, it only worked in Firefox. After consulting Google, it appeared to me that woff2 wasn't supported very well and that most browsers were still using woff. So with the change below I finally got it working. In my list of font names and urls I had only included woff2, so I had to go back and recreate the structure. Once that was done I had to update the FontFace call with multiple urls. Supposedly the browser will only load the file it needs, so I could have included all the font files. But at this point I don't trust the browser with fonts, so I only included woff and woff2. With these two changes the font now changed on Edge, Firefox, and Chrome.

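// option: assumed here to be the selected <option> element, whose "woff" and
// "woff2" attributes hold the font file urls
// page example: 1
function createFontLink(option, page) {
    var fontName = option.innerHTML;
    // FontFace takes its sources as one comma-separated string
    var fontUrls = "url(" + option.getAttribute("woff") + "), url(" + option.getAttribute("woff2") + ")";
    var newFont = new FontFace(fontName, fontUrls);
    newFont.load().then((font) => {
        document.fonts.add(font);
        LOADED_FONTS.push(fontName); // only record the font once it has loaded
        addWords(page); // This redraws the page
    });
}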

Installing Fonts

I found a bash script that promised to install google-fonts on Linux. It didn't work out of the box; I had to make several changes to get it to install correctly. But then I had to add the fonts to Ghostscript. A simple process in theory. There is a file (in Debian, that is), /usr/share/ghostscript/10.00.0/Resource/Init/Fontmap.GS
That file contains a list of fonts, and to add a font I just need to add a line to the end like the one below:

/AbyssinicaSIL	(/usr/share/fonts/truetype/google-fonts/AbyssinicaSIL-Regular.ttf)	;

Don't let the simple looks fool you. It's not simple; fonts are a problem. The font name isn't "AbyssinicaSIL", it is "Abyssinica SIL". And the font file isn't either of those; the font file is "AbyssinicaSIL-Regular.ttf". It gets more complicated still, because there are other font files like "AbyssinicaSIL-Italic.ttf", "WorkSans[wght].ttf", "Snippet.ttf", and so on. Basically, not only do I have to figure out the truncated name, I have to figure out which version of the font file to use. So I use two Python scripts. font_convert.py reduces the google-fonts.json to just the elements I need. font_names.py creates the configuration lines needed to add the fonts to the Ghostscript font map. It searches for the three most common "regular" font file names: "*-Regular.ttf", "*[wght].ttf", and "*[wdth,wght].ttf". If it doesn't find any of those three, it prompts the user with a list of matching files and the user selects the one to use. At the end, the script prints out a list of font names that could not be traced to a font file. Instead of trying to find those files, I chose to just exclude them from the list of font options. You can see this in font_convert.py where I declare a list of "unsupported_fonts."

# font_convert.py
# I used this script to generate a list of family font names and woff2 urls of google web fonts.
#  I grabbed the google-fonts.json from this repo: https://github.com/jonathantneal/google-fonts-complete?tab=readme-ov-file
#  I do this because the google-fonts.json is 11mb, but what I need from it is only a couple hundred kb.
#  Once the list is produced I copy it into the javascript where I want to use it.
import json
unsupported_fonts = ["Arima Madurai","Coda Caption","IM Fell DW Pica","IM Fell DW Pica SC","IM Fell Double Pica SC","IM Fell English","IM Fell English SC","IM Fell French Canon","IM Fell French Canon SC","IM Fell Great Primer","IM Fell Great Primer SC","Kumar One Outline","M PLUS Rounded 1c","Material Icons","Material Icons Outlined","Material Icons Round","Material Icons Sharp","Material Icons Two Tone","Material Symbols Outlined","Material Symbols Rounded","Material Symbols Sharp","Old Standard TT","PT Mono"];
woffUrls = []
with open("google-fonts.json", "r") as fid:
    obj = json.load(fid)
    for key in obj:
        if key in unsupported_fonts:
            continue
        variants = obj[key]["variants"]
        if "normal" not in variants:
            # No "normal" style: use weight 400 of the first style listed
            first_style = variants[list(variants.keys())[0]]
            url = first_style["400"]["url"]["woff2"]
        elif "400" not in variants["normal"]:
            # No regular weight: use the first weight listed
            normal = variants["normal"]
            url = normal[list(normal.keys())[0]]["url"]["woff2"]
        else:
            url = variants["normal"]["400"]["url"]["woff2"]
        woffUrls.append([key, url])
    print(woffUrls)

# font_names.py
# Attempts to match each font name to an installed font file and writes the
# Fontmap lines to the file named by the first command line argument.
import json
import sys

output = sys.argv[1]
with open(output,"w") as output_file:

    font_names=[]
    with open("google-fonts.json","r") as fid:
        obj = json.load(fid)
        for key in obj:
            font_names.append(key)
    file_names=[]
    with open("font_filenames.txt","r") as fontfiles:
        file_names = fontfiles.readlines()

    def normalize_name(name):
        # Normalize the name: convert to lowercase and remove non-alphanumeric characters (including spaces)
        return ''.join(e for e in name if e.isalnum()).lower()

    # Dictionary to store matched pairs with lists
    matched_pairs = {font_name: [] for font_name in font_names}

    # Normalize and match
    for font_name in font_names:
        normalized_font_name = normalize_name(font_name)
        for file_name in file_names:
            normalized_file_name = normalize_name(file_name)
            if normalized_font_name in normalized_file_name:
                matched_pairs[font_name].append(file_name)

    # Write one Fontmap line per font, preferring the "regular" file
    misses=[]
    for font_name, files in matched_pairs.items():
        # A PostScript name such as /AbyssinicaSIL cannot contain spaces
        nospace=font_name.replace(" ","")
        found = False
        for file in files:
            if (normalize_name(font_name+"-Regular") in normalize_name(file)
                or normalize_name(font_name+"[wght].ttf") in normalize_name(file)
                or normalize_name(font_name+"[wdth,wght].ttf") in normalize_name(file) ):
                nort=file.rstrip()
                output_file.write(f"/{nospace}\t(/usr/share/fonts/truetype/google-fonts/{nort})\t;\n")
                found=True
                break
        if not found:
            if len(files) == 1:
                print(f"[{font_name}] Only one indirect match. Setting to {files[0].rstrip()}")
                output_file.write(f"/{nospace}\t(/usr/share/fonts/truetype/google-fonts/{files[0].rstrip()})\t;\n")
            elif len(files) > 0:
                query_string = f"[{font_name}]No *-Regular, *[wght].ttf, or *[wdth,wght].ttf file found, choose one of the following: "
                for i in range(0, len(files)):
                    query_string += (f"\n{i}: {files[i].rstrip()}")
                query_string+="\n"
                opt=""
                # Ask again until the answer is a valid index into files
                while (not opt.isdigit() or int(opt) >= len(files)):
                    opt = input(query_string)
                intOpt = int(opt)
                output_file.write(f"/{nospace}\t(/usr/share/fonts/truetype/google-fonts/{files[intOpt].rstrip()})\t;\n")
            else:
                misses.append(font_name)
                print(f"[{font_name}] no matches in [{files}]")

    print("could not find the following files")
    print(misses)

To say that I am glossing over a few steps would be an understatement. In my attempt to match up font names with their respective font files I encountered several exceptions. Some of those I have not resolved; as it stands, the above Python does not work as intended. Several font names didn't match up with any font files, so I added a list of font names to exclude. Some fonts are matching to their "-Italic" files instead of the correct "-Regular" files. But at this point I am dealing with a familiar problem that is quite mundane.
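One possible way to tighten the matching is to compare whole file names rather than substrings, so a "-Italic" file can never stand in for the regular one. The sketch below is an untested idea, not something this project uses: pick_regular is a hypothetical helper, and it assumes file names follow the patterns mentioned above ("Name-Regular.ttf", "Name[wght].ttf", or plain "Name.ttf").

```python
import os


def normalize_name(name):
    # Lowercase and drop everything that is not a letter or digit
    return "".join(ch for ch in name if ch.isalnum()).lower()


def pick_regular(font_name, file_names):
    """Return the file that looks like the upright regular face, or None."""
    target = normalize_name(font_name)
    for file_name in file_names:
        clean = file_name.strip()
        stem = os.path.splitext(clean)[0]   # drop the ".ttf" extension
        base = stem.split("[")[0]           # drop axis tags such as "[wght]"
        norm = normalize_name(base)
        # Require an exact match, with or without a "-Regular" suffix
        if norm == target or norm == target + "regular":
            return clean
    return None


files = ["WorkSans-Italic[wght].ttf\n", "WorkSans[wght].ttf\n",
         "AbyssinicaSIL-Regular.ttf\n", "Snippet.ttf\n"]
print(pick_regular("Work Sans", files))
```

Fonts whose files follow some other naming scheme would still come back as None and need the manual prompt or the exclusion list.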

Success

I am very impressed with the capabilities of the HTML canvas element. Having fonts provided for free by Google is deserving of praise. I believe if I were to continue adding features and refining my PDF Table project it could eventually turn into a text editor. But we have Google Docs for that. Outside of correcting lingering issues with fonts, which are the problem, I believe this project is a success.