Computer graphics, linux, and more from the cyberwitch's hut
I just spent an evening playing around with pdftk and its form-filling features, and since the process is a bit involved, I'll try to share my findings below.
For the purpose of filling PDF forms, pdftk has two commands that interest us: dump_data_fields_utf8 and fill_form. However, fill_form expects a .fdf file as input. That file is basically a very specific kind of PDF file (FDF stands for Forms Data Format) that is not realistically human-writable. For this reason, we have to add a .fdf creation step to our agenda.
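To give an idea of what such a file contains, here is a minimal Python sketch that writes an FDF document by hand for plain text fields. It is not fdfgen's code and its output won't match fdfgen's byte for byte. The helper names `pdf_string` and `minimal_fdf` are made up for illustration, checkbox values (which FDF normally stores as names such as /Oui rather than strings) are not handled, and non-ASCII text is written as a hex UTF-16BE string with a byte-order mark.

```python
from typing import List, Tuple


def pdf_string(text: str) -> bytes:
    """Encode text as a PDF string object (hypothetical helper)."""
    try:
        raw = text.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII: hex string holding UTF-16BE text prefixed with the FE FF BOM
        utf16 = b"\xfe\xff" + text.encode("utf-16-be")
        return b"<" + utf16.hex().upper().encode("ascii") + b">"
    # ASCII: literal string, with backslashes and parentheses escaped
    escaped = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + escaped + b")"


def minimal_fdf(fields: List[Tuple[str, str]]) -> bytes:
    """Build a bare-bones FDF document for text fields (hypothetical helper)."""
    # One dictionary per field: /T is the field name, /V its value
    entries = b"".join(
        b"<< /T " + pdf_string(name) + b" /V " + pdf_string(value) + b" >>\n"
        for name, value in fields
    )
    return (
        b"%FDF-1.2\n"
        b"1 0 obj\n"
        b"<< /FDF << /Fields [\n" + entries + b"] >> >>\n"
        b"endobj\n"
        b"trailer\n"
        b"<< /Root 1 0 R >>\n"
        b"%%EOF\n"
    )
```

Even this stripped-down version needs escaping rules and a UTF-16 fallback, which is a good hint as to why a library is worth using here.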
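In the end, the high-level view of the process looks like the following: dump the form's fields with pdftk, turn them into a .fdf file, then have pdftk fill the form with it.
The first step, dumping the fields, is pretty simple; pdftk does all the work: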
pdftk form.pdf dump_data_fields_utf8 > fields.flds
If you have a look inside the generated fields.flds file, you'll find a bunch of field metadata, formatted like the following:
---
FieldType: Text
FieldName: Adresse du domicile
FieldNameAlt: Indiquer l'adresse de votre domicile
FieldFlags: 8388610
FieldJustification: Left
---
FieldType: Button
FieldName: distinction Motif 1
FieldNameAlt: Motif 1
FieldFlags: 0
FieldValue: Off
FieldJustification: Left
FieldStateOption: Oui
FieldStateOption: Off
In there, the most interesting lines are FieldName, because that's the name that'll identify the field later while we're filling it, and, in the case of a button, FieldStateOption, because those are the available options you'll need to choose from.
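Since each record is just a run of `Key: value` lines between `---` separators, the dump is also easy to read from Python. A minimal sketch (the function name `parse_flds` is hypothetical) that keeps every value in a list, because keys such as FieldStateOption repeat within a record:

```python
from typing import Dict, List


def parse_flds(text: str) -> List[Dict[str, List[str]]]:
    """Split a dump_data_fields_utf8 dump into one dict per field."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line == "---":
            # A separator closes the previous record (if any)
            if current:
                records.append(current)
            current = {}
            continue
        # Split on the first ": " only, so values containing ": " stay whole
        key, sep, value = line.partition(": ")
        if sep:
            current.setdefault(key, []).append(value)
    if current:
        records.append(current)
    return records
```

For instance, `{r["FieldName"][0]: r.get("FieldStateOption", []) for r in parse_flds(text)}` would map every field name to its available options (an empty list for text fields).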
If you are interested in what the FieldFlags values mean, I compiled a reference here.
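FieldFlags is a bit field defined by the PDF specification: bits 1, 2 and 3 are ReadOnly, Required and NoExport for every field, while the higher bits depend on the field type. A small Python sketch that names the set bits, covering only the common, text and button flags (the tables and the `decode_flags` name are illustrative, not part of pdftk):

```python
from typing import Dict, List

# Bit positions (1-based) from the PDF specification's field flag tables
COMMON_FLAGS: Dict[int, str] = {1: "ReadOnly", 2: "Required", 3: "NoExport"}
TEXT_FLAGS: Dict[int, str] = {
    13: "Multiline", 14: "Password", 21: "FileSelect",
    23: "DoNotSpellCheck", 24: "DoNotScroll", 25: "Comb", 26: "RichText",
}
BUTTON_FLAGS: Dict[int, str] = {
    15: "NoToggleToOff", 16: "Radio", 17: "Pushbutton", 26: "RadiosInUnison",
}


def decode_flags(flags: int, field_type: str) -> List[str]:
    """Name the set bits of a FieldFlags value; unknown bits are ignored."""
    table = dict(COMMON_FLAGS)
    if field_type == "Text":
        table.update(TEXT_FLAGS)
    elif field_type == "Button":
        table.update(BUTTON_FLAGS)
    # Bit n (1-based) has the value 1 << (n - 1)
    return [name for bit, name in sorted(table.items()) if flags & (1 << (bit - 1))]


print(decode_flags(8388610, "Text"))  # ['Required', 'DoNotScroll']
```

So the "Adresse du domicile" field from the dump above is a required text field (8388610 = 8388608 + 2).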
In order to generate the .fdf file, I chose to use the fdfgen python library. However, copying all the field names by hand to create the required array is just tedious, so we'll script this with a bit of awk:
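#!/usr/bin/awk -f
# Turn a pdftk dump_data_fields_utf8 dump into a Python list of
# (name, value) tuples, with button options as trailing comments.
BEGIN {
    FS = ": ";
    printf "fields = [";
}
/^FieldName: / {
    # Take everything after "FieldName: ", so names containing ": " survive
    # https://stackoverflow.com/a/23118210/5309963
    printf "\n    (\"%s\", \"\"),", substr($0, length("FieldName: ") + 1);
}
/^FieldStateOption: / {
    printf " # Opt: \"%s\"", $2;
}
END {
    printf "\n]\n";
}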
This script could be improved to detect the different FieldType values, or even add a comment if the Required flag is set.
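Both improvements need a bitwise test on FieldFlags, which POSIX awk lacks (gawk does have `and()`). A possible Python take on the same job, as a sketch rather than a drop-in replacement: it annotates each entry with its FieldType, marks fields whose Required bit (bit 2, value 2) is set, lists button options, and uses `repr()` so names containing quotes still produce valid Python. The `to_fields_py` name is hypothetical.

```python
from typing import Dict, List

REQUIRED = 1 << 1  # "Required" is bit 2 of FieldFlags in the PDF specification


def to_fields_py(flds_text: str) -> str:
    """Convert a dump_data_fields_utf8 dump into fields.py source code."""
    records: List[Dict[str, List[str]]] = [{}]
    for line in flds_text.splitlines():
        if line == "---":
            records.append({})  # a new field record starts
            continue
        key, sep, value = line.partition(": ")
        if sep:
            records[-1].setdefault(key, []).append(value)

    out = ["fields = ["]
    for rec in records:
        if "FieldName" not in rec:
            continue  # e.g. the empty record before the first "---"
        notes = [rec.get("FieldType", ["?"])[0]]
        if int(rec.get("FieldFlags", ["0"])[0]) & REQUIRED:
            notes.append("required")
        options = rec.get("FieldStateOption", [])
        if options:
            notes.append("options: " + ", ".join(options))
        out.append(f"    ({rec['FieldName'][0]!r}, ''),  # {'; '.join(notes)}")
    out.append("]")
    return "\n".join(out) + "\n"
```

Wrapped in a small script that reads the dump from stdin and prints `to_fields_py(...)` to stdout, it would slot into the same place as the awk script.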
Running this script on the fields.flds file we generated previously should give us a usable Python data file, fields.py:
fields = [
...
("Adresse du domicile", ""),
("distinction Motif 1", ""), # Opt: "Oui" # Opt: "Off"
...
]
Now it's just a matter of entering our values in the second element of each tuple.
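Editing the tuples by hand works fine. An alternative, sketched here with a hypothetical `fill_values` helper, is to keep the answers in a dict keyed by field name and merge them into the generated list, failing loudly on names that don't exist in the form so a typo is caught before pdftk ever sees it:

```python
from typing import Dict, List, Tuple


def fill_values(fields: List[Tuple[str, str]],
                values: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return a copy of fields with the values from the dict filled in."""
    known = {name for name, _ in fields}
    unknown = sorted(set(values) - known)
    if unknown:
        raise KeyError(f"fields not found in the form: {unknown}")
    # Fields missing from the dict keep whatever value they already had
    return [(name, values.get(name, old)) for name, old in fields]


fields = [("Adresse du domicile", ""), ("distinction Motif 1", "")]
fields = fill_values(fields, {"distinction Motif 1": "Oui"})
```

This keeps the generated fields.py untouched, so it can be regenerated if the form changes.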
Taking a liberal amount of inspiration from the example code of fdfgen, we write the following script, in which the error checking is "left as an exercise to the reader":
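#!/usr/bin/env python3
import sys

from fdfgen import forge_fdf

from fields import fields  # the fields.py list, with our values filled in

# forge_fdf() returns the FDF document as bytes, hence the "wb" mode
fdf = forge_fdf(fdf_data_strings=fields)
with open(sys.argv[1], "wb") as f:
    f.write(fdf)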
Given the appropriate first argument (the path of the .fdf file to write, e.g. fields.fdf), running this script should generate a .fdf file suitable for use with pdftk.
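As for the error checking left to the reader, here is one possible starting point, assuming the allowed button states are collected from the FieldStateOption lines of the dump. `check_buttons` and `output_path` are hypothetical helpers, and an empty value is treated as "leave this field alone" rather than as an error.

```python
import sys
from typing import Dict, List, Sequence, Tuple


def check_buttons(fields: List[Tuple[str, str]],
                  options: Dict[str, List[str]]) -> List[str]:
    """List values that are not among a button's FieldStateOption entries."""
    problems = []
    for name, value in fields:
        allowed = options.get(name)
        # Only fields with known options are checked; "" means "not filled in"
        if allowed is not None and value != "" and value not in allowed:
            problems.append(f"{name!r}: {value!r} is not one of {allowed}")
    return problems


def output_path(argv: Sequence[str]) -> str:
    """Return the .fdf path from the command line, or exit with a usage line."""
    if len(argv) != 2 or not argv[1].endswith(".fdf"):
        sys.exit(f"usage: {argv[0]} OUTPUT.fdf")
    return argv[1]
```

In the script above, `sys.argv[1]` would become `output_path(sys.argv)`, and a non-empty result from `check_buttons` would be a good reason to stop before calling `forge_fdf`.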
Now all the pieces are coming together! All that is left to do is to tell pdftk to take our brand spanking new .fdf file and use it to fill the original pdf form with all of our precious data:
pdftk form.pdf fill_form fields.fdf output filled_form.pdf
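If the whole thing ends up in a Python script anyway, this last step can be run from there too. A minimal sketch using subprocess; the function names are hypothetical, and the optional `flatten` keyword asks pdftk to merge the field data into the page content, so the result is no longer an editable form:

```python
import subprocess
from typing import List


def fill_command(form: str, fdf: str, output: str, flatten: bool = False) -> List[str]:
    """Build the pdftk fill_form command line as an argument list."""
    cmd = ["pdftk", form, "fill_form", fdf, "output", output]
    if flatten:
        cmd.append("flatten")  # bake the values into the pages
    return cmd


def fill_form(form: str, fdf: str, output: str, flatten: bool = False) -> None:
    """Run pdftk, raising CalledProcessError if it exits with an error."""
    subprocess.run(fill_command(form, fdf, output, flatten), check=True)
```

Calling `fill_form("form.pdf", "fields.fdf", "filled_form.pdf")` runs the same command as above. Passing a list instead of a shell string means file names with spaces need no quoting.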
Lo and behold, the result:
And if you ended up here because of a genuine need to fill PDF forms via the CLI, I wish you a lot of luck, and may the Force be with you!
http://www.myown1.com/linux/pdf_formfill.shtml
https://github.com/ccnmtl/fdfgen/