PDF fields with period "." do not work

Oct 8, 2014 at 8:46 PM
Great program, but I'm unable to populate PDF fields with a "." in their name (e.g. First.Name). Any solution or workaround would be greatly appreciated.

Thank you!

Jeremy
Coordinator
Oct 10, 2014 at 2:08 PM
Edited Oct 20, 2014 at 11:42 AM
I believe this is a limitation of PDF form field capabilities, but I'm not 100% sure. My understanding is that fillable PDF forms must have field names that follow XML element naming guidelines, and that those names are case-sensitive. Here are some rules for PDF field names:
  1. Names can contain letters, numbers, and other characters
  2. Names cannot start with a number or punctuation character
  3. Names cannot contain spaces
There are also some additional XML guidelines you should be aware of:
  • Avoid "-". If you name something "first-name", some software may think you want to subtract name from first.
  • Avoid ".". If you name something "first.name", some software may think that "name" is a property of the object "first."
  • Avoid ":". Colons are reserved to be used for something called namespaces.
I think in this case either the PDF file format doesn't support the "." character in field names, or the PDF library this project uses doesn't. In any case, my software treats the entire name as a single string, so there's nothing I can do to change the behavior.
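The naming guidelines listed above can be turned into a quick pre-flight check for a spreadsheet's column headers. This is a minimal sketch only; FieldNameGuidelines and Check are hypothetical names, and the rules encode this post's understanding of the guidelines rather than an excerpt from the PDF specification.

```csharp
using System.Collections.Generic;

// Hypothetical checker for the naming guidelines listed in the post.
// The rules reflect the post's stated understanding, not the PDF spec.
public static class FieldNameGuidelines
{
    // Returns human-readable warnings; an empty list means no rule was tripped.
    public static List<string> Check(string name)
    {
        var warnings = new List<string>();
        if (string.IsNullOrEmpty(name))
        {
            warnings.Add("name is empty");
            return warnings;
        }

        // Rule 2: cannot start with a number or punctuation character.
        char first = name[0];
        if (char.IsDigit(first) || char.IsPunctuation(first))
            warnings.Add("starts with a number or punctuation character");

        // Rule 3: cannot contain spaces.
        if (name.Contains(" "))
            warnings.Add("contains a space");

        // Characters some software may misinterpret (the bullet list above).
        if (name.Contains("-"))
            warnings.Add("contains '-' (may be read as subtraction)");
        if (name.Contains("."))
            warnings.Add("contains '.' (may be read as object.property)");
        if (name.Contains(":"))
            warnings.Add("contains ':' (reserved for XML namespaces)");

        return warnings;
    }
}
```

For example, First.Name trips only the "." guideline, while a header like "1st name" trips two rules.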
Oct 10, 2014 at 7:47 PM
These are government-created PDF forms that I cannot change. The funny thing is that this hasn't been an issue with any other Excel-to-PDF mail merge tool I've tried; I was just hoping to use yours because it is the simplest, most elegant program I could find for the purpose. Thank you for the detailed response.
Dec 23, 2014 at 1:09 AM
I think this might explain my problem with check boxes (the question I posted in another thread): those fields contained periods.
Mar 9, 2015 at 4:52 PM
Edited Mar 9, 2015 at 4:53 PM
Hi all,

I think I'm also running into the same problem with IRS XFA forms that have duplicate field names.
With other programs, I was able to use fully qualified field names such as X[0].Y[0].F and A[0].B[0].F,
but they don't work with this program. I also find this program the most elegant to use.
Coordinator
Mar 10, 2015 at 7:08 AM
@19345y85: Sorry, I don't think this software can support arrays of PDF form fields like that. The problem is that PDFs don't inherently name duplicate fields with [0] or [1] the way you're used to; PDFs just have multiple fields with the same name. If I recall correctly, Adobe Acrobat references these with a #0 suffix: if you had two fields named "X", Acrobat would reference them as X#0, X#1, and so on.

I don't believe #0 will solve your problem; I'm just making the point that these syntaxes are manufactured by your PDF reader and are not native to the PDF file format.


In theory I could try to support array notation like [0], but it's not high on the priority list.
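Supporting the array notation would at least mean parsing a fully qualified name such as X[0].Y[0].F into its parts. The following is a minimal sketch of that parsing step only; the class and member names are hypothetical, and it does not look anything up in a PDF or translate the parts into Acrobat's #n suffixes.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical parser for SOM-style names like "X[0].Y[0].F".
public static class QualifiedFieldName
{
    // One segment such as "X[0]" (Name "X", Index 0) or "F" (Index null).
    public sealed class Segment
    {
        public string Name { get; }
        public int? Index { get; }
        public Segment(string name, int? index) { Name = name; Index = index; }
    }

    // A segment is a name without '.', '[' or ']', optionally followed by [digits].
    private static readonly Regex SegmentPattern =
        new Regex(@"^([^.\[\]]+)(?:\[(\d+)\])?$");

    // Splits "X[0].Y[0].F" into (X,0), (Y,0), (F,null).
    // Returns null if any segment does not match the pattern.
    public static List<Segment> Parse(string qualifiedName)
    {
        var result = new List<Segment>();
        foreach (string part in qualifiedName.Split('.'))
        {
            Match m = SegmentPattern.Match(part);
            if (!m.Success) return null;
            int? index = m.Groups[2].Success ? int.Parse(m.Groups[2].Value) : (int?)null;
            result.Add(new Segment(m.Groups[1].Value, index));
        }
        return result;
    }
}
```

An empty segment (as in X[0]..F) makes Parse return null rather than guess.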
Mar 10, 2015 at 4:10 PM
Edited Mar 10, 2015 at 4:16 PM
Hi TigerC10,

I think what you're describing may apply to older PDF formats; the forms we use are the newer XFA (XML-based) PDFs. I downloaded the project and found that the DataTable data reader was converting [ ] and . in the template's header names to ( ) and #.
I tried a quick fix in Merger.cs:
                        // DataTable structure was replacing X[0].Y[0] to X(0)#Y(0), so we need to replace it back to handle fully qualified field names of XFAs
                        pdfFormFields.SetField(field.ToString().Replace('(', '[').Replace(')', ']').Replace('#','.'), row[field].ToString());
and it looks like it's working.
I've submitted the new Merger.cs as a patch to this project.
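To make the round trip concrete, here is a sketch that pairs the renaming described above with the first version of the fix. SimulateOleDbRename is a hypothetical stand-in that only mimics the substitutions reported in this thread; it is not the provider's code.

```csharp
// Hypothetical helpers illustrating the first version of the patch.
public static class HeaderManglingV1
{
    // Mimics the substitutions reported in the thread:
    // '[' -> '(', ']' -> ')', '.' -> '#'.
    public static string SimulateOleDbRename(string header) =>
        header.Replace('[', '(').Replace(']', ')').Replace('.', '#');

    // First version of the Merger.cs fix: reverse every substitution blindly.
    public static string Restore(string columnName) =>
        columnName.Replace('(', '[').Replace(')', ']').Replace('#', '.');
}
```

With this mapping, the plain dotted name from the first post (First.Name) would arrive as First#Name and be restored correctly, but an Acrobat-style name like foo#1 would be turned into foo.1.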

Thanks for your help in making a great product. Now, we can generate thousands of employee IRS forms without issue from our Cognos report!
Mar 10, 2015 at 5:43 PM
I slightly modified the replace so that it should also handle older PDF formats with field names like foo#1 and foo#2.
                        // DataTable structure was replacing X[0].Y[0] to X(0)#Y(0), so we need to replace it back to handle fully qualified field names of XFAs
                        pdfFormFields.SetField(field.ToString().Replace('(', '[').Replace(")#", "].").Replace(')',']'), row[field].ToString());
I've re-uploaded the Merger.cs file with this change.
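The effect of the revised replace shows up clearly on a few sample names. A minimal sketch, with PatchV2 as a hypothetical wrapper around the line above:

```csharp
// Hypothetical wrapper around the revised Merger.cs replace.
public static class PatchV2
{
    // Only a '#' directly following ')' is turned back into '.',
    // so Acrobat-style names such as "foo#1" are left alone.
    public static string Restore(string columnName) =>
        columnName.Replace('(', '[').Replace(")#", "].").Replace(')', ']');
}
```

The trade-off: foo#1 now survives, but a dotted name without brackets, such as First#Name (originally First.Name), is no longer converted back, because its # does not follow a ).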
Coordinator
Mar 11, 2015 at 7:18 AM
Thanks for submitting your patch!

I took a look at this and I found that it's not the DataTable structure that's doing it. It's Microsoft's OleDbDataAdapter that's converting the square brackets.

See: http://stackoverflow.com/questions/13882846/import-a-excel-data-to-the-datatable-via-oledbdataadapter-replaces-square-bracke

Therefore, the fix for this belongs in the DataReaderExcel.cs rather than Merger.cs but I think you're on the right track.

Your patch is, however, a breaking change: I believe current users of this app can use parentheses in their Excel column names and PDF field names (though I haven't confirmed it yet). If this change were made, we should at the very least write a new test case to confirm that parentheses in field names are not negatively affected by the new code.
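The regression described here can be checked without a PDF at all, by running the patched replace over column names that contain parentheses. A minimal sketch, assuming (as the post does, unconfirmed) that parentheses pass through OleDbDataAdapter unchanged; the class name and sample names are hypothetical:

```csharp
using System;

// Hypothetical regression check for parentheses in column names.
public static class ParenthesisRegressionCheck
{
    // Copy of the replace from the patch under review.
    private static string PatchedFieldName(string columnName) =>
        columnName.Replace('(', '[').Replace(")#", "].").Replace(')', ']');

    // Column names a current user might legitimately use (made-up examples).
    private static readonly string[] ParenthesisNames = { "Total(USD)", "Phone(Home)" };

    // Returns the names the patch would alter; an empty array means no regression.
    public static string[] FindBrokenNames() =>
        Array.FindAll(ParenthesisNames, n => PatchedFieldName(n) != n);
}
```

Both sample names come back altered (Total(USD) becomes Total[USD]), which is the breaking change in question. A real test case would also exercise the full Excel-to-PDF merge.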

I think more thought needs to go into how we address the OleDbDataAdapter problem. I'll keep thinking it over, but in the meantime I'm declining the patch. Please, by all means, feel free to move the fix to the correct file, add a new test case, and submit again.
Mar 11, 2015 at 10:20 PM
Edited Mar 11, 2015 at 10:21 PM
I've resubmitted the patch with new code that reads in the actual header names.
Basically, I stored the original field names in the Caption property of the DataTable's columns and reused them in Merger.cs.

Hopefully this helps those who, like me, want to use [ ] and . notation when merging with XFA PDFs, while still satisfying your requirement of allowing foo(x) field names in PDFs.
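The Caption approach can be sketched with System.Data alone. It relies on DataColumn.Caption returning ColumnName when no caption has been set, so columns without a stored header keep their current behavior. The class and method names are hypothetical, and reading the original header from the workbook is left out.

```csharp
using System.Data;

// Hypothetical sketch of storing the original header in DataColumn.Caption.
public static class CaptionApproach
{
    // Reader side: after the adapter has renamed the column, record the
    // original header (obtained separately) in Caption.
    public static void RememberHeader(DataColumn column, string originalHeader)
    {
        column.Caption = originalHeader;
    }

    // Merger side: Caption falls back to ColumnName when it was never set.
    public static string PdfFieldName(DataColumn column) => column.Caption;
}
```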

Thanks,
Coordinator
Mar 12, 2015 at 12:09 AM
Edited Mar 12, 2015 at 12:18 AM
I see what you did there! This is looking better. I'm still not a fan of leaking the logic into Merger.cs, though, mostly because "the dream" was to eventually write a CSV parser that wouldn't have these issues. Merger.cs would still execute on a DataSet that could be read in more easily, and I don't want to force a Caption to be set for every record.

Is there a reason you're creating a second OLEDB connection? You should just be able to re-use the existing connection "workbook" right?

So it seems that DataColumn doesn't like to see [ and ] in a ColumnName, which is why OleDbDataAdapter has to convert those invalid characters. That makes sense now.

So rather than using a Caption, we could use the column's ExtendedProperties to set an alias for the column. Then if, and only if, the alias property is set on the DataColumn, we use the alias value rather than the column name itself. I'm starting to like this a lot.
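The ExtendedProperties idea might look like the sketch below. ExtendedProperties is a PropertyCollection (a Hashtable) available on every DataColumn; the key name PdfFieldAlias and the helper names are hypothetical. The reader sets the alias only when the name was changed, and the merger falls back to ColumnName otherwise.

```csharp
using System.Data;

// Hypothetical sketch of an alias stored in DataColumn.ExtendedProperties.
public static class ColumnAlias
{
    // Made-up key; the thread does not name one.
    public const string AliasKey = "PdfFieldAlias";

    // Reader side (e.g. DataReaderExcel.cs): only set the alias when the
    // original header differs from the ColumnName the adapter produced.
    public static void SetAliasIfMangled(DataColumn column, string originalHeader)
    {
        if (originalHeader != column.ColumnName)
            column.ExtendedProperties[AliasKey] = originalHeader;
    }

    // Merger side: use the alias if, and only if, it is present.
    public static string PdfFieldName(DataColumn column) =>
        column.ExtendedProperties.ContainsKey(AliasKey)
            ? (string)column.ExtendedProperties[AliasKey]
            : column.ColumnName;
}
```

A future CSV reader that never renames columns would simply never set the alias, so Merger.cs would not need to know which reader produced the DataSet.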