VBA interacting with PDF

kit_sune

New member
Joined
Jan 14, 2014
Messages
6
Reaction score
0
Points
0
Excel Version(s)
2010
I was wondering if there was a way to list out the names of the fields within a pdf form. In my office we are required to retain a specific form for up to a year. We review these forms on a quarterly basis to verify the users of our database. It's a tedious process. We don't know the names of any of the fields however. Eventually I would like to iterate through all the pdf files in a folder (something I'm already doing to list the names of the files). My first focus is to get the names of the fields from the pdf so I can actually work with them. I pulled (and modified) the code below from another website while trying to figure out if this is possible.

I had to enable the Adobe Arobat 10.0 Type library. I know there are errors in the code below but I wanted to illustrate what the goal is. When I run it, it tells me "Object required"

Code:
Private Sub ListPDFFieldNames()
    Dim AcroApp As Acrobat.CAcroApp
    Dim theForm As Acrobat.CAcroPDDoc
    Dim jso As Object
    Dim rowNum As Integer
    
    Set AcroApp = CreateObject("AcroExch.App")
    Set theForm = CreateObject("AcroExch.PDDoc")
    theForm.Open ("C:\Users\me\Documents\sampleForm.pdf")
    Set jso = theForm.GetJSObject
    
    rowNum = 1
    For Each field In jso
        Range("A" & rowNum) = field.Name
        rowNum = rowNum + 1
    Next field
    
    theForm.Close

    AcroApp.Exit
    Set AcroApp = Nothing
    Set theForm = Nothing
    MsgBox "Done"
End Sub

Thank you for your time.
~Kit
 
Welcome to the forum!

Obviously, the Acrobat (not the Reader) object is the best 3rd party program to do that. As such, you must reference Acrobat int Tools > References... which looks like that was what you want and did.

I have not used this method in a while. Note the 2nd object to reference. Note how that after a run, the Debug.Print results will be in the Immediate window.
Code:
' http://www.excelhero.com/blog/2010/04/excel-acrobat-pdf-form-filler.html
Public Sub ListPDF_Fields()
             
    Dim AcroExchAVDoc As CAcroAVDoc
    Dim AcroExchApp As CAcroApp
    Dim AFORMAUT As AFORMAUTLib.AFormApp
    Dim FormField As AFORMAUTLib.Field
    Dim FormFields As AFORMAUTLib.Fields
    Dim bOK As Boolean
    Dim sFields As String
    Dim sTypes As String
    Dim sFieldName As String
                             
    ' For this procedure to work, computer must have a full version
    ' of Adobe Acrobat installed. Also, a reference to the following
    ' Type Libraries must be made:
    '     AFormAut 1.0
    '     Adobe Acrobat 7.0 (or newer)
    
    On Error GoTo ErrorHandler
    
    Set AcroExchApp = CreateObject("AcroExch.App")
    Set AcroExchAVDoc = CreateObject("AcroExch.AVDoc")
    bOK = AcroExchAVDoc.Open(ActiveWorkbook.Path & "\" & PDF_FILE, "")
    AcroExchAVDoc.BringToFront
    AcroExchApp.Hide
    
    If (bOK) Then
        Set AFORMAUT = CreateObject("AFormAut.App")
        Set FormFields = AFORMAUT.Fields
        For Each FormField In FormFields
            With FormField
                sFieldName = .Name
                If .IsTerminal Then
                    If sFields = "" Then
                        sFields = .Name
                        sTypes = .Type
                    Else
                        sFields = sFields + "," + .Name
                        sTypes = sTypes + "," + .Type
                    End If
                End If
            End With
        Next FormField
        AcroExchAVDoc.Close True
    End If
    Debug.Print sFields
    Debug.Print sTypes
   
    Set AcroExchAVDoc = Nothing
    Set AcroExchApp = Nothing
    Set AFORMAUT = Nothing
    Set Field = Nothing
    Exit Sub
         
ErrorHandler:
    MsgBox "FieldList Error: " + str(Err.Number) + " " + Err.Description + " " + Err.Source
    
End Sub
 
Wonderful!! This worked so well! I didn't know what the Immediate window was beforehand, but I figured it out. I appreciate your help!

~Kit
 
Ok, on to the next step!

The end goal is to iterate through the PDF files in a folder and populate a range within Excel with four specific values from each pdf file (At the moment I have just two). I combined pieces from three separate projects that I had to try to accomplish this. I'm not fully sure what I'm doing as I'm still learning all this stuff, but below is what I've put together so far. I've marked where excel errors out at. The error message I get is this:
"FieldList Error: 91 Object variable or With block variable not set VBAProject"

I'm not entirely sure what it means however, could someone point out what I need to address to change this?
Thank you...
~Kit
Code:
Private Sub ListPDFData()
    ' For this procedure to work, computer must have a full version
    ' of Adobe Acrobat installed. Also, a reference to the following
    ' Type Libraries must be made:
    '     AFormAut 1.0
    '     Adobe Acrobat 7.0 (or newer)
             
    Dim AcroExchAVDoc As CAcroAVDoc
    Dim AcroExchApp As CAcroApp
    Dim AFORMAUT As AFORMAUTLib.AFormApp
    Dim FormField As AFORMAUTLib.Field
    Dim FormFields As AFORMAUTLib.Fields
    Dim bOK As Boolean
    Dim theForm As Acrobat.CAcroPDDoc
    Dim objFSO As Object
    Dim objFolder As Object
    Dim objFile As Object
    Dim i As Integer
    Dim dir As String
    Dim jso As Object

    'Create an instance of the FileSystemObject
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    'Get the folder object

    Set objFolder = objFSO.GetFolder("C:\Users\me\Documents\TEMP")
    Set theForm = CreateObject("AcroExch.PDDoc")
    
    i = 2

    
    On Error GoTo ErrorHandler
    
    Set AcroExchApp = CreateObject("AcroExch.App")
    Set AcroExchAVDoc = CreateObject("AcroExch.AVDoc")
    'loops through each file in the directory
    For Each objFile In objFolder.Files
        bOK = AcroExchAVDoc.Open(objFile, "")
        AcroExchAVDoc.BringToFront
        AcroExchApp.Hide
        Set jso = theForm.GetJSObject
        If (bOK) Then
            Set AFORMAUT = CreateObject("AFormAut.App")
            Set FormFields = AFORMAUT.Fields
    
'The run errors out here************************************
            Cells(i, 1) = jso.getField("name").Value

            Cells(i, 2) = jso.getField("userid").Value

            Cells(i, 3) = jso.getField("clr_level").Value

            Cells(i, 4) = jso.getField("userdate").Value
            AcroExchAVDoc.Close True
            i = i + 1
        End If
    Next objFile
   
    Set AcroExchAVDoc = Nothing
    Set AcroExchApp = Nothing
    Set AFORMAUT = Nothing
    Set Field = Nothing
    Exit Sub
         
ErrorHandler:
    MsgBox "FieldList Error: " + Str(Err.Number) + " " + Err.Description + " " + Err.Source
End Sub
 
Last edited:
I see 3 problems, maybe. (1) Is it a PDF file that was found? (2) Does it have a filed named name? (3) PDF file was flattened.

There are ways to just iterate PDF files. For now, you can try looking at the file extension. e.g.
Code:
'in the for loop
If Right(lcase(object), 3)="pdf" then ...
'or
If lcase(object) Like "*.pdf" then ...

One could get all of the content of the PDF but if the file was not flattened though it sounds like fields might work for you.
 
I'm not sure what you mean by a flattened file - is that where you print the pdf to make it uneditable, like a big picture? Each of the two files are still editable (Although they have been signed digitally).

I first tested to see if I could pull the specific data fields with the following code, which worked like a charm:

Code:
Private Sub ReadPDF()
    Dim AcroApp As Acrobat.CAcroApp
    Dim theForm As Acrobat.CAcroPDDoc
    Dim jso As Object
    Dim text1, text2, text3, text4 As String
    Set AcroApp = CreateObject("AcroExch.App")
    Set theForm = CreateObject("AcroExch.PDDoc")
    theForm.Open ("C:\Users\me\Documents\TEMP\form1.pdf")
    Set jso = theForm.GetJSObject
    ' get the information from the form fields
    text1 = jso.getField("name").Value
    text2 = jso.getField("userid").Value
    text3 = jso.getField("clr_level").Value
    text4 = jso.getField("userdate").Value

    MsgBox "Values read from PDF: " & text1 & " " & text2 & " " & text3 & " " & text4
    theForm.Close
    AcroApp.Exit
    Set AcroApp = Nothing
    Set theForm = Nothing

End Sub
I did test out your lcase (second option) and was still receiving the same error as before at the same spot.
Could it be the "Cells" part that's not compatible with the rest of it?
 
Last edited:
Yes, flattened means no text readable content.

The only problem that I see with cells is that "i" was not Dimmed as LONG. For most cases, that will not be a problem. These days, even integers are ok to be Dimmed as LONG.

I stole, errrr have, a routine that sort of does what this macro does but iterates all pdf files in a folder and shows field information in a worksheet.

Here is some code that you have probably seen. You have to change the pdf filename and the ThisWorkbook.Path if that is not the path to the pdf file. Otherwise, it does a nice job showing your field information less the values in the Immediate window after a run. Enable it in the VBE View menu if not active. I use it often.

Code:
Public Sub ListPDF_Fields()             
    Dim AcroExchAVDoc As CAcroAVDoc
    Dim AcroExchApp As CAcroApp
    Dim AFORMAUT As AFORMAUTLib.AFormApp
    Dim FormField As AFORMAUTLib.Field
    Dim FormFields As AFORMAUTLib.Fields
    Dim bOK As Boolean
    Dim sFields As String
    Dim sTypes As String
    Dim sFieldName As String
    Dim PDF_FILE As String
    
    PDF_FILE = "aspiretest.pdf"
                             
    ' For this procedure to work, computer must have a full version
    ' of Adobe Acrobat installed. Also, a reference to the following
    ' Type Libraries must be made:
    '     AFormAut 1.0
    '     Adobe Acrobat 7.0 (or newer)
    
    On Error GoTo ErrorHandler
    
    Set AcroExchApp = CreateObject("AcroExch.App")
    Set AcroExchAVDoc = CreateObject("AcroExch.AVDoc")
    bOK = AcroExchAVDoc.Open(ThisWorkbook.Path & "\" & PDF_FILE, "")
    AcroExchAVDoc.BringToFront
    AcroExchApp.Hide
    
    If (bOK) Then
        Set AFORMAUT = CreateObject("AFormAut.App")
        Set FormFields = AFORMAUT.Fields
        For Each FormField In FormFields
            With FormField
                sFieldName = .Name
                If .IsTerminal Then
                    If sFields = "" Then
                        sFields = .Name
                        sTypes = .Type
                    Else
                        sFields = sFields + "," + .Name
                        sTypes = sTypes + "," + .Type
                    End If
                End If
            End With
        Next FormField
        AcroExchAVDoc.Close True
    End If
    Debug.Print sFields
    Debug.Print sTypes
   
    Set AcroExchAVDoc = Nothing
    Set AcroExchApp = Nothing
    Set AFORMAUT = Nothing
    'Set Field = Nothing
    Exit Sub
         
ErrorHandler:
    MsgBox "FieldList Error: " + Str(Err.Number) + " " + Err.Description + " " + Err.Source
    
End Sub
 
Hello Kenneth and Kit,

I am attempting to run the ListPDF_Fields() and get a the following error. Screen Shot 2019-12-20 at 12.33.46 AM.png

The file is located on my desktop at C:\Users\braydenfarnsworth\Desktop and is named ARAA Form. Any help would be greatly appreciated!
 
Welcome to the forum!

Is that path in the current workbook and named aspiretest.pdf as set in my example? If you changed those parts, I suspect that the file is password protected.

For password protected files, other 3rd party programs might work better.
 
Thank you. The forum has been a great help thus far.

I have updated the code to include the new path and file name, see below for details. The file is not password protected. What other 3rd party programs do you recommend?

Code:
bOK = AcroExchAVDoc.Open("C:\Users\braydenfarnsworth\Desktop" & PDF_FILE, "Agent Registration Attestation and Acknowledgement Form")
 
Acrobat should be fine if not password protected and the file is not flattened. If you put this before that line and run it, what does the Immediate Window show?

Code:
Debug.Print [COLOR=#333333] "C:\Users\braydenfarnsworth\Desktop" & PDF_FILE[/COLOR]

If that path exists, then this should have worked barring the exceptions noted.
Code:
[COLOR=#333333]bOK = AcroExchAVDoc.Open("C:\Users\braydenfarnsworth\Desktop" & PDF_FILE, "")[/COLOR]

If not sensitive, you could attach the pdf file here or post a link.
 
Of course Acrobat is best most times.

For those without Acrobat, I have used this with some success.
Code:
Sub Main()    
    Dim fdf As String
    'fdf = "C:\Users\lenovo1\Dropbox\wp\pdf\FDF\39325.fdf"
    'fdf = "C:\Users\lenovo1\Dropbox\wp\pdf\FDF\39325C.fdf"
    'fdf = "C:\Users\lenovo1\Dropbox (Personal)\_Excel\pdf\FDF\wp array method\UACSIR Sample.pdf"
    fdf = "C:\Users\lenovo1\Dropbox (Personal)\_Excel\pdf\FDF\wp array method\doc.unc.pdf"
    mba GetFDFdata(fdf, "f")                       'Show fieldnames in fdf
    mba GetFDFdata(fdf, "d")                      'Show data values in fdf
End Sub


Function GetFDFdata(fdf As String, fod As String)
    Dim a, sB As Integer, eB As Integer, fd As String, b
    a = FileToArray(fdf)                              'Array to hold each line of fdf file.
    sB = InStr(a(2), "[") + 1                          'Begin location for fieldnames and data string.
    eB = InStr(a(2), "]")                              'End location for fieldnames and data string.
    fd = Mid(a(2), sB, eB - sB)              'Fieldnames and data string.
    b = Split(fd, "<</T")   'fd string to array b delimited by "<</T"
    b = ArrayFirstElementStrip(b)                            'Remove first element which is null.
    GetFDFdata = ForD(b, fod)
End Function




'"f" as 2nd parameter means return fieldnames in array.
'"d" or other value as 2nd parameter means return data in array.
Function ForD(fd, fod As String)
    Dim a, i As Integer, b
    a = fd  'makes a same size
    For i = 0 To UBound(a)
        '(tbPhone)/V(\(444\) 444-4444)>>hr(tbZip)/V(44444-4444)>>
        b = Split(a(i), "/V")            'element 1 is fieldname part, 2 is data part.
        If fod = "f" Then 'parse fieldnames
            a(i) = Right(b(0), Len(b(0)) - 1)         'Trim first "("
            a(i) = Left(a(i), Len(b(0)) - 2)          'Trim last ")"
            Else    'parse data values
                a(i) = Right(b(1), Len(b(1)) - 1)     'Trim first "("
                a(i) = Left(a(i), Len(b(1)) - 4)      'Trim last ")>>"
                a(i) = Replace(a(i), "\(", "(")  'Replace escaped ()'s characters
                a(i) = Replace(a(i), "\)", ")")
                a(i) = Replace(a(i), "\r", vbCrLf)             'Replace escaped vbcrlf characters (multiline)
        End If
    Next i
    ForD = a
End Function


Function FileToArray(fn)    'Read File into array and return array
    Dim a, f As Object, fso As Object
    'Const ForReading = 1, ForWriting = 2
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set f = fso.OpenTextFile(fn, 1)
    a = Split(f.ReadAll, vbCrLf)
    Set fso = Nothing
    FileToArray = a
End Function


Sub mba(a)
    MsgBox Join(a, vbCrLf)
End Sub


Function ArrayFirstElementStrip(a)
    Dim i As Long, b
    ReDim b(LBound(a) To UBound(a) - 1)
    For i = LBound(b) To UBound(b)
        b(i) = a(i + 1)
    Next i
    ArrayFirstElementStrip = b
End Function
 
Welcome to the forum!

Obviously, the Acrobat (not the Reader) object is the best 3rd party program to do that. As such, you must reference Acrobat int Tools > References... which looks like that was what you want and did.

I have not used this method in a while. Note the 2nd object to reference. Note how that after a run, the Debug.Print results will be in the Immediate window.
Code:
' http://www.excelhero.com/blog/2010/04/excel-acrobat-pdf-form-filler.html
Public Sub ListPDF_Fields()
            
    Dim AcroExchAVDoc As CAcroAVDoc
    Dim AcroExchApp As CAcroApp
    Dim AFORMAUT As AFORMAUTLib.AFormApp
    Dim FormField As AFORMAUTLib.Field
    Dim FormFields As AFORMAUTLib.Fields
    Dim bOK As Boolean
    Dim sFields As String
    Dim sTypes As String
    Dim sFieldName As String
                            
    ' For this procedure to work, computer must have a full version
    ' of Adobe Acrobat installed. Also, a reference to the following
    ' Type Libraries must be made:
    '     AFormAut 1.0
    '     Adobe Acrobat 7.0 (or newer)
   
    On Error GoTo ErrorHandler
   
    Set AcroExchApp = CreateObject("AcroExch.App")
    Set AcroExchAVDoc = CreateObject("AcroExch.AVDoc")
    bOK = AcroExchAVDoc.Open(ActiveWorkbook.Path & "\" & PDF_FILE, "")
    AcroExchAVDoc.BringToFront
    AcroExchApp.Hide
   
    If (bOK) Then
        Set AFORMAUT = CreateObject("AFormAut.App")
        Set FormFields = AFORMAUT.Fields
        For Each FormField In FormFields
            With FormField
                sFieldName = .Name
                If .IsTerminal Then
                    If sFields = "" Then
                        sFields = .Name
                        sTypes = .Type
                    Else
                        sFields = sFields + "," + .Name
                        sTypes = sTypes + "," + .Type
                    End If
                End If
            End With
        Next FormField
        AcroExchAVDoc.Close True
    End If
    Debug.Print sFields
    Debug.Print sTypes
  
    Set AcroExchAVDoc = Nothing
    Set AcroExchApp = Nothing
    Set AFORMAUT = Nothing
    Set Field = Nothing
    Exit Sub
        
ErrorHandler:
    MsgBox "FieldList Error: " + str(Err.Number) + " " + Err.Description + " " + Err.Source
   
End Sub
Hi Kenneth,

I came across your code here. It worked perfectly, but for some reason, it did not recognize the fields in this form https://www.irs.gov/forms-pubs/about-form-8986. Any thoughts on the possible reasons? Appreciate your help in advance! Thank you!
 
@derek92 This is a three year old thread. It is better to start a new thread eventually containing a link to this one
 
Back
Top