Loading only some info from PDF

abaker77

New member
Joined
Oct 11, 2011
Messages
5
Reaction score
0
Points
0
Location
Toronto, Canada
Website
www.jkjl.com
Excel Version(s)
Office 365
Hello,
I am using Excel 365 to load tables from a single PDF.
The PDF contains several tables and pages (image below.)
I only need to load certain tables, based on text found on pages.
For instance (in the image below), my objective is to load just the table that corresponds to "Sue's Projected Income."
The text "Sue's Projected Income" is found in Page 11, therefore, I want to load the table on Page 11, which happens to be Table 16.
However, the applicable table continues onto page 12 as well so, of course, I'd want to load the entire table.
So basically my question is - can I use PQ to "search" for text then load the corresponding table, even if that table spans more than 1 page?
from PDF.jpg

Thank you so much !
 
I would test it. However, I would need a PDF that is identical in structure but contains only anonymous data.
 
At the very beginning of the query, when you have selected any table, you must delete the "Navigation" step. After that you can expand the whole PDF. You will see all tables once as table and once as page. Table would be the right choice. But then John's age column on the 2nd page is shifted down by 2 lines. Thus the data of "Page" must be used. The problem here is that the column layout is different for Sue and John. But by following the sequence of steps: unpivot, group, index to group and pivot again, the problem can be solved quite easily. Furthermore, I noticed that the column titles are partially different for Sue and John. For the final display I used the column titles of Sue.
 

Attachments

  • xlguru - Loading only some info from PDF (PQ).xlsx
    27.7 KB · Views: 16
wow - your proficiency with PQ is several orders of magnitude above mine; your solution is very impressive and is obviously correct. (If only I could follow it and replicate what you did.) :confused2:
I got it to work by changing the Path and File on the Params sheet. Then, double-clicking on the query to open the PQ editor, I could see the applied steps and by clicking on the settings icon next to the step, I could (sort of) follow along what you did. Your approach was brilliant and I thank you so much for your time and effort.
I am going to try to replicate this separately on my own but I suspect I will have a few questions. May I run those by you when I do ?
 
The biggest effort was to analyze the data after the data expansion to identify patterns and develop a plan of action.

Compared to, for example, VBA programming, one of the great advantages of Power Query is that you take one step, immediately see the result, and can then decide how best to proceed. And if a way is not so good, you usually only have to go back a few steps to find a better way.

If you still have questions, I can gladly try to answer them.
 
Back
Top