Extracting data from Web HTML

Ken Puls

Administrator
Staff member
Joined
Mar 13, 2011
Messages
2,521
Reaction score
6
Points
38
Location
Nanaimo, BC, Canada
Website
www.excelguru.ca
Excel Version(s)
Excel Office 365 Insider
Hi there,

I looked at this with both Excel and Power BI (which has a newer/better From Web experience.) Neither of them can extract data from the body of the page. Excel's connector requires <table> tags in the HTML (which this page doesn't have.) Power BI can also extract things based on the page's CSS. Whatever the programmer did, they aren't using CSS to render these either.

So unfortunately, it looks like Power Query isn't going to be able to extract data from these pages.
 

nscjpn

New member
Joined
Apr 20, 2018
Messages
6
Reaction score
0
Points
0
Excel Version(s)
Excel 2016
Hi there,

I looked at this with both Excel and Power BI (which has a newer/better From Web experience.) Neither of them can extract data from the body of the page. Excel's connector requires <table> tags in the HTML (which this page doesn't have.) Power BI can also extract things based on the page's CSS. Whatever the programmer did, they aren't using CSS to render these either.

So unfortunately, it looks like Power Query isn't going to be able to extract data from these pages.
Thank you.I understand that the site is based on Java and hence PQ is not working
 

garylhaas

New member
Joined
Apr 3, 2016
Messages
5
Reaction score
0
Points
1
Location
milwaukee, wi
Excel Version(s)
office 365
option 1: CTRL S / Save webpage HTML to PC - use Power Query to extract data from text (without navigating CSS)
option 2: Power Automate Desktop

Remember what they said about Power Query
Power Query is powerful they said
Power Query can do lots of stuff they said
Power Query will build the code for you they said
Power Query is easy they said

I was able to extract data from the link in under 3 minutes
(after tinkering for 15 minutes and inserting a wait statement)

Power Automate Desktop will also step through the web pages of the link

Think of the following as Power Automate Desktop's "M-code"
Code:
WebAutomation.LaunchFirefox.LaunchFirefox 
  Url: $'''https://fcra2010.in/ngos-raising-over-a-crore-of-fc-annually/''' 
  WindowState: WebAutomation.BrowserWindowState.Maximized 
  ClearCache: False 
  ClearCookies: False Timeout: 
  60 BrowserInstance=> Browser

WAIT 20

WebAutomation.ExtractData.ExtractTableInExcel 
   BrowserInstance: Browser Control: 
   $'''html > body > div > div > div > div > main > article > div > div > div > 
   section:eq(2) > div > div > div > div > div > div > table > tbody > tr''' 
   ExtractionParameters: {
        [$'''td:eq(0)''', $'''Own Text''', $'''''', $'''Value #1'''],
        [$'''td:eq(1)''', $'''Own Text''', $'''''', $'''Value #2'''],
        [$'''td:eq(3)''', $'''Own Text''', $'''''', $'''Value #3'''] } 
   ExcelInstance=> ExcelInstance

Excel.CloseExcel.CloseAndSaveAs 
    Instance: ExcelInstance 
    DocumentFormat: Excel.ExcelFormat.FromExtension 
     DocumentPath: $'''c:\\temp\\trashme.xlsm'''
 

Attachments

  • trashme.xlsm
    9.3 KB · Views: 1
Last edited by a moderator:

pinarello

Member
Joined
Jun 21, 2019
Messages
190
Reaction score
2
Points
18
Location
Germany
Excel Version(s)
Office 365
In Windows 11 the online version of Power Automate is available. This is probably not sufficient to reproduce your demo?
 

pinarello

Member
Joined
Jun 21, 2019
Messages
190
Reaction score
2
Points
18
Location
Germany
Excel Version(s)
Office 365
Which function can I use to access the generated code and change it if necessary?
 

garylhaas

New member
Joined
Apr 3, 2016
Messages
5
Reaction score
0
Points
1
Location
milwaukee, wi
Excel Version(s)
office 365
PowerAutomate Desktop

Which function can I use to access the generated code and change it if necessary?

have you installed Power Automate Desktop
have you looked at videos on YouTube about web scrapping with Power Automate Desktop

i was not aware that you had posted a question

it might be helpful to communicate through email
garylhaas2005@yahoo.com
 
Top