DirectorAlwyn
New member
- Joined
- Apr 4, 2016
- Messages
- 8
- Reaction score
- 0
- Points
- 0
To start, I am also working with the Microsoft team on this (Minh specifically) after sending a number of frowns, but I am hoping someone has ideas on what we might be able to do to help with this, as right now Power Query is causing a number of issues, and as the local Power Query advocate I am in not the best place.
I have a tool that basically was my main learning piece for Power Query that incorporates Power Query, Power Pivot, Excel and Outlook VBA, and some other internal aspects to my company. It has been working nicely since October, with the main analysis step (where quite a few different queries are performed) taking between five and ten minutes. Unfortunately when I installed the February update, the tool completely ceased working and would just hang Excel. After a number of discussions we tried it on 64 bit Excel (32 bit Office 2013 is currently the standard at my company), and discovered that it would work there, but takes 30-50 minutes to run. Beyond this tool specifically, a number of other queries I built have started taking a lot longer than they used to, and one very simple query built to pull from SharePoint fails to even open and causes Excel to crash out entirely. While this doesn't explain everything, I have particularly noticed issues with combining files together, with even small numbers of files (three to five with less than ten rows/columns each) taking a couple of minutes. Excel is frequently using over 500MB of RAM itself, and the various Mashup processes often add enough that all told it's eating over 1 GB and 99% of my CPU. It used to be that it might spike that high for a few seconds right when opening a file, but never stayed that way, which it now does pretty much whenever I have anything with Power Query open (and the Excel usage is high even with a blank workbook, around 300 MB.
In addition, a number of others around the office that I have worked with on Power Query are now experiencing some pretty significant Excel instability even when not using Power Query actively, with Excel crashing out multiple times a day.
The details on the main tool problem are below (from my discussions with Minh at Microsoft), as that is where I have squeezed in most of the analysis on trying to lock this down. For reference most everyone (including me) was using Windows 7 Enterprise 64 bit, with Office 2013 32 bit, RAM varying between 4-16 GB (I have 8 GB), processors are a bit all over the place but nothing we are working on has more then 50,000 rows so I was not anticipating major issues.
If there is any other information I can provide, please let me know. This is an amazing tool that I love working with, I just wish it was actually working at the moment.
---------------------
The particular step of the process I’m using that was hanging for so long had a number of parts (several queries run, analysis performed in PowerPivot, updates done to the pivot table, etc.), but the part that’s having the problem is the step where multiple Excel workbooks have data combined into a single column (M code below). Before the update a couple of weeks back, this would take maybe a couple of minutes to run, but this is now taking over half an hour to run for ~3,000 workbooks, with a total of ~32,000 rows. If I remove the step where this query is refreshed, all the other steps combined complete in less than a minute, but the overall process with this is now taking ~40 minutes to run.
Visually, what I see on the bottom right corner is that it is “Connecting to Datasource…” for the entire time, once that shifts everything else completes rapidly. The workbooks are all saved locally on my PC, so it’s not a network issue. The workbook with all the Power Query normally is on our network, but I have tried it with a version on my local PC as well to see if that made a difference, but that did not change anything.
I was originally doing the combination with a custom function, but tried doing it using a different set of steps to see if that made a difference, but it seemed to run the same way.
If there is anything else I can provide on this, please let me know, I would love to get this working better.
Original Version of combination:
//Pull the raw data from the Expo Customs queries
let
//Bring in data from the folder that holds all of the query responses from Customs
Source = Folder.Files("C:\Expo Customs Query Responses"),
//Clear out other file types that might contaminate, for instance .xlsm of the tool being stored in the same folder
#"Only show .xlsx files" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
//Pull in the actual C1 records response so we can do our own formatting
#"Keep Name and Folder Path columns only" = Table.RemoveColumns(#"Only show .xlsx files",{"Content", "Date accessed", "Date modified", "Date created", "Attributes", "Extension"}),
#"Use GetC1 Function" = Table.AddColumn(#"Keep Name and Folder Path columns only", "File Contents", each GetC1([Folder Path],[Name])),
#"Pull in C1 Records" = Table.ExpandTableColumn(#"Use GetC1 Function", "File Contents", {"Column1"}, {"File Contents.Column1"}),
#"Keep C1 Records only" = Table.RemoveColumns(#"Pull in C1 Records",{"Name", "Folder Path"}),
#"Remove Header" = Table.SelectRows(#"Keep C1 Records only", each ([File Contents.Column1] <> "C1 Record")),
#"Rename C1 Records" = Table.RenameColumns(#"Remove Header",{{"File Contents.Column1", "C1 Records"}})
in
#"Rename C1 Records"
C1 Function definition:
//Function to extract the C1 Records raw data from each file in the defined folder
let ExcelFile = (FilePath, FileName) =>
let
//Pull data from a variable folder
Source = Folder.Files(FilePath),
//Pull data from a variable workbook
#"File" = Source{[#"Folder Path"=FilePath,Name=FileName]}[Content],
//Open the workbook up to see what tables are available
#"Imported Excel" = Excel.Workbook(#"File"),
//Navigate to the C1 Records sheet and pull all the data from it
#"C1 Records_Sheet" = #"Imported Excel"{[Item="C1 Records",Kind="Sheet"]}[Data]
in
#"C1 Records_Sheet"
in
ExcelFile
Alternate Version of combination:
//Pull the raw data from the Expo Customs queries
let
//Bring in data from the folder that holds all of the query responses from Customs
Source = Folder.Files("C:\Expo Customs Query Responses"),
//Clear out other file types that might contaminate, for instance .xlsm of the tool being stored in the same folder
#"Only show .xlsx files" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
#"Filtered Rows" = Table.SelectRows(#"Only show .xlsx files", each Date.IsInCurrentDay([Date created])),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "C1 Records", each Excel.Workbook([Content])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
#"Expanded C1 Records" = Table.ExpandTableColumn(#"Removed Columns", "C1 Records", {"Name", "Data"}, {"C1 Records.Name", "C1 Records.Data"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded C1 Records", each ([C1 Records.Name] = "C1 Records")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"C1 Records.Name"}),
#"Expanded C1 Records.Data" = Table.ExpandTableColumn(#"Removed Columns1", "C1 Records.Data", {"Column1"}, {"C1 Records.Data.Column1"}),
#"Filtered Rows2" = Table.SelectRows(#"Expanded C1 Records.Data", each ([C1 Records.Data.Column1] <> "C1 Record")),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows2",{{"C1 Records.Data.Column1", "C1 Records"}})
in
#"Renamed Columns"
I have a tool that basically was my main learning piece for Power Query that incorporates Power Query, Power Pivot, Excel and Outlook VBA, and some other internal aspects to my company. It has been working nicely since October, with the main analysis step (where quite a few different queries are performed) taking between five and ten minutes. Unfortunately when I installed the February update, the tool completely ceased working and would just hang Excel. After a number of discussions we tried it on 64 bit Excel (32 bit Office 2013 is currently the standard at my company), and discovered that it would work there, but takes 30-50 minutes to run. Beyond this tool specifically, a number of other queries I built have started taking a lot longer than they used to, and one very simple query built to pull from SharePoint fails to even open and causes Excel to crash out entirely. While this doesn't explain everything, I have particularly noticed issues with combining files together, with even small numbers of files (three to five with less than ten rows/columns each) taking a couple of minutes. Excel is frequently using over 500MB of RAM itself, and the various Mashup processes often add enough that all told it's eating over 1 GB and 99% of my CPU. It used to be that it might spike that high for a few seconds right when opening a file, but never stayed that way, which it now does pretty much whenever I have anything with Power Query open (and the Excel usage is high even with a blank workbook, around 300 MB.
In addition, a number of others around the office that I have worked with on Power Query are now experiencing some pretty significant Excel instability even when not using Power Query actively, with Excel crashing out multiple times a day.
The details on the main tool problem are below (from my discussions with Minh at Microsoft), as that is where I have squeezed in most of the analysis on trying to lock this down. For reference most everyone (including me) was using Windows 7 Enterprise 64 bit, with Office 2013 32 bit, RAM varying between 4-16 GB (I have 8 GB), processors are a bit all over the place but nothing we are working on has more then 50,000 rows so I was not anticipating major issues.
If there is any other information I can provide, please let me know. This is an amazing tool that I love working with, I just wish it was actually working at the moment.
---------------------
The particular step of the process I’m using that was hanging for so long had a number of parts (several queries run, analysis performed in PowerPivot, updates done to the pivot table, etc.), but the part that’s having the problem is the step where multiple Excel workbooks have data combined into a single column (M code below). Before the update a couple of weeks back, this would take maybe a couple of minutes to run, but this is now taking over half an hour to run for ~3,000 workbooks, with a total of ~32,000 rows. If I remove the step where this query is refreshed, all the other steps combined complete in less than a minute, but the overall process with this is now taking ~40 minutes to run.
Visually, what I see on the bottom right corner is that it is “Connecting to Datasource…” for the entire time, once that shifts everything else completes rapidly. The workbooks are all saved locally on my PC, so it’s not a network issue. The workbook with all the Power Query normally is on our network, but I have tried it with a version on my local PC as well to see if that made a difference, but that did not change anything.
I was originally doing the combination with a custom function, but tried doing it using a different set of steps to see if that made a difference, but it seemed to run the same way.
If there is anything else I can provide on this, please let me know, I would love to get this working better.
Original Version of combination:
//Pull the raw data from the Expo Customs queries
let
//Bring in data from the folder that holds all of the query responses from Customs
Source = Folder.Files("C:\Expo Customs Query Responses"),
//Clear out other file types that might contaminate, for instance .xlsm of the tool being stored in the same folder
#"Only show .xlsx files" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
//Pull in the actual C1 records response so we can do our own formatting
#"Keep Name and Folder Path columns only" = Table.RemoveColumns(#"Only show .xlsx files",{"Content", "Date accessed", "Date modified", "Date created", "Attributes", "Extension"}),
#"Use GetC1 Function" = Table.AddColumn(#"Keep Name and Folder Path columns only", "File Contents", each GetC1([Folder Path],[Name])),
#"Pull in C1 Records" = Table.ExpandTableColumn(#"Use GetC1 Function", "File Contents", {"Column1"}, {"File Contents.Column1"}),
#"Keep C1 Records only" = Table.RemoveColumns(#"Pull in C1 Records",{"Name", "Folder Path"}),
#"Remove Header" = Table.SelectRows(#"Keep C1 Records only", each ([File Contents.Column1] <> "C1 Record")),
#"Rename C1 Records" = Table.RenameColumns(#"Remove Header",{{"File Contents.Column1", "C1 Records"}})
in
#"Rename C1 Records"
C1 Function definition:
//Function to extract the C1 Records raw data from each file in the defined folder
let ExcelFile = (FilePath, FileName) =>
let
//Pull data from a variable folder
Source = Folder.Files(FilePath),
//Pull data from a variable workbook
#"File" = Source{[#"Folder Path"=FilePath,Name=FileName]}[Content],
//Open the workbook up to see what tables are available
#"Imported Excel" = Excel.Workbook(#"File"),
//Navigate to the C1 Records sheet and pull all the data from it
#"C1 Records_Sheet" = #"Imported Excel"{[Item="C1 Records",Kind="Sheet"]}[Data]
in
#"C1 Records_Sheet"
in
ExcelFile
Alternate Version of combination:
//Pull the raw data from the Expo Customs queries
let
//Bring in data from the folder that holds all of the query responses from Customs
Source = Folder.Files("C:\Expo Customs Query Responses"),
//Clear out other file types that might contaminate, for instance .xlsm of the tool being stored in the same folder
#"Only show .xlsx files" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
#"Filtered Rows" = Table.SelectRows(#"Only show .xlsx files", each Date.IsInCurrentDay([Date created])),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "C1 Records", each Excel.Workbook([Content])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
#"Expanded C1 Records" = Table.ExpandTableColumn(#"Removed Columns", "C1 Records", {"Name", "Data"}, {"C1 Records.Name", "C1 Records.Data"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded C1 Records", each ([C1 Records.Name] = "C1 Records")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"C1 Records.Name"}),
#"Expanded C1 Records.Data" = Table.ExpandTableColumn(#"Removed Columns1", "C1 Records.Data", {"Column1"}, {"C1 Records.Data.Column1"}),
#"Filtered Rows2" = Table.SelectRows(#"Expanded C1 Records.Data", each ([C1 Records.Data.Column1] <> "C1 Record")),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows2",{{"C1 Records.Data.Column1", "C1 Records"}})
in
#"Renamed Columns"