Nick Burns
Member
- Joined
- May 24, 2017
- Messages
- 162
- Reaction score
- 0
- Points
- 16
- Excel Version(s)
- Office 365
Here's the outline of how an XML Spreadsheet looks like:
I can drill down to the cell element, but that puts all the data into a single column. Using Index / Modulo / Pivot I can get the data horizontally. However my data has 113K rows x 63 columns. Doing this creates 7 million plus records. I tested it with 2000 records and with the Pivot I was already over 120K records and the refresh was slow. Imagine that with 7 million records.
Is there a better way of reading this? Right now I'm taking an extra step of opening the file then saving it as Excel. If I can read the XML directly to a table in PQ, I can eliminate that step. If this really is the best solution, so be it.
Code:
<?xml version="1.0"?><?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="date"><NumberFormat ss:Format="Short Date"/></Style>
<Style ss:ID="bold"><Font ss:Bold="1"/></Style>
<Style ss:ID="default"></Style>
</Styles>
<Worksheet ss:Name="Data">
<Table>
<Row ss:StyleID="bold"><Cell><Data ss:Type="String">Company</Data></Cell><Cell><Data ss:Type="String">Job</Data></Cell><Cell><Data ss:Type="String">Emp Num</Data></Cell><Cell><Data ss:Type="String">Emp Name</Data></Cell><Cell><Data ss:Type="String">SSN</Data></Cell><Cell><Data ss:Type="String">Period Date</Data></Cell><Cell><Data ss:Type="String">Skill</Data></Cell><Cell><Data ss:Type="String">Hour Type</Data></Cell><Cell><Data ss:Type="String">ST</Data></Cell><Cell><Data ss:Type="String">OT</Data></Cell><Cell><Data ss:Type="String">DT</Data></Cell><Cell><Data ss:Type="String">HW</Data></Cell><Cell><Data ss:Type="String">HP</Data></Cell><Cell><Data ss:Type="String">Skill Category</Data></Cell><Cell><Data ss:Type="String">Skill Category Name</Data></Cell><Cell><Data ss:Type="String">FD</Data></Cell><Cell><Data ss:Type="String">Calc No.</Data></Cell><Cell><Data ss:Type="String">Description</Data></Cell><Cell><Data ss:Type="String">Contribution</Data></Cell><Cell><Data ss:Type="String">VH</Data></Cell><Cell><Data ss:Type="String">Contribution Amount</Data></Cell><Cell><Data ss:Type="String">Has Dues</Data></Cell><Cell><Data ss:Type="String">Dues Amnt</Data></Cell><Cell><Data ss:Type="String">Has PAC</Data></Cell><Cell><Data ss:Type="String">PAC Amount</Data></Cell><Cell><Data ss:Type="String">Has 401k</Data></Cell><Cell><Data ss:Type="String">401K Amount</Data></Cell><Cell><Data ss:Type="String">401K Rate</Data></Cell><Cell><Data ss:Type="String">Premium</Data></Cell><Cell><Data ss:Type="String">Has ROTH</Data></Cell><Cell><Data ss:Type="String">ROTH Amount</Data></Cell><Cell><Data ss:Type="String">ROTH Rate</Data></Cell><Cell><Data ss:Type="String">Roth Premium</Data></Cell><Cell><Data ss:Type="String">Address</Data></Cell><Cell><Data ss:Type="String">Local</Data></Cell><Cell><Data ss:Type="String">Union Skill</Data></Cell><Cell><Data ss:Type="String">Skill Name</Data></Cell><Cell><Data ss:Type="String">Skill Class Prior Rate</Data></Cell><Cell><Data ss:Type="String">Skill Class Current Rate</Data></Cell><Cell><Data ss:Type="String">Skill Class Due</Data></Cell><Cell><Data ss:Type="String">Skill Class HRA</Data></Cell><Cell><Data ss:Type="String">Skill Class 401A</Data></Cell><Cell><Data ss:Type="String">Skill Credit Union Code</Data></Cell><Cell><Data ss:Type="String">Skill Credit Union Rate</Data></Cell><Cell><Data ss:Type="String">Skill 401k Code</Data></Cell><Cell><Data ss:Type="String">Skill 401k Rate</Data></Cell><Cell><Data ss:Type="String">Unassigned</Data></Cell><Cell><Data ss:Type="String">Company Name</Data></Cell><Cell><Data ss:Type="String">CO Addr1</Data></Cell><Cell><Data ss:Type="String">CO Addr2</Data></Cell><Cell><Data ss:Type="String">CO City</Data></Cell><Cell><Data ss:Type="String">CO State</Data></Cell><Cell><Data ss:Type="String">CO Zip</Data></Cell><Cell><Data ss:Type="String">Master Agreement</Data></Cell><Cell><Data ss:Type="String">Year</Data></Cell><Cell><Data ss:Type="String">Period</Data></Cell><Cell><Data ss:Type="String">Check</Data></Cell><Cell><Data ss:Type="String">Tran</Data></Cell><Cell><Data ss:Type="String">Key</Data></Cell><Cell><Data ss:Type="String">Union</Data></Cell><Cell><Data ss:Type="String">Gross</Data></Cell><Cell><Data ss:Type="String">Manual Check</Data></Cell></Row>
</Table>
</Worksheet>
</Workbook>
I can drill down to the cell element, but that puts all the data into a single column. Using Index / Modulo / Pivot I can get the data horizontally. However my data has 113K rows x 63 columns. Doing this creates 7 million plus records. I tested it with 2000 records and with the Pivot I was already over 120K records and the refresh was slow. Imagine that with 7 million records.
Is there a better way of reading this? Right now I'm taking an extra step of opening the file then saving it as Excel. If I can read the XML directly to a table in PQ, I can eliminate that step. If this really is the best solution, so be it.