Microsoft Hive ODBC Driver and Power Query

beagleton

New member
Joined
Jun 28, 2017
Messages
26
Reaction score
0
Points
0
Hello all,

I am using Power Query to connect to a Cassandra database. The vendor instructions say to use the Microsoft Hive ODBC driver. My understanding is that the driver converts SQL statements to a format used by Cassandra (although I am not an expert and could easily be wrong about this). Here is what happens when I attempt the query after setting up the driver:

Open Excel
New Query -> From Other Sources -> From ODBC

This opens a dialog box where I can select Data source name (DSN) and Advanced Options which opens a optional Connection string field and a SQL statement field.

Here's my problem:

If I simply select my Data Source in the above mentioned dialog box and push OK I get the standard Power Query Load navigator window allowing me to select which tables to load. This is a hierarchical structure starting with the ODBC driver, then showing the HIVE, then HIVE SYSTEM[1], and finally a table named HIVE_SYSTEM. This is the only table that is available to load; however, this table is not at all what I need. This table basically only holds two columns named ENVKEY and ENVVALUE. The ENVKEY contains things like spark.sql.hive.version ... etc. ...

Suffice it to say, this is not the table I want, but it is all that PQ is able to see (at least in this window)

IF HOWEVER, during the previous step I had opened the Advanced Options and in the SQL statement dialog box I had entered a SQL statement e.g. SELECT <column> FROM <the table I want> ... I get the data I need. If I use wildcards in the SELECT statement, I can get the whole table I want, but some columns are formatted weirdly.

What I really want is to avoid writing SQL statements. I am not very good at SQL, and this query needs to be used by people who are not able to write any sort of SQL. Not to mention, I really want to see all the available tables in the GUI and for PQ to write the SQL statements for me ... If I can get it this way, then I can use PQ to do complex operations and never have to worry about SQL commands (and hopefully use query folding!)

I figure I am just doing something wrong in my connection, but so far, I can't figure out what. Any guidance here is greatly appreciated!
Cheers,
Brendan
 
Hi Brandon,

I've just been browsing around for some pointers re your query and came across this web page. It is for Power BI Desktop, so I'm not sure if it will be suitable if you are working exclusively in Excel. Anyways... just thought I'd contribute.
 
**UPDATE**

The problem was the Microsoft Hive Driver. Updated with the new Microsoft Spark ODBC driver and all is well. Can see all my tables and load them into PQ without needing SQL. :)
 
Back
Top