Hello all,
I am using Power Query to connect to a Cassandra database. The vendor instructions say to use the Microsoft Hive ODBC driver. My understanding is that the driver converts SQL statements to a format used by Cassandra (although I am not an expert and could easily be wrong about this). Here is what happens when I attempt the query after setting up the driver:
Open Excel
New Query -> From Other Sources -> From ODBC
This opens a dialog box where I can select Data source name (DSN) and Advanced Options which opens a optional Connection string field and a SQL statement field.
Here's my problem:
If I simply select my Data Source in the above mentioned dialog box and push OK I get the standard Power Query Load navigator window allowing me to select which tables to load. This is a hierarchical structure starting with the ODBC driver, then showing the HIVE, then HIVE SYSTEM[1], and finally a table named HIVE_SYSTEM. This is the only table that is available to load; however, this table is not at all what I need. This table basically only holds two columns named ENVKEY and ENVVALUE. The ENVKEY contains things like spark.sql.hive.version ... etc. ...
Suffice it to say, this is not the table I want, but it is all that PQ is able to see (at least in this window)
IF HOWEVER, during the previous step I had opened the Advanced Options and in the SQL statement dialog box I had entered a SQL statement e.g. SELECT <column> FROM <the table I want> ... I get the data I need. If I use wildcards in the SELECT statement, I can get the whole table I want, but some columns are formatted weirdly.
What I really want is to avoid writing SQL statements. I am not very good at SQL, and this query needs to be used by people who are not able to write any sort of SQL. Not to mention, I really want to see all the available tables in the GUI and for PQ to write the SQL statements for me ... If I can get it this way, then I can use PQ to do complex operations and never have to worry about SQL commands (and hopefully use query folding!)
I figure I am just doing something wrong in my connection, but so far, I can't figure out what. Any guidance here is greatly appreciated!
Cheers,
Brendan
I am using Power Query to connect to a Cassandra database. The vendor instructions say to use the Microsoft Hive ODBC driver. My understanding is that the driver converts SQL statements to a format used by Cassandra (although I am not an expert and could easily be wrong about this). Here is what happens when I attempt the query after setting up the driver:
Open Excel
New Query -> From Other Sources -> From ODBC
This opens a dialog box where I can select Data source name (DSN) and Advanced Options which opens a optional Connection string field and a SQL statement field.
Here's my problem:
If I simply select my Data Source in the above mentioned dialog box and push OK I get the standard Power Query Load navigator window allowing me to select which tables to load. This is a hierarchical structure starting with the ODBC driver, then showing the HIVE, then HIVE SYSTEM[1], and finally a table named HIVE_SYSTEM. This is the only table that is available to load; however, this table is not at all what I need. This table basically only holds two columns named ENVKEY and ENVVALUE. The ENVKEY contains things like spark.sql.hive.version ... etc. ...
Suffice it to say, this is not the table I want, but it is all that PQ is able to see (at least in this window)
IF HOWEVER, during the previous step I had opened the Advanced Options and in the SQL statement dialog box I had entered a SQL statement e.g. SELECT <column> FROM <the table I want> ... I get the data I need. If I use wildcards in the SELECT statement, I can get the whole table I want, but some columns are formatted weirdly.
What I really want is to avoid writing SQL statements. I am not very good at SQL, and this query needs to be used by people who are not able to write any sort of SQL. Not to mention, I really want to see all the available tables in the GUI and for PQ to write the SQL statements for me ... If I can get it this way, then I can use PQ to do complex operations and never have to worry about SQL commands (and hopefully use query folding!)
I figure I am just doing something wrong in my connection, but so far, I can't figure out what. Any guidance here is greatly appreciated!
Cheers,
Brendan