From Bug to Bug

We recently moved to a new database at work, snowflake. This of course meant changing a bunch of connections, or in many cases adding new connections to the new database.

I wanted to communicate with the database using R. We've tended to use the RODBC package, which works pretty well, until I tried including it directly in Shiny applications, where it gave an atrocious seg fault, very sad. Apparently it's a known bug, and a weird one, related to callstack depth. Oh well. There's a solution posted that did not work for me, but the update from the link says to use the odbc package instead. To be honest I had meant to start using that anyway, since it's faster and uses more conventional practices, so this was a good opportunity.

I'm currently putting together a small package to do some routine operations for us on snowflake (which hopefully will be open-sourced very soon, assuming it's useful for other people). Currently, it uses our RODBC connections, which was fine for our purposes up until this Shiny bug.

OK no big deal, I'll just generalise it a bit to use both. So I start plugging away, changing connections to be of the form DBI::dbConnect(odbc::odbc(), dsn = 'snowflake'), and making all functions a bit more general, using whatever methods are appropriate for the connection used.

I'm testing all this on my mac, and it's all working fine in my R session, good times. I then open a clean R session, and run devtools::test(), to disaster: all tests involving the new connection failed. Hmm. I of course test the connection myself, to get a nice message:

Error: nanodbc/nanodbc.cpp:950: HY000: [Snowflake][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function.

Not cool. Wait, did I do something to break my system ODBC connections? I check with RODBC::odbcConnect(dsn = 'snowflake'), and that still works. OK so it's something specific to odbc? I run the odbc connection again, and this time it succeeds. Ugh. Fresh R session, try again, same result: odbc fails unless RODBC is run first - and in fact any RODBC connection solves it, and any odbc connection fails prior and succeeds after, so it's not directly related to snowflake. Seems impala gives some people the same issue. So something is messed up behind the scenes.

Googling the error message lead me to this thread, where a user points out that the issue could be a misspecified ODBCInstLib variable, in the product-specify .ini files, which in Redshift's case is in the folder /opt/amazon/redshift/lib/ on my mac. For snowflake's case, it's in /opt/snowflake/snowflakeodbc/lib/universal/. For snowflake, the .ini file (actually named simba.snowflake.ini) indeed has the variable ODBCInstLib set to just libodbcinst.dylib, which I guess isn't known by my computer. So I simply fix it... by hardcoding it to my actual libodbcinst.dylib location (something like /usr/local/Cellar/unixodbc/<version>/lib/libodbcinst.dylib, found with find / -name "libodbcinst.dylib"). That is it: the conections now run just fine.

What's kind of maddening is that I couldn't pin down what changed in the environment after running the RODBC connection that caused the odbc connections to run. The only difference I could find was that SF_OCSP_RESPONSE_CACHE_SERVER_URL was set in the environment of R, but setting it manually did not fix the issue, so it wasn't that. Someone with more knowledge of C or C++ can maybe pin it down.

Some people seemed to have some idea, but as someone new to this odbc stuff, reading such posts wasn't super insightful. Note that the simba .ini file says your DYLD_LIBRARY_PATH should be set. So why would it not be set? Apple's System Integrity Protection! I guess having to do a bit of work when installing new drivers is probably worth the security trade-off, but it never feels that way until it REALLY feels that way.