We’ve been testing some Spark code that will eventually be moved to AWS. For now, to save costs, we’ve created an 8-node Spark cluster that runs on a set of Ubuntu virtual machines hosted in VirtualBox. We’ve written some bash scripts to make starting (and shutting down) the VMs easy.
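As a rough sketch of what those scripts look like (the VM names like spark-node-1 are placeholders for however your VMs are registered), they boil down to looping VBoxManage over the nodes:

    #!/usr/bin/env bash
    # start-cluster.sh -- boot all eight Spark VMs headless
    for i in $(seq 1 8); do
        VBoxManage startvm "spark-node-${i}" --type headless
    done

    # stop-cluster.sh -- ask each VM to shut down cleanly via its ACPI power button
    for i in $(seq 1 8); do
        VBoxManage controlvm "spark-node-${i}" acpipowerbutton
    done

Running the VMs headless keeps eight GUI windows from cluttering the desktop, and the ACPI shutdown lets Ubuntu power off cleanly instead of being yanked out from under Spark.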
Spark INFO logging
Spark is great, and the more I work with it on my PhD thesis, the more changes I make to my local installation on my rMBP. One modification I came across the other day is how to dial down the logging messages in the Spark shells, specifically the flood of INFO messages you get in PySpark when programming in Python.
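The usual trick, assuming a Spark build that reads conf/log4j.properties and that SPARK_HOME points at your installation, is to copy the template config and lower the root log level:

    # start from the template that ships with Spark
    cp "$SPARK_HOME/conf/log4j.properties.template" "$SPARK_HOME/conf/log4j.properties"

    # turn the root logger down from INFO to WARN
    # (macOS/BSD sed shown; on Linux use plain `sed -i`)
    sed -i '' 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' \
        "$SPARK_HOME/conf/log4j.properties"

From inside a running PySpark shell you can also quiet things for just that session with sc.setLogLevel("WARN").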
R, Spark, and TextMate
I've been keeping my eye on Spark for a while now, and I decided to take the plunge recently after having to do some brief R analyses that were not that complicated and were a perfect fit for Spark. I use TextMate as my R IDE, and I wanted to run my scripts from TextMate straight into Spark. The following are a couple of tips & tricks I found on how to set everything up so that you can start Spark from a command-R (⌘-R) shortcut; a sketch of the setup follows.
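As a hedged sketch (the Spark path below is a placeholder, and this assumes a Spark distribution that ships the sparkR launcher in its bin directory), the core of it is making SPARK_HOME and Spark's bin directory visible to the shell:

    # in ~/.profile or ~/.bash_profile; adjust the path to wherever Spark is unpacked
    export SPARK_HOME="$HOME/spark"
    export PATH="$SPARK_HOME/bin:$PATH"   # puts sparkR and spark-submit on the PATH

GUI apps on macOS don't inherit exports from your shell profile, so TextMate typically needs the same SPARK_HOME and PATH values added in its own environment-variable settings (Preferences → Variables in TextMate 2) before the R bundle's ⌘-R command can find Spark.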