I've been keeping an eye on Spark for a while now, and I recently decided to take the plunge after needing to run some brief R analyses that were not that complicated and were a good fit for Spark. I use TextMate as my R IDE, and I wanted to run my scripts from TextMate straight into Spark. Below are a few tips and tricks I found for setting everything up so that you can start Spark with a Command-R (⌘R) shortcut.
• Installing Spark on a Mac
Getting Spark on a Mac is easily done with Homebrew:
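The Homebrew formula is named apache-spark; note that the version Homebrew installs for you may be newer than the 1.6.0 used in the paths below, so adjust those paths to match whatever lands in your Cellar:

```shell
# Install Apache Spark via Homebrew
brew install apache-spark
```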
• Environment Variables
Next, you're going to need to set up some environment variables. Open your "~/.bash_profile" and add the following, which appends the Spark install directory to your PATH and sets SPARK_HOME:
PATH=$PATH:/usr/local/Cellar/apache-spark/1.6.0
export PATH
export SPARK_HOME="/usr/local/Cellar/apache-spark/1.6.0/libexec/"
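After editing the profile, reload it with `source ~/.bash_profile` (or open a new terminal) so the variables take effect. A quick self-contained sanity check, assuming the 1.6.0 Cellar paths above:

```shell
# Set the variables exactly as ~/.bash_profile would
export PATH="$PATH:/usr/local/Cellar/apache-spark/1.6.0"
export SPARK_HOME="/usr/local/Cellar/apache-spark/1.6.0/libexec/"

# Confirm the variable is set and exported to child processes
echo "SPARK_HOME is $SPARK_HOME"
```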
• R Source Files
Once the environment variables are set, you'll need to add the following lines to the beginning of your R script. These lines are the ones that fire up Spark and get things rolling:
library(SparkR, lib.loc = "/usr/local/Cellar/apache-spark/1.6.0/libexec/R/lib")
sc <- sparkR.init(sparkHome = "/usr/local/Cellar/apache-spark/1.6.0/libexec")
sqc <- sparkRSQL.init(sc)
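Once those context objects exist, a quick sanity check confirms everything is wired up. This sketch follows the pattern in the SparkR 1.6 docs, using R's built-in faithful dataset (the `sqc` name matches the SQL context created above):

```r
# Convert a local R data.frame into a Spark DataFrame via the SQL context
df <- createDataFrame(sqc, faithful)

# Pull the first few rows back to R to confirm Spark is responding
head(df)
```

If this prints the first rows of faithful, Spark is up and your TextMate ⌘R runs are going through SparkR.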
The following link is a great resource for getting started with SparkR:
https://spark.apache.org/docs/1.6.0/sparkr.html