R-ohjelmointi.org
Statistical programming in R
A quick note on TensorFlow and R
TensorFlow is an open-source software library for machine learning. It is used especially prominently for fitting deep learning models, but it also includes many other kinds of algorithms. In TensorFlow, data is shaped into tensors, that is, n-dimensional arrays. The analysis is run as a computational graph whose nodes are computational operations.
RStudio has produced an R package, tensorflow, that makes installing and running TensorFlow rather easy. In practice, the TensorFlow API is a set of Python modules for constructing computational graphs. On top of that, the R package tensorflow gives access to the complete TensorFlow Python API.
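To make the two ideas above concrete, here is a minimal sketch in base R that does not use TensorFlow at all. Arrays stand in for tensors, and a few small closures stand in for graph nodes. The helper names (tensor_rank, constant, add, multiply) are made up for this illustration and are not part of any package.

```r
# Tensors: n-dimensional arrays, modelled here with base R objects.
vec  <- c(1, 2, 3)                      # rank 1, shape (3)
mat  <- matrix(1:6, nrow = 2)           # rank 2, shape (2, 3)
cube <- array(1:24, dim = c(2, 3, 4))   # rank 3, shape (2, 3, 4)

# Number of dimensions; plain vectors have no dim attribute in R.
tensor_rank <- function(x) if (is.null(dim(x))) 1L else length(dim(x))

# Graph nodes: each constructor returns a function that computes its
# value only when called (hypothetical helpers for this sketch).
constant <- function(value) { force(value); function() value }
add      <- function(a, b) { force(a); force(b); function() a() + b() }
multiply <- function(a, b) { force(a); force(b); function() a() * b() }

# Building the graph performs no arithmetic yet ...
graph <- add(multiply(constant(2), constant(3)), constant(1))

# ... running it does: 2 * 3 + 1
result <- graph()
print(result)  # 7
```

The split between building and running is the point here: the graph is described first and evaluated later, which is the same pattern as defining operations and then calling a session's run method.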
Next, let’s take a look at the tensorflow R package.
Installation
A prerequisite for TensorFlow is Python 3.5. It is recommended to install Anaconda Python.
After installing Python, installing the tensorflow package should be a breeze. As outlined on RStudio’s site, running these commands in R should install the R package from GitHub and then TensorFlow itself:
    devtools::install_github("rstudio/tensorflow")
    library(tensorflow)
    install_tensorflow()
After that, TensorFlow is ready for use.
It is also possible to install a version that uses the GPU (graphics processing unit) for computations, but that takes more effort.
Running analysis on TensorFlow
The following analysis has been adapted from an article by Hoang Nguyen.
Let’s fit a simple linear regression model on the Iris dataset using TensorFlow. The model is simple to fit in R with the usual lm() function:
    data(iris)
    x_data <- iris$Petal.Length
    y_data <- iris$Petal.Width
    lm(y_data~x_data)$coef
    # Note the estimates:
    #(Intercept)      x_data
    # -0.3630755   0.4157554
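For reference, the estimates that lm() returns for a single predictor can also be computed by hand from the closed-form least-squares formulas. The following is a small sketch of that check; the variable names slope, intercept, closed_form and lm_coef are illustrative only.

```r
data(iris)
x_data <- iris$Petal.Length
y_data <- iris$Petal.Width

# Closed-form ordinary least squares for one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope     <- cov(x_data, y_data) / var(x_data)
intercept <- mean(y_data) - slope * mean(x_data)

# Compare with lm(); names are dropped so that only values are compared
closed_form <- c(intercept, slope)
lm_coef     <- unname(coef(lm(y_data ~ x_data)))
print(all.equal(closed_form, lm_coef))  # TRUE
```

These are the target values that any iterative fit of the same model should converge to.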
In TensorFlow, the same takes a bit more work.
    # Load tensorflow
    library(tensorflow)

    # Specify the model: y = A * x + b, both parameters start at 1
    A <- tf$Variable(c(1), name="Coefficient")
    b <- tf$Variable(c(1), name="Intercept")
    y <- A * x_data + b

    # Initialize a session and the variables,
    # then print the predictions at the starting values
    sess <- tf$Session()
    sess$run(tf$global_variables_initializer())
    print(sess$run(y))
    sess$close()

    # Specify the optimizer
    # Linear regression minimizes the mean squared error (MSE)
    # The value 0.03 below is the learning rate, which affects the optimization result
    MSE <- tf$reduce_mean((y_data - y)^2)
    optimizer <- tf$train$GradientDescentOptimizer(0.03)
    train <- optimizer$minimize(MSE)

    # Start a fresh session and initialize the variables again
    sess <- tf$Session()
    sess$run(tf$global_variables_initializer())

    # Run the model (in a loop)
    for (i in 1:2000) {
      sess$run(train)
    }
    cat("Coefficient: ", sess$run(A), "\n Intercept: ", sess$run(b), "\n")

    # For practical purposes estimates are the same ones
    # given by lm()
    #
    # Coefficient: 0.4157551
    # Intercept: -0.363074
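To see what the gradient descent optimizer is doing in the loop, the same update rule can be written out by hand. The sketch below uses only base R and is not the TensorFlow implementation; it assumes plain gradient descent on the MSE with the same starting values (1 and 1), learning rate (0.03) and number of steps (2000) as above. The names learning_rate, grad_A and grad_b are illustrative.

```r
data(iris)
x_data <- iris$Petal.Length
y_data <- iris$Petal.Width

A <- 1                 # slope, same starting value as the TensorFlow variable
b <- 1                 # intercept
learning_rate <- 0.03

for (i in 1:2000) {
  residual <- y_data - (A * x_data + b)
  grad_A <- -2 * mean(residual * x_data)   # derivative of MSE w.r.t. A
  grad_b <- -2 * mean(residual)            # derivative of MSE w.r.t. b
  A <- A - learning_rate * grad_A          # step against the gradient
  b <- b - learning_rate * grad_b
}

cat("Coefficient:", A, "\nIntercept:", b, "\n")
```

Each pass moves both parameters a small step downhill on the MSE surface. A much larger learning rate would make the updates overshoot and diverge on this data, while a much smaller one would need more than 2000 iterations to reach the lm() estimates.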
Summary
Installation of TensorFlow went without hiccups, but running it from R took some getting used to. Something as simple as a linear regression takes a few extra steps compared to the ready-made functions in R (or Python for that matter). But even with this simple test case, it is easy to see that TensorFlow offers more flexibility for specifying the model than these "usual suspects". For example, if ordinary least squares is for some reason undesirable, the fitting criterion can be changed in TensorFlow: just define a different loss in place of the MSE, or specify a new optimizer to change how it is minimized. I’m not saying it can’t be done in R, but it takes a bit more work. However, for most cases, running a linear regression analysis on TensorFlow might be overkill, and it remains to be seen where it fits best in the everyday life of a data analyst.
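As a point of comparison for the flexibility argument, here is a rough sketch of what replacing squared error with absolute error might look like in base R. It was not part of the original analysis; it uses the general-purpose optim() function, and the mae helper is a name made up for this example.

```r
data(iris)
x_data <- iris$Petal.Length
y_data <- iris$Petal.Width

# Mean absolute error for parameters par = c(intercept, slope)
mae <- function(par) mean(abs(y_data - (par[1] + par[2] * x_data)))

# Start from the least-squares estimates and minimize the new criterion
ols <- unname(coef(lm(y_data ~ x_data)))
fit <- optim(par = ols, fn = mae)   # Nelder-Mead is the default method

print(fit$par)    # intercept and slope under the absolute-error loss
print(fit$value)  # the mean absolute error that was reached
```

The loss has to be written as an explicit function of the parameter vector, and the optimizer is derivative-free, whereas in the TensorFlow version only the line defining the loss would change.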
Tags: tensorflow