R-ohjelmointi.org

Statistical programming in the R language

A quick note on TensorFlow and R

TensorFlow is an open-source software library for machine learning. It is most prominently used for fitting deep learning models, but it also includes many other kinds of algorithms. In TensorFlow, data is shaped into tensors (multidimensional arrays), and the analysis is run as a computational graph whose nodes are the computational operations.
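
To make the two ideas concrete, here is a toy sketch in base R. It does not use TensorFlow at all: the helpers constant(), node() and run() are hypothetical names made up for this illustration, and TensorFlow's real graph machinery is far more elaborate.

```r
# A tensor is a multidimensional array; base R arrays are a rough analogue.
t1 <- c(1, 2, 3)                      # rank 1 (vector)
t2 <- matrix(1:6, nrow = 2)           # rank 2 (matrix)
t3 <- array(1:24, dim = c(2, 3, 4))   # rank 3

# Toy graph: each node stores an operation and its input nodes.
# (Hypothetical helpers for illustration, not TensorFlow's API.)
constant <- function(value) list(op = function() value, inputs = list())
node <- function(op, ...) list(op = op, inputs = list(...))

# Evaluate a node by evaluating its inputs first, then applying its operation.
run <- function(n) do.call(n$op, lapply(n$inputs, run))

a <- constant(c(1, 2, 3))
b <- constant(2)
product <- node(`*`, a, b)   # only describes the computation, nothing runs yet
total <- node(sum, product)

print(run(total))  # 12
```

The point of the sketch is the separation between describing a computation and running it, which is the same split that shows up later between building the model and calling sess$run().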

RStudio has produced an R package, tensorflow, that makes installing and running TensorFlow rather easy. In practice, the TensorFlow API is a set of Python modules for constructing computational graphs. The tensorflow R package gives access to the complete TensorFlow Python API from R.

Next, let’s take a look at the tensorflow R package.

Installation

TensorFlow requires Python 3.5. The Anaconda distribution of Python is recommended.

After installing Python, installing the tensorflow package should be a breeze. As outlined on RStudio’s site, running these commands in R installs the R package from GitHub and then TensorFlow itself:

devtools::install_github("rstudio/tensorflow")
library(tensorflow)
install_tensorflow()

After that TensorFlow is ready for use.

It is also possible to install a version that uses a GPU (graphics processing unit) for computations, but that takes more effort.

Running analysis on TensorFlow

The following analysis has been adapted from an article by Hoang Nguyen.

Let’s fit a simple linear regression model on the iris dataset using TensorFlow. The model is simple to fit in R with the usual lm() function:

data(iris)
x_data <- iris$Petal.Length
y_data <- iris$Petal.Width
 
lm(y_data~x_data)$coef
 
# Note the estimates:
#(Intercept)      x_data 
# -0.3630755   0.4157554
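
As a sanity check on what lm() is estimating here, the same two numbers can be computed from the closed-form least squares formulas for a single predictor. This is a minimal sketch; the variable names slope and intercept are mine.

```r
data(iris)
x_data <- iris$Petal.Length
y_data <- iris$Petal.Width

# Closed-form ordinary least squares for one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope <- cov(x_data, y_data) / var(x_data)
intercept <- mean(y_data) - slope * mean(x_data)

print(c(intercept = intercept, slope = slope))

# Compare with lm(); the two should agree up to floating point error
print(all.equal(unname(coef(lm(y_data ~ x_data))), c(intercept, slope)))
```

These are the values that the iterative TensorFlow fit should converge to.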

In TensorFlow, the same takes a bit more work.

# Load tensorflow
library(tensorflow)

# Specify the model y = A * x + b; both parameters start at 1
A <- tf$Variable(c(1), name = "Coefficient")
b <- tf$Variable(c(1), name = "Intercept")
y <- A * x_data + b

# Start a session, initialize the variables, and print the
# predictions made with the initial parameter values
sess <- tf$Session()
sess$run(tf$global_variables_initializer())
print(sess$run(y))
sess$close()

# Specify the loss and the optimizer
# Linear regression minimizes the mean squared error (MSE)
# The value 0.03 is the learning rate, i.e. the step size of gradient descent
MSE <- tf$reduce_mean((y_data - y)^2)
optimizer <- tf$train$GradientDescentOptimizer(0.03)
train <- optimizer$minimize(MSE)
sess <- tf$Session()
sess$run(tf$global_variables_initializer())

# Run the model (in a loop): each call takes one gradient descent step
for (i in 1:2000) {
    sess$run(train)
}
cat("Coefficient: ", sess$run(A), "\n Intercept: ", sess$run(b), "\n")
sess$close()
 
# For practical purposes, the estimates are the same as those
# given by lm()
#
# Coefficient:  0.4157551 
# Intercept:  -0.363074
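
To see what the training loop is doing under the hood, here is a rough base R sketch of plain gradient descent on the MSE, using the same starting values (1 and 1), learning rate (0.03) and number of steps (2000) as above. It is my own re-implementation with hand-written gradients, not what TensorFlow executes internally; TensorFlow derives the gradients automatically from the graph.

```r
data(iris)
x_data <- iris$Petal.Length
y_data <- iris$Petal.Width

A <- 1                 # coefficient, same starting value as above
b <- 1                 # intercept, same starting value as above
learning_rate <- 0.03

for (i in 1:2000) {
  residual <- y_data - (A * x_data + b)
  # Partial derivatives of mean(residual^2) with respect to A and b
  grad_A <- mean(-2 * x_data * residual)
  grad_b <- mean(-2 * residual)
  # Step against the gradient
  A <- A - learning_rate * grad_A
  b <- b - learning_rate * grad_b
}

cat("Coefficient: ", A, "\n Intercept: ", b, "\n")
```

The estimates should land very close to those from lm(). A much larger learning rate would make the updates diverge instead, which is why the value affects the result.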

Summary

Installing TensorFlow went without hiccups, but running it from R took some getting used to. Something as simple as a linear regression takes a few extra steps compared to the ready-made functions in R (or Python, for that matter). But even with this simple test case, it is easy to see that TensorFlow offers more flexibility for specifying the model than these “usual suspects”. For example, if ordinary least squares is for some reason undesirable, the criterion can be changed in TensorFlow: just specify a new loss function for the optimizer to minimize! I’m not saying it can’t be done in R, but it takes a bit more work. However, in most cases running a linear regression analysis on TensorFlow is probably overkill, and it remains to be seen where it fits best in the everyday life of a data analyst.
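
For comparison, here is roughly what swapping the fitting criterion looks like on the plain R side, using the general-purpose optim() function. This is only a sketch: fit_line(), squared_loss() and absolute_loss() are hypothetical helper names of mine, and the default Nelder-Mead search is just one of several reasonable choices.

```r
data(iris)
x_data <- iris$Petal.Length
y_data <- iris$Petal.Width

# Fit intercept and slope by numerically minimizing an arbitrary loss
# of the residuals (hypothetical helper, for illustration only)
fit_line <- function(loss, start = c(intercept = 0, slope = 0)) {
  objective <- function(par) loss(y_data - (par[1] + par[2] * x_data))
  optim(start, objective)$par
}

squared_loss  <- function(residual) mean(residual^2)    # least squares
absolute_loss <- function(residual) mean(abs(residual)) # least absolute deviations

ols_fit <- fit_line(squared_loss)
lad_fit <- fit_line(absolute_loss)

print(ols_fit)  # should be close to the lm() estimates
print(lad_fit)  # a fit that is less sensitive to outliers
```

Changing the criterion is a one-line change here as well, but the model, the loss and the search all have to be wired together by hand.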
