DISCLAIMER: I consider myself a R newbie. All this was written for the private purpose of recording the process and verifying if I am not missing something or I get a better idea how to tackle some of this stuff. Prior to that I had read a lot of other people’s material and the only thing I do is joining small parts of various solutions to problems contained in those articles.

Automation

I love using R. Truly. I had to learn a lot of stuff prior to engaging it at work but scripting R code, especially when I can mix R and SQL within one document is really unbeatable. Then for a moment it is all nice and cool but when you reach certain number of scripts that you run manually every now and then it is not fun anymore. Then you know that what you need is automation. But how to do it when you are on a budget, you want to keep it simple and implement really fast? Oh, and I forgot, it would also be nice to be able to pass the whole system to somebody who will not have to spend a week learning how it works really.

I have heard a lot about different tools on #measure-slack: Luigi, Airflow, what-not. I have checked documentation on some of them and it did not check my most important box: keep it simple. Therefore I decided to make it work on the stuff I already knew, that is two libraries: taskscheduleR and the second one: googlesheets. So, how it works?

taskscheduleR

library(taskscheduleR) utilises Windows Task Scheduler to schedule and automatically run your scripts. It is not hard to use the package but you can make it even simpler by using taskscheduleR add-in from Addins menu.

Addins list

Addins list

In order to find it on your Addins list you have to install the package itself:

install.packages(“taskscheduleR”)
library(taskscheduleR)

When installed you will see a new option in Addins menu. Clicking it opens a really easy to use scheduling panel.

 

TaskscheduleR add-in in the menu

TaskscheduleR add-in in the menu

You can schedule many scripts although this can take a lot of clicking and get messy at some point.

You can also create one R script with a list of your other scripts that you want automated.

setwd(“C:/Users/Kamil/Documents”)

source(“script_1.R”)
source(“script_2.R”)
source(“script_3.R”)

This way you can pack even twenty-something scripts into one script and schedule just this one and it will run all of the scripts listed within the single file.

I was happy with the result but I started to think of some form of notification upon successful completion (or not!) of the scheduled scripts. My colleague gave me a hint that we could use good old Google Sheets.

Google Sheets

The idea was simply to get some form of confirmation that the script went through all the lines of the script and returned some message. I decided to use Google Sheets with one sheet including all the names of the scheduled scripts. Next to them there would be timestamps (current date & time) inserted every time the script is run.

R script statuses in Google Sheets

R script statuses in Google Sheets

In order to do this another library has to be loaded:

library(googlesheets) 

# We have to define the spreadsheet url and a sheet name 
# as well as cells address that we want to print our data into.

my_ws <- gs_url(“https://docs.google.com/spreadsheets/d/gdf/edit#gid=0”)

my_ws <- my_ws %>%
gs_edit_cells(ws = "Sheet1", input= Sys.time(), Anchor = "B2")

Now, I do copy and paste these couple of lines into my scripts (changing anchor address for the corresponding script name in my worksheet). I use either sys.time() as an input returning date & time or simply function today() from package lubridate to show the current date.

To be able to run it in a non-interactive environment (more about it in previous parts of this blog) we have to save & keep the authorisation token. Best address to find out how to do it.

Now, when we schedule the scripts we will be able to see timestamps/date next to them. I ran them every day so I can share the link to this doc to my colleagues and they will be able to see if the relevant data was prepared this day. Additionally I use simple conditional formatting so that when timestamps = current_date -> cells are painted green, when painted yellow -> script was successfully ran yesterday, when painted red -> script was not successfully ran for more than two days.

Bonus – e-mail report

If the status of the script is not enough, you can go even further and when your scheduled script has been launched, you can automatically send email reports to your stakeholders. An email can contain simple information that the script was run or data was updated. It can contain a link to a spreadsheet with updated data or the file with data itself. To do this you can use a library such as:

library(mailR)

More about it can be found at this address.