Having decided to improve my capability in this space, I started to research the available options. It became clear quite quickly that the choice would come down to Python or R for me, as both of these free and open source options had excellent reputations, fitted my budget and had a large community of active users.
I can’t remember where I saw this tip, but for me it was the decider and it is worth sharing:
If you are a business user with experience in Microsoft Excel, you should select R to start your foray into data science. R is a single-threaded, object-oriented, functional programming language and, once you understand the key commands, it’s logical to use. Being single-threaded, it is quite predictable, which has been great for me as a novice programmer. R has a great graphics package, which supplements what it really shines at: wrangling data.
If you have some coding experience, on the other hand, Python is a much more general-purpose language and is on par with R for most things R is ‘good’ at.
Both languages have strong communities that share their knowledge through blogs and events, and really I think both are great choices.
Now that I have been using R for about a year, I’ve had a chance to use it in the workplace, showing off R’s capabilities by building machine learning churn and classification models to provide some business insights. A Python expert with an interest in machine learning saw the few lines of code required to wrangle the dataset, train a model and predict outcomes and, quite frankly, he was shocked. We had a bit of an awkward R-to-Python conversation afterwards, trying to understand the differences between data arrays and data frames. For me, though, this clarified where R is strong: ease of wrangling, and building a model quickly for iterative improvement.
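For anyone wondering what that array versus data frame conversation was about, here is a minimal sketch in plain Python. It is only a toy: the customer data and the `filter_rows` helper are invented for illustration, and a dictionary of lists is a stand-in for a real data frame (in practice you would use R’s data frames or a library such as pandas).

```python
from array import array

# A data array: every element has the same type and is addressed by position.
tenure_months = array("d", [3.0, 24.0, 12.0])

# A toy "data frame": named columns of equal length, each with its own type.
# (Hypothetical customer data, invented for this example.)
customers = {
    "name": ["Ann", "Bob", "Cy"],
    "tenure_months": [3.0, 24.0, 12.0],
    "churned": [True, False, False],
}


def filter_rows(frame, column, predicate):
    """Keep only the rows where predicate(value in `column`) is true."""
    keep = [i for i, value in enumerate(frame[column]) if predicate(value)]
    return {name: [values[i] for i in keep] for name, values in frame.items()}


# Filtering on one column carries every other column along with it.
long_tenure = filter_rows(customers, "tenure_months", lambda m: m >= 12)
print(long_tenure["name"])  # prints ['Bob', 'Cy']
```

The array holds one kind of value and nothing else, while the frame keeps mixed-type columns lined up row by row, which is roughly why wrangling feels so natural when the data frame is the starting point.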
At the end of all this, my recommendation is to just pick one and start. Though I am still learning R, it is clear to me that I will also need to learn Python to gain the unique benefits each language provides, magnified through collaboration.