R and Python are today's main rivals in data science and Big Data development. Each has its pros and cons, and the choice between them depends on the particular situation: project goals, user experience (UX) requirements, the learning curve and other factors.
Both languages are well suited to Big Data and statistics. R was developed specifically to address the needs of statisticians and has very strong data visualization capabilities, while Python is a general-purpose language known for its clear syntax.
R and Python Use Cases
R was introduced in the mid-1990s as an open-source alternative to S, with the aim of offering a higher-quality and clearer approach to data analysis, statistics and graphical models. Originally, R was used for scientific and R&D purposes only, but it has gradually entered the corporate world, too. As a result, R is now one of the most dynamic and rapidly evolving programming languages used to build corporate data science solutions.
One of the biggest advantages of R is its large community of enthusiasts, who support the language on mailing lists, write user documentation and share knowledge in an extremely active Stack Overflow community. Anyone can contribute to CRAN, a huge repository of R packages that provides immediate access to the newest approaches and functions and saves developers from having to reinvent the wheel.
One of the biggest disadvantages of R is its very steep learning curve. Experienced software developers can pick up R fairly easily, while beginners may find it extremely difficult. As a result, finding people with R skills may take a while, which is one of the reasons why many data science teams opt for Python.
Introduced in 1991, Python focuses on efficiency and code readability. It has many active users among programmers who want to dive deep into data analytics and statistical methods, and many Python developers say that the deeper you go, the more you enjoy coding in it. This flexible language is a good fit for building innovative Big Data solutions. Thanks to its simplicity and readability, the learning curve is gentle, so you can retrain in-house developers skilled in other technologies as Python coders fairly easily.
Just like R, Python has a package repository, too. PyPI (the Python Package Index) hosts Python libraries, and any user can contribute to it. Python also has a big community of practitioners and evangelists; however, this community is rather heterogeneous because Python is a versatile language. Even so, data science is one of the key areas that take advantage of Python's capabilities today, and more and more data analysis apps are built with Python.
Although there are many R vs Python comparison infographics available online, it is hard to compare the two languages objectively. The main reason is that R use cases are largely limited to data science, while Python, being more universal, is applied extensively in other areas such as web development. Therefore, most popularity rankings are skewed in favor of Python, while R specialists are claimed to make more money than Pythonistas.
When to use R?
R is typically used on projects where data analysis runs as a standalone task on dedicated computing resources or separate servers. R is well tailored to research work and is convenient for almost any type of data analysis, as it offers a lot of packages and ready-made statistical tests that provide the right tooling from project kickoff.
Before starting to code in R, you need to install R itself, preferably together with the RStudio IDE, and then get acquainted with the following popular packages:
- dplyr, plyr and data.table that simplify and facilitate data manipulation;
- stringr to work with strings;
- zoo to work with regular and irregular time series;
- ggvis, lattice and ggplot2 for data visualization;
- caret for machine learning.
When to use Python?
Python is handy when your data analysis tasks need to be embedded in web apps or when statistics code needs to be integrated with a production database. Being a full-fledged general-purpose programming language, Python can also be used to implement algorithms for production use.
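To make the database integration point more concrete, here is a minimal sketch that uses only Python's standard library. It assumes a hypothetical orders table with an amount column; the table, the sample values and the order_summary function are made up for illustration, and an in-memory SQLite database stands in for a real production database.

```python
import sqlite3
import statistics


def order_summary(conn):
    """Compute basic descriptive statistics for the hypothetical orders table."""
    # Pull a single numeric column straight from the database.
    amounts = [row[0] for row in conn.execute("SELECT amount FROM orders")]
    return {
        "count": len(amounts),
        "mean": statistics.mean(amounts),
        "median": statistics.median(amounts),
        "stdev": statistics.stdev(amounts),  # sample standard deviation
    }


# An in-memory SQLite database stands in for a production database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(10.0,), (20.0,), (30.0,), (40.0,)],  # illustrative values only
)

summary = order_summary(conn)
print(summary)
```

In a real project, the same function could be called from a web request handler or a scheduled job, with the connection pointing at whatever database the application already uses.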
Not so long ago, Python's data analysis packages were in their infancy, which posed a certain problem for developers. However, the situation is much better now. To work with data in Python, you need to install NumPy and SciPy (scientific computation) and pandas (data analysis library). You may also need additional libraries such as matplotlib for graphics and scikit-learn for machine learning.
Unlike R, where RStudio is the usual choice, Python has several popular IDEs, and your choice should be based on your project's specific objectives and end goals.
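As a rough illustration of what these libraries look like in practice, here is a short sketch that assumes NumPy and pandas are already installed. The sales figures, column names and region names are invented for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data: made-up sales records for two hypothetical regions.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "units": [10, 14, 7, 9],
    "price": [2.5, 2.5, 3.0, 3.0],
})

# Column-wise (vectorized) arithmetic: no explicit loop over rows.
sales["revenue"] = sales["units"] * sales["price"]

# Split-apply-combine: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

# NumPy functions accept pandas columns directly.
mean_units = np.mean(sales["units"])

print(revenue_by_region)
print(mean_units)
```

The grouping step plays roughly the same role as dplyr's group_by and summarise in R, which is one reason developers who know one of the two stacks usually find the other familiar.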
R and Python: Big Data Market Share
Recent polls by Stack Overflow point to the leadership of R within the Big Data developer community (see image below).
However, contrary to the Stack Overflow stats, Intersog's own observations suggest that more developers are migrating from R to Python today. Moreover, there is a growing number of developers skilled in both R and Python. We recommend that young developers learn both languages and use them together as a stack on data analytics projects. If you're going to pursue a career in Big Data, skills in this stack are a must! Market trends suggest that both R and Python are in high demand these days and that both pay more than the average IT market salary.
To conclude, it's up to you which language to choose for your Big Data project: R or Python. When making your choice, do ask yourself the following questions:
- What problems do you want to solve with your solution?
- How much will it cost you to train or retrain your in-house programmers to code in R vs Python?
- Is there a sufficient supply of R and Python developers in your local market? If not, what alternative locations are you ready to source your talent from?
- What tools are most actively used in your professional environment?
- Are there any alternative tools to use?