DATA SCIENCE

Python vs. R — What to Choose for Data Science?

A detailed comparison of Python and R for use in Data Science

Mikhail Raevskiy
6 min read · Dec 8, 2020


Artwork by the author. Photo: Pexels

Python and R have long been the standard languages for Data Science. They are natural rivals because both are well suited to statistical work. Python has clear syntax and a large number of libraries, while R was developed specifically for statisticians and therefore comes with high-quality data visualization. SQL and Scala also stand out: SQL because data that already sits in tables is a stroke of luck rather than a source of frustration, and Scala mainly because Spark, the most popular distributed data processing framework, is written in it.

For preliminary data analysis and a decision on the future of a feature, SQL and the command line alone are enough, because data science is first of all an approach, not a set of libraries with catchy names. Nevertheless, such minimalism has its limits (and may scare off a beginner altogether), and at some point you will still have to turn to more…


Mikhail Raevskiy

Bioinformatician at Oncobox Inc. (@oncobox). Research Associate