Why open source solutions are so desired today
Recent years have seen increased interest in, and adoption of, open source technology across various industry sectors. Open source technologies such as Hadoop, Cassandra, Apache Spark, Talend, Python, R and many others provide best-in-class solutions for delivering Big Data projects and have become the preferred choice in many companies.
Open source emerged in the late ’90s, and it has undergone significant changes since its inception. Wikipedia defines open source as “a term denoting that a product includes permission to use its source code, design documents, or content.” But the essence of open source is far more profound than just access to the source code.
Contrary to proprietary solutions, the open source model embodies unique qualities: goal-oriented yet loosely coordinated people who cooperate on a voluntary basis to create freely distributed products and services. Such a model is an excellent example of the human ability and aspiration to collaborate: people who barely know each other work together to achieve specific goals, even without direct benefits.
Undeniably, open source is an emerging paradigm not only for software development but also for our entire society. It opens the door to a more sustainable and respectful environment where open collaboration and transparency are at the heart of development dynamics.
Why is open source right for business?
The first and obvious answer is the cost! Open source technology is free to use. There is no license fee, and the open source license gives everyone the right to use, modify, and share the source code.
The less money and time a company spends on building a product, the lower the end price for its customers. With open source, a company can significantly reduce both its expenses and the time it takes to release its products.
But beyond this simple response, there are three other important reasons to consider.
The presence of a great community
Behind any great open source technology, there is a vast and tight-knit community united to deliver best-of-breed solutions and striving to make them functional, reliable and secure.
Community developers love what they do and are motivated by peer recognition. As a result, they produce higher quality products with relatively short lead times.
The remarkable scikit-learn Python library is a good example: it is free and used worldwide to solve Machine Learning and Data Science problems. The library offers an extensive range of built-in algorithms that help you make the most of Data Science projects.
Started in 2007 as a Google Summer of Code project by David Cournapeau, the current library is the result of a strong and tight-knit community effort.
Open source is about freedom
When adopting open source technology, you are free to change it however you want.
One of the important benefits is the ability to build directly upon others’ work, because contributors can view the architecture of the product. That means you can reuse code, saving on skills, time, and cost. All of this is authorized by open source licenses.
Speed and paid support
Thanks to the widely distributed community and the variety of tools, plug-ins, and ready-made pieces of code available on the internet, you can solve many different tasks in no time.
You no longer need to wait for a new release from a software provider to create a new feature.
Moreover, you will be pleasantly surprised to discover that lots of the things you need already exist.
Many leaders are still worried that there is no “number” they can call when an issue or a need arises. Although open source technology often comes with excellent documentation, wikis, forums and an active community, it is still possible to opt for paid support. Paid support will help you fix bugs more quickly or address your specific needs.
Now, let’s look at Python and R, the most popular open source programming languages used in Data Science, and see what the main difference between them is.
Which programming language to choose when developing data science projects?
Python and R are both free software environments with a community-based development model.
Both languages are widely used among statisticians, developers and data miners. Python was named language of the year 2018 in the TIOBE index (a measure of the popularity of programming languages), and R ranks 12th.
The language R:
This language focuses on better, user-friendly data analysis, statistics and graphical models.
It is used primarily in academia and research, and its adoption among researchers and statisticians keeps growing.
The closer you are to statistics, research and Data Science, the more you might prefer R. (Theuwissen)
The language Python:
It focuses on productivity and code readability. It is used primarily in engineering environments with fast delivery and interoperability constraints.
Adoption among developers is huge.
The closer you are to working in an engineering environment, the more you might prefer Python.
At Saagie, we have chosen to embrace open-source technologies for the benefit of our customers. Our Saagie Data has been designed to integrate a large range of open-source components (R and Python, mentioned above, but also Spark, Scala, notebooks and much more!), to keep them always up to date, and to orchestrate them seamlessly so that you can build and deploy business applications at scale. For more details, feel free to check out the technology we use.
We hope this gives you some food for thought about open source, and about whether you want to commit to a philosophy that looks beyond today to the environment of tomorrow.