Machine Learning and Data Visualization using Orange

Payal Das
Dec 22, 2020

In this blog post, we will discuss Orange, an open-source tool that provides machine learning and data visualization capabilities for both novice and expert users.

Introduction to Orange

Orange is an open-source, component-based visual programming package for machine learning, data visualization, data analysis and data mining. Its components are called widgets, and they range from simple data visualization, pre-processing and subset selection to empirical evaluation of learning algorithms and predictive modeling.

In Orange, visual programming is done through an interface in which you build workflows by linking predefined or user-designed widgets. Advanced users can also use Orange as a Python library for data manipulation and for altering widgets.
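
For readers curious about the scripting side, here is a minimal sketch of what using Orange as a Python library might look like. It assumes the Orange3 package is installed (for example with pip install Orange3) and uses the titanic dataset that ships with Orange. The function name cross_validate_titanic and the choice of learners are purely illustrative, and the evaluation call style shown is the one used by recent Orange 3 releases, so older versions may need a slightly different call.

```python
# Minimal sketch: Orange as a Python library (assumes Orange3 is installed).
try:
    import Orange
except ImportError:  # Orange3 is not available in this environment
    Orange = None


def cross_validate_titanic(k=5):
    """Cross-validate two classifiers on Orange's bundled titanic data.

    Returns a dict mapping learner class name to classification accuracy.
    The function name and the learners chosen here are illustrative only.
    """
    if Orange is None:
        raise RuntimeError("Orange3 is required: pip install Orange3")

    data = Orange.data.Table("titanic")  # dataset shipped with Orange
    learners = [
        Orange.classification.NaiveBayesLearner(),
        Orange.classification.LogisticRegressionLearner(),
    ]
    # Recent Orange 3 releases: configure the sampler first, then call it.
    cv = Orange.evaluation.CrossValidation(k=k)
    results = cv(data, learners)
    accuracies = Orange.evaluation.CA(results)  # one score per learner
    return {type(l).__name__: float(a) for l, a in zip(learners, accuracies)}


# Example usage (only works where Orange3 is installed):
#     for name, acc in cross_validate_titanic().items():
#         print(name, round(acc, 3))
```

This does in code roughly what a File widget connected to two learners and an evaluation widget does on the canvas.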

Features of Orange

The following are some of Orange's most eye-catching features:

1. Open source

The best part of Orange is that it is open source, so you can get its code and modify the tool as needed. The source code of almost all of its machine learning algorithms is available as well, so you can adapt an algorithm to your application, add the modified version to Orange, and get results from it.

2. Visual Programming

Orange is suitable not only for computer science professionals but also for novice users, because it provides visual programming. There is no need to learn a programming language such as Java, C, C++ or Python. All you need is an understanding of data mining concepts and of which algorithm to use in a given situation.

Orange provides drag-and-drop functionality, and widgets are joined by connecting wires. Building charts has rarely been this much fun, and the flexible visual environment quickly grows on you. If a connection is not working correctly, Orange draws it as a dotted line. If the widget types do not match, for example when a learning or prediction widget cannot accept the data you are trying to send it, Orange will not let you make the connection.

In short, visual programming provides an interactive data browsing function, which enables quick qualitative analysis through clear visualization. The GUI allows users to focus on exploratory data analysis instead of coding, and smart default settings make prototyping of any data analysis workflow very fast and easy.

Just place the widget on the canvas, connect, load the dataset and generate insights.

3. Add-ons are available to extend the functionality

The functionality of Orange can be extended through add-ons that are available online. In fact, Orange has never offered bioinformatics as part of its core toolbox; it has always been an add-on. The process of distributing add-ons has also been reworked to make it simpler for add-on authors and more standards-compliant. Among other things, this allows system administrators to install add-ons system-wide directly from PyPI using pip (or the older easy_install).

4. Create Dataset from any Graph you want

You can use the Paint Data widget to draw your own graph and then generate a dataset from it.

Fig 1. Visualizing graph using ‘Paint Data’

One of the most appealing things about Orange is that it lets you work in reverse. We usually draw graphs from existing data, but here the opposite is also possible. Below is the dataset generated from the graph shown above.

Fig 2. Generated Dataset from the graph shown above
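
To make the "graph first, data second" idea concrete, here is a small sketch in plain Python. It is not how Orange implements Paint Data; it only mimics the idea by scattering points around a few hypothetical brush positions. The function name paint_points, the column names x, y and class, and the labels C1 and C2 are assumptions made for illustration.

```python
import random


def paint_points(strokes, points_per_stroke=20, spread=0.05, seed=0):
    """Turn hypothetical brush strokes into a labelled 2-D dataset.

    strokes: list of (x, y, label) tuples marking where the brush was used.
    Returns a list of rows shaped like {"x": float, "y": float, "class": str}.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rows = []
    for cx, cy, label in strokes:
        for _ in range(points_per_stroke):
            # Scatter points around the brush centre, clamped to the unit square.
            x = min(1.0, max(0.0, rng.gauss(cx, spread)))
            y = min(1.0, max(0.0, rng.gauss(cy, spread)))
            rows.append({"x": round(x, 3), "y": round(y, 3), "class": label})
    return rows


# Two brush strokes, one per class (values are made up for the example).
strokes = [(0.25, 0.30, "C1"), (0.70, 0.65, "C2")]
dataset = paint_points(strokes)
print(len(dataset), "rows; first row:", dataset[0])
```

Each painted point becomes one row of the table, which is why a drawing can be treated like any other dataset afterwards.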

Working with Data

Orange gives you many options for doing almost anything with your dataset. As shown in the following figure, there are about 26 widgets for organizing your dataset any way you wish.

Fig 3. Widgets to organize your dataset

Visualizing Data

You can visualize data in about 16 different types of plots and graphs. This is one of Orange's simplest and most interesting features: you only need to connect the dataset to the plot or graph you want, and you are done.

Fig 4. Widgets for Data Visualization

Supervised Data Model

Orange provides about 12 built-in machine learning models that you can train directly on your dataset. They include the most popular machine learning algorithms, such as SVM, kNN, Logistic Regression and Naive Bayes, as shown in the figure below.

Fig 5. In-built Supervised Machine Learning Model Widgets

Unsupervised Model

Orange provides built-in models for both supervised and unsupervised learning. It offers ready-made implementations of algorithms such as k-Means and PCA, and gives access to other models as shown in the following figure:

Fig 6. In-built Unsupervised Machine Learning Model Widgets
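
For readers who have not met k-Means before, the following plain-Python sketch shows the basic idea behind the algorithm (often called Lloyd's algorithm). It is not Orange's implementation. The function name k_means and the evenly spaced starting centroids are assumptions chosen to keep the example short and reproducible; real implementations usually start from random or k-means++ seeds.

```python
def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    # Deterministic start (an assumption for reproducibility):
    # take k evenly spaced data points as the initial centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Two obvious groups of points (made-up values for the example).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
centroids, labels = k_means(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

In Orange the same steps sit behind a single widget, so you only choose the number of clusters and connect the data.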

Evaluation of Performance of Models

Orange is not only a powerful implementation tool, but also an excellent tool for evaluating the performance of different models.

Fig 7. Widgets for Evaluating the performance of Models

Test & Score is one of the most commonly used widgets in Orange.

The widget mainly accepts two inputs: Data and Learner.

The Data is the data set we use for modeling, such as titanic.tab, which has been pre-loaded in the File widget.

The Learner is any kind of learning algorithm, for example kNN, Logistic Regression or SVM. You can only use learners that support your task type: for classification you cannot use Linear Regression, and for regression you cannot use Logistic Regression.

Most other learners support both tasks. You can connect more than one learner to Test & Score.

Fig 8. Using the Test & Score Widget

As you can see in the diagram above, we have used Test & Score from the Evaluate section, connected it to a dataset file, and connected multiple learners to it: Naive Bayes, Logistic Regression and Random Forest.

The figure below shows the evaluation options you can choose from based on your requirements. The results for your selection appear in the right pane.

Fig 9. Evaluation Results for Test & Score

As a result, the Test & Score widget makes it very easy to evaluate multiple models at the same time.
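
The idea of feeding one dataset and several learners into a single evaluation step can be sketched in plain Python as k-fold cross-validation. This is only an illustration of the concept, not Orange's code: the class names, the function score_learners, the tiny dataset and the simple "every k-th row" fold assignment are all assumptions made for the example. Both learners here are classifiers, in line with the rule that a learner must support the task type.

```python
from collections import Counter


class MajorityLearner:
    """Baseline: always predicts the most common training label."""
    name = "Majority"

    def fit(self, rows, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.label


class NearestNeighbourLearner:
    """1-nearest-neighbour using squared Euclidean distance."""
    name = "1-NN"

    def fit(self, rows, labels):
        self.rows, self.labels = list(rows), list(labels)
        return self

    def predict(self, row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, r)) for r in self.rows]
        return self.labels[dists.index(min(dists))]


def score_learners(rows, labels, learners, k=5):
    """k-fold cross-validation; returns {learner name: accuracy}."""
    n = len(rows)
    scores = {}
    for learner in learners:
        correct = 0
        for fold in range(k):
            test_idx = set(range(fold, n, k))  # every k-th row is held out
            train = [i for i in range(n) if i not in test_idx]
            learner.fit([rows[i] for i in train], [labels[i] for i in train])
            correct += sum(learner.predict(rows[i]) == labels[i] for i in test_idx)
        scores[learner.name] = correct / n
    return scores


# Made-up two-class data: three "no" rows near (1, 1), seven "yes" rows near (3, 3).
rows = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
        (3.0, 3.2), (3.1, 2.9), (2.8, 3.0), (3.2, 3.1),
        (2.9, 3.3), (3.3, 2.8), (3.0, 3.0)]
labels = ["no"] * 3 + ["yes"] * 7
scores = score_learners(rows, labels, [MajorityLearner(), NearestNeighbourLearner()])
print(scores)
```

Every learner is trained and tested on exactly the same folds, which is what makes the resulting scores directly comparable.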

Conclusion

Orange is an open-source data visualization and data analysis tool for data mining through visual programming or Python scripting. It has components for almost all well-known machine learning algorithms, add-ons for bioinformatics and text mining, and features for data analytics. For researchers, it is a one-stop solution: it covers dataset pre-processing, visualization with graphs, built-in machine learning algorithms, and the Test & Score widget for measuring the accuracy of algorithms on different datasets, along with many other useful features.

Written by Payal Das

Payal Das is currently pursuing her B-Tech degree in IT-Data Science. R enthusiast, writes about tech. Follow her on LinkedIn: linkedin.com/in/payald17