Aim: Data Pre-processing and text analytics using Orange.
What is text analytics?
Text analytics is the automated process of translating large volumes of unstructured text into quantitative data to uncover insights, trends, and patterns.
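As a rough illustration of "translating unstructured text into quantitative data", the sketch below builds a tiny bag-of-words count matrix using only the Python standard library. The two sentences and the helper name term_counts are made up for this example; Orange's text add-on has its own widgets for this step, which are not shown here.

```python
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical two-document corpus, invented for illustration.
docs = [
    "The ship sank and the crew survived.",
    "The crew praised the captain.",
]

counts = [term_counts(d) for d in docs]
# The vocabulary is every distinct word seen in any document.
vocabulary = sorted(set().union(*counts))
# One row per document, one column per vocabulary word.
matrix = [[c[word] for word in vocabulary] for c in counts]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row is now a numeric vector, so documents can be compared, clustered, or fed to a classifier.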
What is sentiment analysis?
A sentiment analysis system for text analysis combines natural language processing (NLP) and machine learning techniques to assign weighted sentiment scores to the entities, topics, themes and categories within a sentence or phrase.
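The idea of weighted sentiment scores can be sketched with a toy lexicon-based scorer. This is a deliberately simplified assumption: the LEXICON dictionary and its weights are invented here, and real systems rely on much larger curated lexicons or trained models, and also handle negation and context.

```python
import re

# Hypothetical toy lexicon mapping words to sentiment weights.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

def sentiment_score(sentence):
    """Sum the weights of all lexicon words found in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Words that are not in the lexicon contribute nothing to the score.
    return sum(LEXICON.get(token, 0.0) for token in tokens)

print(sentiment_score("The service was great but the food was bad"))  # 1.0
```

A positive total suggests positive sentiment, a negative total suggests negative sentiment, and zero means no lexicon word was matched or the weights cancelled out.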
Randomization:
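In experimental design, randomization minimizes the differences among groups by distributing people with particular characteristics equally among all the trial arms. In Orange, the Randomize preprocessor does something related but different: it returns a new table in which the values of the classes, attributes, or metas are shuffled across rows. Shuffling the class values, as in the code below, breaks the link between features and labels, which is useful for checking that a model does no better than chance on meaningless data.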
Sample Code:
>>> from Orange.data import Table
>>> from Orange.preprocess import Randomize
>>> data = Table("titanic")
>>> randomizer = Randomize(Randomize.RandomizeClasses)
>>> randomized_data = randomizer(data)
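To make the effect of shuffling class values concrete, here is a minimal standard-library sketch of the same idea. It is not Orange's implementation; the function name randomize_classes, the seed parameter, and the four sample rows are assumptions made for illustration.

```python
import random
from collections import Counter

def randomize_classes(rows, seed=0):
    """Return new (features, label) rows with the labels shuffled across rows."""
    labels = [label for _, label in rows]
    rng = random.Random(seed)  # fixed seed so the shuffle is repeatable
    rng.shuffle(labels)
    # Features stay in their original order; only the labels move.
    return [(features, label) for (features, _), label in zip(rows, labels)]

# Hypothetical rows: (features, class label).
rows = [
    (("first", "adult"), "yes"),
    (("third", "child"), "no"),
    (("crew", "adult"), "no"),
    (("second", "adult"), "yes"),
]

shuffled = randomize_classes(rows)
# The class distribution is unchanged; only the row assignment differs.
print(Counter(label for _, label in rows))
print(Counter(label for _, label in shuffled))
```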
Discretization:
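Discretization replaces continuous (numeric) attributes with categorical ones by splitting their range of values into intervals (bins). It is usually carried out as a first step toward making the data suitable for evaluation by models that work with categorical inputs.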
Sample Code:
import Orange
# Load the data; "superstore.tab" must be available locally
store = Orange.data.Table("superstore.tab")
# Discretize continuous attributes into 3 equal-frequency bins
disc = Orange.preprocess.Discretize()
disc.method = Orange.preprocess.discretize.EqualFreq(n=3)
d_store = disc(store)
print("Original dataset:")
for e in store[:3]:
    print(e)
print("Discretized dataset:")
for e in d_store[:3]:
    print(e)
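Equal-frequency binning, the method selected by EqualFreq(n=3) above, can be sketched in plain Python. This is a simplified rank-based version written for illustration, not Orange's algorithm: Orange computes cut points between values, whereas this sketch assigns bins by rank, so tied values could land in different bins. The sales figures are made up.

```python
def equal_freq_bins(values, n=3):
    """Assign each value a bin index from 0 to n-1 so bins hold roughly equal counts."""
    # Indices of the values, ordered from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        # The rank's position within the sorted data decides the bin.
        bins[i] = rank * n // len(values)
    return bins

# Hypothetical sales amounts.
sales = [12.0, 250.0, 40.0, 7.5, 980.0, 63.0]
print(equal_freq_bins(sales, n=3))  # [0, 2, 1, 0, 2, 1]
```

With six values and three bins, each bin receives two values: the two smallest get bin 0, the two largest get bin 2.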
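Continuization:
Continuization is the opposite of discretization: it converts categorical (discrete) attributes into numeric ones, for example by replacing each category with an indicator column, so that models requiring numeric input can use them.
Sample Code: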
import Orange
products = Orange.data.Table("Products")
continuizer = Orange.preprocess.Continuize()
products1 = continuizer(products)
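One common way to turn a categorical attribute into numeric columns is one-hot (indicator) encoding. The sketch below shows that idea in plain Python; it is an illustration under assumptions rather than a copy of what Continuize does internally, since Orange offers several treatments for categorical variables. The helper name one_hot and the category values are invented.

```python
def one_hot(values):
    """Return the sorted categories and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Hypothetical product categories.
categories, encoded = one_hot(["toy", "book", "toy", "food"])
print(categories)  # ['book', 'food', 'toy']
for row in encoded:
    print(row)
```

Each original value becomes a row with exactly one 1, placed in the column of its category.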
Normalization:
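In data pre-processing, normalization means rescaling numeric features so that they share a comparable scale. It should not be confused with database normalization, which is the systematic decomposition of tables to eliminate data redundancy (repetition) and insertion, update, and deletion anomalies. Orange's Normalize preprocessor can rescale each feature by its span (the difference between its maximum and minimum), as in the code below, or by its standard deviation.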
Sample Code:
>>> from Orange.data import Table
>>> from Orange.preprocess import Normalize
>>> data = Table("Customers")
>>> normalizer = Normalize(norm_type=Normalize.NormalizeBySpan)
>>> normalized_data = normalizer(data)
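Normalizing by span can be written out by hand to show the arithmetic. The sketch below maps values to the interval [0, 1] using (x - min) / (max - min); the function name normalize_by_span and the sample ages are assumptions for illustration, and Orange's own options (such as the target interval) are not reproduced here.

```python
def normalize_by_span(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Hypothetical customer ages.
ages = [20, 30, 40, 60]
print(normalize_by_span(ages))  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to 0.0 and the largest to 1.0, so features measured in different units become directly comparable.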