Introduction to Data Science
DATA SCIENCE
EVOLUTION
About 25 years ago, Data Science was mostly about gathering and cleaning datasets and then applying conventional statistical methods such as regression, factor analysis, cluster analysis and time series forecasting.
By around 2018, the field had grown enormously and had come to encompass data analysis, predictive analytics, data mining, business intelligence, machine learning, deep learning and more.
INTRODUCTION
Data Science is the study of data to extract meaningful insights that improve the performance of business firms and drive the strategic decision making of organizations. In other words, past data is collected, preprocessed and analysed, and the patterns extracted from it are used to predict future outcomes.
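The collect, preprocess, analyse and predict flow described above can be sketched in a few lines of Python. This is only a minimal illustration: the monthly figures are made up for the example, and a straight-line fit by ordinary least squares is just one of the conventional methods (regression) mentioned earlier, not the only way to do it.

```python
# Collect: hypothetical past data as (month number, units sold).
# None marks a missing reading. These numbers are invented for illustration.
raw = [(1, 10.0), (2, 12.0), (3, None), (4, 16.0), (5, 18.0)]

# Preprocess: drop records with missing values.
clean = [(x, y) for x, y in raw if y is not None]

# Analyse: fit a straight line y = slope * x + intercept
# using ordinary least squares.
n = len(clean)
mean_x = sum(x for x, _ in clean) / n
mean_y = sum(y for _, y in clean) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in clean)
sxx = sum((x - mean_x) ** 2 for x, _ in clean)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(month):
    """Predict units sold for a given month from the fitted line."""
    return slope * month + intercept


# Predict: estimate a future outcome (month 6) from the extracted pattern.
forecast = predict(6)
print(f"slope={slope}, intercept={intercept}, forecast for month 6={forecast}")
```

Real projects would use far more data and would check how well the model fits before trusting its predictions.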
No single person is regarded as the father of Data Science; many people have contributed to the knowledge in this domain.
Now let us discuss the basis of Data Science, i.e. DATA.
DATA
Data can take any form, such as numbers, letters, words, images, audio, video, symbols, graphs and so on. It might also be a person's height, weight, age or gender. So, data is the raw form of knowledge. The two types of data discussed here are:
1) Traditional Data
2) Big Data
Traditional Data (Static)
The picture that usually comes to mind when we think about data looks like this:
| ID NO | NAME | AGE | GENDER |
|-------|-------|-----|--------|
| 1 | Alan | 25 | Male |
| 2 | Bency | 24 | Female |
| 3 | Crum | 26 | Male |
The table above is a typical example of traditional data.
It is structured and stored in databases that can be managed from a single computer. It is kept in table format with numeric or text values, as in the table above. A few examples are customer information, inventory records and student mark lists.
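To make the idea of structured data in a database concrete, here is a minimal sketch that stores the three rows from the table above in an in-memory SQLite database using Python's built-in `sqlite3` module. The table name `people` and the column names are assumptions chosen for this example.

```python
import sqlite3

# An in-memory database: everything lives on one computer, in one process.
conn = sqlite3.connect(":memory:")

# Structured data has a fixed schema: every row has the same typed columns.
# The table and column names here are hypothetical.
conn.execute(
    "CREATE TABLE people ("
    "id_no INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT)"
)

rows = [
    (1, "Alan", 25, "Male"),
    (2, "Bency", 24, "Female"),
    (3, "Crum", 26, "Male"),
]
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)

# Because the data is structured, it can be queried directly.
older = conn.execute(
    "SELECT name FROM people WHERE age > 24 ORDER BY id_no"
).fetchall()
average_age = conn.execute("SELECT AVG(age) FROM people").fetchone()[0]

print(older)
print(average_age)

conn.close()
```

The fixed schema is what makes traditional data easy to filter, sort and summarise with simple queries.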
Big Data (Dynamic)
This is an extremely large amount of data that is impossible to manage from a single computer. It can be structured, unstructured or semi-structured.
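The difference between structured, semi-structured and unstructured data is easier to see with small samples. The sketch below uses made-up sample strings (a CSV row, a JSON record and a free-text post) and only Python's standard library; it shows the three forms, not how Big Data systems actually store them.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = "id_no,name,age\n1,Alan,25\n"

# Semi-structured: self-describing keys, nesting, optional fields.
semi_structured = (
    '{"user": "alan", "tags": ["travel", "food"], "profile": {"age": 25}}'
)

# Unstructured: free text with no predefined fields.
unstructured = "Had a great trip last week, the food was amazing!"

# Structured data maps straight onto named columns.
row = next(csv.DictReader(io.StringIO(structured)))

# Semi-structured data is parsed into nested objects.
record = json.loads(semi_structured)

# Unstructured data needs its own processing before analysis;
# here we only count whitespace-separated words.
word_count = len(unstructured.split())

print(row["name"], record["profile"]["age"], word_count)
```

Note that CSV values are read as text, so `row["age"]` is the string `"25"` until it is converted.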
The characteristics of Big Data are commonly described as the 3 V's, 5 V's, 7 V's or even 11 V's.
The five main V's of Big Data are:
- Volume: the vast amount of data that is generated and collected
- Velocity: the speed at which data is generated and processed
- Variety: the many different types and formats of data
- Veracity: the accuracy and trustworthiness of the data
- Value: the useful outcomes that organizations can derive from the data
Online platforms such as Google, Facebook and Twitter generate enormous volumes of data every second.