What do you do to the raw data?

Data pre-processing

Raw data is untouched data that must be converted into a form that is more understandable and more useful for further processing. The set of processes that performs this conversion is called pre-processing.

  • Class labelling the observations

Arranging data by category, or labelling each data point with the correct data type. For example, traditional data is typically numerical or categorical, whereas big data can also be text, digital images, or digital audio.
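The post does not show how this labelling is done. A minimal sketch, assuming the raw data arrives as columns of strings, tags each column as numerical or categorical by checking whether every non-missing value parses as a number. The function `label_column_type` and the sample columns are hypothetical.

```python
def label_column_type(values):
    """Return 'numerical' if every non-missing value parses as a number, else 'categorical'."""
    # Ignore missing entries (None or empty strings) when deciding the type.
    present = [v for v in values if v not in (None, "")]
    try:
        for v in present:
            float(v)
    except (TypeError, ValueError):
        return "categorical"
    return "numerical"

# Hypothetical raw columns, all read in as strings.
raw = {
    "age": ["34", "27", "", "45"],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
}
labels = {column: label_column_type(values) for column, values in raw.items()}
print(labels)  # {'age': 'numerical', 'city': 'categorical'}
```

Text, image, or audio data would need different checks; this sketch only covers the traditional numerical/categorical split.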

  • Data cleansing/Data scrubbing

Dealing with inconsistent data, such as misspelled categories and missing values.
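As a rough illustration of both problems, the sketch below first maps known misspellings to their canonical category and then fills missing values with the most frequent category (mode imputation). The mapping `SPELLING_FIXES` and the helper `clean_categories` are hypothetical, and mode imputation is only one common option chosen for the example; the post does not prescribe a method.

```python
from collections import Counter

# Hypothetical mapping from known misspellings to the canonical category.
SPELLING_FIXES = {"Pari": "Paris", "paris": "Paris", "Lyonn": "Lyon"}

def clean_categories(values, fixes):
    """Correct misspelled categories, then replace missing values (None) with the mode."""
    # Values not in the mapping (including None) are left unchanged.
    corrected = [fixes.get(v, v) for v in values]
    # Assumes at least one non-missing value exists to compute the mode from.
    observed = [v for v in corrected if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [v if v is not None else mode for v in corrected]

raw_cities = ["Paris", "Pari", None, "Lyonn", "paris", "Lyon"]
print(clean_categories(raw_cities, SPELLING_FIXES))
# ['Paris', 'Paris', 'Paris', 'Lyon', 'Paris', 'Lyon']
```

Fixing spellings before imputing matters: otherwise "Pari" and "paris" would be counted as separate categories when choosing the mode.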

  • Data Balancing

Applying balancing methods when the classes contain unequal numbers of observations.
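The post does not name a specific balancing method. One common choice is random oversampling, sketched below: rows from each smaller class are duplicated at random until every class has as many observations as the largest one. The function `random_oversample` and the toy data are hypothetical; undersampling the majority class is the usual alternative.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest one."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [row for row, label in zip(rows, labels) if label == cls]
        # Add (target - n) randomly picked copies of this class's rows.
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

rows = [[1.0], [1.2], [0.9], [1.1], [5.0]]
labels = ["neg", "neg", "neg", "neg", "pos"]
bal_rows, bal_labels = random_oversample(rows, labels)
print(dict(Counter(bal_labels)))  # {'neg': 4, 'pos': 4}
```

Because oversampling only copies existing rows, it should be applied to the training data alone, so that duplicated rows do not leak into the evaluation set.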

  • Data Shuffling

Rearranging the order of data points to eliminate unwanted patterns (for example, patterns that emerge from the way the data was sampled) and to improve predictive performance.
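A minimal sketch of shuffling, assuming features and labels are stored in two parallel lists: both are reordered with the same random permutation so that each row keeps its label. The helper `shuffle_together` and the toy data are hypothetical.

```python
import random

def shuffle_together(rows, labels, seed=42):
    """Shuffle rows and labels with the same permutation so each row keeps its label."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    order = list(range(len(rows)))
    rng.shuffle(order)
    return [rows[i] for i in order], [labels[i] for i in order]

# Data recorded in class order: a split taken from the start would contain only "a".
rows = [[0], [1], [2], [3], [4], [5]]
labels = ["a", "a", "a", "b", "b", "b"]
shuffled_rows, shuffled_labels = shuffle_together(rows, labels)
print(shuffled_rows, shuffled_labels)
```

Shuffling the two lists independently would break the row-to-label pairing, which is why a single shared permutation is used here.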

  • Data Masking

Concealing the original data (for example, personal information) by replacing it with random or false data.
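A minimal sketch of masking, assuming records are dictionaries and that the fields `name` and `email` hold personal information: those values are replaced with random strings, while the remaining fields are kept so the data stays usable for analysis. The field set `SENSITIVE_FIELDS` and the helpers `random_token` and `mask_record` are hypothetical.

```python
import random
import string

# Hypothetical set of field names treated as personal information.
SENSITIVE_FIELDS = {"name", "email"}

def random_token(rng, length=8):
    """Return a random lowercase string with no link to the original value."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def mask_record(record, rng):
    """Return a copy of record with sensitive fields replaced by random, false values."""
    masked = dict(record)  # leave the original record untouched
    for field in record:
        if field not in SENSITIVE_FIELDS:
            continue
        if field == "email":
            # Keep the value shaped like an email so downstream code still works.
            masked[field] = f"{random_token(rng)}@example.com"
        else:
            masked[field] = random_token(rng).capitalize()
    return masked

rng = random.Random(7)
customer = {"name": "Alice Martin", "email": "alice@mail.com", "age": 34}
print(mask_record(customer, rng))
```

The random replacement here cannot be reversed; if the original values must be recoverable later, a separately stored lookup table would be needed instead.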
