What do you do to the raw data?
Data pre-processing
Raw data is untouched data that needs to be converted into a form that is more understandable and useful for further processing. The group of processes that does this is called pre-processing.
- Class labelling the observations
Arranging data by category, or labelling each data point with the correct data type. Example: for traditional data this can be numerical/categorical, whereas for big data it can be text, digital images or digital audio.
- Data cleansing/Data scrubbing
Dealing with inconsistent data (misspelled categories & missing values).
- Data Balancing
Applying balancing methods when the classes have unequal numbers of observations.
- Data Shuffling
Rearranging data points to eliminate unwanted patterns (patterns that emerge from the order in which the data was sampled) & improve predictive performance.
- Data Masking
It involves concealing the original data (personal information) with random & false data.
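The five steps above can be sketched as one small pipeline in Python. This is a minimal illustration under stated assumptions, not a standard recipe: the toy records, the `CITY_FIXES` spelling map, median imputation for missing ages, random oversampling as the balancing method, and the `user_NNNN` pseudonyms are all hypothetical choices made for the example.

```python
import random
import statistics

# Hypothetical toy data set: every field arrives as raw text; "" marks a missing value.
RAW = [
    {"name": "Alice", "city": "Paris",  "age": "34", "label": "yes"},
    {"name": "Bob",   "city": "Pairs",  "age": "",   "label": "no"},
    {"name": "Chen",  "city": "London", "age": "29", "label": "no"},
    {"name": "Dana",  "city": "london", "age": "41", "label": "no"},
    {"name": "Eve",   "city": "Paris",  "age": "52", "label": "no"},
]

# Assumed spelling fixes for the categorical "city" column (hypothetical).
CITY_FIXES = {"Pairs": "Paris", "london": "London"}


def label_types(rows):
    """Class labelling: give each field its proper type (numerical or categorical)."""
    return [
        {
            "name": r["name"],
            "city": r["city"],                           # categorical
            "age": int(r["age"]) if r["age"] else None,  # numerical, None = missing
            "label": 1 if r["label"] == "yes" else 0,    # class label
        }
        for r in rows
    ]


def cleanse(rows):
    """Data cleansing: fix misspelled categories and fill missing ages with the median."""
    median_age = statistics.median(r["age"] for r in rows if r["age"] is not None)
    for r in rows:
        r["city"] = CITY_FIXES.get(r["city"], r["city"])
        if r["age"] is None:
            r["age"] = median_age
    return rows


def balance(rows, rng):
    """Data balancing: randomly oversample smaller classes up to the largest class size."""
    by_class = {}
    for r in rows:
        by_class.setdefault(r["label"], []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Copy each duplicate so later steps (masking) do not alias rows.
        balanced.extend(dict(rng.choice(group)) for _ in range(target - len(group)))
    return balanced


def shuffle(rows, rng):
    """Data shuffling: randomise row order so no sampling-order pattern remains."""
    rows = list(rows)
    rng.shuffle(rows)
    return rows


def mask(rows, rng):
    """Data masking: replace personal information (names) with random, false values."""
    for r in rows:
        r["name"] = "user_%04d" % rng.randrange(10000)
    return rows


def preprocess(raw, seed=0):
    """Run the five steps in the order listed in the post."""
    rng = random.Random(seed)
    rows = label_types(raw)
    rows = cleanse(rows)
    rows = balance(rows, rng)
    rows = shuffle(rows, rng)
    return mask(rows, rng)


if __name__ == "__main__":
    for row in preprocess(RAW):
        print(row)
```

Random oversampling is only one balancing method; undersampling the larger class is a common alternative. The fixed seed keeps the shuffle and the masked names reproducible between runs.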