Hello! I should say ?????????????????? since today’s topic is regarding Indian language.
Natural Language Processing looks fascinating but it’s similar to Machine Learning where we need data cleaning and data pre-processing.
Sounds boring right? But it’s not our mistake…machines never tried to learn human languages . It was us who generously learnt numbers to communicate with them . Jokes apart, when we talk data pre-processing, Tokenization is an integral part of this. Basically, we split the text further into units called tokens which can be words or characters.