Naive Bayes Classifier
GitHub
Introduction
Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.
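Concretely, for a class $y$ and features $x_1, \dots, x_n$, this independence assumption leads to the standard decision rule below (see the scikit-learn documentation in the references); here the features are the words of a review and the class is its sentiment (positive or negative):

```math
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```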
1. Imports.
2. Read the Amazon, Yelp, and Kaggle datasets.
3. Clean the dataset.
4. Plot positive vs. negative sentiment counts.
5. Split the data into train, test, and dev sets.
6. Build the vocabulary.
7. Compute the probability of occurrence of each word per class (see the code sketch after this list).
8. Calculate accuracy on the dev dataset.
9. Conduct five-fold cross-validation.
10. Compare the effect of smoothing.
11. Derive the top 10 words that predict the positive and negative classes.
12. Report the final accuracy.
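The Python sketch below shows one way steps 6, 7, and 10 could fit together: building a vocabulary, estimating Laplace-smoothed per-class word probabilities in log space, and classifying a tokenized review. The function names (`train_naive_bayes`, `classify`) and the smoothing parameter `alpha` are illustrative, not taken from the actual notebook.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods.

    documents: list of token lists; labels: list of class names
    (e.g. "positive"/"negative"); alpha: Laplace smoothing constant.
    """
    vocabulary = set(token for doc in documents for token in doc)
    classes = set(labels)

    log_prior = {}
    log_likelihood = defaultdict(dict)
    for c in classes:
        docs_c = [doc for doc, label in zip(documents, labels) if label == c]
        log_prior[c] = math.log(len(docs_c) / len(documents))

        # Count how often each word appears in documents of class c.
        counts = Counter(token for doc in docs_c for token in doc)
        total = sum(counts.values())
        denom = total + alpha * len(vocabulary)
        for token in vocabulary:
            log_likelihood[c][token] = math.log((counts[token] + alpha) / denom)

    return vocabulary, log_prior, log_likelihood

def classify(tokens, vocabulary, log_prior, log_likelihood):
    """Return the class with the highest posterior log probability."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in vocabulary
        )
    return max(scores, key=scores.get)
```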
I achieved an accuracy of 73.81%.
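As a companion to the sketch above, here is one way the dev-set accuracy (step 8) and the five-fold cross-validation (step 9) could be computed; `train_naive_bayes` and `classify` are the hypothetical helpers from the previous sketch, and the repository's actual evaluation code may differ.

```python
def accuracy(docs, labels, vocabulary, log_prior, log_likelihood):
    """Fraction of documents whose predicted class matches the gold label."""
    correct = sum(
        classify(doc, vocabulary, log_prior, log_likelihood) == label
        for doc, label in zip(docs, labels)
    )
    return correct / len(docs)

def five_fold_cv(documents, labels, alpha=1.0, k=5):
    """Average accuracy over k folds, training on the other k-1 folds each time."""
    fold_size = len(documents) // k
    scores = []
    for i in range(k):
        start, end = i * fold_size, (i + 1) * fold_size
        test_docs, test_labels = documents[start:end], labels[start:end]
        train_docs = documents[:start] + documents[end:]
        train_labels = labels[:start] + labels[end:]
        model = train_naive_bayes(train_docs, train_labels, alpha)
        scores.append(accuracy(test_docs, test_labels, *model))
    return sum(scores) / k
```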
Graphs
The graphs show that the dataset contains roughly equal numbers of positive and negative reviews.
Challenges
I faced challenges in computing the probability values and accuracies, which I overcame with the help of a few references.
References
- https://www.karanr.dev/blog/2020/naive_bayes_large_movie_reviews/naive_bayes_large_movie_reviews/
- https://cs230.stanford.edu/blog/split/
- https://levelup.gitconnected.com/movie-review-sentiment-analysis-with-naive-bayes-machine-learning-from-scratch-part-v-7bb869391bab
- https://satyam-kumar.medium.com/imdb-movie-review-polarity-using-naive-bayes-classifier-9f92c13efa2d
- https://scikit-learn.org/stable/modules/naive_bayes.html