Naive Bayes Classifier
GitHub
Introduction
Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.
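Concretely, for a class $y$ and features $x_1, \dots, x_n$, this independence assumption leads to the standard decision rule below (see the scikit-learn documentation in the references); here the features are the words of a review and the class is its sentiment (positive or negative):

```math
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```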
1. Imports.
2. Read the Amazon, Yelp, and Kaggle datasets.
3. Clean the dataset.
4. Plot positive vs. negative sentiment counts.
5. Split the data into train, test, and dev sets.
6. Build the vocabulary.
7. Compute the probability of occurrence of each word per class (see the code sketch after this list).
8. Calculate accuracy on the dev dataset.
9. Conduct five-fold cross-validation.
10. Compare the effect of smoothing.
11. Derive the top 10 words that predict the positive and negative classes.
12. Report the final accuracy.
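The Python sketch below shows one way steps 6, 7, and 10 could fit together: building a vocabulary, estimating Laplace-smoothed per-class word probabilities in log space, and classifying a tokenized review. The function names (`train_naive_bayes`, `classify`) and the smoothing parameter `alpha` are illustrative, not taken from the actual notebook.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods.

    documents: list of token lists; labels: list of class names
    (e.g. "positive"/"negative"); alpha: Laplace smoothing constant.
    """
    vocabulary = set(token for doc in documents for token in doc)
    classes = set(labels)

    log_prior = {}
    log_likelihood = defaultdict(dict)
    for c in classes:
        docs_c = [doc for doc, label in zip(documents, labels) if label == c]
        log_prior[c] = math.log(len(docs_c) / len(documents))

        # Count how often each word appears in documents of class c.
        counts = Counter(token for doc in docs_c for token in doc)
        total = sum(counts.values())
        denom = total + alpha * len(vocabulary)
        for token in vocabulary:
            log_likelihood[c][token] = math.log((counts[token] + alpha) / denom)

    return vocabulary, log_prior, log_likelihood

def classify(tokens, vocabulary, log_prior, log_likelihood):
    """Return the class with the highest posterior log probability."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in vocabulary
        )
    return max(scores, key=scores.get)
```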
I achieved an accuracy of 73.81%.
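As a companion to the sketch above, here is one way the dev-set accuracy (step 8) and the five-fold cross-validation (step 9) could be computed; `train_naive_bayes` and `classify` are the hypothetical helpers from the previous sketch, and the repository's actual evaluation code may differ.

```python
def accuracy(docs, labels, vocabulary, log_prior, log_likelihood):
    """Fraction of documents whose predicted class matches the gold label."""
    correct = sum(
        classify(doc, vocabulary, log_prior, log_likelihood) == label
        for doc, label in zip(docs, labels)
    )
    return correct / len(docs)

def five_fold_cv(documents, labels, alpha=1.0, k=5):
    """Average accuracy over k folds, training on the other k-1 folds each time."""
    fold_size = len(documents) // k
    scores = []
    for i in range(k):
        start, end = i * fold_size, (i + 1) * fold_size
        test_docs, test_labels = documents[start:end], labels[start:end]
        train_docs = documents[:start] + documents[end:]
        train_labels = labels[:start] + labels[end:]
        model = train_naive_bayes(train_docs, train_labels, alpha)
        scores.append(accuracy(test_docs, test_labels, *model))
    return sum(scores) / k
```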
Graphs
The graphs show that the dataset contains roughly equal numbers of positive and negative reviews.
Challenges
I faced challenges in computing the probability values and accuracies, which I overcame with the help of a few references.
References
- https://www.karanr.dev/blog/2020/naive_bayes_large_movie_reviews/naive_bayes_large_movie_reviews/
- https://cs230.stanford.edu/blog/split/
- https://levelup.gitconnected.com/movie-review-sentiment-analysis-with-naive-bayes-machine-learning-from-scratch-part-v-7bb869391bab
- https://satyam-kumar.medium.com/imdb-movie-review-polarity-using-naive-bayes-classifier-9f92c13efa2d
- https://scikit-learn.org/stable/modules/naive_bayes.html