
Natural Language Processing

Ben-Gurion University of the Negev


These tasks were created as part of the Natural Language Processing (NLP) course at Ben-Gurion University of the Negev.

Natural language processing is the research field in which we develop, test and analyze machine learning algorithms that automatically process large amounts of text, both to understand existing texts and to generate new ones. The course makes heavy use of machine learning but also introduces concepts from linguistics and cognitive psychology. Typical examples of active research topics and applications are spam detection, error correction, machine translation, topic modeling, document classification and demographic attribution.


Text Preprocessing, Language Modeling and Generation
Implement a Markovian language model and a language generator, and use the noisy channel algorithm for spell checking. Combining the noisy channel with a language model is a simple yet powerful algorithm that demonstrates some key elements of language processing and the way statistical machine learning implicitly accounts for cognitive and technological biases.
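To make the idea concrete, here is a minimal sketch of a word-level Markovian (n-gram) language model with a sampling-based generator. The class name MarkovLM, the <s>/</s> padding tokens, whitespace tokenization and the unsmoothed maximum-likelihood counts are all assumptions for illustration, not the assignment's actual code.

```python
import random
from collections import Counter, defaultdict


class MarkovLM:
    """Minimal word-level n-gram (Markov) language model with maximum-likelihood estimates."""

    def __init__(self, n=2):
        self.n = n  # each word is conditioned on the previous n-1 words
        self.counts = defaultdict(Counter)  # context tuple -> Counter of next words

    def _context(self, history):
        # Keep only the last n-1 tokens of the history (empty tuple for a unigram model).
        return tuple(history[-(self.n - 1):]) if self.n > 1 else ()

    def train(self, sentences):
        for sentence in sentences:
            tokens = ["<s>"] * (self.n - 1) + sentence.lower().split() + ["</s>"]
            for i in range(self.n - 1, len(tokens)):
                self.counts[self._context(tokens[:i])][tokens[i]] += 1

    def prob(self, word, history):
        """P(word | last n-1 words of history); 0.0 for unseen events (no smoothing)."""
        dist = self.counts.get(self._context(history), Counter())
        total = sum(dist.values())
        return dist[word] / total if total else 0.0

    def generate(self, max_len=20, seed=None):
        """Sample words left to right until </s> is drawn or max_len words are produced."""
        rng = random.Random(seed)
        history = ["<s>"] * (self.n - 1)
        output = []
        for _ in range(max_len):
            dist = self.counts.get(self._context(history))
            if not dist:
                break  # dead end: this context was never seen in training
            word = rng.choices(list(dist), weights=list(dist.values()))[0]
            if word == "</s>":
                break
            output.append(word)
            history.append(word)
        return " ".join(output)


lm = MarkovLM(n=2)
lm.train(["the cat sat", "the dog sat"])
print(lm.prob("cat", ["the"]))  # 0.5
print(lm.generate(seed=0))      # one of the two training sentences
```

A real model would add smoothing (for example add-one or backoff) so that unseen n-grams do not get zero probability.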

Contextual Spell Checking
Distributional semantics and Text Classification. In this assignment we built a spell checker that handles both non-word and real-word errors in a sentential context. To do so, we train a language model and reconstruct the error distribution tables (one per error type) from lists of common errors. Finally, we combine these into a context-sensitive noisy channel model.
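The sketch below illustrates how a language model and an error model can be combined into a context-sensitive noisy channel: among all sentences that differ from the input by at most one edit in one word, it picks the one maximizing log P(typed | intended) + log P(intended). It is a simplification under stated assumptions: the per-error-type probabilities in ERROR_PROB and P_NO_ERROR are hypothetical constants standing in for the character-level error tables the assignment reconstructs, the language model is an add-one smoothed bigram model, and any out-of-vocabulary word is treated as a non-word error.

```python
import math
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical channel model: P(typed | intended) per error type. The assignment
# instead estimates character-level error tables from lists of common errors.
ERROR_PROB = {"insertion": 0.02, "deletion": 0.02, "substitution": 0.03, "transposition": 0.01}
P_NO_ERROR = 0.9  # hypothetical probability that a word was typed as intended


class BigramLM:
    """Word bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = set(self.unigrams)

    def sentence_logprob(self, words):
        tokens = ["<s>"] + words + ["</s>"]
        v = len(self.vocab)
        return sum(math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + v))
                   for a, b in zip(tokens, tokens[1:]))


def candidates(typed):
    """Yield (intended, error_type) for every string one edit away from the typed word."""
    for i in range(len(typed) + 1):
        left, right = typed[:i], typed[i:]
        if right:
            yield left + right[1:], "insertion"  # writer added an extra letter
        if len(right) > 1:
            yield left + right[1] + right[0] + right[2:], "transposition"  # swapped letters
        for c in LETTERS:
            if right and c != right[0]:
                yield left + c + right[1:], "substitution"  # wrong letter typed
            yield left + c + right, "deletion"  # writer dropped a letter


def correct_sentence(sentence, lm):
    """Most probable intended sentence, assuming at most one misspelled word."""
    words = sentence.lower().split()
    hypotheses = [(words, len(words) * math.log(P_NO_ERROR))]  # no error at all
    for i, typed in enumerate(words):
        for intended, error in candidates(typed):
            channel = (len(words) - 1) * math.log(P_NO_ERROR) + math.log(ERROR_PROB[error])
            hypotheses.append((words[:i] + [intended] + words[i + 1:], channel))
    best, best_score = words, -math.inf
    for hyp, channel in hypotheses:
        if not all(w in lm.vocab for w in hyp):
            continue  # an out-of-vocabulary word is a non-word error and must be fixed
        score = channel + lm.sentence_logprob(hyp)  # log P(typed | hyp) + log P(hyp)
        if score > best_score:
            best, best_score = hyp, score
    return " ".join(best)


lm = BigramLM(["the cat sat on the mat", "the dog sat on the mat"])
print(correct_sentence("the cta sat on the mat", lm))  # the cat sat on the mat
```

Real-word errors go through the same scoring: a known word is replaced only when the gain in language-model probability outweighs the channel cost of assuming an error.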

Part of Speech Tagging
Implement a Hidden Markov Model (HMM) tagger and discriminative models (MEMM and BiLSTM) for part-of-speech (POS) tagging.
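For the HMM tagger, decoding is typically done with the Viterbi algorithm. Below is a minimal log-space sketch; the parameter dictionaries (start_p, trans_p, emit_p), the toy tag set and probabilities, and the tiny floor used for unseen events are hypothetical, whereas in practice these tables would be estimated from a tagged corpus.

```python
import math

FLOOR = 1e-12  # stand-in probability for unseen events (an assumption, not real smoothing)


def safe_log(p):
    return math.log(max(p, FLOOR))


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `words` under a first-order HMM.

    start_p[t] = P(t at position 0), trans_p[s][t] = P(t | s), emit_p[t][w] = P(w | t).
    """
    # best[i][t]: log-probability of the best tag path for words[:i+1] ending in tag t
    best = [{t: safe_log(start_p.get(t, 0.0)) + safe_log(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + safe_log(trans_p[s].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + safe_log(trans_p[prev].get(t, 0.0))
                          + safe_log(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Trace the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {"DET": {"NOUN": 0.9, "VERB": 0.1},
           "NOUN": {"VERB": 0.8, "NOUN": 0.1, "DET": 0.1},
           "VERB": {"DET": 0.6, "NOUN": 0.4}}
emit_p = {"DET": {"the": 0.7, "a": 0.3},
          "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.2},
          "VERB": {"barks": 0.6, "sees": 0.4}}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))  # ['DET', 'NOUN', 'VERB']
```

A MEMM replaces the generative transition and emission tables with a discriminatively trained P(tag | previous tag, features) and can be decoded with a similar Viterbi search.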

Assignment 3 - Authorship Attribution

LSTM networks: using various text classification algorithms, we perform an authorship attribution task on Donald Trump’s tweets. A comprehensive report, the accompanying code and the classification output obtained on a test set are included in the repository.
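To illustrate the LSTM approach, here is a minimal PyTorch sketch of a tweet classifier that reads a sequence of token ids and predicts an author label from the LSTM's final hidden state. The architecture, layer sizes, the two-class output and the use of id 0 for padding are assumptions; the assignment's actual network and preprocessing may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMClassifier(nn.Module):
    """Embeds token ids, runs an LSTM and classifies from the final hidden state."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)  # id 0 = padding
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):  # token_ids: (batch, seq_len) LongTensor
        embedded = self.embed(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)  # h_n: (num_layers, batch, hidden_dim)
        return self.out(h_n[-1])           # logits: (batch, num_classes)


model = LSTMClassifier(vocab_size=5000)
batch = torch.randint(1, 5000, (8, 30))  # 8 hypothetical tweets, 30 token ids each
labels = torch.randint(0, 2, (8,))       # hypothetical author labels
logits = model(batch)
loss = F.cross_entropy(logits, labels)
loss.backward()  # one optimizer step would follow in a training loop
print(logits.shape)  # torch.Size([8, 2])
```

For simplicity, padded positions are not packed here, so the final hidden state also reads any trailing padding; torch.nn.utils.rnn.pack_padded_sequence is the usual fix.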


We tested the performance of "classic" machine learning models using SKLearn and of neural networks using PyTorch. To facilitate the use of SKLearn models, a base class is implemented that acts as a uniform wrapper for them. Grid search with 5-fold cross-validation is used to search over the hyperparameter ranges and select the best-performing model.
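A wrapper of this kind might look roughly like the sketch below, which puts a TF-IDF vectorizer and any scikit-learn classifier into a Pipeline and tunes it with GridSearchCV over 5 folds. The class name SklearnTextModel, the TF-IDF features, the logistic regression estimator, the C grid and the toy tweets and labels are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class SklearnTextModel:
    """Hypothetical uniform wrapper: TF-IDF + any sklearn classifier, tuned by grid search + k-fold CV."""

    def __init__(self, estimator, param_grid, folds=5):
        self.pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", estimator)])
        # Pipeline parameters are addressed as "<step>__<param>".
        self.param_grid = {f"clf__{name}": values for name, values in param_grid.items()}
        self.folds = folds
        self.search = None

    def fit(self, texts, labels):
        self.search = GridSearchCV(self.pipeline, self.param_grid, cv=self.folds, scoring="accuracy")
        self.search.fit(texts, labels)  # also refits the best configuration on all the data
        return self

    @property
    def best_params(self):
        return self.search.best_params_

    def predict(self, texts):
        return self.search.predict(texts)


texts = ["make america great again", "great rally tonight", "fake news media",
         "we will win big", "great great deal", "sad fake news",
         "press release issued today", "statement on policy", "join us at the event",
         "read the full statement", "official press briefing", "policy update released"]
labels = [1] * 6 + [0] * 6  # hypothetical author labels
model = SklearnTextModel(LogisticRegression(max_iter=1000), {"C": [1.0, 10.0, 100.0]})
model.fit(texts, labels)
print(model.best_params, model.predict(["fake news rally"]))
```

Keeping the vectorizer inside the Pipeline means it is refit on each training fold, so cross-validation scores are not inflated by vocabulary statistics from the held-out fold.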
