NATURAL LANGUAGE PROCESSING TOOLKIT

The Natural Language Processing Toolkit (NLP Toolkit) is a Python-based application that offers a suite of tools for processing natural language data. It provides APIs for quickly applying pretrained NLP models to your text, covering Text Summarization, Sentence Similarity, and more. It also includes a demo user interface built with Streamlit.


INTRODUCTION


Natural Language Processing (NLP) is a field of computer science and artificial intelligence that focuses on the interaction between computers and humans in natural language. It involves developing algorithms and models that can analyze, understand, and generate human language. NLP is used in a wide range of applications, including Text Summarization, Sentence Similarity, Chatbots, Grammar Correction, and more.

OUR APPROACHES

The NLP Toolkit is implemented with pretrained language models and NLP libraries such as LongT5, BERT, and spaCy. It covers five tasks: Text Summarization, Sentence Similarity, Named Entity Recognition, Grammar Correction, and Comment Classification. The approach for each task is described below.


APPROACHES

Step 01:

Text Summarization

Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, whereas other models can generate entirely new text.
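To make the extractive side of that distinction concrete, here is a minimal sketch that scores sentences by average word frequency and copies the top ones verbatim. It is a simple frequency heuristic chosen for illustration, not the LongT5 approach described below; the function names and the naive sentence splitter are assumptions.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, copied verbatim, in original order."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)


text = "Cats sleep a lot. Cats also hunt mice. The weather was fine."
print(extractive_summary(text))  # Cats sleep a lot. Cats also hunt mice.
```

An abstractive model, by contrast, is free to generate wording that never appears in the input.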

Our Text Summarization model is a LongT5 model that has been fine-tuned on a large dataset of text and summary pairs. Fine-tuning feeds the model pairs of input texts and their corresponding summaries and optimizes it to predict accurate summaries.

The model was fine-tuned using techniques such as transfer learning, curriculum learning, and multi-task learning to improve its performance. Additionally, techniques such as beam search and length normalization can be applied to improve the quality of the generated summaries.
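The effect of beam search with length normalization can be seen on a toy example. The sketch below is not the toolkit's decoder: `TOY_MODEL` is a hypothetical lookup table standing in for a real model, and the normalization used (log-probability divided by length raised to `alpha`) is one common simple form, assumed here for illustration.

```python
import math

# Hypothetical toy "model": next-token probabilities keyed by the previous token.
TOY_MODEL = {
    "<s>": {"ok": 0.6, "the": 0.4},
    "ok": {"</s>": 1.0},
    "the": {"summary": 1.0},
    "summary": {"works": 1.0},
    "works": {"</s>": 1.0},
}


def toy_next_probs(tokens):
    return TOY_MODEL[tokens[-1]]


def beam_search(next_probs, beam_width=2, max_len=6, alpha=1.0):
    """Keep the beam_width best partial sequences per step, then pick the
    finished sequence with the best length-normalized log-probability."""
    beams = [(["<s>"], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            for tok, p in next_probs(tokens).items():
                cand = (tokens + [tok], logp + math.log(p))
                (finished if tok == "</s>" else candidates).append(cand)
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open at max_len act as a fallback

    def normalized(cand):
        tokens, logp = cand
        length = max(len(tokens) - 1, 1)  # do not count "<s>"
        return logp / length ** alpha

    return max(finished, key=normalized)[0][1:]


print(beam_search(toy_next_probs, alpha=0.0))  # ['ok', '</s>']
print(beam_search(toy_next_probs, alpha=1.0))  # ['the', 'summary', 'works', '</s>']
```

With `alpha=0.0` the raw log-probability decides and the shortest output wins; with `alpha=1.0` the longer sequence is no longer penalized just for having more tokens.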


Step 02:

Sentence Similarity

The Sentence Similarity model is first fed a pair of input sentences, and the final hidden state of the [CLS] token is extracted. The [CLS] token serves as the aggregated representation of the two input sentences. A fully connected layer on top of the [CLS] representation then produces a similarity score between 0 and 1 for the pair. The model is trained on a dataset of sentence pairs with corresponding similarity scores, using mean squared error loss or binary cross-entropy loss. Once trained, it can compute the similarity between new pairs of input sentences.
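The scoring head and the two loss options can be sketched in a few lines of plain Python. This is an illustration only: the [CLS] vector is a made-up list of numbers rather than a real encoder output, the weights are hypothetical, and the use of a sigmoid to keep the score between 0 and 1 is an assumption.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def similarity_score(cls_vector, weights, bias):
    """Fully connected layer with one output unit, squashed into (0, 1)."""
    logit = sum(w * h for w, h in zip(weights, cls_vector)) + bias
    return sigmoid(logit)


def mse_loss(pred, target):
    return (pred - target) ** 2


def bce_loss(pred, target, eps=1e-12):
    pred = min(max(pred, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


# Hypothetical [CLS] hidden state and head parameters.
cls_vector = [1.0, 2.0]
score = similarity_score(cls_vector, weights=[0.5, -0.25], bias=0.1)
print(round(score, 3))                  # 0.525
print(round(mse_loss(score, 1.0), 3))   # 0.226
```

Mean squared error suits graded similarity labels, while binary cross-entropy is the more usual choice when the labels are 0 or 1.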

Step 03:

Named Entity Recognition

Named Entity Recognition (NER) is a natural language processing task that aims to identify and extract entities such as names, locations, organizations, and dates from text. spaCy is a popular Python library for NLP that provides an easy-to-use interface for NER. The basic approach for NER using spaCy is to load a pretrained pipeline, run it over the input text, and read off the entities it labels.
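As a rough sketch of that flow, the helper below takes any spaCy-style pipeline and returns the entities it finds. The helper name `extract_entities` is hypothetical; `doc.ents`, `ent.text`, `ent.label_`, `ent.start_char`, and `ent.end_char` are standard spaCy attributes. The model name in the usage comment is an assumption, since the source does not say which pipeline the toolkit loads.

```python
def extract_entities(nlp, text):
    """Run a spaCy-style pipeline and return (text, label, start, end) tuples."""
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char)
            for ent in doc.ents]


# Usage, assuming spaCy is installed and a small English model has been
# downloaded (python -m spacy download en_core_web_sm):
#
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   for entity in extract_entities(nlp, "Apple opened an office in Hanoi in 2021."):
#       print(entity)
```

Passing the pipeline in as an argument keeps the model loaded once and reused across requests, which matters because loading is much slower than processing a single sentence.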


Step 04:

Grammar Correction

Grammar Correction uses a language model to generate grammatically correct sentences based on the input text. Our approach uses sequence-to-sequence transformer models. The model is trained on a large corpus of text to learn the patterns of grammar and syntax, and is then used to generate sentences that adhere to those rules. The quality of the generated sentences depends on the complexity of the model and on the quality and quantity of the training data.
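Because the model outputs a whole corrected sentence, it can be useful to compare that output with the input to see what changed. The sketch below does this with Python's standard `difflib`; it is a post-processing idea assumed for illustration, not part of the model, and the corrected sentence is a hand-written stand-in for model output.

```python
import difflib


def list_corrections(original, corrected):
    """Return (operation, original words, corrected words) for each change."""
    src, dst = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, " ".join(src[i1:i2]), " ".join(dst[j1:j2])))
    return edits


original = "She go to school yesterday"
corrected = "She went to school yesterday"  # stand-in for the model's output
print(list_corrections(original, corrected))  # [('replace', 'go', 'went')]
```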


Step 05:

Comment Classification

The Comment Classification model detects whether text contains toxic content such as threatening language, insults, obscenities, identity-based hate, or sexually explicit language. Our approach uses a BERT model trained on a large Civil Comments dataset.
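Since a single comment can fall into several of these categories at once, one plausible way to read the classifier's output is as independent per-label probabilities with a threshold. The sketch below assumes that multi-label setup; the label names, the example logits, and the 0.5 threshold are all hypothetical and not taken from the toolkit.

```python
import math

# Hypothetical label names, based on the categories listed above.
LABELS = ["threat", "insult", "obscene", "identity_hate", "sexually_explicit"]


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def classify_comment(logits, threshold=0.5):
    """Turn one raw score per label into probabilities and a list of flags."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    flagged = [label for label, p in probs.items() if p >= threshold]
    return probs, flagged


# Made-up logits standing in for the BERT classifier's output.
probs, flagged = classify_comment([-3.0, 2.0, 0.1, -5.0, -1.0])
print(flagged)  # ['insult', 'obscene']
```

Raising the threshold trades missed toxic comments for fewer false alarms, so it is usually tuned per label on held-out data.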

 

USAGE

Step 01

Go to the NLP Toolkit site: https://experiment.saigontechnology.vn/nlp-toolkit/. Alternatively, open the main Saigon Technology AI Research Lab page at https://experiment.saigontechnology.vn/, select the NLP Toolkit section, and click the "Try our demo" button.


Step 02

On the NLP Toolkit page, choose a demo from the sidebar to start.


Step 03

Step 3.1: (Text Summarization) Enter the text in the text area, or simply enter an article URL. The summary of the text or article is displayed at the bottom of the page.


Step 3.2: (Sentence Similarity) Enter the reference sentence and the target sentence in the sidebar. Click the “Submit” button.


Step 3.3: (Named Entity Recognition) Enter the sentence in the text area. Press “Ctrl + Enter” to submit the sentence.


Step 3.5: (Comment Classifier) Enter your sentence in the text area. Press “Ctrl + Enter” to submit your sentence.

