Named-Entity Recognition

Introduction

Named-entity recognition (NER), also known as token classification or text tagging, is the task of taking a sentence and classifying every word (or "token") into different categories, such as names of people or names of locations, or different parts of speech.

For example, given the sentence:

Does Chicago have any Pakistani restaurants?

A named-entity recognition algorithm may identify:

  • "Chicago" as a location
  • "Pakistani" as an ethnicity

and so on.

Using gradio (specifically the HighlightedText component), you can easily build a web demo of your NER model and share that with the rest of your team.

Here is an example of a demo that you'll be able to build:

This tutorial will show how to take a pretrained NER model and deploy it with a Gradio interface. We will show two different ways to use the HighlightedText component -- depending on your NER model, either of these two ways may be easier to learn!

Prerequisites

Make sure you have the gradio Python package already installed. You will also need a pretrained named-entity recognition model. You can use your own, or this in this tutorial, we will use one from the transformers library.

Approach 1: List of Entity Dictionaries

Many named-entity recognition models output a list of dictionaries. Each dictionary consists of an entity, a "start" index, and an "end" index. This is, for example, how NER models in the transformers library operate:

from transformers import pipeline 
ner_pipeline = pipeline("ner")
ner_pipeline("Does Chicago have any Pakistani restaurants")

Output:

[{'entity': 'I-LOC',
  'score': 0.9988978,
  'index': 2,
  'word': 'Chicago',
  'start': 5,
  'end': 12},
 {'entity': 'I-MISC',
  'score': 0.9958592,
  'index': 5,
  'word': 'Pakistani',
  'start': 22,
  'end': 31}]

If you have such a model, it is very easy to hook it up to Gradio's HighlightedText component. All you need to do is pass in this list of entities, along with the original text to the model, together as dictionary, with the keys being "entities" and "text" respectively.

Here is a complete example:

from transformers import pipeline

import gradio as gr

ner_pipeline = pipeline("ner")

examples = [
    "Does Chicago have any stores and does Joe live here?",
]

def ner(text):
    output = ner_pipeline(text)
    return {"text": text, "entities": output}    

demo = gr.Interface(ner,
             gr.Textbox(placeholder="Enter sentence here..."), 
             gr.HighlightedText(),
             examples=examples)

demo.launch()

Approach 2: List of Tuples

An alternative way to pass data into the HighlightedText component is a list of tuples. The first element of each tuple should be the word or words that are being classified into a particular entity. The second element should be the entity label (or None if they should be unlabeled). The HighlightedText component automatically strings together the words and labels to display the entities.

In some cases, this can be easier than the first approach. Here is a demo showing this approach using Spacy's parts-of-speech tagger:

import gradio as gr
import os
os.system('python -m spacy download en_core_web_sm')
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

def text_analysis(text):
    doc = nlp(text)
    html = displacy.render(doc, style="dep", page=True)
    html = (
        "
" + html + "
" ) pos_count = { "char_count": len(text), "token_count": 0, } pos_tokens = [] for token in doc: pos_tokens.extend([(token.text, token.pos_), (" ", None)]) return pos_tokens, pos_count, html demo = gr.Interface( text_analysis, gr.Textbox(placeholder="Enter sentence here..."), ["highlight", "json", "html"], examples=[ ["What a beautiful morning for a walk!"], ["It was the best of times, it was the worst of times."], ], ) demo.launch()


And you're done! That's all you need to know to build a web-based GUI for your NER model.

Fun tip: you can share your NER demo instantly with others simply by setting share=True in launch().