In this session we will review Google BERT, a tried-and-tested, state-of-the-art technique for NLP tasks. We will go through the text encoding process, which is similar to the word-vectors approach. We will also explore the WordPiece tokenizer and see how it covers most English words with a compact sub-word vocabulary. Then we will build a multi-label text classifier on top of the encodings. Once the basic framework is ready, we will train the models and see how they perform. Next, we will go through the key steps to optimize the resulting model(s) for serving. Finally, we will see a demo of the classifier we just built, running with low latency. All the model training code will be in Python, and some parts of the model serving will be in Java.
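As a taste of the WordPiece step, here is a minimal tokenization sketch. It assumes the Hugging Face `transformers` library and the `bert-base-uncased` vocabulary; the session code may load its tokenizer differently.

```python
# Minimal WordPiece tokenization sketch (assumes `pip install transformers`).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Words outside the ~30k-entry vocabulary are split into known sub-word
# pieces, which is how WordPiece covers most English words.
print(tokenizer.tokenize("embeddings are powerful"))
# -> ['em', '##bed', '##ding', '##s', 'are', 'powerful']
```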
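And here is a rough sketch of the multi-label classification idea: one independent sigmoid per label trained with binary cross-entropy, rather than a softmax over mutually exclusive classes. This assumes PyTorch and `transformers`; the label count and wiring are illustrative, not taken from the session's repo.

```python
# Hypothetical multi-label head on top of BERT's pooled output
# (illustrative only; the actual repo may wire this up differently).
import torch
from transformers import BertModel, BertTokenizer

NUM_LABELS = 5  # illustrative label count

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
head = torch.nn.Linear(bert.config.hidden_size, NUM_LABELS)

batch = tokenizer(["an example document"], return_tensors="pt",
                  padding=True, truncation=True)
logits = head(bert(**batch).pooler_output)

# Multi-label: an independent probability per label, so several
# labels can be active for the same document.
probs = torch.sigmoid(logits)

# Training would use binary cross-entropy over the raw logits.
loss_fn = torch.nn.BCEWithLogitsLoss()
targets = torch.zeros(1, NUM_LABELS)  # placeholder multi-hot labels
loss = loss_fn(logits, targets)
```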
https://github.com/tuxdna/bert-multi-label-classifier/