MyanBERTa: A Pre-trained Language Model For Myanmar

Model Description

This model is a BERT-based pre-trained language model for Myanmar. MyanBERTa was pre-trained for 528K steps on a word-segmented Myanmar dataset of 5,992,299 sentences (136M words). Its tokenizer is a byte-level BPE tokenizer with a vocabulary of 30,522 subword units, learned after word segmentation was applied.
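A byte-level BPE tokenizer operates on UTF-8 bytes rather than Unicode characters, so every Myanmar string, including stacked consonants and diacritics, decomposes into base byte symbols before subword merges are learned. The following is a minimal sketch of that byte-level representation (the function name and symbol formatting are illustrative only, not the actual tokenizer implementation):

```python
def to_byte_symbols(word: str) -> list[str]:
    """Represent a (word-segmented) token as its UTF-8 byte symbols,
    the base alphabet a byte-level BPE vocabulary is built from."""
    return [f"<0x{b:02X}>" for b in word.encode("utf-8")]

# "မြန်မာ" ("Myanmar") is 6 Unicode code points; each Myanmar code point
# occupies 3 bytes in UTF-8, so the word starts as 18 byte symbols
# before any BPE merges are applied.
print(len(to_byte_symbols("မြန်မာ")))  # 18
```

Because the base alphabet is just the 256 possible byte values, the learned 30,522-unit vocabulary can tokenize any Myanmar input without unknown tokens.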

Contributed by:
    Aye Mya Hlaing
    Win Pa Pa


MyanBERTa is available on the Hugging Face Hub.
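A model published on the Hub can be loaded with the `transformers` library. The sketch below assumes a repository id of `"UCSYNLP/MyanBERTa"`; substitute the actual id shown on the model's Hub page:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM  # pip install transformers

MODEL_ID = "UCSYNLP/MyanBERTa"  # assumed repo id; check the Hub page

def load_myanberta(model_id: str = MODEL_ID):
    """Download the tokenizer and masked-LM weights from the Hugging Face Hub.

    Input text should be word-segmented first, since the byte-level BPE
    vocabulary was learned on word-segmented Myanmar text.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model
```

`AutoModelForMaskedLM` matches BERT-style pre-training; for downstream tasks, swap in the relevant `AutoModelFor…` class (e.g. `AutoModelForSequenceClassification`).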

Updated on 26 July 2022