A pre-trained Myanmar language model using BERT

Model Description

We provide MyanmarBERT, a pre-trained language model based on BERT for Myanmar text.
We use MyCorpus (1.3 GB) as the pre-training data and the SentencePiece library for vocabulary generation.
MyanmarBERT: 12-layer, 768-hidden, 12-heads, 187M parameters
The pre-training source code and fine-tuning settings follow the original BERT model. Please refer to BERT: https://github.com/google-research/bert.
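
The following is a minimal sketch of how a SentencePiece vocabulary can be generated and used to tokenize Myanmar text. The corpus path, model prefix, vocabulary size, and sample sentence are illustrative assumptions, not the exact settings used to build MyanmarBERT.

    # Sketch: building a SentencePiece vocabulary and tokenizing Myanmar text.
    # All file names and hyperparameters below are assumed for illustration.
    import sentencepiece as spm

    # Train a SentencePiece model on a plain-text corpus (one sentence per line).
    spm.SentencePieceTrainer.Train(
        input="mycorpus.txt",        # hypothetical path to the pre-training corpus
        model_prefix="myanmar_sp",   # writes myanmar_sp.model and myanmar_sp.vocab
        vocab_size=32000,            # assumed vocabulary size
        character_coverage=1.0,      # keep full character coverage for Myanmar script
    )

    # Load the trained model and split a sample sentence into subword pieces.
    sp = spm.SentencePieceProcessor()
    sp.Load("myanmar_sp.model")
    print(sp.EncodeAsPieces("မင်္ဂလာပါ"))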

The zip file contains three items (see the loading sketch after this list):

  • A TensorFlow checkpoint (model.ckpt) containing the pre-trained weights (which is actually 3 files).
  • A vocab file generated with the SentencePiece library.
  • A config file which specifies the architecture of the model.
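
Below is a minimal sketch of how the released files might be inspected after unzipping. The directory layout and file names (model.ckpt, bert_config.json, vocab.model) are assumptions about the archive contents, not confirmed names.

    # Sketch: inspecting the released checkpoint, config, and vocab files.
    # Paths below are illustrative assumptions about the extracted zip layout.
    import json
    import tensorflow as tf
    import sentencepiece as spm

    ckpt = "myanmarbert/model.ckpt"               # TensorFlow checkpoint prefix
    config_path = "myanmarbert/bert_config.json"  # assumed config file name
    vocab_model = "myanmarbert/vocab.model"       # assumed SentencePiece model name

    # List the pre-trained variables stored in the checkpoint.
    for name, shape in tf.train.list_variables(ckpt):
        print(name, shape)

    # Read the architecture settings (hidden size, number of layers, heads, ...).
    with open(config_path) as f:
        config = json.load(f)
    print(config)

    # Tokenize a sample sentence with the released SentencePiece vocabulary.
    sp = spm.SentencePieceProcessor()
    sp.Load(vocab_model)
    print(sp.EncodeAsPieces("မင်္ဂလာပါ"))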
Cite this work as:
Saw Win and Win Pa Pa, "MyanmarBERT: Myanmar Pre-trained Language Model using BERT", in Proceedings of ICCA 2021, pp. 402-407, February 2021, Myanmar.

Download

MyanmarBERT Model

Paper


4 August 2022