A pre-trained Myanmar language model using BERT
Model Description
We provide MyanmarBERT, a pre-trained language model based on BERT for Myanmar text.
We use MyCorpus (1.3 GB) as the pre-training data and the SentencePiece library for vocabulary generation.
MyanmarBERT: 12-layer, 768-hidden, 12-heads, 187M parameters
A detailed description of the pre-training source code and fine-tuning settings follows the original BERT model; please refer to BERT: https://github.com/google-research/bert.
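As a minimal usage sketch, Myanmar text can be segmented with the released SentencePiece vocabulary before it is fed to the model. The file name "myanmar_bert.model" and the example sentence are assumptions for illustration, not taken from the release:

```python
# Tokenization sketch using the SentencePiece library.
# "myanmar_bert.model" is a placeholder; use the SentencePiece model file from the zip.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="myanmar_bert.model")

text = "မြန်မာစာ"                        # example Myanmar text
pieces = sp.encode(text, out_type=str)   # subword pieces
ids = sp.encode(text, out_type=int)      # vocabulary ids for BERT input

print(pieces)
print(ids)
```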
The zip file contains three items (a loading sketch follows the list):
- A TensorFlow checkpoint (model.ckpt) containing the pre-trained weights (which is actually 3 files).
- A vocab file generated with the SentencePiece library.
- A config file which specifies the architecture of the model.
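The config file and checkpoint can be inspected with standard TensorFlow utilities. The file names below ("bert_config.json", "model.ckpt") are placeholders for the corresponding files in the zip:

```python
# Sketch for inspecting the released files; paths are placeholders for
# the config file and checkpoint prefix shipped in the zip.
import json
import tensorflow as tf

# Read the architecture settings (hidden size, number of layers, heads, ...).
with open("bert_config.json") as f:
    config = json.load(f)
print(config["hidden_size"], config["num_hidden_layers"], config["num_attention_heads"])

# List the pre-trained variables stored in the TensorFlow checkpoint.
reader = tf.train.load_checkpoint("model.ckpt")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)
```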
Cite this work as:
Saw Win, Win Pa Pa, "MyanmarBERT: Myanmar Pre-trained Language Model using BERT", In Proceedings of ICCA 2021, pp. 402-407, February 2021, Myanmar.
Download
4 August 2022