A pre-trained Myanmar language model using BERT
Model Description
We provide MyanmarBERT, a pre-trained language model based on BERT for Myanmar text.
We use MyCorpus (1.3 GB) as the pre-training data and the SentencePiece library for vocabulary generation.
MyanmarBERT: 12-layer, 768-hidden, 12-heads, 187M parameters
A detailed description of the pre-training source code and fine-tuning settings follows the original BERT model; please refer to BERT: https://github.com/google-research/bert.
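As a minimal usage sketch, Myanmar text can be segmented with the released SentencePiece vocabulary before it is fed to the model. The file name "myanmar_bert.model" and the example sentence are assumptions for illustration, not taken from the release:

```python
# Tokenization sketch using the SentencePiece library.
# "myanmar_bert.model" is a placeholder; use the SentencePiece model file from the zip.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="myanmar_bert.model")

text = "မြန်မာစာ"                        # example Myanmar text
pieces = sp.encode(text, out_type=str)   # subword pieces
ids = sp.encode(text, out_type=int)      # vocabulary ids for BERT input

print(pieces)
print(ids)
```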
The zip file contains three items (a loading sketch follows the list):
- A TensorFlow checkpoint (model.ckpt) containing the pre-trained weights (which is actually 3 files).
- A vocab file generated with the SentencePiece library.
- A config file which specifies the architecture of the model.
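The config file and checkpoint can be inspected with standard TensorFlow utilities. The file names below ("bert_config.json", "model.ckpt") are placeholders for the corresponding files in the zip:

```python
# Sketch for inspecting the released files; paths are placeholders for
# the config file and checkpoint prefix shipped in the zip.
import json
import tensorflow as tf

# Read the architecture settings (hidden size, number of layers, heads, ...).
with open("bert_config.json") as f:
    config = json.load(f)
print(config["hidden_size"], config["num_hidden_layers"], config["num_attention_heads"])

# List the pre-trained variables stored in the TensorFlow checkpoint.
reader = tf.train.load_checkpoint("model.ckpt")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)
```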
Cite this work as:
Saw Win, Win Pa Pa, "MyanmarBERT: Myanmar Pre-trained Language Model using BERT", In Proceedings of ICCA 2021, pp. 402-407, February 2021, Myanmar.
Download
4 August 2022