two stages of distillation -> distill the transformer language model (see the sketch below)
** 1. use PubMedBERT as the teacher of the BERN2 model
** 2. use BERN2 as the teacher of the TinyBERN2 model
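A minimal, runnable sketch of how such a two-stage teacher/student pipeline could be wired up, assuming plain KL-based soft-label distillation; the tiny linear models and the `distill` helper are illustrative stand-ins, not the paper's actual models or API.

```python
# Sketch of two-stage distillation. The tiny linear models below are
# placeholders for PubMedBERT, BERN2 and TinyBERN2; names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TAGS, HIDDEN = 5, 32          # e.g. O, I-Gene, I-Disease, ...

def distill(teacher: nn.Module, student: nn.Module,
            unlabeled_feats: torch.Tensor, epochs: int = 3) -> nn.Module:
    """Train `student` to match the teacher's per-token tag distribution."""
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        with torch.no_grad():
            soft_labels = F.softmax(teacher(unlabeled_feats), dim=-1)
        log_probs = F.log_softmax(student(unlabeled_feats), dim=-1)
        loss = F.kl_div(log_probs, soft_labels, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

# Stand-in token features for an unlabeled corpus (tokens x hidden).
corpus = torch.randn(256, HIDDEN)

pubmedbert = nn.Linear(HIDDEN, NUM_TAGS)   # stage-1 teacher (placeholder)
bern2      = nn.Linear(HIDDEN, NUM_TAGS)   # stage-1 student / stage-2 teacher
tinybern2  = nn.Linear(HIDDEN, NUM_TAGS)   # stage-2 student (placeholder)

bern2     = distill(pubmedbert, bern2, corpus)   # stage 1
tinybern2 = distill(bern2, tinybern2, corpus)    # stage 2
```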
4. Experiments and Results
4.1 Benchmark Datasets
8 benchmark datasets covering 6 entity classes: Gene/Protein, Disease, Chemical, Species, Cell line, and Cell type
** objective: to examine generalizability (predicting unseen entities)
** datasets from the MTL-Bioinformatics-2016 GitHub repo
** CoNLL-X format (a minimal reader sketch follows)
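A minimal reader for such files, assuming the usual CoNLL-style layout of one token per line with the tag in the last column and blank lines between sentences; the dataset path below is illustrative, not a path from the paper.

```python
# Minimal CoNLL-style NER file reader (one token per line, tag in the last
# column, blank line between sentences). The file path is illustrative.
from typing import Iterator

def read_conll(path: str) -> Iterator[tuple[list[str], list[str]]]:
    tokens, tags = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                      # sentence boundary
                if tokens:
                    yield tokens, tags
                    tokens, tags = [], []
                continue
            cols = line.split()
            tokens.append(cols[0])
            tags.append(cols[-1])             # tag is the last column
        if tokens:                            # flush the final sentence
            yield tokens, tags

for sent_tokens, sent_tags in read_conll("BC5CDR-disease/train.tsv"):
    print(list(zip(sent_tokens, sent_tags)))
    break
```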
4.2 Results
4.2.1 Evaluation metrics (accuracy)
WS-BERN2 performs on par with BERN2
TinyBERN2 scores lower than WS-BERN2, but only marginally
4.2.2 Evaluation metrics (computational costs)
likely memory footprint and inference speed (a measurement sketch follows)
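Since the notes are unsure which costs are reported, here is one rough way to measure both sides for a PyTorch model: parameter count and weight memory on the one hand, tokens-per-second throughput on the other. The tiny model and fake batch are placeholders, not TinyBERN2 itself.

```python
# Rough computational-cost measurement: parameter count, weight memory,
# and inference throughput. Model and batch are placeholders.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 5))
batch = torch.randn(64, 128)                       # 64 "tokens" per batch

n_params = sum(p.numel() for p in model.parameters())
mem_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
print(f"parameters: {n_params:,}  |  weight memory: {mem_mb:.2f} MB")

model.eval()
with torch.no_grad():
    n_batches = 100
    start = time.perf_counter()
    for _ in range(n_batches):
        model(batch)
    elapsed = time.perf_counter() - start
print(f"throughput: {n_batches * batch.shape[0] / elapsed:,.0f} tokens/sec")
```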
5 Discussions
5.1 Effect of training by soft labeling
most beneficial when only a small ratio of the data is labeled (see the loss sketch below)
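Soft labels carry the teacher's full tag distribution rather than a single hard tag, so each example gives the student more signal, which is plausibly why the gain is largest at low labeled ratios. A common distillation loss that mixes soft and hard targets with a temperature is sketched below; this is an assumed, generic formulation, not necessarily the paper's exact loss.

```python
# Generic soft/hard distillation loss with temperature T (assumed formulation).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      hard_labels: torch.Tensor,
                      T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    # Soft part: KL between temperature-scaled teacher and student distributions.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard part: ordinary cross-entropy on gold tags (when they exist).
    hard = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft + (1 - alpha) * hard

# Example: 8 tokens, 5 tag classes.
loss = distillation_loss(torch.randn(8, 5), torch.randn(8, 5),
                         torch.randint(0, 5, (8,)))
print(loss.item())
```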
5.2 Tagging Schema
BIO tagging vs IO tagging (no B tag; every B is mapped to I; conversion sketch below)
** uses BPE tokenization (unknown words fall back to byte-level pieces)
** IO is more economical in speed
** IO is not recommended for enterprise use because it does not guarantee correct entity boundaries
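A tiny sketch of the BIO-to-IO mapping: every B-X simply becomes I-X, which shrinks the tag set (hence the speed saving) but makes adjacent same-type entities inseparable, matching the caveat above.

```python
# Convert BIO tags to IO tags: drop the B-/I- distinction.
def bio_to_io(tags: list[str]) -> list[str]:
    return ["I-" + t[2:] if t.startswith("B-") else t for t in tags]

bio = ["B-Disease", "I-Disease", "O", "B-Gene", "B-Gene"]
print(bio_to_io(bio))
# ['I-Disease', 'I-Disease', 'O', 'I-Gene', 'I-Gene']  <- the two genes merge
```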
5.3 Enterprise usage of Kazu
used at AstraZeneca for biological knowledge graph (BIKG) construction and clinical trial design
Limitations
not tested under large-scale conditions (e.g., many CPUs and GPUs)
** unknown how throughput changes as the hardware scales