Report for incorrect sentence split (JNLPBA-IOBES) #4

Open
opened 2025-10-14 16:18:12 -06:00 by navan · 0 comments

Originally created by @wonjininfo on 3/28/2019

Hi,
Thanks for providing these useful resources!
While using the resources, we noticed that sentences in the JNLPBA-IOBES dataset appear to be split incorrectly.

`MTL-Bioinformatics-2016/data/JNLPBA-IOBES/test.tsv` starts with

Number	O

of	O
glucocorticoid	B-protein
receptors	E-protein
in	O
lymphocytes	S-cell_type
and	O
their	O
sensitivity	O
to	O
hormone	O
action	O
.	O
The	O

study	O
demonstrated	O

while `MTL-Bioinformatics-2016/data/JNLPBA/test.tsv` starts with

-DOCSTART-	O

Number	O
of	O
glucocorticoid	B-protein
receptors	I-protein
in	O
lymphocytes	B-cell_type
and	O
their	O
sensitivity	O
to	O
hormone	O
action	O
.	O

The	O
study	O
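
For illustration, the sketch below shows one way the sentence boundaries of the IOBES file could be realigned to those of the IOB file. It is only a sketch and not necessarily how any existing fix works. It assumes that both files contain the same token sequence apart from `-DOCSTART-` lines and that each non-blank line holds exactly one token and one tag separated by whitespace. The names `read_conll` and `realign` are hypothetical.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(text: str) -> List[Sentence]:
    """Parse token/tag lines; a blank line ends the current sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split()  # assumption: exactly two fields per line
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def realign(iobes: List[Sentence], reference: List[Sentence]) -> List[Sentence]:
    """Re-split the IOBES tokens using the reference sentence lengths."""
    flat = [pair for sent in iobes for pair in sent]
    fixed: List[Sentence] = []
    pos = 0
    for ref_sent in reference:
        # -DOCSTART- markers exist only in the reference file, so skip them.
        ref_tokens = [tok for tok, _ in ref_sent if tok != "-DOCSTART-"]
        if not ref_tokens:
            continue
        chunk = flat[pos:pos + len(ref_tokens)]
        if [tok for tok, _ in chunk] != ref_tokens:
            raise ValueError(f"token mismatch near position {pos}")
        fixed.append(chunk)
        pos += len(ref_tokens)
    if pos != len(flat):
        raise ValueError("IOBES file has tokens not covered by the reference")
    return fixed
```

The IOBES tags are kept unchanged; only the positions of the blank lines come from the reference file. The strict token check is deliberate, so that any divergence between the two files is reported rather than silently papered over.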

We used our own post-processing [script](https://github.com/wonjininfo/CollaboNet/blob/4419de4230667938b3382b26a372c117680f8758/preprocessing.py#L127) to fix this and used the corrected dataset in our experiments.

Once again, thank you so much for sharing these useful resources!
Reference: github/MTL-Bioinformatics-2016#4