#GPT’s vocabulary fares the worst: more than 23% of its long #Chinese tokens (i.e., tokens with more than two Chinese characters) are either #porn or online #gambling terms.
https://arxiv.org/html/2508.17771v1