Final

Small Language Model

Why:

Instruction Alignment: output is easy to parse; reduces inference time
Task Alignment: consider all options when generating a response, and explain the reasoning.
NNSearch: (1) Inverted File Index (IVF) (2) Hierarchical Navigable Small World (HNSW) (3) RAGatouille // TODO
RAG with abbreviation injection
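
Sketch for the Inverted File Index item under NNSearch. This is a minimal pure-Python illustration, not the implementation used in the project: vectors are clustered with a few k-means steps, each vector is stored in the list of its nearest centroid, and a query scans only the `nprobe` closest lists. The names `IVFIndex`, `n_lists`, and `nprobe` are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_lists, iterations=10, seed=0):
    """A few Lloyd iterations; returns n_lists centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_lists)
    for _ in range(iterations):
        buckets = [[] for _ in range(n_lists)]
        for v in vectors:
            nearest = min(range(n_lists), key=lambda c: sq_dist(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


class IVFIndex:
    """Toy inverted file index: one posting list of vector ids per centroid."""

    def __init__(self, vectors, n_lists=4, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists, seed=seed)
        self.lists = [[] for _ in range(n_lists)]
        for idx, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(idx)

    def _nearest_centroids(self, q, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(q, self.centroids[c]))
        return order[:n]

    def search(self, q, k=1, nprobe=1):
        """Return ids of the k closest vectors among the nprobe nearest lists."""
        candidates = [i for c in self._nearest_centroids(q, nprobe)
                      for i in self.lists[c]]
        candidates.sort(key=lambda i: sq_dist(q, self.vectors[i]))
        return candidates[:k]


rng = random.Random(1)
vectors = [(rng.random(), rng.random()) for _ in range(50)]
index = IVFIndex(vectors, n_lists=4)
query = (0.5, 0.5)
approx = index.search(query, k=3, nprobe=1)   # fast, may miss true neighbours
exact = index.search(query, k=3, nprobe=4)    # probes every list, so exact
```

Tradeoff: a small `nprobe` scans fewer vectors but can miss a neighbour that sits just across a cluster boundary; probing all lists degenerates to brute-force search.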

Textual Entailment Recognition: given two text fragments, determine whether the meaning of one text is entailed (can be inferred) from the other text. (no neutral case in our problem)

Failure Modes: (1) no retrieval (2) the scoring system currently cannot select options like "None of the above", because of how evaluation is done.

Machine Translation

MT Problems: (1) Lexical divergences: no one-to-one mapping in word meaning (2) Structural divergences: Syntax, word order; Syntax-semantics relationship

Solution: linking words (word alignment). If a word in the target frequently co-occurs with a word in the source, the two will be aligned with increasingly high probability over several iterations.
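
Sketch of the iterative linking idea above. It assumes the note refers to EM-style word alignment in the spirit of IBM Model 1 (an assumption; the notes do not name the model), and it omits the NULL word for brevity. `t[(tgt, src)]` estimates P(target word | source word); the toy sentence pairs are made up for illustration.

```python
from collections import defaultdict


def train_alignment(pairs, iterations=10):
    """EM over sentence pairs; returns t[(tgt_word, src_word)] = P(tgt | src)."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Uniform start over every (target, source) pair that co-occurs.
    t = {(tw, sw): 1.0 / len(tgt_vocab)
         for src, tgt in pairs for tw in tgt for sw in src}

    for _ in range(iterations):
        count = defaultdict(float)   # expected link counts
        total = defaultdict(float)   # expected counts per source word
        for src, tgt in pairs:
            for tw in tgt:
                # E-step: share one unit of credit for tw among source words.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise expected counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, src_word):
    """Most probable target word for a given source word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == src_word}
    return max(candidates, key=candidates.get)


pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_alignment(pairs)
links = {sw: best_translation(t, sw) for sw in ["das", "haus", "buch", "ein"]}
```

"haus" starts tied between "the" and "house", but because "the" is better explained by "das" (seen together twice), the credit for "house" shifts to "haus" over the iterations.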

BLEU scores are based on token n-gram overlap. Drawbacks:

Very sensitive to tokenization
Unnecessarily complicated
Doesn’t correlate with human judgments as well as simpler metrics
A better alternative: chrF (character F-score)

chrF: A good machine translation will tend to contain characters and words that occur in a human translation of the same sentence. Correlates with human judgments quite well while being robust to tokenization differences.

Precision-chrP: percentage of character 1-grams, 2-grams, ..., k-grams in the hypothesis that occur in the reference, averaged.
Recall-chrR: percentage of character 1-grams, 2-grams,..., k-grams in the reference that occur in the hypothesis, averaged.
k=6, chrF_beta = (1 + beta^2) * (chrP * chrR) / (beta^2 * chrP + chrR)
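
Sketch of the chrF definitions above for a single sentence pair. Assumptions not stated in the notes: spaces are stripped before extracting character n-grams, matches are clipped by count (multiset intersection), n-gram orders with no n-grams are skipped when averaging, and beta defaults to 2 (recall weighted more than precision). The function names are illustrative.

```python
from collections import Counter


def char_ngrams(text, n):
    """Multiset of character n-grams, with spaces removed (assumption)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, k=6, beta=2.0):
    """Sentence-level chrF_beta using character 1-grams up to k-grams."""
    precisions, recalls = [], []
    for n in range(1, k + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())   # clipped matches
        hyp_total = sum(hyp.values())
        ref_total = sum(ref.values())
        if hyp_total > 0:
            precisions.append(overlap / hyp_total)   # chrP for order n
        if ref_total > 0:
            recalls.append(overlap / ref_total)      # chrR for order n
    if not precisions or not recalls:
        return 0.0
    chr_p = sum(precisions) / len(precisions)
    chr_r = sum(recalls) / len(recalls)
    denom = beta ** 2 * chr_p + chr_r
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * chr_p * chr_r / denom


same = chrf("the cat sat", "the cat sat")
retokenized = chrf("do n't stop", "don't stop")   # tokenization differs only
partial = chrf("the cat sat", "the cat sits")
disjoint = chrf("abc", "xyz")
```

The `retokenized` pair shows the robustness claim: a token-level metric would see different tokens ("do", "n't" vs "don't"), while the character n-grams are identical once spaces are dropped.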