The pre-2010 "v2" notation does not include the field delimiters (:) that
are now mandatory in v3. This function first tests whether an allele is in v2
format; if it is not, it is left alone. If it is, the function looks up its
v3 equivalent in the v2_to_v3 lookup table. If the allele is not in the
table, a v3 version is put together heuristically by inserting : after
every two digits.
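The flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's actual implementation: the regular expression used to recognise v2 format, the two-entry `V2_TO_V3` dictionary standing in for the real lookup table, and the function names are all hypothetical. Treating a leftover single digit as part of the last field is also an assumption, chosen because it reproduces the outputs quoted in this documentation.

```python
import re

# Hypothetical stand-in for the real v2_to_v3 lookup table; both entries are
# taken from the examples in this documentation.
V2_TO_V3 = {
    "DPB1*87801N": "DPB1*878:01N",
    "DPB1*152401": "DPB1*1524:01",
}

# Assumed definition of "v2 format": gene name, '*', at least four digits with
# no ':' delimiter, and an optional expression suffix letter.
V2_PATTERN = re.compile(r"^([A-Za-z0-9-]+)\*(\d{4,})([A-Z]?)$")


def heuristic_split(digits):
    """Insert ':' after every two digits; a leftover single digit joins the last field."""
    fields = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    if len(fields) > 1 and len(fields[-1]) == 1:
        leftover = fields.pop()
        fields[-1] += leftover
    return ":".join(fields)


def v2_to_v3(allele, table=V2_TO_V3):
    """Return the v3 form of a v2 allele; anything not in v2 format is left alone."""
    match = V2_PATTERN.match(allele)
    if match is None:
        return allele  # not v2 (for example, it already contains ':')
    if allele in table:
        return table[allele]  # a known allele: use the curated mapping
    gene, digits, suffix = match.groups()
    return f"{gene}*{heuristic_split(digits)}{suffix}"  # heuristic fallback


print(v2_to_v3("A*0101"))                 # heuristic fallback
print(v2_to_v3("A*01:01"))                # already v3, unchanged
print(v2_to_v3("DPB1*87801N"))            # resolved through the table
print(v2_to_v3("DPB1*87801N", table={}))  # heuristic only
```

Passing an empty table to the last call shows what the heuristic does on its own for an allele that a curated mapping would otherwise handle.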
Details
N.B. The heuristic prediction will not work in all cases. For example:
DPB1*87801N should be DPB1*878:01N (but is output as DPB1*87:801N)
DPB1*152401 should be DPB1*1524:01 (but is output as DPB1*15:24:01)
In general it is not possible to make this work purely syntactically, without
building in knowledge of which HLA alleles exist and which do not. For example,
should DRB1*1412601 be 14:126:01 or 14:12:601? Both are theoretically
possible. However, alleles with more than two digits per field are rare and
hardly existed before 2010, so in practice one should rarely encounter them in
v2 format.
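To make the ambiguity concrete, the sketch below enumerates every way a digit string can be cut into fields. It is illustrative only: the function name `candidate_splits` is hypothetical, and the limits (fields of two or three digits, between two and four fields) are assumptions made for this example rather than rules taken from the nomenclature.

```python
def candidate_splits(digits, widths=(2, 3), max_fields=4):
    """List every way to cut `digits` into 2..max_fields fields of the allowed widths."""
    results = []

    def extend(rest, fields):
        if not rest:
            # All digits consumed: keep the split if it has at least two fields.
            if len(fields) >= 2:
                results.append(":".join(fields))
            return
        if len(fields) == max_fields:
            return  # digits remain but no more fields are allowed
        for width in widths:
            if len(rest) >= width:
                extend(rest[width:], fields + [rest[:width]])

    extend(digits, [])
    return results


print(candidate_splits("0101"))     # a single candidate, so no ambiguity
print(candidate_splits("1412601"))  # several candidates, so syntax alone cannot decide
```

Under these assumptions the seven-digit string from DRB1*1412601 has three candidates: the two readings discussed above, plus one with a three-digit first field (141:26:01). Choosing between them requires knowing which alleles actually exist.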