Split an HLA typing string into alleles — extract_alleles

Takes a space-separated HLA typing string and splits it into individual alleles, each named by locus and position (e.g. "A_1", "A_2", "DRB1_1").

extract_alleles_str() takes a single string and returns a named character vector of alleles.

extract_alleles_df() takes a data frame in which one column contains the typing string, and returns the same data frame with a new column for each allele.

Usage

extract_alleles_str(
  string,
  loci = c("A", "B", "C", "DPA1", "DPB1", "DQA1", "DQB1", "DRB1", "DRB."),
  strip_locus = TRUE
)

extract_alleles_df(
  df,
  col_typing,
  loci = c("A", "B", "C", "DPA1", "DPB1", "DQA1", "DQB1", "DRB1", "DRB."),
  strip_locus = TRUE
)

Arguments

string

String, space-separated HLA typing.

loci

A string or character vector of the loci you are interested in. Only alleles at these loci will be returned. Defaults to all supported loci. The "DRB." entry covers DRB3, DRB4, and DRB5.

strip_locus

Whether to remove the locus from the extracted alleles or keep it.

If TRUE (the default), the locus is removed from the extracted alleles.
If FALSE, the locus is retained as it appeared in the original typing.
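
To illustrate the difference between the two settings, here is a minimal base R sketch. It is not the package's own code, and it assumes molecular nomenclature only, where the locus is everything up to and including the "*".

```r
# Illustration only: base R, not the package's implementation.
alleles <- c("DQB1*03:01", "DQB1*05:01", "DRB1*04:AMR")

# Roughly what strip_locus = TRUE amounts to for molecular typings:
# drop everything up to and including the "*".
stripped <- sub("^[^*]+\\*", "", alleles)
# stripped is c("03:01", "05:01", "04:AMR")

# With strip_locus = FALSE the alleles would be left as they were typed.
unstripped <- alleles
```

Serological typings such as "A1" or "Cw3" have no "*" separator, so this sketch would leave them unchanged.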

df

A data frame.

col_typing

The column in df that contains a space-separated HLA typing string for each row.

Value

extract_alleles_str() returns a named character vector of alleles; extract_alleles_df() returns the input data frame with one added column per allele. A warning is shown if any locus in the input has more than two alleles.
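
The shape of the returned vector (two named slots per locus, padded with NA) can be sketched in base R as follows. sketch_extract() is a hypothetical helper written for this illustration, not a function from the package. It assumes molecular nomenclature only and always strips the locus. Keeping just the first two alleles after the warning is also an assumption of the sketch.

```r
# Hypothetical helper for illustration; handles only "LOCUS*fields" typings.
sketch_extract <- function(string, loci = c("A", "B")) {
  # Split the typing on whitespace into individual allele strings.
  alleles <- strsplit(trimws(string), "\\s+")[[1]]
  out <- character(0)
  for (locus in loci) {
    # Keep the alleles whose prefix before "*" is exactly this locus.
    hits <- alleles[startsWith(alleles, paste0(locus, "*"))]
    if (length(hits) > 2) {
      warning("More than two alleles found for locus ", locus)
      hits <- hits[1:2]  # assumption: keep only the first two
    }
    # Remove the locus prefix, leaving only the allele fields.
    hits <- sub("^[^*]+\\*", "", hits)
    # Pad to two slots with NA so each locus yields <locus>_1 and <locus>_2.
    slots <- c(hits, rep(NA_character_, 2 - length(hits)))
    names(slots) <- paste0(locus, "_", 1:2)
    out <- c(out, slots)
  }
  out
}

res <- sketch_extract("A*01:01 A*02:01 B*07:02")
# names(res) is c("A_1", "A_2", "B_1", "B_2"); the B_2 slot is NA
```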

Examples

typing <- "A1 A2 B7 B8 Cw3 DQ5 DQ8 DR4 DR11 DR52 DR53"
extract_alleles_str(typing, loci = "A")
#> A_1 A_2 
#> "1" "2" 
extract_alleles_str(typing)
#>    A_1    A_2    B_1    B_2    C_1    C_2 DPA1_1 DPA1_2 DPB1_1 DPB1_2 DQA1_1 
#>    "1"    "2"    "7"    "8"    "3"     NA     NA     NA     NA     NA     NA 
#> DQA1_2 DQB1_1 DQB1_2 DRB1_1 DRB1_2 DRB._1 DRB._2 
#>     NA    "5"    "8"    "4"   "11"   "52"   "53" 

df <- tidyr::tibble(typing = typing)
extract_alleles_df(df, typing, loci = c("A", "B", "C"))
#> Joining with `by = join_by(typing)`
#> Joining with `by = join_by(typing)`
#> # A tibble: 1 × 7
#>   typing                                     A_1   A_2   B_1   B_2   C_1   C_2  
#>   <chr>                                      <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 A1 A2 B7 B8 Cw3 DQ5 DQ8 DR4 DR11 DR52 DR53 1     2     7     8     3     ""   

# Can also handle newer nomenclature
extract_alleles_str("DQB1*03:01 DQB1*05:01 DRB1*04:AMR",
  loci = c("DRB1", "DQB1")
)
#>   DRB1_1   DRB1_2   DQB1_1   DQB1_2 
#> "04:AMR"       NA  "03:01"  "05:01"