Convert a vector of GL Strings to a data frame with one column per allele
Source: R/extract_alleles.R
gl_to_df() takes a character vector of GL Strings and transforms it into a wide data frame with one row per GL String and one column per allele.
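To make the shape of this transformation concrete, here is a minimal base-R sketch of its first step. It assumes that every GL String starts with a `namespace#version_or_date#` prefix, as in the examples on this page. `split_gl_prefix()` is a hypothetical helper written for illustration only; it is not part of the package, and gl_to_df() may parse GL Strings differently.

```r
# Hypothetical helper (not part of the package): split each GL String
# into its "namespace#version_or_date#" prefix and the allele body.
split_gl_prefix <- function(glstrings) {
  # Capture the text before the first "#", between the first and second
  # "#", and everything after the second "#"
  m <- regmatches(glstrings, regexec("^([^#]*)#([^#]*)#(.*)$", glstrings))
  # Strings without the prefix yield character(0), so `[` returns NA
  pick <- function(i) vapply(m, `[`, character(1), i)
  data.frame(
    glstring_index = seq_along(glstrings),
    namespace = pick(2),
    version_or_date = pick(3),
    body = pick(4),
    stringsAsFactors = FALSE
  )
}

split_gl_prefix(c(
  "hla#2023#HLA-A*01:01:01:01+HLA-A*02:07",
  "hla#3.29.0#HLA-B*07:02"
))
```

The remaining body (e.g. `HLA-B*07:02`) is what would then be split into per-allele columns.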
Value
A data frame with the following three columns:
glstring_index
Counter for each GL String in the vector
namespace
Namespace of the GL String (e.g. "hla")
version_or_date
Nomenclature version or date of the GL String (e.g. "3.29.0" or "2023-05-27")
In addition, the data frame has one column for each allele at each locus found in the GL Strings (e.g. A_1, A_2, B_1, B_2, C_1, C_2 for a class I typing).
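The `<locus>_<n>` column naming can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the hypothetical helpers `alleles_to_named_list()` and `gl_bodies_to_wide()` expect GL String bodies with the `namespace#version_or_date#` prefix already removed, handle only the `^` (locus) and `+` (genotype) operators (not `/`, `~` or `|`), assume alleles are written as `HLA-<locus>*<fields>`, and order columns alphabetically, which may not match the column order gl_to_df() produces.

```r
# Hypothetical sketch (not the package's implementation): turn the allele
# bodies of several GL Strings into one wide data frame with <locus>_<n>
# columns. Assumes only the "^" (locus) and "+" (genotype) operators occur
# and that every allele is written as "HLA-<locus>*<fields>".
alleles_to_named_list <- function(body) {
  loci_parts <- unlist(strsplit(body, "^", fixed = TRUE))
  alleles <- unlist(strsplit(loci_parts, "+", fixed = TRUE))
  # Locus name is the text between "HLA-" and "*", e.g. "A" or "DRB1"
  loci <- sub("^HLA-([^*]+)\\*.*$", "\\1", alleles)
  # Number the alleles within each locus: 1, 2, ...
  n <- stats::ave(seq_along(loci), loci, FUN = seq_along)
  stats::setNames(as.list(alleles), paste0(loci, "_", n))
}

gl_bodies_to_wide <- function(bodies) {
  rows <- lapply(bodies, alleles_to_named_list)
  # Union of column names across all GL Strings, in C-locale order
  cols <- sort(unique(unlist(lapply(rows, names))), method = "radix")
  # Alleles absent from a GL String become NA
  out <- lapply(cols, function(col) {
    vapply(rows, function(r) {
      if (is.null(r[[col]])) NA_character_ else r[[col]]
    }, character(1))
  })
  names(out) <- cols
  as.data.frame(out, stringsAsFactors = FALSE)
}

gl_bodies_to_wide(c(
  "HLA-A*01:01^HLA-B*07:01+HLA-B*08:01^HLA-C*01:01+HLA-C*03:04",
  "HLA-A*02:01+HLA-A*03:01^HLA-B*07:02+HLA-B*08:02^HLA-C*01:02"
))
```

With the two class I GL String bodies used in the examples below, this yields columns A_1 through C_2, with NA in A_2 for the first string and in C_2 for the second.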
See also
gl_to_vec() for the basic conversion of a GL String to a vector
df_to_gl() for the opposite operation
Examples
glstrings <-
  c(
    "hla#2023#HLA-A*01:01^HLA-B*07:01+HLA-B*08:01^HLA-C*01:01+HLA-C*03:04",
    "hla#2023#HLA-A*02:01+HLA-A*03:01^HLA-B*07:02+HLA-B*08:02^HLA-C*01:02"
  )
gl_to_df(glstrings)
#> # A tibble: 2 × 9
#>   glstring_index namespace version_or_date A_1     A_2   B_1   B_2   C_1   C_2
#>            <int> <chr>     <chr>           <chr>   <chr> <chr> <chr> <chr> <chr>
#> 1              1 hla       2023            HLA-A*… NA    HLA-… HLA-… HLA-… HLA-…
#> 2              2 hla       2023            HLA-A*… HLA-… HLA-… HLA-… HLA-… NA
# If your GL Strings are in a data frame with identifier columns that you
# want to keep attached, call `gl_to_df()` on the GL String column of that
# data frame:
typing_df <- tidyr::tibble(
  id = c("001", "002"),
  glstrings = c(
    "hla#2023#HLA-A*01:01:01:01+HLA-A*02:07",
    "hla#2023#HLA-DRB1*03:15:01:01+HLA-DRB1*04:93"
  )
)
typing_df |>
  dplyr::mutate(gl_df = gl_to_df(glstrings)) |> # make the data frame
  tidyr::unnest(gl_df) # combine with your existing data frame
#> # A tibble: 2 × 9
#>   id    glstrings    glstring_index namespace version_or_date A_1   A_2   DRB1_1
#>   <chr> <chr>                 <int> <chr>     <chr>           <chr> <chr> <chr>
#> 1 001   hla#2023#HL…              1 hla       2023            HLA-… HLA-… NA
#> 2 002   hla#2023#HL…              2 hla       2023            NA    NA    HLA-D…
#> # ℹ 1 more variable: DRB1_2 <chr>