IPD-IMGT/HLA Database and Genomics
Genome sequence coordinates and references for genes of the HLA system.
The MHC region is the most polymorphic region of the human genome, and the level of diversity seen has been described as “hyperpolymorphic” rather than simply polymorphic. The IPD-IMGT/HLA Database acts as a repository for the allele sequences, not for the individual SNPs. It is the combination of SNPs within an allele that we focus on, not the individual variant entries.
Within the HLA field, the term “allele” refers to the combination of point mutations, insertions and deletions that are seen in a single phased sequence for any given gene. Each allele can therefore be made up of multiple variant positions when compared to a single reference sequence. For this reason, HLA results are not traditionally defined by individual SNPs or reported in formats like VCF. The use of HGVS nomenclature and VCF formats is possible for HLA data, but the results can be more complex than for other systems.
This is due to both the number of variants seen and the lack of coverage of certain regions in the reference database. Currently XX% of class I alleles have a full-length genomic sequence, and YY% have a full-length CDS sequence available. For class II, where the sequences are longer and the intronic regions are more complex and difficult to sequence, the coverage is lower: XX% of class II alleles have a full-length genomic sequence and YY% have a full-length CDS sequence available.
A search of dbSNP shows 198 entries for HLA-A. This is substantially lower than the thousands of HLA alleles recognised and, given that nearly every position of the 546 bp encoded by exons 2 and 3 of HLA-A is polymorphic, it under-represents the true level of variation seen. In order to accurately link all the variation seen in IPD-IMGT/HLA to any third-party system, we would need to ensure that all variants at all positions were available within that system prior to cross-referencing. Recent estimates suggest that the IPD-IMGT/HLA Database contains over 362,709 distinct nucleotide variants compared to the reference sequences, at 86,902 of the 234,539 curated positions (as of August 2019).
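The allele definition above, a single phased sequence that differs from one reference at many positions, can be illustrated with a minimal Python sketch. The sequences here are invented toy data, not real HLA sequences, and the function name `list_differences` is hypothetical. The sketch assumes the two sequences are already aligned to the same length, with `-` marking a gap.

```python
from typing import List, Tuple


def list_differences(reference: str, allele: str) -> List[Tuple[int, str, str]]:
    """Return (1-based alignment column, reference base, allele base) per mismatch.

    Both inputs are assumed to be pre-aligned and of equal length, with '-'
    marking a gap (a gap in the allele reads as a deleted base).
    """
    if len(reference) != len(allele):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (column, ref_base, allele_base)
        for column, (ref_base, allele_base) in enumerate(zip(reference, allele), start=1)
        if ref_base != allele_base
    ]


# Toy sequences, invented for illustration only (not real HLA data).
toy_reference = "ATGGCCGTCATGGCG"
toy_allele = "ATGGCTGTCAT-GCA"

differences = list_differences(toy_reference, toy_allele)
for column, ref_base, allele_base in differences:
    print(column, ref_base, allele_base)
```

Even this 15-base toy allele expands into three separate variant records. A real allele compared over several kilobases of genomic sequence expands into many more.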
For the reasons outlined above, at the current time, we do not provide any lookup tables translating HLA alleles into a list of dbSNP rs IDs (or VCFs) that can be used to identify variants.
We do provide the following tools, which can aid in the analysis of genomics data.
The allele report tool can be used to access an HGVS-style description for each allele. This can be found by viewing the allele report and then following the ‘View HGVS Report’ link. The output details the sequence variation seen in the selected allele, in line with the HGVS recommendations for sequence variant descriptions. The HGVS descriptors cover both the IPD-IMGT/HLA reference sequence for the cDNA sequence of the gene and the GRC reference GRCh38 NM_002116.7.
An example of a gDNA report comparing two common alleles is shown below. The report shows how HLA-A*02:01:01:01 would be encoded using HGVS nomenclature against the GRCh38 HLA-A reference sequence.
HGVS Description of the HLA-A*02:01:01:01 allele using GRCh38 HLA-A reference sequence |
---|
GRCh38: CM000668.2:g.[6:29942258G>A; 6:29942307A>G; 6:29942400C>T; 6:29942413A>T; 6:29942500T>G; 6:29942510C>T; 6:29942582C>G; 6:29942602C>T; 6:29942653C>G; 6:29942661C>T; 6:29942674G>C; 6:29942681T>G; 6:29942702G>C; 6:29942706G>A; 6:29942720C>G; 6:29942732A>G; 6:29942746G>C; 6:29942751C>T; 6:29942762C>T; 6:29942828C>A; 6:29942924G>T; 6:29942940C>G; 6:29942941A>G; 6:29942954T>A; 6:29942966G>C; 6:29942976G>C; 6:29943044G>C; 6:29943058G>C; 6:29943063C>T; 6:29943116A>G; 6:29943210G>A; 6:29943224T>G; 6:29943243T>C; 6:29943261G>C; 6:29943280A>G; 6:29943287T>G; 6:29943288A>G; 6:29943316G>T; 6:29943338G>A; 6:29943339G>C; 6:29943343G>T; 6:29943378C>A; 6:29943414G>A; 6:29943422T>C; 6:29943431G>A; 6:29943452A>T; 6:29943480T>G; 6:29943643A>G; 6:29943683_6:29943684insT; 6:29943762G>A; 6:29943808_6:29943809_29943810insCACA; 6:29943812C>A; 6:29943909C>T; 6:29943951T>C; 6:29943952C>T; 6:29943959_6:29943960_29943961insCTAGAATTTTCCACGGA; 6:29943964A>G; 6:29944014T>C; 6:29944027G>A; 6:29944051G>A; 6:29944060C>T; 6:29944068T>A; 6:29944103T>G; 6:29944104T>A; 6:29944117C>T; 6:29944119T>A; 6:29944125C>G; 6:29944133G>A; 6:29944136A>G; 6:29944145C>T; 6:29944152C>G; 6:29944154C>T; 6:29944155A>G; 6:29944169G>A; 6:29944194G>A; 6:29944332G>C; 6:29944371C>T; 6:29944412T>C; 6:29944428C>T; 6:29944453delG; 6:29944454delA; 6:29944455delG; 6:29944504T>C; 6:29944557C>T; 6:29944592C>T; 6:29944622A>G; 6:29944641delG; 6:29944675C>A; 6:29944681C>G; 6:29944687A>G; 6:29944697T>C; 6:29944701A>G; 6:29944717G>A; 6:29944718T>C; 6:29944726C>T; 6:29944741T>C; 6:29944819G>A; 6:29944877delA; 6:29944878delG; 6:29944879delA; 6:29944881C>T; 6:29944892T>G; 6:29944893A>G; 6:29944978G>A; 6:29944990A>T; 6:29945020G>A; 6:29945054T>C; 6:29945076T>C; 6:29945080A>T; 6:29945159C>T; 6:29945168T>C; 6:29945266C>T; 6:29945291T>C; 6:29945298A>G; 6:29945302A>G; 6:29945359G>C; 6:29945395G>A; 6:29945415T>C; 6:29945416G>A; 6:29945421T>C] |
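A description like the one above can be split into individual variant records for downstream processing. The following Python sketch is one possible approach, not an official parser. The regular expression is written to match the item shapes that appear in this report (substitutions, `del`, and `ins` items with one or more coordinates), and the function name `parse_description` is hypothetical. The input is a short excerpt of the report.

```python
import re
from typing import Dict, List

# One item of the bracketed list, e.g. "6:29942258G>A" or "6:29944453delG".
ITEM = re.compile(
    r"^(?P<chrom>\w+):(?P<start>\d+)"      # chromosome and first coordinate
    r"(?:_(?:\w+:)?\d+)*"                  # any further coordinates (insertions)
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])"  # substitution
    r"|del(?P<deleted>[ACGT]+)"            # deletion
    r"|ins(?P<inserted>[ACGT]+))$"         # insertion
)


def parse_description(description: str) -> List[Dict[str, object]]:
    """Split an HGVS-style 'accession:g.[...]' string into variant records."""
    head, _, body = description.partition(":g.")
    # Keep only the accession, dropping a leading label such as "GRCh38:".
    accession = head.split()[-1] if head.strip() else ""
    variants: List[Dict[str, object]] = []
    for raw_item in body.strip().strip("[]").split(";"):
        item = raw_item.strip()
        if not item:
            continue
        match = ITEM.match(item)
        if match is None:
            raise ValueError(f"unrecognised item: {item!r}")
        if match["ref"]:
            kind, bases = "substitution", f"{match['ref']}>{match['alt']}"
        elif match["deleted"]:
            kind, bases = "deletion", match["deleted"]
        else:
            kind, bases = "insertion", match["inserted"]
        variants.append(
            {
                "accession": accession,
                "position": int(match["start"]),  # first coordinate of the item
                "type": kind,
                "bases": bases,
            }
        )
    return variants


# Short excerpt of the HLA-A*02:01:01:01 report shown above.
excerpt = (
    "GRCh38: CM000668.2:g.[6:29942258G>A; 6:29943683_6:29943684insT; "
    "6:29943808_6:29943809_29943810insCACA; 6:29944453delG]"
)
variants = parse_description(excerpt)
for variant in variants:
    print(variant)
```

Each record keeps only the first coordinate of an item. Converting these records to VCF would additionally require the reference base at each insertion or deletion anchor, which this sketch does not look up.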
To help map the entries from the IPD-IMGT/HLA Database to other reference systems, the following table lists, for each gene, the position in the GRCh38 sequence that maps to the A of the ATG initiator methionine, which is labelled as gDNA base 1 in the IPD-IMGT/HLA Database. The table also lists the IPD-IMGT/HLA Database reference allele for each gene and, where known, the allele that the GRCh38 sequence represents.
Gene | Ensembl version | GRCh38 Location | Start of ATG (Initiator Met) | GRCh38 reference allele | IPD-IMGT/HLA reference allele |
---|---|---|---|---|---|
HFE | ENSG00000010704.18 | CM000668.2 (forward) | 6:26,087,441 | HFE*001:01:01 | HFE*001:01:01 |
HLA-F | ENSG00000204642.14 | CM000668.2 (forward) | 6:29,723,464 | F*01:03:01:01 | F*01:01:01:01 |
HLA-V | ENSG00000181126.13 | CM000668.2 (forward) | 6:29,792,234 | V*01:01:01:01 | V*01:01:01:01 |
HLA-P | ENSG00000261548.1 | CM000668.2 (forward) | 6:29,800,412 | Not an officially recognised allele | P*01:01:01:01 |
HLA-G | ENSG00000204632.11 | CM000668.2 (forward) | 6:29,827,845 | G*01:01:01:01 | G*01:01:01:01 |
HLA-H | ENSG00000206341.7 | CM000668.2 (forward) | 6:29,887,752 | H*02:04 | H*01:01:01:01 |
HLA-T | ENSG00000231130.1 | CM000668.2 (forward) | 6:29,896,651 | T*01:01:01:01 | T*01:01:01:01 |
HLA-K | ENSG00000230795.3 | CM000668.2 (forward) | 6:29,926,459 | K*01:02 | K*01:01:01:01 |
HLA-U | ENSG00000228078.1 | CM000668.2 (forward) | 6:29,934,102 | U*01:01:01:01 | U*01:01:01:01 |
HLA-A | ENSG00000206503.13 | CM000668.2 (forward) | 6:29,942,554 | A*03:01:01:01 | A*01:01:01:01 |
HLA-W | ENSG00000235290.1 | CM000668.2 (forward) | 6:29,956,596 | W*01:01:01:01 | W*01:01:01:01 |
HLA-J | ENSG00000204622.11 | CM000668.2 (forward) | 6:30,006,598 | J*01:01:01:01 | J*01:01:01:01 |
HLA-L | ENSG00000243753.5 | CM000668.2 (forward) | 6:30,259,625 | L*01:01:01:01 | L*01:01:01:01 |
HLA-N | ENSG00000224372.1 | CM000668.2 (forward) | 6:30,351,390 | N*01:01:01:01 | N*01:01:01:01 |
HLA-E | ENSG00000204592.9 | CM000668.2 (forward) | 6:30,489,532 | E*01:03:02:01 | E*01:01:01:01 |
HLA-B | ENSG00000234745.11 | CM000668.2 (reverse) | 6:31,269,521 | B*07:02:01:01 | B*07:02:01:01 |
HLA-C | ENSG00000204525.16 | CM000668.2 (reverse) | 6:31,268,808 | C*07:02:01:01 | C*01:02:01:01 |
HLA-S | ENSG00000225851.1 | CM000668.2 (reverse) | 6:31,381,942 | S*01:02:01:01 | S*01:01:01:01 |
MICA | ENSG00000204520.13 | CM000668.2 (forward) | 6:31,403,633 | MICA*008:04 | MICA*001 |
MICB | ENSG00000204516.10 | CM000668.2 (forward) | 6:31,498,194 | MICB*004:01:01 | MICB*001 |
HLA-DRA | ENSG00000204287.14 | CM000668.2 (forward) | 6:32,439,951 | DRA*01:02:03 | DRA*01:01:01:01 |
HLA-DRB3 | ENSG00000196101.9 | GL000255.2 (reverse) | 6:32,449,828 | DRB3*02:02:01:01 | DRB3*01:01:02:01 |
HLA-DRB9 | ENSG00000196301.3 | CM000668.2 (reverse) | 6:32,473,232 | DRB9*01:01:01:01 | DRB9*01:01:01:01 |
HLA-DRB2 | ENSG00000227442.1 | GL000255.2 (reverse) | 6:32,488,465 | DRB2*01:01 | DRB2*01:01 |
HLA-DRB4 | ENSG00000227357.2 | GL000256.2 (reverse) | 6:32,542,661 | DRB4*01:03:01:01 | DRB4*01:01:01:01 |
HLA-DRB5 | ENSG00000198502.6 | CM000668.2 (reverse) | 6:32,517,416 | DRB5*01:01:01:01 | DRB5*01:01:01:01 |
HLA-DRB6 | ENSG00000229391.70 | CM000668.2 (reverse) | 6:32,554,549 | Not an officially recognised allele | DRB6*01:01 |
HLA-DRB1 | ENSG00000196126.11 | CM000668.2 (reverse) | 6:32,578,875 | DRB1*15:03:01:01 | DRB1*01:01:01:01 |
HLA-DRB8 | ENSG00000233697.2 | GL000256.2 (reverse) | 6:32,589,215 | DRB8*01:01 | DRB8*01:01 |
HLA-DRB7 | ENSG00000227099.1 | GL000256.2 (reverse) | 6:32,627,102 | Not an officially recognised allele | DRB7*01:01:01 |
HLA-DQA1 | ENSG00000196735.12 | CM000668.2 (forward) | 6:32,637,459 | DQA1*01:02:01:01 | DQA1*01:01:01:01 |
HLA-DQB1 | ENSG00000179344.16 | CM000668.2 (reverse) | 6:32,661,243 | DQB1*06:02:01:01 | DQB1*05:01:01:01 |
HLA-DQA2 | ENSG00000237541.4 | CM000668.2 (forward) | 6:32,741,444 | DQA2*01:01:01:01 | DQA2*01:01:01:01 |
HLA-DOB | ENSG00000241106.8 | CM000668.2 (reverse) | 6:32,816,278 | DOB*01:01:01:01 | DOB*01:01:01:01 |
TAP2 | ENSG00000204267.15 | CM000668.2 (reverse) | 6:32,822,370 | TAP2*02:01:02:01 | TAP2*01:01:01 |
TAP1 | ENSG00000168394.11 | CM000668.2 (reverse) | 6:32,845,551 | TAP1*01:01:01:01 | TAP1*01:01:01:01 |
HLA-DMB | ENSG00000242574.9 | CM000668.2 (reverse) | 6:32,934,850 | DMB*01:03:01:01 | DMB*01:01:01:01 |
HLA-DMA | ENSG00000204257.15 | CM000668.2 (reverse) | 6:32,964,671 | DMA*01:01:01:01 | DMA*01:01:01:01 |
HLA-DOA | ENSG00000204252.14 | CM000668.2 (reverse) | 6:33,004,237 | DOA*01:01:02:01 | DOA*01:01:01 |
HLA-DPA1 | ENSG00000231389.7 | CM000668.2 (reverse) | 6:33,071,774 | DPA1*01:03:01:01 | DPA1*01:03:01:01 |
HLA-DPB1 | ENSG00000223865.11 | CM000668.2 (forward) | 6:33,076,042 | DPB1*04:01:01:01 | DPB1*01:01:01:01 |
HLA-DPA2 | ENSG00000231461.1 | CM000668.2 (reverse) | 6:33,091,753 | DPA2*01:01:01:01 | DPA2*01:01:01:01 |
HLA-DPB2 | ENSG00000224557.7 | CM000668.2 (forward) | 6:33,112,516 | DPB2*03:01:01:01 | DPB2*01:01:01 |
HLA-Y | Not available | Not available | Not available | Not available | Y*01:01 |
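The ATG positions in the table above can be used to convert a gDNA position into a GRCh38 coordinate. The Python sketch below shows one way to do this for two genes taken from the table. It rests on several assumptions that are not stated in the table itself: gDNA numbering has no position 0 (the base immediately upstream of base 1 is -1), the listed coordinate for a reverse-strand gene is the chromosome position of the A of the ATG, and there are no insertions or deletions between the sequence being numbered and the GRCh38 sequence over the stretch being converted. The function name `gdna_to_grch38` is hypothetical.

```python
from typing import Dict, Tuple

# (GRCh38 coordinate of the A of the ATG, strand), copied from the table above
# for two genes located on CM000668.2.
ATG_START: Dict[str, Tuple[int, str]] = {
    "HLA-A": (29_942_554, "forward"),
    "HLA-DQB1": (32_661_243, "reverse"),
}


def gdna_to_grch38(gene: str, gdna_position: int) -> int:
    """Map a gDNA position (base 1 = A of the ATG) to a chromosome 6 coordinate.

    Assumes there is no position 0 and that no indels separate the numbered
    sequence from GRCh38 between the ATG and the requested position.
    """
    if gdna_position == 0:
        raise ValueError("gDNA numbering is assumed to have no position 0")
    atg, strand = ATG_START[gene]
    # Distance from the A of the ATG, skipping the missing position 0.
    offset = gdna_position - 1 if gdna_position > 0 else gdna_position
    # Forward-strand genes count upwards along the chromosome, reverse downwards.
    return atg + offset if strand == "forward" else atg - offset


print(gdna_to_grch38("HLA-A", 1))       # the A of the ATG itself
print(gdna_to_grch38("HLA-A", 100))
print(gdna_to_grch38("HLA-DQB1", 100))
```

Because the IPD-IMGT/HLA reference allele and the GRCh38 allele differ for many genes, any insertion or deletion between them shifts every downstream offset. A robust conversion would therefore be driven by an alignment rather than by a single fixed offset.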