IPD-IMGT/HLA Database and Genomics

Genome sequence coordinates and references for genes of the HLA system.

The MHC region is the most polymorphic region of the human genome, and the level of diversity seen has been described as “hyperpolymorphic” rather than simply polymorphic. The IPD-IMGT/HLA Database acts as a repository for allele sequences, not for individual SNPs. Our focus is the combination of SNPs that makes up each allele, rather than the individual variant entries.

Within the HLA field the term “allele” refers to the combination of point mutations, insertions and deletions seen in a single phased sequence for any given gene. Each allele can therefore be made up of multiple variant positions when compared to a single reference sequence. For this reason HLA results are not traditionally defined by individual SNPs or reported in formats like VCF. The use of HGVS nomenclature and VCF formats is possible for HLA data, but the results can be more complex than for other systems.
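The idea of an allele as a phased set of variants against one reference can be sketched as follows. This is a minimal illustration only: the `Variant` class, the `apply_variants` helper and the ten-base toy sequence are hypothetical and do not come from the IPD-IMGT/HLA Database or its tools.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int  # 1-based position in the reference sequence
    ref: str  # reference bases replaced ("" for a pure insertion before pos)
    alt: str  # bases in the allele ("" for a pure deletion)


def apply_variants(reference: str, variants: list[Variant]) -> str:
    """Rebuild an allele sequence from a reference plus its phased variants."""
    seq = reference
    # Work from right to left so that indels do not shift positions
    # of variants that still have to be applied.
    for v in sorted(variants, key=lambda v: v.pos, reverse=True):
        start = v.pos - 1
        end = start + len(v.ref)
        if seq[start:end] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        seq = seq[:start] + v.alt + seq[end:]
    return seq


# Toy data (not real HLA sequence): one substitution, one deletion, one insertion.
toy_reference = "ATGGCCGTCA"
toy_allele = [
    Variant(4, "G", "A"),   # substitution
    Variant(7, "G", ""),    # deletion
    Variant(10, "", "TT"),  # insertion before position 10
]
print(apply_variants(toy_reference, toy_allele))
```

The allele is only meaningful as the whole list: dropping or reordering the phase of any one variant describes a different sequence, which is why a per-SNP listing does not capture an HLA allele.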

This is due to a combination of the number of variants seen and the lack of coverage of certain regions in the reference database. Currently XX% of class I alleles have a full-length genomic sequence, and YY% have a full-length CDS sequence available. For class II, where the sequences are longer and the intronic regions are more complex and difficult to sequence, the coverage is lower: XX% of class II alleles have a full-length genomic sequence and YY% have a full-length CDS sequence available.

A search of dbSNP shows 198 entries for HLA-A. This is substantially lower than the thousands of HLA alleles recognised and, given that nearly every position of the 546 bp encoded by exons 2 and 3 of HLA-A is polymorphic, an under-representation of the true level of variation seen. In order to accurately link all the variation seen in IPD-IMGT/HLA to any third-party system, we would need to ensure that all variants at all positions were available within that system prior to cross-referencing. Recent estimates suggest that the IPD-IMGT/HLA Database comprises over 362,709 distinct nucleotide variants compared to the reference sequences, at 86,902 of the 234,539 curated positions (as of August 2019).

For the reasons outlined above, we do not currently provide any look-up tables translating HLA alleles into a list of dbSNP or rs IDs (or VCFs) that can be used to identify variants.

We do, however, provide the following tools, which can aid in the analysis of genomics data.

The allele report tool can be used to access an HGVS-style description for each allele. This can be found by viewing the allele report and then following the ‘View HGVS Report’ link. The output details the sequence variation seen in the required allele, in line with the HGVS recommendations for sequence variant descriptions. The HGVS descriptors cover both the IPD-IMGT/HLA reference sequence for the cDNA sequence of the gene required and the GRC reference GRCh38 NM_002116.7.

An example of a gDNA report comparing two common alleles is shown below. The report shows how HLA-A*02:01:01:01 would be encoded using HGVS nomenclature.

HGVS Description of the HLA-A*02:01:01:01 allele using GRCh38 HLA-A reference sequence
GRCh38: CM000668.2:g.[6:29942258G>A; 6:29942307A>G; 6:29942400C>T; 6:29942413A>T; 6:29942500T>G; 6:29942510C>T; 6:29942582C>G; 6:29942602C>T; 6:29942653C>G; 6:29942661C>T; 6:29942674G>C; 6:29942681T>G; 6:29942702G>C; 6:29942706G>A; 6:29942720C>G; 6:29942732A>G; 6:29942746G>C; 6:29942751C>T; 6:29942762C>T; 6:29942828C>A; 6:29942924G>T; 6:29942940C>G; 6:29942941A>G; 6:29942954T>A; 6:29942966G>C; 6:29942976G>C; 6:29943044G>C; 6:29943058G>C; 6:29943063C>T; 6:29943116A>G; 6:29943210G>A; 6:29943224T>G; 6:29943243T>C; 6:29943261G>C; 6:29943280A>G; 6:29943287T>G; 6:29943288A>G; 6:29943316G>T; 6:29943338G>A; 6:29943339G>C; 6:29943343G>T; 6:29943378C>A; 6:29943414G>A; 6:29943422T>C; 6:29943431G>A; 6:29943452A>T; 6:29943480T>G; 6:29943643A>G; 6:29943683_6:29943684insT; 6:29943762G>A; 6:29943808_6:29943809_29943810insCACA; 6:29943812C>A; 6:29943909C>T; 6:29943951T>C; 6:29943952C>T; 6:29943959_6:29943960_29943961insCTAGAATTTTCCACGGA; 6:29943964A>G; 6:29944014T>C; 6:29944027G>A; 6:29944051G>A; 6:29944060C>T; 6:29944068T>A; 6:29944103T>G; 6:29944104T>A; 6:29944117C>T; 6:29944119T>A; 6:29944125C>G; 6:29944133G>A; 6:29944136A>G; 6:29944145C>T; 6:29944152C>G; 6:29944154C>T; 6:29944155A>G; 6:29944169G>A; 6:29944194G>A; 6:29944332G>C; 6:29944371C>T; 6:29944412T>C; 6:29944428C>T; 6:29944453delG; 6:29944454delA; 6:29944455delG; 6:29944504T>C; 6:29944557C>T; 6:29944592C>T; 6:29944622A>G; 6:29944641delG; 6:29944675C>A; 6:29944681C>G; 6:29944687A>G; 6:29944697T>C; 6:29944701A>G; 6:29944717G>A; 6:29944718T>C; 6:29944726C>T; 6:29944741T>C; 6:29944819G>A; 6:29944877delA; 6:29944878delG; 6:29944879delA; 6:29944881C>T; 6:29944892T>G; 6:29944893A>G; 6:29944978G>A; 6:29944990A>T; 6:29945020G>A; 6:29945054T>C; 6:29945076T>C; 6:29945080A>T; 6:29945159C>T; 6:29945168T>C; 6:29945266C>T; 6:29945291T>C; 6:29945298A>G; 6:29945302A>G; 6:29945359G>C; 6:29945395G>A; 6:29945415T>C; 6:29945416G>A; 6:29945421T>C]
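A report like the one above can be broken into individual variants for downstream use. The sketch below is not a general HGVS parser: the three regular expressions are assumptions inferred only from the item shapes visible in this example report (substitutions, `del` items and `ins` items), and the `parse_report` function name is hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the example report only (assumption).
SUB = re.compile(r"^6:(\d+)([ACGT])>([ACGT])$")
DEL = re.compile(r"^6:(\d+)del([ACGT]+)$")
INS = re.compile(r"^6:(\d+)_.*ins([ACGT]+)$")


def parse_report(description: str):
    """Split the bracketed allele description into (kind, position, ref, alt) tuples."""
    inner = description[description.index("[") + 1 : description.rindex("]")]
    parsed = []
    for item in (part.strip() for part in inner.split(";")):
        if not item:
            continue
        if m := SUB.match(item):
            parsed.append(("sub", int(m[1]), m[2], m[3]))
        elif m := DEL.match(item):
            parsed.append(("del", int(m[1]), m[2], ""))
        elif m := INS.match(item):
            # Position is the first flanking base given in the report.
            parsed.append(("ins", int(m[1]), "", m[2]))
        else:
            raise ValueError(f"unrecognised item: {item!r}")
    return parsed


# A short excerpt of the HLA-A*02:01:01:01 report shown above.
excerpt = (
    "CM000668.2:g.[6:29942258G>A; 6:29943683_6:29943684insT; "
    "6:29943808_6:29943809_29943810insCACA; 6:29944453delG]"
)
variants = parse_report(excerpt)
print(Counter(kind for kind, *_ in variants))
```

Raising on anything unrecognised, rather than skipping it, is a deliberate choice here: a silently dropped item would change which allele the list describes.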

To help map entries from the IPD-IMGT/HLA Database to other reference systems, the following table lists, for each gene, the position in the GRCh38 sequence that corresponds to the A of the ATG initiator methionine codon, which is labelled as gDNA base 1 in the IPD-IMGT/HLA Database. The table also lists the IPD-IMGT/HLA Database reference allele for each gene and, where known, the allele that the GRCh38 sequence represents.

| Gene | Ensembl version | GRCh38 Location | Start of ATG (Initiator Met) | GRCh38 reference allele | IPD-IMGT/HLA reference allele |
|---|---|---|---|---|---|
| HFE | ENSG00000010704.18 | CM000668.2 (forward) | 6:26,087,441 | HFE*001:01:01 | HFE*001:01:01 |
| HLA-F | ENSG00000204642.14 | CM000668.2 (forward) | 6:29,723,464 | F*01:03:01:01 | F*01:01:01:01 |
| HLA-V | ENSG00000181126.13 | CM000668.2 (forward) | 6:29,792,234 | V*01:01:01:01 | V*01:01:01:01 |
| HLA-P | ENSG00000261548.1 | CM000668.2 (forward) | 6:29,800,412 | Not an officially recognised allele | P*01:01:01:01 |
| HLA-G | ENSG00000204632.11 | CM000668.2 (forward) | 6:29,827,845 | G*01:01:01:01 | G*01:01:01:01 |
| HLA-H | ENSG00000206341.7 | CM000668.2 (forward) | 6:29,887,752 | H*02:04 | H*01:01:01:01 |
| HLA-T | ENSG00000231130.1 | CM000668.2 (forward) | 6:29,896,651 | T*01:01:01:01 | T*01:01:01:01 |
| HLA-K | ENSG00000230795.3 | CM000668.2 (forward) | 6:29,926,459 | K*01:02 | K*01:01:01:01 |
| HLA-U | ENSG00000228078.1 | CM000668.2 (forward) | 6:29,934,102 | U*01:01:01:01 | U*01:01:01:01 |
| HLA-A | ENSG00000206503.13 | CM000668.2 (forward) | 6:29,942,554 | A*03:01:01:01 | A*01:01:01:01 |
| HLA-W | ENSG00000235290.1 | CM000668.2 (forward) | 6:29,956,596 | W*01:01:01:01 | W*01:01:01:01 |
| HLA-J | ENSG00000204622.11 | CM000668.2 (forward) | 6:30,006,598 | J*01:01:01:01 | J*01:01:01:01 |
| HLA-L | ENSG00000243753.5 | CM000668.2 (forward) | 6:30,259,625 | L*01:01:01:01 | L*01:01:01:01 |
| HLA-N | ENSG00000224372.1 | CM000668.2 (forward) | 6:30,351,390 | N*01:01:01:01 | N*01:01:01:01 |
| HLA-E | ENSG00000204592.9 | CM000668.2 (forward) | 6:30,489,532 | E*01:03:02:01 | E*01:01:01:01 |
| HLA-B | ENSG00000234745.11 | CM000668.2 (reverse) | 6:31,269,521 | B*07:02:01:01 | B*07:02:01:01 |
| HLA-C | ENSG00000204525.16 | CM000668.2 (reverse) | 6:31,268,808 | C*07:02:01:01 | C*01:02:01:01 |
| HLA-S | ENSG00000225851.1 | CM000668.2 (reverse) | 6:31,381,942 | S*01:02:01:01 | S*01:01:01:01 |
| MICA | ENSG00000204520.13 | CM000668.2 (forward) | 6:31,403,633 | MICA*008:04 | MICA*001 |
| MICB | ENSG00000204516.10 | CM000668.2 (forward) | 6:31,498,194 | MICB*004:01:01 | MICB*001 |
| HLA-DRA | ENSG00000204287.14 | CM000668.2 (forward) | 6:32,439,951 | DRA*01:02:03 | DRA*01:01:01:01 |
| HLA-DRB3 | ENSG00000196101.9 | GL000255.2 (reverse) | 6:32,449,828 | DRB3*02:02:01:01 | DRB1*01:01:02:01 |
| HLA-DRB9 | ENSG00000196301.3 | CM000668.2 (reverse) | 6:32,473,232 | DRB9*01:01:01:01 | DRB9*01:01:01:01 |
| HLA-DRB2 | ENSG00000227442.1 | GL000255.2 (reverse) | 6:32,488,465 | DRB2*01:01 | DRB2*01:01 |
| HLA-DRB4 | ENSG00000227357.2 | GL000256.2 (reverse) | 6:32,542,661 | DRB4*01:03:01:01 | DRB4*01:01:01:01 |
| HLA-DRB5 | ENSG00000198502.6 | CM000668.2 (reverse) | 6:32,517,416 | DRB5*01:01:01:01 | DRB5*01:01:01:01 |
| HLA-DRB6 | ENSG00000229391.70 | CM000668.2 (reverse) | 6:32,554,549 | Not an officially recognised allele | DRB6*01:01 |
| HLA-DRB1 | ENSG00000196126.11 | CM000668.2 (reverse) | 6:32,578,875 | DRB1*15:03:01:01 | DRB1*01:01:01:01 |
| HLA-DRB8 | ENSG00000233697.2 | GL000256.2 (reverse) | 6:32,589,215 | DRB8*01:01 | DRB8*01:01 |
| HLA-DRB7 | ENSG00000227099.1 | GL000256.2 (reverse) | 6:32,627,102 | Not an officially recognised allele | DRB7*01:01:01 |
| HLA-DQA1 | ENSG00000196735.12 | CM000668.2 (forward) | 6:32,637,459 | DQA1*01:02:01:01 | DQA1*01:01:01:01 |
| HLA-DQB1 | ENSG00000179344.16 | CM000668.2 (reverse) | 6:32,661,243 | DQB1*06:02:01:01 | DQB1*05:01:01:01 |
| HLA-DQA2 | ENSG00000237541.4 | CM000668.2 (forward) | 6:32,741,444 | DQA2*01:01:01:01 | DQA2*01:01:01:01 |
| HLA-DOB | ENSG00000241106.8 | CM000668.2 (reverse) | 6:32,816,278 | DOB*01:01:01:01 | DOB*01:01:01:01 |
| TAP2 | ENSG00000204267.15 | CM000668.2 (reverse) | 6:32,822,370 | TAP2*02:01:02:01 | TAP2*01:01:01 |
| TAP1 | ENSG00000168394.11 | CM000668.2 (reverse) | 6:32,845,551 | TAP1*01:01:01:01 | TAP1*01:01:01:01 |
| HLA-DMB | ENSG00000242574.9 | CM000668.2 (reverse) | 6:32,934,850 | DMB*01:03:01:01 | DMB*01:01:01:01 |
| HLA-DMA | ENSG00000204257.15 | CM000668.2 (reverse) | 6:32,964,671 | DMA*01:01:01:01 | DMA*01:01:01:01 |
| HLA-DOA | ENSG00000204252.14 | CM000668.2 (reverse) | 6:33,004,237 | DOA*01:01:02:01 | DOA*01:01:01 |
| HLA-DPA1 | ENSG00000231389.7 | CM000668.2 (reverse) | 6:33,071,774 | DPA1*01:03:01:01 | DPA1*01:03:01:01 |
| HLA-DPB1 | ENSG00000223865.11 | CM000668.2 (forward) | 6:33,076,042 | DPB1*04:01:01:01 | DPB1*01:01:01:01 |
| HLA-DPA2 | ENSG00000231461.1 | CM000668.2 (reverse) | 6:33,091,753 | DPA2*01:01:01:01 | DPA2*01:01:01:01 |
| HLA-DPB2 | ENSG00000224557.7 | CM000668.2 (forward) | 6:33,112,516 | DPB2*03:01:01:01 | DPB2*01:01:01 |
| HLA-Y | Not available | Not available | Not available | Not available | Y*01:01 |
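As a rough illustration of how the ATG start positions in the table above might be used, the sketch below converts an IPD-IMGT/HLA gDNA position into a chromosome 6 coordinate on GRCh38. It rests on two assumptions that are not stated in the table: that gDNA numbering has no position 0 (the base before 1 is -1), and that there are no insertions or deletions between the sequence being numbered and the GRCh38 sequence. The second assumption does not hold in general, since for many genes the GRCh38 allele differs from the IPD-IMGT/HLA reference allele, so treat the result as approximate. The `ATG_START` dictionary and `gdna_to_grch38` function are hypothetical names; only the two coordinates and strands are taken from the table.

```python
# ATG start coordinate and strand, copied from the table for two genes.
ATG_START = {
    "HLA-A": (29_942_554, "+"),     # CM000668.2 (forward)
    "HLA-DRB1": (32_578_875, "-"),  # CM000668.2 (reverse)
}


def gdna_to_grch38(gene: str, gdna_pos: int) -> int:
    """Map a gDNA position (A of ATG = 1) to a GRCh38 chromosome 6 coordinate.

    Assumes no position 0 and no indels relative to GRCh38 (see lead-in).
    """
    if gdna_pos == 0:
        raise ValueError("gDNA numbering is assumed to have no position 0")
    start, strand = ATG_START[gene]
    # Distance from the A of the ATG; upstream positions are negative.
    offset = gdna_pos - 1 if gdna_pos > 0 else gdna_pos
    # On the reverse strand, moving 3' along the gene lowers the coordinate.
    return start + offset if strand == "+" else start - offset


print(gdna_to_grch38("HLA-A", 1))      # the A of the ATG itself
print(gdna_to_grch38("HLA-DRB1", 10))  # nine bases 3' of the ATG, reverse strand
```

For a mapping that stays correct across indels, the offset would need to come from an alignment between the allele and the GRCh38 sequence rather than from simple arithmetic.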