Current Projects:
HLA Disease Association for Hematological Diseases:
A person's HLA genotype shapes the adaptive immune response through antigen presentation to T cells. The polymorphic HLA region contains genetic variants associated with increased or decreased disease risk, which provide clues about disease susceptibility and immune function (9). Disease association studies use data sets of HLA-typed cases and controls to identify variants whose association with disease is statistically significant. One such data set comes from the National Marrow Donor Program (NMDP). It contains HLA typing for patients with hematological diseases who might need bone marrow transplants and for healthy bone marrow donor volunteers. The data set has two main advantages. First, it is large relative to data sets used in similar studies, which gives greater statistical power to detect associations. Second, it includes genotyping for 9 HLA loci, which allows previously unidentified HLA associations to be found, since most previous disease association studies did not type all 9 loci. This data set has already been used to find HLA associations for chronic lymphocytic leukemia (6) and multiple myeloma (1). We hypothesized that HLA associations could be found using this data set for two additional hematological diseases: severe aplastic anemia (SAA) (3, 7, 12) and non-Hodgkin lymphoma (NHL) (2, 11). In addition, we hypothesized that the methods for data analysis and results interpretation that we developed while studying SAA and NHL would improve upon existing disease association study strategies.
We are currently working with two data subsets from the NMDP containing imputed 9-locus haplotypes for HLA loci A, C, B, DRB3/4/5, DRB1, DQA1, DQB1, DPA1, and DPB1. Comprehensive disease association studies for SAA and NHL including these 9 loci have not been done before. The data subsets include HLA typing for four populations with the following case numbers for SAA/NHL respectively: African Americans (623/1,271); Asians & Pacific Islanders (372/486); Caucasians (2,755/11,848); and Hispanics (478/735). From healthy donor volunteers, we selected 50,000 controls for each population matched for ethnicity, age, gender, and geography. So far, we have accomplished the following tasks while pursuing our goal of identifying disease associations for SAA and NHL:
• We processed the 9-locus imputation data to create input files to analyze in a disease association pipeline that finds associated HLA variants by performing logistic regression.
• We rewrote the logistic regression algorithm to calculate reliable p-values and odds ratios for associated HLA variants and developed a method to perform conditional analysis (5) to evaluate associated individual alleles in the context of asymmetric linkage disequilibrium (ALD).
• We created haplotype-specific homozygosity (HSF) plot and alluvial plot visualizations to interpret ALD patterns in the HLA region for HLA haplotypes and alleles associated with disease (8, 4).
• We developed a strategy to extrapolate two-field HLA alleles from SNP data, which allowed us to perform meta-analyses that combine GWAS in which HLA alleles were imputed from SNPs with targeted HLA genotyping from the NMDP data sets.
• We developed a method for comparing HLA allele frequencies among different populations to determine when an incorrect ethnicity assignment has been made, which allowed us to identify false associations.
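The odds ratios and p-values mentioned in the tasks above can be illustrated with a minimal sketch. It is not the pipeline's code. It assumes the simplest case, a logistic regression with a single binary predictor (carrier vs. non-carrier of one allele) and no covariates, for which the fitted coefficient equals the log odds ratio of the 2x2 table and the Wald standard error has a closed form. The function name `allele_association` and the carrier counts in the usage example are hypothetical; only the case and control totals (478 and 50,000) are taken from the text.

```python
import math

def allele_association(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio, 95% CI, and two-sided Wald p-value for carrying one allele.

    Equivalent to a logistic regression of case status on a single binary
    carrier indicator (no covariates, no conditioning on other alleles).
    """
    a = case_carriers                      # cases carrying the allele
    b = case_total - case_carriers         # cases not carrying it
    c = control_carriers                   # controls carrying the allele
    d = control_total - control_carriers   # controls not carrying it
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction so that the log odds ratio is finite
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # Wald standard error
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail
    ci95 = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return {"odds_ratio": math.exp(log_or), "ci95": ci95, "z": z, "p_value": p_value}

# Hypothetical carrier counts, for illustration only
result = allele_association(120, 478, 9000, 50000)
print(result)
```

Conditional analysis extends this idea by adding already-associated alleles as covariates to the regression, which requires an iterative fit rather than the closed form used here.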
Future directions include finalizing the disease association pipeline and finding associated HLA variants for SAA and NHL using the 9-locus HLA data. We anticipate challenges interpreting our results due to linkage disequilibrium in the HLA region and interactions among associated HLA variants that might affect disease risk. We plan to address these challenges by using statistical analysis (conditional analysis and factor analysis) and visualization tools (HSF plots and alluvial plots). Accomplishing these goals will involve several related projects:
• Expand the HLA variants analyzed by the pipeline and perform a complete run-through of the pipeline on the supercomputer for the SAA and NHL data. For the initial analysis, partial haplotypes will be restricted to those occurring within the Class I and Class II loci respectively. Based on the initial analysis, additional partial haplotypes will be analyzed to obtain a more comprehensive picture of how Class I and Class II alleles influence disease risk.
• Analyze disease association results using factor analysis and conditional analysis. The pipeline already performs factor analysis, but the resulting groupings must be evaluated critically to assess their validity, especially when many HLA variants are analyzed. We wrote a script to perform automated conditional analysis, but it is constrained by the supercomputer's runtime limit of 24 hours per job under the normal quality of service (QOS).
• Validate our methods for obtaining disease association results by re-analyzing previous GWAS in which HLA was imputed from SNPs. We hope to clarify conflicting results from previous studies and show that our pipeline represents a reliable strategy for identifying HLA disease associations.
• Demonstrate how to use visualizations of linkage disequilibrium patterns (HSF plots and alluvial plots) to interpret disease association study results, specifically for Class I vs. Class II associations, which can have synergistic or antagonistic effects.
• Investigate RNA-seq data in the context of HLA disease association studies, which will provide information about allele-specific expression. Allele-specific expression can be altered before, during, or after disease, and affects the immune response (10).
• Consolidate the disease association informatics pipeline into a software package that can be used by other researchers and applied to other diseases. The pipeline will provide access to strategies not universally used in disease association studies, including factor analysis, conditional analysis, and visualization tools.
This research will improve knowledge about disease etiology for SAA and NHL by identifying which HLA variants affect disease risk in diverse populations. This will allow us to determine future directions for researching HLA’s role in these two diseases. Specifically, we expect:
• To identify comprehensive HLA disease associations from 9-locus data for SAA and NHL for diverse HLA variant categories stratified by age and gender.
• To demonstrate agreement between our results and previous studies, and to reveal limitations of SNP-based imputation, by imputing HLA alleles from SNP data and comparing the resulting disease association results.
• To gain insight about how Class I and Class II haplotype blocks can interactively affect disease risk.
• To provide information about why associated alleles differ among populations.
• To produce an updated disease association pipeline that could be used for analyzing other diseases with appropriate checks to minimize false positives and false negatives.
• To provide greater insight into the effects of asymmetric linkage disequilibrium by analyzing our results using novel visualization and statistical analysis methods, i.e., alluvial plots and conditional analysis.
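The asymmetry that ALD captures can be shown with a short calculation. The sketch below assumes the standard definition of ALD in terms of haplotype-specific homozygosity (see reference 8): W²(A|B) = (F(A|B) − F(A)) / (1 − F(A)), where F(A) is the homozygosity at locus A and F(A|B) is the homozygosity at A conditional on the allele at B. The function name `ald` and the toy haplotype frequencies are hypothetical and are not drawn from the NMDP data.

```python
import math
from collections import defaultdict

def ald(hap_freqs):
    """Return (W_A|B, W_B|A) for two loci.

    hap_freqs maps (allele_at_A, allele_at_B) -> two-locus haplotype frequency.
    Frequencies are renormalized to sum to 1.
    """
    total = sum(hap_freqs.values())
    freqs = {hap: f / total for hap, f in hap_freqs.items() if f > 0}

    # Marginal allele frequencies at each locus
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), f in freqs.items():
        p_a[a] += f
        p_b[b] += f

    # Overall homozygosity at each locus
    f_a = sum(p * p for p in p_a.values())
    f_b = sum(p * p for p in p_b.values())
    if f_a == 1.0 or f_b == 1.0:
        raise ValueError("ALD is undefined when a locus has only one allele")

    # Haplotype-specific homozygosity, conditional on the other locus
    f_a_given_b = sum(f * f / p_b[b] for (a, b), f in freqs.items())
    f_b_given_a = sum(f * f / p_a[a] for (a, b), f in freqs.items())

    w_a_given_b = math.sqrt(max(0.0, (f_a_given_b - f_a) / (1 - f_a)))
    w_b_given_a = math.sqrt(max(0.0, (f_b_given_a - f_b) / (1 - f_b)))
    return w_a_given_b, w_b_given_a

# Toy example: the allele at B fully determines the allele at A, but not vice versa
toy = {("A1", "B1"): 0.5, ("A2", "B2"): 0.25, ("A2", "B3"): 0.25}
print(ald(toy))
```

When both loci have exactly two alleles, the two values coincide and equal the absolute value of the familiar correlation measure r. They differ only when the loci have different numbers of alleles, which is the usual situation for HLA.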
Selected bibliography:
1 Beksac, M. et al. HLA polymorphism and risk of multiple myeloma. Leukemia 30, 2260-2264 (2016). https://doi.org/10.1038/leu.2016.199
2 Choi, H.-B. et al. Association of HLA alleles with non-Hodgkin’s lymphoma in Korean population. International Journal of Hematology 87, 203-209 (2008). https://doi.org/10.1007/s12185-008-0040-4
3 Deng, X.-Z. et al. Associations between the HLA-A/B/DRB1 polymorphisms and aplastic anemia: evidence from 17 case–control studies. Hematology 23, 154-162 (2018). https://doi.org/10.1080/10245332.2017.1375064
4 Dribus, M., Gragert, L., & Maiers, M. Visualizing haplotype diversity using alluvial plots. Human Immunology 82, 22-23 (2021).
5 Eike, M. C., Becker, T., Humphreys, K., Olsson, M. & Lie, B. A. Conditional analyses on the T1DGC MHC data sets: novel associations with type 1 diabetes around HLA-G and confirmation of HLA-B. Genes & Immunity 10, 56-67 (2009). https://doi.org/10.1038/gene.2008.74
6 Gragert, L. et al. Fine-mapping of HLA associations with chronic lymphocytic leukemia in US populations. Blood 124, 2657-2665 (2014). https://doi.org/10.1182/blood-2014-02-558767
7 Savage, S. A. et al. Genome-wide Association Study Identifies HLA-DPB1 as a Significant Risk Factor for Severe Aplastic Anemia. The American Journal of Human Genetics 106, 264-271 (2020). https://doi.org/10.1016/j.ajhg.2020.01.004
8 Single, R. M. et al. Asymmetric linkage disequilibrium: Tools for assessing multiallelic LD. Human Immunology 77, 288-294 (2016). https://doi.org/10.1016/j.humimm.2015.09.001
9 Trowsdale, J. & Knight, J. C. Major histocompatibility complex genomics and human disease. Annu Rev Genomics Hum Genet 14, 301-323 (2013). https://doi.org/10.1146/annurev-genom-091212-153455
10 van der Meeren, L. et al. Combined loss of HLA I and HLA II expression is more common in the non-GCB type of diffuse large B cell lymphoma. Histopathology (2018).
11 Wang, S. S. et al. Human leukocyte antigen class I and II alleles in non-Hodgkin lymphoma etiology. Blood 115, 4820-4823 (2010). https://doi.org/10.1182/blood-2010-01-266775
12 Zaimoku, Y. et al. HLA associations, somatic loss of HLA expression, and clinical outcomes in immune aplastic anemia. Blood 138, 2799-2809 (2021). https://doi.org/10.1182/blood.2021012895
Alluvial Plot Web App for Visualizing HLA Haplotypes:
Alluvial plots can be used to display HLA haplotype frequencies in different populations. The alluvial plot web app provides an easy-to-use tool that can assist in the interpretation of disease association study results by displaying 9-locus haplotypes that carry specific alleles or partial haplotypes in selected populations.
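To make the idea concrete, the data behind such a plot can be sketched as a set of weighted links between alleles at adjacent loci. The code below is an illustrative sketch only and does not describe how the web app is implemented. The function name `alluvial_links`, the three-locus example, and the haplotype frequencies are all hypothetical.

```python
from collections import defaultdict

def alluvial_links(haplotypes, required=None):
    """Sum haplotype frequencies into links between adjacent loci.

    haplotypes: iterable of (alleles, frequency), where alleles is a tuple
                ordered by locus (for example A, C, B).
    required:   alleles that every displayed haplotype must carry.
    Returns {(locus_index, left_allele, right_allele): summed frequency}.
    """
    required = set(required or [])
    links = defaultdict(float)
    for alleles, freq in haplotypes:
        if not required.issubset(alleles):
            continue  # skip haplotypes that lack a required allele
        # Each adjacent pair of loci contributes one band to the plot
        for i in range(len(alleles) - 1):
            links[(i, alleles[i], alleles[i + 1])] += freq
    return dict(links)

# Hypothetical frequencies, for illustration only
example = [
    (("A*01:01", "C*07:01", "B*08:01"), 0.5),
    (("A*01:01", "C*07:01", "B*57:01"), 0.25),
    (("A*02:01", "C*07:02", "B*07:02"), 0.25),
]
for link, weight in alluvial_links(example, required={"A*01:01"}).items():
    print(link, weight)
```

Each link weight corresponds to the width of one band in the plot, so haplotypes that share alleles at neighboring loci merge into wider bands.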
For more information about using alluvial plots to interpret HLA disease association study results, please see the “Presentations” section.