Recent technological advances in genome sequencing and whole-genome profiling have facilitated the generation of massive amounts of multi-layer omic data. Sequencing a human genome will soon become a common task. A new challenge is how to make sense of such a huge amount of data, especially when thousands (or even millions) of individual genomes are being explored. This is a bottleneck in understanding our genomes, e.g., the molecular mechanisms of genetic diseases, and in making use of this knowledge to better our health and wellness. Deriving biological insights from high-throughput data sets requires sophisticated, data-intensive analyses in which data science methods driven by biological domain expertise are the key tools.

I am broadly interested in understanding biological systems using high-throughput data. My current research agenda focuses on understanding human cancer genomes through the analysis of large-scale omic datasets and using what we learn to track disease progression, including:

    • Understanding tumor heterogeneity and evolution: Using whole genome, whole exome, and targeted DNA sequencing of temporal, multi-regional, intra- and inter-patient, and xenograft tumors, we seek to understand how cancer evolves, metastasizes, and resists therapies [1-4].

    • Discovery and characterization of molecular determinants and biomarkers of advanced cancers: Through integrative analysis of genome, transcriptome, and epigenome data, we discovered key alterations involved in metastatic and resistant cancers. These include the discovery of an AR enhancer in >80% of metastatic castration-resistant prostate cancer patients [5,6], and the discovery and characterization of several lncRNAs as drivers or biomarkers of tumor progression, metastasis, and patient outcome in colorectal, lung, and prostate cancers [7-11].

    • Non-invasive monitoring of cancer progression: Using the information gained from multi-omic analysis of cancer patients, we designed sequencing assays and analyses to monitor biomarkers of disease progression and patient outcome in cell-free DNA. We aim to use the readouts from these non-invasive assays to predict tumor evolution, patient response to therapies, and prognosis [12].

Occasionally, I create methods and tools to solve challenges that are beyond the capability of existing analytical approaches. Here are the tools and databases that I have developed:

    • ClonEvol: inferring and visualizing clonal evolution in multi-sample cancer sequencing [1]

    • SV-HotSpot: detection and visualization of hotspots targeted by structural variants associated with gene expression [6]

    • Allerdictor: a fast and accurate sequence-based allergen prediction tool employing a text classification approach with a support vector machine [13]

    • The Alternaria Genomes Database: a comprehensive resource for a fungal genus comprised of saprophytes, plant pathogens, and allergenic species [14]


The most up-to-date list of my scientific publications can be found on my Google Scholar Profile or my PubMed collection.

  1. Dang HX, White BS, Foltz SM, Miller CA, Luo J, Fields RC, Maher CA. ClonEvol: clonal ordering and visualization in cancer sequencing. Ann Oncol. 2017 Dec 1;28(12):3076-3082. doi: 10.1093/annonc/mdx517. PubMed PMID: 28950321; PubMed Central PMCID: PMC5834020.

  2. Dang HX, Krasnick BA, White BS, Grossman JG, Strand MS, Zhang J, Cabanski CR, Miller CA, Fulton RS, Goedegebuure SP, Fronick CC, Griffith M, Larson DE, Goetz BD, Walker JR, Hawkins WG, Strasberg SM, Linehan DC, Lim KH, Lockhart AC, Mardis ER, Wilson RK, Ley TJ, Maher CA, Fields RC. The clonal evolution of metastatic colorectal cancer. Sci Adv. 2020 Jun;6(24):eaay9691. doi: 10.1126/sciadv.aay9691. eCollection 2020 Jun. PubMed PMID: 32577507; PubMed Central PMCID: PMC7286679.

  3. Griffith M, Miller CA, Griffith OL, Krysiak K, Skidmore ZL, Ramu A, Walker JR, Dang HX, Trani L, Larson DE, Demeter RT, Wendl MC, McMichael JF, Austin RE, Magrini V, McGrath SD, Ly A, Kulkarni S, Cordes MG, Fronick CC, Fulton RS, Maher CA, Ding L, Klco JM, Mardis ER, Ley TJ, Wilson RK. Optimizing cancer genome sequencing and analysis. Cell Syst. 2015 Sep 23;1(3):210-223. doi: 10.1016/j.cels.2015.08.015. PubMed PMID: 26645048; PubMed Central PMCID: PMC4669575.

  4. Miller CA, McMichael J, Dang HX, Maher CA, Ding L, Ley TJ, Mardis ER, Wilson RK. Visualizing tumor evolution with the fishplot package for R. BMC Genomics. 2016 Nov 7;17(1):880. doi: 10.1186/s12864-016-3195-z. PubMed PMID: 27821060; PubMed Central PMCID: PMC5100182.

  5. Quigley DA, Dang HX, Zhao SG, Lloyd P, Aggarwal R, Alumkal JJ, Foye A, Kothari V, Perry MD, Bailey AM, Playdle D, Barnard TJ, Zhang L, Zhang J, Youngren JF, Cieslik MP, Parolia A, Beer TM, Thomas G, Chi KN, Gleave M, Lack NA, Zoubeidi A, Reiter RE, Rettig MB, Witte O, Ryan CJ, Fong L, Kim W, Friedlander T, Chou J, Li H, Das R, Li H, Moussavi-Baygi R, Goodarzi H, Gilbert LA, Lara PN Jr, Evans CP, Goldstein TC, Stuart JM, Tomlins SA, Spratt DE, Cheetham RK, Cheng DT, Farh K, Gehring JS, Hakenberg J, Liao A, Febbo PG, Shon J, Sickler B, Batzoglou S, Knudsen KE, He HH, Huang J, Wyatt AW, Dehm SM, Ashworth A, Chinnaiyan AM, Maher CA, Small EJ, Feng FY. Genomic Hallmarks and Structural Variation in Metastatic Prostate Cancer. Cell. 2018 Jul 26;174(3):758-769.e9. doi: 10.1016/j.cell.2018.06.039. Epub 2018 Jul 19. PubMed PMID: 30033370; PubMed Central PMCID: PMC6425931.

  6. Eteleeb AM, Quigley DA, Zhao SG, Pham D, Yang R, Dehm SM, Luo J, Feng FY, Dang HX, Maher CA. SV-HotSpot: detection and visualization of hotspots targeted by structural variants associated with gene expression. Sci Rep. 2020 Sep 28;10(1):15890. doi: 10.1038/s41598-020-71168-7. PubMed PMID: 32985524; PubMed Central PMCID: PMC7522247.

  7. White NM, Cabanski CR, Silva-Fisher JM, Dang HX, Govindan R, Maher CA. Transcriptome sequencing reveals altered long intergenic non-coding RNAs in lung cancer. Genome Biol. 2014 Aug 13;15(8):429. doi: 10.1186/s13059-014-0429-8. PubMed PMID: 25116943; PubMed Central PMCID: PMC4156652.

  8. Cabanski CR, White NM, Dang HX, Silva-Fisher JM, Rauck CE, Cicka D, Maher CA. Pan-cancer transcriptome analysis reveals long noncoding RNAs with conserved function. RNA Biol. 2015;12(6):628-42. doi: 10.1080/15476286.2015.1038012. PubMed PMID: 25864709; PubMed Central PMCID: PMC4615893.

  9. White NM, Zhao SG, Zhang J, Rozycki EB, Dang HX, McFadden SD, Eteleeb AM, Alshalalfa M, Vergara IA, Erho N, Arbeit JM, Karnes RJ, Den RB, Davicioni E, Maher CA. Multi-institutional Analysis Shows that Low PCAT-14 Expression Associates with Poor Outcomes in Prostate Cancer. Eur Urol. 2017 Feb;71(2):257-266. doi: 10.1016/j.eururo.2016.07.012. Epub 2016 Jul 22. PubMed PMID: 27460352.

  10. Silva-Fisher JM, Dang HX, White NM, Strand MS, Krasnick BA, Rozycki EB, Jeffers GGL, Grossman JG, Highkin MK, Tang C, Cabanski CR, Eteleeb A, Mudd J, Goedegebuure SP, Luo J, Mardis ER, Wilson RK, Ley TJ, Lockhart AC, Fields RC, Maher CA. Long non-coding RNA RAMS11 promotes metastatic colorectal cancer progression. Nat Commun. 2020 May 1;11(1):2156. doi: 10.1038/s41467-020-15547-8. PubMed PMID: 32358485; PubMed Central PMCID: PMC7195452.

  11. Dang HX, White NM, Rozycki EB, Felsheim BM, Watson MA, Govindan R, Luo J, Maher CA. Long non-coding RNA LCAL62 / LINC00261 is associated with lung adenocarcinoma prognosis. Heliyon. 2020 Mar;6(3):e03521. doi: 10.1016/j.heliyon.2020.e03521. eCollection 2020 Mar. PubMed PMID: 32181394; PubMed Central PMCID: PMC7062942.

  12. Dang HX, Chauhan PS, Ellis H, Feng W, Harris PK, Smith G, Qiao M, Dienstbach K, Beck R, Atkocius A, Qaium F, Luo J, Michalski JM, Picus J, Pachynski RK, Maher CA, Chaudhuri AA. Cell-free DNA alterations in the AR enhancer and locus predict resistance to AR-directed therapy in patients with metastatic prostate cancer. JCO Precis Oncol. 2020;4:680-713. doi: 10.1200/po.20.00047. Epub 2020 Jun 18. PubMed PMID: 32903952; PubMed Central PMCID: PMC7446541.

  13. Dang HX, Lawrence CB. Allerdictor: fast allergen prediction using text classification techniques. Bioinformatics. 2014 Apr 15;30(8):1120-1128. doi: 10.1093/bioinformatics/btu004. Epub 2014 Jan 7. PubMed PMID: 24403538; PubMed Central PMCID: PMC3982160.

  14. Dang HX, Pryor B, Peever T, Lawrence CB. The Alternaria genomes database: a comprehensive resource for a fungal genus comprised of saprophytes, plant pathogens, and allergenic species. BMC Genomics. 2015 Mar 25;16:239. doi: 10.1186/s12864-015-1430-7. PubMed PMID: 25887485; PubMed Central PMCID: PMC4387663.