Human rs75776403 polymorphism links differential phenotypic and clinical outcomes to a CLEC18A p.T151M-driven multiomics

Human traits, diseases susceptibility, and clinical outcomes vary hugely among individuals. Despite a fundamental understanding of genetic (or environmental) contributions, the detailed mechanisms of how genetic variation impacts molecular or cellular behaviours of a gene, and subsequently leads to such variability remain poorly understood. Here, in addition to phenome-wide correlations, we leveraged multiomics to exploit mechanistic links, from genetic polymorphism to protein structural or functional changes and a cross-omics perturbation landscape of a germline variant. We identified a missense cis-acting expression quantitative trait locus in CLEC18A (rs75776403) in which the altered residue (T151→M151) disrupts the lipid-binding ability of the protein domain. The altered allele carriage led to a metabolic and proliferative shift, as well as immune deactivation, therefore determines human anthropometrics (body height), kidney, and hematological traits. Collectively, we uncovered genetic pleiotropy in human complex traits and diseases via CLEC18A rs75776403-regulated pathways.


Background
Insights acquired from genetic association studies have greatly advanced our understanding of the biology of human traits and the pathophysiology of numerous complex diseases. Combining those studies with pathway analyses has further shed light on the underlying mechanisms that drive traits and/or disease variability among individuals. Due to the statistical limitation of such genetic epidemiological methodologies, it is still a big challenge to parse the molecular and cellular processes that link obscure genetic variation to explicit phenotypic changes. Variability in gene expression imposed by germline variation has been revealed by the identification of cis-acting expression quantitative trait locus (cis-eQTL). Moreover, gene set enrichment analysis permits a multigene (horizontal) view of the occurrence of human traits. Despite this, a full and integrative links (multiomics or vertical view) from genetic changes to subcellular molecular features (transcriptional or translational products, phosphorylation sites, metabolites, and biological pathways), and ultimately to phenotypic changes remains a formidable task.
Inferring the functions of CLEC18 genes can be achieved via inspecting the polymorphic amino acid in protein domains, and studying the influence of genetic variant on protein structure and functional change. For examples, the polymorphic residues at the CRD domain of CLEC18A (S339) and CLEC18A-1 (R339), and the p.405LVWLSAAMG insertion in CLEC18B CRD domain resulted in the loss of glycan-binding affinity [1]. In this study, we aimed to further identify crucial genetic polymorphism(s) among the CLEC18 gene family in humans. In addition to in silico prediction and domain structural modelling, we further validate the impacts of variants on CLEC18 genes by lipid-binding assay. Moreover, variant-associated profiles and the enriched pathways of multiomics, including phenomes, transcriptomes, proteomes, metabolomes, and phosphoproteomes, were further characterized. This approach provides a framework to illustrate the mechanistic (molecular and/or cellular) details of a genetic polymorphism, and reveals the power of integrating multiple-omics in a genetic epidemiological study to improve our understanding on the impact of genetic variant in human traits and diseases.

Missense annotation
Information (missense or not) regarding CLEC18 family gene variants was obtained from Haploreg v4.1.

Prediction of deleteriousness
Computational predictions of the impacts of missense variants were built based on the biochemical properties of amino acid substitutions. To evaluate the deleteriousness to protein function and the structure of amino acid substitutions, we adopted three in silico prediction tools including Sort Intolerant From Tolerant (SIFT; https:// sift. bii.a-star. edu. sg/; accessed Oct 2019) [5], Polymorphism Phenotyping v2 (PolyPhen-2; http:// genet ics. bwh. harva rd. edu/ pph2/; accessed Oct 2019) [6], and Combined Annotation Dependent Depletion (CADD; https:// cadd. gs. washi ngton. edu/ snv/; accessed Aug 2021) [7,8]. The SIFT score ranges 0 ~ 1. The cutoff score for SIFT is 0.05. A substitution is predicted to be tolerated (or deleterious) with a SIFT score of > 0.05 (or < 0.05). PolyPhen-2 uses the same score range as SIFT, but in the opposite direction, with a score closer to 1 representing high confidence of being damaging. CADD computes scores for all potential genetic variants throughout the reference genome. The raw score is further transformed into a CADD Phred score by ranking the variants for all 8.6 billion genetic variants. CADD Phred scores range 1 ~ 99. Phred scores for ranking the top 10% of causal genetic variants are assigned as 10. The top 1% are assigned as 20, etc. With a reasonable cutoff for deleteriousness ranging 10 ~ 20, we arbitrarily defined variants with a Phred score of > 20 as deleterious and < 10 as non-deleterious.

Structural homology modelling of the CLEC18A protein domain
We constructed a three-dimensional (3D) homology model by inputting the CAP/SCP/TAPS domain sequences of CLEC18A via the SWISS MODEL server [9]. Briefly, the algorithm identifies a "template protein", a sequence homologue of the input sequence, and uses the template protein to build a 3D model. In this case, human glioma pathogenesis-related protein 1 (GLIPR1) was selected as the template protein for model construction and further structural analysis. Visual rendering of 3D homology models was performed using PyMol v2.4.2 (https:// pymol. org/2/; accessed Oct 26, 2021).
Lipid-binding affinity test for wild-type and mutant form CLEC18A residue DNA fragments of the CAP/SCP/TAPS domain, which encoded the wild-type (WT; p.T151) and mutant form (p.M151) of the CLEC18A residue, were amplified by a reverse-transcription polymerase chain reaction (RT-PCR) and subcloned into the pcDNA3.1-hIgG1 Fc (mut) vector to generate the WT and mutant CLEC18A.Fc fusion proteins. The FreeStyle 293 Expression System (Invitrogen, Carlsbad, CA, USA) was applied to overexpress the CLEC18A.Fc fusion proteins. The detailed procedures were described in our previous study [1].
To perform the protein-lipid overlay assay, the phosphorylated derivatives of phosphatidylinositol (PIP) strips (P23751) membranes and Sphingo strips (S23753) membranes were purchased from Thermo Fisher Scientific (Waltham, MA, USA). Light exposure of the lipid strip membranes was avoided during the entire process before detection. The lipid strip membranes were initially blocked in 3% bovine serum albumin (BSA) in phosphate-buffered saline (PBS) for 1 h (h) at room temperature. Next, the SCP domain of the WT and mutant CLEC18A.Fc fusion proteins were added at final concentrations of 500 and 50 ng/ml for 2 h of incubation at room temperature. After three washes with 3% BSA-PBST (PBS plus 0.1% Tween20), the lipid membranes were incubated with an anti-human immunoglobulin G (IgG) horseradish peroxidase (HRP) antibody (1:5000) in blocking buffer for 1 h at room temperature. After three washes with 3% BSA-PBST, the lipids were detected with enhanced chemiluminescence reagents (GERPN2235, Cytiva, Chicago, IL, USA).

Data from the Taiwan biobank (TWB)
The Taiwan Biobank (TWB), a nationwide research database in Taiwan, was launched to facilitate biomedical research and further translate work into clinical settings by incorporating genomic, environmental, and disease profiles [10].
Participants were recruited from local communities, with inclusion criteria of being aged 30 ~ 70 years, physically active, without a cancer history, and self-reported to be of Han ancestry (i.e., both parents). Clinical (and follow up) data from anthropometric measurements, biospecimen tests, physical examinations, and questionnaires were obtained during sample enrollment. Written informed consent was provided by all individuals who participated in the TWB project. Ethical approval was obtained from the Institutional Review Board (IRB) of Taipei Medical University (IRB no.  N201906005), the Ethics and Governance Council of the  TWB (TWBR10807-05, TWBR10906-03), and Academia Sinica (AS-IRB01-16,018).

CLEC18A p.T151M genotypic data of the TWB cohort
Genomic DNA of participants in the TWB project was harvested using a standardized protocol and subjected to genotyping using the Axiom Genome-Wide TWB v2.0 Array Plate. Stringent quality-control filters for TWB biallelic SNV and indel genotype data were applied using PLINK v1.9 [11]. Variants with heterozygous haploid genotypes were addressed by (i) checking heterozygous calls in the pseudoautosomal region of chromosome (chr) X in males; (ii) sex checking; and (iii) directly removing the variants. Subjects with ambiguous sex data were removed during sex checks. Individuals and genotypes were sequentially filtered with a call rate threshold of 98%. Variants with a minor allele frequency (MAF) of < 1% and a Hardy-Weinberg equilibrium (HWE) P value of < 10 -10 were further discarded.
Next, a subset of independent variants was selected through linkage disequilibrium (LD) pruning by calculating the pairwise LD of autosomal variants (MAF > 10%) with parameters of a window size of 200 kb, a step size of 5, and a variance inflation factor threshold of 0.2. Outliers were detected and filtered according to the heterozygosity rate calculated from the pruned variant subset. Genetic relationships among individuals were estimated using Genome-wide Complex Trait Analysis (GCTA) [12], followed by calculating the principal components (PCs).

Phenotypic (clinical) profiles of the TWB cohort
Phenotypic varieties of TWB participants were collected from test data of biological specimens (blood and urine), physical examinations, and questionnaires. The phenotype data were further processed as follows.

Quantitative traits
In total, 86 quantitative traits were included in our study: Notably, for sitting systolic/diastolic pressures and heartbeat speed, values were averaged across three measurement time points. If a quantitative trait presented a censored value due to the detection limit, we directly replaced the record with the censored value.

Binary traits
In total, 84 binary traits were included: three alimen-

Phenome-wide association study (PheWAS) of rs75776403 in Taiwanese Hans
We conducted a phenome-wide association study (PheWAS) of rs75776403 using the TWB cohort. First, a list of genetically unrelated samples by filtering out 1st-or 2nd-order relationships was obtained using the "-unrelated" option implemented in KING v2.2.7 [13]. Next, association tests were conducted by incorporating covariates of age, squared age, sex (ignored in sexrestricted traits), and the top 20 PCs. The association model was determined according to the type of trait.

Quantitative traits
For the 86 quantitative traits, a linear regression model was adopted. The traits were normalized using the inverse normal transformation (Elfving method) before fitting the association model. Hypothesis testing was conducted using the type I (sequential) method.

Binary traits
For the 84 dichotomized traits, a logistic regression model was applied.

Ordinal traits
For the 19 ordinal traits, an ordered logistic regression model was applied using the "polr" function implemented in the MASS package. The Z statistics were calculated using the "coeftest" function implemented in the lmtest package.
To avoid false positivity, a local false discovery rate (FDR) was estimated using the "lfdr" function implemented in the qvalue package.

Meta-analysis
A fixed-effect or random-effect model was applied for a trans-ethnic meta-analysis across different populations using a restricted maximum likelihood to estimate the heterogeneity variance. Knapp-Hartung adjustments were further applied for the random-effect model. Statistical tests and forest plot visualization were conducted using the meta package.

Genomic data
Raw genomic data (Affymetrix Genome-Wide Human SNP 6.0 Array) from CCLE samples in Affymetrix CEL format were converted to Affymetrix CHP and then binary variant call format using the affy2vcf (https:// github. com/ frees eek/ gtc2v cf ) tool. Variants (SNPs and short insertion/deletions (indels)) were normalized according to the human genome reference assembly 38 (GRCh38). We next extracted only biallelic SNPs that satisfied the following criteria: (i) autosomal or chrX; (ii) call rate of > 98%; (iii) MAF of > 1%; (iv) non-somatic (according to CCLE mutations (~ 1.3 M somatic calls across 1741 cell lines) from whole-exome sequencing profiles); and (v) showing LD of r 2 > 0.4 to at least one of the nearby 50 SNPs (within 1000 Kb). After excluding cell lines that failed the heterogeneity test, these putative "germline" variants were next subjected to genotype imputation through the Michigan Imputation Server by using 1000G Phase 3 v5 as a reference. Finally, variants with a post-imputation info score of > 0.3 and a MAF of > 5% were selected for association analyses.

Transcriptomic and proteomic data
For normalized CCLE transcriptomic (mRNA sequencing) and proteomic (quantitative multiplexed proteomics profiles by mass spectrometry from the Gygi lab (https:// gygi. hms. harva rd. edu/ index. html; accessed Dec 14, 2021) [18] data, gene symbols were first harmonized using the HGNChelper package. Next, the expression values of duplicate genes were mean-aggregated.

Metabolomic data
Data regarding CCLE metabolomes containing expression values for 228 metabolites [19] were directly downloaded from the DepMap website.

Phosphoproteomic data
Data on phosphorylation sites (p-sites) across ~ 10 K genes (by trypsin and/or GluC digestion) were directly adopted from M. Frejno et al. [20]. Gene names were harmonized using the HGNChelper package.
Herein, we adopted the general term "molecular feature" to indicate mRNA, protein, metabolite, and p-site of multiomics. Associations between rs75776403 and molecular features of each omic profile were conducted as follows: (i) molecular features with zero variance were removed; (ii) a linear model between a molecular feature (as a dependent variable) and rs75776403 (as the independent variable) was fitted by including sex, age, histology, ethnicity, pathology and primary cancer type as covariates, with the type I (sequential) significance was assessed using an exact permutation test with the "lmp" function implemented in the lmPerm package; and (iii) a differentially expressed molecular feature was defined based on a significant threshold of P = 0.01.

Gene set and pathway enrichment analysis Transcriptome and proteome
We conducted gene set enrichment analysis (GSEA) to test for the enrichment of pathways (and/or gene sets) in expression data. A fast algorithm (i.e. Fast Gene Set Enrichment Analysis; FGSEA) implemented in the fgsea package was adopted with using weighted differential expression test statistics (i.e. − log 10 (P) × β; where P is the significant value and β is the regression coefficient of a molecular feature) from differentially expressed test results as input. Pathways (and/or gene sets) used for the test were compiled from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, gene ontology (GO) biological process (BP) database, and Reactome pathway database. Significant enrichment was defined as P ≤ 0.01 and the direction of enrichment was confirmed using a normalized enrichment score (NES of > 0 for positively enriched and < 0 for negatively enriched).
Robust expression enrichment was defined as follows: the weighted differential expression test statistics of each omic were first scaled. Next, pathways with scaled statistics of > 2 or < (− 2) in both expression omics were considered to be robustly enriched.

Phosphoproteome
The weighted differential expression test statistics of p-sites from trypsin and gluC were separately calculated, and then aggregated by the maximum, which takes account of the difference in site-specificity of the two digestive enzymes. The weighted statistics of each digestive enzyme and the merged weighted statistics were subjected to GSEA analysis. Reference pathways regarding human phosphosite-specific signatures were adopted from PTMsigDB v1.9.0 (https:// github. com/ broad insti tute/ ssGSE A2.0; accessed 30 Dec 2021) [21]. After harmonizing gene symbols using the HGNChelper package, we conducted the FGSEA as described above.

Tissue-specific mRNA expression data
The mRNA-sequencing (Seq) expression profiles in transcript per million of several tissues, including the adrenal gland, brain, liver, ovary, pituitary, testis, and thyroid were queried from the Genotype-Tissue Expression (GTEx; v8) portal (https:// gtexp ortal. org/ home/ datas ets; accessed Nov 29, 2021). Gene symbols were harmonized using the HGNChelper package. Next, expression values of duplicated genes were aggregated by the maximum. A Spearman's rho (ρ) statistic was calculated to assess the rank-based association of pairwise genes.

CLEC18A p.T151M disrupted the lipid-binding affinity of the CAP/SCP/TAPS domain
The rs75776403 residue (p.M151 was considered as a "mutant" form compared to "wild-type" p.T151) was located on the C-terminal cysteine-rich secretory protein/antigen 5/pathogenesis related-1 (CAP) or Spermcoating protein (SCP) or Tpx antigen 5/pathogenesis related-1/Sc7 (TAPS) domain of CLEC18A. To assess the possible location of T151 of CLEC18A, we have to establish the protein structure. However, the precise tertiary structures of the SCP domain of CLEC18A remain unknown, we, therefore, generated a homologous model via the SWISS-MODEL Server [9]. From the homology modelling results, residue T151 of CLEC18A is located on the outer surface of the CAP/SCP/TAPS domain. Furthermore, this residue is located in the bottom cavity of the folded protein domain. Taken together, these results illustrated the accessibility of the residue to other interacting substrates of CLEC18A (Fig. 1C). Since threonineto-methionine (polar uncharged to neutral hydrophobic) conversion may abrogate the polarity of the residue 151 of CLEC18A, we speculated that rs75776403 (p.T151M) disrupted the substrate-binding affinity of the CLEC18A protein domain. Given that the sterol-and/or acidic glycolipid-binding (and exportation) function was conserved across SCP/TAPS/CAP proteins [1,22], we next tested whether p.T151M might disrupt the lipid-binding ability of the CAP/SCP/TAPS domain of CLEC18A (Fig. 1D). As a result, WT CLEC18A showed binding affinity to two acidic phospholipids, i.e., phosphatidic acid (PA) and phosphatidylserine (PS). Specifically, the ability to bind PA and PS was abolished in CLEC18A with p.M151, suggesting functional disruption by the missense rs75776403 in the CAP/SCP/TAPS domain.

Phenotypic landscape of rs75776403 in different human populations
Given the structural and functional perturbation of rs75776403 (p.T151M) in the CAP/SCP/TAPS domain of CLEC18A, we next clarified the genetic epidemiological links in humans. First, the allelic frequencies of rs75776403 in different ethnicities were summarized: rs75776403 c.T allele (corresponding to p.M151) was the most abundant in Asians (48%), followed by admixed Americans (32%), Europeans (27%) and Africans (11%). Specifically, the c.T allelic frequency of rs75776403 was 47% in Taiwanese Hans.
Second, we conducted a phenome-wide association study (PheWAS) of rs75776403 in the Taiwanese population to examine the correlations between rs75776403 and multifarious human traits. By leveraging a Taiwan Biobank (TWB) cohort with a sample size of 68,080, we parsed the genetic association profile of rs75776403 in 189 traits as follows: (i) Among quantitative traits (Additional file 1: Table S4), 62 traits were directly adopted from clinical data (compiled from blood or urine specimens, physical examinations, and questionnaires) of TWB subjects. Next, 24 traits with clinical relevance were derived and further included (Additional file 1: Table S5), resulting in a total number of 86 quantitative traits spanning 11 phenotype categories (Additional   Table S11, top). For the Japanese population, only three out of 229 traits were found to be associated with rs75776403, i.e., height (Fdr = 1.51 × 10 -8 ), body weight (Fdr = 5.02 × 10 -3 ), and serum creatinine (Fdr = 6.13 × 10 -3 ; Additional file 1: Table S11, bottom).

Multiomics profiling of molecular impacts of the CLEC18A c.452C-to-T polymorphism
Since the phenotypic relevance of rs75776403 (CLEC18A c.452C-to-T) was elaborated, the next critical question  showing overlapping of CLEC18A transcript-associated genes across three tissues (brain, pituitary gland and thyroid). J Bar plot showing three genes (x-axis) associated with CLEC18A transcript levels in all brain, pituitary gland and thyroid tissues (P < 0.05). The strength of the association was quantified by Spearman's rho statistic (y-axis). Genes that are implicated in cellular responses to thyroid hormone stimulus and corticosteroid receptor signaling are colored in purple and orange, respectively was to know how the genetic variant at this locus impact human traits through perturbing specific molecular or cellular processes. We thus leveraged human cell linebased multiomics (compiling transcriptome, proteome, metabolome, and phosphoproteome) from the Cancer Cell Line Encyclopedia (CCLE) database to conduct the following analyses. For each molecular feature from the transcriptome and proteome (totals of 19,095 mRNAs and 12,183 proteins), a linear regression model was fitted and associations with rs75776403 were tested by exact permutation. Next, gene set enrichment analysis (GSEA) based on the KEGG, GO BP, and Reactome databases was conducted to identify the potential pathways (and/or gene sets). As a result, we found 278 (251 upregulated and 27 downregulated) and 70 (33 upregulated and 37 downregulated) differentially expressed mRNAs and proteins, respectively (Fig. 2B, C). The GSEA analysis further revealed 392 (365 positively and 27 negatively) and 85 (34 positively and 51 negatively) significantly enriched pathways from mRNAs and proteins, respectively (Additional file 1: Table S12). Intriguingly, the GSEA results pinpointed positive enrichment in the cell cycle, in contrast to negative enrichment in (carbohydrate, amino acid, and lipid) metabolism and immune activation, thus confirming functional consequences (i.e., metabolic or proliferative shifts, and immune deactivation) of CLEC18A transcription and/or translation due to the rs75776403 c.C-to-T polymorphism.

Implications of CLEC18A p.T151M in hormonal regulation
Although these findings imply that metabolic, proliferative and immune-related pathways are governed by CLEC18A p.T151M, it was still unclear whether an upstream regulatory mechanism mediates the molecular and/or cellular processes. To determine these, we combined mRNAs-and proteins-derived GSEA results and identified 27 robust (six positively and 21 negatively enriched) pathways ( Fig. 2G and Additional file 1: Fig.  S2).
Because hypothalamus-pituitary-thyroid (HPT) axis maintains thyroid hormone homeostasis in humans, correlations of CLEC18A mRNA with 30 leading-edge genes in the cellular responses to the thyroid hormone stimulus and corticosteroid receptor signaling pathways were examined within involved (HPT) tissues (brain, pituitary, and thyroid). We found 27 (90.0%), 14 (46.7%), and 16 (53.3%) genes that respectively exhibited an association (P < 0.05) with CLEC18A mRNA expression in the brain, pituitary gland, and thyroid (Table 3). Notably, BRD8, ARNTL, and YWHAH were significantly correlated to the CLEC18A transcript in three HPT tissues. Importantly, BRD8 and ARNTL showed the consistent effect directions, in contrast to YWHAH which showed contrary results among the tissues (Figs. 2I-J). The similar approaches were applied in other thyroid hormone related tissues, such as the adrenal gland, liver, ovary and testis (Additional file 1: Table S17). The data suggested that rs75776403 plays roles in both upstream biosynthesis and downstream hormonal responses regulated by the thyroid hormone.

Discussion
Despite remarkable advances in genetics, little attention has been paid to whether and how genetic variants impact aspects of molecular and/or cellular features in the pathophysiology of human traits beyond simple susceptibility. Herein, we identified a missense cis-eQTL (p.T151M) in CLEC18A that was mainly correlated with anthropometrics (especially body height), nephrotic, and hematological traits (and/or diseases) in Taiwanese, Table 3 Association between CLEC18A mRNA level and the (leading edge) genes implicated in the thyroid-stimulated pathway or corticosteroid receptor signaling N sample size. Genes implicated in thyroid stimulated pathways were labeled in white color; while genes implicated in corticosteroid receptor signaling were labeled in grey color. A P value of < 0.05 was highlighted in bold European and Japanese populations. This observation implicates the pleiotropy (associated with more than one trait) of rs75776403 as well as the concept that a genetic polymorphism could exert functional impacts through both regulating gene expressions (as a cis-eQTL) and disrupting a critical protein domain (as a missense variant). Focusing on these pathophysiological aspects may provide novel insights regarding genetic epidemiological profiles of the given trait or disease, as well as facilitate precision medicine. CLEC18A has been linked to host immune defence against dengue viral infection through the C-type lectinlike domain (CTLD) [2]. Serum CLEC18 protein levels were found to be positively correlated with the HBV DNA load, HBsAg levels, and HCV viral loads [3,4]. Nonetheless, whether CLEC18A is associated with other diseases, or which protein domain is responsible for the physiological correlation, is still unclear. By exploring the functional effects of the rs75776403 c.C-to-T polymorphism, we identified that this variant promotes metabolic and proliferative shifts, as well as immune deactivation in human cells. This finding, therefore, not only elaborates on molecular and/or cellular links to the genetic associations of rs75776403, but also sheds light on the functional roles of CLEC18A in all cell types (such as immune cells) that may be implicated in the associated traits or diseases.
The CAP/SCP/TAPS domain, which is possessed by CLEC18A and other genes, was proposed to bind and transport sterols, acidic glycolipids, and/or acidic phospholipids [28,29]. Specifically, this domain is directly responsible for packing lipids in the endoplasmic reticulum (ER) into vesicles and secreting them out of cells [30]. Here, rs75776403 p.T151M was shown to disrupt the binding affinity of the CAP/SCP/TAPS domain to the PA and PS (both of which are acidic phospholipids). Notably, further experimental validation is needed to consolidate our findings. Since PA and its derivative, lysophosphatidic acid (lysoPA), may act as extracellular ligands of G protein-coupled receptors (GPCRs) [31], which play pivotal roles in metabolism [32], diminished extracellular exportation of PA due to rs75776403 p.T151M disruption of the CAP/SCP/TAPS domain may lead to less GPCR signaling, and thus perturb homeostasis of metabolites including carbohydrates, amino acids and lipids [32,33]. It is noteworthy that PA and lysoPA may also act as mitogens. Therefore, CLEC18A p.T151M may lead to the accumulation of free-form PA by either decreasing extracellular exportation of PA or decreasing CLEC18A-bounded PA. These free-form PAs may thus exert mitogenic functions to enhance cell proliferation. Furthermore, PS may also bind to the receptor for advanced glycation end products (RAGE) and affect cell cycle genes implicated in the G 1 /S phase transition [34].
The underlying mechanisms of immune deactivation through CLEC18A p.T151M can be explained as follows. First, PS-recognized receptors share a common feature that promotes the production of anti-inflammatory mediators [34,35]. Hence, rs75776403 p.T151M may lead to a greater amount of the free form of intracellular PS and contribute to immune inhibition. Second, consistent with speculation that CLEC18A may extract sterols from assembled membranes of pathogens (e.g., dengue virus type 2 [36]) during the intracellular viral replication stage in the ER [1], we observed negative enrichment of immune activation pathways by rs75776403 p.T151M. Third, genes with the CAP/SCP/TAPS domain were proposed to be a Ca 2+ -specific serine protease [37,38], which is critical for antigen processing and to elicit pathogen-specific immune activation of T cells [39,40]. rs75776403 p.T151M may thus disrupt immune activation through diminishing the proteolytic activity of CLEC18A. Fourth, PA was implicated in the biogenesis of recycling endosomes (REs) [41]. Since the endosome is critical for innate and adaptive immune function [42,43], we proposed that CLEC18A p.T151M may cause retention of PA in the ER, thus weakening the integrity of membrane structure and the subcellular function of endosomes. Notably, a decrease in the amount of PA in RE may also affect the recycling/turnover of EGFR [44].
Additionally, we identified a correlation of rs75776403 p.T151M and CLEC18A transcripts with the end products of the thyroid hormone biogenesis pathway, thyroglobulin (TG) and thyroid hormone receptor (THRB). We should notice that the upstream receptors of the pathway, thyrotropin-releasing hormone receptor (TRHR) [45] and thyrotropin receptor (TSHR) [46], are also GPCRs. Thus, loss of PA-binding affinity due to rs75776403 p.T151M may result in loss of synergistic G protein activation. Specifically, thyroid hormones (triiodothyronine (T3), and thyroxine (T4)) are essential for growth, development, and metabolisms [47,48]. The functions of triiodothyronine is not only in cell proliferation [49][50][51], branching morphogenesis of the lungs [52], but also in the DNA synthesis, differentiation to osteoblasts, and act on growth plate chondrocytes [53,54]. Thus, previous findings were in line with our results that the rs75776403 variant significantly associates with the human growth phenotypes.
Moreover, thyroid hormones modulate the overall synthesis of phosphocreatine (PCr) through the regulations for creatine kinase and mitochondrial oxidative phosphorylation [55]. PCr is a protector against cardiac disease, as a meta-analysis revealing that uptake of extra PCr produced lower risks for all cardiac incidences [56]. Thus, it is reasonable to speculate that the rs75776403 p.T151M correlation with lower risk for cardiomyopathy may be due to thyroid hormones-mediated PCr accumulation. Moreover, consistent with our PheWAS results, thyroid hormones were shown to alter several biomarkers related to kidney function (serum levels of creatinine and eGFR) [57] and platelets (mean platelet count, mean platelet volume, and platelet distribution width) [58,59].
To date, more than 5500 publications and ~ 330 K associations have been elaborated (by the Genome-Wide Association Study Catalog; February 2022), but most results still need to be functionally validated. One of the possible ways to accomplish this is to integrate data from multiple sources. Herein, we leveraged cis-eQTL annotations from the GTEx database to confirm the tissuespecific cis-regulatory effects of the rs75776403 c.C-to-T polymorphism on mRNA levels of CLEC18A. Furthermore, a protein domain simulation and lipid strip test were conducted to confirm that missense rs75776403 p.T151M abruptly affects the lipid-binding ability of the CLEC18A. By further incorporating the rs75776403correlated multiomics to pinpoint metabolic/proliferative shift and implication in thyroid hormonal regulation, these results elaborate a promising way to link the molecular and/or cellular features of rs75776403 to phenotypic landscape of humans.

Conclusions
The rs75776403 was identified as a crucial missense cis-eQTL in CLEC18A, and linked the polymorphism to various phenotypes. We demonstrated that CLEC18A binds specifically to two phospholipids, PA and PS via the CAP/ SCP/TAPS domain. Defects in the lipid-binding ability were identified in the rs75776403 altered allele (p.M151) carriage. By elaborating CLEC18A rs75776403-associated multiomics, CLEC18A has great impact to regulate cellular processes, metabolism, cell cycle, immune activation, thyroid hormone biosynthesis and immune responses. Thus, a single amino acid change T 151 →M 151 ultimately contributes to the variability of human traits and differential outcome of human diseases. This study demonstrates a promising approach to get insights on the potential impact of genetic polymorphism for precision medicine (Fig. 3).