Chromosome-StartPosition-EndPosition-RepeatSequence

1-44835-44867-AAAT

1-370632-370648-TTTC

| Field Name | Description | Example |
|---|---|---|
| LocusId | Unique id for this locus in the format "chrom-start0based-end-motif" | 1-44835-44867-AAAT |
| TR-Atlas Id | TR id from TR-Atlas | TR1 |
| VariantType | The type of variant | Repeat |
| ReferenceRegion_hg38 | Genomic coordinates in hg38 | chr1:44835-44867 |
| Source | The source catalog that contributed this locus definition. Possible values: KnownDiseaseAssociatedLoci, Illumina174kPolymorphicTRs, PerfectRepeatsInReference, PolymorphicTRsInT2TAssemblies | Illumina174kPolymorphicTRs |
| LocusStructure | Structure of the locus | (AAAT)* |
| FoundInPerfectRepeatsInReference | Whether the TR was found in perfect repeats in reference | Yes |
| CanonicalMotif | The reference repeat motif, normalized by computing all cyclic shifts of the motif and of its reverse complement (e.g. for CAG: CAG, AGC, GCA, CTG, TGC, GCT) and taking the one that is alphabetically first | AAAT |
| ReferenceRepeatPurity | Fraction of bases within the locus interval that exactly match perfect repeats of the given motif. Range: 0.43 to 1 | 1.0 |
| NumRepeatsInReference | Locus interval size divided by motif size, rounded down to the nearest integer. Range: 1 to 300 | 8 |
| NsInFlanks | Number of "N" bases in the reference genome within +/-1000bp of the TR locus. ExpansionHunter reports an error if this exceeds 5 | 0 |
| LeftFlankMappability | UCSC 36-mer mappability track per-base mappability scores averaged across a 150bp window of flanking sequence immediately to the left of the TR locus. Range: 0 to 1 | 0.31 |
| FlanksAndLocusMappability | UCSC 36-mer mappability track per-base mappability scores averaged across the TR locus +/- 150bp of flanking sequence. Range: 0 to 1 | 0.29 |
| RightFlankMappability | UCSC 36-mer mappability track per-base mappability scores averaged across a 150bp window of flanking sequence immediately to the right of the TR locus. Range: 0 to 1 | 0.3 |
| LPSLengthStdevFromHPRC100 | Standard deviation of the number of repeats in the longest pure segment (LPS) detected at this tandem repeat locus in 100 high-coverage PacBio HiFi samples from the HPRC. | 0.973 |
| LPSMotifFractionFromHPRC100 | Fraction of alleles in the same 100 HPRC samples whose longest pure segment (LPS) has the given motif, reported as motif: alleles with that motif / total alleles | AAAT: 196/196 |
| TRsInRegion | Number of TRs in the vicinity of this locus that are separated from each other by no more than 6bp of spacer sequence. Range: 1 to 13 TRs | 2 |
| FoundInKnownDiseaseAssociatedLoci | Whether this TR was found in known disease associated loci | Yes |
| KnownDiseaseAssociatedLocus | Whether this locus is known to cause a monogenic disease | Yes |
| KnownDiseaseAssociatedMotif | Whether this locus has the same canonical motif as a locus that is known to cause a monogenic disease | Yes |
| FoundInIllumina174kPolymorphicTRs | Whether this TR was found in Illumina174k polymorphic TRs | Yes |
| FoundInPolymorphicTRsInT2TAssemblies | Whether this TR was found in polymorphic TRs in T2T assemblies | Yes |
| StdevFromIllumina174k | Standard deviation of the allele frequency distribution represented by AlleleFrequenciesFromIllumina174k. Range: 0 to 31.5 | 0.438384433926388 |
| StdevFromT2TAssemblies | Standard deviation of the allele frequency distribution represented by AlleleFrequenciesFromT2TAssemblies. Range: 0 to 1267.3 | 0.4581427791078368 |
| VariationCluster | Coordinates (chrom:start-end) of the variation cluster that contains this TR locus | 10:100001091-100001255 |
| VariationClusterSizeDiff | The difference in size between the original TR locus and the region spanned by the variation cluster. Range: 6 to 8,308 bp | 10 |
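
The LocusId, CanonicalMotif, and NumRepeatsInReference definitions above can be reproduced locally. The following is a minimal Python sketch, assuming LocusId strings follow the documented chrom-start0based-end-motif format and that canonicalization considers rotations of both the motif and its reverse complement, as the CAG example implies. The helper names (`parse_locus_id`, `canonical_motif`, `num_repeats_in_reference`) are hypothetical and are not part of this system or of the catalog tooling. For 1-44835-44867-AAAT it gives a 32 bp interval, 8 repeats, and canonical motif AAAT, matching the examples in the table.

```python
from typing import NamedTuple

# Maps each DNA base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class Locus(NamedTuple):
    """Fields encoded in a LocusId such as "1-44835-44867-AAAT"."""
    chrom: str
    start_0based: int
    end: int
    motif: str


def parse_locus_id(locus_id: str) -> Locus:
    """Split a LocusId of the form chrom-start0based-end-motif."""
    chrom, start, end, motif = locus_id.split("-")
    return Locus(chrom, int(start), int(end), motif.upper())


def canonical_motif(motif: str) -> str:
    """Alphabetically first rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    rotations = [
        seq[i:] + seq[:i]
        for seq in (motif, reverse_complement)
        for i in range(len(seq))
    ]
    return min(rotations)


def num_repeats_in_reference(locus: Locus) -> int:
    """Locus interval size divided by motif size, rounded down."""
    return (locus.end - locus.start_0based) // len(locus.motif)


# Example using the LocusId from the table above.
locus = parse_locus_id("1-44835-44867-AAAT")
print(canonical_motif(locus.motif))     # AAAT
print(num_repeats_in_reference(locus))  # 8
print(canonical_motif("CTG"))           # AGC
```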

| Field Name | Description | Example |
|---|---|---|
| GencodeGeneRegion | The most significant gene region that this locus overlaps based on Gencode v46 gene annotations. Possible values in order from least to most significant are intergenic, promoter, intron, exon, 3' UTR, 5' UTR, coding. Here, "exon" means an exon of a non-coding transcript. | exon |
| GencodeGeneName | The first gene whose reported GencodeGeneRegion overlaps this locus based on Gencode v46 annotations | WASH7P |
| GencodeGeneId | ENSG gene id of the first gene whose reported GencodeGeneRegion overlaps this locus based on Gencode v46 annotations | ENSG00000227232 |
| GencodeTranscriptId | ENST transcript id of the first transcript whose reported GencodeGeneRegion overlaps this locus based on Gencode v46 annotations | ENST00000488147 |
| RefseqGeneRegion | [RefSeq annotations. See description of GencodeGeneRegion] | exon |
| RefseqGeneName | [RefSeq annotations. See description of GencodeGeneName] | WASH7P |
| RefseqGeneId | [RefSeq annotations. See description of GencodeGeneId] | WASH7P |
| RefseqTranscriptId | [RefSeq annotations. See description of GencodeTranscriptId] | NR_024540 |
| ManeGeneRegion | [MANE annotations. See description of GencodeGeneRegion] | 3' UTR |
| ManeGeneName | [MANE annotations. See description of GencodeGeneName] | GNB1 |
| ManeGeneId | [MANE annotations. See description of GencodeGeneId] | ENSG00000078369 |
| ManeTranscriptId | [MANE annotations. See description of GencodeTranscriptId] | ENST00000378609 |
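
The region-selection rule for GencodeGeneRegion (and its RefSeq and MANE counterparts) amounts to a priority lookup. The sketch below assumes the overlapping regions have already been computed as one (region, gene) pair per overlapping transcript, in annotation order; that input shape and the `summarize_gene_region` name are hypothetical, and this is not the annotation pipeline used to build the catalog. Because Python's `max()` returns the first of several equally ranked items, the earliest gene is reported, mirroring the "first gene" wording in the table.

```python
from typing import List, Optional, Tuple

# Region labels from least to most significant, as listed for GencodeGeneRegion.
REGION_PRIORITY = ["intergenic", "promoter", "intron", "exon", "3' UTR", "5' UTR", "coding"]
_RANK = {region: rank for rank, region in enumerate(REGION_PRIORITY)}


def summarize_gene_region(overlaps: List[Tuple[str, str]]) -> Tuple[str, Optional[str]]:
    """Return (most significant region, first gene reported with that region).

    `overlaps` is a hypothetical input: one (region, gene_name) pair per
    overlapping transcript, in annotation order.
    """
    if not overlaps:
        return "intergenic", None
    # max() returns the first of several equally ranked items,
    # so the earliest gene in annotation order is reported.
    region, gene = max(overlaps, key=lambda pair: _RANK[pair[0]])
    return region, gene


print(summarize_gene_region([("intron", "GENE_A"), ("exon", "GENE_B"), ("exon", "GENE_C")]))
# ('exon', 'GENE_B')
```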

| Field Name | Description | Example |
|---|---|---|
| AlleleFrequenciesFromIllumina174k | Allele frequencies of TRs detected from 2,504 individuals in the 1000 Genomes Project | - |
| AlleleFrequenciesFromT2TAssemblies | Allele frequencies of TRs detected using 78 haplotype-resolved T2T assemblies from the Human Pangenome Reference Consortium (HPRC) and Human Genome Structural Variation Consortium (HGSVC) | - |
| All | Allele frequencies from all populations of TR-Atlas | - |
| African | Allele frequencies from African populations of TR-Atlas | - |
| South_Asian | Allele frequencies from South Asian populations of TR-Atlas | - |
| East_Asian | Allele frequencies from East Asian populations of TR-Atlas | - |
| European | Allele frequencies from European populations of TR-Atlas | - |
| Hispanic | Allele frequencies from Hispanic populations of TR-Atlas | - |
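
The StdevFromIllumina174k and StdevFromT2TAssemblies fields summarize allele distributions like these as a single spread value. A minimal sketch, assuming a distribution is available as a mapping from allele size (repeat count) to count or frequency; the report's actual serialization of these columns, and whether a population or sample standard deviation is used, are assumptions here (population standard deviation is shown). `allele_size_stdev` is a hypothetical helper name.

```python
import math
from typing import Mapping


def allele_size_stdev(allele_counts: Mapping[int, float]) -> float:
    """Population standard deviation of allele sizes, weighted by count or frequency."""
    total = sum(allele_counts.values())
    if total <= 0:
        raise ValueError("allele_counts must contain at least one positive weight")
    mean = sum(size * weight for size, weight in allele_counts.items()) / total
    variance = sum(weight * (size - mean) ** 2 for size, weight in allele_counts.items()) / total
    return math.sqrt(variance)


# Hypothetical distribution: three alleles with 8 repeats, one with 10.
print(allele_size_stdev({8: 3, 10: 1}))  # sqrt(0.75), about 0.866
```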

| Field Name | Description | Example |
|---|---|---|
| genetype | one of the following values, "coding", "coding;nonCoding", "coding;nonCoding;pseudo", "coding;pseudo", "intergenic", "nonCoding", "nonCoding;pseudo", "pseudo" | coding |
| genename | gene symbol | GNB1 |
| protien_coding | whether the TR is located in the CDS of a protein-coding gene | protien_coding |
| exon | whether the TR is located in an exon of the gene | exon |
| UTR | whether the TR is located in a UTR of the gene | three_prime_UTR |
| mean_1000g_50Han | the mean TR repeat number in the cohort1 dataset of 50 Beijing Han (North China) individuals from the 1000 Genomes Project | 8 |
| sd_1000g_50Han | the standard deviation of the TR repeat number in the cohort1 dataset of 50 Beijing Han (North China) individuals from the 1000 Genomes Project | 2.42 |
| max_1000g_50Han | the maximum TR repeat number in the cohort1 dataset of 50 Beijing Han (North China) individuals from the 1000 Genomes Project | 30 |
| mean_1000g_100 | the mean TR repeat number in the cohort2 dataset of 100 individuals from 6 ethnic groups in the 1000 Genomes Project | 8 |
| sd_1000g_100 | the standard deviation of the TR repeat number in the cohort2 dataset of 100 individuals from 6 ethnic groups in the 1000 Genomes Project | 2.42 |
| max_1000g_100 | the maximum TR repeat number in the cohort2 dataset of 100 individuals from 6 ethnic groups in the 1000 Genomes Project | 29 |
| TADboundry | whether the TR intersects a TAD boundary | TADboundry |
| CTCF | whether the TR intersects a CTCF binding site | CTCF |
| TFBS | whether the TR intersects a transcription factor binding site (TFBS) | TFBS |
| gene_pLI | probability of being loss-of-function intolerant | 0.01 |
| HPO | HPO phenotype records of the intersected gene | HP:0000952:Jaundice; |
| expansion_rate | (repeat number-max of cohort2) /max of cohort2 | 0.5 |
| score | STRAS prediction score | 0.446 |
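
The expansion_rate formula can be written directly as a function. A minimal sketch, assuming `repeat_number` is the repeat number of the queried allele and `cohort2_max` is the max_1000g_100 value for the same locus; the function name is hypothetical. Positive values mean the queried allele exceeds the largest repeat number observed in cohort2 (0.5 means 50% above it), and negative values mean it falls below that maximum.

```python
def expansion_rate(repeat_number: float, cohort2_max: float) -> float:
    """(repeat number - max of cohort2) / max of cohort2, as defined for expansion_rate.

    `cohort2_max` corresponds to the max_1000g_100 column.
    """
    if cohort2_max <= 0:
        raise ValueError("cohort2_max must be positive")
    return (repeat_number - cohort2_max) / cohort2_max


print(expansion_rate(45, 30))  # 0.5: 45 repeats is 50% above the cohort2 maximum
print(expansion_rate(20, 30))  # negative: below the largest repeat number seen in cohort2
```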

| Field Name | Description | Example |
|---|---|---|
| gene | intersected gene | intergenic |
| location | where the TR is located: intergenic, First exon, Middle exon, or Last exon | intergenic |
| region | where the TR is located: intergenic, intron, or exon | intergenic |
| gene_type | one of the following values: antisense, IG_C_gene, intergenic, lincRNA, polymorphic_pseudogene, processed_transcript, protein_coding, pseudogene, sense_overlapping, Unknown | intergenic |
| gene_distance | the distance between the TR locus and the nearest gene | 8755 |
| gerp | the GERP score is defined as the reduction in the number of substitutions in the multi-species sequence alignment compared to the neutral expectation. | 0.013 |
| TAD | whether located in topologically associated domain | 0 |
| eSTR | whether the TR is an expression STR (eSTR) | 1 |
| opReg | whether located in open regulatory regions | 1 |
| tissue_simple | Nervous_System: the maximum expression of the gene is in the nervous system; No_expression: expression of the gene is not detected; Other: the maximum expression of the gene is in another tissue | No_expression |
| promoter | whether located in promoter | 0 |
| UTR_3 | whether located in 3' UTR | 0 |
| UTR_5 | whether located in 5' UTR | 0 |
| loeuf | the loss-of-function observed/expected upper bound fraction (LOEUF). LOEUF is a conservative estimate of evolutionary selection against disease-causing variants, based on the upper limit of the confidence interval for the observed/expected pLoF mutation rate. Genes with lower observed/expected pLoF variant ratios are more evolutionarily constrained. LOEUF scores are binned into deciles from 0 (most constrained) to 9 (least constrained); a lower LOEUF value indicates a more essential gene. | 9 |
| pLi | probability of being loss-of-function intolerant | 0 |
| RAD21 | number of RAD21 transcription factor binding sites | 1 |
| SMC3 | number of SMC3 transcription factor binding sites | 1 |
| per_g | the percentage of G base in the motif | 0 |
| per_c | the percentage of C base in the motif | 0 |
| per_a | the percentage of A base in the motif | 25 |
| per_t | the percentage of T base in the motif | 75 |
| gc_content | the percentage of G and C bases in the motif | 0 |
| eSh0 | star network's topological indices-Shannon entropy | 3.71 |
| eSh1 | star network's topological indices-Shannon entropy | 3.69 |
| eSh2 | star network's topological indices-Shannon entropy | 3.69 |
| eSh3 | star network's topological indices-Shannon entropy | 3.68 |
| eSh4 | star network's topological indices-Shannon entropy | 3.68 |
| eSh5 | star network's topological indices-Shannon entropy | 3.68 |
| eTr0 | star network's topological indices-spectral moments | 41 |
| eTr1 | star network's topological indices-spectral moments | 0 |
| eTr2 | star network's topological indices-spectral moments | 14.5 |
| eTr3 | star network's topological indices-spectral moments | 1.5 |
| eTr4 | star network's topological indices-spectral moments | 8.32 |
| eTr5 | star network's topological indices-spectral moments | 2.36 |
| eH | star network's topological indices-Harary number | 135.42 |
| eW | star network's topological indices-Wiener index | 11480 |
| eS6 | star network's topological indices-Gutman topological index | 2287.04 |
| eS | star network's topological indices-Schultz topological index | 64898 |
| J | star network's topological indices-Balaban distance connectivity index | 0.33 |
| eX0 | star network's topological indices-Kier-Hall connectivity index | 24.79 |
| eX1R | star network's topological indices-Randic connectivity index | 20.25 |
| eX2 | star network's topological indices-Kier-Hall connectivity index | 16.53 |
| eX3 | star network's topological indices-Kier-Hall connectivity index | 13.07 |
| eX4 | star network's topological indices-Kier-Hall connectivity index | 10.42 |
| eX5 | star network's topological indices-Kier-Hall connectivity index | 8.36 |
| SVM | the support vector machine prediction for likelihood of pathogenicity (0 = Benign, 1 = Pathogenic) | 0.00558030318633804 |
| XGB | the XGBoost prediction for likelihood of pathogenicity | 1.6952976e-05 |
| EnsembleConfidence | confidence score computed as the sum of the SVM and XGB scores | 0.00559725616212212 |
| EnsembleMax | the maximum of the SVM and XGB scores | 0.00558030318633804 |
| EnsembleBinary | EnsembleMax rounded to the nearest integer (0 or 1) | 0 |
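
Several derived columns above are simple functions of other columns. The sketch below recomputes EnsembleConfidence, EnsembleMax, and EnsembleBinary from the SVM and XGB scores, and the per_g, per_c, per_a, per_t, and gc_content columns from a motif. The function names are hypothetical, and treating a score of exactly 0.5 as rounding up to 1 is an assumption, since the table does not say how ties are handled. For a motif such as ATTT, the composition function gives per_a = 25, per_t = 75, and gc_content = 0, consistent with the example values.

```python
from typing import Dict


def ensemble_scores(svm: float, xgb: float) -> Dict[str, float]:
    """Recompute the ensemble columns from the SVM and XGB scores."""
    ensemble_max = max(svm, xgb)
    return {
        "EnsembleConfidence": svm + xgb,
        "EnsembleMax": ensemble_max,
        # Assumption: a score of exactly 0.5 rounds up to 1.
        "EnsembleBinary": 1 if ensemble_max >= 0.5 else 0,
    }


def motif_base_percentages(motif: str) -> Dict[str, float]:
    """Percentage of each base in the motif, plus G+C content (per_* and gc_content)."""
    motif = motif.upper()
    percentages = {f"per_{base.lower()}": 100 * motif.count(base) / len(motif) for base in "GCAT"}
    percentages["gc_content"] = percentages["per_g"] + percentages["per_c"]
    return percentages


print(ensemble_scores(0.00558030318633804, 1.6952976e-05)["EnsembleBinary"])  # 0
print(motif_base_percentages("ATTT"))
```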

A: Please check that the LocusId is in the correct format and that its fields are separated by standard ASCII hyphens (-) rather than en dashes or full-width hyphens. If the problem persists, please contact the system administrator.
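
Before submitting, a LocusId can be pre-checked locally for the two most common problems: lookalike dash characters and a string that does not follow Chromosome-StartPosition-EndPosition-RepeatSequence. This is a minimal sketch; it assumes the repeat sequence uses only A, C, G, and T and that the start must be smaller than the end, it does not reproduce the server's own validation, and `check_locus_id` is a hypothetical helper. For example, `check_locus_id("1-44835-44867-AAAT")` returns an empty list.

```python
import re
from typing import List

# Characters that look like "-" but are not the ASCII hyphen-minus (U+002D).
LOOKALIKE_DASHES = {
    "\u2010": "Unicode hyphen",
    "\u2011": "non-breaking hyphen",
    "\u2012": "figure dash",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
    "\uff0d": "full-width hyphen-minus",
}

# Chromosome-StartPosition-EndPosition-RepeatSequence, e.g. 1-44835-44867-AAAT.
# Assumption: the repeat sequence contains only A, C, G and T.
LOCUS_ID_RE = re.compile(r"^([^-\s]+)-(\d+)-(\d+)-([ACGTacgt]+)$")


def check_locus_id(locus_id: str) -> List[str]:
    """Return the problems found in a LocusId; an empty list means it looks valid."""
    problems = []
    for char, name in LOOKALIKE_DASHES.items():
        if char in locus_id:
            problems.append(f"contains {name} (U+{ord(char):04X}); use an ASCII hyphen '-'")
    match = LOCUS_ID_RE.match(locus_id.strip())
    if match is None:
        problems.append("does not match Chromosome-StartPosition-EndPosition-RepeatSequence")
    elif int(match.group(2)) >= int(match.group(3)):
        # Assumption: the start coordinate must be smaller than the end coordinate.
        problems.append("StartPosition must be less than EndPosition")
    return problems
```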

A: Under normal circumstances, a single prediction and report generation completes within 5 minutes. It may take longer if the system is busy.

A: The current version supports both single and batch prediction.

A: The basic annotations for the tandem repeat loci come from the trexplorer-catalog and the TRAD database. Allele frequencies were obtained from the Illumina 174k polymorphic TRs in the 1000 Genomes Project (1kGP), the polymorphic TRs in 78 T2T assemblies, and TR-Atlas. Effect prediction was performed using STRAS and RExPRT.

A: When using reports generated by this system, please cite: TREFFIC v1.0.