TREFFIC - Help Documentation

Prediction Steps:
  1. Enter the LocusId of the TR locus in the input box on the homepage.
  2. Click the "Start Prediction" button.
  3. Wait for the system to predict and generate the report.
  4. View the prediction report.
  5. Click the "Download PDF Report" button to get the detailed report.

LocusId Format Explanation:
Chromosome-StartPosition-EndPosition-RepeatSequence (StartPosition is 0-based)

Examples:
  • 1-44835-44867-AAAT
  • 1-370632-370648-TTTC
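
A LocusId can be checked before it is submitted. The following is a minimal sketch, not the server's own validation: the function name `parse_locus_id`, the `Locus` type, and the accepted character sets (letters and digits for the chromosome, A/C/G/T/N for the motif) are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int  # 0-based start, per the "chrom-start0based-end-motif" format
    end: int
    motif: str


# Assumed pattern: chromosome name, two integers, and a motif, joined by ASCII hyphens.
_LOCUS_ID = re.compile(r"([0-9A-Za-z]+)-([0-9]+)-([0-9]+)-([ACGTN]+)")


def parse_locus_id(locus_id: str) -> Locus:
    """Split a LocusId into its four parts, raising ValueError if it is malformed."""
    match = _LOCUS_ID.fullmatch(locus_id.strip())
    if match is None:
        raise ValueError(f"not a valid LocusId: {locus_id!r}")
    chrom, start, end, motif = match.groups()
    start_i, end_i = int(start), int(end)
    if start_i >= end_i:
        raise ValueError(f"start must be smaller than end: {locus_id!r}")
    return Locus(chrom, start_i, end_i, motif)


print(parse_locus_id("1-44835-44867-AAAT"))
```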

Basic Information
Field Name | Description | Example
LocusId | Unique ID for this locus in the format "chrom-start0based-end-motif" | 1-44835-44867-AAAT
TR-Atlas Id | TR ID from TR-Atlas | TR1
VariantType | The type of variant | Repeat
ReferenceRegion_hg38 | Genomic coordinates in hg38 | chr1:44835-44867
Source | The source catalog that contributed this locus definition. Possible values: KnownDiseaseAssociatedLoci, Illumina174kPolymorphicTRs, PerfectRepeatsInReference, PolymorphicTRsInT2TAssemblies | Illumina174kPolymorphicTRs
LocusStructure | Structure of the locus | (AAAT)*
FoundInPerfectRepeatsInReference | Whether the TR was found in perfect repeats in the reference | Yes
CanonicalMotif | The reference repeat motif, normalized by computing all cyclic shifts (e.g. CAG, AGC, GCA, CTG, TGC, GCT) and taking the one that is alphabetically first | AAAT
ReferenceRepeatPurity | Fraction of bases within the locus interval that exactly match perfect repeats of the given motif. Range: 0.43 to 1 | 1.0
NumRepeatsInReference | Locus interval size divided by motif size, rounded down to the nearest integer. Range: 1 to 300 | 8
NsInFlanks | Number of "N" bases in the reference genome within +/-1000bp of the TR locus. ExpansionHunter reports an error if this exceeds 5 | 0
LeftFlankMappability | UCSC 36-mer mappability track per-base mappability scores averaged across a 150bp window of flanking sequence immediately to the left of the TR locus. Range: 0 to 1 | 0.31
FlanksAndLocusMappability | UCSC 36-mer mappability track per-base mappability scores averaged across the TR locus +/- 150bp of flanking sequence. Range: 0 to 1 | 0.29
RightFlankMappability | UCSC 36-mer mappability track per-base mappability scores averaged across a 150bp window of flanking sequence immediately to the right of the TR locus. Range: 0 to 1 | 0.3
LPSLengthStdevFromHPRC100 | Standard deviation of the allele frequency distribution represented by AlleleFrequenciesFromHPRC100. | 0.973
LPSMotifFractionFromHPRC100 | Standard deviation of the number of repeats in the longest pure segment (LPS) detected at this tandem repeat locus in 100 high-coverage PacBio HiFi samples from the HPRC. | AAAT: 196/196
TRsInRegion | Number of TRs in the vicinity of this locus that are separated from each other by no more than 6bp of spacer sequence. Range: 1 to 13 TRs | 2
FoundInKnownDiseaseAssociatedLoci | Whether this TR was found in known disease-associated loci | Yes
KnownDiseaseAssociatedLocus | Whether this locus is known to cause a monogenic disease | Yes
KnownDiseaseAssociatedMotif | Whether this locus has the same canonical motif as a locus that is known to cause a monogenic disease | Yes
FoundInIllumina174kPolymorphicTRs | Whether this TR was found in Illumina174k polymorphic TRs | Yes
FoundInPolymorphicTRsInT2TAssemblies | Whether this TR was found in polymorphic TRs in T2T assemblies | Yes
StdevFromIllumina174k | Standard deviation of the allele frequency distribution represented by AlleleFrequenciesFromIllumina174k. Range: 0 to 31.5 | 0.438384433926388
StdevFromT2TAssemblies | Standard deviation of the allele frequency distribution represented by AlleleFrequenciesFromT2TAssemblies. Range: 0 to 1267.3 | 0.4581427791078368
VariationCluster | The chrom:start-end coordinates of the variation cluster that contains this TR locus | 10:100001091-100001255
VariationClusterSizeDiff | The difference in size between the original TR locus and the region spanned by the variation cluster. Range: 6 to 8,308 bp | 10
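
The CanonicalMotif and NumRepeatsInReference fields follow simple rules that can be reproduced by hand. The sketch below is an illustration, not the catalog's own code. It assumes that the cyclic shifts are taken for both the motif and its reverse complement, because the example list in the CanonicalMotif description includes CTG, TGC and GCT, which are shifts of the reverse complement of CAG. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cyclic_shifts(motif: str) -> list:
    """Return every rotation of the motif, e.g. CAG -> CAG, AGC, GCA."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical_motif(motif: str) -> str:
    """Alphabetically first rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    return min(cyclic_shifts(motif) + cyclic_shifts(reverse_complement))


def num_repeats_in_reference(start_0based: int, end: int, motif: str) -> int:
    """Locus interval size divided by motif size, rounded down."""
    return (end - start_0based) // len(motif)


print(canonical_motif("CAG"))
print(num_repeats_in_reference(44835, 44867, "AAAT"))
```

For the example locus 1-44835-44867-AAAT, the interval is 32 bp long and the motif is 4 bp long, which gives the 8 repeats shown in the table.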

Gene Information
Field Name | Description | Example
GencodeGeneRegion | The most significant gene region that this locus overlaps, based on Gencode v46 gene annotations. Possible values, in order from least to most significant: intergenic, promoter, intron, exon, 3' UTR, 5' UTR, coding. Here, "exon" means an exon of a non-coding transcript. | exon
GencodeGeneName | The first gene whose reported GencodeGeneRegion overlaps this locus, based on Gencode v46 annotations | WASH7P
GencodeGeneId | ENSG gene ID of the first gene whose reported GencodeGeneRegion overlaps this locus, based on Gencode v46 annotations | ENSG00000227232
GencodeTranscriptId | ENST transcript ID of the first transcript whose reported GencodeGeneRegion overlaps this locus, based on Gencode v46 annotations | ENST00000488147
RefseqGeneRegion | [RefSeq annotations. See description of GencodeGeneRegion] | exon
RefseqGeneName | [RefSeq annotations. See description of GencodeGeneName] | WASH7P
RefseqGeneId | [RefSeq annotations. See description of GencodeGeneId] | WASH7P
RefseqTranscriptId | [RefSeq annotations. See description of GencodeTranscriptId] | NR_024540
ManeGeneRegion | [MANE annotations. See description of GencodeGeneRegion] | 3' UTR
ManeGeneName | [MANE annotations. See description of GencodeGeneName] | GNB1
ManeGeneId | [MANE annotations. See description of GencodeGeneId] | ENSG00000078369
ManeTranscriptId | [MANE annotations. See description of GencodeTranscriptId] | ENST00000378609
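
When a locus overlaps several annotated regions, the GeneRegion fields report only the most significant one. A minimal sketch of that selection, using the order given in the GencodeGeneRegion description, might look like this. The function name and the fallback to "intergenic" for an empty input are assumptions for illustration.

```python
# Order taken from the GencodeGeneRegion description: least to most significant.
REGION_ORDER = ["intergenic", "promoter", "intron", "exon", "3' UTR", "5' UTR", "coding"]
_RANK = {region: rank for rank, region in enumerate(REGION_ORDER)}


def most_significant_region(overlapping_regions) -> str:
    """Pick the highest-ranked region among those a locus overlaps."""
    regions = list(overlapping_regions)
    if not regions:
        # Assumption: a locus with no overlapping annotation counts as intergenic.
        return "intergenic"
    return max(regions, key=_RANK.__getitem__)


print(most_significant_region(["intron", "3' UTR", "promoter"]))
```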

Allele Frequency
Field Name | Description | Example
AlleleFrequenciesFromIllumina174k | Allele frequencies of TRs detected from 2,504 individuals in the 1000 Genomes Project | -
AlleleFrequenciesFromT2TAssemblies | Allele frequencies of TRs detected using 78 haplotype-resolved T2T assemblies from the Human Pangenome Reference Consortium (HPRC) and Human Genome Structural Variation Consortium (HGSVC) | -
All | Allele frequencies from all populations of TR-Atlas | -
African | Allele frequencies from African populations of TR-Atlas | -
South_Asian | Allele frequencies from South Asian populations of TR-Atlas | -
East_Asian | Allele frequencies from East Asian populations of TR-Atlas | -
European | Allele frequencies from European populations of TR-Atlas | -
Hispanic | Allele frequencies from Hispanic populations of TR-Atlas | -
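
The StdevFromIllumina174k and StdevFromT2TAssemblies fields summarize these allele frequency distributions as a standard deviation. The sketch below shows one way such a value could be computed. It assumes a hypothetical input that maps each repeat count to the number of alleles observed with that count, and it uses the population (not sample) standard deviation; neither detail is confirmed by this documentation.

```python
import math
from typing import Mapping


def allele_distribution_stdev(allele_counts: Mapping[int, int]) -> float:
    """Population standard deviation of repeat counts, weighted by allele count."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    mean = sum(repeats * n for repeats, n in allele_counts.items()) / total
    variance = sum(n * (repeats - mean) ** 2 for repeats, n in allele_counts.items()) / total
    return math.sqrt(variance)


# Hypothetical distribution: 2 alleles with 8 repeats and 2 alleles with 10 repeats.
print(allele_distribution_stdev({8: 2, 10: 2}))
```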

STRAS Prediction
Field Name | Description | Example
genetype | One of the following values: "coding", "coding;nonCoding", "coding;nonCoding;pseudo", "coding;pseudo", "intergenic", "nonCoding", "nonCoding;pseudo", "pseudo" | coding
genename | Gene symbol | GNB1
protien_coding | Whether the TR is located in the CDS region of a protein-coding gene | protien_coding
exon | Whether the TR is located in an exon region of the gene | exon
UTR | Whether the TR is located in a UTR region of the gene | three_prime_UTR
mean_1000g_50Han | Mean TR repeat number in the cohort1 dataset, which contains 50 Beijing Han (North China) individuals from the 1000 Genomes Project | 8
sd_1000g_50Han | Standard deviation of the TR repeat number in the cohort1 dataset, which contains 50 Beijing Han (North China) individuals from the 1000 Genomes Project | 2.42
max_1000g_50Han | Maximum TR repeat number in the cohort1 dataset, which contains 50 Beijing Han (North China) individuals from the 1000 Genomes Project | 30
mean_1000g_100 | Mean TR repeat number in the cohort2 dataset, which contains 100 individuals from 6 ethnic groups of the 1000 Genomes Project | 8
sd_1000g_100 | Standard deviation of the TR repeat number in the cohort2 dataset, which contains 100 individuals from 6 ethnic groups of the 1000 Genomes Project | 2.42
max_1000g_100 | Maximum TR repeat number in the cohort2 dataset, which contains 100 individuals from 6 ethnic groups of the 1000 Genomes Project | 29
TADboundry | Whether the TR intersects a TAD boundary | TADboundry
CTCF | Whether the TR intersects a CTCF site | CTCF
TFBS | Whether the TR intersects a TFBS | TFBS
gene_pLI | Probability of being loss-of-function intolerant | 0.01
HPO | Phenotype records in HPO for the intersecting gene | HP:0000952:Jaundice;
expansion_rate | (repeat number - max of cohort2) / max of cohort2 | 0.5
score | STRAS prediction score | 0.446
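
The expansion_rate formula in the table above can be written out as a short function. This is a sketch of the stated formula only; the function and parameter names are hypothetical, and the input values in the usage line are made up for illustration.

```python
def expansion_rate(repeat_number: float, cohort2_max: float) -> float:
    """(repeat number - max of cohort2) / max of cohort2."""
    if cohort2_max <= 0:
        raise ValueError("max of cohort2 must be positive")
    return (repeat_number - cohort2_max) / cohort2_max


# A repeat number of 60 against a cohort2 maximum of 40 is a 50% expansion.
print(expansion_rate(60, 40))
```

A value of 0 means the repeat number equals the largest value seen in cohort2, and a negative value means it is below that maximum.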

RExPRT Prediction
Field Name | Description | Example
gene | Intersected gene | intergenic
location | TR location: intergenic, First exon, Middle exon, or Last exon | intergenic
region | TR region: intergenic, intron, or exon | intergenic
gene_type | One of the following values: antisense, IG_C_gene, intergenic, lincRNA, polymorphic_pseudogene, processed_transcript, protein_coding, pseudogene, sense_overlapping, Unknown | intergenic
gene_distance | Distance between the TR locus and the nearest gene | 8755
gerp | GERP score, defined as the reduction in the number of substitutions in the multi-species sequence alignment compared to the neutral expectation | 0.013
TAD | Whether the TR is located in a topologically associating domain | 0
eSTR | Whether the TR is an eSTR | 1
opReg | Whether the TR is located in an open regulatory region | 1
tissue_simple | Nervous_System: the maximum expression of the gene is in the nervous system; No_expression: expression of the gene is not detected; Other: the maximum expression of the gene is in another tissue | No_expression
promoter | Whether the TR is located in a promoter | 0
UTR_3 | Whether the TR is located in a 3' UTR | 0
UTR_5 | Whether the TR is located in a 5' UTR | 0
loeuf | The loss-of-function observed/expected upper bound fraction (LOEUF). LOEUF is a conservative estimate of evolutionary selection against disease-causing variants, based on the upper limit of the confidence interval for the observed/expected pLoF mutation rate. Genes with lower observed/expected pLoF variant ratios are evolutionarily constrained. LOEUF scores are binned into deciles from 0 (most constrained) to 9 (least constrained). A lower LOEUF value indicates a more essential gene. | 9
pLi | Probability of being loss-of-function intolerant | 0
RAD21 | Number of RAD21 transcription factor binding sites | 1
SMC3 | Number of SMC3 transcription factor binding sites | 1
per_g | Percentage of G bases in the motif | 0
per_c | Percentage of C bases in the motif | 0
per_a | Percentage of A bases in the motif | 25
per_t | Percentage of T bases in the motif | 75
gc_content | Percentage of G and C bases in the motif | 0
eSh0 | Star network topological index: Shannon entropy | 3.71
eSh1 | Star network topological index: Shannon entropy | 3.69
eSh2 | Star network topological index: Shannon entropy | 3.69
eSh3 | Star network topological index: Shannon entropy | 3.68
eSh4 | Star network topological index: Shannon entropy | 3.68
eSh5 | Star network topological index: Shannon entropy | 3.68
eTr0 | Star network topological index: spectral moments | 41
eTr1 | Star network topological index: spectral moments | 0
eTr2 | Star network topological index: spectral moments | 14.5
eTr3 | Star network topological index: spectral moments | 1.5
eTr4 | Star network topological index: spectral moments | 8.32
eTr5 | Star network topological index: spectral moments | 2.36
eH | Star network topological index: Harary number | 135.42
eW | Star network topological index: Wiener index | 11480
eS6 | Star network topological index: Gutman topological index | 2287.04
eS | Star network topological index: Schultz topological index | 64898
J | Star network topological index: Balaban distance connectivity index | 0.33
eX0 | Star network topological index: Kier-Hall connectivity index | 24.79
eX1R | Star network topological index: Randic connectivity index | 20.25
eX2 | Star network topological index: Kier-Hall connectivity index | 16.53
eX3 | Star network topological index: Kier-Hall connectivity index | 13.07
eX4 | Star network topological index: Kier-Hall connectivity index | 10.42
eX5 | Star network topological index: Kier-Hall connectivity index | 8.36
SVM | Support vector machine prediction of the likelihood of pathogenicity (0 = Benign, 1 = Pathogenic) | 0.00558030318633804
XGB | XGBoost prediction of the likelihood of pathogenicity | 1.6952976e-05
EnsembleConfidence | Confidence score calculated as the sum of the SVM and XGB scores | 0.00559725616212212
EnsembleMax | The maximum of the SVM and XGB prediction scores | 0.00558030318633804
EnsembleBinary | The ensemble maximum score rounded to the nearest integer (0 or 1) | 0
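
The motif composition fields (per_g, per_c, per_a, per_t, gc_content) and the three Ensemble fields are derived from other values in the table. The sketch below reproduces them from their descriptions. It is not RExPRT's own code: the function names are hypothetical, and treating a maximum score of exactly 0.5 as 1 is an assumption about how ties are rounded.

```python
def motif_base_percentages(motif: str) -> dict:
    """Percentage of each base in the motif, plus combined G+C content."""
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    pct = {base: 100 * motif.count(base) / len(motif) for base in "ACGT"}
    pct["GC"] = pct["G"] + pct["C"]
    return pct


def ensemble_scores(svm: float, xgb: float) -> dict:
    """Combine the SVM and XGB scores as described for the Ensemble fields."""
    ensemble_max = max(svm, xgb)
    return {
        "EnsembleConfidence": svm + xgb,  # sum of the two scores
        "EnsembleMax": ensemble_max,      # larger of the two scores
        # Assumption: a maximum of exactly 0.5 rounds up to 1.
        "EnsembleBinary": int(ensemble_max >= 0.5),
    }


print(motif_base_percentages("TTTA"))
print(ensemble_scores(0.00558030318633804, 1.6952976e-05))
```

With the example SVM and XGB values from the table, the larger score is the SVM score and it rounds to 0, which matches the EnsembleMax and EnsembleBinary examples.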

PDF Report Contents:
  • Basic Information: Basic information section.
  • Gene Information: Gene annotation information.
  • Allele Frequencies: Allele frequency bar charts.
  • STRAS Prediction: STRAS tool prediction results.
  • RExPRT Prediction: RExPRT tool prediction results.

Q1: What should I do if no results are returned?

A: Please check that the LocusId format is correct, and make sure the separators are ASCII hyphens (-) rather than full-width or other dash characters. If the problem persists, please contact the system administrator.
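
A common cause of a malformed LocusId is a dash character that looks like a hyphen but is not the ASCII one, for example after copying from a document or typing with a non-English input method. The following sketch normalizes such input before submission. The function name and the list of dash-like characters are assumptions for illustration and are not part of the server.

```python
# Dash-like Unicode characters that are often confused with the ASCII hyphen:
# hyphen, non-breaking hyphen, figure dash, en dash, em dash, horizontal bar,
# minus sign, full-width hyphen-minus, small hyphen-minus.
_DASH_LIKE = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212\uFF0D\uFE63"
_TO_ASCII_HYPHEN = str.maketrans({ch: "-" for ch in _DASH_LIKE})


def normalize_locus_id(raw: str) -> str:
    """Remove whitespace and replace dash-like characters with ASCII hyphens."""
    without_spaces = "".join(raw.split())
    return without_spaces.translate(_TO_ASCII_HYPHEN)


print(normalize_locus_id(" 1\uFF0D44835\u201344867\u2014AAAT "))
```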

Q2: How long does it take to generate a PDF report?

A: Under normal circumstances, a single prediction and report generation completes within 5 minutes. It may take longer if the system is busy.

Q3: Is batch prediction supported?

A: The current version supports both single and batch prediction.

Q4: What are the data sources of the webserver?

A: The basic annotation information for the tandem repeat loci comes from the trexplorer-catalog and the TRAD database. The allele frequencies were obtained from the Illumina 174k polymorphic TRs in 1kGP, the polymorphic TRs in 78 T2T assemblies, and TR-Atlas. Effect prediction is performed with STRAS and RExPRT.

Q5: How should I cite this system?

A: When using reports generated by this system, please cite: TREFFIC v1.0.

Contact Information:
  • System Administrator: chenxiaowei^at^ibp^dot^ac^dot^cn
  • Technical Support: chenxiaowei^at^ibp^dot^ac^dot^cn
  • Data Issues: chenxiaowei^at^ibp^dot^ac^dot^cn

System Information:
  • Version: 1.0.0