Mutations in coding and non-coding regions of the genome jointly affect tumorigenesis.

Classically, cancer is understood as the consequence of a mutation in a gene that alters the expression of a protein with a crucial role in cell physiology. Some of these proteins are targeted in cancer therapy.

 

However, only a small percentage of the human genome (less than 2%) actually codes for proteins. The rest, the non-coding genome, includes DNA sequences with a regulatory role: they modulate the expression of other genes.

 

Mutations in non-coding regions promote tumorigenesis. Recent studies have shown that non-coding regions, like genes, may be mutated. If a mutation alters their regulatory function, protein expression can in turn be (indirectly) affected, with detrimental consequences. For instance, mutations in the DNA region modulating the expression of the TERT protein have been linked to several tumor types.

 

Therefore, identifying further mutations in non-coding regions that alter the expression of proteins may help to single out those that actually damage cell physiology (and thus drive cancer), differentiating them from others that are instead "silent".

 

The approach. The authors of this study performed an integrated analysis of two datasets (collections of data): one gathering genomic information from almost 1000 tumor samples ("a"), and one of expression levels ("b"), containing information about changes in the expression level of the genes associated with the mutated non-coding DNA sequences identified in dataset "a".

 

First, they used these datasets to build a computational model: a tool that combines these data and finds correlations, identifying mutations in non-coding regions that are shared among many samples.
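To give a feel for what "combining the data and finding correlations" can mean, here is a minimal Python sketch. It is not the authors' model: the sample names, loci, genes, expression values and thresholds are all made up for illustration, and the comparison (mean expression in mutated versus non-mutated samples) is a deliberately simple stand-in for a real statistical analysis.

```python
from statistics import mean

# Hypothetical toy data: non-coding loci found mutated in each tumor sample (dataset "a").
mutations = {
    "s1": {"locus_A", "locus_B"},
    "s2": {"locus_A"},
    "s3": {"locus_A", "locus_C"},
    "s4": set(),
    "s5": {"locus_B"},
    "s6": set(),
}

# Hypothetical mapping from each locus to the gene it is assumed to regulate.
locus_to_gene = {"locus_A": "gene_1", "locus_B": "gene_2", "locus_C": "gene_3"}

# Hypothetical expression levels, in arbitrary units, per gene and sample (dataset "b").
expression = {
    "gene_1": {"s1": 9.0, "s2": 8.0, "s3": 10.0, "s4": 3.0, "s5": 4.0, "s6": 2.0},
    "gene_2": {"s1": 5.0, "s2": 5.0, "s3": 5.0, "s4": 5.0, "s5": 5.0, "s6": 5.0},
    "gene_3": {"s1": 1.0, "s2": 1.0, "s3": 7.0, "s4": 1.0, "s5": 1.0, "s6": 1.0},
}

def recurrent_loci(mutations, min_samples):
    """Return the loci mutated in at least min_samples samples."""
    counts = {}
    for loci in mutations.values():
        for locus in loci:
            counts[locus] = counts.get(locus, 0) + 1
    return {locus for locus, n in counts.items() if n >= min_samples}

def expression_shift(locus, mutations, expression, locus_to_gene):
    """Mean expression of the linked gene in mutated minus non-mutated samples."""
    gene = locus_to_gene[locus]
    mutated = [expression[gene][s] for s, loci in mutations.items() if locus in loci]
    unmutated = [expression[gene][s] for s, loci in mutations.items() if locus not in loci]
    return mean(mutated) - mean(unmutated)

def candidate_loci(mutations, expression, locus_to_gene, min_samples=2, min_shift=2.0):
    """Keep recurrent loci whose linked gene shows a clear expression shift."""
    result = {}
    for locus in sorted(recurrent_loci(mutations, min_samples)):
        shift = expression_shift(locus, mutations, expression, locus_to_gene)
        if abs(shift) >= min_shift:
            result[locus] = shift
    return result

candidates = candidate_loci(mutations, expression, locus_to_gene)
print(candidates)  # {'locus_A': 6.0}
```

In this toy example locus_B is recurrent but "silent" (its gene's expression does not change), locus_C changes expression but occurs in a single sample, and only locus_A is both shared among samples and linked to a change in expression.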

 

The results obtained (for instance, a number n of mutations identified) were then validated by using this computational tool to identify mutations in another dataset ("c", assembled from a different collection of tumor samples).
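The idea of validating on an independent dataset can be sketched in Python as follows. This is only an illustration under assumed inputs: the locus names, the numbers and the replication rule (same direction and a minimum size of the expression change in both cohorts) are hypothetical and do not come from the study.

```python
# Hypothetical expression shifts (mutated minus non-mutated samples) per locus,
# measured in the discovery data and in an independent validation dataset ("c").
discovery_shift = {"locus_A": 6.0, "locus_D": -3.5, "locus_E": 2.5}
validation_shift = {"locus_A": 5.2, "locus_D": 0.4, "locus_F": 4.0}

def replicated(discovery, validation, min_shift=2.0):
    """Return loci whose shift has the same sign and is large enough in both datasets."""
    kept = []
    for locus, d in discovery.items():
        v = validation.get(locus)
        if v is None:
            continue  # locus not observed in the validation dataset
        same_direction = (d > 0) == (v > 0)
        if same_direction and abs(d) >= min_shift and abs(v) >= min_shift:
            kept.append(locus)
    return sorted(kept)

validated = replicated(discovery_shift, validation_shift)
print(validated)  # ['locus_A']
```

Here only locus_A survives: locus_D changes direction in the second dataset, and locus_E is not observed there at all.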

 

To put it simply (very simply), this is what mathematicians do with equations. If we have:

 

x + 3 = 10 (a)

 

x + y = 12 (b)

 

We use equation "a" to assign a value to x (x = 10 – 3 = 7) and then use this value to solve "b" (7 + y = 12, from which y = 5).

 

Then, to check whether actually x = 7 and y = 5 (meaning to "validate" the values we found for x and y), we can use another equation ("c"):

 

x – y = 2 (c)

 

Replacing the values we previously found (x = 7 and y = 5), the result is, indeed, 2.

 

Our finding is validated.
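For readers who like to see it run, the same three steps can be written as a few lines of Python. This is just the toy analogy above, nothing more:

```python
# Equation (a): x + 3 = 10, so x = 10 - 3
x = 10 - 3
# Equation (b): x + y = 12, so y = 12 - x
y = 12 - x
# Equation (c) was not used to find x and y, so it acts as an independent check.
validated = (x - y == 2)
print(x, y, validated)  # prints: 7 5 True
```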

 

The result. By using this approach, they identified frequent mutations in the non-coding genome of cancer tissue that, by altering regulatory function, affect gene expression (meaning the expression of the protein encoded by a gene that this regulatory region modulates). Because mutations are widespread and not all of them are biologically meaningful, integrating the information provided by the two datasets is pivotal to identifying the key mutations that could potentially be used as targets for cancer therapy.

 

Among these, for instance, a mutation in the non-coding regulatory region of the DAAM1 gene, often occurring in patients with metastatic melanoma, enhanced protein expression and cell invasiveness. This is consistent with the invasive profile of metastatic cells and links a non-coding mutation of DAAM1 to a cell trait: invasiveness.

 

By analyzing the link between mutations in coding regions previously described in cancer and the mutations in non-coding regions that their analysis found to indirectly affect protein expression levels, the authors showed that the two may occur in combination, as part of the same cell signaling pathway, converging into a tumorigenic phenotype. This further explains the complexity of cancer processes.

 

Reference: A global transcriptional network connecting noncoding mutations to changes in tumor gene expression. Zhang, Bojorquez-Gomez, Velez, Xu, Sanchez, Shen, Chen, Licon, Melton, Olson, Yu, Huang, Carter, Farley, Snyder, Fraley, Kreisberg, Ideker. Nat Genet. Apr 2018