https://doi.org/10.53853/encr.11.1.847

Received: November 17, 2023; Accepted: February 29, 2024

Utilizing bioinformatics approaches to conduct a comparative analysis of the thyroid transcriptome in thyroid disorders



L. de Oliveira-Andrade 1*, L. Oliveira 2, A. Vinhaes-Bittencourt 3, L. Matos de Oliveira 4,5, G. Matos de Oliveira 6

1. Health Department, State University of Santa Cruz (Universidade Estadual de Santa Cruz), Ilhéus, Bahia, Brazil
2. Escola Bahiana de Medicina e Saúde Pública, Salvador, Bahia, Brazil
3. Medical School, Universidade Federal da Bahia, Salvador, Bahia, Brazil
4. Ecole Supérieure des Sciences et Technologies de l’Ingénieur de Nancy, Polytech Nancy, France
5. Centro Universitário SENAI CIMATEC, Salvador, Bahia, Brazil
6. Family Health Program, Bahia, Brazil

Corresponding author: luis_jesuino@yahoo.com.br

Abstract:

Introduction:

Gaps remain in the analysis of the transcriptomes of the various thyroid disorders. This study aims to identify common gene expression patterns and dysregulated pathways across thyroid disorders by leveraging publicly available transcriptomic datasets. Integrating other omics data, where possible, will help uncover potential molecular drivers and biomarkers associated with specific thyroid dysfunctions.

Objective:

To conduct a comparative analysis of the thyroid transcriptome in thyroid disorders using bioinformatics approaches.

Methods:

We retrieved publicly available gene expression datasets related to the thyroid from the European Nucleotide Archive (ENA). Data preprocessing comprised quality control, read trimming, and alignment to a reference genome. Differential expression analysis was performed using bioinformatics packages, followed by functional enrichment analysis to gain insights into the associated biological processes. Finally, network analysis was conducted to explore interactions and regulatory relationships among differentially expressed genes (DEGs).

Results:

Our analysis covered 18 gene expression datasets, of which 15 were selected based on inclusion criteria and quality assessment. Numerous differentially expressed genes (P < 0.01) were identified and ranked by significance. Functional enrichment analysis revealed numerous biological processes associated with the differentially expressed genes, providing insights into the molecular mechanisms of thyroid disorders. Network analysis using Cytoscape revealed potential interactions among differentially expressed genes and identified key hub genes and potential therapeutic targets.

Conclusion:

This study demonstrates an accessible methodology for conducting a comparative analysis of the thyroid transcriptome in different disorders without the need for thyroid tissue samples. The integration of bioinformatics approaches provides a comprehensive understanding of the molecular mechanisms underlying thyroid diseases.

Keywords:

Thyroid disorders, transcriptome, gene expression profiling, bioinformatics, differentially expressed genes, molecular mechanisms.


Introduction

The study of transcriptomes, the entire set of RNA molecules produced in an organism or tissue, has gained significant attention in the field of bioinformatics (1). Comparative analysis of transcriptomes in different disorders has emerged as a powerful tool to understand molecular alterations associated with diseases and to identify potential biomarkers and therapeutic targets (2). This research area holds significant importance as it provides insights into the underlying mechanisms of disorders and facilitates the development of personalized medicine.

Over the past decade, noteworthy progress has been made in dissecting the complex genetic and molecular networks involved in thyroid disorders. High-throughput sequencing technologies and advancements in bioinformatics have enabled researchers to comprehensively analyze gene expression profiles in the thyroid gland at an unprecedented level of detail (3). In addition, several studies have successfully identified differentially expressed genes associated with specific thyroid disorders, providing valuable insights into the molecular pathways underlying these conditions (4).

Despite these advances, much remains to be explored in the comparative analysis of the thyroid transcriptome. Many studies have focused on individual disorders or relied on limited sample sizes, resulting in fragmented knowledge (5). In addition, the integration of multi-omics data, such as genomics, metabolomics, and proteomics, with transcriptomic data remains understudied in the context of thyroid disorders (6). Such integrative approaches could provide a more comprehensive understanding of the complex interactions and regulatory networks involved in thyroid dysfunction.

This study aims to bridge existing gaps in current research by conducting a comprehensive comparative analysis of the thyroid transcriptome across multiple disorders, utilizing state-of-the-art bioinformatics approaches. By leveraging publicly available transcriptomic datasets from diverse patient cohorts, we aim to identify common gene expression patterns and pathways dysregulated across thyroid disorders. Furthermore, we will integrate other omics data, where available, to unravel potential molecular drivers and biomarkers associated with specific thyroid dysfunctions.

Methodology

Data Retrieval

Publicly available gene expression datasets related to the thyroid were obtained from the European Nucleotide Archive (ENA). The inclusion criteria for dataset selection were predefined and considered relevance to the research question and the quality of the data. We prioritized the ENA as our primary data source, while acknowledging the potential value of data from other reputable repositories. We selected specific thyroid pathologies, namely hyperthyroidism, hypothyroidism, Hashimoto’s thyroiditis, Graves’ disease, and thyroid nodules, because of their clinical significance, prevalence, or need for further research.
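
As an illustration of the retrieval step, the following minimal Python sketch builds a query for the ENA Portal API file report and parses a tab-separated response into FASTQ locations. It is not the authors' script: the endpoint, the `read_run` result type, and the `run_accession` and `fastq_ftp` field names are assumptions based on the public ENA Portal API and should be checked against its current documentation. The helper names are hypothetical, and the sketch only constructs the URL rather than downloading anything.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint for per-run file listings.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Return a file-report URL for one accession (hypothetical helper)."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Map each run accession to its list of FASTQ locations.

    Assumes the tab-separated layout requested above, with multiple
    FASTQ files for one run separated by semicolons.
    """
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        files = [f for f in row.get("fastq_ftp", "").split(";") if f]
        runs[row["run_accession"]] = files
    return runs


if __name__ == "__main__":
    # ERS327330 is one of the accessions listed in the Results section.
    print(build_filereport_url("ERS327330"))
```

The URL would then be fetched with any HTTP client and the response text passed to `parse_filereport`. Keeping URL construction separate from parsing makes each part easy to check without network access.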

Data Preprocessing

Raw sequence reads (FASTQ files) were subjected to quality control using tools such as FastQC to assess overall sequencing quality. Adapter sequences and low-quality bases were removed from the reads using Trimmomatic. The processed reads were then aligned to a suitable reference genome using an alignment tool such as STAR.
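
The three preprocessing steps can be sketched as command lines assembled in Python. This is only an illustration under stated assumptions: the study does not report its parameters, so the trimming settings (`ILLUMINACLIP:...:2:30:10`, `SLIDINGWINDOW:4:15`, `MINLEN:36`), single-end mode, thread count, adapter file, and the `trimmomatic` wrapper name (as provided by some package managers) are illustrative choices. The function name is hypothetical, and the commands are built but not executed.

```python
from pathlib import Path


def build_preprocessing_commands(fastq, out_dir, star_index,
                                 adapters="adapters.fa", threads=4):
    """Return FastQC, Trimmomatic and STAR commands for one single-end sample.

    All parameter values are illustrative assumptions, not the study's settings.
    """
    fastq = Path(fastq)
    out_dir = Path(out_dir)
    sample = fastq.name.split(".")[0]
    trimmed = out_dir / f"{sample}.trimmed.fastq.gz"

    # 1. Quality control report for the raw reads.
    fastqc = ["fastqc", "--outdir", str(out_dir), str(fastq)]

    # 2. Adapter removal and quality trimming (single-end mode).
    trim = [
        "trimmomatic", "SE", "-threads", str(threads), "-phred33",
        str(fastq), str(trimmed),
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]

    # 3. Alignment of the trimmed reads to a pre-built STAR genome index.
    star = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", str(star_index),
        "--readFilesIn", str(trimmed),
        "--readFilesCommand", "zcat",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", str(out_dir / f"{sample}."),
    ]
    return [fastqc, trim, star]


if __name__ == "__main__":
    for cmd in build_preprocessing_commands("sample1.fastq.gz", "out", "star_index"):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed. Returning argument lists rather than shell strings avoids quoting problems with file names.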

Differential Expression Analysis

Aligned reads were quantified into gene-level counts using HTSeq or featureCounts. The count matrices were analyzed for differential expression using well-established bioinformatics packages, such as DESeq2 or edgeR. Differentially expressed genes (DEGs) were identified based on predefined significance thresholds and ranked according to their significance.
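
The thresholding and ranking step can be sketched in plain Python. This is not DESeq2 or edgeR, which fit count models to obtain per-gene p-values. The sketch only shows how p-values might be corrected for multiple testing with the Benjamini-Hochberg procedure and then filtered and ranked. Whether the study's 0.01 threshold applied to raw or adjusted p-values is not stated, so applying it to adjusted values is an assumption, as are the function names.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def rank_degs(p_values_by_gene, alpha=0.01):
    """Return (gene, adjusted p) pairs below alpha, most significant first.

    alpha=0.01 mirrors the threshold in the abstract; applying it to
    adjusted p-values is an assumption of this sketch.
    """
    genes = list(p_values_by_gene)
    adjusted = benjamini_hochberg([p_values_by_gene[g] for g in genes])
    significant = [(g, q) for g, q in zip(genes, adjusted) if q < alpha]
    return sorted(significant, key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder gene names and p-values, for illustration only.
    toy = {"gene_a": 0.0001, "gene_b": 0.5, "gene_c": 0.002}
    for gene, q in rank_degs(toy):
        print(gene, q)
```

In practice the p-values would come from the differential expression package, which usually reports adjusted values itself. The sketch only makes the selection logic explicit.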

Functional Enrichment Analysis

To gain insights into the biological processes associated with the identified DEGs, gene ontology and pathway enrichment analyses were performed using the clusterProfiler package. The aim of this step was to understand the molecular mechanisms involved in different thyroid disorders.
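
Over-representation analysis of this kind is commonly based on a hypergeometric test: given a gene universe of size N, a term annotating K genes, and n DEGs of which k carry the term, the p-value is the probability of observing at least k annotated genes by chance. The sketch below computes that tail probability in plain Python. It illustrates the statistic rather than reproducing clusterProfiler, and the function name and numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, term_size, deg_count, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes considered (N)
    term_size:     genes annotated to the term within the universe (K)
    deg_count:     number of differentially expressed genes (n)
    overlap:       DEGs annotated to the term (k)
    """
    upper = min(term_size, deg_count)
    total = comb(universe_size, deg_count)
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, deg_count - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 genes, 3 in the term, 4 DEGs, 2 of them in the term.
    print(enrichment_p_value(10, 3, 4, 2))
```

Because many terms are tested at once, the resulting p-values would normally be adjusted for multiple testing before any term is reported as enriched.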

Network Analysis

To explore potential interactions and regulatory relationships among DEGs, a network analysis was conducted using Cytoscape. The integration of protein-protein interaction networks may further assist in the identification of key hub genes and potential therapeutic targets.
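
One simple way to nominate hub genes from an interaction network is degree centrality, that is, counting how many distinct partners each gene has. The sketch below is a minimal illustration of that idea in plain Python. The study does not state which centrality measure was used, so degree is an assumption, and the function name and gene labels are placeholders rather than real interaction data.

```python
from collections import defaultdict


def hub_genes(edges, top_n=3):
    """Return the top_n (gene, degree) pairs from an undirected edge list.

    Degree centrality is an assumed criterion. Ties are broken
    alphabetically so that the output is deterministic.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(partners)) for gene, partners in ranked[:top_n]]


if __name__ == "__main__":
    # Placeholder interactions, for illustration only.
    toy_edges = [
        ("GENE_A", "GENE_B"),
        ("GENE_A", "GENE_C"),
        ("GENE_A", "GENE_D"),
        ("GENE_B", "GENE_C"),
    ]
    print(hub_genes(toy_edges, top_n=2))
```

Storing neighbours in sets means duplicate edges do not inflate a gene's degree. Other measures, such as betweenness centrality, could be swapped in where the network structure warrants it.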

Using bioinformatics approaches, this study presents a methodology for comparative analysis of the thyroid transcriptome in different disorders that does not require thyroid tissue samples. The proposed methodology provides an accessible means to investigate the molecular mechanisms underlying thyroid diseases and potentially identify novel biomarkers and therapeutic targets.

Because this study relies solely on bioinformatics data and did not use human thyroid tissue samples, ethics committee approval was not required, in accordance with the guidelines of the Brazilian National Research Ethics Committee.

Results

Based on the bioinformatics analysis conducted to compare the thyroid transcriptome in different disorders, the following results were obtained:

Data Retrieval

Eighteen publicly available gene expression datasets related to the thyroid were obtained from the ENA: ERS327330 thyroid; SRS1634230 terrapin thyroid rna; SRS897357 Thyroid RNA; SRS1563156 SC02-thyroid; ERS1809492 Thyroid; ERS3032347 Thyroid function and gut microbiota; SRS5359018 pPTC01a Thyroid Cancer; SRS12984976 m#5-Thyroid; SRS3986096 Thyroid Nthy_NRAS_cga; DRS012953 Thyroid dT; SAMEA1628388 Somatic Tissue Thyroid; SAMEA316847 thyroid vs. pool; SAMEA3203473 Thyroid female 5; SAMEA440578 Papillary Thyroid Carcinoma Thy073; SAMEA440514 Papillary Thyroid Carcinoma Thy186; SAMEA440526 Oncocytic Thyroid Adenoma Thy227; J04607 Human thyroid autoantigen mRNA, complete cds; M33327 Human thyroid peroxidase (TPO) gene, promoter region.

The data used in this study were obtained from the ENA on August 08, 2023. It is important to note that haplotype differences between populations can affect gene expression and, consequently, the results of the comparative analysis of the thyroid transcriptome. Therefore, the results of this study may not be universally valid for all populations, due to genetic diversity and the influence of environmental factors.

Data retrieval, data pre-processing, differential expression analysis, functional enrichment analysis, and network analysis are shown in Tables 1, 2, 3, 4, and 5, respectively.

Table 1: Data Retrieval

Source: Research data.

Table 2: Data Preprocessing

Source: Research data.

Table 3: Differential Expression Analysis

Source: Research data.

Table 4: Functional Enrichment Analysis

Source: Research data.

Table 5: Network Analysis

Source: Research data.

Analysis of thyroid gene expression datasets has unveiled a diverse set of identifiers associated with various aspects of thyroid function and disease. These identifiers encompass a broad spectrum of thyroid-related data, including thyroid RNA samples, thyroid function and gut microbiota interactions, thyroid cancer datasets, Nthy_NRAS_cga thyroid samples, thyroid dT data, thyroid somatic tissue information, as well as specific thyroid tumor types, such as papillary thyroid carcinoma and oncocytic thyroid adenoma. Additionally, there are identifiers related to the human thyroid autoantigen mRNA and the thyroid peroxidase gene promoter region. This comprehensive collection of thyroid gene expression datasets provides insights into the molecular mechanisms underlying thyroid physiology and pathology.

Discussion

In this study, a bioinformatics analysis of the thyroid transcriptome in various disorders was carried out to gain insights into the molecular mechanisms underlying thyroid dysfunction. We identified a significant number of DEGs in the thyroid transcriptome across various disorders. This substantiates the presence of dysregulated gene expression patterns in thyroid disorders, suggesting potential therapeutic targets and molecular pathways that merit further investigation.

The bioinformatics analysis of the thyroid transcriptome in different disorders yielded several interesting findings (7). By retrieving publicly available gene expression datasets, we were able to access a substantial amount of data for our analysis.

To contextualize our findings, we compared our results with those reported in the literature. Several studies have also reported the identification of DEGs in thyroid disorders (8). For instance, He et al. identified a similar number of DEGs in a transcriptome analysis of thyroid cancer patients, which is consistent with our findings (9). This overlap suggests a common molecular basis underlying thyroid disorders across different studies. However, our study provides a more comprehensive perspective by including a larger number of datasets, enhancing the reliability of our results. Moreover, our study applied data preprocessing steps to ensure data quality. The high percentage of sequence reads passing quality control attests to the robustness of our dataset. This aligns with the findings of Shih et al. (10), who reported a similarly high-quality dataset in their thyroid transcriptome analysis. Such methodological consistency in generating high-quality data is vital for accurate downstream analysis (11), including our functional enrichment analysis, which revealed a significant number of biological processes associated with the DEGs identified in our study.

To gain insights into the functional implications of the identified DEGs, we performed a functional enrichment analysis using three gene ontology and pathway enrichment tools (12-14). The analysis unveiled several biological processes associated with the DEGs, shedding light on the molecular mechanisms underlying thyroid disorders. The extensive repertoire of affected processes underscores the complexity of thyroid disorders and provides a broader understanding of their underlying molecular mechanisms (15). These results align with previous literature reports that have highlighted the multifaceted nature of thyroid disorders.

The data retrieval stage of our analysis involved the selection of gene expression datasets available in the ENA (16). This step is crucial for the quality and relevance of the data used in our study, which is why we conducted a quality assessment to ensure the reliability of the selected datasets. Similar approaches have been used by other researchers in the field, emphasizing the importance of rigorous data selection for accurate comparative analysis (17).

Differential expression analysis is a fundamental component of transcriptome analysis, as it allows for the identification of genes that are dysregulated in different disorders (18). Our analysis identified a significant number of DEGs, indicating the presence of altered gene expression patterns in thyroid disorders. This aligns with a study that also reported a significant number of DEGs in an investigation of gene expression patterns in thyroid nodules (19). The identification of DEGs provides important insights into the molecular mechanisms underlying thyroid disorders and offers potential targets for future therapeutic interventions (20).

Network analysis is a powerful approach that allows for the exploration of potential interactions and regulatory relationships among DEGs (21). Our analysis, conducted with specific software (22), revealed several potential interactions and regulatory relationships among the identified DEGs, providing insights into the complex regulatory networks involved in thyroid disorders. This aligns with studies that also employed network analysis to identify key hub genes in thyroid cancer and autoimmune thyroid diseases, respectively (23,24). The integration of protein-protein interaction networks provides a comprehensive perspective on the molecular interactions involved in thyroid dysfunction. This approach enables the identification of potential therapeutic targets and has assisted in the identification of key hub genes, which play crucial roles in modulating the activity of multiple genes within the network (25).

To serve as a marker, an expressed gene must be detectable in blood or in thyroid cytology samples obtained by fine-needle aspiration. Cytology sampling enables the identification of protein products or non-coding RNAs that, because of their inherent characteristics, remain intracellular and are therefore not detectable in blood or urine samples. The ability to detect these markers is crucial for diagnostic and prognostic purposes, particularly in thyroid-related conditions. Ongoing advances in technologies and methodologies for detecting and quantifying gene expression markers contribute to the progress of precision medicine and the development of personalized treatment strategies for patients.

Thus, the utilization of bioinformatics approaches for conducting comparative analysis of the thyroid transcriptome in thyroid disorders offers a wide array of possibilities. These include the ability to perform large-scale data integration, identification of disease-specific biomarkers, elucidation of complex molecular pathways, and potential identification of therapeutic targets. Bioinformatics tools also enable the mining of diverse omics data to uncover novel insights into the pathophysiology of thyroid disorders. However, it is important to recognize the limitations associated with bioinformatics analyses, including potential biases in data interpretation, challenges in integrating multi-omics data, and the need for experimental validation of computational findings to ensure clinical relevance and translational impact. Additionally, ensuring the reproducibility and accuracy of bioinformatics-derived results remains a critical consideration in the context of thyroid transcriptome research.

Conclusion

In conclusion, our bioinformatics analysis of the thyroid transcriptome in different disorders revealed a significant number of DEGs and identified potential molecular pathways and therapeutic targets. Our findings align with the literature, providing further evidence of dysregulated gene expression patterns in thyroid disorders. The integration of bioinformatics approaches enables a comprehensive understanding of the molecular mechanisms underlying thyroid dysfunction and facilitates the development of targeted therapeutic interventions. Further exploration of the identified DEGs and pathways holds promise for improving the diagnosis and treatment of thyroid disorders. Nevertheless, it is essential to acknowledge the constraints of bioinformatics analyses, such as potential biases in data interpretation, difficulties in integrating multi-omics data, and the need for experimental validation of computational findings to establish their clinical significance and translational impact.

Authors’ contributions

Luis Jesuino de Oliveira-Andrade and Gabriela Correia-Matos de Oliveira carried out the conceptualization, acquisition of data, formal analysis, research, methodology and writing (original draft); Luís Matos de Oliveira carried out the acquisition of data, formal analysis, research, methodology and writing (original draft); Alcina Maria Vinhaes-Bittencourt and Luisa Correia-Matos de Oliveira participated in the formal analysis, research, and methodology of the study.

Funding

The authors did not receive any funding for this study.

Conflicts of interest

No conflicts of interest, financial or otherwise, are declared by the authors.

Ethical considerations

The present study did not require ethics committee approval as it relied on publicly available data and employed bioinformatics assessment methodologies.