CRISPR CAS9 – BIOINFORMATIC TOOL

Our world is built on biology and once we begin to understand it, it then becomes a technology.

 – Ryan Bethencourt

ABSTRACT

American scientist Jennifer A. Doudna and French scientist Emmanuelle Charpentier, the sixth and seventh women to win the Nobel Prize in Chemistry, co-developed CRISPR/Cas9 gene editing. CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats. It is a family of DNA sequences that some bacterial species use as part of an antiviral defence mechanism.

CRISPR-Cas9 has a strong relationship with bioinformatics. CRISPR-Cas9 is a genome editing tool that allows researchers to make precise changes to DNA sequences. To use this tool effectively, researchers need to design guide RNAs that can recognize and target specific DNA sequences. Bioinformatics tools and techniques are essential for designing these guide RNAs and analysing the data generated by CRISPR-Cas9 experiments.
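To make the guide-RNA design idea concrete, here is a minimal Python sketch (my own illustration, not code from any particular tool) that scans one strand of DNA for candidate SpCas9 targets: a 20-nt protospacer immediately followed by an NGG PAM. Real design tools such as CHOPCHOP or Benchling also search the reverse strand and score every candidate for efficiency and off-target risk.

```python
import re

def find_grna_candidates(dna: str, protospacer_len: int = 20):
    """Scan one DNA strand for candidate SpCas9 guide targets.

    A candidate is any stretch of `protospacer_len` bases immediately
    followed by an NGG PAM on the given strand.
    """
    dna = dna.upper()
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len
    candidates = []
    # The lookahead keeps overlapping matches, since target sites can overlap.
    for m in re.finditer(pattern, dna):
        candidates.append(
            {"position": m.start(), "protospacer": m.group(1), "pam": m.group(2)}
        )
    return candidates

if __name__ == "__main__":
    toy_seq = "ATGCGTACGTTAGCCTAGGCTAGCTAGGCGTACGATCGATCGGTAGCTAGCAGG"
    for c in find_grna_candidates(toy_seq):
        print(c)
```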

HOW DOES CRISPR-CAS9 WORK?

CRISPR is a natural process that functions as a bacterial immune system against invading viruses. It relies on two main components. The first is a set of short, repetitive DNA sequences called CRISPR. The second is the Cas, or CRISPR-associated, proteins, which chop up DNA like molecular scissors.

https://www.labiotech.eu/in-depth/crispr-cas9-review-gene-editing-tool/

When a virus enters a bacterium, Cas proteins cut out a segment of the viral DNA and stitch it into the bacterium’s CRISPR region, recording the chemical code of the infection. Those viral codes are then copied into short pieces of RNA. This RNA binds to a special protein called Cas9. The resulting complex acts like a sentry, latching onto free-floating genetic material and searching for a match to the virus. If the virus invades again, the complex recognises it immediately and Cas9 destroys the viral DNA. Many prokaryotes, including halophiles, E. coli, and Clostridium, have this type of defence mechanism.

CRISPR CAS9 IN MICE:

The CRISPR/Cas9 system simplifies the entire process of creating knockout mouse models. This technology is used to knock out or modify DNA in research mice to study disease phenotypes and develop new treatments. A knockout mouse is a genetically modified mouse in which researchers have inactivated, or “knocked out”, an existing gene by replacing it or disrupting it with an artificial piece of DNA. Using CRISPR we can mutate several suspected cancer genes simultaneously in the somatic cells of adult mice. 

CRISPR knock-ins have also corrected disease-causing gene defects in adult mice, such as the mutations that cause haemophilia and sickle cell anaemia. CRISPR was previously used to treat progeria and muscular dystrophy in mice. 

https://www.semanticscholar.org/paper/Hutchinson-Gilford-Progeria-Syndrome-Ibrahim/890309e16142b82270411b5b7c67e034210165a0

PROGERIA IN MICE

DISEASES THAT COULD BE TREATED USING CRISPR-CAS9

CRISPR technology could cure many human hereditary diseases, such as haemophilia, β-thalassemia, cystic fibrosis, Alzheimer’s, Huntington’s, Parkinson’s, tyrosinemia, Duchenne muscular dystrophy, Tay-Sachs, and fragile X syndrome. CRISPR edits a gene by changing the DNA from a harmful variant to a healthy one. In addition, CRISPR is now being developed as a rapid diagnostic.

https://www.sciencedirect.com/science/article/pii/S1525001620304858

CRISPR CAS9 BIOINFORMATICS

In recent years, the CRISPR-Cas system has also been engineered to facilitate targeted gene editing in eukaryotic genomes. Bioinformatics played an essential role in the detection and analysis of CRISPR systems, and bioinformatics-based efforts continue to push the field of CRISPR-Cas research further. CRISPR-Cas design tools are software platforms that facilitate the design of guide RNAs (gRNAs) for the CRISPR/Cas gene-editing system. The performance of CRISPR/Cas relies on a well-designed single-guide RNA (sgRNA), so many bioinformatic tools have been developed to assist sgRNA design. These tools vary in their design specifications, scoring parameters, supported genomes, and so on.

Bioinformatics can help identify potential off-target effects of CRISPR-Cas9, which is critical for ensuring the specificity of the tool. Additionally, bioinformatics can be used to analyse the genomic data generated by CRISPR-Cas9 experiments to identify the effects of specific genetic modifications.
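As a toy illustration of off-target screening, the brute-force sketch below slides a guide along a sequence and counts mismatches at every position. Production tools (Cas-OFFinder-style searches, for example) instead use indexed genomes and PAM-aware, position-weighted scoring; this only shows the core idea.

```python
def mismatches(guide: str, site: str) -> int:
    """Count base mismatches between a guide and an equal-length site."""
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))

def scan_offtargets(guide: str, genome: str, max_mismatches: int = 3):
    """Naive scan: report every window within `max_mismatches` of the guide."""
    k = len(guide)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        mm = mismatches(guide, window)
        if mm <= max_mismatches:
            hits.append((i, window, mm))
    return hits

# Toy example: one perfect site and one single-mismatch site are reported.
print(scan_offtargets("GATTACA", "TTGATTACAGGGATAACAGATCACA", max_mismatches=2))
```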

CONCLUSION

Compared with traditional gene-editing technologies, CRISPR-Cas9 offers higher editing efficiency, lower off-target effects, and no requirement for DNA integration, making it close to an ideal gene-editing technology. It can modify DNA with great precision, is known for its simplicity and efficiency, and reduces the time required to modify a target DNA.

Overall, the use of CRISPR-Cas9 in research is highly dependent on bioinformatics to design guide RNAs, analyse data, and ensure the specificity and safety of the tool.

Written by: Sathiga Devi P, 1st-year B.Tech Biotechnology


Preserving the future: Vital role of bioinformatics in the Arctic seed vault

“Plant conservation is not just about saving species from extinction, but about ensuring that the future generations have access to benefits provided by a rich and diverse plant life.”  

– Hans-Dieter Sues

https://images.app.goo.gl/C51a669cMAk5JatZ6

Preserving seeds has been practiced by humans for thousands of years; earlier, people saved seeds for future cropping seasons in earthen pots. Today, with the advent of modern agricultural and bioinformatics technologies, the Government of Norway and the Global Crop Diversity Trust have established the Svalbard seed bank in Norway, which holds millions of seed samples from all over the world. These seeds represent critical resources for ensuring food security and for studying the genetic diversity of plants. The Arctic seed vault works as a backup repository for seed banks from all over the world, securing them against natural and man-made disasters.

Various forms of bioinformatics technology are used in the Svalbard seed vault:

1. Database and record-keeping systems:

The large collection of seeds stored in the vault is managed by these systems. They enable efficient tracking of seed samples, recording their origin, genetic diversity, and germination rates, which are all crucial pieces of information for preservation and conservation efforts.
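As a purely hypothetical illustration of what one record in such a system might hold (the field names are invented for this sketch, not the vault’s actual schema), a minimal Python version could look like this:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SeedSample:
    """One accession in a hypothetical seed-bank tracking system."""
    accession_id: str
    species: str
    country_of_origin: str
    deposit_date: date
    germination_rate: float  # fraction germinating in the latest viability test

inventory = [
    SeedSample("NOR-0001", "Triticum aestivum", "Norway", date(2008, 2, 26), 0.94),
    SeedSample("IND-0042", "Oryza sativa", "India", date(2015, 7, 3), 0.88),
]

# Simple query: flag accessions whose last germination test fell below 90%.
flagged = [s.accession_id for s in inventory if s.germination_rate < 0.90]
print(flagged)  # ['IND-0042']
```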

2. Phylogenetic analysis:

This technique is used to determine the relationships between plant species and the evolutionary history of species over time. It helps us predict how plant species would respond to changing environmental conditions in the future and develop strategies to preserve them.

3. Genetic sequencing and analysis:

This technique is used to study the DNA of seeds stored in the vault, which helps in determining their potential to adapt to environmental changes and in breeding programmes.

4. Seed viability analysis:

This is the process of determining the health and viability of seeds stored in the vault, which indicates the potential of the stored seeds to germinate when needed in the future.
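One simple way to picture this analysis: germinate a sample of seeds from a lot and estimate the lot’s viability with a confidence interval. A rough Python sketch, assuming a plain normal (Wald) approximation rather than whatever protocol the vault actually follows:

```python
import math

def germination_estimate(germinated: int, tested: int, z: float = 1.96):
    """Observed germination rate with an approximate 95% confidence interval."""
    p = germinated / tested
    half_width = z * math.sqrt(p * (1 - p) / tested)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

rate, low, high = germination_estimate(182, 200)
print(f"viability ~ {rate:.1%} (95% CI {low:.1%}-{high:.1%})")
```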

Some other techniques, such as gene expression analysis and genotyping, are also part of the bioinformatics toolkit used in the Arctic seed vault.

Advantages of bioinformatics tools in seed banking:

1. Increased accuracy and efficiency of seed preservation and characterisation:

Bioinformatics helps analyse large amounts of genetic information from seed samples rapidly and accurately, so that seed characteristics can be determined quickly.

2. Better management of seed bank data:

Bioinformatics is used to store, manage, and analyse seed bank data, including information on seed samples, storage conditions, and seed viability. This ensures the reliability of the data, improves the management of the seed bank, and also helps in documenting genetic diversity and improving seed preservation techniques.

Disadvantages of bioinformatics in seed banking:

While bioinformatics does have several advantages in seed banking, there are also some potential disadvantages:

1. Cost:

Bioinformatics tools are highly expensive; a seed bank needs to invest heavily in the software and hardware required to analyse data.

2. Lack of technical expertise:

The field of bioinformatics requires specialised technical skills to perform certain analyses, so a seed bank needs to hire such experts.

3. Data security and privacy concerns:

Seed banks store large amounts of confidential, sensitive genetic information that could be accessed by unauthorized users. Seed banks therefore carry privacy and security risks that need to be managed carefully.

Even though bioinformatics has many advantages in seed banking, there are also certain disadvantages which make it risky at times.

CONCLUSION:

The use of bioinformatics in the Arctic seed vault has several advantages and disadvantages, as discussed above. Despite these challenges, bioinformatics is crucial for preserving and managing the seed samples stored in the Svalbard seed vault, which continues to serve as an important global resource for plant diversity.

https://images.app.goo.gl/ZdFhXxZhPd7RYiHJA

Written by: C. Renganyagi, 2nd year B.Tech Biotechnology.

Unleashing Innovation: AI’s Role in Agricultural Technology

“In the future, biotechnology could enable personalized medicine, tailored to an individual’s genetic makeup, leading to more effective treatments and better patient outcomes.”

Dr. Emma Roberts

ABSTRACT

In recent years, the integration of Artificial Intelligence (AI) in the field of biotechnology has emerged as a transformative force, driving innovation and efficiency across various facets of research and development. This review synthesizes the current landscape of AI applications within biotechnology, elucidating its profound impact on diverse areas, including drug discovery, genomics, diagnostics, and personalized medicine. We delve into the pivotal role of machine learning algorithms and natural language processing in interpreting complex biological datasets, facilitating the identification of novel biomarkers, drug candidates, and therapeutic targets. The adoption of AI technologies in biotechnology is not without challenges, and we discuss the current limitations and ethical considerations associated with these advancements. The need for robust data governance, model interpretability, and ongoing collaboration between computational scientists and biotechnologists is emphasized to ensure responsible and effective implementation.

With recent advancements in the accessibility of data and better techniques, it is now possible to perform the same experiments at a cheaper and much faster rate than a decade ago. In the biotechnology sector alone, sequencing and other high-throughput instrumentation technologies have been developing at an exponential rate that has outpaced even Moore’s Law. The enormous amount of data generated and stored in the biotechnology area has created a range of new opportunities for researchers, while also enhancing companies’ growth potential. AI models build on the basic principles of statistical methods long used in biotechnology, while delivering much more advanced and accurate results.

AI in Agricultural Technology

Around 40% of the world’s population is employed in agriculture, including both small-scale and large-scale farming activities. AI plays a crucial role in agricultural biotechnology by enhancing various aspects of research, development, and practical application, including genetic analysis and modification, crop yield prediction, disease detection and management, precision agriculture, climate resilience, bioprocessing and metabolic engineering, crop breeding, data integration and analysis, and phenotyping and trait characterization.

Several machine learning methods can be applied in agriculture, including regression, clustering, Bayesian models, instance-based models, decision trees, artificial neural networks, support vector machines, and ensemble learning; a small worked example follows below. With the recent advancements in AI-equipped drones, crop production has increased massively in a shorter span of time compared to traditional methods of agriculture. There has also been a recent trend of introducing automated robotics to reduce mechanical requirements, speed up the process, and generate a larger output.
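As a small, self-contained illustration of one of these methods, here is a sketch of an ensemble regressor predicting yield from weather and soil features, using scikit-learn on synthetic data (the features and yield response are invented for the example; this is not a validated agronomic model):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic fields: rainfall (mm), mean temperature (deg C), soil nitrogen (kg/ha)
X = np.column_stack([
    rng.uniform(300, 1200, n),
    rng.uniform(15, 35, n),
    rng.uniform(20, 120, n),
])
# Made-up yield response plus noise, just so the model has something to learn.
y = 0.004 * X[:, 0] - 0.08 * (X[:, 1] - 25) ** 2 + 0.02 * X[:, 2] + rng.normal(0, 0.3, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Mean absolute error:", mean_absolute_error(y_test, model.predict(X_test)))
```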

Automation in the agricultural sector requires accurate information about past, present, and future data and trends to improve yield. To achieve this, the concept of an Agricultural Data Space (ADS), a shared framework for collecting and exchanging agricultural data, has been proposed; it uses simple quantifiable methods to help understand the activities occurring biologically. Since the advent of computer vision, deep learning, and machine learning, non-invasive imaging procedures with better-quality picture resolution have become possible. Besides performing better on the external factors that describe the phenotype of a crop, deep learning and neural network methods also make it easy to obtain genotypic information. By combining the tools used in genotypic and phenotypic analysis, crop improvement can be accelerated rapidly. Genomic analysis of crops also helps in identifying adaptive crops specific to a particular habitat, which has produced a significant rise in crop yield by matching specific crops to a compatible climate.

Some of the computer-based tools include thermal imaging, chlorophyll fluorescence, digital imaging, spectroscopic imaging, imaging sensors, growth chambers, and data management and analysis software. Apart from these existing techniques, there has also been a rise in the use of cyberinfrastructure: an integrated system of hardware, software, networks, data, and human expertise designed to support advanced computational and data-intensive research and development. The term is often used in the context of scientific and engineering research that relies on high-performance computing, large-scale data storage and management, advanced networking, and other digital technologies. With the incorporation of robotics and visual sensors, it has also become possible to focus on sustainable agriculture and to develop models for tissue culture.

CONCLUSION

In conclusion, artificial intelligence (AI) stands as a transformative force within agricultural technology, revolutionizing traditional farming practices and enhancing efficiency, productivity, and sustainability across the entire food supply chain. Through advanced data analytics, machine learning algorithms, and robotics, AI enables farmers to make data-driven decisions, optimize resource allocation, and mitigate risks associated with weather, pests, and disease. Moreover, AI-driven solutions facilitate precision agriculture, enabling farmers to tailor interventions at a granular level, thereby conserving resources and minimizing environmental impact. As AI continues to evolve, its integration into agricultural technology holds immense promise for addressing the growing challenges of feeding a rapidly expanding global population while fostering resilience in the face of climate change and resource constraints. However, realizing the full potential of AI in agriculture requires ongoing collaboration between technologists, researchers, policymakers, and farmers to ensure equitable access, ethical deployment, and sustainable outcomes for all stakeholders involved.

– Zainab Zafar

3rd Year, B.Tech Biotechnology.

What is TCGA?

“Bioinformatics is still in its early stages, but its potential is enormous. It will play a key role in developing new drugs, diagnosing diseases and understanding the very nature of life.”

- Mark Zuckerberg

Introduction

TCGA, a groundbreaking collaborative effort initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) in the United States, was designed to comprehensively characterize genomic alterations in diverse cancer types. The primary objectives were to advance our understanding of the molecular underpinnings of cancer, pinpoint potential therapeutic targets, and enhance cancer diagnosis and treatment. Commencing in 2005, the project concluded its data generation phase in 2013. TCGA encompassed the collection and analysis of genomic data from numerous cancer patients across a variety of cancer types, including, but not limited to, breast, lung, ovarian, colorectal, and brain cancers. The Cancer Genome Atlas (TCGA) involved a systematic and multi-step process to comprehensively characterize the genomic alterations in various types of cancer.

Here is an overview of the key steps in the TCGA process:

Sample collection

The primary objective of TCGA was to ensure a comprehensive representation of genetic diversity within each type of cancer. To achieve this, the collection of tumor tissue samples was undertaken, along with corresponding normal tissue samples whenever feasible. This meticulous process occurred predominantly during surgical procedures or biopsies, with explicit consent obtained from individuals diagnosed with cancer. The careful and deliberate sampling strategy was instrumental in capturing the intricate genetic nuances inherent in both cancerous and normal tissues.

DNA & RNA extraction

Genomic DNA, containing an organism’s entire set of genes, was meticulously isolated, serving as the foundation for comprehensive analyses. These analyses included whole-genome sequencing to identify genetic mutations and alterations, as well as DNA copy number analysis to assess variations in specific DNA segments, revealing insights into genomic instability often observed in cancer. Concurrently, RNA, which is responsible for transferring genetic information from DNA for protein synthesis, was also extracted. RNA played a crucial role in gene expression profiling, providing insights into active genes and variations in their activity levels between cancerous and normal tissues. This detailed molecular information proved pivotal for understanding the intricate molecular mechanisms underlying cancer development and progression.

Genomic analysis

1. Whole genome sequencing

Whole-genome sequencing was employed to scrutinize the genomic DNA comprehensively, aiming to discern genetic alterations encompassing mutations, copy number variations, and structural rearrangements. This crucial step sought to furnish a thorough and inclusive depiction of the genetic landscape inherent in the cancer samples.

2. RNA sequencing

The process of RNA sequencing was executed to quantify gene expression levels. This pivotal stage served the purpose of identifying fluctuations in gene expression, providing crucial insights into how genetic alterations influence the activity of specific genes and pathways. This analytical step played a vital role in unraveling the intricate dynamics of gene regulation associated with cancer development and progression.
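The core comparison behind expression profiling can be sketched in a few lines: average each gene’s counts in tumor versus normal samples and compute a log2 fold change. This toy Python example deliberately skips normalization and statistical testing, which real pipelines such as DESeq2 or edgeR handle properly:

```python
import numpy as np

# Toy count matrix: rows = genes, columns = samples (3 tumor, then 3 normal).
counts = np.array([
    [250, 310, 280, 40, 35, 50],   # gene strongly up in tumor
    [ 60,  55,  70, 65, 58, 62],   # essentially unchanged gene
])
tumor, normal = counts[:, :3], counts[:, 3:]

# A pseudocount of 1 avoids log(0) for genes with zero counts.
log2fc = np.log2(tumor.mean(axis=1) + 1) - np.log2(normal.mean(axis=1) + 1)
print(np.round(log2fc, 2))  # about [2.7, 0.0] for this toy data
```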

Epigenetic analysis

TCGA’s exploration of epigenetic modifications involved a detailed examination of DNA methylation patterns, focusing on the addition of methyl groups to cytosine bases. This process aimed to uncover how changes in DNA methylation influence gene expression, playing a crucial role in cancer development. The analysis targeted the identification of specific methylation signatures associated with various cancer types, offering insights into the epigenetic landscape and molecular heterogeneity in cancer. These findings not only served as diagnostic markers but also provided potential targets for the development of precision therapies tailored to individual patients’ unique epigenetic profiles. TCGA’s epigenetic analysis significantly advanced our understanding of the intricate interplay between genetics and epigenetics in cancer biology.
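A common summary statistic in such methylation analyses is the beta value of a CpG site, computed from the methylated (M) and unmethylated (U) probe intensities. A minimal sketch, assuming the standard intensity offset used on Illumina arrays:

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Methylation beta value: M / (M + U + offset), ranging from ~0 to ~1.

    The offset (100 by convention on Illumina arrays) stabilizes the
    ratio when both intensities are low.
    """
    return methylated / (methylated + unmethylated + offset)

print(beta_value(4800, 1200))  # heavily methylated CpG, ~0.79
print(beta_value(300, 5200))   # mostly unmethylated CpG, ~0.05
```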

Proteomic and clinical data collection

The integration of proteomic data within TCGA signified a holistic approach, providing a deeper understanding of the molecular landscape by analyzing protein expression levels. Proteins, being the effectors of biological functions, offer a dynamic perspective on cellular activity and contribute valuable information to complement the genomic findings. Additionally, the incorporation of comprehensive clinical information added contextual depth to the dataset, encompassing diverse aspects such as patient characteristics, treatment responses, and survival outcomes. This amalgamation of proteomic and clinical data facilitated a more nuanced exploration of the intricate relationship between molecular alterations and clinical phenotypes in cancer.

DATA – Integration, Analysis, Sharing and Accessibility

The vast amount of data generated from genomic, epigenomic, proteomic, and clinical analyses was integrated into a comprehensive dataset. Bioinformatics tools and computational methods were employed to analyze and interpret the complex relationships between genetic alterations and clinical characteristics. TCGA adopted an open-access philosophy, making its data publicly available to the research community. This approach facilitated collaboration and allowed researchers worldwide to explore, analyze, and build upon the TCGA dataset.

Bioinformatics support

Acknowledging the intricacies involved in the analysis of extensive genomic datasets, TCGA strategically instituted Genomic Data Analysis Centers (GDACs) as integral components of its framework. These specialized centers were assigned the crucial role of developing and implementing advanced bioinformatics tools and pipelines tailored specifically for processing and analyzing the substantial volume of genomic data generated throughout the project. The primary objectives of GDACs encompassed a diverse range of responsibilities.

Firstly, they assumed a pivotal role in upholding the quality, consistency, and reliability of the genomic data. This encompassed the implementation of rigorous quality control measures, addressing technical variations, and harmonizing diverse datasets to preserve the coherence of the information gathered from varied sources.

Secondly, GDACs were instrumental in spearheading the development of sophisticated bioinformatics tools and analytical pipelines. Crafted to navigate the complexities inherent in diverse genomic datasets, these tools enabled researchers to distill meaningful insights from the abundant information supplied by TCGA. The pipelines spanned various stages, from initial raw data preprocessing to advanced statistical analyses, thereby facilitating the identification of genetic alterations, biomarkers, and potential therapeutic targets.

In addition, GDACs played an active role in disseminating knowledge and expertise throughout the research community. Through engagement in training and collaboration initiatives, they ensured that researchers globally could adeptly navigate and leverage the TCGA data for their own investigations. This collaborative ethos significantly augmented the overall impact of TCGA, nurturing a community of researchers equipped with the indispensable tools and knowledge essential for advancing cancer genomics research.

Conclusion

In summary, the establishment of GDACs within TCGA underscored a commitment not merely to generating large-scale genomic data but also to providing the vital bioinformatics infrastructure necessary for transforming this data into meaningful and actionable insights within the realms of cancer research and personalized medicine. Findings from TCGA analyses were published in scientific journals, contributing to the broader understanding of cancer biology. The knowledge generated by TCGA has had a lasting impact on cancer research and clinical practice.


– Shaistha Farheen. U. H
2nd year, BTech Biotechnology

BIOINFORMATICS FOR ONCOLOGY

“But biology and computer science – life and computation – are related. I am confident that at their interface great discoveries await those who seek them.”

Leonard Adleman

WHAT IS BIOINFORMATICS?

It is an interdisciplinary research area that deals with the computational management and analysis of biological information: genes, genomes, proteins, cells, ecological systems, and medical information, using methods from computer science, robotics, and artificial intelligence.

WHAT IS ONCOLOGY?

Oncology is a branch of medicine that deals with the study, treatment, diagnosis, and prevention of tumours.

Cancer is a disease with a long history, dating back to 1500 BCE, and is a common cause of patient death. Diagnosis and therapy have long been poor owing to variation in tumour types, disease duration, cell differentiation and origin, and a poor understanding of pathogenesis. In the last few years, however, advances in bioinformatics have brought great improvements in therapies and new drug development for cancer treatment. Sequencing of the genome has provided a major understanding of cancer and its effects. The continuous study of mutations, and of the networks between genes and proteins, generates bulk data that feed large databases, and these databases provide the infrastructure for data preparation and extraction.

HOW IS BIOINFORMATICS USEFUL FOR CANCER?

Clinical bioinformatics is an emerging science combining clinical informatics, bioinformatics, information technology, and mathematics. It helps focus bioinformatics methods on cancer by considering metabolism, signalling, communication, and proliferation. Computational approaches like meta-analysis have helped differentiate gene expression between normal and cancer cells (a minimal sketch of such a comparison follows below). High-throughput sequencing methods like NGS and RNA sequencing have helped cancer research by delivering fast sequencing at low cost. Multi-omics approaches use bioinformatics methods for high-level data management and integrate primary analysis of raw molecular profiling data to generate clinical reports. Bioinformatics tools such as Cytoscape and the web-based Gene Expression Profiling Interactive Analysis (GEPIA), together with databases such as the National Centre for Biotechnology Information (NCBI), the Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) database, are being used in cancer research and diagnosis to identify biomarkers by analysing entire gene expression profiles at the disease and genome level.
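The minimal sketch promised above: comparing one gene’s (log-scale) expression between tumor and normal samples with a Welch t-test in SciPy. The values are invented, and a real genome-wide analysis would repeat this per gene and correct for multiple testing (e.g. Benjamini-Hochberg):

```python
import numpy as np
from scipy import stats

# Invented log2-expression values for one gene across patients.
tumor = np.array([8.1, 7.9, 8.4, 8.6, 8.2])
normal = np.array([6.2, 6.5, 6.0, 6.4, 6.3])

# Welch's t-test does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(tumor, normal, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```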

BIOINFORMATICS TOOLS AND DATABASES USED:

The process of obtaining such data is called data mining, and it helps in gene finding, protein function domain detection, protein function inference, disease diagnosis, disease prognosis, and more. Oncomine, a cancer microarray database, is a mining platform that makes cancer microarray data available.

DATABASES CONTAINING DATA ON CANCER:

1. The Gene Expression Omnibus (GEO)

Created in 2000, this database stores and provides free access to high-throughput gene expression data. It helps researchers examine chromatin structure, genome methylation, and genome-protein interactions, and also provides free web-based tools to visualize and analyse the data. This database can be accessed through http://www.ncbi.nlm.nih.gov/geo/.

2. The Cancer Genome Atlas (TCGA):

It uses genome sequencing and bioinformatics to collect information on the genetic mutations that cause cancer. This supports a better understanding of the genetic basis of disease, using high-throughput genome analysis techniques that ease diagnosis and treatment. The project initially started by collecting 500 samples, characterized by different techniques such as gene expression profiling, copy number variation profiling, SNP genotyping, and exon sequencing. The project began in 2006 with the characterization of three cancer types: glioblastoma multiforme, lung squamous carcinoma, and ovarian serous adenocarcinoma. TCGA molecularly characterized over 20,000 primary cancers and matched normal samples spanning 33 cancer types. The purpose is to provide free access to data for the cancer research community. It can be accessed through https://www.cancer.gov/ccg/research/genome-sequencing/tcga.
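TCGA data is distributed today through the NCI Genomic Data Commons (GDC), which exposes a public REST API. Here is a minimal Python sketch that lists a few projects; the endpoint and field names are quoted from the GDC documentation as I recall it, so verify them against the current docs before relying on this:

```python
import requests

# NCI Genomic Data Commons public REST API (hosts TCGA data).
resp = requests.get(
    "https://api.gdc.cancer.gov/projects",
    params={"size": 5, "fields": "project_id,name,primary_site"},
    timeout=30,
)
resp.raise_for_status()

for hit in resp.json()["data"]["hits"]:
    print(hit.get("project_id"), "-", hit.get("name"))
```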

3. The Human Protein Atlas:

A portal that uses antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology to map the proteins present in cells, tissues, and organs. Launched in 2003, it provides easy access to the information, available at www.proteinatlas.org, and contains protein expression data based on approximately 700 antibodies. The antibodies used undergo validation processes such as western blotting, immunohistochemical staining, and immunofluorescence.

BIOINFORMATICS TOOLS:

1. The Database for Annotation, Visualization and Integrated Discovery (DAVID):

DAVID aims to provide a collection of annotation tools for large lists of genes from genomic studies such as microarrays. It can be accessed through https://david.ncifcrf.gov. These tools are fuelled by the DAVID Knowledgebase. The tool is used to:

  • Convert gene identifiers from one type to another
  • Link gene-disease associations
  • Discover enriched functional related gene groups
  • List interacting proteins and many more

2. Surveillance, Epidemiology, and End Results Program (SEER):

Tracks data on cancer incidence, mortality, and survival, based on registries covering about 30% of the US population. Information can be collected from the website http://seer.cancer.gov/index.html. This provides a basis for cancer research and cancer prevention. The data include cancer site, staging, treatment regimen, diagnostic procedures, etc., and are collected routinely.

3. Gene Expression Profiling Interactive Analysis (GEPIA):

A web-based tool to analyse gene expression in cancer cells and normal cells. It provides a user-friendly interface to perform differential expression analysis, survival analysis, pathway analysis, and more, and helps examine the expression of particular genes across different cancer types. It is a powerful and intuitive platform. It can be accessed through http://gepia.cancer-pku.cn/

4. University of Alabama Cancer Database (UALCAN):

A web tool for analysing cancer transcriptome data. It helps perform in-silico validation of genes of interest, aids the identification of biomarkers, and provides quality graphs and plots based on gene expression. It offers easy access to pre-computed, tumour subgroup-based gene or protein expression, promoter DNA methylation status, and more. It can be accessed at http://ualcan.path.uab.edu

CONCLUSION:

Cancer diagnosis is a challenging field worldwide, and it remains challenging with non-computational methods. Bioinformatics, being a powerful tool, helps in the early diagnosis of cancer by improving the methodology, and it helps in discovering biomarkers for different types of cancer.

– Aparna L N
B. Tech Biotechnology, 3rd year.

Omics and Bioinformatics

Introduction

“Computer science is no more about computers than astronomy is about telescopes, biology is about microscopes or chemistry is about beakers and test tubes. Science is not about tools. It is about how we use them, and what we find out when we do.”

– Edsger Dijkstra

Biology, computer science, technology, bioinformatics, omics? So many terms to understand, right? That’s not a big deal. In this blog we will see what omics is and how it is related to bioinformatics.

 Omics

Omics is the branch of biology that specializes in the comprehensive examination of large volumes of data on the molecules found in biological systems, such as genes, proteins, metabolites, and other compounds, which are generated and analyzed using high-throughput technology. The suffix “omics” means ‘the study of a particular group of molecules.’

Examples,

  • Proteomics – study of the proteins
  • Genomics – study of the entire genome
  • Metabolomics – studies the full set of metabolites
  • Transcriptomics – studies all the RNA molecules of an organism
  • Epigenomics – studies all epigenetic modifications
https://www.liebertpub.com/doi/10.1089/ten.tec.2020.0300

Bioinformatics and omics are related

Omics and bioinformatics are both used to analyze the huge amounts of data produced by high-throughput technologies. Bioinformatics provides the tools and techniques to store, process, and analyze these massive volumes of data, carry out quality control, find new therapeutic targets, and understand what omics technologies produce. This helps advance our knowledge of molecular mechanisms and supports individualized therapy. The omics disciplines, in turn, analyze and interpret large datasets using computational and statistical techniques.

https://www.intechopen.com/chapters/67272

Using bioinformatics and many omics technologies together

IN GENOMICS

Next Generation Sequencing (NGS) offers an unprecedented level of precision and accuracy in genome analysis by enabling researchers to sequence millions of DNA fragments concurrently. Bioinformatics and genomics extensively work together, especially to characterize gene expression patterns, find disease-causing genes, and examine genetic variations within and between populations.

IN PROTEOMICS

Proteomics uses methods like protein microarrays and mass spectrometry. The development of protein biomarkers for illness diagnosis and the clarification of protein-protein interactions and signaling cascades are only a few of the many bioinformatics uses for proteomics.

IN METABOLOMICS

Metabolomics uses methods like nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry. The identification of metabolic pathways involved in disease processes and the creation of tailored therapy based on a person’s metabolic profile are only two of the many bioinformatics uses for metabolomics.

IN TRANSCRIPTOMICS

RNA sequencing (RNA-seq) is one method used in transcriptomics to locate and count RNA molecules in biological samples. The identification of disease-causing genes and the characterization of gene expression patterns in response to varied stimuli are just two of the many bioinformatics applications of transcriptomics.
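A first step in almost any RNA-seq analysis is normalizing raw counts for library size. A minimal counts-per-million (CPM) sketch in Python with made-up numbers; real workflows add gene-length corrections (TPM/FPKM) or model-based normalization:

```python
import numpy as np

# Raw RNA-seq counts: rows = genes, columns = samples.
counts = np.array([
    [1500, 900],
    [ 300, 450],
    [  20,  10],
])

# Counts-per-million corrects for sequencing depth (total reads per sample);
# log1p then tames the heavy right tail before downstream analysis.
library_sizes = counts.sum(axis=0)
cpm = counts / library_sizes * 1e6
log_cpm = np.log1p(cpm)
print(np.round(log_cpm, 2))
```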

IN EPIGENOMICS

Using methods like chromatin immunoprecipitation sequencing (ChIP-seq) to find and measure epigenetic alterations in biological samples is known as epigenomics. The identification of epigenetic markers for illness diagnosis and the clarification of epigenetic mechanisms involved in gene regulation and disease processes are only two of the many bioinformatics applications of epigenomics.

https://www.researchgate.net/figure/Commonly-used-OMICs-in-agriculture-complexity-increases-with-the-progression-of-arrow_fig2_353708680

Advantages of omics in bioinformatics

  1. High-throughput data generation – analysis of large amounts of data in a short time.
  2. Comprehensive analysis – analysis of multiple aspects of biological systems (gene expression, protein expression, and metabolite levels), gives more comprehensive understanding of biological processes.
  3. Personalized medicine – identification of biomarkers that can be used to develop personalized medicine, improving the efficacy of treatments and reducing the risk of side effects.
  4. Better understanding of diseases – identification of disease-causing genes and biomarkers and better understanding of the molecular mechanisms of diseases.
  5. Drug discovery – identification of drug targets and the screening of potential drug candidates, leading to the development of new drugs.

Disadvantages of omics in bioinformatics

  1. Data analysis and interpretation – challenging to analyze and interpret the results accurately.
  2. Cost – omics technologies and the computing they require are expensive.
  3. Limited knowledge – omics yields data at the molecular level, but knowledge of the functions of many genes, proteins, metabolites, etc. is still limited.
  4. False positives – can produce false positive results, leading to incorrect conclusions about biological processes.
  5. Ethical concerns – generates large amounts of personal data, leading to ethical concerns about data privacy and security.

Conclusion

Omics technologies have revolutionized bioinformatics, providing researchers with a powerful tool to study biological processes at a molecular level. However, there are still challenges associated with the analysis and interpretation of omics data, as well as ethical concerns regarding data privacy and security. Nevertheless, the potential benefits of omics technologies make this an exciting and promising field for future research.

Written by

Rasitha Arafa R

(210151601045)

B. Tech Biotechnology 2nd year

MATLAB in bioinformatics

“But biology and computer science – life and computation – are related. I am confident that at their interface great discoveries await those who seek them.”

Leonard Adleman

Abstract:

MATLAB is a popular high-level programming language and numerical computing environment widely used in science, engineering, and finance. It provides a powerful set of tools for data analysis, visualization, and simulation. With its simple syntax, built-in functions, and comprehensive documentation, MATLAB makes it easy for users to perform complex mathematical computations and develop algorithms for a wide range of applications. This abstract provides an overview of MATLAB’s capabilities and applications, highlighting its strengths as a versatile tool for scientific computing and data analysis. It also discusses some of the latest features and updates in MATLAB, including its support for machine learning and deep learning, and its integration with other programming languages and platforms.

Blog for MATLAB

 In this blog, we’ll explore some of the features and capabilities of MATLAB that make it such a popular tool for scientific computing. MATLAB does not exist in a vacuum so I’ll also be discussing topics that, at first glance, might not seem to have a place on the ‘MATLAB’ blog. Python, C++, Excel, R, CUDA, TensorFlow and many more will also be making guest appearances here and there.

What is MATLAB?

MATLAB (short for “MATrix LABoratory”) is a numerical computing environment that allows users to perform complex calculations, manipulate matrices and arrays, visualize data, and create custom functions and applications. MATLAB was developed by MathWorks in the 1980s and has since become one of the most widely used tools in the fields of engineering, science, and mathematics.

MATLAB features

MATLAB offers a wide range of features and capabilities that make it a versatile tool for scientific computing. Here are some of the key features of MATLAB:

Matrix operations: MATLAB was designed to handle matrix and array operations efficiently, making it ideal for linear algebra computations. You can easily create, manipulate, and perform calculations on matrices and arrays of any size.
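Since this blog has promised guest appearances from Python and friends, here is roughly how a typical MATLAB matrix session maps onto NumPy; a loose sketch for orientation, not a one-to-one translation:

```python
import numpy as np

# MATLAB:  A = [1 2; 3 4];  b = [5; 6];  x = A \ b;
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 6.0])

x = np.linalg.solve(A, b)  # the backslash operator for square systems
print(x)                   # [-4.   4.5]

# Elementwise vs. matrix products mirror MATLAB's .* and *
print(A * A)               # elementwise, like A .* A
print(A @ A)               # matrix product, like A * A
```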

Built-in functions: MATLAB comes with a large number of built-in functions for performing common mathematical operations. These functions are optimized for speed and accuracy and can be used to perform tasks such as curve fitting, signal processing, and image analysis.

Visualization tools: MATLAB includes powerful visualization tools for creating 2D and 3D plots, graphs, and charts. You can customize the appearance of your plots and add labels, titles, and annotations to make them more informative.

Interactivity: MATLAB provides a graphical user interface (GUI) that allows you to interact with your code in real-time. You can explore your data and experiment with different parameters to see how they affect the results of your calculations.

Programming language: MATLAB is also a programming language, which means you can create your own functions and scripts to perform custom calculations and automate repetitive tasks. MATLAB supports a wide range of programming constructs, including loops, conditional statements, and functions.

MATLAB in bioinformatics.

MATLAB can be used to create pipelines that carry out complete bioinformatics workflows.

  1. Gene expression and genetic variation analyses using microarray data. Microarray analysis is often performed on blood samples. The main steps are sample preparation/isolation, hybridization, washing, and image analysis.
  2. Sequence analysis, which includes phylogenetics, alignment, and genomic and proteomic sequences. A DNA, RNA, or peptide sequence is subjected to computational analysis to learn more about its characteristics, biological function, structure, and evolution.
  3. Analysis of structure: a protein structure database is a database built on experimentally determined protein structures. MATLAB can be used to view and work with the three-dimensional structures of proteins and other biomolecules, and to predict RNA secondary structure.
https://images.app.goo.gl/v6ZAWCQF6bdBxkZp7

Future prospects for MATLAB:

Advancements in machine learning and AI: MATLAB has been widely used in machine learning and AI applications, and as these fields continue to grow and evolve, MATLAB is likely to remain a valuable tool for researchers and practitioners.

Increasing demand for data analysis and visualization: As data becomes increasingly central to many areas of research and industry, there is likely to be growing demand for tools that can handle large datasets and provide sophisticated visualization and analysis capabilities. MATLAB is well suited to these tasks.

Expansion of engineering and scientific applications: MATLAB has long been popular among engineers and scientists for modeling and simulation tasks, and as these fields continue to expand and evolve, it is likely to remain a valuable tool.

Conclusion

MATLAB is a powerful tool for scientific computing that offers a wide range of features and capabilities. It is widely used in fields such as engineering, science, and finance for performing calculations, analyzing data, and creating custom applications. Whether you’re a beginner or an experienced programmer, MATLAB can help you solve complex problems and bring your ideas to life. This broad spectrum of applications can be exploited in bioinformatics to reduce the workload of researchers.

Written by: Shareeq Ibrahim, 3rd year B.Tech Biotechnology

NANOINFORMATICS

Nanoinformatics: The Intersection of Nanotechnology and Informatics

Introduction:

  Nanoinformatics is a new and exciting field of study that combines the principles of nanotechnology and informatics. The field aims to develop new techniques and tools to store, manage, analyze, and interpret vast amounts of data generated by nanotechnology research.

What is Nanotechnology?

         Nanotechnology is the study and application of materials and systems on a nanoscale, often less than 100 nanometers in size. It involves the manipulation of matter at the molecular and atomic level to create new materials and devices with unique properties. Nanotechnology has the potential to revolutionize many areas of science and technology, including medicine, electronics, energy, and environmental science.

What is Informatics?

        Informatics is the study of information processing and the design and use of information systems. It involves the development of algorithms, data structures, and computer programs to manipulate and analyze information. Informatics is a critical component of modern science and technology, as it provides the tools and methods needed to manage and interpret the vast amounts of data generated by modern research and applications.

The Importance of Nanoinformatics:


Nanoinformatics is crucial to the advancement of nanotechnology because it provides a way to manage and analyze the data generated by nanotechnology research and applications, which requires proper handling and storage methodologies. These data include information on the properties and behavior of nanoscale materials and devices, as well as on the environmental and health impacts of nanotechnology.

Nanoinformatics tools and methods can help researchers better understand the behavior of nanoscale materials and devices, and design new materials and devices with improved properties. Additionally, nanoinformatics can be used to identify potential health and environmental risks associated with nanotechnology, and to develop methods for reducing these risks using in-silico studies.

Nanoinformatics Tools and Methods:

             There are several tools and methods used in nanoinformatics, including:

  1. Nanoscale modeling and simulation: This involves the use of computer models to simulate the behavior of nanoparticles and devices, and to design new materials and devices with improved properties (see the toy sketch after this list).
  2. Data management and analysis: This focuses on the development of databases and algorithms to store, manage, and analyze the vast amounts of data generated by nanotechnology research.
  3. Visualization: This involves the use of computer graphics and visualization tools to display and interpret nanoscale data, such as images of nanoscale materials and devices, and simulations of their behavior.
  4. Decision support: This technique helps in the development of algorithms and software to support decision making in the field of nanotechnology, such as risk assessments and materials selection.
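The toy sketch referenced in the first item above: nanoscale modeling often starts from pairwise interatomic potentials, and the Lennard-Jones potential is the classic textbook example. The parameters below are arbitrary placeholders, not fitted to any real material:

```python
def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6).

    epsilon sets the depth of the attractive well and sigma the distance
    at which the energy crosses zero; both are placeholder values here.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The energy minimum sits at r = 2**(1/6) * sigma, where U = -epsilon.
for r in (1.0, 2 ** (1 / 6), 1.5, 2.5):
    print(f"r = {r:.3f}  U = {lennard_jones(r):+.4f}")
```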

Future of Nanoinformatics:

  1. Advancements in Nanoscale Modeling and Simulation: The development of more advanced nanoscale modeling and simulation tools will play a key role in shaping the future of nanoinformatics. These tools will allow researchers to better understand the behavior of nanoscale materials and devices, and to design new materials and devices with improved properties.
  2. Increased Focus on Data Management and Analysis: As the amount of data generated by nanotechnology continues to grow, the need for effective data management and analysis tools will become increasingly important. This will drive the development of new algorithms, databases, and software specifically designed for nanoinformatics.
  3. Integration with Other Fields: Nanoinformatics will continue to integrate with other fields, such as biotechnology, environmental science, and materials science. This will allow researchers to leverage the tools and methods of nanoinformatics to address new challenges and opportunities in these fields.
  4. Increased Focus on Health and Environmental Impact: As the impact of nanotechnology on human health and the environment becomes a growing concern, the field of nanoinformatics will increasingly focus on developing tools and methods to assess these risks. This will include the development of new algorithms for risk assessment and the creation of databases to store and analyze health and environmental data.

Conclusion:

Nanoinformatics is a new and exciting field that has the potential to revolutionize the study and application of nanotechnology. By combining the principles of nanotechnology and informatics, nanoinformatics provides a powerful tool for managing, analyzing, and interpreting the vast amounts of data generated by nanotechnology day to day. As the field continues to grow and evolve, it will play an increasingly important role in shaping the future of nanotechnology and how it impacts our world.

Written by Lokesh.S, 2nd year B.Tech Biotechnology

BIOINFORMATICS TOOLS FOR COMBATING COVID-19

“The problem is that traditional testing is like trying to find a needle in a haystack. It’s a very slow and linear process.” – Manoj Gopalakrishnan

https://news.cnrs.fr/sites/default/files/styles/visuel_principal/public/assets/images/adobestock_427853134_72dpi.jpg?itok=hRzsvRcY

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), the virus that causes COVID-19, was responsible for the global pandemic crisis that began in 2019. More than 500,000 people died across the world in the first year alone. The unique and constant genetic mutation of this virus made biological research difficult. It can cause severe pneumonia in the lungs, which may ultimately lead to death. To extinguish this fatal threat and to provide treatment for the disease, bioinformatics tools played a vital role in research.

These bioinformatics tools are extensively used in identifying the genes responsible for the infectious disease, in characterizing and predicting the protein structures of SARS-CoV-2, and in treatment.

About COVID-19:

SARS-CoV-2, which causes respiratory illness, is minute in size and contains a single-stranded RNA genome. The genome is compact, and the virus uses effective methods to maximize its coding potential. The following steps are part of the SARS-CoV-2 life cycle in host cells:

  • Attachment to the host cell receptor: the virus’s S protein binds the cellular ACE2 receptor.
  • Penetration (endocytosis).
  • Uncoating: the S protein changes conformation, and the virus’s envelope fuses with the cellular membrane via the endosomal route.
  • The virus releases its RNA into the host cell, where it is copied many times.
  • New virions are assembled in the ER and Golgi, transported by vesicles, and discharged from the cell.
https://www.frontiersin.org/files/MyHome%20Article%20Library/554339/554339_Thumb_400.jpg

BIOINFORMATICS TOOLS AND SOFTWARE:

These tools are crucial for introducing novel treatment approaches and for developing effective prevention strategies. Many bioinformatics tools are used in this research; a few are listed below:

  1. Next generation sequencing (NGS): NGS has been applied to the study of COVID-19 and has greatly promoted SARS-CoV-2 origin tracing. It also acts as a tool for COVID-19 diagnosis, monitoring new strains, and phylodynamic modeling in epidemiology.
  2. BLAST (Basic Local Alignment Search Tool): Used to compare a novel protein or nucleotide sequence with template sequences or previously characterized genes in the database. This helped decide which family the COVID-19 virus belongs to (see the sketch after this list).
  3. Tandem repeat finder: A software tool used for identifying tandem repeats in a sequence. Tandem repeats consist of two or more copies of a short nucleotide pattern placed right next to each other in the DNA sequence.
  4. Genome-wide association study (GWAS): An approach that involves scanning biomarkers such as single nucleotide polymorphisms (SNPs) in the DNA of many people in order to find genetic variation associated with a disease phenotype. Once a new genetic association is identified, researchers use this information to develop better strategies to detect, prevent, and treat the disease.
  5. Computer-aided drug design (CADD): This includes finding, developing, and analyzing medicines and related biologically active compounds by computational methodologies. CADD is a demanding practice because of the algorithms it requires, including digital repositories for studying chemical interaction relationships and computer programs for designing compounds with desired physicochemical characteristics. It also speeds up drug discovery and development for scientific research.
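The BLAST sketch promised above, using Biopython’s NCBIWWW interface. The short query sequence is only a toy stand-in for a real read; note that qblast submits the job to NCBI’s servers, so it needs internet access and can take minutes:

```python
from Bio.Blast import NCBIWWW, NCBIXML

# Toy nucleotide query standing in for a sequencing read of unknown origin.
query = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTT"

handle = NCBIWWW.qblast("blastn", "nt", query)  # search the nt database
record = NCBIXML.read(handle)

# Print the top hits with their E-values: small E-values mean strong matches.
for alignment in record.alignments[:3]:
    best = alignment.hsps[0]
    print(alignment.title[:70], f"E = {best.expect:.2e}")
```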

CYTOKINE STORM (CS) IN COVID-19:

A cytokine storm occurs when the immune system overreacts to an infection, damaging healthy cells and tissues around the body. The CS could be one of the reasons why COVID-19 has been so deadly. Researchers are studying CS by reviewing the literature and analysing data. A computational approach that looks at the whole system will help us understand how the immune response to SARS-CoV-2 differs from the response to other viruses. This approach could help explain additional symptoms seen with COVID-19, like stomach pain, nausea, and diarrhea, and it helps in understanding how our immune system reacts to the illness.

CONCLUSION:

The sudden pandemic was intimidating to overcome. However, thanks to advances in bioinformatics, we were able to fight the virus. With these tools and software, coronavirus studies can be performed easily and with accurate results. And in the future we will be able to fight new mutations of COVID-19, or any other kind of pandemic caused by other viruses, using advanced bioinformatics technology.

Written by: J. Jayashree, 2nd year B.Sc Biotechnology

NoSQL Databases in Bioinformatics

” Computer science is no more about computers than astronomy is about telescopes, biology is about microscopes or chemistry is about beakers and test tubes. Science is not about tools. It is about how we use them, and what we find out when we do.”

~ Edsger Dijkstra

A NoSQL database provides a mechanism for the storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. A relational database, in contrast, stores data in tables together with information about how the data are related.

https://dev.to/nyashanziramasanga/types-of-nosql-data-storages-part-2-mi8

Basics of NoSQL databases

Storing large volumes of data in relational databases was a challenging task for researchers and scientists, so non-traditional (NoSQL) databases emerged to offer alternative, scalable, and flexible data stores. NoSQL databases have these advantages over relational databases:

  • Easy storage and retrieval of data
  • Simplicity of design
  • Horizontal scaling
  • Finer control

NoSQL databases are also called schema-free databases. The key advantage of schema-free design is that it enables applications to quickly upgrade the structure of their data without table rewrites. It also allows greater flexibility in storing heterogeneously structured data. While relational databases have ACID properties (atomicity, consistency, isolation, and durability), NoSQL databases may not fully support these properties. Instead, they are required to satisfy the BASE properties (Basically Available, Soft state, Eventually consistent).

Types of NoSQL databases

  1. Key Value Databases
  2. Document Databases
  3. Wide Column Stores
  4. Graph Databases
https://neo4j.com/blog/why-nosql-databases/

KEY VALUE DATABASES

These store data in pairs (keys and values), where a unique key indexes each value.

Features:

  1. Simple primitive data structure
  2. No predefined schema
  3. Limited query capabilities
  4. Dictionary-like functionality at large scale

Limitations:

  1. Key lookup only, no generalized query
  2. Small number of attributes per entity

Some of the key value databases options:

  1. Redis
  2. Oracle Berkeley DB
  3. FoundationDB
  4. Aerospike
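A minimal sketch of the key-value style using the redis-py client; it assumes a local Redis server on the default port, and the key name and sequence value are invented for illustration:

```python
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Cache a sequence under a made-up accession key: one key, one value, no schema.
r.set("seq:DEMO-0001", "ATTAAAGGTTTATACCTTCCCAGG")
print(r.get("seq:DEMO-0001"))

# Lookups are by key only: there is no "find all sequences containing X".
```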

DOCUMENT DATABASES

A type of non-relational database that is designed to store and query data as JSON-like documents.

Features:

  1. JSON/XML structures
  2. No predefined schema
  3. Query capabilities
  4. Collections analogous to tables

Limitations:

  1. No referential integrity checks
  2. Object-based query language

Some of the document databases options:

  1. MongoDB
  2. RavenDB
  3. Couchbase
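A minimal sketch of the document style with pymongo; it assumes a local MongoDB server, and the database name, collection name, and document fields are invented for illustration:

```python
from pymongo import MongoClient  # pip install pymongo

client = MongoClient("mongodb://localhost:27017")
genes = client.biodb.genes  # database "biodb", collection "genes", created lazily

# Documents are schema-free: each record may carry different fields.
genes.insert_one({
    "symbol": "TP53",
    "organism": "Homo sapiens",
    "aliases": ["p53", "LFS1"],
    "go_terms": ["GO:0006915"],
})

# Query by any field, including values inside arrays.
print(genes.find_one({"aliases": "p53"})["symbol"])  # TP53
```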

WIDE COLUMN DATA STORES

A wide-column database is a NoSQL database that organizes data storage into flexible columns that can be spread across multiple servers or database nodes, using multi-dimensional mapping to reference data by column, row, and timestamp.

Features:

  1. Groups attributes into column families
  2. Column families store key value pairs
  3. Implemented as sparse multi-dimensional arrays
  4. Denormalized

Limitations:

  1. Operationally challenging

Some of the wide column data store options:

  1. Cassandra
  2. Accumulo
  3. Hypertable

GRAPH DATABASES

Graph databases are those in which the data structures are modeled as a directed, possibly labeled, graph or its generalizations. Data manipulation is done using graph-oriented operations and type constructors, and appropriate integrity constraints can be defined over the graph structure.

Features:

  1. Highly normalized
  2. Graph based query language
  3. Support for path finding and recursion

Limitations:

  1. Less suited for tabular data

Some of the Graph databases options:

  1. Neo4j
  2. Titan
  3. OrientDB
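A minimal sketch of the graph style with the official Neo4j Python driver; it assumes a local Neo4j server with placeholder credentials, and the protein names are just an example:

```python
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Two protein nodes and one interaction edge (MERGE avoids duplicates).
    session.run(
        "MERGE (a:Protein {name: $a}) "
        "MERGE (b:Protein {name: $b}) "
        "MERGE (a)-[:INTERACTS_WITH]->(b)",
        a="TP53", b="MDM2",
    )
    # One-hop neighbourhood query: which proteins interact with TP53?
    result = session.run(
        "MATCH (p:Protein {name: $name})-[:INTERACTS_WITH]-(q) RETURN q.name",
        name="TP53",
    )
    print([record["q.name"] for record in result])

driver.close()
```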

Here are a few supporting statements and references showing that scientists have started using NoSQL databases over traditional databases:

  • One study quantitatively compared the latencies of different data stores when storing and querying proteomics (mass spectrometry, MS) datasets in NoSQL databases such as MongoDB and HBase.
  • Another study reported an experimental comparison of PostgreSQL (a traditional relational database product) and Neo4j (a graph database) using data imported from STRING v9.1, a protein-protein interaction network containing 20,140 proteins and 2.2 million interactions. It showed that the speedup in Neo4j could be hundreds or thousands of times, and concluded that graph databases are ready for bioinformatics and can offer great speedups over relational databases on certain problems.

Concluding thoughts:

The supporting statements, references, and advantages above show that scientists and researchers have started adopting NoSQL databases alongside relational databases. This does not mean that relational databases are obsolete; they are still in use in the field of bioinformatics. Still, it seems likely that much of the future of storing and retrieving biological data will be in the hands of NoSQL databases.

Written by: Sakthivel J, 3rd year B.Tech Biotechnology
