

Background and Significance

1 Diabetes Background

Diabetes is a major public health problem worldwide. In the United States,
14.6 million people have been diagnosed with diabetes (2005 data), and a
further 6.2 million are estimated to have the disease undiagnosed. Thus
about 7% of the population suffer from it [1, 2]. About ninety-five percent
of all diagnosed cases are type 2 diabetes, with the remainder type 1 or
other less frequent types. The incidence of diabetes has been increasing
dramatically. There was a 33% increase in incident diabetes in the US
between 1990 and 1998, with an especially alarming rate of increase of 70%
in the 30-39 age group [3]. Of particular cause for alarm is the increase
in diabetes among young adults and children and the high prevalence of the
metabolic syndrome, which is associated with insulin resistance and an
increased risk of type 2 diabetes. Furthermore, diabetes has been estimated
to have a huge economic impact on the country, with direct medical and
indirect costs estimated at $132 billion in 2002. Indirect costs include
lost workdays, mortality and permanent disability [4]. It has thus become
a priority to understand more fully the complex pathophysiology of the
progression to diabetes in high-risk individuals, and to identify high-risk
prediabetic individuals. This understanding will translate into new
prognostics and therapeutics [2].

The I2B2 collaborators at the Joslin Diabetes Center in Boston are focused
on type 2 diabetes. Current research strategies include use of human
muscle and adipose tissue samples from metabolically characterized
volunteers to identify potentially pathogenic changes in gene expression
profiles which precede and accompany diabetes risk and overt diabetes.
However, it is challenging to obtain samples of muscle and adipose tissue
from volunteers due to the invasive nature of the biopsies. Thus, a key
goal of the i2b2 is to determine whether peripheral blood mononuclear
cells, which can be obtained easily, can be used as a surrogate for
muscle and fat cells to assess and predict metabolic physiology.

The collaborators at the NCIBI in Michigan are focused on the molecular
mechanisms of oxidative stress in both type 1 and type 2 diabetes. The
major goals are to study the mechanisms underpinning glucose-induced
oxidative stress and the specific pathways that are responsible for such
damage. They seek to understand the cellular mechanisms of diabetic
end-organ damage and to identify new potential targets for treatment. In
these studies they are investigating several potential targets of therapy,
including the regulation of taurine transporters, metabotropic glutamate
receptors (mGluRs), agents modulating glucose transporters (GLUT1 and
GLUT4) and PPAR-gamma ligands.

Recent advances in genomic and bioinformatics technologies are having an
increasing impact on diabetes research. For example, high-density
oligonucleotide arrays (gene chips) are now being used systematically
together with automated differential display techniques [2, 5]. These
techniques are being used in parallel with mechanism-based biomarkers, for
example in the study of oxidative stress [6], but there is a lack of
user-friendly tools to integrate queries over composite genomic, proteomic
and metabolomic data sets. This kind of local experimental data also needs
to be integrated with the widely available biological pathway data in
order to construct networks and pathways of possible interactions and
mechanisms. Currently, one of the project partners (i2b2) is constructing
the Hive system, based on Web Services technology, which will act as a
workflow manager enabling a variety of relevant services to be integrated
([7] and below).

Our overall goal is to provide user-friendly tools to integrate composite
data sets derived from genomic and metabolite profiling by our I2B2 and
University of Michigan investigators.

2 Pathways Background

Progress in diabetes research is largely dependent on modeling and
understanding the biological pathways involved in the disease, which will
enable the development of in silico approaches that will speed up the
current research effort and make it more effective. Pathway research has
expanded enormously in recent years, ever since the pioneering work of
Karp [8, 9], and has resulted in an explosive growth of pathway databases
[10].

There are four main categories of biological pathways: metabolic pathways,
molecular interactions, gene regulation networks and signaling pathways,
which are all traditionally represented differently. Metabolic pathways are
usually shown as a series of enzyme-substrate-product reactions; molecular
interactions, such as protein-protein interactions obtained from yeast
two-hybrid (Y2H) experiments, are usually depicted as simple binary
interactions; gene regulation pathways show connections between
transcription factors and the genes whose transcription they activate or
repress; signaling pathway representations are the most varied, ranging
from vague and general representations of the form "there's an activation
chain in which A activates B activates C" to specific and detailed
representations involving a series of complex binding reactions and protein
post-translational modifications.

The size and number of pathway databases are continuously growing. At last
count the online list of pathway resources, PathGuide [11], listed over
220 pathway resources, up from 138 in July 2004 [10]. Examples include the
KEGG database, a collection of pathway maps representing knowledge about
metabolism, cellular processes and human diseases; and the BioCyc family,
a collection of metabolic pathway databases including EcoCyc and MetaCyc
[12]. While these represent a wealth of available data, the fact that each
database uses its own data model (semantics) and data format (syntax)
means that the full potential of the data collected is still unavailable
to researchers. Every time a researcher seeks to identify all the
biological pathways which may play a part in a particular biological
process, she has to manually search a large number of databases.

This means that there is great opportunity for human errors and omissions.
For example, patterns across datasets, which would need to be spotted
manually, are often missed. The silos in which these data reside are a
major barrier to their exploitation. Microarray results typically show sets
of modulated genes. Such modulation, for example of metabolic processes,
will itself involve signaling pathways and gene regulation networks. In
such results all these forms of pathway will be implicated. The distributed
nature of the in silico pathway networks is a simple barrier to
cross-domain querying, but the semantic differences between those resources
form a greater barrier. The different representations of the pathways and
the heterogeneous names for all entity representations mean it is almost
impossible, for instance, to query about the gene regulatory mechanism for
the metabolic pathway highlighted as being implicated through analysis of
microarray results.

The limitations of such approaches are well known to the Joslin genomics
collaborating investigators. Since changes in gene expression are small in
healthy humans with type 2 diabetes at steady state, it is difficult to
identify single genes which may play a pathogenic role. For this reason,
pathway-based approaches to analysis have played a critical role in
development of a major hypothesis in type 2 diabetes, namely that
alterations in mitochondrial expression and function are a key phenotype
associated with insulin resistance and diabetes [cite Patti, Mootha
papers]. Current approaches to microarray analysis utilized by the Joslin
genomics collaborating investigators include: (a) identification of
pathways overrepresented in a set of differentially expressed genes (e.g.
MappFinder), and (b) identification of pre-selected gene sets which are
enriched in a gene list ordered by differential expression (e.g. Gene Set
Enrichment Analysis). As noted above, however, this process is
time-consuming, prone to human errors, and likely to generate hypotheses
which reflect current knowledge, rather than adequately incorporating novel
pathway information. Thus, it would be highly desirable to link pathways
in order to create a model relating gene expression differences between
healthy controls and individuals with insulin resistance and/or diabetes.
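
To make approach (a) concrete, over-representation of a pathway in a list of
differentially expressed genes is commonly scored with a hypergeometric
test. The Python sketch below is a generic version of that test and is not
taken from MappFinder itself; the function name and the example counts are
hypothetical and chosen only for illustration.

```python
from math import comb


def overrepresentation_p_value(population: int, pathway_size: int,
                               selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    population   -- number of genes measured on the array
    pathway_size -- number of those genes annotated to the pathway
    selected     -- number of differentially expressed genes
    overlap      -- differentially expressed genes that are in the pathway
    """
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes measured, 100 in the pathway,
# 200 differentially expressed, 8 of which fall in the pathway.
p = overrepresentation_p_value(population=10000, pathway_size=100,
                               selected=200, overlap=8)
print(f"p = {p:.4g}")
```

With these counts only about two pathway genes would be expected by chance,
so eight observed genes give a small p-value. In practice a correction for
testing many pathways at once would also be needed.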

One of the major initiatives to overcome these barriers has been the
development of BioPAX in recent years. BioPAX is both a format for the
exchange of pathway data between databases and a specification of an
ontology which enables reasoners to process this data. At a basic level,
BioPAX is a common representation for signal transduction, gene regulation,
protein-protein interaction and metabolic pathways in order to enable
coherent and reliable queries across multiple databases. The BioPAX
definition of a pathway is deliberately generic, defining a pathway as a
set or series of interactions or reactions, often forming a network, which
biologists have found useful to group together [biopax spec reference].
This definition captures the essential characteristic common to the many
different pathway representations: that of a biologically meaningful
collection of interactions. At a more advanced level, BioPAX chose OWL as
the representation language in order to enable the use of reasoning
technology over the representation. With the correct representation in OWL,
reasoners can be used to find inconsistencies in current databases,
integrate data from multiple sources and generate plausible hypotheses to
guide experimentation.
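
The value of a common representation can be sketched in a few lines of
Python. The classes below are a deliberately simplified stand-in for the
idea of a pathway as a collection of interactions; they are not the BioPAX
class model, and all names (Interaction, Pathway, pathways_involving, the
toy entities and the database labels db1 and db2) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    source: str    # participant that acts
    target: str    # participant that is acted upon
    kind: str      # e.g. "activation", "catalysis", "binding"
    database: str  # provenance of the record


@dataclass
class Pathway:
    name: str
    interactions: set = field(default_factory=set)

    def participants(self) -> set:
        """All entities that take part in any interaction of the pathway."""
        return {p for i in self.interactions for p in (i.source, i.target)}


def pathways_involving(entity: str, pathways: list) -> list:
    """Names of pathways in which the entity participates, sorted."""
    return sorted(p.name for p in pathways if entity in p.participants())


# Toy records from two hypothetical databases, expressed in one model.
signaling = Pathway("toy signaling chain", {
    Interaction("A", "B", "activation", "db1"),
    Interaction("B", "C", "activation", "db1"),
})
regulation = Pathway("toy gene regulation", {
    Interaction("C", "geneX", "transcriptional activation", "db2"),
})

print(pathways_involving("C", [signaling, regulation]))
```

Because both records share one structure, a single query finds that entity C
links the signaling chain to the gene regulation pathway, which is the kind
of cross-database question that is hard to ask while each source keeps its
own model.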

3 Computing Background

The core objective of this proposal is to build an SBML model of diabetes.
The Systems Biology Markup Language (SBML) is a computer-readable format
for representing models of biochemical reaction networks (see
http://sbml.org/). SBML has been evolving since mid-2000 through the
efforts of an international group of software developers and users. SBML is
applicable to metabolic networks, cell-signaling pathways, regulatory
networks, and many others. In our particular proposed application of SBML,
the models of protein interactions express accumulated knowledge that is
amenable to tracking, inference, and simulation.

SBML is supported by over 90 software systems, including CellDesigner and
the Systems Biology Workbench.
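
To give a feel for the format, the following Python sketch assembles a
minimal SBML-style document with the standard library. It assumes the SBML
Level 2 Version 1 element names and namespace, has not been validated
against the SBML schema, and uses a hypothetical toy model (a single glucose
transport reaction) rather than any model from this project.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for SBML Level 2 Version 1.
SBML_NS = "http://www.sbml.org/sbml/level2"


def build_toy_model() -> ET.Element:
    """Build a toy model: glucose moving from outside the cell to the cytosol."""
    sbml = ET.Element("sbml", {"xmlns": SBML_NS, "level": "2", "version": "1"})
    model = ET.SubElement(sbml, "model", {"id": "toy_glucose_uptake"})

    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", {"id": "extracellular"})
    ET.SubElement(compartments, "compartment", {"id": "cytosol"})

    species = ET.SubElement(model, "listOfSpecies")
    ET.SubElement(species, "species",
                  {"id": "glucose_out", "compartment": "extracellular"})
    ET.SubElement(species, "species",
                  {"id": "glucose_in", "compartment": "cytosol"})

    reactions = ET.SubElement(model, "listOfReactions")
    reaction = ET.SubElement(reactions, "reaction", {"id": "glucose_transport"})
    reactants = ET.SubElement(reaction, "listOfReactants")
    ET.SubElement(reactants, "speciesReference", {"species": "glucose_out"})
    products = ET.SubElement(reaction, "listOfProducts")
    ET.SubElement(products, "speciesReference", {"species": "glucose_in"})
    return sbml


root = build_toy_model()
print(ET.tostring(root, encoding="unicode"))
```

A real model would normally be written and checked with a dedicated SBML
library or editor; the point here is only that species, compartments and
reactions are explicit, machine-readable elements.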

Beyond SBML, we are planning to explore how to link ontology and
workflow resources with visualization tools developed in other centers,
particularly CellDesigner[1], which provides a flexible means of displaying
and decorating complex pathway maps so that the effects of various
experimental manipulations can be visualized effectively.

CellDesigner (http://celldesigner.org/) is a structured diagram editor for
drawing and manipulating gene-regulatory and biochemical networks. The
models described in this proposal are such networks. The networks are drawn
based on the process diagram, a graphical notation system proposed by
Kitano, and are stored using the Systems Biology Markup Language (SBML), a
standard for representing models of biochemical and gene-regulatory
networks. Networks are able to link with simulation and other analysis
packages through the Systems Biology Workbench (SBW).

Thus, as our models evolve through the integration of new information from
both experimental data and known data sources, they can be verified through
simulation to correspond to known chemical effects. The means by which the
model is best expressed in SBML are still open issues. For example, we
might need to use the semantics present in BioPAX models, but not usually
in SBML models, to aid visualization, e.g. showing whether a node is a
protein or a gene. These uses and extensions of SBML will support new types
of modeling, inference, and simulation.

Workflows are employed to automate tedious manual processes and ensure no
known analyses or patterns are missed. A key task of ontologies in
Manchester is to provide the descriptions and annotations for describing
workflows for the Taverna [18] engine, which is widely used in the
biological community[2]. Workflows in bioinformatics are a key strategic
resource both for reducing the effort required for research and for
improving the quality of the documentation and metadata. Ontologies allow
tasks and services to be described and entered in central registries.
Workflows allow descriptions of types of services and resources to be
linked together in such a way that a workflow engine can assemble specific
instances of the services and information automatically. The effect on
effort can be dramatic, eliminating days of routine tedium. Semantically
enabled service-oriented architectures (SOAs) are increasingly the key to
effective use of computing resources in bioinformatics and at the heart of
the emerging Grid architecture for both compute and data Grids.
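
The idea of assembling a workflow from typed service descriptions can be
illustrated with a short Python sketch. It is not the Taverna API; the
Service record, the registry, the type labels and the toy gene-to-pathway
mapping are all hypothetical. The assemble function searches breadth-first
for a chain of registered services whose input and output types connect the
data we have to the result we want.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str
    output_type: str
    run: Callable


def assemble(registry, start_type, goal_type):
    """Return a shortest chain of services from start_type to goal_type,
    or None if the registry cannot connect the two types."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, chain = queue.popleft()
        if current == goal_type:
            return chain
        for service in registry:
            if service.input_type == current and service.output_type not in seen:
                seen.add(service.output_type)
                queue.append((service.output_type, chain + [service]))
    return None


def execute(chain, value):
    """Feed the value through each service in order."""
    for service in chain:
        value = service.run(value)
    return value


# Hypothetical annotation table and service registry.
TOY_MAP = {"geneA": "pathway1", "geneB": "pathway2", "geneC": "pathway1"}
registry = [
    Service("summarize_pathways", "pathway_list", "report",
            lambda pathways: "; ".join(pathways)),
    Service("map_genes_to_pathways", "gene_list", "pathway_list",
            lambda genes: sorted({TOY_MAP[g] for g in genes if g in TOY_MAP})),
]

chain = assemble(registry, "gene_list", "report")
print([s.name for s in chain])
print(execute(chain, ["geneA", "geneC", "geneB"]))
```

The researcher states only the starting and target data types; the order of
the services is worked out from their descriptions, which is the role the
ontology-based annotations play for a real workflow engine.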

Protégé [14, 15], developed by Mark Musen's group at Stanford (now cBio),
has become the most widely used open source ontology development
environment and is at the core of the cBio effort. This work is now
converging with the work on Semantic Web, OWL and Description Logic
technologies at the University of Manchester.

The widespread adoption of OWL as a W3C certified standard (cf. the OWL
plug-in for Protégé [16]) makes greater interoperability between different
ontologies more feasible and has thereby laid the groundwork for the type
of database integration based on ontologies of which BioPAX is an
excellent example. As mentioned above, the BioPAX ontology is expressed in
OWL, and this makes possible not just data integration but also the
application of reasoners in knowledge management and research [17].

Ontologies have been most widely applied to managing terminologies;
however, ontologies represented in OWL can be considered as general logical
models and provide a formal approach to a number of important issues,
including providing a common controlled vocabulary, classifying information
into logically consistent poly-hierarchies, integrating information from
multiple schemas and databases, determining when two descriptions are
logically equivalent, and discovering resources based on their
descriptions, which is the basis of the semantic web and workflow
technology.

4 Significance

This project has the potential to have a major impact on the conduct of
clinical research in diabetes and by extension in many other clinical
areas. By providing the foundation for an innovative and integrated
workflow for diabetes research, and constructing a model of the disease
which integrates a multiplicity of data sources, the project will make
future research in this area significantly more rapid and effective. We
expect this exploratory grant to show the way that pathway databases can
be integrated in practice with clinical research by exemplifying the
technology that can be utilized to bridge the research silos currently in
existence. In this regard the outcomes of this project will have long-term
consequences for all researchers involved in biological pathway research.
Since health and life science research is an extremely data-rich research
environment, there is a fundamental need to leverage technologies which
utilize semantic information and inference and thus take steps to make the
data more usable and more accessible.

———————–
[1] http://www.celldesigner.org/
[2] http://twiki.mygrid.org.uk/twiki/bin/view/Mygrid/WorkflowLinks

———————–

This figure shows CellDesigner rendering the Skeletal-Muscle interaction
network in SBML that we intend to extend.

Source:guidedog.org
