Terek Peterson, Vice President, Clinical Analytics and Data Strategies

What’s the Issue?

Adoption of data standards is necessary to underpin higher data quality, greater efficiency and integrated applications across increasingly complex clinical research processes. Global regulatory agencies have clearly embraced data standards for the submission of data, laying the groundwork for expanded adoption across data collection, tabulation and analysis. Collecting data in a standardized way is the next step in reducing the messy, time-wasting efforts that can negatively impact patient safety, development timelines and resource utilization.

On an organizational level, failure to establish standards upfront makes it difficult – and in some cases impossible – to connect data across disparate systems for efficient study execution.

Standards enable efficiency gains in time-intensive acquisition, aggregation, analysis and report-preparation processes through re-use of standard formats for protocols, case report forms (CRFs) and other data artifacts. They eliminate the expense and time of reformatting data for transfer to or from third-party vendors and of converting proprietary data into standard formats.

Effective standards also drive access to data across trials, providing insight into trial design and operations based on past research experience. With appropriate standards in place, data can be linked moving backward in time, much the way a genealogy traces ancestral lines. Standards make it possible to connect and trace previous research intelligence and to mine historic trial data from the “genealogy” of a drug development program or therapeutic indication.

A Short History of Global Standards in Clinical Research

More than twenty years ago, the Clinical Data Interchange Standards Consortium (CDISC), a global standards development organization, was founded in response to growing industry recognition of the need for standards. The CDISC mission is to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare. CDISC “provides clarity by developing and advancing data standards of the highest quality to transform incompatible formats, inconsistent methodologies, and diverse perspectives into a powerful framework for generating clinical research data that is as accessible as it is illuminating.”

Over time, CDISC has made notable progress in creating shareable, end-to-end data standards for clinical and nonclinical research. To date, the foundational standards focus on core principles of data standard definitions, including models, domains and specifications for data representation. Standards for protocols and data collection exist, but adoption varies greatly, so more work is needed.

Mature, widely used standards include:

  • The Study Data Tabulation Model (SDTM), the tabulated representation of the collected data and one of the first standards developed, has evolved to support not only clinical data but also nonclinical, medical device, and pharmacogenomics data. The SDTM Implementation Guide (SDTMIG) governs the organization, structure, and format of standard clinical trial tabulation datasets.
  • The Analysis Data Model (ADaM) and the ADaM Implementation Guide (ADaMIG) provide a standard structure for analysis data, derived from SDTM data, to support the creation of tables, listings, and figures for the study’s clinical study report.
  • Questionnaires, Ratings and Scales (QRS) supplements – Each QRS instrument is a series of questions, tasks or assessments used in clinical research to provide a qualitative or quantitative assessment of a clinical concept or task-based observation. The QRS team develops Controlled Terminology and SDTM (tabulation) supplements; the ADQRS team develops ADaM (analysis) supplements.
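To make the SDTM-to-ADaM relationship concrete, the following is a minimal, hypothetical sketch in Python with pandas. The study, subject IDs and values are invented, and the structures are heavily simplified – real SDTM domains and ADaM datasets carry many required variables and controlled terminology defined in the implementation guides. It illustrates only the core idea: tabulated collected data (an SDTM-style Vital Signs domain) is transformed into analysis-ready data (an ADaM-style dataset with analysis value, baseline and change from baseline).

```python
import pandas as pd

# Hypothetical, simplified SDTM-style Vital Signs (VS) tabulation records.
vs = pd.DataFrame({
    "STUDYID":  ["XYZ-001"] * 4,
    "USUBJID":  ["XYZ-001-1001", "XYZ-001-1001", "XYZ-001-1002", "XYZ-001-1002"],
    "VSTESTCD": ["SYSBP", "SYSBP", "SYSBP", "SYSBP"],
    "VSSTRESN": [142.0, 130.0, 128.0, 121.0],  # numeric result in standard units
    "VISIT":    ["BASELINE", "WEEK 4", "BASELINE", "WEEK 4"],
})

# Derive an ADaM-style analysis dataset: attach the baseline result to every
# record and compute change from baseline.
baseline = (vs[vs["VISIT"] == "BASELINE"]
            .rename(columns={"VSSTRESN": "BASE"})[["USUBJID", "VSTESTCD", "BASE"]])
advs = vs.merge(baseline, on=["USUBJID", "VSTESTCD"])
advs["AVAL"] = advs["VSSTRESN"]            # analysis value
advs["CHG"] = advs["AVAL"] - advs["BASE"]  # change from baseline
```

Because the analysis variables are derived directly from the tabulation records, each value in the analysis dataset remains traceable back to collected data – the property the standards are designed to preserve.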

Starting with a solid foundation of standardization makes logical sense. However, the standards that need additional work and are less commonly adopted include:

  • The Protocol Representation Model (PRM) provides a standard for protocols, with a focus on study design, eligibility criteria, and other requirements from global health authorities. PRM assists in automating CRF creation and electronic health record (EHR) configuration to support clinical research and data sharing.
  • The Clinical Data Acquisition Standards Harmonization (CDASH) collection standard establishes standardized CRF questions, best practices, and controlled terminology that aligns with SDTM. Data collection formats and structures provide clear traceability of submission data and more transparency for regulators.

(source: https://www.cdisc.org/standards/foundational)

In 2004, the Food and Drug Administration (FDA) set the stage for future directives by announcing that SDTM could be used when submitting clinical trial data. The FDA saw this as one of the first steps toward greater efficiency during NDA review. In 2014, the FDA established a timetable requiring that electronic study data be submitted according to section 745A(a) of the Federal Food, Drug, and Cosmetic Act (FD&C Act) and the eStudy Data Guidances. Japan’s PMDA has also outlined a timetable of conditions for submitting standardized data as part of a submission. Non-conformance carries major consequences, including the possibility of a technical rejection of an eCTD, a refuse to file (RTF) for NDAs and BLAs, and a refuse to receive (RTR) for ANDAs. Other consequences include damage to company reputation and needless approval delays, resulting in lost revenue.

Where We Are Today

Although awareness of CDISC standards has increased immensely over the past decade, with successful adoption of SDTM, there continues to be a lack of understanding of the why, how, and benefits of implementing collection standards. Adoption of standards for data collection has been slower than that of tabulation and analysis standards. Even implementation of the mature standards varies by company, and collection standards vary significantly.

Creating SDTM datasets is fraught with data transformation and management issues; the process can be lengthy and require costly experts. These processes can be inefficient, with numerous review cycles across different companies and a variety of stakeholders. On average, these activities can add 6-8 weeks to a typical clinical trial. Many transformation processes are conducted concurrently during the trial, not adding to its duration but adding avoidable expense. These extra efforts include additional programming to handle dirty data and to manage deltas across multiple data cuts. If standards are not thought through and implemented early in the drug development lifecycle, sponsors face the need to fit standards into current or new processes, additional regulatory documents, new tools for data tabulation and analysis, increased training, and additional coordination.

Today, data standards play a significant role in the current trend of increased automation from protocol to collection. These efforts require individuals who understand clinical trials, programming, data standards, and mathematics/statistics. The evolving roles of data scientists and standards subject matter experts are needed to execute the requirements that support a successful submission. It is important for these research professionals to understand not only the standards themselves, but also how FDA and PMDA reviewers use them within the review process.

Development of standards and tools across the research community helps standardize data across studies which facilitates data exchange with many stakeholders.  Automating the creation of SDTM datasets will reduce study timelines and costs while improving data quality. It also removes the extra data transformation step to deliver data to the analysis and clinical teams faster. This efficient workflow accelerates data integration with higher quality data.
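One way such automation can work – sketched here as a hypothetical, simplified example, not a production implementation – is that when data are collected with CDASH-aligned names, mapping to SDTM becomes largely a mechanical, reusable specification rather than bespoke programming per study. The column names, study ID and mapping below are illustrative assumptions:

```python
import pandas as pd

# Hypothetical raw CRF extract with CDASH-aligned column names (simplified).
raw_dm = pd.DataFrame({
    "SUBJID":  ["1001", "1002"],
    "BRTHDAT": ["1980-05-14", "1975-11-02"],
    "SEX":     ["M", "F"],
})

# Reusable mapping specification: CDASH-aligned source column -> SDTM variable.
DM_MAP = {"SUBJID": "SUBJID", "BRTHDAT": "BRTHDTC", "SEX": "SEX"}

def to_sdtm_dm(raw: pd.DataFrame, studyid: str) -> pd.DataFrame:
    """Rename CDASH-aligned columns to SDTM names and add identifier variables."""
    dm = raw.rename(columns=DM_MAP).copy()
    dm.insert(0, "STUDYID", studyid)
    dm.insert(1, "DOMAIN", "DM")
    dm.insert(2, "USUBJID", studyid + "-" + dm["SUBJID"])
    return dm

dm = to_sdtm_dm(raw_dm, "XYZ-001")
```

Because the mapping is declared once as data rather than rewritten per study, the same specification can be validated, versioned and re-used across trials – the kind of re-use that shortens the transformation step described above.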

The Future

As we look ahead, forward-looking companies are working to develop or access data science expertise and to embrace agile open-source technologies that increase visibility into their data.

As big data reshapes drug development, sponsors must be able to manage more data, from more disparate and diverse sources, across more electronic information systems, without loss of quality or efficiency. Linking multiple systems and implementing new technologies pose increasing demands on existing research ecosystems. Sponsors need novel, highly efficient approaches to quickly analyze and act on insights extracted from large volumes of data. Organizational collaboration is also essential, although a radical departure from the traditional functional silos of data management, statistical programming, biostatistics and clinical operations.

Industry-wide adoption of CDISC standards will expedite the integration of complementary data sources. The thoughtful adoption of artificial intelligence (AI)-driven techniques, such as machine learning and natural language processing, offers transformative benefits to drug development and clinical outcomes by reducing timelines, predicting outcomes and improving data quality.

Ultimately, organizations that invest in data standards now are building the operating components essential to tomorrow’s competitive advantage.