Data harmonization within the Clinical Trials Network: Strengths, challenges, and opportunities in the era of generative AI [commentary].

Integrating data across randomized controlled trials (RCTs) that test interventions and treatments for substance use disorder (SUD) offers a rich opportunity to improve the evidence base and analytic methods used in SUD research. With more than 50 completed trials from the National Institute on Drug Abuse (NIDA) National Drug Abuse Treatment Clinical Trials Network (CTN) available for secondary analysis and harmonization, the possibilities are extensive, but harmonizing and documenting these datasets demands complex analytic formulation and methodology. This commentary discusses the strengths and challenges of data harmonization and shares clinical and data science considerations drawn from four exemplar studies that harmonized data across multiple CTN trials. We offer recommendations for others planning data harmonization for secondary analysis, discuss guiding principles for research data management, outline suggestions for bridging gaps in the context of the CTN, and, finally, frame considerations for using state-of-the-art tools such as generative AI, and for integrating data from clinical trials and electronic health records, to enhance the promise of data harmonization.