Big Data Analytics-Driven Surveillance of Postmarket Drug Side Effects

Monitoring real-world drug usage and conducting postmarket surveillance remain a challenge. Although regulatory processes generate a great deal of information about drugs, social media has been shown to help identify new drug-related information (PubMed IDs 27311964, 26776212, 26163365). Social media information streams and interactions are an important new source of data on drugs, especially for evaluating their effects outside the controlled settings of clinical trials. In addition, pharmacogenetics, that is, the influence of an individual's genetic makeup on the response to drugs, is increasingly in focus. For example, the SLCO1B1*5 variant has been linked to myopathy in patients taking statins, a class of cholesterol-lowering drugs. However, identifying populations with similar characteristics is difficult. Social network interactions could potentially identify these populations, allowing targeted genotyping with a much improved cost-benefit ratio. Existing bioNLP tools for natural language processing, such as Gimli (Campos et al. 2013), are not easily applicable to social media, which raises numerous new challenges such as temporal disjunction and jargon. This joint project between TSMM (Text & Social Media Mining) at Yonsei University and SIMED (Sciences de l'Information Médicale) at the University of Geneva aims to build a biomedical surveillance system that identifies patterns and trends in medication treatments and their side effects over a 10-year period. Our system will create a knowledge base by mining two main sources: scientific publications and social media. We will detect biomedical concepts in the scientific literature and track side effects across the population over time.
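As a rough illustration of the kind of trend tracking described above, the following minimal Python sketch counts how often a drug and a side-effect term are mentioned together in a post, grouped by year. It is only a sketch under stated assumptions: the two small lexicons, the sample posts, and the function names `find_terms` and `count_comentions` are hypothetical and are not part of the project, which would rely on proper biomedical concept recognition rather than exact string matching.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, for illustration only; a real system would
# draw on curated drug and adverse-event vocabularies.
DRUG_TERMS = {"simvastatin", "atorvastatin"}
EFFECT_TERMS = {"muscle pain", "cramps", "weakness"}


def find_terms(text, lexicon):
    """Return the lexicon terms that occur as whole words in the text."""
    lowered = text.lower()
    return {
        term
        for term in lexicon
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def count_comentions(posts):
    """Count (year, drug, effect) co-mentions over (year, text) pairs."""
    counts = Counter()
    for year, text in posts:
        drugs = find_terms(text, DRUG_TERMS)
        effects = find_terms(text, EFFECT_TERMS)
        for drug in drugs:
            for effect in effects:
                counts[(year, drug, effect)] += 1
    return counts


# Made-up example posts (assumption: each post carries a year of publication).
posts = [
    (2014, "Started simvastatin last month, bad muscle pain since."),
    (2015, "Simvastatin gives me cramps and muscle pain at night"),
    (2015, "atorvastatin is fine for me so far"),
]

trend = count_comentions(posts)
for key in sorted(trend):
    print(key, trend[key])
```

Exact matching of this kind would miss the jargon and misspellings typical of social media, which is one reason the abstract notes that existing tools do not transfer easily to this setting.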