Once an expert profile has been created using our machine learning framework, we use sophisticated algorithms to assign primary affiliations for each profile. The algorithms account for potential affiliations in publications, clinical trials, grant payments and industry payments. Potential affiliations in the source documents are weighted and scored based on frequency and time to generate the primary affiliation. Using this approach, you are able to obtain constantly updated affiliation information on millions of experts, including the experts you are interested in.
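As an illustration of weighting by frequency and time, the following is a minimal sketch only, not the actual production algorithm. It assumes that each source document contributes one dated mention of an affiliation and that older mentions count for less, using an exponential decay. The function name `primary_affiliation`, the `HALF_LIFE_YEARS` value and the institution names are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical setting: a mention this many years old counts half as much
# as a mention dated today.
HALF_LIFE_YEARS = 3.0


def primary_affiliation(mentions, today, half_life_years=HALF_LIFE_YEARS):
    """Score (affiliation, date) mentions and return (best, scores).

    Each mention adds a weight that halves every `half_life_years`, so an
    affiliation scores highly when it appears often and recently.
    Returns (None, {}) when there are no mentions.
    """
    scores = defaultdict(float)
    for affiliation, seen_on in mentions:
        # Clamp future-dated documents to an age of zero.
        age_years = max((today - seen_on).days, 0) / 365.25
        scores[affiliation] += 0.5 ** (age_years / half_life_years)
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Made-up mentions: three older documents versus two recent ones.
mentions = [
    ("Institution A", date(2015, 1, 1)),
    ("Institution A", date(2016, 1, 1)),
    ("Institution A", date(2017, 1, 1)),
    ("Institution B", date(2023, 1, 1)),
    ("Institution B", date(2024, 1, 1)),
]
best, scores = primary_affiliation(mentions, today=date(2024, 6, 1))
print(best, scores)
```

In this made-up case the two recent mentions outweigh the three older ones, which is the kind of behaviour needed to follow an expert who has moved to a new organisation.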
Even though we apply very high quality standards when assigning the primary affiliation to each expert profile, 100% accuracy is very difficult to achieve. This is due to the inherent nature of the data: individuals move between organisations, and the quality of the source data sets varies. We are constantly working to improve the quality of our affiliation data, which improves each month.
Incorrect primary affiliation assignment is most often caused by incomplete entity recognition in the source, or by a lack of reliable affiliation information altogether (in which case “no affiliation” or only location information is provided). Put simply, we do not always identify the organisation presented, which at times can lead to outdated or imprecise affiliation information being displayed. The three examples below illustrate different cases of data retrieval and the challenges involved:
Example one: simple entity recognition
The following affiliation information for a co-author is easy to identify and rank, since the affiliation is clearly described together with a geographical location and a unique identifier (email):
University of Texas M.D. Anderson Cancer Center, Houston, TX [name]@mdanderson.org
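To show why an email address helps in a case like this, here is a small sketch that pulls the email domain out of an affiliation string so that it could be compared against known organisation domains. This is an assumption about how such an identifier might be used, not a description of our pipeline, and `email_domain` is a hypothetical helper.

```python
import re

# Matches the domain part that follows an "@" sign, e.g. "example.org".
DOMAIN_RE = re.compile(r"@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def email_domain(affiliation_text):
    """Return the lower-cased email domain found in the text, or None."""
    match = DOMAIN_RE.search(affiliation_text)
    return match.group(1).lower() if match else None


text = (
    "University of Texas M.D. Anderson Cancer Center, Houston, TX "
    "[name]@mdanderson.org"
)
domain = email_domain(text)
print(domain)
```

An affiliation string without an email address yields `None`, so this signal is simply absent in cases such as the third example.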
Example two: more challenging entity recognition
In this case, multiple affiliations are presented for a single author of a publication. As seen, it is unclear whether the author is based at one or several of these affiliations. The affiliations presented could also be those of his or her co-authors. Some affiliations are also incomplete, and no initials or other identifiers are available:
Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, Georgia, USA and Cedars-Sinai Medical Center, Los Angeles, California and Department of Biostatistics and Bioinformatics, Rollins School of PH.
Example three: more challenging entity recognition
In this final example, very limited information is available and the names are highly generic. “The Medical School” is used as part or all of the name of hundreds of academic medical centers. Typically, additional affiliation information is required for us to assign the primary affiliation of this expert correctly:
Dermatological Sciences, Institute of Cellular Medicine, The Medical School