Author Identity Disambiguation System

A service that searches for and disambiguates academic authors from partial or ambiguous input.

NLP Transformers Python LLaMA Fine-tuning

At Reviewerly, core workflows such as peer-review matching depended on reliably linking recommended reviewers and authors to internal database records, a prerequisite for conflict-of-interest checking and reviewer recommendation pipelines.

To solve this, I built an author disambiguation service that combines pattern matching with semantic analysis of academic trajectories (publication history, affiliations, co-authorship networks) to identify the correct person even in cases of identical names or name variations across sources.
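The two-stage idea (a name pattern filter, then a comparison of academic trajectories) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Profile` fields, the weights, the threshold, and the given-name-first ordering are hypothetical, and plain Jaccard overlap stands in for the semantic analysis, which the actual service may perform quite differently.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Optional


def normalize(name: str) -> list[str]:
    """Lowercase, strip accents and punctuation, split into tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.findall(r"[a-z]+", ascii_only.lower())


def name_compatible(query: str, candidate: str) -> bool:
    """Pattern match: same surname, given names agree up to initials.

    Assumes names are written given-name-first (hypothetical convention).
    """
    q, c = normalize(query), normalize(candidate)
    if not q or not c or q[-1] != c[-1]:
        return False
    # Pairwise comparison of given names, so "j" is compatible with "jane".
    return all(a.startswith(b) or b.startswith(a) for a, b in zip(q[:-1], c[:-1]))


@dataclass
class Profile:
    """Hypothetical author record; field names are illustrative only."""
    author_id: str
    name: str
    affiliations: set[str] = field(default_factory=set)
    coauthors: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; a simple stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def trajectory_score(query: Profile, cand: Profile) -> float:
    """Weighted overlap of trajectories; the weights are arbitrary assumptions."""
    return (0.4 * jaccard(query.coauthors, cand.coauthors)
            + 0.3 * jaccard(query.affiliations, cand.affiliations)
            + 0.3 * jaccard(query.topics, cand.topics))


def disambiguate(query: Profile, records: list[Profile],
                 threshold: float = 0.2) -> Optional[Profile]:
    """Return the best name-compatible record, or None if none is convincing."""
    matches = [r for r in records if name_compatible(query.name, r.name)]
    if not matches:
        return None
    best = max(matches, key=lambda r: trajectory_score(query, r))
    return best if trajectory_score(query, best) >= threshold else None
```

In this sketch, a query for "J. Smith" with known co-authors would pass the name filter for every "Jane Smith" or "John Smith" in the database, and the trajectory score then separates them. Returning `None` below the threshold leaves low-evidence cases unresolved instead of forcing a match.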

Once an author is resolved, a data enrichment pipeline crawls the web to surface all known academic email addresses associated with that author. When several candidate addresses remain, I use a LLaMA model that I fine-tuned to predict the most likely one, improving downstream contact reliability.
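As a rough illustration of the candidate-selection step, the sketch below frames the choice among email candidates as a numbered multiple-choice prompt and parses the model's answer. The prompt wording and the `build_prompt` and `parse_choice` helpers are hypothetical; the source does not describe the prompt format or training data used for the fine-tuned model, and the model call itself is deliberately left out.

```python
import re
from typing import Optional


def build_prompt(name: str, affiliation: str, candidates: list[str]) -> str:
    """Format a numbered multiple-choice prompt (hypothetical format)."""
    lines = [
        "Select the email address most likely to belong to the author.",
        f"Author: {name}",
        f"Affiliation: {affiliation}",
        "Candidates:",
    ]
    lines += [f"{i}. {email}" for i, email in enumerate(candidates, start=1)]
    lines.append("Answer with the number of the best candidate.")
    return "\n".join(lines)


def parse_choice(model_output: str, candidates: list[str]) -> Optional[str]:
    """Map the first number in the model output back to a candidate.

    Returns None when no number is present or it is out of range, so a
    malformed generation is never silently treated as a valid choice.
    """
    match = re.search(r"\d+", model_output)
    if match is None:
        return None
    index = int(match.group()) - 1
    return candidates[index] if 0 <= index < len(candidates) else None
```

Asking for an index instead of free text keeps the output constrained to the crawled candidates, so the model cannot produce an address that was never found.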