Understanding Competitive Landscape data sources
Every competitive profile in Prudentia is built from a structured pipeline that queries a tiered set of authoritative sources and enforces strict accuracy guardrails at every stage. This article explains where the data comes from, how it is validated, and what that means for the confidence you can place in the outputs.
How competitive profiles are built
For each asset in the Competitive Landscape, Prudentia's pipeline runs a defined set of searches across six profile areas: Basic Information, Preclinical, Clinical, Regulatory, Milestones, and Intellectual Property. Coverage is engineered along two dimensions:
- Breadth: multiple independent source categories are queried for each fact, so no single source is relied upon exclusively.
- Authority: primary sources (e.g., regulatory agencies, trial registries, patent offices) are always preferred over secondary reporting.
Data sources by category
The pipeline draws from eight source categories, each queried with domain-restricted searches so that results come from authoritative origins:
| Category | Primary Sources | What We Extract |
| --- | --- | --- |
| Clinical Trial Registries | ClinicalTrials.gov (clinicaltrials.gov), EU Clinical Trials Register | NCT numbers, trial design, population, dosing, endpoints, status, dates |
| Regulatory Agencies | FDA.gov, accessdata.fda.gov (labels, Orange Book), DailyMed (FDA-approved prescribing info), EMA (ema.europa.eu), PMDA (Japan), NMPA (China) | Approvals, designations (Breakthrough, Orphan, Fast Track, RMAT, Priority Review), PDUFA dates, boxed warnings, MAA/BLA/NDA status |
| Peer-Reviewed Literature | PubMed, journal DOIs | Efficacy/safety publications, preclinical studies, mechanism of action |
| Medical Conferences | ASCO, ESMO, ASH, AAN, AES, MDS, ACC, AHA, ESC (disease-tailored) | Late-breaking abstracts, oral presentations, posters |
| Patent Databases | Google Patents, USPTO, EPO, INPADOC/Derwent families | Composition-of-matter, formulation, method-of-use patents, filing/grant/expiry dates, patent family, FTO signals |
| Company Disclosures | Company IR pages, SEC filings (10-K, 10-Q, 8-K), earnings calls, investor decks | Pipeline status, guidance, financing, licensing deals, manufacturing updates |
| Press Releases | GlobeNewswire, BusinessWire, PRNewswire, Accesswire | Topline results, regulatory announcements, partnership news |
| Business & Trade Media | Bloomberg, Reuters, FierceBiotech, BioPharma Dive | Deal flow, context, cross-verification of company claims |
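To make the idea of a domain-restricted search concrete, here is a minimal sketch of how results could be filtered against a per-category allowlist. It is an illustration only, not Prudentia's implementation: the `ALLOWED_DOMAINS` layout, the category keys, and the function names are assumptions, and only a few of the domains named in this article are included.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist. The domains are ones named in this article;
# the dictionary layout and category keys are assumptions.
ALLOWED_DOMAINS = {
    "clinical_trial_registries": {"clinicaltrials.gov"},
    "regulatory_agencies": {"fda.gov", "accessdata.fda.gov", "ema.europa.eu"},
}


def is_allowed(url: str, category: str) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in ALLOWED_DOMAINS.get(category, set())
    )


def filter_results(urls: list[str], category: str) -> list[str]:
    """Keep only the results whose host passes the category's allowlist."""
    return [u for u in urls if is_allowed(u, category)]
```

Matching on the parsed host rather than on a substring of the URL means a look-alike address such as `clinicaltrials.gov.example.com` is not mistaken for the registry itself.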
Sources by profile section
The six sections of every competitive profile are built from specific source categories and extraction schemas:
- Basic information. Mechanism of action, target, modality, route of administration, developer, development phase, and FDA designations.
- Preclinical. Efficacy in animal and xenograft models, ADME/PK, and toxicology. Anchored on PubMed and FDA briefing documents.
- Clinical. Full NCT inventory from ClinicalTrials.gov, per-trial results from publications and abstracts, and disease-tailored conference data (e.g., ASCO/ESMO for oncology, AAN/AES for neurology).
- Regulatory. IND, Orphan, Breakthrough, Fast Track, RMAT, and Priority Review status; BLA/NDA/MAA submissions; PDUFA dates; advisory committees; and boxed warnings. Sourced from FDA, EMA, and DailyMed.
- Milestones. Last 12 months of news covering financing, partnerships, manufacturing, and company guidance. Sourced from IR pages, SEC filings, and newswire and media services (BusinessWire, PRNewswire, FierceBiotech, Bloomberg, Reuters, BioPharma Dive).
- IP landscape. Composition-of-matter, formulation, and method-of-use patents; filing, grant, and expiry dates; patent families; and licensing and freedom-to-operate signals from Google Patents, USPTO, and EPO.
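As an illustration of what an extraction schema for the Clinical section might look like, the sketch below models one trial record. The field list follows the clinical trial fields this article says must be reported; the class name, the field names, and the `from_extraction` helper are hypothetical, and the real schema may differ.

```python
from dataclasses import dataclass, fields

# Value this article says is reported when a fact cannot be verified.
NOT_FOUND = "Not found"


@dataclass
class ClinicalTrialRecord:
    # Field names are hypothetical; the set of fields mirrors the article.
    nct_number: str = NOT_FOUND
    design: str = NOT_FOUND
    blinding: str = NOT_FOUND
    control: str = NOT_FOUND
    sample_size: str = NOT_FOUND
    dosing: str = NOT_FOUND
    endpoints: str = NOT_FOUND
    statistical_results: str = NOT_FOUND
    adverse_events: str = NOT_FOUND
    dates: str = NOT_FOUND


def from_extraction(raw: dict) -> ClinicalTrialRecord:
    """Build a record, rejecting unknown keys and leaving gaps as 'Not found'."""
    known = {f.name for f in fields(ClinicalTrialRecord)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unexpected fields: {sorted(unknown)}")
    # Blank or non-string values are dropped so the default applies.
    cleaned = {
        k: v.strip() for k, v in raw.items() if isinstance(v, str) and v.strip()
    }
    return ClinicalTrialRecord(**cleaned)
```

Defaulting every field to "Not found" means a missing value is stated explicitly rather than filled in by inference, and rejecting unknown keys keeps free-form narrative out of the record.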
How accuracy is enforced
Accuracy is maintained at three stages: at collection (by running a predefined set of queries against restricted sources), at extraction (by validating every output against a structured schema), and at merge (by normalizing, deduplicating, and renumbering references into a single consolidated profile). The guardrails that operate across these stages are:
- No speculation rule. If a data point cannot be verified after a thorough search, the output will say "Not found." Speculation, estimation, and inference are prohibited.
- Citation on every statement. Every factual claim carries a numeric reference tied to a citations list. Uncited statements are rejected at validation, meaning every finding can be traced back to a primary source.
- Domain-restricted searches. Key searches are pinned to authoritative domains — ClinicalTrials.gov, FDA.gov, EMA, PubMed, USPTO, EPO, and others — so results cannot drift toward lower-quality secondary sources.
- Official source hierarchy. When sources conflict, a defined hierarchy determines which one prevails, with primary sources preferred over secondary reporting. Milestones without a verifiable dated source are excluded entirely.
- Structured extraction fields. Clinical trial data must report a defined set of fields: NCT number, design, blinding, control, sample size, dosing, endpoints, statistical results, adverse events, and dates. Ad-hoc narrative without structure is not accepted.
- URL normalization & deduplication. At merge, all reference URLs are normalized and deduplicated to prevent the same primary source from being counted as multiple distinct references, keeping the citation list clean.
- Clickable source links. Every reference is a live URL. Any fact in a competitive profile can be traced back to its primary source in one click — no dead links, no unverifiable claims.
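The merge-stage steps (normalize, deduplicate, renumber) can be sketched as follows. This is a simplified illustration under stated assumptions: the normalization rules shown (lowercasing scheme and host, dropping fragments and trailing slashes) are plausible choices rather than Prudentia's documented rules, and `normalize_url` and `merge_references` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def merge_references(
    sections: list[list[str]],
) -> tuple[list[str], list[dict[int, int]]]:
    """Merge per-section reference lists into one deduplicated, renumbered list.

    Returns the consolidated URL list and, for each section, a map from the
    section's local reference number to the consolidated number (1-based).
    """
    consolidated: list[str] = []
    index: dict[str, int] = {}
    remaps: list[dict[int, int]] = []
    for refs in sections:
        remap: dict[int, int] = {}
        for local_no, url in enumerate(refs, start=1):
            key = normalize_url(url)
            if key not in index:
                consolidated.append(key)
                index[key] = len(consolidated)
            remap[local_no] = index[key]
        remaps.append(remap)
    return consolidated, remaps
```

Each section keeps its own local numbering until merge. The returned maps are what would let in-text citation numbers be rewritten to point at the single consolidated list.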
What the guardrails prevent
The combination of these controls is designed to protect against four specific failure modes that commonly affect AI-generated competitive intelligence:
- Fabrication. The no-speculation rule and mandatory citation make unsourced claims structurally impossible to produce.
- Stale data. The 12-month milestones window and content-freshness re-search catch events published after initial profile generation.
- Drift between sections. Cross-section QA checks verify that NCT numbers, patent expiries, and development phases are consistent across Clinical, Regulatory, Milestones, and IP sections.
- Broken provenance. The pre-merge references check aborts the process rather than silently dropping citations, ensuring every profile ships with a complete, traceable source list.
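A pre-merge references check of the kind described above could look like the sketch below. It assumes citations appear as bracketed numbers such as `[3]` and that a bare "Not found" entry needs no citation; both are assumptions made for illustration, as are the names `ProvenanceError` and `check_references`.

```python
import re

# Assumed citation marker format, e.g. "[3]".
CITATION = re.compile(r"\[(\d+)\]")


class ProvenanceError(Exception):
    """Raised when a section cannot be merged without losing traceability."""


def check_references(statements: list[str], references: dict[int, str]) -> None:
    """Abort (raise) if any statement is uncited or cites a missing reference."""
    for statement in statements:
        # Assumption: an explicit "Not found" entry carries no citation.
        if statement.strip() == "Not found":
            continue
        cited = [int(n) for n in CITATION.findall(statement)]
        if not cited:
            raise ProvenanceError(f"Uncited statement: {statement!r}")
        missing = [n for n in cited if n not in references]
        if missing:
            raise ProvenanceError(
                f"Unknown reference(s) {missing} in: {statement!r}"
            )
```

Raising an exception instead of skipping the offending statement reflects the behavior described here: the process stops rather than shipping a profile with a dropped citation.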