
Understanding Competitive Landscape data sources

Every competitive profile in Prudentia is built from a structured pipeline that queries a tiered set of authoritative sources and enforces strict accuracy guardrails at every stage. This article explains where the data comes from, how it is validated, and what that means for the confidence you can place in the outputs.

How competitive profiles are built

For each asset in the Competitive Landscape, Prudentia's pipeline runs a defined set of searches across six profile areas: Basic Information, Preclinical, Clinical, Regulatory, Milestones, and Intellectual Property. Coverage is engineered along two dimensions:

  • Breadth: multiple independent source categories are queried for each fact, so no single source is relied upon exclusively.
  • Authority: primary sources (e.g., regulatory agencies, trial registries, patent offices) are always preferred over secondary reporting.
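The authority dimension works like a ranking that decides which source wins when several sources report different values for the same fact. The Python sketch below illustrates the idea under stated assumptions. The category keys, the numeric ranks, and the `Finding` and `resolve_conflict` names are hypothetical. The article establishes only that primary sources outrank secondary reporting, not the exact order Prudentia uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical authority ranks (lower number = more authoritative).
# The article states only that primary sources outrank secondary reporting;
# this exact ordering is an assumption for illustration.
AUTHORITY_RANK = {
    "regulatory_agency": 0,
    "trial_registry": 0,
    "patent_office": 0,
    "peer_reviewed_literature": 1,
    "medical_conference": 1,
    "company_disclosure": 2,
    "press_release": 3,
    "trade_media": 4,
}


@dataclass(frozen=True)
class Finding:
    """One candidate value for a fact, plus where it came from."""
    value: str
    category: str  # must be a key of AUTHORITY_RANK
    url: str


def resolve_conflict(findings: list[Finding]) -> Optional[Finding]:
    """Pick the finding from the most authoritative category.

    Returns None when nothing was found, so the caller can emit "Not found"
    instead of guessing. Ties keep the first finding in input order.
    """
    if not findings:
        return None
    return min(findings, key=lambda f: AUTHORITY_RANK[f.category])


# Example with placeholder URLs: a registry entry outranks a media report.
findings = [
    Finding("Phase 3", "trade_media", "https://news.example.com/article"),
    Finding("Phase 2", "trial_registry", "https://clinicaltrials.gov/study/NCT00000000"),
]
best = resolve_conflict(findings)
print(best.value, best.url)  # prints: Phase 2 https://clinicaltrials.gov/study/NCT00000000
```

Returning None rather than a best guess keeps the sketch aligned with the rule that an unverifiable data point is reported as "Not found."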

Data sources by category

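The pipeline draws on eight source categories, each queried with domain-restricted searches so that results come from authoritative origins. Primary sources such as trial registries, regulatory agencies, and patent offices anchor the core facts, while press releases and trade media add timely announcements, context, and cross-verification: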

| Category | Sources | What we extract |
|---|---|---|
| Clinical Trial Registries | ClinicalTrials.gov, EU Clinical Trials Register | NCT numbers, trial design, population, dosing, endpoints, status, dates |
| Regulatory Agencies | FDA.gov, accessdata.fda.gov (labels, Orange Book), DailyMed (FDA-approved prescribing information), EMA (ema.europa.eu), PMDA (Japan), NMPA (China) | Approvals; designations (Breakthrough, Orphan, Fast Track, RMAT, Priority Review); PDUFA dates; boxed warnings; MAA/BLA/NDA status |
| Peer-Reviewed Literature | PubMed, journal DOIs | Efficacy and safety publications, preclinical studies, mechanism of action |
| Medical Conferences | ASCO, ESMO, ASH, AAN, AES, MDS, ACC, AHA, ESC (selected by disease area) | Late-breaking abstracts, oral presentations, posters |
| Patent Databases | Google Patents, USPTO, EPO, INPADOC/Derwent families | Composition-of-matter, formulation, and method-of-use patents; filing, grant, and expiry dates; patent families; freedom-to-operate (FTO) signals |
| Company Disclosures | Company IR pages, SEC filings (10-K, 10-Q, 8-K), earnings calls, investor decks | Pipeline status, guidance, financing, licensing deals, manufacturing updates |
| Press Releases | GlobeNewswire, BusinessWire, PRNewswire, Accesswire | Topline results, regulatory announcements, partnership news |
| Business & Trade Media | Bloomberg, Reuters, FierceBiotech, BioPharma Dive | Deal flow, context, cross-verification of company claims |
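A domain-restricted search has two parts: scoping the query to allowed sites, then discarding any returned URL whose host falls outside the allowlist. The sketch below assumes a web search backend that understands `site:` and `OR` operators. The `ALLOWED_DOMAINS` mapping (drawn from domains named in the table), `build_query`, `is_allowed`, and `filter_results` are hypothetical illustrations, not Prudentia's code.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical allowlists built from domains named in the sources table.
ALLOWED_DOMAINS = {
    "clinical_trial_registries": ["clinicaltrials.gov"],
    "regulatory_agencies": ["fda.gov", "dailymed.nlm.nih.gov", "ema.europa.eu"],
    "peer_reviewed_literature": ["pubmed.ncbi.nlm.nih.gov"],
    "patent_databases": ["patents.google.com", "uspto.gov", "epo.org"],
}


def build_query(terms: str, category: str) -> str:
    """Scope a search to the category's domains (assumes site:/OR support)."""
    sites = " OR ".join(f"site:{d}" for d in ALLOWED_DOMAINS[category])
    return f"{terms} ({sites})"


def is_allowed(url: str, category: str) -> bool:
    """True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS[category])


def filter_results(urls: list[str], category: str) -> list[str]:
    """Drop any result that drifted outside the allowlist."""
    return [u for u in urls if is_allowed(u, category)]


print(build_query('"drug-x" label', "regulatory_agencies"))
print(filter_results(
    [
        "https://www.accessdata.fda.gov/scripts/cder/daf/",
        "https://fda.gov.example.com/fake",
        "https://blog.example.com/drug-x",
    ],
    "regulatory_agencies",
))
```

Matching on the parsed hostname, rather than on a substring of the URL, keeps look-alike hosts such as `fda.gov.example.com` from passing the filter while still admitting subdomains like `accessdata.fda.gov`.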

Sources by profile section

The six sections of every competitive profile are built from specific source categories and extraction schemas:

  • Basic Information. Mechanism of action, target, modality, route of administration, developer, development phase, and FDA designations.
  • Preclinical. Efficacy in animal and xenograft models, ADME/PK, and toxicology, anchored on PubMed and FDA briefing documents.
  • Clinical. The full NCT inventory from ClinicalTrials.gov, per-trial results from publications and abstracts, and disease-tailored conference data (e.g., ASCO/ESMO for oncology, AAN/AES for neurology).
  • Regulatory. IND, Orphan, Breakthrough, Fast Track, RMAT, and Priority Review status; BLA/NDA/MAA submissions; PDUFA dates; advisory committees; and boxed warnings. Sourced from FDA, EMA, and DailyMed.
  • Milestones. The last 12 months of news covering financing, partnerships, manufacturing, and company guidance, sourced from company IR pages, SEC filings, newswires (BusinessWire, PRNewswire), and trade media (FierceBiotech, Bloomberg, Reuters, BioPharma Dive).
  • Intellectual Property. Composition-of-matter, formulation, and method-of-use patents; filing, grant, and expiry dates; patent families; and licensing and freedom-to-operate signals from Google Patents, USPTO, and EPO.
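For the Clinical section, an extraction schema can be expressed as a typed record that must be fully populated. "Not found" stands in for anything unverifiable, and every other value carries at least one reference number. The sketch below is a hypothetical illustration. Its field names follow the clinical trial fields listed under the accuracy guardrails in this article, while `Cited`, `TrialRecord`, `validate`, and the rule that "Not found" values need no citation are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, fields

NOT_FOUND = "Not found"


@dataclass
class Cited:
    """An extracted value plus the citation numbers that support it."""
    value: str
    refs: list[int]


@dataclass
class TrialRecord:
    """Hypothetical Clinical-section schema: every field must be populated."""
    nct_number: Cited
    design: Cited
    blinding: Cited
    control: Cited
    sample_size: Cited
    dosing: Cited
    endpoints: Cited
    statistical_results: Cited
    adverse_events: Cited
    dates: Cited


def validate(record: TrialRecord, citations: dict[int, str]) -> list[str]:
    """Return validation problems; an empty list means the record passes."""
    problems: list[str] = []
    for f in fields(record):
        item = getattr(record, f.name)
        if not item.value.strip():
            problems.append(f"{f.name}: empty value (use {NOT_FOUND!r})")
        elif item.value != NOT_FOUND:
            if not item.refs:
                problems.append(f"{f.name}: uncited statement")
            for ref in item.refs:
                if ref not in citations:
                    problems.append(f"{f.name}: reference [{ref}] missing from citations list")
    return problems


# Example: start with every field "Not found", then fill in two values.
citations = {1: "https://clinicaltrials.gov/study/NCT00000000"}  # placeholder URL
values = {f.name: Cited(NOT_FOUND, []) for f in fields(TrialRecord)}
values["nct_number"] = Cited("NCT00000000", [1])
values["sample_size"] = Cited("120", [])  # uncited, so validation flags it
print(validate(TrialRecord(**values), citations))  # ['sample_size: uncited statement']
```

Because every field is required by the dataclass constructor, a record cannot silently omit a field; it must state either a cited value or "Not found."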

How accuracy is enforced

Accuracy is maintained at three stages: at collection (by prescribing queries and restricting sources), at extraction (by validating every output against a structured schema), and at merge (by normalizing, deduplicating, and renumbering references into a single consolidated profile). The guardrails that operate across these stages are:

  • No-speculation rule. If a data point cannot be verified after a thorough search, the output says "Not found." Speculation, estimation, and inference are prohibited.
  • Citation on every statement. Every factual claim carries a numeric reference tied to a citations list. Uncited statements are rejected at validation, so every finding can be traced back to its source.
  • Domain-restricted searches. Key searches are pinned to authoritative domains (ClinicalTrials.gov, FDA.gov, EMA, PubMed, USPTO, EPO, and others), so results cannot drift toward lower-quality secondary sources.
  • Official source hierarchy. When sources conflict, a defined hierarchy is applied in which primary sources (regulatory agencies, trial registries, patent offices) take precedence over secondary reporting. Milestones without a verifiable dated source are excluded entirely.
  • Structured extraction fields. Clinical trial data must report a defined set of fields: NCT number, design, blinding, control, sample size, dosing, endpoints, statistical results, adverse events, and dates. Unstructured narrative is not accepted.
  • URL normalization and deduplication. At merge, all reference URLs are normalized and deduplicated so that the same source is not counted as multiple distinct references, keeping the citation list clean.
  • Clickable source links. Every reference is a live URL, so any fact in a competitive profile can be traced to its source in one click: no dead links, no unverifiable claims.
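The merge-stage guardrails can be sketched as a normalize-then-index pass. Each section's local reference numbers are mapped onto one global, deduplicated citation list, and the `[n]` markers in the text are rewritten to match. The normalization rules here are assumptions chosen for illustration, as are all function names: http and https are treated alike, and `www.`, trailing slashes, fragments, and `utm_*` tracking parameters are dropped. The article does not specify Prudentia's actual rules.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

REF_MARKER = re.compile(r"\[(\d+)\]")


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal.

    Assumed rules: http/https treated alike, host lowercased, leading "www."
    and trailing "/" removed, fragment dropped, utm_* parameters dropped and
    remaining query parameters sorted.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    if scheme == "http":
        scheme = "https"
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/") or "/"
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_"))
    return urlunsplit((scheme, host, path, urlencode(params), ""))


def merge_sections(sections: list[tuple[str, list[str]]]) -> tuple[list[str], list[str]]:
    """Merge (text, local_citations) pairs into one deduplicated citation list.

    In each section, marker [n] refers to local_citations[n - 1]. Markers are
    rewritten to global numbers; an [n] with no local citation raises KeyError,
    aborting the merge rather than silently dropping the reference.
    """
    global_refs: list[str] = []
    position: dict[str, int] = {}  # normalized URL -> 1-based global number
    merged_texts: list[str] = []
    for text, local_refs in sections:
        local_to_global: dict[int, int] = {}
        for local_num, url in enumerate(local_refs, start=1):
            key = normalize_url(url)
            if key not in position:
                global_refs.append(key)
                position[key] = len(global_refs)
            local_to_global[local_num] = position[key]
        merged_texts.append(
            REF_MARKER.sub(lambda m: f"[{local_to_global[int(m.group(1))]}]", text)
        )
    return merged_texts, global_refs


# Placeholder statements and URLs; two spellings of the same FDA page collapse to one.
sections = [
    ("Statement A [1]. Statement B [2].",
     ["https://www.fda.gov/drugs/x/", "https://www.fda.gov/orphan?utm_source=feed"]),
    ("Statement C [1]; statement D [2].",
     ["https://ema.europa.eu/en/medicines/y", "http://fda.gov/drugs/x#top"]),
]
texts, refs = merge_sections(sections)
print(texts)  # ['Statement A [1]. Statement B [2].', 'Statement C [3]; statement D [1].']
print(refs)   # ['https://fda.gov/drugs/x', 'https://fda.gov/orphan', 'https://ema.europa.eu/en/medicines/y']
```

Numbering follows first appearance, so references in earlier sections keep low numbers and a repeated source reuses its existing number instead of adding a duplicate entry.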

What the guardrails prevent

The combination of these controls is designed to protect against four specific failure modes that commonly affect AI-generated competitive intelligence:

  • Fabrication. The no-speculation rule and mandatory citations make unsourced claims structurally impossible to produce.
  • Stale data. The 12-month milestones window and a content-freshness re-search catch events published after the profile was first generated.
  • Drift between sections. Cross-section QA checks verify that NCT numbers, patent expiries, and development phases are consistent across the Clinical, Regulatory, Milestones, and IP sections.
  • Broken provenance. The pre-merge references check aborts the process rather than silently dropping citations, ensuring every profile ships with a complete, traceable source list.
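Two of these protections are simple enough to sketch directly: a 12-month window that keeps only dated milestones, and a references check that raises an error instead of silently dropping an unresolved citation. In this hypothetical sketch the window is approximated as 365 days. `Milestone`, `recent_milestones`, `check_references`, and `ProvenanceError` are illustrative names, not Prudentia's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REF_MARKER = re.compile(r"\[(\d+)\]")


class ProvenanceError(Exception):
    """Raised when text cites a reference number with no matching citation."""


def check_references(text: str, citations: list[str]) -> None:
    """Pre-merge check: abort on any unresolved [n] instead of dropping it."""
    missing = sorted({int(n) for n in REF_MARKER.findall(text)
                      if not 1 <= int(n) <= len(citations)})
    if missing:
        raise ProvenanceError(f"unresolved references: {missing}")


@dataclass
class Milestone:
    summary: str
    source_url: str
    published: Optional[date]  # None means no verifiable date was found


def recent_milestones(items: list[Milestone], today: date,
                      window_days: int = 365) -> list[Milestone]:
    """Keep dated milestones inside the window; undated ones are excluded."""
    cutoff = today - timedelta(days=window_days)
    return [m for m in items
            if m.published is not None and cutoff <= m.published <= today]


# Example with placeholder data.
items = [
    Milestone("Item A", "https://example.com/a", date(2025, 1, 15)),
    Milestone("Item B", "https://example.com/b", date(2023, 11, 2)),
    Milestone("Item C", "https://example.com/c", None),
]
print([m.summary for m in recent_milestones(items, today=date(2025, 6, 30))])  # ['Item A']

try:
    check_references("Statement A [1]; statement B [3].", ["https://example.com/a"])
except ProvenanceError as err:
    print("merge aborted:", err)  # merge aborted: unresolved references: [3]
```

Raising an exception, rather than returning a filtered list, forces the caller to stop the merge, which mirrors the abort-instead-of-drop behavior described above.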