Fundamental Uniqueness & Restatements

This page explains why two requests for the same data point can return different fundamental_id values, which key we recommend for resolving a stable, unique record, and how an original fundamental differs from a restated one.


The problem: fundamental_id is not a stable key

A single reported value can be captured by more than one process, and those captures arrive at different times:

  • Autotagger fundamentals are produced by our automated tagging pipeline. They are available quickly after a filing or earnings release.
  • Analyst fundamentals are created or reviewed manually. They arrive later, but represent a human-validated capture of the same value.

Because these are distinct records, they carry distinct fundamental_id values even when they describe the same underlying data point.

Why this matters in practice

Consider a user pulling a series immediately after earnings, then again an hour later:

| When | What exists | fundamental_id returned |
|---|---|---|
| Right after earnings | Autotagger capture only | e.g. fundamental_id = 5001 |
| ~1 hour after earnings | Autotagger and analyst capture | e.g. fundamental_id = 5042 |

Both pulls describe the same value for the same series and period — but the fundamental_id differs. Do not treat fundamental_id as a stable identifier for a data point, and do not key your storage or deduplication on it directly. If you do, the same fact will appear as two different records depending on when you fetched it.
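
The failure mode can be reproduced in a few lines. This is only an illustrative sketch: the row layout (plain dictionaries), the helper name store_by_fundamental_id, and the series_id and calendar_period values are assumptions chosen for the example, not part of any client library.

```python
def store_by_fundamental_id(pulls):
    """Naive storage keyed on fundamental_id (the anti-pattern described above)."""
    store = {}
    for row in pulls:
        store[row["fundamental_id"]] = row
    return store


# Two pulls of the same data point, taken about an hour apart.
pulls = [
    {"fundamental_id": 5001, "series_id": 1914880, "calendar_period": "2024Q1"},
    {"fundamental_id": 5042, "series_id": 1914880, "calendar_period": "2024Q1"},
]

naive_store = store_by_fundamental_id(pulls)
# Count how many distinct series/period combinations were actually pulled.
data_points = {(row["series_id"], row["calendar_period"]) for row in pulls}

print(len(naive_store), len(data_points))  # prints: 2 1
```

The store ends up holding two rows for what is a single data point.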


Recommended uniqueness key

To resolve a single, stable record for each data point, deduplicate on this composite key:

series_id (int)  +  calendar_period (str)  +  restated (bool)

Then, within each group, take the metadata from the maximum fundamental_id.

| Component | Type | Role in the key |
|---|---|---|
| series_id | int | Identifies the data series. |
| calendar_period | str | Identifies the reporting period (see Period Format Standards; note this is the calendar period). |
| restated | bool | Separates the original disclosure from a later restated value (see below). |

Why the maximum fundamental_id

fundamental_id increases over time, so the highest value in a group is the most recently created capture. Selecting the max means:

  • You prefer the later capture — typically the analyst-reviewed record over the earlier Autotagger one.
  • You get a deterministic result: repeated pulls converge on the same record once captures have settled.
  • Your data self-heals: an early Autotagger value is automatically superseded by a later capture without any special-case logic on your side.

Pattern: Group by (series_id, calendar_period, restated), sort each group by fundamental_id descending, and keep the first row. All metadata for the data point should be read from that winning record.
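
A minimal sketch of this pattern in plain Python follows. It assumes each record is a dictionary carrying the four fields named above. The function name resolve_unique and the sample rows (including the 2024Q2 row and its id) are hypothetical and used only for illustration.

```python
from itertools import groupby
from operator import itemgetter

KEY_FIELDS = ("series_id", "calendar_period", "restated")


def resolve_unique(rows):
    """Return one row per (series_id, calendar_period, restated) group:
    the row with the maximum fundamental_id."""
    key = itemgetter(*KEY_FIELDS)
    # Sort by the composite key, then by fundamental_id descending,
    # so the winning row is the first one in each group.
    ordered = sorted(rows, key=lambda r: (key(r), -r["fundamental_id"]))
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = [
    {"fundamental_id": 5001, "series_id": 1914880, "calendar_period": "2024Q1", "restated": False},
    {"fundamental_id": 5100, "series_id": 1914880, "calendar_period": "2024Q2", "restated": False},
    {"fundamental_id": 5042, "series_id": 1914880, "calendar_period": "2024Q1", "restated": False},
]

resolved = resolve_unique(rows)
print([r["fundamental_id"] for r in resolved])  # prints: [5042, 5100]
```

Because the result depends only on the set of rows and not on their arrival order, running the same function on a later pull gives the same winners once no new captures are being added.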


What is a restated fundamental?

A restatement occurs when a company revises a financial figure it had previously reported. Restatements happen for legitimate reasons, including:

  • Correction of an accounting error or clerical mistake.
  • Reclassification of line items between periods.
  • Adoption of a new accounting standard that changes how a figure is calculated.
  • Revisions surfaced during an audit, or discovery of a prior misstatement.

When a company restates, the same metric for the same period now has two valid representations:

| Fundamental | Description |
|---|---|
| Original fundamental | The value as first reported for that period, reflecting what was disclosed at the time. |
| Restated fundamental | The revised value for that same period, as later re-disclosed by the company. |

This is why a single series_id + period can legitimately have both an original and a restated fundamental. They are not duplicates to be collapsed — they are two distinct, correct facts about the same period, captured at different points in the company's reporting history. The restated boolean is what distinguishes them.

Why both are kept

Different use cases need different versions:

  • As-originally-reported values matter for point-in-time analysis — for example, modeling what the market actually knew at the time, or backtesting against historical expectations.
  • Restated values matter when you want the company's most accurate current view of a period.

Because both are valid and serve different needs, restated is part of the uniqueness key rather than something to deduplicate away. Collapsing across restated would silently discard one of two legitimate facts.
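
Once records have been resolved to one row per (series_id, calendar_period, restated), choosing between the two versions is a separate step that depends on the use case. The sketch below is one possible way to do it, not a prescribed method. The function name select_view, the current_view flag, and the rule "fall back to the original when no restated record exists" are assumptions made for this example, as are the sample ids.

```python
def select_view(resolved, current_view):
    """Pick one record per (series_id, calendar_period).

    current_view=False: as-originally-reported records only.
    current_view=True: the restated record where one exists, otherwise
    the original (this fallback is an assumption of the sketch).
    """
    chosen = {}
    for row in resolved:
        if row["restated"] and not current_view:
            continue  # point-in-time view ignores restated values
        slot = (row["series_id"], row["calendar_period"])
        # In the current view, a restated row replaces an original one.
        if slot not in chosen or row["restated"]:
            chosen[slot] = row
    return list(chosen.values())


# Input is assumed to be already deduplicated on the composite key.
resolved = [
    {"fundamental_id": 5042, "series_id": 1914880, "calendar_period": "2024Q1", "restated": False},
    {"fundamental_id": 6310, "series_id": 1914880, "calendar_period": "2024Q1", "restated": True},
    {"fundamental_id": 5100, "series_id": 1914880, "calendar_period": "2024Q2", "restated": False},
]

point_in_time = select_view(resolved, current_view=False)
latest_view = select_view(resolved, current_view=True)

print([r["fundamental_id"] for r in point_in_time])  # prints: [5042, 5100]
print([r["fundamental_id"] for r in latest_view])    # prints: [6310, 5100]
```

Neither view discards data at storage time: both versions stay in the resolved set, and the choice is made only when reading.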


Putting it together

For a given series and period there can be up to two canonical records — one original, one restated — and each of those may have been captured multiple times (Autotagger, then analyst):

series_id = 1914880, calendar_period = 2024Q1
├── restated = false
│     ├── fundamental_id = 5001  (Autotagger)
│     └── fundamental_id = 5042  (Analyst)   ← winner (max id)
└── restated = true
      └── fundamental_id = 6310  (Analyst)   ← winner (max id)

Resolving with the recommended key yields two stable records: the original (from 5042) and the restated (from 6310). Each is the most recent capture of its respective fact.
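
If you ingest captures incrementally rather than in batches, the same rule can be applied as an upsert. The following sketch replays the three captures from the diagram above. The in-memory dictionary store and the function name upsert are assumptions for illustration; a real implementation would likely sit on top of a database.

```python
def upsert(store, row):
    """Keep row only if no capture with a higher fundamental_id is already
    stored under the same (series_id, calendar_period, restated) key."""
    key = (row["series_id"], row["calendar_period"], row["restated"])
    current = store.get(key)
    if current is None or row["fundamental_id"] > current["fundamental_id"]:
        store[key] = row
    return store


# The three captures from the diagram, in arrival order.
arrivals = [
    {"fundamental_id": 5001, "series_id": 1914880, "calendar_period": "2024Q1", "restated": False},  # Autotagger
    {"fundamental_id": 5042, "series_id": 1914880, "calendar_period": "2024Q1", "restated": False},  # Analyst
    {"fundamental_id": 6310, "series_id": 1914880, "calendar_period": "2024Q1", "restated": True},   # Analyst
]

store = {}
for row in arrivals:
    upsert(store, row)

winners = sorted(r["fundamental_id"] for r in store.values())
print(winners)  # prints: [5042, 6310]
```

The Autotagger capture 5001 is replaced as soon as 5042 arrives, and the restated record is stored under its own key, so the store holds exactly the two records described above.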


Quick reference

  • Do not key data points on fundamental_id: it can change as new captures arrive.
  • Do deduplicate on series_id + calendar_period + restated.
  • Within each group, take metadata from the maximum fundamental_id (most recent capture).
  • An original and a restated fundamental for the same series + period are both valid — keep them separate via the restated flag.
  • Autotagger captures arrive first; analyst captures arrive later with a higher fundamental_id and supersede them automatically under the max-id rule.