Fundamental Uniqueness & Restatements
This page explains why two requests for the same data point can return different fundamental_id values, the key we recommend using to resolve a stable unique record, and the difference between an original and a restated fundamental.
The problem: fundamental_id is not a stable key
A single reported value can be captured by more than one process, and those captures arrive at different times:
- Autotagger fundamentals are produced by our automated tagging pipeline. They are available quickly after a filing or earnings release.
- Analyst fundamentals are created or reviewed manually. They arrive later, but represent a human-validated capture of the same value.
Because these are distinct records, they carry distinct fundamental_id values even when they describe the same underlying data point.
Why this matters in practice
Consider a user pulling a series immediately after earnings, then again an hour later:
| When | What exists | fundamental_id returned |
|---|---|---|
| Right after earnings | Autotagger capture only | e.g. fundamental_id = 5001 |
| ~1 hour after earnings | Autotagger and analyst captures | e.g. fundamental_id = 5042 |
Both pulls describe the same value for the same series and period — but the fundamental_id differs. Do not treat fundamental_id as a stable identifier for a data point, and do not key your storage or deduplication on it directly. If you do, the same fact will appear as two different records depending on when you fetched it.
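To make the pitfall concrete, here is a minimal Python sketch. The record shape (plain dicts with series_id, calendar_period, and value fields), the sample value, and the in-memory store are assumptions for illustration; they are not the actual response format.

```python
# Hypothetical records: field names and the value are illustrative only.
pull_right_after_earnings = {
    "fundamental_id": 5001,  # Autotagger capture
    "series_id": 1914880,
    "calendar_period": "2024Q1",
    "value": 123.4,
}
pull_one_hour_later = {
    "fundamental_id": 5042,  # analyst capture of the same data point
    "series_id": 1914880,
    "calendar_period": "2024Q1",
    "value": 123.4,
}

# Anti-pattern: a store keyed directly on fundamental_id.
store_by_id = {}
for record in (pull_right_after_earnings, pull_one_hour_later):
    store_by_id[record["fundamental_id"]] = record

# The same fact now occupies two slots in the store.
data_points = {(r["series_id"], r["calendar_period"]) for r in store_by_id.values()}
print(len(store_by_id), "stored records for", len(data_points), "data point")
# prints: 2 stored records for 1 data point
```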
Recommended uniqueness key
To resolve a single, stable record for each data point, deduplicate on this composite key:
series_id (int) + calendar_period (str) + restated (bool)
Then, within each group, take the metadata from the maximum fundamental_id.
| Component | Type | Role in the key |
|---|---|---|
| series_id | int | Identifies the data series. |
| calendar_period | str | Identifies the reporting period (see Period Format Standards; note this is the calendar period). |
| restated | bool | Separates the original disclosure from a later restated value (see below). |
Why the maximum fundamental_id
fundamental_id increases over time, so the highest value in a group is the most recently created capture. Selecting the max means:
- You prefer the later capture — typically the analyst-reviewed record over the earlier Autotagger one.
- You get a deterministic result: repeated pulls converge on the same record once captures have settled.
- Your data self-heals: an early Autotagger value is automatically superseded by a later capture without any special-case logic on your side.
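The self-healing behavior can be sketched as an incremental upsert into a store keyed on (series_id, calendar_period, restated). This is one possible approach under stated assumptions: the upsert helper, the dict-based store, and the source field are hypothetical names introduced for this example.

```python
from typing import Dict, Tuple

Key = Tuple[int, str, bool]  # (series_id, calendar_period, restated)


def upsert(store: Dict[Key, dict], record: dict) -> None:
    """Store the record only if it is newer than what the store holds for its key."""
    key = (record["series_id"], record["calendar_period"], record["restated"])
    current = store.get(key)
    if current is None or record["fundamental_id"] > current["fundamental_id"]:
        store[key] = record


store: Dict[Key, dict] = {}

# Hypothetical captures; "source" is an illustrative field, not a documented one.
autotagger = {"fundamental_id": 5001, "series_id": 1914880,
              "calendar_period": "2024Q1", "restated": False, "source": "autotagger"}
analyst = {"fundamental_id": 5042, "series_id": 1914880,
           "calendar_period": "2024Q1", "restated": False, "source": "analyst"}

upsert(store, autotagger)  # first pull: only the Autotagger capture exists
upsert(store, analyst)     # later pull: the analyst capture supersedes it
upsert(store, autotagger)  # seeing the older capture again changes nothing

winner = store[(1914880, "2024Q1", False)]
print(winner["fundamental_id"])  # 5042
```

Because the comparison is on fundamental_id alone, the order in which captures are seen does not affect the final state of the store.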
Pattern: Group by (series_id, calendar_period, restated), sort each group by fundamental_id descending, and keep the first row. All metadata for the data point should be read from that winning record.
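The group, sort, and keep-first pattern might look like the following in plain Python. It is a sketch rather than a reference implementation: resolve_canonical is a hypothetical helper, and the records are assumed to be dicts carrying the four fields named on this page.

```python
from itertools import groupby
from operator import itemgetter

# Extracts the composite uniqueness key from a record.
key_fields = itemgetter("series_id", "calendar_period", "restated")


def resolve_canonical(records):
    """Return one record per (series_id, calendar_period, restated): the max fundamental_id."""
    # Sort by key, then by fundamental_id descending, so the first row of each group wins.
    ordered = sorted(records, key=lambda r: (key_fields(r), -r["fundamental_id"]))
    return [next(group) for _, group in groupby(ordered, key=key_fields)]


# Illustrative input: two captures of the original value and one restated capture.
records = [
    {"fundamental_id": 5001, "series_id": 1914880, "calendar_period": "2024Q1", "restated": False},
    {"fundamental_id": 6310, "series_id": 1914880, "calendar_period": "2024Q1", "restated": True},
    {"fundamental_id": 5042, "series_id": 1914880, "calendar_period": "2024Q1", "restated": False},
]

winners = resolve_canonical(records)
print([r["fundamental_id"] for r in winners])  # [5042, 6310]
```

Sorting before grouping matters here because itertools.groupby only groups adjacent rows.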
What is a restated fundamental?
A restatement occurs when a company revises a financial figure it had previously reported. Restatements happen for legitimate reasons, including:
- Correction of an accounting error or clerical mistake.
- Reclassification of line items between periods.
- Adoption of a new accounting standard that changes how a figure is calculated.
- Revisions surfaced during audit, or discovery of a prior misstatement.
When a company restates, the same metric for the same period now has two valid representations:
| Version | Description |
|---|---|
| Original fundamental | The value as first reported for that period, reflecting what was disclosed at the time. |
| Restated fundamental | The revised value for that same period, as later re-disclosed by the company. |
This is why a single series_id + period can legitimately have both an original and a restated fundamental. They are not duplicates to be collapsed — they are two distinct, correct facts about the same period, captured at different points in the company's reporting history. The restated boolean is what distinguishes them.
Why both are kept
Different use cases need different versions:
- As-originally-reported values matter for point-in-time analysis — for example, modeling what the market actually knew at the time, or backtesting against historical expectations.
- Restated values matter when you want the company's most accurate current view of a period.
Because both are valid and serve different needs, restated is part of the uniqueness key rather than something to deduplicate away. Collapsing across restated would silently discard one of two legitimate facts.
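Once records are deduplicated on the full key, a consumer can pick the version it needs. The sketch below shows one way to do that. The select_view helper and its view names are hypothetical, the 2024Q2 record is invented for illustration, and the fallback to the original value when no restatement exists is an assumption of this example rather than documented behavior.

```python
def select_view(canonical, view):
    """Pick one record per (series_id, calendar_period) from already-deduplicated records.

    view="as_reported": original values only (point-in-time analysis).
    view="latest": the restated value where one exists, else the original
                   (this fallback is an assumption of the sketch).
    """
    if view not in ("as_reported", "latest"):
        raise ValueError(f"unknown view: {view!r}")
    chosen = {}
    for record in canonical:
        if view == "as_reported" and record["restated"]:
            continue  # point-in-time view ignores restatements
        period_key = (record["series_id"], record["calendar_period"])
        current = chosen.get(period_key)
        # In the "latest" view a restated record replaces an original one.
        if current is None or (record["restated"] and not current["restated"]):
            chosen[period_key] = record
    return chosen


# Hypothetical canonical records (one per series, period, and restated flag).
canonical = [
    {"fundamental_id": 5042, "series_id": 1914880, "calendar_period": "2024Q1", "restated": False},
    {"fundamental_id": 6310, "series_id": 1914880, "calendar_period": "2024Q1", "restated": True},
    {"fundamental_id": 7001, "series_id": 1914880, "calendar_period": "2024Q2", "restated": False},
]

as_reported = select_view(canonical, "as_reported")
latest = select_view(canonical, "latest")
print(as_reported[(1914880, "2024Q1")]["fundamental_id"])  # 5042
print(latest[(1914880, "2024Q1")]["fundamental_id"])       # 6310
print(latest[(1914880, "2024Q2")]["fundamental_id"])       # 7001
```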
Putting it together
For a given series and period there can be up to two canonical records — one original, one restated — and each of those may have been captured multiple times (Autotagger, then analyst):
series_id = 1914880, calendar_period = 2024Q1
├── restated = false
│   ├── fundamental_id = 5001 (Autotagger)
│   └── fundamental_id = 5042 (Analyst) ← winner (max id)
└── restated = true
    └── fundamental_id = 6310 (Analyst) ← winner (max id)
Resolving with the recommended key yields two stable records: the original (from 5042) and the restated (from 6310). Each is the most recent capture of its respective fact.
Quick reference
- Do not key data points on fundamental_id; it may change as new captures arrive.
- Do deduplicate on series_id + calendar_period + restated.
- Within each group, take metadata from the maximum fundamental_id (most recent capture).
- An original and a restated fundamental for the same series + period are both valid; keep them separate via the restated flag.
- Autotagger captures arrive first; analyst captures arrive later with a higher fundamental_id and supersede them automatically under the max-id rule.