AWS S3
Daloopa's fundamental data is now available via file-based delivery into a dedicated Amazon S3 bucket. Access our data in your own cloud environment without building ETL pipelines or managing data movement — ideal for teams that want raw Parquet files integrated into their own pipelines, regardless of stack.
What Is S3?
Amazon S3 (Simple Storage Service) is AWS's cloud storage product. An S3 bucket is a container in the cloud where companies store and retrieve large datasets at scale. Daloopa provisions a dedicated bucket, loads your subscribed data on a scheduled basis, and grants your AWS account read-only access.
Why S3?
Snowflake and Databricks serve teams already on those platforms. S3 fills the gap for clients who want raw file-based delivery into their own cloud pipelines — whether you're running Spark, Athena, Redshift, or a custom data stack.
Key Benefits
- Cost-Effective: No per-query API fees or warehouse compute costs — just S3 storage
- Rapid Setup: Clients access data shortly after provisioning
- No ETL Required: Data is delivered as company-specific Parquet files directly into your bucket, ready to consume as-is
- Seamless Integration: Join Daloopa data with internal datasets without data movement
- Scalable Access: Multiple teams can query without hitting API rate limits
- Timely Updates: New and revised data lands in your bucket within 30 minutes of each refresh
How It Works
1. We Publish
Daloopa creates a dedicated S3 bucket and loads your agreed-upon data as ticker-partitioned Parquet files, starting with a full historical backfill.
2. We Grant Access
Engineering provisions read-only cross-account access to your AWS account so you can query directly from your environment.
3. We Update
When our data updates, new files land in your bucket within 30 minutes. Files older than 30 days are automatically removed — be sure to ingest data within that window.
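Because files expire after 30 days, a scheduled ingest job should track which objects it has already processed and pull anything new well inside that window. Below is a minimal sketch of the selection logic; the helper name and the object shape (mirroring boto3's `list_objects_v2` entries) are illustrative, not part of Daloopa's delivery.

```python
from datetime import datetime, timedelta, timezone

def objects_to_ingest(objects, already_seen, now=None, retention_days=30):
    """Pick S3 objects that are new and still inside the retention window.

    `objects` is a list of dicts shaped like boto3 list_objects_v2 entries:
    {"Key": str, "LastModified": datetime}. `already_seen` is a set of keys
    your pipeline has previously ingested.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [
        o["Key"]
        for o in objects
        if o["Key"] not in already_seen and o["LastModified"] >= cutoff
    ]
```

In practice you would feed this the listing from `aws s3 ls` or boto3 and persist `already_seen` between runs.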
S3 vs. API Coverage
| Capability | S3 | API |
|---|---|---|
| Companies | ✓ | ✓ |
| Series | ✓ | ✓ |
| Fundamentals | ✓ | ✓ |
| Download (company model) | — | ✓ |
| Export (CSV with series and datapoints) | — | ✓ |
| Industry Models | — | ✓ |
| Taxonomy (standardized metrics mapping) | ✓ | ✓ |
| Documents (download and keyword search) | — | ✓ |
Future Data: Additional data sources will be added in future releases.
Important Details: Fields within each Parquet file mirror our API responses with minor differences — clients familiar with our API will recognize the same data.
Data Format
S3 clients receive data as ticker-partitioned Parquet files organized by company. Each data file contains the full set of company metadata, series definitions, and fundamental financial data for that ticker.
Note: This differs from Snowflake and Databricks, where the same data is structured across three relational tables (Companies, Series, and Fundamentals) rather than consolidated by ticker.
File Structure
Your bucket is organized with a top-level metadata.parquet index file and ticker-partitioned data files:
s3://<your-daloopa-bucket>/
metadata.parquet
ticker=AAPL/data_<timestamp>.parquet
ticker=GOOGL/data_<timestamp>.parquet
...
metadata.parquet serves as an index: one row per ticker with the latest file timestamp and company identifiers. ticker=<TICKER>/data_<timestamp>.parquet files contain the fundamental data. Each update produces a new timestamped file per updated ticker.
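Since each refresh adds a new timestamped file alongside older ones, readers that list the bucket directly need to pick the newest file per ticker. A sketch of that selection, assuming the key layout above and a lexicographically sortable timestamp (adjust the pattern to the actual stamp format):

```python
import re

# Matches ticker=<TICKER>/data_<timestamp>.parquet; metadata.parquet won't match
KEY_RE = re.compile(r"^ticker=(?P<ticker>[^/]+)/data_(?P<ts>[^.]+)\.parquet$")

def latest_per_ticker(keys):
    """Map each ticker to its most recent data file, by the embedded timestamp."""
    latest = {}
    for key in keys:
        m = KEY_RE.match(key)
        if not m:
            continue  # skip metadata.parquet and anything unexpected
        ticker, ts = m.group("ticker"), m.group("ts")
        if ticker not in latest or ts > latest[ticker][0]:
            latest[ticker] = (ts, key)
    return {ticker: key for ticker, (ts, key) in latest.items()}
```

Alternatively, use the last_updated column in metadata.parquet as the authoritative pointer to the latest file per ticker.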
Schemas
metadata.parquet — One row per company. Use this to discover available tickers and locate the latest data file for each:
| Column | Type | Description |
|---|---|---|
| company_id | BIGINT | Unique identifier for the company |
| ticker | STRING | Ticker symbol of the company |
| name | STRING | Legal name of the company (e.g., "Apple Inc.") |
| company_identifiers | LIST | List of company identifiers (e.g., ISIN, CIK, CapIQCompanyId), each with identifier_type and identifier_value fields. ISIN is recommended as the primary match key |
| last_updated | TIMESTAMP | Timestamp of the most recent data file upload for this company |
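The schema above recommends ISIN as the primary match key within the company_identifiers list. A sketch of that lookup over rows shaped like the metadata file (plain dicts here for clarity; with pandas you would explode the list column first):

```python
def find_by_isin(metadata_rows, isin):
    """Return the metadata row whose company_identifiers contains the given ISIN."""
    for row in metadata_rows:
        for ident in row.get("company_identifiers", []):
            if ident["identifier_type"] == "ISIN" and ident["identifier_value"] == isin:
                return row
    return None
```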
data_<timestamp>.parquet — Per-ticker data files. Each row is a single datapoint (one metric, one period, one filing):
| Column | Type | Description |
|---|---|---|
| id | BIGINT | Unique identifier for the datapoint |
| ticker | STRING | Ticker symbol of the company |
| company_name | STRING | Legal name of the company (e.g., "Apple Inc.") |
| series_id | BIGINT | Unique identifier for the company series the datapoint belongs to |
| industry | STRING | Parent industry classification from the Daloopa taxonomy (e.g., "Technology") |
| sector | STRING | Parent sector classification from the Daloopa taxonomy (e.g., "Information Technology") |
| sub_industries | LIST | List of sub-industry classifications from the Daloopa taxonomy (e.g., ["Software Publishers"]). Most granular level of the sector > industry > sub-industry hierarchy |
| full_series_name | STRING | Full hierarchical context of the series within the Daloopa model (e.g., "Income Statement \| Total cost of sales"). Pipe-delimited path from section to line item |
| label | STRING | Short description of the datapoint (e.g., "iPhone", "Total Revenue") |
| category | STRING | Section or grouping under which the datapoint is categorized in the model (e.g., "Income Statement", "Balance Sheet", "Cash Flow Statement", "KPIs", "Segmental Breakdown", "Guidance") |
| restated | BOOLEAN | Whether the datapoint has been restated in a subsequent filing (true = this value was revised from its original report) |
| filing_type | STRING | Type of regulatory filing from which the datapoint was sourced (e.g., "8-K", "10-Q", "10-K") |
| value_raw | DECIMAL(38,9) | Raw value of the datapoint as reported in the source filing |
| value_normalized | DECIMAL(38,9) | Normalized value, adjusted for consistency (e.g., annualized quarterly values converted to a comparable basis) |
| unit | STRING | Unit of measurement (e.g., "Billion", "Million", "Dollar", "Percent") |
| calendar_period | STRING | Calendar period, formatted as YYYYQ# for quarters or YYYYFY for fiscal years (e.g., "2021Q2", "2020FY"). Determined by the midpoint of the fiscal period falling within a calendar quarter |
| fiscal_period | STRING | Company's actual fiscal period, formatted as YYYYQ# or YYYYFY (e.g., "2024Q1"). May differ from calendar_period for companies with non-standard fiscal year ends |
| span | STRING | Periodicity of the datapoint (e.g., "Quarterly", "Annual") |
| fiscal_date | DATE | End date of the fiscal period for the datapoint (e.g., 2023-12-31) |
| document_id | BIGINT | Unique identifier for the source document (filing) from which the datapoint was extracted |
| filing_date | DATE | Date the regulatory filing was reported |
| document_released_at | TIMESTAMP | Datetime when the source document was ingested into Daloopa (i.e., when the datapoint became available) |
| created_at | TIMESTAMP | Timestamp when the datapoint was originally created in Daloopa |
| updated_at | TIMESTAMP | Timestamp of the last modification to the datapoint |
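Since restatements appear as revised rows (see the restated flag), a common preparation step is to keep one value per series and fiscal period, preferring the most recently updated row so restatements win. A sketch over rows shaped like the table above:

```python
def latest_values(rows):
    """Keep one row per (series_id, fiscal_period), preferring the latest updated_at."""
    best = {}
    for row in rows:
        key = (row["series_id"], row["fiscal_period"])
        if key not in best or row["updated_at"] > best[key]["updated_at"]:
            best[key] = row
    return list(best.values())
```

The same logic translates directly to a pandas `sort_values("updated_at").drop_duplicates(["series_id", "fiscal_period"], keep="last")` or a SQL window function.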
Getting Started
1. Provide Your AWS Account ID
Share your AWS account ID with your Daloopa account team so we can provision cross-account read-only access to your dedicated bucket.
2. Access Data
Once access is granted, read files directly from the bucket using any AWS-compatible tool:
AWS CLI
# List available ticker partitions
aws s3 ls s3://<your-daloopa-bucket>/
# List files for a specific ticker
aws s3 ls s3://<your-daloopa-bucket>/ticker=AAPL/
# Copy the metadata index locally
aws s3 cp s3://<your-daloopa-bucket>/metadata.parquet ./data/
# Sync all data to a local directory
aws s3 sync s3://<your-daloopa-bucket>/ ./daloopa-data/
Python (pandas + PyArrow)
import pandas as pd
# Read the metadata index to see available tickers and latest timestamps
metadata = pd.read_parquet("s3://<your-daloopa-bucket>/metadata.parquet")
# Read all data for a single ticker
aapl = pd.read_parquet("s3://<your-daloopa-bucket>/ticker=AAPL/")
# Read the full dataset (all tickers). Note: the bucket root also holds
# metadata.parquet, whose schema differs from the data files, so reading
# ticker partitions individually may be more reliable
df = pd.read_parquet("s3://<your-daloopa-bucket>/")
Amazon Athena
-- Create an external table over the Daloopa bucket
CREATE EXTERNAL TABLE daloopa_fundamentals
STORED AS PARQUET
LOCATION 's3://<your-daloopa-bucket>/'
TBLPROPERTIES ('parquet.compress'='SNAPPY');
-- Query a specific ticker
SELECT * FROM daloopa_fundamentals WHERE ticker = 'AAPL';
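Putting the pieces together, an incremental workflow can use the last_updated column in metadata.parquet to re-read only tickers that changed since the previous run. A sketch of the selection step (rows shaped like the metadata schema; the bucket path and cutoff tracking belong to your pipeline):

```python
def changed_tickers(metadata_rows, since):
    """Tickers whose last_updated is newer than the given cutoff."""
    return [r["ticker"] for r in metadata_rows if r["last_updated"] > since]

# Illustrative use with pandas (requires s3fs and granted bucket access):
# metadata = pd.read_parquet("s3://<your-daloopa-bucket>/metadata.parquet")
# for t in changed_tickers(metadata.to_dict("records"), since=last_run_cutoff):
#     df = pd.read_parquet(f"s3://<your-daloopa-bucket>/ticker={t}/")
```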