AWS S3

Daloopa's fundamental data is now available via file-based delivery to a dedicated Amazon S3 bucket. Access our data in your own cloud environment without building ETL pipelines or managing data movement — ideal for teams that want raw Parquet files integrated into their own pipelines, regardless of stack.

What Is S3?

Amazon S3 (Simple Storage Service) is AWS's cloud storage product. An S3 bucket is a container in the cloud where companies store and retrieve large datasets at scale. Daloopa provisions a dedicated bucket, loads your subscribed data on a scheduled basis, and grants your AWS account read-only access.

Why S3?

Snowflake and Databricks serve teams already on those platforms. S3 fills the gap for clients who want raw file-based delivery into their own cloud pipelines — whether you're running Spark, Athena, Redshift, or a custom data stack.

Key Benefits

  • Cost-Effective: No per-query API fees or warehouse compute costs — just S3 storage
  • Rapid Setup: Clients access data shortly after provisioning
  • No ETL Required: Data is delivered as company-specific Parquet files directly into your bucket, ready to consume as-is
  • Seamless Integration: Join Daloopa data with internal datasets without data movement
  • Scalable Access: Multiple teams can query without hitting API rate limits
  • Frequent Updates: Data refreshes land in your bucket within 30 minutes of our data updating

How It Works

1. We Publish

Daloopa creates a dedicated S3 bucket and loads your agreed-upon data as ticker-partitioned Parquet files, starting with a full historical backfill.

2. We Grant Access

Engineering provisions read-only cross-account access to your AWS account so you can query directly from your environment.

3. We Update

When our data updates, new files land in your bucket within 30 minutes. Files older than 30 days are automatically removed — be sure to ingest data within that window.
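To stay inside that window, the metadata index can drive an ingestion loop: compare each ticker's latest_file_timestamp across polls and re-pull only tickers with new files. A minimal sketch of that comparison, assuming the index has already been read into ticker-to-timestamp mappings (the helper name is ours, not part of the delivery):

```python
from datetime import datetime

def tickers_to_ingest(previous: dict, current: dict) -> list:
    """Return tickers whose latest_file_timestamp advanced since the
    last poll, plus tickers that are new to the bucket."""
    return sorted(
        ticker for ticker, ts in current.items()
        if ticker not in previous or ts > previous[ticker]
    )

# Example: GOOGL received a new file and MSFT is new to the bucket
prev = {"AAPL": datetime(2024, 1, 1), "GOOGL": datetime(2024, 1, 1)}
curr = {"AAPL": datetime(2024, 1, 1), "GOOGL": datetime(2024, 1, 2),
        "MSFT": datetime(2024, 1, 2)}
print(tickers_to_ingest(prev, curr))  # ['GOOGL', 'MSFT']
```

Running this on a schedule well under the 30-day retention window keeps ingestion incremental instead of re-syncing the whole bucket.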

S3 vs. API Coverage

| Capability | S3 | API |
| --- | --- | --- |
| Companies | ✓ | ✓ |
| Series | ✓ | ✓ |
| Fundamentals | ✓ | ✓ |
| Download (company model) | ✗ | ✓ |
| Export (CSV with series and datapoints) | ✗ | ✓ |
| Industry Models | ✗ | ✓ |
| Taxonomy (standardized metrics mapping) | ✗ | ✓ |
| Documents (download and keyword search) | ✗ | ✓ |

Future Data: Additional data sources will be added in future releases.

Important Details: Fields within each Parquet file mirror our API responses with minor differences — clients familiar with our API will recognize the same data.

Data Format

S3 clients receive data as ticker-partitioned Parquet files organized by company. Each data file contains the full set of company metadata, series definitions, and fundamental financial data for that ticker.

Note: This differs from Snowflake and Databricks, where the same data is structured across three relational tables (Companies, Series, and Fundamentals) rather than consolidated by ticker.
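If you prefer the relational layout locally, the consolidated per-ticker file can be split into the same three views by selecting column subsets. A sketch with pandas, using column names from the schema below; the exact grouping is illustrative and the sample values are made up:

```python
import pandas as pd

def split_relational(df: pd.DataFrame):
    """Split a consolidated per-ticker frame into Companies / Series /
    Fundamentals-style views (column grouping is illustrative)."""
    companies = df[["ticker", "company_name", "industry", "sector"]].drop_duplicates()
    series = df[["ticker", "full_series_name", "label", "unit"]].drop_duplicates()
    fundamentals = df[["ticker", "full_series_name", "fiscal_period",
                       "value_raw", "value_normalized"]]
    return companies, series, fundamentals

# Tiny synthetic frame standing in for one ticker's data file
df = pd.DataFrame({
    "ticker": ["AAPL", "AAPL"],
    "company_name": ["Apple Inc.", "Apple Inc."],
    "industry": ["Hardware", "Hardware"],
    "sector": ["IT", "IT"],
    "full_series_name": ["Total Revenue", "Total Revenue"],
    "label": ["Revenue", "Revenue"],
    "unit": ["USD", "USD"],
    "fiscal_period": ["1Q24", "2Q24"],
    "value_raw": [100.0, 200.0],
    "value_normalized": [100.0, 200.0],
})
companies, series, fundamentals = split_relational(df)
print(len(companies), len(series), len(fundamentals))  # 1 1 2
```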

File Structure

Your bucket is organized with a top-level metadata.parquet index file and ticker-partitioned data files:

s3://<your-daloopa-bucket>/
    metadata.parquet
    ticker=AAPL/data_<timestamp>.parquet
    ticker=GOOGL/data_<timestamp>.parquet
    ...
  • metadata.parquet serves as an index — one row per ticker with the latest file timestamp and company identifiers.
  • ticker=<TICKER>/data_<timestamp>.parquet files contain the fundamental data. Each update produces a new timestamped file per updated ticker.
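When listing the bucket yourself, the hive-style keys can be parsed to recover each file's ticker and timestamp. A small sketch; the timestamp portion is treated as an opaque string here since its exact format is not specified above:

```python
import re

# Matches keys like "ticker=AAPL/data_<timestamp>.parquet"
KEY_RE = re.compile(r"^ticker=(?P<ticker>[^/]+)/data_(?P<timestamp>.+)\.parquet$")

def parse_key(key: str):
    """Return (ticker, timestamp) for a data file key, or None for
    non-data keys such as the metadata index."""
    m = KEY_RE.match(key)
    return (m.group("ticker"), m.group("timestamp")) if m else None

print(parse_key("ticker=AAPL/data_20260115.parquet"))  # ('AAPL', '20260115')
print(parse_key("metadata.parquet"))                   # None
```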

Schemas

metadata.parquet — One row per ticker. Use this to discover available tickers and locate the latest data file for each:

| Column | Type | Description |
| --- | --- | --- |
| ticker | STRING | Ticker symbol |
| latest_file_timestamp | TIMESTAMP | Timestamp of the most recent data file for this ticker |
| company_identifiers | STRING | JSON-encoded company identifiers |
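Since company_identifiers is JSON-encoded, decode it after reading. The identifier keys below (cik, isin) are hypothetical examples, not a documented contract:

```python
import json

# Hypothetical payload; the actual identifier keys may differ
raw = '{"cik": "0000320193", "isin": "US0378331005"}'
ids = json.loads(raw)
print(ids["cik"])  # 0000320193

# With pandas: metadata["company_identifiers"].map(json.loads)
```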

data_<timestamp>.parquet — Per-ticker data files:

| Column | Type |
| --- | --- |
| ticker | STRING |
| company_name | STRING |
| industry | STRING |
| sector | STRING |
| sub_industry | STRING |
| full_series_name | STRING |
| taxonomy_metric_name | STRING |
| label | STRING |
| category | STRING |
| title | STRING |
| restated | BOOLEAN |
| filing_type | STRING |
| value_raw | DECIMAL(38,9) |
| value_normalized | DECIMAL(38,9) |
| unit | STRING |
| calendar_period | STRING |
| fiscal_period | STRING |
| span | STRING |
| fiscal_date | DATE |
| document_id | BIGINT |
| filing_date | DATE |
| document_released_at | TIMESTAMP |
| created_at | TIMESTAMP |
| updated_at | TIMESTAMP |
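As a quick consumption example, a single series can be turned into a fiscal-period time series by filtering on full_series_name and indexing by fiscal_period. Column names follow the schema above; the sample values are made up:

```python
import pandas as pd

# Synthetic stand-in for rows read from a per-ticker data file
df = pd.DataFrame({
    "full_series_name": ["Total Revenue"] * 3,
    "fiscal_period": ["1Q24", "2Q24", "3Q24"],
    "value_normalized": [100.0, 110.0, 120.0],
})

# Filter one series and index it by fiscal period
revenue = (
    df[df["full_series_name"] == "Total Revenue"]
      .set_index("fiscal_period")["value_normalized"]
)
print(revenue.loc["2Q24"])  # 110.0
```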

Getting Started

1. Provide Your AWS Account ID

Share your AWS account ID with your Daloopa account team so we can provision cross-account read-only access to your dedicated bucket.

2. Access Data

Once access is granted, read files directly from the bucket using any AWS-compatible tool:

AWS CLI

# List available ticker partitions
aws s3 ls s3://<your-daloopa-bucket>/

# List files for a specific ticker
aws s3 ls s3://<your-daloopa-bucket>/ticker=AAPL/

# Copy the metadata index locally
aws s3 cp s3://<your-daloopa-bucket>/metadata.parquet ./data/

# Sync all data to a local directory
aws s3 sync s3://<your-daloopa-bucket>/ ./daloopa-data/

Python (pandas + PyArrow)

import pandas as pd  # reading s3:// paths also requires the s3fs and pyarrow packages

# Read the metadata index to see available tickers and latest timestamps
metadata = pd.read_parquet("s3://<your-daloopa-bucket>/metadata.parquet")

# Read all data for a single ticker
aapl = pd.read_parquet("s3://<your-daloopa-bucket>/ticker=AAPL/")

# Read the full dataset (all tickers): concatenate the per-ticker partitions
# listed in the index; reading the bucket root directly would also pick up
# metadata.parquet, which has a different schema
df = pd.concat(
    pd.read_parquet(f"s3://<your-daloopa-bucket>/ticker={t}/")
    for t in metadata["ticker"]
)

Amazon Athena

-- Create a partitioned external table over the Daloopa bucket. Athena
-- requires explicit column definitions; ticker comes from the partition
-- path, so it goes in PARTITIONED BY, not the column list. Add the
-- remaining columns from the schema above as needed.
CREATE EXTERNAL TABLE daloopa_fundamentals (
  company_name STRING,
  full_series_name STRING,
  value_normalized DECIMAL(38,9),
  fiscal_period STRING,
  fiscal_date DATE
)
PARTITIONED BY (ticker STRING)
STORED AS PARQUET
LOCATION 's3://<your-daloopa-bucket>/';

-- Register the ticker partitions
MSCK REPAIR TABLE daloopa_fundamentals;

-- Query a specific ticker
SELECT * FROM daloopa_fundamentals WHERE ticker = 'AAPL';