AWS S3

Daloopa's fundamental data is now available as file-based delivery directly into a dedicated Amazon S3 bucket. Access our data in your own cloud environment without building ETL pipelines or managing data movement — ideal for teams that want raw Parquet files integrated into their own pipelines, regardless of stack.

What Is S3?

Amazon S3 (Simple Storage Service) is AWS's object storage service. An S3 bucket is a container in the cloud where companies store and retrieve large datasets at scale. Daloopa provisions a dedicated bucket, loads your subscribed data on a scheduled basis, and grants your AWS account read-only access.

Why S3?

Snowflake and Databricks serve teams already on those platforms. S3 fills the gap for clients who want raw file-based delivery into their own cloud pipelines — whether you're running Spark, Athena, Redshift, or a custom data stack.

Key Benefits

  • Cost-Effective: No per-query API fees or warehouse compute costs — just S3 storage
  • Rapid Setup: Clients access data shortly after provisioning
  • No ETL Required: Data is delivered as company-specific Parquet files directly into your bucket, ready to consume as-is
  • Seamless Integration: Join Daloopa data with internal datasets without data movement
  • Scalable Access: Multiple teams can query without hitting API rate limits
  • Frequent Updates: Data refreshes land in your bucket within 30 minutes

How It Works

1. We Publish

Daloopa creates a dedicated S3 bucket and loads your agreed-upon data as ticker-partitioned Parquet files, starting with a full historical backfill.

2. We Grant Access

Engineering provisions read-only cross-account access to your AWS account so you can query directly from your environment.

3. We Update

When our data updates, new files land in your bucket within 30 minutes. Files older than 30 days are automatically removed — be sure to ingest data within that window.
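Because files expire after 30 days, an ingestion job should track its last sync time and pick up only files newer than that on each run. Below is a minimal sketch of that filtering step; the `data_<timestamp>.parquet` naming follows the file structure described later in this page, but the exact timestamp format used here is illustrative, not confirmed:

```python
from datetime import datetime, timezone

def new_files_since(keys, last_sync):
    """Return data-file keys whose embedded timestamp is after last_sync.

    Assumes keys follow ticker=<TICKER>/data_<YYYYMMDDHHMMSS>.parquet;
    the timestamp format is an assumption for illustration.
    """
    fresh = []
    for key in keys:
        name = key.rsplit("/", 1)[-1]
        if not (name.startswith("data_") and name.endswith(".parquet")):
            continue  # skip metadata.parquet and anything else
        stamp = name[len("data_"):-len(".parquet")]
        ts = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        if ts > last_sync:
            fresh.append(key)
    return fresh

keys = [
    "metadata.parquet",
    "ticker=AAPL/data_20240601120000.parquet",
    "ticker=AAPL/data_20240615120000.parquet",
]
last_sync = datetime(2024, 6, 10, tzinfo=timezone.utc)
print(new_files_since(keys, last_sync))  # only the file newer than last_sync
```

In production you would list keys with `aws s3 ls` or boto3 and persist `last_sync` between runs, keeping it comfortably inside the 30-day retention window.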

S3 vs. API Coverage

| Capability | S3 | API |
| --- | --- | --- |
| Companies | ✓ | ✓ |
| Series | ✓ | ✓ |
| Fundamentals | ✓ | ✓ |
| Download (company model) | ✗ | ✓ |
| Export (CSV with series and datapoints) | ✗ | ✓ |
| Industry Models | ✗ | ✓ |
| Taxonomy (standardized metrics mapping) | ✗ | ✓ |
| Documents (download and keyword search) | ✗ | ✓ |

Future Data: Additional data sources will be added in future releases.

Important Details: Fields within each Parquet file mirror our API responses with minor differences — clients familiar with our API will recognize the same data.

Data Format

S3 clients receive data as ticker-partitioned Parquet files organized by company. Each data file contains the full set of company metadata, series definitions, and fundamental financial data for that ticker.

Note: This differs from Snowflake and Databricks, where the same data is structured across three relational tables (Companies, Series, and Fundamentals) rather than consolidated by ticker.
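If your downstream models expect that relational shape, the Companies and Series tables can be recovered from the consolidated ticker files by de-duplicating on their keys. A sketch over an invented column subset (real files carry the full schema documented below):

```python
import pandas as pd

# Stand-in for pd.read_parquet("s3://<your-daloopa-bucket>/ticker=AAPL/")
datapoints = pd.DataFrame({
    "id": [1, 2, 3],
    "series_id": [10, 10, 11],
    "ticker": ["AAPL"] * 3,
    "company_name": ["Apple Inc."] * 3,
    "label": ["iPhone", "iPhone", "Total Revenue"],
    "value_normalized": [69.70, 45.96, 119.58],
})

# Companies: one row per ticker
companies = datapoints[["ticker", "company_name"]].drop_duplicates()

# Series: one row per series_id
series = datapoints[["series_id", "ticker", "label"]].drop_duplicates()

# Fundamentals: the datapoints themselves
fundamentals = datapoints[["id", "series_id", "value_normalized"]]
```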

File Structure

Your bucket is organized with a top-level metadata.parquet index file and ticker-partitioned data files:

s3://<your-daloopa-bucket>/
    metadata.parquet
    ticker=AAPL/data_<timestamp>.parquet
    ticker=GOOGL/data_<timestamp>.parquet
    ...
  • metadata.parquet serves as an index — one row per ticker with the latest file timestamp and company identifiers.
  • ticker=<TICKER>/data_<timestamp>.parquet files contain the fundamental data. Each update produces a new timestamped file per updated ticker.

Schemas

metadata.parquet — One row per company. Use this to discover available tickers and locate the latest data file for each:

| Column | Type | Description |
| --- | --- | --- |
| company_id | BIGINT | Unique identifier for the company |
| ticker | STRING | Ticker symbol of the company |
| name | STRING | Legal name of the company (e.g., "Apple Inc.") |
| company_identifiers | LIST | List of company identifiers (e.g., ISIN, CIK, CapIQCompanyId), each with identifier_type and identifier_value fields. ISIN is recommended as the primary match key |
| last_updated | TIMESTAMP | Timestamp of the most recent data file upload for this company |
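As an example of using the index, the snippet below pulls each company's ISIN (the recommended match key) out of company_identifiers. It runs against a small in-memory frame with the same shape; the sample values are invented:

```python
import pandas as pd

# Stand-in for pd.read_parquet("s3://<your-daloopa-bucket>/metadata.parquet")
metadata = pd.DataFrame({
    "company_id": [1],
    "ticker": ["AAPL"],
    "name": ["Apple Inc."],
    "company_identifiers": [[
        {"identifier_type": "ISIN", "identifier_value": "US0378331005"},
        {"identifier_type": "CIK", "identifier_value": "0000320193"},
    ]],
    "last_updated": [pd.Timestamp("2024-06-15T12:00:00Z")],
})

def primary_isin(identifiers):
    """Pick the ISIN out of the company_identifiers list."""
    for ident in identifiers:
        if ident["identifier_type"] == "ISIN":
            return ident["identifier_value"]
    return None

metadata["isin"] = metadata["company_identifiers"].map(primary_isin)
print(metadata[["ticker", "isin", "last_updated"]])
```

The resulting `isin` column can then be joined against internal security-master tables.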

data_<timestamp>.parquet — Per-ticker data files. Each row is a single datapoint (one metric, one period, one filing):

| Column | Type | Description |
| --- | --- | --- |
| id | BIGINT | Unique identifier for the datapoint |
| ticker | STRING | Ticker symbol of the company |
| company_name | STRING | Legal name of the company (e.g., "Apple Inc.") |
| series_id | BIGINT | Unique identifier for the company series the datapoint belongs to |
| industry | STRING | Parent industry classification from the Daloopa taxonomy (e.g., "Technology") |
| sector | STRING | Parent sector classification from the Daloopa taxonomy (e.g., "Information Technology") |
| sub_industries | LIST | List of sub-industry classifications from the Daloopa taxonomy (e.g., ["Software Publishers"]). Most granular level of the sector > industry > sub-industry hierarchy |
| full_series_name | STRING | Full hierarchical context of the series within the Daloopa model (e.g., "Income Statement \| Total cost of sales"). Pipe-delimited path from section to line item |
| label | STRING | Short description of the datapoint (e.g., "iPhone", "Total Revenue") |
| category | STRING | Section or grouping under which the datapoint is categorized in the model (e.g., "Income Statement", "Balance Sheet", "Cash Flow Statement", "KPIs", "Segmental Breakdown", "Guidance") |
| restated | BOOLEAN | Whether the datapoint has been restated in a subsequent filing (true = this value was revised from its original report) |
| filing_type | STRING | Type of regulatory filing from which the datapoint was sourced (e.g., "8-K", "10-Q", "10-K") |
| value_raw | DECIMAL(38,9) | Raw value of the datapoint as reported in the source filing |
| value_normalized | DECIMAL(38,9) | Normalized value, adjusted for consistency (e.g., annualized quarterly values converted to a comparable basis) |
| unit | STRING | Unit of measurement (e.g., "Billion", "Million", "Dollar", "Percent") |
| calendar_period | STRING | Calendar period, formatted as YYYYQ# for quarters or YYYYFY for fiscal years (e.g., "2021Q2", "2020FY"). Determined by the midpoint of the fiscal period falling within a calendar quarter |
| fiscal_period | STRING | Company's actual fiscal period, formatted as YYYYQ# or YYYYFY (e.g., "2024Q1"). May differ from calendar_period for companies with non-standard fiscal year ends |
| span | STRING | Periodicity of the datapoint (e.g., "Quarterly", "Annual") |
| fiscal_date | DATE | End date of the fiscal period for the datapoint (e.g., 2023-12-31) |
| document_id | BIGINT | Unique identifier for the source document (filing) from which the datapoint was extracted |
| filing_date | DATE | Date the regulatory filing was reported |
| document_released_at | TIMESTAMP | Datetime when the source document was ingested into Daloopa (i.e., when the datapoint became available) |
| created_at | TIMESTAMP | Timestamp when the datapoint was originally created in Daloopa |
| updated_at | TIMESTAMP | Timestamp of the last modification to the datapoint |
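Because each row is a single datapoint (one metric, one period), analysis typically starts by pivoting the long format into a metric-by-period matrix. A sketch using invented sample rows that follow the schema above:

```python
import pandas as pd

# Stand-in for pd.read_parquet("s3://<your-daloopa-bucket>/ticker=AAPL/")
aapl = pd.DataFrame({
    "label": ["Total Revenue", "Total Revenue", "iPhone", "iPhone"],
    "fiscal_period": ["2024Q1", "2024Q2", "2024Q1", "2024Q2"],
    "value_normalized": [119.58, 90.75, 69.70, 45.96],
    "unit": ["Billion"] * 4,
})

# One row per metric, one column per fiscal period
wide = aapl.pivot_table(index="label", columns="fiscal_period",
                        values="value_normalized", aggfunc="first")
print(wide)
```

On real files you would usually filter on category or full_series_name first, and pick value_raw or value_normalized depending on whether you want as-reported figures.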

Getting Started

1. Provide Your AWS Account ID

Share your AWS account ID with your Daloopa account team so we can provision cross-account read-only access to your dedicated bucket.

2. Access Data

Once access is granted, read files directly from the bucket using any AWS-compatible tool:

AWS CLI

# List available ticker partitions
aws s3 ls s3://<your-daloopa-bucket>/

# List files for a specific ticker
aws s3 ls s3://<your-daloopa-bucket>/ticker=AAPL/

# Copy the metadata index locally
aws s3 cp s3://<your-daloopa-bucket>/metadata.parquet ./data/

# Sync all data to a local directory
aws s3 sync s3://<your-daloopa-bucket>/ ./daloopa-data/

Python (pandas + PyArrow)

import pandas as pd  # reading s3:// paths also requires s3fs to be installed

# Read the metadata index to see available tickers and latest timestamps
metadata = pd.read_parquet("s3://<your-daloopa-bucket>/metadata.parquet")

# Read all data for a single ticker
aapl = pd.read_parquet("s3://<your-daloopa-bucket>/ticker=AAPL/")

# Read the full dataset by concatenating the ticker partitions
# (metadata.parquet has a different schema, so read the partitions
# rather than the bucket root in a single call)
df = pd.concat(
    pd.read_parquet(f"s3://<your-daloopa-bucket>/ticker={t}/")
    for t in metadata["ticker"]
)

Amazon Athena

-- Create a partitioned external table over the Daloopa bucket (Athena
-- requires an explicit column list; this subset can be extended to the full schema)
CREATE EXTERNAL TABLE daloopa_fundamentals (
  id BIGINT,
  label STRING,
  fiscal_period STRING,
  value_normalized DECIMAL(38,9)
)
PARTITIONED BY (ticker STRING)
STORED AS PARQUET
LOCATION 's3://<your-daloopa-bucket>/';

-- Register the ticker= partitions, then query a specific ticker
MSCK REPAIR TABLE daloopa_fundamentals;
SELECT * FROM daloopa_fundamentals WHERE ticker = 'AAPL';