AWS S3

Daloopa's fundamental data is now available via file-based delivery to a dedicated Amazon S3 bucket. Access our data in your own cloud environment without building ETL pipelines or managing data movement — ideal for teams that want raw Parquet files integrated into their own pipelines, regardless of stack.

What Is S3?

Amazon S3 (Simple Storage Service) is AWS's cloud storage product. An S3 bucket is a container in the cloud where companies store and retrieve large datasets at scale. Daloopa provisions a dedicated bucket, loads your subscribed data on a scheduled basis, and grants your AWS account read-only access.

Why S3?

Snowflake and Databricks serve teams already on those platforms. S3 fills the gap for clients who want raw file-based delivery into their own cloud pipelines — whether you're running Spark, Athena, Redshift, or a custom data stack.

Key Benefits

  • Cost-Effective: No per-query API fees or warehouse compute costs — just S3 storage
  • Rapid Setup: Clients access data shortly after provisioning
  • No ETL Required: Data is delivered as company-specific Parquet files directly into your bucket, ready to consume as-is
  • Seamless Integration: Join Daloopa data with internal datasets without data movement
  • Scalable Access: Multiple teams can query without hitting API rate limits
  • Frequent Updates: Data refreshes land in your bucket within 30 minutes of our data updating

How It Works

1. We Publish

Daloopa creates a dedicated S3 bucket and loads your agreed-upon data as ticker-partitioned Parquet files, starting with a full historical backfill.

2. We Grant Access

Engineering provisions read-only cross-account access to your AWS account so you can query directly from your environment.

3. We Update

When our data updates, new files land in your bucket within 30 minutes. Files older than 30 days are automatically removed — be sure to ingest data within that window.
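To stay inside that window, the metadata index can drive an ingestion loop: compare each ticker's latest_file_timestamp across polls and re-pull only tickers with new files. A minimal sketch of that comparison, assuming the index has already been read into ticker-to-timestamp mappings (the helper name is ours, not part of the delivery):

```python
from datetime import datetime

def tickers_to_ingest(previous: dict, current: dict) -> list:
    """Return tickers whose latest_file_timestamp advanced since the
    last poll, plus tickers that are new to the bucket."""
    return sorted(
        ticker for ticker, ts in current.items()
        if ticker not in previous or ts > previous[ticker]
    )

# Example: GOOGL received a new file and MSFT is new to the bucket
prev = {"AAPL": datetime(2024, 1, 1), "GOOGL": datetime(2024, 1, 1)}
curr = {"AAPL": datetime(2024, 1, 1), "GOOGL": datetime(2024, 1, 2),
        "MSFT": datetime(2024, 1, 2)}
print(tickers_to_ingest(prev, curr))  # ['GOOGL', 'MSFT']
```

Running this on a schedule well under the 30-day retention window keeps ingestion incremental instead of re-syncing the whole bucket.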

S3 vs. API Coverage

| Capability | S3 | API |
| --- | --- | --- |
| Companies | ✓ | ✓ |
| Series | ✓ | ✓ |
| Fundamentals | ✓ | ✓ |
| Download (company model) | ✗ | ✓ |
| Export (CSV with series and datapoints) | ✗ | ✓ |
| Industry Models | ✗ | ✓ |
| Taxonomy (standardized metrics mapping) | ✗ | ✓ |
| Documents (download and keyword search) | ✗ | ✓ |

Future Data: Additional data sources will be added in future releases.

Important Details: Fields within each Parquet file mirror our API responses with minor differences — clients familiar with our API will recognize the same data.

Data Format

S3 clients receive data as ticker-partitioned Parquet files organized by company. Each data file contains the full set of company metadata, series definitions, and fundamental financial data for that ticker.

Note: This differs from Snowflake and Databricks, where the same data is structured across three relational tables (Companies, Series, and Fundamentals) rather than consolidated by ticker.
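If you prefer the relational layout locally, the consolidated per-ticker file can be split into the same three views by selecting column subsets. A sketch with pandas, using column names from the schema below; the exact grouping is illustrative and the sample values are made up:

```python
import pandas as pd

def split_relational(df: pd.DataFrame):
    """Split a consolidated per-ticker frame into Companies / Series /
    Fundamentals-style views (column grouping is illustrative)."""
    companies = df[["ticker", "company_name", "industry", "sector"]].drop_duplicates()
    series = df[["ticker", "full_series_name", "label", "unit"]].drop_duplicates()
    fundamentals = df[["ticker", "full_series_name", "fiscal_period",
                       "value_raw", "value_normalized"]]
    return companies, series, fundamentals

# Tiny synthetic frame standing in for one ticker's data file
df = pd.DataFrame({
    "ticker": ["AAPL", "AAPL"],
    "company_name": ["Apple Inc.", "Apple Inc."],
    "industry": ["Hardware", "Hardware"],
    "sector": ["IT", "IT"],
    "full_series_name": ["Total Revenue", "Total Revenue"],
    "label": ["Revenue", "Revenue"],
    "unit": ["USD", "USD"],
    "fiscal_period": ["1Q24", "2Q24"],
    "value_raw": [100.0, 200.0],
    "value_normalized": [100.0, 200.0],
})
companies, series, fundamentals = split_relational(df)
print(len(companies), len(series), len(fundamentals))  # 1 1 2
```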

File Structure

Your bucket is organized with a top-level metadata.parquet index file and ticker-partitioned data files:

s3://<your-daloopa-bucket>/
    metadata.parquet
    ticker=AAPL/data_<timestamp>.parquet
    ticker=GOOGL/data_<timestamp>.parquet
    ...
  • metadata.parquet serves as an index — one row per ticker with the latest file timestamp and company identifiers.
  • ticker=<TICKER>/data_<timestamp>.parquet files contain the fundamental data. Each update produces a new timestamped file per updated ticker.
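When listing the bucket yourself, the hive-style keys can be parsed to recover each file's ticker and timestamp. A small sketch; the timestamp portion is treated as an opaque string here since its exact format is not specified above:

```python
import re

# Matches keys like "ticker=AAPL/data_<timestamp>.parquet"
KEY_RE = re.compile(r"^ticker=(?P<ticker>[^/]+)/data_(?P<timestamp>.+)\.parquet$")

def parse_key(key: str):
    """Return (ticker, timestamp) for a data file key, or None for
    non-data keys such as the metadata index."""
    m = KEY_RE.match(key)
    return (m.group("ticker"), m.group("timestamp")) if m else None

print(parse_key("ticker=AAPL/data_20260115.parquet"))  # ('AAPL', '20260115')
print(parse_key("metadata.parquet"))                   # None
```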

Schemas

metadata.parquet — One row per ticker. Use this to discover available tickers and locate the latest data file for each:

| Column | Type | Description |
| --- | --- | --- |
| ticker | STRING | Ticker symbol |
| latest_file_timestamp | TIMESTAMP | Timestamp of the most recent data file for this ticker |
| company_identifiers | STRING | JSON-encoded company identifiers |
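Since company_identifiers is JSON-encoded, decode it after reading. The identifier keys below (cik, isin) are hypothetical examples, not a documented contract:

```python
import json

# Hypothetical payload; the actual identifier keys may differ
raw = '{"cik": "0000320193", "isin": "US0378331005"}'
ids = json.loads(raw)
print(ids["cik"])  # 0000320193

# With pandas: metadata["company_identifiers"].map(json.loads)
```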

data_<timestamp>.parquet — Per-ticker data files:

| Column | Type |
| --- | --- |
| ticker | STRING |
| company_name | STRING |
| industry | STRING |
| sector | STRING |
| sub_industry | STRING |
| full_series_name | STRING |
| taxonomy_metric_name | STRING |
| label | STRING |
| category | STRING |
| title | STRING |
| restated | BOOLEAN |
| filing_type | STRING |
| value_raw | DECIMAL(38,9) |
| value_normalized | DECIMAL(38,9) |
| unit | STRING |
| calendar_period | STRING |
| fiscal_period | STRING |
| span | STRING |
| fiscal_date | DATE |
| document_id | BIGINT |
| filing_date | DATE |
| document_released_at | TIMESTAMP |
| created_at | TIMESTAMP |
| updated_at | TIMESTAMP |
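As a quick consumption example, a single series can be turned into a fiscal-period time series by filtering on full_series_name and indexing by fiscal_period. Column names follow the schema above; the sample values are made up:

```python
import pandas as pd

# Synthetic stand-in for rows read from a per-ticker data file
df = pd.DataFrame({
    "full_series_name": ["Total Revenue"] * 3,
    "fiscal_period": ["1Q24", "2Q24", "3Q24"],
    "value_normalized": [100.0, 110.0, 120.0],
})

# Filter one series and index it by fiscal period
revenue = (
    df[df["full_series_name"] == "Total Revenue"]
      .set_index("fiscal_period")["value_normalized"]
)
print(revenue.loc["2Q24"])  # 110.0
```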

Getting Started

1. Provide Your AWS Account ID

Share your AWS account ID with your Daloopa account team so we can provision cross-account read-only access to your dedicated bucket.

2. Access Data

Once access is granted, read files directly from the bucket using any AWS-compatible tool:

AWS CLI

# List available ticker partitions
aws s3 ls s3://<your-daloopa-bucket>/

# List files for a specific ticker
aws s3 ls s3://<your-daloopa-bucket>/ticker=AAPL/

# Copy the metadata index locally
aws s3 cp s3://<your-daloopa-bucket>/metadata.parquet ./data/

# Sync all data to a local directory
aws s3 sync s3://<your-daloopa-bucket>/ ./daloopa-data/

Python (pandas + PyArrow)

import pandas as pd  # reading s3:// paths also requires the s3fs and pyarrow packages

# Read the metadata index to see available tickers and latest timestamps
metadata = pd.read_parquet("s3://<your-daloopa-bucket>/metadata.parquet")

# Read all data for a single ticker
aapl = pd.read_parquet("s3://<your-daloopa-bucket>/ticker=AAPL/")

# Read the full dataset (all tickers): concatenate the per-ticker partitions
# listed in the index; reading the bucket root directly would also pick up
# metadata.parquet, which has a different schema
df = pd.concat(
    pd.read_parquet(f"s3://<your-daloopa-bucket>/ticker={t}/")
    for t in metadata["ticker"]
)

Amazon Athena

-- Create a partitioned external table over the Daloopa bucket. Athena
-- requires explicit column definitions; ticker comes from the partition
-- path, so it goes in PARTITIONED BY, not the column list. Add the
-- remaining columns from the schema above as needed.
CREATE EXTERNAL TABLE daloopa_fundamentals (
  company_name STRING,
  full_series_name STRING,
  value_normalized DECIMAL(38,9),
  fiscal_period STRING,
  fiscal_date DATE
)
PARTITIONED BY (ticker STRING)
STORED AS PARQUET
LOCATION 's3://<your-daloopa-bucket>/';

-- Register the ticker partitions
MSCK REPAIR TABLE daloopa_fundamentals;

-- Query a specific ticker
SELECT * FROM daloopa_fundamentals WHERE ticker = 'AAPL';