AWS S3
Daloopa's fundamental data is now available as file-based delivery directly into a dedicated Amazon S3 bucket. Access our data in your own cloud environment without building ETL pipelines or managing data movement — ideal for teams that want raw Parquet files integrated into their own pipelines, regardless of stack.
What Is S3?
Amazon S3 (Simple Storage Service) is AWS's cloud storage product. An S3 bucket is a container in the cloud where companies store and retrieve large datasets at scale. Daloopa provisions a dedicated bucket, loads your subscribed data on a scheduled basis, and grants your AWS account read-only access.
Why S3?
Snowflake and Databricks serve teams already on those platforms. S3 fills the gap for clients who want raw file-based delivery into their own cloud pipelines — whether you're running Spark, Athena, Redshift, or a custom data stack.
Key Benefits
- Cost-Effective: No per-query API fees or warehouse compute costs — just S3 storage
- Rapid Setup: Clients access data shortly after provisioning
- No ETL Required: Data is delivered as company-specific Parquet files directly into your bucket, ready to consume as-is
- Seamless Integration: Join Daloopa data with internal datasets without data movement
- Scalable Access: Multiple teams can query without hitting API rate limits
- Fast Updates: New data lands in your bucket within 30 minutes of a refresh
How It Works
1. We Publish
Daloopa creates a dedicated S3 bucket and loads your agreed-upon data as ticker-partitioned Parquet files, starting with a full historical backfill.
2. We Grant Access
Engineering provisions read-only cross-account access to your AWS account so you can query directly from your environment.
3. We Update
When our data updates, new files land in your bucket within 30 minutes. Files older than 30 days are automatically removed — be sure to ingest data within that window.
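Because files older than 30 days are removed, an ingestion job should track a watermark and pick up any ticker whose latest file is newer. Here is a minimal sketch of that check using the metadata.parquet index; `tickers_to_ingest` and the `last_sync` watermark are illustrative names, not part of Daloopa's tooling:

```python
import pandas as pd

def tickers_to_ingest(metadata: pd.DataFrame, last_sync: pd.Timestamp) -> list[str]:
    """Return tickers whose latest data file landed after our last sync.

    `metadata` is the metadata.parquet index (one row per ticker);
    `last_sync` is a watermark your pipeline persists between runs.
    """
    fresh = metadata[metadata["latest_file_timestamp"] > last_sync]
    return sorted(fresh["ticker"].tolist())
```

Running a job like this on any cadence well inside the 30-day retention window (daily is typical) ensures no files expire before they are ingested.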
S3 vs. API Coverage
| Capability | S3 | API |
|---|---|---|
| Companies | ✓ | ✓ |
| Series | ✓ | ✓ |
| Fundamentals | ✓ | ✓ |
| Download (company model) | — | ✓ |
| Export (CSV with series and datapoints) | — | ✓ |
| Industry Models | — | ✓ |
| Taxonomy (standardized metrics mapping) | ✓ | ✓ |
| Documents (download and keyword search) | — | ✓ |
Future Data: Additional data sources will be added in future releases.
Important Details: Fields within each Parquet file mirror our API responses with minor differences — clients familiar with our API will recognize the same data.
Data Format
S3 clients receive data as ticker-partitioned Parquet files organized by company. Each data file contains the full set of company metadata, series definitions, and fundamental financial data for that ticker.
Note: This differs from Snowflake and Databricks, where the same data is structured across three relational tables (Companies, Series, and Fundamentals) rather than consolidated by ticker.
File Structure
Your bucket is organized with a top-level metadata.parquet index file and ticker-partitioned data files:
s3://<your-daloopa-bucket>/
metadata.parquet
ticker=AAPL/data_<timestamp>.parquet
ticker=GOOGL/data_<timestamp>.parquet
...
metadata.parquet serves as an index: one row per ticker with the latest file timestamp and company identifiers. ticker=<TICKER>/data_<timestamp>.parquet files contain the fundamental data. Each update produces a new timestamped file per updated ticker.
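Since each update appends a new timestamped file to a ticker's partition, a reader usually wants only the newest one. A small sketch of that selection, assuming the embedded timestamps sort lexicographically (as fixed-width formats such as ISO-8601 do); the example key names are hypothetical:

```python
def latest_data_file(keys: list[str]) -> str:
    """Pick the newest data_<timestamp>.parquet key from one ticker partition.

    Assumes the embedded timestamps sort lexicographically, which holds
    for fixed-width timestamp formats.
    """
    data_keys = [k for k in keys if "/data_" in k and k.endswith(".parquet")]
    return max(data_keys)
```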
Schemas
metadata.parquet — One row per ticker. Use this to discover available tickers and locate the latest data file for each:
| Column | Type | Description |
|---|---|---|
| ticker | STRING | Ticker symbol |
| latest_file_timestamp | TIMESTAMP | Timestamp of the most recent data file for this ticker |
| company_identifiers | STRING | JSON-encoded company identifiers |
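Because company_identifiers is JSON-encoded, it can be expanded into regular columns before joining with internal data. A sketch using pandas; the "cik" key in the test data is a hypothetical example, as the exact keys inside the JSON are not documented here:

```python
import json
import pandas as pd

def expand_identifiers(metadata: pd.DataFrame) -> pd.DataFrame:
    """Expand the JSON-encoded company_identifiers column into real columns."""
    ids = metadata["company_identifiers"].map(json.loads).apply(pd.Series)
    return metadata.drop(columns=["company_identifiers"]).join(ids)
```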
data_<timestamp>.parquet — Per-ticker data files:
| Column | Type |
|---|---|
| ticker | STRING |
| company_name | STRING |
| industry | STRING |
| sector | STRING |
| sub_industry | STRING |
| full_series_name | STRING |
| taxonomy_metric_name | STRING |
| label | STRING |
| category | STRING |
| title | STRING |
| restated | BOOLEAN |
| filing_type | STRING |
| value_raw | DECIMAL(38,9) |
| value_normalized | DECIMAL(38,9) |
| unit | STRING |
| calendar_period | STRING |
| fiscal_period | STRING |
| span | STRING |
| fiscal_date | DATE |
| document_id | BIGINT |
| filing_date | DATE |
| document_released_at | TIMESTAMP |
| created_at | TIMESTAMP |
| updated_at | TIMESTAMP |
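With this schema, a common task is turning one standardized metric into a time series per ticker. A minimal sketch: it uses updated_at to keep the most recent row when restatements duplicate a fiscal period, then pivots value_normalized by fiscal_date and ticker. The metric name "revenue" is a placeholder, not a confirmed taxonomy value:

```python
import pandas as pd

def metric_time_series(df: pd.DataFrame, metric: str) -> pd.DataFrame:
    """Pivot one taxonomy metric into a fiscal_date x ticker matrix,
    keeping the most recently updated row when a period is restated."""
    sel = (
        df[df["taxonomy_metric_name"] == metric]
        .sort_values("updated_at")
        .drop_duplicates(["ticker", "fiscal_date"], keep="last")
    )
    return sel.pivot(index="fiscal_date", columns="ticker", values="value_normalized")
```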
Getting Started
1. Provide Your AWS Account ID
Share your AWS account ID with your Daloopa account team so we can provision cross-account read-only access to your dedicated bucket.
2. Access Data
Once access is granted, read files directly from the bucket using any AWS-compatible tool:
AWS CLI
# List available ticker partitions
aws s3 ls s3://<your-daloopa-bucket>/
# List files for a specific ticker
aws s3 ls s3://<your-daloopa-bucket>/ticker=AAPL/
# Copy the metadata index locally
aws s3 cp s3://<your-daloopa-bucket>/metadata.parquet ./data/
# Sync all data to a local directory
aws s3 sync s3://<your-daloopa-bucket>/ ./daloopa-data/
Python (pandas + PyArrow)
# Requires pyarrow and s3fs (pip install pandas pyarrow s3fs)
import pandas as pd
# Read the metadata index to see available tickers and latest timestamps
metadata = pd.read_parquet("s3://<your-daloopa-bucket>/metadata.parquet")
# Read all data for a single ticker
aapl = pd.read_parquet("s3://<your-daloopa-bucket>/ticker=AAPL/")
# Read the full dataset (all tickers)
df = pd.read_parquet("s3://<your-daloopa-bucket>/")
Amazon Athena
-- Create an external table over the Daloopa bucket
-- (declare all columns from the schema above; abbreviated here)
CREATE EXTERNAL TABLE daloopa_fundamentals (
  taxonomy_metric_name STRING,
  value_normalized DECIMAL(38,9),
  fiscal_date DATE
)
PARTITIONED BY (ticker STRING)
STORED AS PARQUET
LOCATION 's3://<your-daloopa-bucket>/'
TBLPROPERTIES ('parquet.compress'='SNAPPY');
-- Register the ticker partitions, then query a specific ticker
MSCK REPAIR TABLE daloopa_fundamentals;
SELECT * FROM daloopa_fundamentals WHERE ticker = 'AAPL';