
Output Data File Details

The following table describes the general details for the output files generated by Forager:

| Feature | Supported | Notes |
| --- | --- | --- |
| Location of files | Files in Amazon S3 | Files can be unloaded directly to any user-supplied bucket in S3 and then downloaded locally using AWS utilities. |
| | Files in Google Cloud Storage | Files can be unloaded directly to any user-supplied bucket in Cloud Storage and then downloaded locally using Cloud Storage utilities. |
| | Files in Microsoft Azure | Files can be unloaded directly to any user-supplied container in Azure and then downloaded locally using Azure utilities. |
| File formats | Delimited files (CSV, TSV, etc.) | Any valid delimiter is supported; the default is a comma (CSV). |
| | JSON | |
| | Parquet | |
| File encoding | UTF-8 | Output files are always encoded in UTF-8, regardless of the file format; no other character sets are supported. |
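As a minimal sketch of consuming a delimited export, the following Python reads rows with the standard library's csv module, opening the file as UTF-8 and passing the delimiter through. It assumes the first row is a header row and that the compression codec can be inferred from the file's final extension, as in the path template under File paths & names. The helper name read_delimited is hypothetical. Only gzip and bz2 are decoded because they ship with Python; brotli and zstd would need third-party packages and are rejected here.

```python
import bz2
import csv
import gzip
from typing import Iterator


def read_delimited(path: str, delimiter: str = ",") -> Iterator[dict[str, str]]:
    """Yield each row of a delimited export as a dict keyed by the header row.

    Assumptions (not confirmed by the docs): the first row is a header, and the
    compression codec is the final extension (.gzip or .bz2); any other file is
    treated as uncompressed.
    """
    if path.endswith(".gzip"):
        handle = gzip.open(path, "rt", encoding="utf-8", newline="")
    elif path.endswith(".bz2"):
        handle = bz2.open(path, "rt", encoding="utf-8", newline="")
    elif path.endswith((".brotli", ".zstd")):
        # These codecs are not in the standard library; decode them separately.
        raise ValueError(f"brotli/zstd decoding is not handled here: {path}")
    else:
        handle = open(path, "r", encoding="utf-8", newline="")
    with handle:
        # Output is always UTF-8, so no other encoding needs to be tried.
        yield from csv.DictReader(handle, delimiter=delimiter)
```

For a tab-delimited export, call it as `read_delimited("data_0_0_0.tsv.gzip", delimiter="\t")`.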

Example files

Person NDJSON file: data_0_0_0.ndjson

Organization NDJSON file: data_0_0_0.ndjson
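The example files above use NDJSON, where each line holds one JSON document. A minimal sketch for reading such a file with Python's standard library, assuming an uncompressed UTF-8 file like the examples and that each line is a JSON object; read_ndjson is a hypothetical helper name, and no particular record fields are assumed.

```python
import json
from typing import Any, Iterator


def read_ndjson(path: str) -> Iterator[dict[str, Any]]:
    """Yield one record per non-blank line of a UTF-8 NDJSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line failed so a bad record is easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

Reading line by line keeps memory use flat even for large partitions, since only one record is decoded at a time.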

File paths & names

File paths are constructed as follows:

```text
/forager_data_feed/<data_feed_id>/<date_created>/<data_feed_type>_<schema_version>/data_<partition_id>.<file_type>.<compression_type>

# Example:
/forager_data_feed/123/2025-10-15/person_1.0.0/data_0_0_1.json.gzip
```
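To illustrate how the template fits together, here is a hypothetical helper, build_feed_path, that fills in each placeholder with a Python f-string and reproduces the example path. It follows the template exactly as written, including the leading slash; whether that slash is part of the object key in your bucket is an assumption to check against your actual files.

```python
from datetime import date


def build_feed_path(
    data_feed_id: str,
    date_created: date,
    data_feed_type: str,
    schema_version: str,
    partition_id: str,
    file_type: str,
    compression_type: str,
) -> str:
    """Assemble a path following the documented template (hypothetical helper)."""
    return (
        f"/forager_data_feed/{data_feed_id}/{date_created.isoformat()}/"
        f"{data_feed_type}_{schema_version}/"
        f"data_{partition_id}.{file_type}.{compression_type}"
    )


print(build_feed_path("123", date(2025, 10, 15), "person", "1.0.0", "0_0_1", "json", "gzip"))
# prints /forager_data_feed/123/2025-10-15/person_1.0.0/data_0_0_1.json.gzip
```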

The variables in the path represent:

  • data_feed_id: A Forager-provided data feed ID, supplied by our support team.
  • date_created: The date the export was created, e.g. 2025-10-16.
  • data_feed_type: One of person, organization, or job.
  • schema_version: The Forager schema version of the serialized data for the data feed type (e.g. 1.0.0); your customer support representative will provide this.
  • partition_id: The partition column values generated when your files are exported, e.g. 0_0_0, 0_0_1, and so on.
  • file_type: One of json, csv, tsv, or parquet.
  • compression_type: One of gzip, bz2, brotli, or zstd.
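Going the other way, keys found when listing your bucket can be split back into these variables. The sketch below, assuming keys follow the template above with or without the leading slash, uses a regular expression to extract each variable and then checks it against the option lists documented here. parse_feed_path, FeedPath, and the set names are hypothetical. The example files earlier on this page use an .ndjson extension that is not in the file_type list, so you may need to add it to FILE_TYPES.

```python
import re
from typing import NamedTuple

# Option lists as documented above; extend them if your feed differs.
FEED_TYPES = {"person", "organization", "job"}
FILE_TYPES = {"json", "csv", "tsv", "parquet"}
COMPRESSION_TYPES = {"gzip", "bz2", "brotli", "zstd"}

# One named group per template variable; the leading slash is optional
# because object keys are often listed without it (an assumption).
PATH_RE = re.compile(
    r"/?forager_data_feed/(?P<data_feed_id>[^/]+)"
    r"/(?P<date_created>\d{4}-\d{2}-\d{2})"
    r"/(?P<data_feed_type>[a-z]+)_(?P<schema_version>[^/]+)"
    r"/data_(?P<partition_id>\d+(?:_\d+)*)"
    r"\.(?P<file_type>[a-z]+)\.(?P<compression_type>[a-z0-9]+)"
)


class FeedPath(NamedTuple):
    data_feed_id: str
    date_created: str
    data_feed_type: str
    schema_version: str
    partition_id: str
    file_type: str
    compression_type: str


def parse_feed_path(path: str) -> FeedPath:
    """Split a data feed path into its variables (hypothetical helper)."""
    match = PATH_RE.fullmatch(path)
    if match is None:
        raise ValueError(f"path does not match the data feed template: {path}")
    parts = FeedPath(**match.groupdict())
    # Reject values outside the documented option lists.
    if parts.data_feed_type not in FEED_TYPES:
        raise ValueError(f"unknown data_feed_type: {parts.data_feed_type}")
    if parts.file_type not in FILE_TYPES:
        raise ValueError(f"unknown file_type: {parts.file_type}")
    if parts.compression_type not in COMPRESSION_TYPES:
        raise ValueError(f"unknown compression_type: {parts.compression_type}")
    return parts
```

Validating against explicit sets, rather than trusting the regex alone, means an unexpected feed type or codec fails loudly instead of being silently processed.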

Serialized Data Schema

The schema is located here.