I see 840 million person records in my data feed, but only 794 million have been updated in the last month. Why?
This is expected behavior.
The total record count represents all person profiles currently in the dataset.
The “updated in the last month” count reflects how many of those profiles we were able to recrawl.
We stop recrawling a profile when it:
- Becomes inactive.
- Is banned on LinkedIn.
- No longer exists.
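In the example question, the gap is 46 million profiles (840 million minus 794 million). You can reproduce this kind of count on your own copy of the feed with a short sketch like the one below. It assumes each record is a dict whose date_updated value is an ISO 8601 timestamp with a UTC offset, and treats "the last month" as a 30-day window. The function name, the "id" key, and the sample records are hypothetical, for illustration only.

```python
from datetime import datetime, timedelta, timezone


def count_recently_updated(records, now, window_days=30):
    """Return (total, updated_within_window) for a list of record dicts.

    Assumes (hypothetically) that date_updated is an ISO 8601 string with a
    UTC offset, e.g. "2024-06-20T00:00:00+00:00".
    """
    cutoff = now - timedelta(days=window_days)
    recent = sum(
        1 for r in records
        if datetime.fromisoformat(r["date_updated"]) >= cutoff
    )
    return len(records), recent


# Hypothetical sample: "c" stands in for a profile that stopped being
# recrawled (for example, because it became inactive).
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
records = [
    {"id": "a", "date_updated": "2024-06-20T00:00:00+00:00"},
    {"id": "b", "date_updated": "2024-06-10T00:00:00+00:00"},
    {"id": "c", "date_updated": "2024-03-01T00:00:00+00:00"},
]
total, recent = count_recently_updated(records, now)
print(total, recent, total - recent)  # 3 2 1
```

The difference between the two counts is the set of profiles that remain in the dataset but were not recrawled during the window.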
Why do some person records not have LinkedIn identifiers?
You may find a small subset of LinkedIn profiles with missing LinkedIn identifiers. This typically happens when a LinkedIn profile has been banned.
How do refresh cycles work?
- Forager data feeds are continuously refreshed on a rolling basis. On average, we recrawl the entire dataset every three weeks.
- Delivery schedules (daily, weekly, monthly) control how often the recrawled profiles are delivered to you, not how often the underlying data is refreshed.
- Each transfer to your bucket or tables includes only data crawled since the last scheduled transfer.
- The date_updated field indicates when the profile was last crawled.
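Because each transfer contains only records crawled since the previous one, a downstream copy is usually maintained by merging each transfer into existing data rather than replacing it. The sketch below shows one way to do that, assuming records are dicts keyed by a hypothetical "id" field and that date_updated is an ISO 8601 timestamp with a UTC offset. The function name and the "title" field are illustrative, not part of the Forager schema.

```python
from datetime import datetime


def apply_transfer(store, transfer):
    """Merge one incremental transfer into a local store keyed by record ID.

    store: dict mapping a (hypothetical) "id" value to the latest record dict.
    transfer: iterable of record dicts delivered in one scheduled transfer.
    A delivered record replaces the stored copy only if its date_updated is
    newer, so re-applying an older transfer cannot roll a profile back.
    """
    for record in transfer:
        key = record["id"]
        current = store.get(key)
        if current is None or (
            datetime.fromisoformat(record["date_updated"])
            > datetime.fromisoformat(current["date_updated"])
        ):
            store[key] = record
    return store


store = {}
older_transfer = [
    {"id": "p1", "date_updated": "2024-06-01T00:00:00+00:00", "title": "Engineer"},
    {"id": "p2", "date_updated": "2024-06-02T00:00:00+00:00", "title": "Analyst"},
]
newer_transfer = [
    {"id": "p1", "date_updated": "2024-06-22T00:00:00+00:00", "title": "Senior Engineer"},
]
apply_transfer(store, older_transfer)
apply_transfer(store, newer_transfer)
print(store["p1"]["title"], len(store))  # Senior Engineer 2
```

Comparing date_updated instead of trusting arrival order keeps the merge idempotent if a transfer is reprocessed.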
How are deletions handled?
When Forager determines that a record is no longer valid, is no longer legally distributable, or should otherwise no longer appear in the dataset, the crawlers stop updating that profile. As a result, the profile is not included in any future data transfers.
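Since a deleted profile simply stops appearing in transfers, a downstream copy will keep its last delivered version until you remove it. One way to find such records is to flag anything whose date_updated is much older than the roughly three-week average recrawl cycle. The sketch below assumes a 60-day cutoff, which is an illustrative choice and not a Forager-defined threshold. It also assumes the same hypothetical "id"-keyed store and ISO 8601 date_updated format as above.

```python
from datetime import datetime, timedelta, timezone


def split_stale(store, now, max_age_days=60):
    """Split a record store into (active, stale) dicts by date_updated age.

    Assumption: a record not recrawled within max_age_days is treated as no
    longer maintained. 60 days is illustrative, several times the ~3-week
    average recrawl cycle, not an official cutoff.
    """
    cutoff = now - timedelta(days=max_age_days)
    active, stale = {}, {}
    for key, record in store.items():
        updated = datetime.fromisoformat(record["date_updated"])
        if updated >= cutoff:
            active[key] = record
        else:
            stale[key] = record
    return active, stale


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
store = {
    "p1": {"id": "p1", "date_updated": "2024-06-22T00:00:00+00:00"},
    "p2": {"id": "p2", "date_updated": "2024-02-15T00:00:00+00:00"},
}
active, stale = split_stale(store, now)
print(sorted(active), sorted(stale))  # ['p1'] ['p2']
```

Whether to delete, archive, or just mark stale records is a downstream policy decision; the cutoff should be tuned to how your team tolerates false positives from profiles that are merely slow to recrawl.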
How do schema versions change?
Forager schemas are versioned to ensure stability for downstream pipelines.
When fields are added, structures evolve, or improvements are made, a new schema version is released. The schema version is included in:
- File paths for bucket deliveries
- Table metadata for Snowflake
This allows your data engineering team to:
- Continue using older versions
- Migrate on your timeline
- Validate changes before moving production pipelines
Schema changes are designed to be additive whenever possible, minimizing breaking changes.
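To pin a pipeline to one schema version while validating a newer one, you can group delivered files by the version found in their paths. The FAQ does not specify how the version appears in bucket paths, so the sketch below assumes a hypothetical "/v<N>/" path segment and made-up file names. Adjust the pattern to match your actual delivery layout.

```python
import re
from collections import defaultdict

# Hypothetical layout: assumes the schema version appears as a "v<N>" path
# segment (e.g. ".../person/v2/..."). The real path format may differ.
VERSION_SEGMENT = re.compile(r"(?:^|/)v(\d+)(?:/|$)")


def group_by_schema_version(paths):
    """Map schema version (int) -> list of file paths; skip unversioned paths."""
    groups = defaultdict(list)
    for path in paths:
        match = VERSION_SEGMENT.search(path)
        if match:
            groups[int(match.group(1))].append(path)
    return dict(groups)


paths = [
    "deliveries/person/v1/2024-06-01/part-000.parquet",
    "deliveries/person/v2/2024-06-01/part-000.parquet",
    "deliveries/person/v2/2024-06-08/part-000.parquet",
]
by_version = group_by_schema_version(paths)
PINNED_VERSION = 1  # production stays on v1 until v2 has been validated
print(sorted(by_version), len(by_version[PINNED_VERSION]))  # [1, 2] 1
```

Production jobs read only the pinned group, while a separate validation job can consume the newer version's files side by side until the migration is complete.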