As we scale our operations, maintaining data integrity across thousands of product feeds is critical. To stay ahead of potential issues, the team has rolled out two new robust alerting systems.
These tools are designed to move our teams from reactive fixes to proactive monitoring, ensuring Kargo’s processing and publisher integrations remain healthy.
Here's a breakdown of the features and how you can use them to monitor and manage feed performance.
System 1: Internal Feed Metrics Monitoring
This system monitors our internal Kargo processing. It identifies significant shifts in product status as they are being assessed on our end.
What it monitors: Out of Stock and Archived product counts.
The Threshold: An alert is triggered when the volume of archived or out-of-stock products is 50% higher than the rolling two-day average.
Where to find it: Join the #feed-metrics-alerts Slack channel.
How to use it:
Review the Alert: When a spike occurs, a notification will appear in Slack showing the Feed ID (e.g., FT-6184).
Expand for Detail: Click "Show More" to see all feeds currently hitting that metric.
Investigate: Click the provided link to jump directly into the Grafana dashboard for a deep dive into the historical data and engine logs.
Take Action: If the spike is expected (e.g., a planned seasonal clearance), you can click "Silence" to remove the alert from the channel and prevent further noise.
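To make the threshold concrete, here is a minimal Python sketch of the comparison described above: the current count is checked against the average of the counts from the preceding two days. It is an illustration only, not the alerting system's actual code. The function name `is_spike`, the `recent_counts` input, and the choice to alert only when the count is strictly above 1.5 times the baseline are all assumptions.

```python
from statistics import mean

# 50% above the baseline, per the threshold described above.
SPIKE_RATIO = 1.5


def is_spike(current_count: int, recent_counts: list[int], ratio: float = SPIKE_RATIO) -> bool:
    """Return True when current_count exceeds ratio times the average of recent_counts.

    recent_counts is a hypothetical list of Out of Stock or Archived counts
    from feed runs in the previous two days.
    """
    if not recent_counts:
        # No history yet, so there is no baseline to compare against.
        return False
    baseline = mean(recent_counts)
    # Assumption: a count exactly 50% above the baseline does not alert.
    return current_count > baseline * ratio
```

For example, with two prior runs of 100 archived products each, the baseline is 100 and the alert level is 150, so `is_spike(160, [100, 100])` returns `True` while `is_spike(140, [100, 100])` returns `False`.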
System 2: Publisher Alerting (External Health)
While Feed Metrics looks at our internal data, Publisher Alerting monitors how our data is being received by external partners. Currently, this is fully integrated with Meta via their Catalog Health API.
What it monitors: Primarily Rejected Items at the publisher level.
The Threshold: An alert is triggered if the number of rejected products exceeds 25% of the total catalog.
Where to find it: Join the #publisher-alerting Slack channel.
How to use it:
Identify the Source: The Slack alert provides the Feed ID, the specific Meta Catalog ID, and the Company Name.
Analyze Trends: Click the "Dashboard" link to open the Refine/Grafana view. You can expand the time range (e.g., last 24 hours) to see every time the feed ran.
Read the Visualization:
Blue Line: Represents the number of rejected items.
Red Line: Represents the 25% threshold.
If the blue line crosses the red line, the system flags a major catalog health issue.
Check Summaries: The dashboard provides a summary of specific warnings and errors causing the rejections, allowing for faster troubleshooting.
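The 25% rule can be sketched in a few lines of Python, which may help when sanity-checking an alert by hand. This is a hedged illustration, not the production logic. The function names and the `rejected` and `total` inputs are hypothetical, and it assumes an empty catalog never alerts.

```python
# 25% of the total catalog, per the threshold described above.
REJECTION_THRESHOLD = 0.25


def rejection_rate(rejected: int, total: int) -> float:
    """Return the share of catalog items rejected by the publisher."""
    if total <= 0:
        # Assumption: an empty catalog has no meaningful rejection rate.
        return 0.0
    return rejected / total


def exceeds_threshold(rejected: int, total: int, threshold: float = REJECTION_THRESHOLD) -> bool:
    """Return True when the rejection rate is strictly above the threshold."""
    return rejection_rate(rejected, total) > threshold
```

For a catalog of 1,000 products, 300 rejections (30%) would cross the line, while exactly 250 (25%) would not, because the rule as stated fires only when rejections exceed 25%.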
Summary of Channels
To stay updated on catalog health, please ensure you are a member of these two channels:
| Channel Name | Purpose |
| --- | --- |
| #feed-metrics-alerts | Internal spikes in Out of Stock / Archived products within Kargo. |
| #publisher-alerting | External API alerts for rejected items (Meta/Facebook/Instagram). |