The "Sitemap Cleaner" app lets you set up automatic, scheduled cleanup of your sitemaps. It removes invalid URLs based on specific rules and generates a new sitemap file ready to be submitted to Search Console.
Using the "Sitemap Cleaner" app is easy 😉
1. Log in to our EdgeSEO solution and click on the app.
2. Click the "Add a Sitemap" button.
3. Fill in the configuration options (detailed below 👇).
4. Click "Save".
5. After the first cleanup, retrieve the URL of the cleaned sitemap from the home page and submit it to Search Console.
The "Sitemap" field is where you enter the URL of the sitemap you want to clean up.
This field is required.
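For reference, a standard sitemap (sitemaps.org protocol) lists each page in a <loc> element inside a <url> entry. The sketch below shows one way to read those URLs with Python's standard library. It only illustrates the file format and is not the app's internal code. It assumes a plain <urlset> file; sitemap index files and gzip-compressed sitemaps are not handled, and the function name `extract_urls` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list:
    """Return the <loc> values of a <urlset> sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc> elements
    ]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/product/skateboard</loc></url>
</urlset>"""

print(extract_urls(sample))
```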
The 5xx option automatically removes from the sitemap all URLs that return a server error code.
The status codes affected are: 500, 503, and 504.
When should you enable it?
Check this option if you want to exclude pages that are unavailable on the server side. A 5xx URL in your sitemap sends Google to a page that may exist but is currently inaccessible, which can negatively impact your crawl budget.
The 301 option automatically removes from the sitemap all URLs that return a permanent redirect.
The relevant status code is: 301.
When should you enable it?
Check this option if you want to exclude redirected URLs. A 301 redirect URL in your sitemap forces Google to follow the redirect before accessing the final page, which unnecessarily consumes crawl budget. Ideally, your sitemap should contain only canonical URLs that return a 200 status code.
The 404 option automatically removes from the sitemap all URLs that point to a page that does not exist.
The relevant status code is: 404.
When should you enable it?
Check this option to exclude pages that cannot be found. This is the most critical option: submitting 404 URLs to Google signals dead pages and lowers the perceived quality of your site. This option is recommended in the vast majority of cases.
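The three status-code options boil down to one rule: a URL is dropped when its HTTP status belongs to an enabled group. The sketch below models that rule under stated assumptions: the option names (`remove_5xx`, `remove_301`, `remove_404`) and the `removal_reason` helper are hypothetical, and, following the list given in this guide, only 500, 503 and 504 count as server errors. One practical detail: a 301 can only be observed on the first response, without following the redirect; a client that follows redirects would report the final status instead.

```python
from dataclasses import dataclass
from typing import Optional

# Status codes listed for the 5xx option in this guide
SERVER_ERROR_CODES = {500, 503, 504}


@dataclass
class StatusRules:
    # Hypothetical names mirroring the three checkboxes
    remove_5xx: bool = True
    remove_301: bool = True
    remove_404: bool = True


def removal_reason(status: int, rules: StatusRules) -> Optional[str]:
    """Return why a URL would be removed, or None if it is kept."""
    if rules.remove_5xx and status in SERVER_ERROR_CODES:
        return "5xx"
    if rules.remove_301 and status == 301:
        return "301"
    if rules.remove_404 and status == 404:
        return "404"
    return None


# Status of each URL as returned by its first response (redirects not followed)
checked = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page": 301,
    "https://www.example.com/missing": 404,
    "https://www.example.com/checkout": 503,
}
rules = StatusRules(remove_301=False)  # 301 option left unchecked
kept = [url for url, status in checked.items() if removal_reason(status, rules) is None]
print(kept)
```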
The Meta robots noindex option automatically removes all URLs whose page contains a <meta name="robots" content="noindex"> tag.
When should you enable it?
A page tagged noindex should not be submitted to Google in a sitemap, because it sends a contradictory signal: you ask Google not to index the page while flagging it as important through the sitemap. Check this option to keep your indexing directives and your sitemap consistent.
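To make the rule concrete, here is a minimal sketch of a noindex check using Python's built-in HTML parser. It only looks at the <meta name="robots"> tag described above; an X-Robots-Tag HTTP header or a googlebot-specific meta tag would need separate handling, and whether the app checks those is not stated here. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            # "noindex, follow" -> ["noindex", "follow"]
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """True if the page carries a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
```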
The Canonical option automatically removes all URLs whose <link rel="canonical"> tag points to a URL other than the page itself.
When should you enable it?
A non-canonical URL in your sitemap also sends a contradictory signal to Google: you submit a URL while indicating that its reference content is located elsewhere. Check this option to keep only URLs that correctly canonicalize to themselves in your sitemap.
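The self-canonical check can be sketched as follows: read the canonical link, resolve it against the page URL (canonical tags may use relative paths), and compare the two. This is an illustration under assumptions, not the app's exact logic: it keeps pages that have no canonical tag, compares URLs as exact strings, and the helper names are hypothetical. How the app handles edge cases such as trailing slashes or query strings is not stated here.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def is_self_canonical(page_url: str, html: str) -> bool:
    """True if the page has no canonical tag or its canonical points to itself."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return True  # assumption: pages without a canonical tag are kept
    # Resolve relative hrefs such as "/product/skateboard" against the page URL
    return urljoin(page_url, parser.canonical) == page_url
```

With this rule, a page at https://www.example.com/product/skateboard whose canonical is "/product/skateboard" is kept, while one pointing to a different product page is removed.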
This section allows you to automatically modify the format of the URLs in your sitemap before the cleaned sitemap is generated. These options are particularly useful for correcting technical inconsistencies without altering the source sitemap.
Convert HTTP to HTTPS
This option automatically converts all http:// URLs to their https:// equivalents in the cleaned-up sitemap.
When should you enable it?
If your source sitemap still contains HTTP URLs even though your site uses HTTPS, check this option to correct the inconsistency without having to regenerate your sitemap on the CMS side.
This option automatically converts relative URLs to absolute URLs in the cleaned sitemap.
When should you enable it?
Some CMS platforms generate sitemaps containing relative paths (e.g., /product/skateboard). Google expects absolute URLs in a sitemap. Check this option to correct this behavior without any technical intervention.
This option standardizes all URLs in your sitemap so that they consistently use the same version of your domain (with or without www).
When should you enable it?
If your source sitemap contains a mix of URLs with and without www, check this option to ensure consistency in the file submitted to Google and avoid any duplication issues.
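The three URL transformations can be pictured as a small pipeline applied to each URL. The sketch below is an illustration, not the app's implementation: `site_root` and `use_www` are hypothetical parameters standing in for your domain settings, and it assumes every URL belongs to your own domain. Order matters: a relative path has no scheme or host yet, so it is made absolute before the scheme and www rules run.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def to_absolute(url: str, site_root: str) -> str:
    # "/product/skateboard" -> "https://www.example.com/product/skateboard";
    # absolute URLs are returned unchanged by urljoin
    return urljoin(site_root, url)


def to_https(url: str) -> str:
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def normalize_www(url: str, use_www: bool) -> str:
    parts = urlsplit(url)
    host = parts.netloc
    if use_www and not host.startswith("www."):
        host = "www." + host
    elif not use_www and host.startswith("www."):
        host = host[len("www."):]
    return urlunsplit(parts._replace(netloc=host))


def normalize(url: str, site_root: str, use_www: bool = True) -> str:
    url = to_absolute(url, site_root)
    url = to_https(url)
    return normalize_www(url, use_www)


root = "https://www.example.com"
print(normalize("http://example.com/blog", root))
```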
Day
The "Day" selector allows you to specify the day(s) of the week on which the automatic cleanup will run.
Recommendation: For an e-commerce catalog that changes frequently, schedule the cleanup daily. For a more stable site, once or twice a week is sufficient.
Hour and Minute
The "Hour" and "Minute" fields allow you to set the exact time for the cleanup to run.
Recommendation: Schedule the cleanup early in the morning (e.g., 3:00 a.m.) so that the cleaned sitemap is already available when search engine crawlers next fetch it during the day.
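The combination of selected days and a fixed time determines when the next cleanup runs. The sketch below computes that moment; it is a hypothetical helper, not part of the app. It uses Python's weekday numbering (Monday = 0) and naive local datetimes; the guide does not say which time zone the schedule uses.

```python
from datetime import datetime, timedelta


def next_run(now: datetime, days: set, hour: int, minute: int) -> datetime:
    """Next scheduled run strictly after `now`.

    `days` uses Python's weekday numbering: Monday=0 ... Sunday=6.
    """
    for offset in range(8):  # today, then up to 7 days ahead
        candidate = (now + timedelta(days=offset)).replace(
            hour=hour, minute=minute, second=0, microsecond=0
        )
        if candidate.weekday() in days and candidate > now:
            return candidate
    raise ValueError("at least one day must be selected")


# Daily cleanup at 3:00 a.m., checked on Monday 1 January 2024 at 4:00 a.m.
print(next_run(datetime(2024, 1, 1, 4, 0), set(range(7)), 3, 0))
```

Checking up to seven days ahead covers the case where only today is selected and today's slot has already passed: the run then moves to the same weekday next week.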
Request Limit (req/min)
The "Request Limit" field allows you to set the maximum number of requests per minute that the Sitemap Cleaner crawler will send to your server while analyzing the sitemap.
The value must be between 300 and 600 req/min. The default value is 300.
When should you change it?
Keep the default value (300) in most cases. Increase this limit only if your infrastructure can handle a higher load and you want to speed up the analysis of large sitemaps. Conversely, if your hosting is sensitive to load, keep the value at the 300 req/min minimum to limit any impact on site availability during execution.
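The request limit sets a floor on how long a run can take. Assuming one request per URL (no retries), a sitemap of 50,000 URLs needs at least 167 minutes at 300 req/min and at least 84 minutes at 600 req/min. The hypothetical helper below performs that estimate; actual run times also depend on server response times and the concurrency setting.

```python
import math


def min_crawl_minutes(url_count: int, req_per_min: int = 300) -> int:
    """Lower bound on run time imposed by the request limit alone.

    Assumes one request per URL, with no retries.
    """
    if not 300 <= req_per_min <= 600:
        raise ValueError("the request limit must be between 300 and 600 req/min")
    return math.ceil(url_count / req_per_min)


print(min_crawl_minutes(50_000), min_crawl_minutes(50_000, 600))
```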
The "Concurrency" field lets you set the number of simultaneous requests sent during the sitemap crawl.
The value must be between 10 and 1000. The default value is 10.
When should you change it?
High concurrency speeds up the crawl by checking multiple URLs at the same time. Increase this value if you have a very large sitemap and a stable infrastructure. Reduce it if you encounter timeouts or 5xx errors during runs, which may indicate that your server is overloaded by simultaneous requests.
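Concurrency and the request limit interact: concurrency caps how many requests are in flight at once, while the request limit caps how many start per minute, so the stricter of the two governs speed. The asyncio sketch below enforces both limits, assuming the crawler behaves this way (the guide does not describe its internals). `crawl` and the `fetch` callback are hypothetical names; `fetch` stands for any coroutine that checks one URL and returns its status code.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Iterable


async def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[int]],
    concurrency: int = 10,
    req_per_min: int = 300,
) -> Dict[str, int]:
    """Check every URL with at most `concurrency` requests in flight
    and at most `req_per_min` requests started per minute."""
    semaphore = asyncio.Semaphore(concurrency)
    interval = 60.0 / req_per_min  # minimum spacing between request starts
    lock = asyncio.Lock()
    next_start = time.monotonic()

    async def wait_for_slot() -> None:
        nonlocal next_start
        async with lock:
            # Reserve the next free start time, then sleep outside the lock
            now = time.monotonic()
            delay = max(0.0, next_start - now)
            next_start = max(now, next_start) + interval
        await asyncio.sleep(delay)

    async def check(url: str):
        async with semaphore:  # limits requests in flight
            await wait_for_slot()  # limits requests started per minute
            return url, await fetch(url)

    return dict(await asyncio.gather(*(check(u) for u in urls)))
```

At 300 req/min, one request starts every 0.2 seconds. If each response takes about 2 seconds, roughly 10 requests are in flight at any time, so under this model raising concurrency above 10 would not speed up the run unless the request limit is raised too.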
Sitemap Cleaner includes an analytics module for monitoring the quality of your sitemaps from one run to the next.
Available metrics
For each run, you can view:
1. URLs analyzed: Total number of URLs in the source sitemap
2. URLs retained: Number of valid URLs retained in the cleaned sitemap
3. URLs removed: Number of URLs removed, all types combined
4. Breakdown by error type: Number of removals for each type (404, 301, 5xx, noindex, canonical)
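These metrics are tied together: every analyzed URL is either retained or removed, and the per-type breakdown adds up to the removal total. The sketch below derives them from per-URL results; the input shape (each URL mapped to its removal reason, or None if kept) and the `run_summary` name are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def run_summary(results: Dict[str, Optional[str]]) -> dict:
    """`results` maps each URL to its removal reason, or None if kept."""
    reasons = Counter(r for r in results.values() if r is not None)
    return {
        "analyzed": len(results),
        "retained": sum(1 for r in results.values() if r is None),
        "removed": sum(reasons.values()),
        "by_type": dict(reasons),
    }


results = {
    "https://www.example.com/": None,
    "https://www.example.com/a": "404",
    "https://www.example.com/b": "404",
    "https://www.example.com/c": "301",
    "https://www.example.com/d": None,
}
print(run_summary(results))
```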
Trends over time
The trend graph lets you track how these metrics change from one cleanup to the next. Typical patterns and what they may indicate:
1. Sudden increase in deletions: Production incident, uncontrolled catalog purge, server degradation
2. Gradual decrease in deletions: Cleanup in progress, improving catalog quality
3. Stable metrics: Feed under control, consistent sitemap quality
Cleanup history
The history keeps the most recent runs along with their status, the number of URLs processed, and the final size of the generated sitemap. This is useful for internal reporting or for discussions with your IT team.