Building a Cross-Platform DSA View: What the Data Doesn't Tell You

The VLOP dashboard I published two days ago brings 30 services into a single view. That's genuinely useful: it's the first time this data has been aggregatable across platforms in an interactive format, and the H2 2025 reports are the first to follow the Commission's harmonized template in full. But the limitations are as important as the data, and some of them are structural problems with how DSA transparency reporting works, not just features of this particular dashboard.

Category definitions aren't standardized

The most significant limitation: what TikTok calls "Hate Speech" and what Meta calls "Hate Speech" are defined by each platform's own content policies, not by the DSA. The regulation requires platforms to report by content category, and it specifies a list of categories. But it doesn't define what content falls within each category; that's left to platform policy.

This means a platform with a narrow hate speech policy definition will show lower numbers than a platform with a broader one, even if they're moderating equivalent amounts of content. Cross-platform volume comparisons in the dashboard are comparisons of platform-defined categories against other platform-defined categories, not comparisons of a consistent underlying construct.

The DSA Observatory argued in January that prior reporting had made the accuracy requirement essentially meaningless. The Commission's harmonized template, which the H2 2025 reports follow for the first time, adds precision and recall indicators for automated detection tools, which addresses that specific gap. Precision and recall are genuinely more informative than raw removal counts. But precision and recall measured within each platform's own category definition still don't resolve the cross-platform comparability problem, because the categories themselves aren't defined consistently.
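For readers less familiar with the two indicators, here is a minimal Python sketch of how precision and recall are conventionally computed from counts. The function name and all counts are invented for illustration, and the template's exact calculation method may differ from this textbook version.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) using the standard textbook definitions."""
    flagged = true_positives + false_positives      # everything the tool flagged
    violating = true_positives + false_negatives    # everything that actually violated policy
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / violating if violating else 0.0
    return precision, recall

# Invented counts, not taken from any platform's report.
precision, recall = precision_recall(
    true_positives=800, false_positives=200, false_negatives=400
)
```

Note that "actually violated policy" is judged against the platform's own category definition. Two platforms could report identical precision and recall while measuring against quite different definitions of the same category label.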

Data is self-reported without third-party audit

Every number in the dashboard is self-reported by the platform. The DSA doesn't currently require third-party verification of transparency report data: no external auditor confirms that the numbers are accurate, that the categorization methodology is consistent across reporting periods, or that the data extraction process is reliable.

This doesn't mean the numbers are wrong. Platforms have legal compliance obligations, and the reports are reviewed by internal Legal and Policy teams before publication. But it means the data should be treated as self-reported operational data: useful for understanding trends and patterns within a platform, less reliable as a precise measurement of an absolute quantity, and not verifiable against an external reference point.

Aggregation methods differ

Google reports its six designated services separately: Search, Maps, Play, Shopping, YouTube, and a sixth service entry. Meta reports aggregate figures for Facebook and Instagram combined. Other platforms report at the platform level.

This affects how you read the numbers. Google's entry for any single service looks smaller than the comparable figures from platforms that report in aggregate. The total across Google's six entries in the dashboard is the comparable figure to Meta's single combined entry, but the dashboard also lets you accidentally compare Google Search alone to TikTok's platform-wide total, which is not a meaningful comparison. Filtering carefully matters.
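A minimal sketch of that roll-up, assuming the data is available as simple (provider, service, notices) rows. The row layout, the function name, and every count here are hypothetical and are not the dashboard's actual schema or figures.

```python
from collections import defaultdict

# Hypothetical rows: (provider, service, notices). All counts are invented.
rows = [
    ("Google", "Search", 120),
    ("Google", "Maps", 30),
    ("Google", "Play", 50),
    ("Google", "Shopping", 40),
    ("Google", "YouTube", 700),
    ("Google", "(sixth service)", 60),
    ("Meta", "Facebook + Instagram", 1500),
    ("TikTok", "TikTok", 900),
]

def totals_by_provider(rows):
    """Sum per-service entries so every provider is compared at the same level."""
    totals = defaultdict(int)
    for provider, _service, notices in rows:
        totals[provider] += notices
    return dict(totals)

provider_totals = totals_by_provider(rows)
```

With these invented numbers, Google Search alone (120) looks tiny next to TikTok (900), while the Google total (1000) is the figure that sits at the same level of aggregation as Meta's combined entry.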

How to use the dashboard reliably

The dashboard is most useful for tracking volume trends within a single platform over time. That's a like-for-like comparison within a consistent methodology and category definition: the numbers mean the same thing across periods for a given platform, even if they don't mean the same thing as another platform's numbers in the same category.
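As a rough illustration of the within-platform reading, the sketch below computes period-over-period percentage change for one platform and one category. The function name, the period labels, and the counts are all invented for the example.

```python
def period_changes(series):
    """Percent change between consecutive periods.

    series: list of (period_label, count) in chronological order.
    Returns a list of (period_label, pct_change); pct_change is None
    when the previous count is zero and the ratio is undefined.
    """
    changes = []
    for (_, prev), (period, curr) in zip(series, series[1:]):
        if prev == 0:
            changes.append((period, None))
        else:
            changes.append((period, (curr - prev) * 100 / prev))
    return changes

# Invented counts for a single platform and a single category.
series = [("P1", 1000), ("P2", 1200), ("P3", 900)]
changes = period_changes(series)
```

Because both sides of each ratio come from the same platform's definition of the category, the percentage is meaningful even though the absolute counts would not be comparable to another platform's.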

It's also useful for understanding the rough shape of the ecosystem: which categories generate the most notices across platforms, which platforms are processing substantially more appeals than others, what the government orders picture looks like geographically. Those are patterns visible at the level of order-of-magnitude differences, where the category definition divergences matter less.

It's less useful for precise cross-platform comparisons of absolute volume, and for drawing normative conclusions about which platforms are moderating more or better. Those conclusions require category definitions and audit standards that don't yet exist in DSA reporting. The structural problems that remain (self-reported data and inconsistent category definitions) are real constraints on what the data can support, and treating the numbers as more comparable than they are produces misleading conclusions.