How I Use LLMs in Compliance PM Work

I've worked in content policy and regulatory compliance long enough to be skeptical of "AI will automate X" claims. What I've found is more specific: there are particular tasks in compliance PM work where LLMs compress time significantly, and particular tasks where they don't help and shouldn't be trusted. The distinction matters.

Requirement extraction

The most reliable use case is parsing statutory text. When a new law or regulatory guidance comes out, the first PM task is understanding what it actually requires: what data must be captured, what categories must be maintained, and what format the output must take.

The process: paste the full text of the statute or guidance document into the model and ask it to enumerate every reportable data element. For each element, ask whether the description maps to something current data infrastructure already produces. Then compare that output against what the platform's data systems actually generate, with a domain expert who knows what those systems produce.

This works well because most statutory text is precise and doesn't require deep contextual judgment to parse. The model is doing structured information extraction, pulling out lists of requirements and organizing them, which it does reliably. Where it breaks down is in interpreting ambiguous definitions, which still requires legal judgment. The model will give you a confident answer about what an ambiguous requirement means; that answer may or may not reflect how the regulator will read the same language.
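The extract-then-compare step described above can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the prompt wording, the `Element` record, and the `split_by_coverage` helper are all hypothetical names introduced here, and the sketch assumes the model's reply has already been parsed into a list of elements. No particular model API is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str      # short name the model gave the reportable data element
    citation: str  # subsection the model says requires it (verify by hand)


def build_extraction_prompt(statute_text: str) -> str:
    """Assemble the extraction request; the wording is illustrative only."""
    return (
        "Enumerate every reportable data element required by the text below. "
        "For each one, give a short name and the subsection that requires it. "
        "Where a definition is ambiguous, quote it rather than interpreting it."
        "\n\nSTATUTE TEXT:\n" + statute_text
    )


def split_by_coverage(elements, produced_fields):
    """Partition extracted elements into (covered, missing).

    Matching is by normalized name only, which is deliberately crude:
    it produces a first-pass list for the domain expert to correct.
    """
    produced = {f.strip().lower() for f in produced_fields}
    covered, missing = [], []
    for e in elements:
        (covered if e.name.strip().lower() in produced else missing).append(e)
    return covered, missing
```

The name-based match will produce both false hits and false misses, which is why the comparison still ends with someone who knows what the data systems actually generate.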

The real value shows up when a new law overlaps with an existing one. When California AB 587 came out, we were already reporting under EU DSA Article 15. The category frameworks partially overlap but don't map cleanly: AB 587's "disinformation or misinformation" category has no direct DSA equivalent, and the DSA's Article 16 notice mechanism has no California analog. Running both statutory texts through this extraction process and then doing a structured diff against existing data infrastructure identified the gaps precisely: what data we already had, what needed to be added, and what required definitional decisions before we could even begin data work.
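One way to picture the structured diff is as plain set arithmetic over the extracted requirement names. The sketch below is an assumption about how such a diff could be organized, not a record of the tooling actually used; `gap_report` and its bucket names are hypothetical. The "unique_to_law" bucket is only a rough proxy for items that may need a definitional decision, since it simply lists requirements with no counterpart in any other framework.

```python
def gap_report(required_by: dict[str, set[str]], available: set[str]) -> dict[str, dict[str, set[str]]]:
    """For each law, bucket its requirements against existing data and other laws."""
    report = {}
    for law, required in required_by.items():
        # Everything required by any *other* law in the comparison.
        others = set().union(*(r for k, r in required_by.items() if k != law))
        report[law] = {
            "already_have": required & available,   # data the platform already produces
            "to_add": required - available,         # data that needs new work
            "unique_to_law": required - others,     # no counterpart in another framework
        }
    return report
```

Called with a mapping from each law to its extracted requirement names, plus the set of fields the platform already produces, this returns the three buckets per law for an expert to review.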

Cross-jurisdiction gap analysis

A related use case is maintaining a cross-jurisdiction requirements table as regulatory guidance evolves. The European Commission published a harmonized DSA reporting template in 2024; when it updated the precision and recall requirements for automated detection tools in the H2 2025 version, I needed to understand specifically what changed relative to the H1 template and whether it affected anything we were already producing.

The workflow: maintain a structured table of requirements by jurisdiction (DSA Article 15, DSA Article 42, AB 587, S895, and so on), with columns for the specific data required and the current infrastructure status. When new guidance publishes, run the new text against the existing table to flag what changed. The model does the comparison; the PM judgment is in evaluating whether flagged changes require engineering work, definitional decisions, or nothing.

This is faster than reading a full updated guidance document against memory of the prior version, and more reliable: the model catches things that are easy to miss in dense regulatory text, particularly changes in measurement methodology or reporting cadence buried in a footnote.
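The table itself can be as simple as a list of records, and the "flag what changed" step can be double-checked mechanically once the new guidance has been extracted into the same shape. The following is a minimal sketch under that assumption; the `Requirement` fields mirror the columns described above, while `flag_changes` and the flag labels are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    jurisdiction: str              # e.g. "DSA Article 15", "AB 587"
    item: str                      # short name of the required data element
    data_required: str             # what the guidance asks for, in plain words
    infra_status: str = "unknown"  # current infrastructure status


def flag_changes(table, jurisdiction, new_rows):
    """Compare one jurisdiction's rows in the table with newly extracted rows.

    Returns (kind, item) pairs where kind is "added", "removed" or "changed".
    Rows for other jurisdictions are ignored so they are not flagged as removed.
    """
    old = {r.item: r for r in table if r.jurisdiction == jurisdiction}
    new = {r.item: r for r in new_rows if r.jurisdiction == jurisdiction}
    flags = []
    for item in sorted(old.keys() | new.keys()):
        if item not in old:
            flags.append(("added", item))
        elif item not in new:
            flags.append(("removed", item))
        elif old[item].data_required != new[item].data_required:
            flags.append(("changed", item))
    return flags
```

A string comparison like this only catches wording differences in the extracted rows; deciding whether a flagged change means engineering work, a definitional decision, or nothing remains the PM's call.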

Draft review against a statutory checklist

Before any regulatory report goes to Legal for review, I run a near-final draft against the relevant statutory checklist. The prompt is straightforward: here is the draft report, here is the relevant statute, identify anything in the statute that is not addressed in the report, and flag anything in the report that appears to conflict with the statute's requirements.

This catches omissions that are easy to miss in long documents: a disclosure required under one subsection that got dropped in a revision cycle, or a category that appears in the statute but was inadvertently combined with another in the report. These are the kinds of errors that surface in Legal review and require a full revision, which costs time when you're close to a filing deadline.

It doesn't replace Legal review. The model will miss context-dependent interpretations and won't catch substantive accuracy problems; it can't tell you that the number of appeals reported is wrong, only that the report includes an appeals section. But it's a useful pre-flight check that reduces the probability that Legal review surfaces structural omissions rather than substantive questions.
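The pre-flight check in this section could be scripted along these lines. This is a sketch, not the exact prompt I use: the `OMISSION:` and `CONFLICT:` line prefixes are a hypothetical output convention requested in the prompt so the reply can be sorted mechanically, and the function names are invented for illustration. The call to the model itself is left out because it depends on whichever tool is in use.

```python
def build_review_prompt(draft: str, statute: str) -> str:
    """Assemble the checklist-review request with clearly delimited inputs."""
    return (
        "Compare the draft report with the statute.\n"
        "1. List anything the statute requires that the report does not address.\n"
        "2. List anything in the report that appears to conflict with the statute.\n"
        "Write one finding per line, prefixed with OMISSION: or CONFLICT:.\n\n"
        "STATUTE:\n" + statute + "\n\nDRAFT REPORT:\n" + draft
    )


def parse_findings(reply: str) -> dict[str, list[str]]:
    """Sort reply lines into omissions and conflicts; other lines are ignored."""
    findings = {"OMISSION": [], "CONFLICT": []}
    for line in reply.splitlines():
        label, sep, detail = line.partition(":")
        label = label.strip().upper()
        if sep and label in findings:
            findings[label].append(detail.strip())
    return findings
```

Splitting the findings this way makes it easy to clear structural omissions first and hand Legal a draft where the remaining questions are substantive ones.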

What doesn't work

Anything requiring judgment about regulatory intent (what a regulator is actually likely to enforce, or whether a particular disclosure satisfies a requirement whose language is ambiguous) doesn't work well. The model will generate a confident-sounding answer. That answer may or may not reflect how the regulator reads the same language, and in compliance work, the difference matters.

Similarly, anything requiring knowledge of specific enforcement history, recent commission guidance, or pending rulemaking that postdates the training cutoff is unreliable. The model doesn't know what it doesn't know, and regulatory compliance is an area where acting on out-of-date information creates real risk.

The pattern that's worked: use LLMs for structured text tasks (extraction, comparison, checklist review) where the inputs and outputs are well-defined and a domain expert can verify the output. Don't use them to answer questions about what a requirement means in an enforcement context or how a regulator will respond to a particular disclosure approach. Those require expertise the model doesn't have and can't reliably simulate.