GTM Language Parity

The GTM showcase proves that one signed ANIP contract can generate native services in all supported languages.

This is not a proxy setup: each language stack is implemented in its own language and does not delegate to another implementation.

What This Proves

The parity proof has three layers:

Contract parity — every language is generated from the same signed package.
Manifest parity — every generated stack exposes the same public capability set and public manifest semantics.
Behavior parity — each language stack can answer the same GTM scenario bank through the same agent runtime.

The important constraint:

Custom bundles may implement behavior.
Custom bundles must not mutate the signed public contract.

That means a language-specific bundle can fill backend adapters, approval stores, fixtures, and runtime integration seams. It must not change capability IDs, inputs, side-effect posture, composition metadata, input resolution, required scopes, or approval policy.
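One way to picture this constraint is a guard that compares the signed declarations with what a generated service actually serves. The sketch below is illustrative only: the field names in `PROTECTED_FIELDS` and the capability ID `example.capability` are hypothetical, chosen to mirror the prose above rather than the ANIP manifest schema.

```python
from typing import Any

# Hypothetical field names mirroring the prose; not taken from the ANIP spec.
PROTECTED_FIELDS = (
    "inputs",
    "side_effect",
    "composition",
    "resolution",
    "required_scopes",
    "approval_policy",
)


def contract_violations(
    signed: dict[str, dict[str, Any]],
    served: dict[str, dict[str, Any]],
) -> list[str]:
    """Describe every difference between signed and served declarations.

    Both arguments map a capability ID to its declaration dict.
    """
    problems: list[str] = []
    for cap_id in sorted(signed.keys() - served.keys()):
        problems.append(f"{cap_id}: missing from served manifest")
    for cap_id in sorted(served.keys() - signed.keys()):
        problems.append(f"{cap_id}: not in signed contract (hidden capability)")
    for cap_id in sorted(signed.keys() & served.keys()):
        for field in PROTECTED_FIELDS:
            if signed[cap_id].get(field) != served[cap_id].get(field):
                problems.append(f"{cap_id}: {field} differs from signed contract")
    return problems


# Example: a bundle that weakens the approval policy is reported.
signed = {"example.capability": {"inputs": ["quarter"], "approval_policy": "required"}}
served = {"example.capability": {"inputs": ["quarter"], "approval_policy": "none"}}
print(contract_violations(signed, served))
```

An empty result means the bundle left the public declarations untouched. Anything else is contract drift, however the bundle implements behavior internally.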

Canonical Package

The promoted baseline lives in:

examples/showcase/gtm/generated/language-parity/

It includes Python, TypeScript, Go, Java, and C# implementations generated from the same GTM contract.

The package is:

gtm-pipeline-q2-review@0.4.5

Package path:

examples/showcase/gtm/registry-packages/gtm-pipeline-q2-review-0.4.5.anip-package.json

Expected contract shape:

Property	Expected value
Spec	`anip/0.24`
Package	`gtm-pipeline-q2-review@0.4.5`
Services	4
Capabilities	23
Languages	Python, TypeScript, Go, Java, C#
Transport	HTTP for compose stacks; stdio can be generated from the same package

The generated language outputs are:

examples/showcase/gtm/generated/language-parity/python/
examples/showcase/gtm/generated/language-parity/typescript/
examples/showcase/gtm/generated/language-parity/go/
examples/showcase/gtm/generated/language-parity/java/
examples/showcase/gtm/generated/language-parity/csharp/

The hand-written Python showcase remains useful as historical/reference material. It is not the parity target. Parity compares generated language outputs from the same contract.

Capability Domains

The GTM contract covers a revenue-operations workflow, not a toy calculator.

Capability groups include:

Pipeline summaries.
Forecast summaries.
Stage bottleneck analysis.
Sales-team performance.
Product pipeline.
Account risk.
Account enrichment.
Outreach drafts.
Follow-up and reassignment preparation.
Prioritized routing and compound scenarios.

The exact capability IDs are defined by the signed service definition and exposed through each generated manifest. Docs should not become a second source of truth for the capability list.

Native Language Output

Each target is generated as a native implementation:

Language	Output posture
Python	Python service code with native runtime/custom bundle seams.
TypeScript	TypeScript service code with native runtime/custom bundle seams.
Go	Go service code with native runtime/custom bundle seams.
Java	Java service code with native runtime/custom bundle seams.
C#	C# service code with native runtime/custom bundle seams.

The non-Python stacks are not HTTP proxies to Python. If a language cannot pass parity without delegating to another implementation, it is not release-quality parity.

Generate From The Package

Generation uses the same package for every language; only the target, the custom bundle, and the output path change:

anip generate \
  --package-bundle examples/showcase/gtm/registry-packages/gtm-pipeline-q2-review-0.4.5.anip-package.json \
  --target typescript \
  --transport http \
  --custom-code-bundle examples/showcase/gtm/custom-code-bundles/gtm_pipeline_typescript \
  --output /tmp/gtm-typescript \
  --force

For release baselines, the committed generated outputs live under examples/showcase/gtm/generated/language-parity/.
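To regenerate every target in one pass, the command above can be parameterized. The following is a minimal sketch that only builds the argument lists; it uses no flags beyond those shown above. Two things are assumptions: that the `--target` values match the output directory names (`python`, `go`, `java`, `csharp`), and that the other bundles follow the `gtm_pipeline_<target>` naming seen for TypeScript.

```python
PACKAGE = (
    "examples/showcase/gtm/registry-packages/"
    "gtm-pipeline-q2-review-0.4.5.anip-package.json"
)
# Assumption: target names match the generated output directory names.
TARGETS = ("python", "typescript", "go", "java", "csharp")


def generate_command(target: str, output_root: str = "/tmp") -> list[str]:
    """Build the argv for one target; the package argument never changes."""
    if target not in TARGETS:
        raise ValueError(f"unsupported target: {target}")
    return [
        "anip", "generate",
        "--package-bundle", PACKAGE,
        "--target", target,
        "--transport", "http",
        # Assumption: every bundle directory is named gtm_pipeline_<target>.
        "--custom-code-bundle",
        f"examples/showcase/gtm/custom-code-bundles/gtm_pipeline_{target}",
        "--output", f"{output_root}/gtm-{target}",
        "--force",
    ]


for target in TARGETS:
    print(" ".join(generate_command(target)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the bundle paths have been confirmed against the repository.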

Custom Bundle Boundary

The GTM bundles provide implementation material:

fixture-backed GTM data access
backend adapter logic
approval grant handling
actor policy helpers
domain rendering logic
framework-specific service wiring

They must not change the public ANIP contract.

Bad bundle behavior:

changing a composed capability to atomic in the manifest
removing required inputs from the public declaration
changing resolution.mode
weakening approval policy
adding hidden capabilities
changing side-effect posture

Valid bundle behavior:

implementing the declared capability
optimizing internal execution
resolving provider data behind resolver_ref
returning previews or approval-required responses according to the contract
adapting the generated service to language/framework conventions

Run A Language Stack

examples/showcase/gtm/scripts/smoke-language-compose.sh python
examples/showcase/gtm/scripts/smoke-language-compose.sh typescript
examples/showcase/gtm/scripts/smoke-language-compose.sh go
examples/showcase/gtm/scripts/smoke-language-compose.sh java
examples/showcase/gtm/scripts/smoke-language-compose.sh csharp

Each smoke run starts:

Four generated ANIP services.
Agent runtime.
Agent UI route.
Required local dependencies.

It verifies:

Discovery documents.
23-capability union.
Manifest shape.
Runtime JSON.
UI reachability.
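The capability-union check can be sketched as follows. This is not the smoke script's actual implementation. It assumes the capability IDs have already been pulled from each service's manifest into plain lists, and the service names and IDs in the example are placeholders.

```python
def check_capability_union(
    manifests: dict[str, list[str]],
    expected_services: int = 4,
    expected_capabilities: int = 23,
) -> list[str]:
    """Return a list of problems; an empty list means the surface looks right.

    `manifests` maps a service name to the capability IDs it exposes.
    """
    problems: list[str] = []
    if len(manifests) != expected_services:
        problems.append(
            f"expected {expected_services} services, found {len(manifests)}"
        )
    # Union across services, so an ID exposed by two services counts once.
    union: set[str] = set().union(*manifests.values())
    if len(union) != expected_capabilities:
        problems.append(
            f"expected {expected_capabilities} capabilities, found {len(union)}"
        )
    return problems


# Placeholder data: four services exposing 23 distinct IDs between them.
ids = [f"cap-{i}" for i in range(23)]
example = {
    "svc-a": ids[0:6],
    "svc-b": ids[6:12],
    "svc-c": ids[12:18],
    "svc-d": ids[18:23],
}
print(check_capability_union(example))
```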

The direct compose files are also available:

examples/showcase/gtm/docker-compose.language-parity-python.yml
examples/showcase/gtm/docker-compose.language-parity-typescript.yml
examples/showcase/gtm/docker-compose.language-parity-go.yml
examples/showcase/gtm/docker-compose.language-parity-java.yml
examples/showcase/gtm/docker-compose.language-parity-csharp.yml

Each stack starts:

Postgres loaded with the GTM dataset.
Four generated ANIP services for the selected language.
Metabase for BI inspection.
GTM agent runtime/UI configured against that language's services.

Ports differ by stack so multiple language stacks can be compared without editing service code.

Why This Matters

Language parity prevents a common failure mode: one language becomes the real implementation and other languages become wrappers or demos.

The release standard is stricter:

Same contract.
Same capability IDs.
Same public manifest semantics.
Native implementation per language.
Custom bundles fill execution seams only.
No language-specific manifest mutation.
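A simple way to test "same public manifest semantics" without privileging any one language is to fingerprint each language's public manifest and group the languages by fingerprint. The sketch below assumes each manifest is already loaded as a dict and stripped of deployment-specific values such as hosts and ports. The manifest shape in the example is hypothetical.

```python
import hashlib
import json


def manifest_fingerprint(manifest: dict) -> str:
    """Hash a canonical JSON form so that key order does not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parity_groups(manifests: dict[str, dict]) -> list[list[str]]:
    """Group languages whose manifests are identical.

    Parity holds when exactly one group comes back.
    """
    groups: dict[str, list[str]] = {}
    for language, manifest in manifests.items():
        groups.setdefault(manifest_fingerprint(manifest), []).append(language)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical manifests: Go matches Python, Java has dropped a scope.
python_manifest = {"spec": "anip/0.24", "capabilities": [{"id": "x", "scopes": ["read"]}]}
go_manifest = {"capabilities": [{"scopes": ["read"], "id": "x"}], "spec": "anip/0.24"}
java_manifest = {"spec": "anip/0.24", "capabilities": [{"id": "x", "scopes": []}]}
print(parity_groups({"python": python_manifest, "go": go_manifest, "java": java_manifest}))
```

List order still matters in this sketch. A real check would need to decide which arrays are ordered and sort the rest before hashing.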

Generator Conformance vs Scenario Tests

Do not rely only on the GTM question bank for generator quality.

Use separate gates:

Gate	Purpose
Generator conformance	Proves generated structure and contract semantics are consistent across targets.
Compose smoke	Proves each generated language stack starts and exposes the expected ANIP surface.
Question banks	Prove the GTM app behavior through realistic user prompts.

The GTM bank is intentionally domain-specific. It is a strong showcase gate, but it is not a generic ANIP conformance suite.

Question-Bank Validation

The GTM showcase also has LLM-dependent question banks. Those validate planner and runtime behavior against user-facing scenarios.

Use them as release gates, not as the only parity mechanism:

Generator conformance should catch structural drift.
Compose smoke should catch service topology drift.
Scenario/question banks should catch behavior drift.

If a question-bank case fails, first decide whether the contract is missing a generic behavior primitive, the implementation violates the contract, or the test expectation is stale.

Do not patch generic runtime code with GTM-specific phrases.

Question Banks

The release validation surface is:

Bank	Coverage	Size
GTM benchmark suite	Broad behavior, wording variation, converted follow-up cases, and generated two-turn clarification/resolution cases.	540
Hard-mode governance bank	Prompt injection, mixed safe/unsafe intent, actor-boundary pressure, approval bypass attempts, provider-selected targets, negated actions, and multi-turn override handling.	24

The broad phase-bank source used by the benchmark builder lives in:

docs/examples/gtm-showcase/question-banks/

The variation-bank source used by the benchmark builder lives in:

docs/examples/gtm-showcase/variation-question-banks-v3/

Run phase-sized banks while debugging. Use the generated 540-case benchmark suite plus the 24-case hard-mode governance bank as release validation.

Model Configuration

For GTM agent-service testing, use the lightweight test model configured for the bank:

ANIP_AGENT_MODEL=gpt-5.4-mini

That model choice is for generated ANIP service testing. Studio assistant authoring is a different workflow and should use the configured Studio assistant model.

Failure Triage

When parity breaks, triage in this order:

Same contract? Confirm every language was generated from gtm-pipeline-q2-review@0.4.5.
Same capability count? Confirm all five expose the 23-capability union.
Same manifest semantics? Confirm bundle code did not mutate declarations.
Same topology? Confirm the agent runtime points to four service endpoints, not duplicated aliases or stale ports.
Same compact brief? Confirm the runtime sends compact capability briefs, not full manifest JSON, and that every language exposes enough agent-consumption metadata.
Same approval flow? Confirm approval_required responses produce real approval request/grant continuation state.
Same follow-up handling? Confirm clarification follow-ups retain pending capability context rather than replanning from scratch.

If the issue is GTM-specific, fix the GTM bundle or contract. If the issue is generic, fix the generator, shared runtime utilities, or ANIP metadata model. Do not hide drift with language-specific manifest overrides.
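The ordering matters because later checks are only meaningful once earlier ones hold. A minimal sketch of that short-circuit behavior, with placeholder lambdas standing in for real checks:

```python
from typing import Callable, Optional

# Step names follow the triage order in the text; the checks are placeholders.
TriageStep = tuple[str, Callable[[], bool]]


def first_failing_step(steps: list[TriageStep]) -> Optional[str]:
    """Run checks in order and return the name of the first one that fails.

    Later checks are not run once a failure is found.
    """
    for name, check in steps:
        if not check():
            return name
    return None


steps: list[TriageStep] = [
    ("same contract", lambda: True),
    ("same capability count", lambda: True),
    ("same manifest semantics", lambda: False),  # pretend a bundle mutated a declaration
    ("same topology", lambda: True),
]
print(first_failing_step(steps))  # same manifest semantics
```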

Release Standard

The GTM language parity showcase is release-ready only when:

All five language outputs are regenerated from the same signed package.
Manifest structural parity passes across all five.
Compose smoke passes for all five.
The 350-case bank passes for all five.
The 140-case variation bank passes for all five.
Custom bundles do not mutate public declarations.
Any required app-specific behavior is explicit implementation material, not hidden generic runtime logic.
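The "for all five" wording implies a gate-by-language matrix in which any missing or failed cell blocks the release. A hedged sketch of that aggregation, with hypothetical gate names and results:

```python
LANGUAGES = ("python", "typescript", "go", "java", "csharp")


def release_blockers(results: dict[str, dict[str, bool]]) -> list[str]:
    """List every gate/language cell that is missing or failed.

    `results` maps a gate name to {language: passed}. A language with no
    recorded result counts as a blocker rather than an implicit pass.
    """
    blockers: list[str] = []
    for gate, by_language in results.items():
        for language in LANGUAGES:
            if not by_language.get(language, False):
                blockers.append(f"{gate}: {language}")
    return blockers


# Hypothetical results: Java fails the smoke gate, C# has no smoke result yet.
results = {
    "manifest parity": {language: True for language in LANGUAGES},
    "compose smoke": {"python": True, "typescript": True, "go": True, "java": False},
}
print(release_blockers(results))
```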

This is the showcase that proves ANIP is not one runtime with four afterthought ports. It is one contract producing native governed services across the supported language set.
