special project / cct-vlm

CCT-VLM

A dedicated vision workbench for the Qwen 3.5 sidecar on DGX. Use one image or two images, compare labels, inspect last-token logits, and keep each result as a response card with a stable response id.

Not logged in Sidecar: checking... Mode: chat Example: loading...
Back to chat
guided showcase

Verify the hypothesis, inspect the probabilities, then perturb the image

This page combines a real sidecar workbench with a guided reading flow. Load an example, show the exact instruction, verify entailment/contradiction/neutral, inspect the probability bars, and then test whether the judgment stays stable under prompt edits or Gaussian image noise.

Image A
Hero Image A preview
Image B
Hero Image B preview
1
Load a pair
Start from the SNLI-VE pair or switch to the mock pair.
2
Show the instruction
Keep the actual prompt visible so the evaluation is interpretable.
3
Check faithfulness
Compare the predicted label, its confidence, and the explanation.
4
Add noise or edit the prompt
See what changes and what remains robust.
Current example
Loading example…
Hypothesis
-
Instruction shown to model
-
Current verdict
No run yet
waiting
Run a verification or pipeline step to populate the summary.

Demo flow

Load an e-SNLI style image pair (or the built-in mock images) and run label scoring with Gaussian noise to see stability curves.

Demo buttons set image paths on DGX. Replace them with your e-SNLI paths if needed.
Used for label scoring, comparison, and optional logits-side label scoring.
`return_full_last_token_logits` can be very large. Keep it off unless you are inspecting raw vocab behavior.

Image A

Image A preview

Image B

Image B preview
`compare_images` uses both Image A and Image B. The other modes use only Image A.

Gaussian noise

Noise is applied per image before the sidecar runs. Turn it on to stress-test label stability.

Noise sweep

Sweeps the label scores over multiple Gaussian noise levels and plots the curves.
Ready.

Responses

Every request gets a local `response_id`. Use the cards to inspect output text, label scores, and token/logit distributions.

0 responses

Chat timeline

All tries show up here. Toggle retain context to chain the next prompt.