
Giskard-AI/giskard-oss

Evals, Red Teaming and Test Generation for Agentic Systems

Modular, Lightweight, Dynamic and Async-first



Important

Giskard v3 is a fresh rewrite designed for dynamic, multi-turn testing of AI agents. This release drops heavy dependencies for better efficiency and lays the groundwork for a more powerful AI vulnerability scanner and enhanced RAG evaluation capabilities. For now, the vulnerability scanner and RAG evaluation still rely on Giskard v2. Giskard v2 remains available but is no longer actively maintained. Follow progress → Read the v3 Announcement · Roadmap

Install

pip install giskard

Requires Python 3.12+.

Telemetry: Libraries built on giskard-core (including giskard-checks) may send optional, aggregated usage analytics to help improve the product. No prompts, model outputs, or scenario text are included. See what is collected and how to opt out.


Giskard is an open-source Python library for testing and evaluating agentic systems. The v3 architecture is a modular set of focused packages — each carrying only the dependencies it needs — built from scratch to wrap anything: an LLM, a black-box agent, or a multi-step pipeline.

Status Package Description
✅ Beta giskard-checks Testing & evaluation — scenario API, built-in checks, LLM-as-judge
🚧 In progress giskard-scan Agent vulnerability scanner — red teaming, prompt injection, data leakage (successor of v2 Scan)
📋 Planned giskard-rag RAG evaluation & synthetic data generation (successor of v2 RAGET)

Giskard Checks — create and apply evals for testing agents

pip install giskard-checks

Giskard Checks is a lightweight library for creating evaluations (evals) that test LLM-based systems — from simple assertions to LLM-as-judge assessments. Unlike traditional unit tests, evals are designed for non-deterministic outputs where the same input can produce different valid responses.
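To see why exact-match assertions break down for LLM output, consider three hypothetical runs of the same prompt that all return a correct answer in different words. The sketch below is plain Python, not the Giskard Checks API; `exact_match` and `mentions` are illustrative names only.

```python
# Hypothetical outputs from three runs of the same prompt
# ("What is the capital of France?"). All three are correct answers.
outputs = [
    "The capital of France is Paris.",
    "Paris.",
    "It's Paris, of course!",
]


def exact_match(output: str, expected: str) -> bool:
    """Unit-test style assertion: passes only on identical text."""
    return output == expected


def mentions(output: str, keyword: str) -> bool:
    """Eval-style check: passes for any phrasing that contains the key fact."""
    return keyword.lower() in output.lower()


exact_results = [exact_match(o, "Paris.") for o in outputs]
tolerant_results = [mentions(o, "Paris") for o in outputs]

print(exact_results)     # [False, True, False]
print(tolerant_results)  # [True, True, True]
```

The exact assertion fails two of three valid answers, while the tolerant check accepts all of them; evals generalize this idea with richer checks.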

Use Giskard Checks to:

  • Catch regressions — verify your system still behaves correctly after changes
  • Validate RAG quality — check if answers are grounded in retrieved context
  • Enforce safety rules — ensure outputs conform to your content policies
  • Evaluate multi-turn agents — test full conversations, not just single exchanges
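Testing a full conversation means driving several turns through the system and checking behavior that only exists across turns. A minimal plain-Python sketch, assuming a toy agent that should remember the user's name; `ToyAgent` and `run_conversation` are hypothetical and not part of Giskard.

```python
class ToyAgent:
    """Hypothetical agent that keeps state between conversation turns."""

    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []
        self.user_name: str | None = None

    def reply(self, message: str) -> str:
        prefix = "my name is "
        lowered = message.lower()
        if lowered.startswith(prefix):
            # Remember the name, keeping the user's original capitalization.
            self.user_name = message[len(prefix):].strip(" .!")
            answer = f"Nice to meet you, {self.user_name}!"
        elif "what is my name" in lowered:
            answer = self.user_name or "I don't know your name yet."
        else:
            answer = "Tell me more."
        self.history.append((message, answer))
        return answer


def run_conversation(agent: ToyAgent, turns: list[str]) -> list[str]:
    """Send each user turn in order and collect the agent's replies."""
    return [agent.reply(turn) for turn in turns]


replies = run_conversation(ToyAgent(), ["My name is Ada.", "What is my name?"])
# Testing the second turn in isolation could not catch a memory bug;
# this check is only meaningful across the whole conversation.
remembers_name = "Ada" in replies[-1]
print(replies)         # ['Nice to meet you, Ada!', 'Ada']
print(remembers_name)  # True
```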

Built-in evals include string matching, comparisons, regex, semantic similarity, and LLM-as-judge checks (Groundedness, Conformity, LLMJudge).
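Similarity-style checks generally score an output against a reference and pass when the score clears a threshold. The sketch below illustrates only that thresholding idea: it swaps in a bag-of-words cosine similarity for the embedding model a real semantic-similarity check would likely use, and `similarity_check` and its `threshold` default are hypothetical, not Giskard's API.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter[str]:
    """Lowercase word counts: a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_check(output: str, reference: str, threshold: float = 0.5) -> bool:
    """Pass when the output is close enough to the reference."""
    return cosine(bag_of_words(output), bag_of_words(reference)) >= threshold


reference = "The capital of France is Paris."
print(similarity_check("Paris is the capital of France.", reference))  # True
print(similarity_check("I like turtles.", reference))                  # False
```

Word overlap misses paraphrases with different vocabulary, which is exactly why production checks rely on embeddings or an LLM judge instead.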

Quickstart

from openai import OpenAI
from giskard.checks import Scenario, Groundedness

client = OpenAI()

def get_answer(inputs: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[{"role": "user", "content": inputs}],
    )
    return response.choices[0].message.content

scenario = (
    Scenario("test_dynamic_output")
    .interact(
        inputs="What is the capital of France?",
        outputs=get_answer,
    )
    .check(
        Groundedness(
            name="answer is grounded",
            context="France is a country in Western Europe. Its capital is Paris.",
        )
    )
)

result = await scenario.run()
result.print_report()

The run() method is async. In a notebook, top-level await works as shown above; in a script, call asyncio.run(scenario.run()) instead. See the full docs for Suites, LLMJudge, multi-turn scenarios, and more.
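For a plain Python script, where top-level `await` is not allowed, the usual pattern is to put the `await` inside an `async def main()` and hand it to `asyncio.run()`. In this sketch `FakeScenario` is a hypothetical stand-in with an async `run()` so it executes without an API key; with Giskard you would build `scenario` as in the Quickstart.

```python
import asyncio


class FakeScenario:
    """Hypothetical stand-in for an object whose run() is a coroutine."""

    async def run(self) -> str:
        await asyncio.sleep(0)  # simulate awaiting a model call
        return "passed"


async def main() -> str:
    scenario = FakeScenario()
    result = await scenario.run()
    return result


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs main() to completion, then closes the loop.
    print(asyncio.run(main()))
```

Note that asyncio.run() raises a RuntimeError if an event loop is already running, which is the case inside Jupyter; there, use top-level await instead.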

Looking for Giskard v2?

Giskard v2 included Scan (automatic vulnerability detection) and RAGET (RAG evaluation test set generation) for both ML models and LLM applications. These features are not yet available in v3.

pip install "giskard[llm]>2,<3"

Scan — automatically detect performance, bias & security issues

Wrap your model and run the scan:

import giskard
import pandas as pd

# Replace my_llm_chain with your actual LLM chain or model inference logic
def model_predict(df: pd.DataFrame):
    """The function takes a DataFrame and must return a list of outputs (one per row)."""
    return [my_llm_chain.run({"query": question}) for question in df["question"]]

giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="My LLM Application",
    description="A question answering assistant",
    feature_names=["question"],
)

scan_results = giskard.scan(giskard_model)
display(scan_results)  # display() is built into Jupyter; in a script, first run: from IPython.display import display


RAGET — generate evaluation datasets for RAG applications

Automatically generate questions, reference answers, and context from your knowledge base:

import pandas as pd
from giskard.rag import generate_testset, KnowledgeBase

# Load your knowledge base documents
df = pd.read_csv("path/to/your/knowledge_base.csv")
knowledge_base = KnowledgeBase.from_pandas(df, columns=["column_1", "column_2"])

testset = generate_testset(
    knowledge_base,
    num_questions=60,
    language="en",
    agent_description="A customer support chatbot for company X",
)


Full v2 docs

👋 Community

We welcome contributions from the AI community! Read this guide to get started, and join our thriving community on Discord.

Follow the progress and share feedback: v3 Announcement · Roadmap

🌟 Leave us a star: it helps the project get discovered by others and keeps us motivated to build awesome open-source tools! 🌟

❤️ If you find our work useful, please consider sponsoring us on GitHub. With a monthly sponsorship, you get a sponsor badge, can display your company in this README, and get your bug reports prioritized. We also offer one-time sponsorship if you want us to get involved in a consulting project, run a workshop, or give a talk at your company.
