BACO - Bug Analysis & Cross-reference Orchestrator

A CLI-based security vulnerability scanner that combines static analysis, LLM-powered discovery, and ticket system cross-referencing.
Example Report generated with Regolo.AI models on ins1gn1a/VulnServer-Linux.

Features

Multi-phase scanning: 13+ phases including Indexing → Semgrep → LLM Static Analysis → LLM Discovery → LLM Verification → SecurityAgent Verification → Ticket Cross-Ref → Git Analysis → Cross-File Analysis → Confidence Scoring → AI Aggregation → Reporting → Advanced V3 features (Threat Modeling, CVE Bootstrap, PoC Compilation, Variant Search)
Parallel execution: Semgrep and LLM discovery run concurrently; verification, ticket cross-ref, and git analysis run in parallel
Checkpoint/resume: Automatically saves state after each phase for crash recovery
Multiple output formats: JSON, HTML, SARIF
Config-driven: TOML configuration with environment variable overrides
Prompt customization: Override default LLM prompts per phase via config
Ticket integration: GitHub, GitLab, Bugzilla, Jira support
Cross-file analysis: Traces data flow between files to identify exploitable chains
Composite confidence scoring: Combines multiple signals into a single reliability score

Architecture

Pipeline Phases

Core Pipeline (12 phases):

1. Indexing: Build file list and call graph
2. Semgrep: Static analysis with predefined rules
3. LLM Static Analysis: Independent LLM-based code analysis (uses discovery config)
4. LLM Discovery: Multi-model vulnerability detection (all configured models analyze each finding)
5. LLM Verification: Validation with PoC generation and mitigation code
6. SecurityAgent Verification: Tool-based agent verification using file_read, pattern_search, file_write, run_test to confirm true positives
7. Ticket Cross-Ref: Search GitHub/GitLab for existing reports
8. Git Analysis: Check commit history for related fixes
9. Cross-File Analysis: Trace data flow between files
10. Confidence Scoring: Calculate composite reliability score
11. AI Aggregation: Generate executive summary, semantic deduplication, and LLM-enriched descriptions
12. Reporting: Generate JSON, HTML, and SARIF outputs

Advanced V3 Features:

Threat Modeling: Generate THREAT_MODEL.md with attack surface analysis
Root Cause Dedup: Deduplicate findings by root cause instead of location
Multi-Verifier: Multiple verification methods with majority voting
Auto-Patching: Generate and validate patches with staging
CVE Bootstrap: Enrich findings with NVD/CISA KEV data
PoC Compiler: Verify PoC code compiles successfully
Variant Search: Search for related vulnerability variants
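
The Multi-Verifier entry above only says that several verification methods are combined by majority voting. As a rough illustration, here is a minimal Python sketch of that idea. It assumes each verifier returns a boolean verdict; the function name and the rule that ties count as "not confirmed" are illustrative assumptions, not BACO's actual implementation.

```python
from collections import Counter

def majority_verdict(verdicts):
    """Return True if a strict majority of verifiers confirmed the finding.

    `verdicts` is a list of booleans, one per verification method.
    Ties and empty input count as "not confirmed" (an assumption).
    """
    counts = Counter(verdicts)
    return counts[True] > len(verdicts) / 2

# Hypothetical verdicts from three verification methods.
confirmed = majority_verdict([True, True, False])
print(confirmed)
```

A strict majority keeps a finding from being confirmed by a single optimistic verifier when the others disagree.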

Data Flow

Config → Indexing → [Semgrep + LLM Static Analysis + LLM Discovery] → [LLM Verification + SecurityAgent Verification + Tickets + Git + Confidence] → Cross-File → AI Aggregation → Reporting → [Threat Modeling, CVE, PoC, Variants] → JSON/HTML/SARIF Output
                         ↑ Checkpoint after each major stage
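
To make the bracketed groups in the diagram concrete, the following Python sketch runs the phases of one stage concurrently and writes a checkpoint after each stage. It is only an illustration of the pattern: the stage contents, the lambda phases, and the JSON checkpoint layout are all hypothetical, and BACO's own scheduler may work differently.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_stage(phases, state):
    """Run every phase of one stage concurrently on a snapshot of the state."""
    snapshot = dict(state)  # phases in the same stage do not see each other's output
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, snapshot) for name, fn in phases.items()}
        return {name: fut.result() for name, fut in futures.items()}

def run_pipeline(stages, checkpoint_path):
    """Run stages in order, persisting the accumulated state after each one."""
    state = {}
    for stage in stages:
        state.update(run_stage(stage, state))
        checkpoint_path.write_text(json.dumps(state))  # hypothetical checkpoint format
    return state

# Toy stand-ins for real phases: each takes the state and returns its result.
stages = [
    {"indexing": lambda s: ["a.c", "b.c"]},
    {
        "semgrep": lambda s: len(s["indexing"]),
        "llm_discovery": lambda s: len(s["indexing"]) * 2,
    },
    {"reporting": lambda s: s["semgrep"] + s["llm_discovery"]},
]

checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"
final_state = run_pipeline(stages, checkpoint)
print(final_state["reporting"])
```

Writing the checkpoint only between stages means a crash inside a parallel group repeats that whole group on the next run, which keeps the saved state simple.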

Installation

cargo build --release
./target/release/baco --version

Usage

1. Create Configuration

cp config.example.toml myproject.toml

Edit myproject.toml:

Set project.path to the target directory
Configure LLM API keys (or use environment variables)
Set up ticket system credentials if needed

2. Run Scan

baco scan --config myproject.toml

Options:

-c, --config <FILE> - Configuration file (required)
-t, --target <PATH> - Override target path from config
-f, --force - Force fresh scan, ignore existing checkpoint

Resume previous scan:

baco scan --config myproject.toml

Running the same command again resumes from the last saved checkpoint. Use --force to start fresh and ignore the checkpoint file.
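
The interaction between the checkpoint file and --force can be pictured with a small Python sketch. It assumes that a plain scan picks up an existing checkpoint and that --force discards it. The phase list is shortened, and the checkpoint layout (a JSON object with a "completed" array) is a hypothetical stand-in for whatever BACO actually stores.

```python
import json
import tempfile
from pathlib import Path

# Shortened, illustrative phase list.
PHASES = ["indexing", "semgrep", "llm_discovery", "reporting"]

def phases_to_run(checkpoint_path, force=False):
    """Return the phases still to run, honouring --force semantics."""
    if force or not checkpoint_path.exists():
        return list(PHASES)  # fresh scan: run everything
    completed = set(json.loads(checkpoint_path.read_text())["completed"])
    return [p for p in PHASES if p not in completed]

# Simulate a scan that crashed after the first two phases.
cp = Path(tempfile.mkdtemp()) / "checkpoint.json"
cp.write_text(json.dumps({"completed": ["indexing", "semgrep"]}))

resumed = phases_to_run(cp)              # resume: skip completed phases
fresh = phases_to_run(cp, force=True)    # --force: ignore the checkpoint
print(resumed, fresh)
```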

3. View Results

Open the HTML report in a browser:

output/report.html

Configuration

Project Settings

[project]
name = "my-project"
path = "/path/to/target"
languages = ["c", "cpp", "python"]

LLM Configuration

BACO supports single or multiple models per phase. When multiple models are configured, they are used in round-robin fashion to distribute load across different models/providers.

Detailed error logging: When LLM requests fail, BACO reports the HTTP status code, error type (timeout, connection, request, body, decode), and the actual URL for easier debugging.

Single model:

[llm.phases.discovery]
base_url = "https://api.mistral.ai/v1"
api_key = "${MISTRAL_API_KEY}"  # or set env var
model = "mistral-small"
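
The "${MISTRAL_API_KEY}" placeholder above is resolved from the environment. A minimal Python sketch of that kind of expansion follows. The function name and the choice to raise an error on an unset variable are assumptions for illustration; BACO's own behavior for missing variables is not documented here.

```python
import os
import re

# Matches ${NAME} where NAME is a typical environment variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(value, env=None):
    """Replace ${NAME} placeholders in a config string with environment values."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly instead of sending an empty API key.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, value)

# A dict stands in for the real environment so the example is reproducible.
api_key = expand_env("${MISTRAL_API_KEY}", {"MISTRAL_API_KEY": "sk-test"})
print(api_key)
```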

Multiple models:

[llm.phases.discovery]
base_url = "https://api.mistral.ai/v1"
api_key = "${MISTRAL_API_KEY}"
# 'models' takes precedence over 'model' if both are present
models = ["mistral-small", "mistral-medium", "codestral-latest"]

[llm.phases.verification]
base_url = "https://api.qwen.ai/v1"
api_key = "${QWEN_API_KEY}"
model = "qwen35"  # single model

[llm.phases.aggregation]
base_url = "https://api.openai.com/v1"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o", "gpt-4o-mini"]  # multiple models for distributed load

Note: If both model and models are set for a phase, models takes precedence and its entries are used in round-robin order.
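
The precedence rule and round-robin selection can be sketched in a few lines of Python. This is an illustration under stated assumptions: the phase configuration is modeled as a plain dict, and the helper names are hypothetical rather than taken from BACO.

```python
from itertools import cycle

def resolve_models(phase_cfg):
    """Return the model list for a phase; `models` wins over `model`."""
    if phase_cfg.get("models"):
        return list(phase_cfg["models"])
    return [phase_cfg["model"]]

def round_robin(phase_cfg):
    """Yield models endlessly, one per request, in configured order."""
    return cycle(resolve_models(phase_cfg))

# Both keys present: the `models` array takes precedence.
cfg = {
    "model": "mistral-small",
    "models": ["mistral-small", "mistral-medium", "codestral-latest"],
}
picker = round_robin(cfg)
picked = [next(picker) for _ in range(4)]
print(picked)
```

After the last configured model the selection wraps around to the first, so four requests against three models reuse the first model once.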

Agent Mode

BACO has two distinct agent modes:

1. Discovery Agent (`agent.enabled = true`)

When enabled, the LLM Discovery phase reads source files directly before analyzing findings:

[agent]
enabled = true
max_turns = 10           # Max conversation turns with tools
tool_timeout_secs = 60   # Timeout for tool execution
keep_artifacts = false   # Keep generated test files

Benefits:

LLM reads actual source code before enriching findings
Uses tools (file_read, pattern_search) for deeper analysis
Provides more accurate vulnerability descriptions with context
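
A tool-using agent of this kind usually boils down to a dispatch table and a turn limit. The Python sketch below shows that shape with simplified versions of file_read and pattern_search. The tool signatures, the project-root check, and the scripted request list (standing in for requests an LLM would make) are all assumptions for illustration, not BACO's actual tool API.

```python
import re
import tempfile
from pathlib import Path

def file_read(root, path):
    """Read a file, refusing paths that escape the project root."""
    target = (root / path).resolve()
    if not target.is_relative_to(root.resolve()):  # requires Python 3.9+
        raise ValueError("path escapes project root")
    return target.read_text()

def pattern_search(root, pattern):
    """Return (relative path, line number) for each line matching the regex."""
    regex = re.compile(pattern)
    hits = []
    for f in sorted(root.rglob("*")):
        if f.is_file():
            lines = f.read_text(errors="ignore").splitlines()
            for number, line in enumerate(lines, 1):
                if regex.search(line):
                    hits.append((f.relative_to(root).as_posix(), number))
    return hits

TOOLS = {"file_read": file_read, "pattern_search": pattern_search}

def run_agent(root, requests, max_turns=10):
    """Execute scripted tool requests in order, stopping at max_turns."""
    return [TOOLS[name](root, arg) for name, arg in requests[:max_turns]]

# A throwaway project with one suspicious file.
root = Path(tempfile.mkdtemp())
(root / "vuln.c").write_text("char buf[8];\nstrcpy(buf, input);\n")

results = run_agent(root, [("pattern_search", r"strcpy\("), ("file_read", "vuln.c")])
print(results[0])
```

In this sketch max_turns caps the number of tool calls, so a model that keeps requesting tools cannot loop indefinitely.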

2. SecurityAgent Verification (Phase 6)

A separate verification phase that uses an embedded security agent with tools to prove or disprove findings:

file_read: Examine vulnerable code in context
pattern_search: Look for related vulnerability patterns
file_write: Create proof-of-concept test cases
run_test: Execute tests to verify exploitability

The agent automatically removes false positives when tests pass, reducing noise in the final report. This phase runs after LLM Verification and before Ticket Cross-Reference.

Prompt Customization

BACO loads a prompt template for each phase from markdown files at runtime. You can override any of them via configuration.

Default prompts are stored in prompts/phases/ as markdown files:

prompts/phases/indexing.md
prompts/phases/semgrep.md
prompts/phases/llm_static_analysis.md
prompts/phases/llm_discovery.md
prompts/phases/llm_verification.md
prompts/phases/ticket_crossref.md
prompts/phases/git_analysis.md
prompts/phases/cross_file_analysis.md
prompts/phases/confidence_scoring.md
prompts/phases/ai_aggregation.md
prompts/phases/reporting.md

View the full prompt templates on GitHub to understand default behavior.

Inline override in config.toml:

[llm.phases.prompt_overrides.phases]
llm_static_analysis = """Analyze this %%LANGUAGE%% code for security vulnerabilities.
Focus on: memory safety, injection risks, and insecure API usage.

File: %%FILE_PATH%%
Code:
%%CODE_CONTENT%%
"""

llm_discovery = """Given this finding, determine if it's a true vulnerability:
Title: %%FINDING_TITLE%%
Location: %%FILE_PATH%%:%%LINE_RANGE%%
Description: %%VULNERABILITY_DESCRIPTION%%
"""

Available template variables:

%%PROJECT_PATH%% - Target project path
%%FILE_EXTENSIONS%% - Detected file extensions
%%LANGUAGES%% - Target languages
%%CODE_CONTENT%% - Code snippet being analyzed
%%LANGUAGE%% - Programming language of the file
%%FILE_PATH%% - File path
%%LINE_RANGE%% - Line numbers
%%FINDING_TITLE%% - Vulnerability title
%%VULNERABILITY_DESCRIPTION%% - Description text
%%FINDINGS_COUNT%% - Total findings count
%%SCAN_DATE%% - Scan date
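
Substituting these %%NAME%% variables into a template is a simple string replacement. Here is a minimal Python sketch, assuming that unknown placeholders are left untouched; how BACO itself treats unknown or missing variables is not stated, and the function name is hypothetical.

```python
import re

# Matches %%NAME%% where NAME is upper-case letters and underscores.
_VAR = re.compile(r"%%([A-Z_]+)%%")

def render_prompt(template, variables):
    """Substitute %%NAME%% placeholders; unknown names are left as-is."""
    return _VAR.sub(lambda m: str(variables.get(m.group(1), m.group(0))), template)

template = (
    "Analyze this %%LANGUAGE%% code.\n"
    "File: %%FILE_PATH%%\n"
    "Code:\n"
    "%%CODE_CONTENT%%"
)
prompt = render_prompt(
    template,
    {
        "LANGUAGE": "c",
        "FILE_PATH": "src/main.c",
        "CODE_CONTENT": "int main(void) { return 0; }",
    },
)
print(prompt)
```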

From external file:

# In config.toml
prompt_overrides = "prompts.toml"

Create prompts.toml:

[phases]
llm_static_analysis = "Your custom prompt here..."
llm_verification = "Your verification prompt..."

Prompts are validated (max 10,000 characters, no null bytes) before use.
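
The two validation rules are easy to check before a custom prompt is shipped. A minimal Python sketch, assuming the limit is inclusive (exactly 10,000 characters is accepted) and counts characters rather than bytes; the function and constant names are hypothetical.

```python
MAX_PROMPT_CHARS = 10_000  # limit stated in the documentation

def validate_prompt(prompt):
    """Raise ValueError if the prompt breaks either documented rule."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt has {len(prompt)} characters, limit is {MAX_PROMPT_CHARS}"
        )
    if "\x00" in prompt:
        raise ValueError("prompt contains a null byte")
    return prompt

checked = validate_prompt("Analyze this %%LANGUAGE%% code for vulnerabilities.")
print(len(checked))
```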

Ticket Systems

[[tickets.systems]]
type = "github"
url = "https://api.github.com"
credentials.token = "${GITHUB_TOKEN}"

Output Formats

findings.json: Complete vulnerability data with all 16 fields
report.html: Visual report with severity colors, code snippets, AI summary
report.sarif: SARIF format for CI/CD integration
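
For readers unfamiliar with SARIF, the sketch below builds a minimal SARIF 2.1.0 log in Python from a list of findings. The input field names (rule_id, level, title, file, line) are hypothetical and do not reflect the actual schema of findings.json; only the SARIF keys themselves follow the standard.

```python
import json

def to_sarif(findings, tool_name="BACO"):
    """Build a minimal SARIF 2.1.0 log from simplified finding dicts."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": tool_name}},
                "results": [
                    {
                        "ruleId": f["rule_id"],
                        "level": f["level"],  # "error", "warning", or "note"
                        "message": {"text": f["title"]},
                        "locations": [
                            {
                                "physicalLocation": {
                                    "artifactLocation": {"uri": f["file"]},
                                    "region": {"startLine": f["line"]},
                                }
                            }
                        ],
                    }
                    for f in findings
                ],
            }
        ],
    }

# One illustrative finding with hypothetical field names.
sarif = to_sarif(
    [
        {
            "rule_id": "buffer-overflow",
            "level": "error",
            "title": "Unbounded strcpy into fixed-size buffer",
            "file": "src/server.c",
            "line": 42,
        }
    ]
)
print(json.dumps(sarif, indent=2))
```

CI systems that ingest SARIF typically key on ruleId, level, and the physical location, so those are the fields worth checking first when a report fails to display.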
