# Knowledge Base

Add business context to your properties so the AI gives more accurate, relevant analysis.

The knowledge base lets you store contextual information about a property that the AI references during conversations. Instead of generic analysis, you get answers grounded in your specific business: your conversion definitions, seasonal patterns, data quirks, and channel setups.

## Why It Matters

The AI models know what a bounce rate is. They don't know that your March traffic spike is from an annual report, that organic traffic from India is mostly bots, or that your team reports on custom channel groupings rather than GA4 defaults.

Without this context, the AI gives technically correct but practically useless answers. With it, the AI factors in your business reality when analysing data.

**Without knowledge base:** "Traffic increased 300% in March, suggesting a successful campaign or viral content."

**With knowledge base:** "The March traffic spike aligns with your annual industry report publication, which typically drives 3-5x traffic for 2-3 weeks. Bounce rate is elevated as expected during this period."

---

## Creating Entries

Navigate to any property and click **Knowledge Base** in the sidebar. Click **Add Entry** to create a new entry.

Each entry has:

- **Title** - A descriptive name (e.g., "March Traffic Spike Explanation")
- **Content** - The full context, written in markdown. Include as much detail as you'd explain to a new analyst joining your team.
- **Summary** - A short description. Click **AI Summary** to generate one automatically from your content.
- **Tags** - Optional labels for organisation. Type a tag and press Enter to add it.
- **Status** - Published entries are available to the AI. Draft entries are hidden from conversations.
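
If you find it helpful to think of an entry as a record, the fields above map naturally onto a small data type. The sketch below is a hypothetical Python model, not the product's actual schema: the class names, the `Status` enum, and draft being the default status are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"          # hidden from conversations
    PUBLISHED = "published"  # available to the AI


@dataclass
class KnowledgeEntry:
    """Hypothetical model of one knowledge base entry (not the real schema)."""

    title: str
    content: str                                    # markdown body
    summary: str = ""                               # short description, e.g. from AI Summary
    tags: list[str] = field(default_factory=list)   # optional labels
    status: Status = Status.DRAFT                   # assumed default

    @property
    def visible_to_ai(self) -> bool:
        # Only published entries are available to the AI.
        return self.status is Status.PUBLISHED


entry = KnowledgeEntry(
    title="March Traffic Spike Explanation",
    content="Every March we publish our annual industry report...",
    tags=["traffic", "seasonal"],
)
print(entry.visible_to_ai)  # False while the entry is a draft
entry.status = Status.PUBLISHED
print(entry.visible_to_ai)  # True once published
```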

---

## What to Put in the Knowledge Base

The most useful entries are things the AI can't learn from the data itself:

### Data Quality Issues

- Bot traffic sources that inflate metrics
- Misconfigured events or tracking gaps
- Known data discrepancies between platforms

### Business Definitions

- What your conversion goals actually mean
- Custom channel groupings and how they differ from defaults
- How you define key metrics internally

### Seasonal Patterns

- Annual events that affect traffic (reports, conferences, product launches)
- Holiday patterns specific to your audience
- Budget cycles that affect paid traffic

### Competitor Context

- Who your competitors are and how you differentiate
- Known competitor activity affecting your metrics (e.g., aggressive PPC bidding)
- Market context the AI wouldn't otherwise know

### Technical Context

- Recent migrations or platform changes
- Known tracking limitations
- How your analytics setup differs from standard implementations

---

## How the AI Uses It

When you chat with the AI, it automatically knows which published knowledge base entries exist for your property. Their titles are included in the system prompt, and the AI uses a search tool to retrieve an entry's full content when it's relevant.

You don't need to reference knowledge base entries explicitly. If you ask about traffic patterns and you have an entry about seasonal spikes, the AI will find and use it.
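
The pattern described above (titles in the prompt, full content fetched on demand) can be sketched roughly as follows. This is an illustrative Python sketch under assumptions: the `Entry` type, the prompt wording, and the tool name `search_knowledge_base` are hypothetical and do not reflect the product's actual prompt or tool.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    title: str
    status: str  # "published" or "draft" (hypothetical representation)


def kb_prompt_section(entries: list[Entry]) -> str:
    """Build a hypothetical system-prompt fragment listing published entry titles.

    Only titles go into the prompt; the model is expected to call a search
    tool (the name below is made up) to read an entry's full content.
    """
    titles = [e.title for e in entries if e.status == "published"]
    if not titles:
        return ""  # nothing published, so nothing to mention
    lines = ["Knowledge base entries for this property:"]
    lines += [f"- {t}" for t in titles]
    lines.append("Call search_knowledge_base to read an entry's full details when relevant.")
    return "\n".join(lines)


entries = [
    Entry("March Traffic Spike - Annual Industry Report", "published"),
    Entry("India Organic Traffic is Mostly Bots", "published"),
    Entry("Q3 Budget Notes", "draft"),  # drafts never reach the prompt
]
print(kb_prompt_section(entries))
```

Keeping only titles in the prompt keeps it short no matter how long individual entries are; the trade-off is that the AI has to decide, from the title alone, when an entry is worth fetching.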

### Search Modes

Entries are searched in one of two ways:

- **Semantic search** - If your team has an OpenAI provider configured, published entries are embedded as vectors and matched by meaning. A question about "a sudden jump in visitors" can match an entry titled "March Traffic Spike" even though they share no keywords.
- **Keyword search** - Without an OpenAI provider, or for entries that don't have an embedding yet, entries are matched by words in the title, content, summary, and tags. This works well but needs closer word overlap between your question and the entry.

The search mode is shown as a small indicator on each entry in the list - a green dot for semantic search, grey for keyword search.
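
To make the keyword mode concrete, here is a minimal Python sketch of word-overlap matching across the four fields it searches. The scoring (number of distinct shared lowercase words) and the `keyword_search` name are assumptions for illustration; the product may tokenize and rank differently. The second query shows the limitation noted above: a question that shares no words with an entry doesn't match it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    title: str
    content: str
    summary: str = ""
    tags: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric runs; punctuation and hyphens act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, entries: list[Entry], limit: int = 3) -> list[Entry]:
    """Rank entries by how many distinct query words appear in any searched field."""
    query_words = tokenize(query)
    scored = []
    for entry in entries:
        fields = " ".join([entry.title, entry.content, entry.summary, *entry.tags])
        score = len(query_words & tokenize(fields))
        if score:
            scored.append((score, entry))
    # Sort on score only, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:limit]]


entries = [
    Entry("March Traffic Spike", "Annual report drives 3-5x traffic.", tags=["seasonal"]),
    Entry("Custom Channel Groupings", "Partner, Paid Social, Community."),
]
print([e.title for e in keyword_search("seasonal traffic", entries)])   # ['March Traffic Spike']
print([e.title for e in keyword_search("odd visitor surge", entries)])  # [] (no shared words)
```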

---

## Embeddings

When you publish an entry and your team has an OpenAI provider configured, an embedding is automatically generated in the background. Embeddings enable semantic search - matching by meaning rather than exact keywords.

Embeddings are regenerated automatically when you update an entry's title, content, or summary. Tag-only edits keep the existing embedding.
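
The regeneration rule above boils down to a field comparison. A minimal Python sketch, assuming a hypothetical immutable `Entry` type and a hypothetical `needs_reembedding` helper (neither is part of the product):

```python
from dataclasses import dataclass, replace

# Fields whose changes trigger a fresh embedding, per the rule above.
EMBEDDED_FIELDS = ("title", "content", "summary")


@dataclass(frozen=True)
class Entry:
    title: str
    content: str
    summary: str = ""
    tags: tuple[str, ...] = ()


def needs_reembedding(old: Entry, new: Entry) -> bool:
    """True if any embedded field changed; tag-only edits return False."""
    return any(getattr(old, name) != getattr(new, name) for name in EMBEDDED_FIELDS)


original = Entry("March Traffic Spike", "Annual report drives traffic.", tags=("seasonal",))
retagged = replace(original, tags=("seasonal", "reports"))
resummarised = replace(original, summary="Annual report causes a 3-5x spike.")

print(needs_reembedding(original, retagged))      # False: tags only
print(needs_reembedding(original, resummarised))  # True: summary changed
```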

If you add an OpenAI provider after creating entries, a **Generate All Embeddings** button appears in the header. Click it to backfill embeddings for all published entries.

Entries without embeddings still work - they fall back to keyword search. A single property can mix both: entries with embeddings are found by semantic search, the rest by keyword search.
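
How a property with a mix of embedded and non-embedded entries might be searched can be sketched in Python as below. This is an illustration under assumptions, not the product's algorithm: the per-entry routing, the cosine-similarity threshold of 0.5, and the toy 3-dimensional vectors (real embeddings come from the OpenAI embeddings API and are much longer) are all hypothetical.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    title: str
    content: str
    embedding: Optional[list[float]] = None  # None until an embedding is generated


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, query_vec: Optional[list[float]], entries: list[Entry],
           threshold: float = 0.5) -> list[str]:
    """Return titles of matching entries, choosing a search mode per entry.

    Entries with an embedding are compared to the query vector by cosine
    similarity; entries without one (or when no query vector is available)
    fall back to keyword overlap on title and content.
    """
    q_words = words(query)
    hits = []
    for e in entries:
        if e.embedding is not None and query_vec is not None:
            if cosine(query_vec, e.embedding) >= threshold:
                hits.append(e.title)
        elif q_words & words(f"{e.title} {e.content}"):
            hits.append(e.title)
    return hits


# Toy 3-dimensional vectors stand in for real embeddings.
entries = [
    Entry("March Spike", "Annual report drives visits.", embedding=[0.9, 0.1, 0.0]),
    Entry("India Organic Traffic is Mostly Bots", "Bot sessions from India."),
    Entry("Custom Channel Groupings", "Partner and community channels."),
]
print(search("unusual traffic patterns", [1.0, 0.0, 0.0], entries))
# ['March Spike', 'India Organic Traffic is Mostly Bots']
```

"March Spike" shares no words with the query, so only the semantic path finds it; without a query vector it drops out and only the keyword match remains.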

---

## Tips

- **Write for a new team member.** If a new analyst joined your team, what would you need to explain before they could interpret your data correctly? That's what belongs in the knowledge base.
- **Keep entries focused.** One topic per entry works better than a single long document. The AI can search and retrieve individual entries, so specificity helps.
- **Use the AI Summary button.** It generates concise summaries that help the AI quickly assess relevance before reading the full content.
- **Publish when ready.** Use draft status to work on entries without affecting conversations. Only published entries are visible to the AI.

---

## Examples

Here are some example entries to help you get started. To use one, copy its title and tags into the **Title** and **Tags** fields, and paste the remaining text into **Content**.

### Seasonal Traffic Pattern

```markdown
**Title:** March Traffic Spike - Annual Industry Report

Every March we publish our annual industry report which drives a
significant traffic spike (typically 3-5x normal levels). This lasts
approximately 2-3 weeks.

Key points:
- Peak usually hits in the second week of March
- Most traffic comes from organic search and social media shares
- Bounce rate during this period is higher than normal (65-70%)
- Conversion rate drops but absolute conversions increase
- Do not compare March metrics to other months without accounting for this

**Tags:** traffic, seasonal, reports
```

### Data Quality Issue

```markdown
**Title:** India Organic Traffic is Mostly Bots

Approximately 80% of organic traffic from India is bot traffic.
This has been consistent since mid-2024.

Characteristics:
- Sessions typically last under 5 seconds
- 95%+ bounce rate from Indian organic traffic
- Primarily hits the homepage and /api/ documentation pages
- Does not convert

When analysing organic traffic, either exclude India entirely or
segment it separately. Total organic traffic numbers are inflated
by roughly 15-20% due to this.

**Tags:** bots, traffic, data-quality
```

### Custom Channel Definitions

```markdown
**Title:** Custom Channel Groupings

We use custom channel groupings that differ from GA4 defaults:

- Partner Traffic = All traffic from affiliate programme partners
  (utm_medium=partner)
- Paid Social = Facebook Ads + LinkedIn Ads
  (utm_medium=paid-social)
- Community = Traffic from our forum, Discord server, and
  community Slack (utm_source containing "community", "discord",
  or "forum")
- Email - Newsletter = Monthly newsletter
  (utm_campaign starting with "newsletter-")
- Email - Automated = Drip campaigns and transactional
  (utm_campaign starting with "auto-" or "drip-")

When reporting on channels, use these groupings rather than
GA4 defaults for accurate attribution.

**Tags:** channels, attribution, ga4
```
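
If you want to sanity-check how the channel rules in this entry would classify your UTM values, they translate into a few conditionals. This Python sketch is hypothetical: the `classify_channel` name is made up, and it assumes rules are checked top to bottom with the first match winning (the entry doesn't define precedence), that matching is case-sensitive, and that `None` means no custom rule applies.

```python
from typing import Optional


def classify_channel(source: str = "", medium: str = "", campaign: str = "") -> Optional[str]:
    """Map UTM values to the custom groupings in the example entry.

    Assumptions: rules are checked top to bottom and the first match wins,
    matching is case-sensitive, and None means no custom rule applies.
    """
    if medium == "partner":
        return "Partner Traffic"
    if medium == "paid-social":
        return "Paid Social"
    if any(word in source for word in ("community", "discord", "forum")):
        return "Community"
    if campaign.startswith("newsletter-"):
        return "Email - Newsletter"
    if campaign.startswith(("auto-", "drip-")):
        return "Email - Automated"
    return None


print(classify_channel(source="discord", medium="social"))           # Community
print(classify_channel(medium="email", campaign="drip-onboarding"))  # Email - Automated
print(classify_channel(source="google", medium="organic"))           # None
```

Writing the precedence down explicitly is also worth doing in the entry itself: a `utm_medium=partner` link posted in your forum matches two rules, and the AI can only apply the one you intend if the entry says which wins.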

### Competitor Context

```markdown
**Title:** Main Competitors and Market Position

Our three main competitors:

1. CompetitorA (enterprise-focused) - We win on pricing and
   ease of use. They dominate branded search for enterprise terms.
2. CompetitorB (similar size) - We compete directly on SEO for
   most non-branded terms. They recently launched a free tier.
3. CompetitorC (newer, VC-funded) - Aggressive on paid ads.
   Their PPC spend has been increasing our CPC on branded terms
   by roughly 20% since January.

When analysing search performance, ranking fluctuations for
commercial terms often correlate with competitor activity rather
than our own changes.

**Tags:** competitors, market, seo
```

---

## FAQs

**Does the knowledge base use my AI tokens?**
Searching the knowledge base is done locally and doesn't cost tokens. The AI Summary feature uses a small number of tokens (it uses the cheapest available model). Embedding generation uses the OpenAI embeddings API at $0.02 per million tokens, so a 500-token entry costs about $0.00001 (a thousandth of a cent) to embed, and shorter entries cost proportionally less.
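
The embedding cost is simple proportional arithmetic. A small Python sketch using the rate quoted above; the token counts in the loop are illustrative assumptions, not measurements of real entries:

```python
# Rate quoted in the answer above, in US dollars per million tokens.
PRICE_PER_MILLION_TOKENS_USD = 0.02


def embedding_cost_cents(tokens: int) -> float:
    """Cost in US cents to embed `tokens` tokens at the quoted rate."""
    dollars = tokens * PRICE_PER_MILLION_TOKENS_USD / 1_000_000
    return dollars * 100


# Token counts are illustrative, not measured from real entries.
for tokens in (250, 500, 2_000):
    print(f"{tokens:>5} tokens: {embedding_cost_cents(tokens):.4f} cents")
```

At this rate, a full million tokens of entry text costs 2 cents to embed.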

**Can team members see each other's entries?**
Yes, the knowledge base is shared across the property. Anyone with read access can view entries, and anyone with write access can create and edit them.

**How many entries can I create?**
There's no hard limit. The knowledge base is designed for focused, contextual entries - typically 5-50 per property. If you find yourself writing hundreds, consider whether some information would be better suited as annotations (for time-specific events) or system prompts (for general AI behaviour).

**Do draft entries affect conversations?**
No. Only published entries are included in the AI's context and search results.
