---
title: "Rolling Out AI Code Search"
subtitle: "Adoption, Measurement, and the 90-Day Plan"
author: "David Kelly Price"
version: "1.0"
date: 2026-03-21
status: draft
type: ebook
target_audience: "Engineering managers and team leads who have decided to adopt AI-powered code search and need the implementation playbook — adoption framework, measurement strategy, and operational setup"
estimated_pages: 60
chapters:
  - "The Champion-Pilot-Rollout Adoption Framework"
  - "Measuring What Matters"
  - "Monorepo and Multi-Team Considerations"
  - "CI/CD Integration and Operational Overhead"
  - "The 90-Day Implementation Plan"
tags:
  - pyckle
  - ebook
  - engineering-management
  - code-intelligence
  - adoption
  - implementation
  - rollout
  - draft
---

<!-- DESIGN & LAYOUT NOTES

Target formats:
- Primary: Markdown (source of truth)
- Export: PDF via Pandoc, web page
- Print-ready: Letter size, 1" margins

Typography:
- Headers: Sans-serif (brand-consistent)
- Body: Serif or clean sans-serif for readability
- Code: Monospace, syntax highlighted, line-numbered where helpful

Color scheme:
- Pyckle brand palette
- Callout boxes use muted background tints, not heavy borders

Callout box types:
- **Try This** — Exercises and hands-on activities
- **Key Insight** — Important concepts worth remembering
- **Warning** — Common mistakes or gotchas

Code blocks:
- Syntax highlighted by language
- Numbered lines for reference in explanatory text
- Copy-pasteable (no line numbers in actual code)

Figures:
- Captioned and numbered (Figure 1, Figure 2, etc.)
- Referenced by number in body text
-->

---

# Rolling Out AI Code Search

## Adoption, Measurement, and the 90-Day Plan

**By David Kelly Price**

Version 1.0 — March 2026

---

## Table of Contents

**Part I: Adoption**

1. The Champion-Pilot-Rollout Adoption Framework
2. Measuring What Matters

**Part II: Operations**

3. Monorepo and Multi-Team Considerations
4. CI/CD Integration and Operational Overhead
5. The 90-Day Implementation Plan

- Appendix A: Glossary
- Appendix B: Tools & Resources
- Appendix C: Further Reading

---

## About This Guide

You have already decided to adopt AI code search. Maybe you ran a proof-of-concept and saw the results. Maybe you read the business case and the numbers convinced you. Maybe your team is drowning in a growing codebase and grep stopped being enough two quarters ago. Whatever brought you here, the question is no longer "should we?" — it is "how do we do this without it dying on the vine?"

This guide is the implementation playbook. It covers how to move from one developer's enthusiasm to team-wide adoption, how to measure whether the tool is actually working, how to handle the operational realities of monorepos and CI/CD integration, and how to execute a 90-day plan that produces measurable results you can present to leadership.

Every chapter produces artifacts — timelines, KPI dashboards, rollout plans, configuration files — that you can use immediately. This is not theory. It is the operational manual.

If you are still evaluating whether AI code search is right for your team, start with the companion book, *The Code Intelligence Buyer's Guide*, which covers the landscape, evaluation frameworks, economics, and security considerations. This guide picks up where that one leaves off.

---

## How to Use This Guide

**Reading order:** Sequential is recommended. The five chapters build on each other — the adoption framework (Chapter 1) produces the team that the measurement chapter (Chapter 2) instruments, which feeds the operational chapters (Chapters 3-4) that the 90-day plan (Chapter 5) ties together. If you are already mid-rollout and need to fix a specific problem, each chapter stands alone well enough to be useful in isolation.

**Exercises:** Each chapter includes a concrete exercise designed to produce a tangible artifact — an adoption plan, a measurement baseline, a service map, a CI/CD workflow, or a project timeline. These are not thought experiments. They take 15-30 minutes and produce outputs you can share with your team, your leadership, or your vendor. By the end of the book, you will have a complete implementation package.

**Prerequisites:** You should have a working understanding of your team's codebase, CI/CD setup, and organizational structure. Familiarity with semantic code search concepts (embeddings, hybrid retrieval, context routing) is helpful but not required — the glossary covers the technical terms. If you need a deeper technical grounding, *The Code Intelligence Buyer's Guide* provides it.

---

# Part I: Adoption

---

## Chapter 1: The Champion-Pilot-Rollout Adoption Framework

### Chapter Overview

Most developer tools die at the individual level. Someone discovers a tool, uses it for two weeks, and nobody else on the team ever touches it. This chapter provides a three-phase adoption framework — Champion, Pilot, Rollout — that turns individual enthusiasm into team-wide adoption. The framework is expanded from the approach outlined in Episode 20 of the Code Intelligence series.

---

### Why Developer Tools Fail to Spread

The adoption failure rate for developer tools is remarkably high. A 2023 survey by SlashData found that the average developer tries four to six new tools per year and adopts one or two. The rest are used briefly and abandoned — not because they do not work, but because the habit change required to integrate them into daily workflow is more expensive than the value they provide in the first two weeks.

The failure pattern is consistent:

1. An enthusiastic developer discovers the tool
2. They use it successfully for their own workflow
3. They tell their teammates about it
4. One or two teammates try it
5. Nobody builds the habit
6. The enthusiastic developer eventually stops using it too, because they are the only one and team workflows have not changed

The problem is not the tool. The problem is that step 3 — telling people — is not an adoption strategy. It is evangelism, and evangelism does not produce habit change.

The Champion-Pilot-Rollout framework replaces evangelism with a structured process that addresses the three real barriers to adoption: proof (does it work?), habit (how do I use it?), and infrastructure (does it stay current?).

---

### Phase 1: Champion (Weeks 1-2)

The champion is the person who has used the tool long enough to have real results. Not "I installed it and it seems cool" — real results. Specific instances where the tool found something that grep could not. A bug caught during code review because semantic search surfaced a cross-cutting concern. A new developer onboarded faster because they could search by concept instead of by filename.

If you are reading this guide, you are probably the champion. Or you are looking for one on your team.

The champion's job is not to evangelize. It is to produce artifacts:

**Artifact 1: The demo query.** A search query that produces a result so clearly superior to grep that it requires no explanation. The ideal demo query searches for a concept ("how do we validate payments") and returns the exact code that implements it — code whose filename and function name share zero words with the query. The vocabulary gap, made visible.

**Artifact 2: The saved patterns file.** A configuration file containing pre-built search patterns tuned to your codebase. "Auth flow." "Payment validation." "Error handling." "Database migrations." These are the queries your team runs repeatedly. Having them pre-configured means the pilot group starts productive immediately instead of staring at a search prompt wondering what to type.

**Artifact 3: The before-and-after.** A concrete example — ideally from a real pull request — where the tool found something that would have been missed otherwise. "I searched for 'session expiration handling' and found that we have three different timeout implementations across two services, two of which conflict." This is the proof that converts skeptics.

> **Warning**
>
> Do not skip the champion phase. If you cannot produce these three artifacts, the tool is not ready for a pilot. Either it does not work well enough on your codebase, or you have not used it long enough to find the compelling use cases. Going to pilot without proof results in a group of people trying a tool that has no demonstrated value, which is worse than never trying it at all.

---

### Phase 2: Pilot (Weeks 3-4)

The pilot group is three to five developers. The composition matters:

- **One senior developer who is skeptical.** If the tool wins over the skeptic, it wins over the team. If it does not, you learn why before you invest in a full rollout.
- **One mid-level developer who is curious.** The early adopter who will use it enthusiastically and provide honest feedback about what works and what does not.
- **One junior developer who is struggling with the codebase.** The person with the most to gain from conceptual search, because they do not yet have a mental model of the codebase.

Setup should take less than five minutes per person. If it takes longer, the tool has an onboarding problem that will kill adoption at scale.

The pilot instruction is one sentence: "When you would normally grep, try the semantic search instead. If it is worse, tell me."

That is it. No training session. No documentation to read. No meeting. Just a substitution pattern and the saved patterns file from Phase 1.

The pilot runs for two weeks. During those two weeks, the champion:

- Checks in once at the end of week one ("Have you used it? What happened?")
- Does not ask more than once — nobody likes being nagged about a tool
- Notes any objections or friction points for the debrief

---

### Handling the Four Objections

Every pilot surfaces objections. These are healthy — they mean people are actually trying the tool. Here are the four you will hear and the responses that work:

**"I already know the codebase."** Response: "You know the parts you work on. Search for something outside your area — payment flows if you work on auth, auth flows if you work on payments. See what surfaces." Senior developers benefit most from semantic search on the parts of the codebase they do not touch daily.

**"grep is fine."** Response: "It is, for exact string matches. Try searching for 'authenticate' with grep and then with semantic search. grep will miss `JWTVerifier.validate_claims`. Semantic search will find it." The vocabulary gap is not theoretical — demonstrate it on your codebase.

**"I don't want to learn another tool."** Response: "The interface is one command. You type a question instead of a regex. Setup is one command. Try it for three days and decide." The learning curve for semantic search is genuinely flat.

**"Is my code being sent to the cloud?"** Response: This depends on the tool you chose. For local-first tools: "No. Everything runs on your machine. Nothing leaves unless you explicitly sync." For cloud tools: "Yes, here is the vendor's data handling policy and here is our security team's evaluation." Transparency matters more than the answer.

---

### Phase 3: Rollout (Weeks 5-8)

After two weeks, debrief with the pilot group. The decision criteria are simple (the counts below assume a five-person pilot; scale them proportionally for a smaller group):

- **Three or more out of five are using it regularly:** Proceed to rollout.
- **Two out of five:** Investigate why. Fix the friction points. Extend the pilot by one week.
- **One or zero:** The tool is not right for this team, this codebase, or this moment. Shelve it without blame.
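
The three outcomes above can be written down as a small decision helper, which is handy if you want the criteria agreed before the debrief rather than negotiated during it. This is a minimal sketch: the function name and return labels are hypothetical, and expressing the thresholds as fractions of the group (3/5 and 2/5) so that a three- or four-person pilot can reuse them is an assumption, not something the framework prescribes.

```python
def pilot_decision(regular_users: int, group_size: int = 5) -> str:
    """Map pilot usage to one of three outcomes: proceed, extend, or shelve.

    Thresholds mirror "three of five" and "two of five". Scaling them to
    smaller groups is an assumption made for this sketch.
    """
    if group_size <= 0 or not 0 <= regular_users <= group_size:
        raise ValueError("regular_users must be between 0 and group_size")
    # Cross-multiplying keeps the comparison in integers (no float rounding).
    if regular_users * 5 >= 3 * group_size:
        return "proceed"  # roll out to the full team
    if regular_users * 5 >= 2 * group_size:
        return "extend"   # investigate friction, extend the pilot one week
    return "shelve"       # not right for this team, codebase, or moment


print(pilot_decision(3))     # five-person pilot, three regular users
print(pilot_decision(2, 4))  # four-person pilot, two regular users
```

Treat the output as a conversation starter. "Using it regularly" still needs a human judgment call in the debrief.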

Rollout means three concrete changes:

**1. Configuration goes into the repository.** The saved patterns file, the indexing configuration, and any project-specific settings are committed to the repo. Every developer gets them on clone. No manual setup beyond the initial index command.

**2. Auto-indexing is configured.** As detailed in Episode 16 of the Code Intelligence series, CI/CD integration keeps the index fresh on every push to main. Nobody runs the index command manually after the first time. Stale indexes erode trust faster than any other failure mode.

**3. Onboarding documentation is updated.** One line in your onboarding guide: "We use semantic search for code navigation. Run `[index command]` after cloning, then try `[search command]` to see available patterns." One line. Not a page. Not a section. One line.

The rollout is additive. It does not replace grep. It does not replace IDE search. It sits alongside them and is better for a specific class of queries — conceptual queries, cross-service searches, "where does this concept live" questions. Developers who prefer grep for exact matches continue using grep. The tool does not need 100% adoption to provide value. As explored in Episode 20, 50% adoption is enough to change how your team navigates code.

---

### The Adoption Curve

Not everyone adopts at the same speed. After the rollout, expect this distribution:

- **Immediate adopters (20%):** They tried it, it clicked, they use it daily.
- **Gradual adopters (50%):** They use it sometimes. They need saved patterns and occasional reminders. They will fully adopt when they see it save them time on a real task — not a demo, a real task during real work.
- **Holdouts (30%):** They stick with grep and IDE search. Do not force it. When they see a teammate find something in 8 milliseconds that took them 10 minutes, they will come around. Or they will not, and the team still benefits from the 70% who did adopt.

The goal is not 100% adoption. The goal is that the team's collective ability to navigate code improves measurably. That happens at 50%.

---

### The Manager's Role in Adoption

As the engineering manager, your role in the adoption process is specific and limited. You are not the champion (that is a developer). You are not the evangelist (nobody should be). You are the enabler — the person who removes organizational obstacles and provides air cover.

**Provide the time.** The champion needs 2-3 hours during the champion phase to build artifacts. The pilot group needs zero extra time — they substitute semantic search for grep during their normal work. But if the team is in crunch mode, no developer will try a new tool. Choose a pilot timing that avoids sprint deadlines, major releases, and organizational upheaval.

**Remove procurement friction.** If the tool requires a purchase, own the procurement process. Do not make the champion navigate budget approvals, legal review, or vendor onboarding. The champion's job is to prove the tool works technically. Your job is to prove it works organizationally.

**Set expectations with leadership.** Before the pilot starts, brief your skip-level manager: "We are piloting a code intelligence tool for two weeks with five developers. If successful, we will roll out to the full team. I will present results at day 60." This pre-briefing prevents the pilot from being derailed by a surprised director who hears about it secondhand and asks "why are we spending time on this?"

**Do not mandate.** The fastest way to kill adoption is to make it mandatory. Developers who are forced to use a tool resent it, and resentment produces compliance without engagement. The tool gets run but not used. The queries are logged but not useful. Let adoption happen organically after the infrastructure (auto-indexing, saved patterns) is in place. If the tool is genuinely better for conceptual queries, developers will adopt it. If it is not, forcing it will not change that.

**Celebrate concrete wins.** When a developer finds a bug using semantic search that grep would have missed, mention it in standup. Not as a tool advertisement — as a recognition of the developer's work. "Alice found three inconsistent timeout implementations across our services using a cross-service search. That is a great catch." The tool is mentioned implicitly. The developer is celebrated explicitly. This is more powerful than any demo.

---

### Reporting Adoption to Leadership

If you need to justify the investment upward, speak their language:

**Token cost savings.** Every query answered locally is a query that does not burn API tokens. If your team makes 50 searches per day that would otherwise go to an LLM with full-file context, the token savings are measurable (see *The Code Intelligence Buyer's Guide* for the formula).

**Code review throughput.** Reviewers using semantic search to prep for reviews find cross-cutting concerns faster. Track review turnaround time before and after adoption.

**Onboarding acceleration.** A new developer with semantic search is self-sufficient on code navigation from day one. Every question they answer with a search instead of an interruption is time saved for the senior who would have answered it.

---

### Exercise

> **Try This**
>
> Create an adoption plan document with three sections:
>
> 1. **Champion artifacts:** List your demo query, three saved search patterns for your codebase, and one before-and-after example from a real PR or debugging session.
> 2. **Pilot group:** Name three to five developers from your team, with their role (skeptic, curious, junior). Write the one-sentence pilot instruction.
> 3. **Rollout checklist:** List the three concrete changes (config in repo, auto-indexing, onboarding doc update) with owners and target dates.
>
> Share this document with your manager and the pilot group. This is your adoption roadmap.

---

### Key Takeaways

- Most developer tools fail to spread because telling people is not an adoption strategy
- The Champion phase produces proof: a demo query, saved patterns, and a before-and-after example
- The Pilot phase tests with 3-5 developers for two weeks; the instruction is one sentence
- Rollout means three changes: config in repo, auto-indexing in CI, one line in onboarding docs
- The goal is 50% adoption, not 100% — that is enough to change how the team navigates code

---

## Chapter 2: Measuring What Matters

### Chapter Overview

Adoption without measurement is anecdote. This chapter defines the KPIs that tell you whether code intelligence is actually working for your team, how to instrument them, and how to avoid the vanity metrics that look good in a report but mean nothing for productivity.

---

### The Measurement Trap

The easiest metrics to track are the ones that matter least. Logins. Installations. Number of queries. These are activity metrics — they tell you the tool is being used, not whether it is useful.

A developer who runs 50 queries and finds nothing useful has high activity and zero value. A developer who runs three queries and solves a two-day-old bug has low activity and enormous value. If your dashboard only shows activity, you cannot tell the difference.

The metrics that matter for code intelligence are outcome metrics: did the tool help developers find what they needed, faster, more accurately, with less effort? Measuring outcomes is harder than measuring activity, but it is the only measurement that correlates with the business case you used to justify the investment.

---

### The Four KPIs

**KPI 1: Time-to-Context**

Time-to-context measures how long it takes a developer to go from "I need to understand this" to "I understand this." It is the single most important metric for code intelligence because it captures the entire search problem in one number.

How to measure:

- **Before adoption:** Conduct a context cost audit as a baseline: record the average time spent per search episode (from initial query to finding the relevant code).
- **After adoption:** Repeat the audit two weeks after rollout, monthly through the first 90 days, and quarterly thereafter.
- **Target:** A 30-50% reduction in time-to-context within 30 days of rollout.

Time-to-context is a team average. Individual numbers will vary wildly based on the developer's familiarity with the codebase, the complexity of the search, and the quality of the query. The team average smooths these variations and shows the trend.
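
The arithmetic behind the target is simple enough to keep in a script next to your baseline spreadsheet. The sketch below assumes each audit yields a flat list of per-episode search times in minutes, pooled across the team. The function name and the sample numbers are hypothetical.

```python
from statistics import mean


def time_to_context_reduction(baseline_minutes, current_minutes):
    """Return (baseline_avg, current_avg, percent_reduction) for the team."""
    if not baseline_minutes or not current_minutes:
        raise ValueError("both audits need at least one search episode")
    baseline_avg = mean(baseline_minutes)
    current_avg = mean(current_minutes)
    # Positive result means developers reach context faster than at baseline.
    reduction = (baseline_avg - current_avg) / baseline_avg * 100
    return baseline_avg, current_avg, reduction


# Hypothetical audit data: minutes per search episode, pooled across the team.
baseline = [12, 8, 20, 15, 5]
current = [6, 5, 12, 9, 4]

base_avg, cur_avg, pct = time_to_context_reduction(baseline, current)
print(f"baseline {base_avg:.1f} min, current {cur_avg:.1f} min, reduction {pct:.0f}%")
print("within 30-50% target:", 30 <= pct <= 50)
```

Pooling episodes before averaging keeps the number at the team level, which matches how this KPI is meant to be read.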

> **Warning**
>
> Do not measure time-to-context by watching developers or asking them to self-report in real time. This introduces observer bias and Hawthorne effects. Instead, use the audit approach: ask developers to retrospectively estimate search times for recent tasks. The estimates are less precise but less biased.

**KPI 2: Search Accuracy (First-Result Relevance)**

Search accuracy measures whether the top result for a query is actually what the developer needed. A search tool that returns ten results where the relevant one is at position seven is worse than a tool that returns three results where the relevant one is at position one — even though the first tool "found" it.

How to measure:

- **Implicit signal:** If the developer clicks the first result and stays in that file, the first result was relevant. If they click the first result, return to search, and try a different query, it was not.
- **Explicit signal:** A thumbs-up/thumbs-down on search results, if the tool supports it.
- **Batch evaluation:** Monthly, take 20 recent queries and have a senior developer evaluate whether the top result was the right one. Calculate the percentage. This is your first-result relevance rate.
- **Target:** 70%+ first-result relevance for semantic queries (keyword queries should be near 100%).
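
The batch evaluation reduces to one percentage. As a minimal sketch, assume the senior reviewer records a yes/no judgment for each of the 20 sampled queries. The function name and the sample judgments are hypothetical.

```python
def first_result_relevance(judgments: list[bool]) -> float:
    """Percentage of sampled queries whose top result was the right one."""
    if not judgments:
        raise ValueError("need at least one judged query")
    return 100 * sum(judgments) / len(judgments)


# Hypothetical monthly sample: 15 of 20 top results judged correct.
sample = [True] * 15 + [False] * 5

rate = first_result_relevance(sample)
print(f"first-result relevance: {rate:.0f}%")
print("meets 70% target:", rate >= 70)
```

Keeping the raw judgments, not just the percentage, lets you revisit the misses and see whether they cluster around one kind of query.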

**KPI 3: Developer Velocity Proxy**

Developer velocity is notoriously difficult to measure directly. Lines of code, commits per day, and story points completed are all gameable and misleading. But relative changes in velocity — before and after adopting a tool — are meaningful if the measurement methodology is consistent.

How to measure:

- **Cycle time:** Time from first commit to merged PR. Track the median, not the mean (outliers skew the mean). A reduction in cycle time after adoption suggests less time spent searching during development.
- **Review turnaround:** Time from PR opened to first review comment. If reviewers use semantic search to understand PRs faster, this number should decrease.
- **Onboarding ramp:** Time from new developer's first day to their first production PR. If semantic search helps new developers navigate the codebase faster, this number should decrease.
- **Target:** 10-20% improvement in cycle time within 60 days.

These are proxy metrics. They do not prove that code intelligence caused the improvement. They are consistent with the improvement being caused by code intelligence, and in the context of a deliberate rollout, the causal story is credible.
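
To see why the median is the safer statistic for cycle time, compare it with the mean on a sample that contains one stalled PR. The numbers below are hypothetical cycle times in hours, and the helper name is illustrative.

```python
from statistics import mean, median


def median_improvement(before_hours, after_hours):
    """Percent improvement in median cycle time (positive means faster)."""
    before = median(before_hours)
    after = median(after_hours)
    return (before - after) / before * 100


# Hypothetical cycle times in hours. One PR before adoption stalled for 200 hours.
before = [20, 22, 24, 26, 30, 200]
after = [18, 19, 20, 22, 25, 27]

print(f"mean before:   {mean(before):.1f} h (dragged up by the outlier)")
print(f"median before: {median(before):.1f} h")
print(f"median after:  {median(after):.1f} h")
print(f"median improvement: {median_improvement(before, after):.0f}%")
```

A mean-based comparison on this data would credit the tool with an improvement that mostly reflects one unusual PR. The median comparison does not.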

**KPI 4: Cost per Query**

Cost per query is the economic metric that matters most to budget owners.

How to measure:

- **Token tracking:** If your AI assistant tracks token usage, divide total tokens by total queries to get average tokens per query. Multiply by your per-token rate.
- **Before and after:** Compare average cost per query before adoption (context flooding) to after adoption (semantic routing). The delta is your actual token savings.
- **Target:** 90%+ reduction in tokens per query when switching from context flooding to semantic routing.
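
The token-tracking calculation above is two divisions and a multiplication. The sketch below uses hypothetical totals and a hypothetical per-token rate; substitute the figures from your own usage reports and pricing.

```python
def cost_per_query(total_tokens: int, total_queries: int, rate_per_token: float) -> float:
    """Average cost of one query: tokens per query times the per-token rate."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    return total_tokens / total_queries * rate_per_token


def percent_reduction(before: float, after: float) -> float:
    """Percent drop from the before value to the after value."""
    return (before - after) / before * 100


RATE = 3 / 1_000_000  # hypothetical: 3 currency units per million tokens

# Hypothetical monthly totals over 1,000 queries each.
before = cost_per_query(40_000_000, 1_000, RATE)  # context flooding
after = cost_per_query(3_000_000, 1_000, RATE)    # semantic routing

print(f"before: {before:.4f} per query, after: {after:.4f} per query")
print(f"reduction: {percent_reduction(before, after):.1f}%")
```

Because the rate cancels out of the percentage, the reduction in cost per query equals the reduction in tokens per query as long as the rate is unchanged between the two periods.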

---

### Instrumenting the Metrics

Metrics are useless if they require manual collection. The goal is automated tracking that produces a dashboard without developer effort.

**Query logging.** The code intelligence tool should log every query, its latency, the number of results, and which result the developer selected. This data feeds KPIs 1 and 2.
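
If your tool lets you shape or post-process its logs, a record along the following lines captures the four items listed above. This is a sketch under stated assumptions: the class, field names, and JSON-lines format are hypothetical and do not describe any particular tool's schema. The record deliberately omits developer identity, in keeping with the team-level aggregation recommended later in this chapter.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class QueryLogEntry:
    """One logged search. Field names are illustrative only."""
    query: str
    latency_ms: float
    result_count: int
    selected_rank: Optional[int]  # 1-based rank of the opened result, None if none


def to_json_line(entry: QueryLogEntry) -> str:
    """Serialize an entry as a single JSON line for an append-only log."""
    return json.dumps(asdict(entry))


def first_result_selected_rate(entries: list[QueryLogEntry]) -> float:
    """Percentage of queries where the developer opened the top result."""
    if not entries:
        raise ValueError("no log entries")
    hits = sum(1 for e in entries if e.selected_rank == 1)
    return 100 * hits / len(entries)


# Hypothetical entries for illustration.
log = [
    QueryLogEntry("how do we validate payments", 8.0, 5, 1),
    QueryLogEntry("session expiration handling", 9.5, 7, 3),
    QueryLogEntry("auth flow", 7.2, 4, 1),
    QueryLogEntry("database migrations", 8.8, 6, None),
]

print(to_json_line(log[0]))
print(f"top result opened: {first_result_selected_rate(log):.0f}% of queries")
```

Opening the top result is only a rough implicit signal. It does not tell you whether the developer stayed in that file, so pair it with the monthly batch evaluation.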

**CI/CD integration metrics.** Auto-indexing logs from Chapter 4 feed operational metrics: indexing time, index size, freshness. These are health metrics, not productivity metrics, but they catch problems before they affect the team.

**Retrospective surveys.** Once per month, a three-question survey to each developer:

1. "In the past week, did the tool help you find something you would have struggled to find otherwise?" (Yes/No + optional example)
2. "How many times did you fall back to grep because the tool's results were not useful?" (Numeric)
3. "What is your biggest friction point with the tool?" (Free text)

The survey takes 60 seconds. The qualitative data is often more actionable than the quantitative data. A "yes" with a specific example is worth more than any metric.

> **Key Insight**
>
> The most important signal is not any single metric — it is the ratio of semantic search queries to grep queries over time. If this ratio is increasing, the tool is replacing grep for the queries where it is superior. If the ratio is flat or decreasing, the tool is not earning its place in the workflow.
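
Tracking the ratio takes only two counts per week. The sketch below assumes you already have weekly totals for semantic queries and grep invocations; how you collect the grep count is left open, and the function names and sample numbers are hypothetical.

```python
def semantic_to_grep_ratio(semantic_count: int, grep_count: int) -> float:
    """Semantic queries per grep query for one period."""
    if grep_count == 0:
        # No grep usage at all: treat as unbounded if semantic search was used.
        return float("inf") if semantic_count else 0.0
    return semantic_count / grep_count


def ratio_trend(weekly_counts: list[tuple[int, int]]) -> str:
    """Compare the last week's ratio with the first week's."""
    ratios = [semantic_to_grep_ratio(s, g) for s, g in weekly_counts]
    return "rising" if ratios[-1] > ratios[0] else "flat or falling"


# Hypothetical team-wide weekly totals: (semantic queries, grep invocations).
weeks = [(40, 160), (75, 150), (120, 120)]

for semantic, grep in weeks:
    print(f"{semantic:>4} semantic / {grep:>4} grep = {semantic_to_grep_ratio(semantic, grep):.2f}")
print("trend:", ratio_trend(weeks))
```

Comparing only the first and last week is crude. Once you have a couple of months of data, a chart of the weekly ratio will show the trend more reliably.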

---

### Avoiding Vanity Metrics

Metrics that look good in a quarterly review but do not correlate with productivity:

**Total queries.** A high number of queries might mean the tool is being used. It might also mean the tool is producing bad results and developers are re-querying.

**Number of indexed files.** A large index means nothing if the search results are poor. A small, well-tuned index with high-quality results is more valuable than a large, noisy index.

**"Time saved" (self-reported).** Developers overestimate time saved because they remember the searches that worked and forget the ones that did not. Use the time-to-context audit instead.

**User satisfaction scores.** Satisfaction does not equal productivity. A tool can be satisfying to use (nice UI, fast responses) without actually improving outcomes. Measure outcomes, not feelings.

---

### Measurement Dysfunction and How to Prevent It

Measurement, done wrong, causes the problems it is supposed to solve. Three common dysfunction patterns:

**Dysfunction 1: The metric becomes the target.** If you tell developers "we're tracking queries per day," some developers will run unnecessary queries to hit an implicit target. This is Goodhart's Law in action. The fix: never set targets on activity metrics. Set targets on outcome metrics (time-to-context, first-result relevance) that cannot be gamed without actually improving.

**Dysfunction 2: Surveillance perception.** If developers feel that their search behavior is being monitored for performance evaluation, they will change their behavior — not in the direction you want. They will avoid the tool, avoid asking questions, and avoid appearing slow. The fix: aggregate all metrics at the team level, never the individual level. The question is "is the team finding code faster," not "is developer X searching too much."

**Dysfunction 3: Over-measurement.** Tracking 15 metrics instead of four dilutes attention. Nobody reads a dashboard with 15 charts. Everyone reads a dashboard with four. The fix: track the four KPIs above. Everything else is optional and should only be added if a specific question arises that the four KPIs cannot answer.

> **Warning**
>
> Never tie code intelligence metrics to individual performance reviews. The moment a developer believes that their search behavior is being evaluated, the data becomes worthless. Measurement exists to evaluate the tool, not the developer.

---

### Building the Dashboard

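A practical dashboard for code intelligence measurement has four sections. Three map directly to KPIs 1, 2, and 4; the remaining one tracks adoption health, because the velocity proxies behind KPI 3 come from your engineering metrics platform on a quarterly cadence rather than from tool telemetry: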

**Section 1: Time-to-Context Trend.** A line chart showing the team's average time-to-context by month, starting from the baseline. The chart should show a clear downward trend if the tool is working. If the line is flat or rising, something is wrong — stale indexes, poor adoption, or the tool does not fit the team's query patterns.

**Section 2: First-Result Relevance.** A bar chart showing monthly first-result relevance rate from the batch evaluation. Target line at 70%. Below 70% consistently means the embedding model or ranking pipeline needs attention.

**Section 3: Adoption Health.** Two numbers: daily active users (developers who ran at least one semantic search) and the semantic-to-grep ratio. These tell you whether the tool is part of the workflow or gathering dust.

**Section 4: Cost Efficiency.** A comparison bar showing tokens per query (before vs. after) and the cumulative token cost savings. This is the chart you show to the budget owner.

The dashboard should update automatically from tool telemetry. If the tool does not export telemetry, the monthly survey and quarterly audit are your fallback.

---

### The Measurement Calendar

| Frequency | Metric | Method |
|-----------|--------|--------|
| Continuous | Query logging (latency, results, selection) | Automated tool telemetry |
| Weekly | Semantic-to-grep ratio | Automated or weekly count |
| Monthly | First-result relevance (batch evaluation) | 20-query sample, senior review |
| Monthly | Developer survey (3 questions) | Survey tool |
| Quarterly | Time-to-context audit | Team-wide time study |
| Quarterly | Cycle time, review turnaround, onboarding ramp | Engineering metrics platform |
| Annually | Token cost comparison (actual vs. baseline) | Finance data |

*Figure 1: Measurement calendar for code intelligence KPIs.*

---

### Exercise

> **Try This**
>
> Create a measurement baseline before your pilot begins (or now, if you have already adopted a tool). In a spreadsheet:
>
> 1. Record the current team median for: cycle time, review turnaround, and onboarding ramp (if you have had a recent new hire)
> 2. Conduct a one-day time-to-context sample: at the end of the day, ask three developers to estimate how long each of that day's search episodes took (a retrospective estimate, not a live log)
> 3. Estimate your current token cost per query (see *The Code Intelligence Buyer's Guide* for the formula)
>
> Save this spreadsheet. You will compare against it at 30, 60, and 90 days post-adoption.

---

### Key Takeaways

- Activity metrics (queries, logins) do not measure whether the tool is useful — only whether it is used
- The four KPIs that matter: time-to-context, search accuracy, developer velocity proxies, and cost per query
- The semantic-to-grep ratio is the single best leading indicator of whether the tool is earning its place
- Monthly three-question surveys provide actionable qualitative data that numbers miss
- Measure a baseline before adoption — without a before, the "after" is meaningless

---

# Part II: Operations

---

## Chapter 3: Monorepo and Multi-Team Considerations

### Chapter Overview

Monorepos and multi-team codebases break naive code intelligence implementations in predictable ways. This chapter covers what breaks, why it breaks, and how to configure code intelligence tools to handle team boundaries, naming collisions, and scale — drawing from the detailed technical treatment in Episode 18 of the Code Intelligence series.

---

### What Changes at Scale

A code intelligence tool that works brilliantly on a single-service, single-team codebase may produce mediocre results on a monorepo. This is not a quality problem — it is a structural problem. Understanding the structure is necessary to fix the results.

Three things break when a codebase crosses the 10,000-file threshold in a monorepo:

**Naming collisions.** Every service has a `handle_request`. Every service has a `config.py`. Every service has a `models/` directory. When a developer searches for "request handler," the results come from eight services. The one they care about is buried at position six.

This is not a search quality problem. All eight results are semantically relevant to "request handler." The problem is that the search engine has no way to distinguish between relevant-to-the-developer and relevant-to-the-query. In a monorepo, these are different things.

**Cross-service boundary blur.** The payment service imports from the shared library, which is also imported by the notification service. Semantic similarity between these services is high because they share vocabulary — the same functions, the same types, the same patterns. The search engine cannot tell which service the developer is working in, so it returns results from all services with similar scores.

**Flat score distributions.** In a single-service codebase, the top result might score 0.92 and the fifth result 0.71 — clear separation. In a monorepo, the top five might score 0.88, 0.86, 0.85, 0.84, 0.83. The signal-to-noise ratio drops because there are more genuinely similar chunks competing for the top slots. The developer has to read five results instead of one to find what they need.

These are not problems you solve by making the search engine "smarter." They are problems you solve by giving the search engine structural information about the codebase.
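
One rough way to see the flat-distribution effect is to compare the gap between the best and worst score in a result list. The sketch below uses the scores quoted above; the function name `score_spread` is introduced here for illustration and is not part of any tool.

```python
def score_spread(scores: list[float]) -> float:
    """Gap between the best and worst score among the returned results."""
    ranked = sorted(scores, reverse=True)
    return round(ranked[0] - ranked[-1], 2)


# Scores quoted in the text above.
single_service = [0.92, 0.71]  # top result and fifth result
monorepo = [0.88, 0.86, 0.85, 0.84, 0.83]

print(score_spread(single_service))  # 0.21
print(score_spread(monorepo))        # 0.05
```

A spread of 0.05 across five results leaves very little room for ranking to separate the right answer from its neighbors, which is why structural hints matter more than a better similarity model.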

---

### Service-Aware Configuration

The fix for naming collisions and boundary blur is service-aware indexing: telling the tool about your monorepo's service boundaries so it can partition the search space and apply context-dependent ranking.

The configuration is typically a mapping of service names to directories:

```toml
payment = "services/payment/"
auth = "services/auth/"
notification = "services/notification/"
gateway = "services/gateway/"
shared = "libs/shared/"
```

This configuration does two things:

**Partitioned indexing.** Each service gets its own sub-index — a partition of the embedding space that can be searched independently. Cross-service search is still possible, but the partitioning enables service-scoped search when the developer wants it.

**Context-dependent ranking.** When a developer searches from within a service directory, results from that service receive an automatic relevance boost. The boost is small, typically 5-15%, but it is enough to break ties in a flat score distribution. For example, a result from the developer's current service that scores 0.81 and receives a +0.12 boost (about 15%) ranks at 0.93, ahead of a result from another service that scores 0.85 without the boost.

The impact on a real 12,000-file monorepo, as documented in Episode 18:

| Metric | Without service scope | With service scope |
|--------|----------------------|-------------------|
| Top-1 accuracy | 71% | 81% |
| Top-3 relevance | 84% | 89% |
| Query latency (p50) | 8ms | 8ms |

The latency is unchanged because the service boost is a lightweight adjustment applied at query time to results that have already been retrieved; it requires no extra indexing pass. The accuracy improvement is significant: 10 percentage points on top-1 accuracy means substantially fewer searches where the developer has to scan past irrelevant results.

---

### Team Boundaries and Ownership

In organizations with multiple teams working in the same monorepo, service boundaries and team boundaries often align. The payment team owns the payment service. The platform team owns the shared libraries. The infrastructure team owns the deployment configuration.

Code intelligence tools can leverage this alignment in three ways:

**Ownership-scoped search.** When a developer searches within their team's services, they get results from their team's code by default. Cross-team results appear with lower ranking, surfacing only when they are substantially more relevant than in-team results.

**Shared query patterns per team.** Each team maintains its own set of saved search patterns, reflecting the concepts and vocabulary that matter to their domain. The payment team's patterns include "charge flow," "refund logic," and "subscription renewal." The auth team's patterns include "token validation," "session expiration," and "OAuth callback." These patterns are checked into the repo alongside the team's code, so new team members get them automatically.

**Cross-team discovery.** When a developer needs to understand code outside their team's ownership, explicit scope switching enables cross-team search without the noise of unscoped search. The developer searches the auth team's code intentionally, not accidentally.

For managers, the key insight is that service-aware configuration is not just a technical optimization — it is an organizational one. It respects team boundaries while enabling cross-team discovery. The configuration should mirror your team structure because the search problem is fundamentally about "find the code my team is responsible for" vs. "discover code another team owns."

---

### Multi-Language Monorepos

Many monorepos span multiple programming languages: a Python backend, a TypeScript frontend, a Go infrastructure service, a Java data pipeline. Each language has different naming conventions, different file structures, and different levels of semantic similarity.

The challenge for code intelligence is that a query like "user validation" might return results from all four languages, ranked by generic semantic similarity. The Python validator scores 0.87. The TypeScript form sanitizer scores 0.85. The Go input checker scores 0.83. The Java DTO validator scores 0.81. All are relevant. None is what the developer needs if they are working in Python.

The solution is the same as for service-aware search: partition by language (or by service, which often correlates with language) and apply context-dependent ranking. A developer working in a Python file gets Python results boosted. A developer working in a TypeScript file gets TypeScript results boosted.

For managers evaluating tools for multi-language monorepos, ask:

1. Does the tool support language-aware ranking?
2. Does the embedding model handle multiple languages, or does it use a separate model per language?
3. How does the tool handle cross-language searches (e.g., "how does the frontend call the backend API")?

Cross-language search is a genuinely hard problem. A tool that handles it well — surfacing the TypeScript frontend component that calls the Python backend endpoint — is significantly more valuable in a multi-language monorepo than a tool that treats each language as an isolated silo.

---

### Cross-Repository Search

Not every organization uses a monorepo. Many have dozens or hundreds of separate repositories, each containing a service, library, or application. The search problem here is different from the monorepo problem but equally important: a developer working in the API service needs to understand how the shared library handles serialization, and that code lives in a different repository.

Cross-repo search requires a unified index that spans multiple repositories. The implementation varies:

**Centralized index.** A single index is built from multiple repositories, with metadata tagging each chunk to its source repository. Queries search the unified index, and results indicate which repo they came from. This is the simplest approach and works well for organizations with fewer than 20 repositories.

**Federated index.** Each repository maintains its own index. A query fans out to multiple indexes and merges the results. This scales better but adds complexity in ranking — how do you compare similarity scores across indexes that may have been built with different configurations?

**Selective indexing.** Rather than indexing every repository, the team identifies the five to ten repositories that are most frequently cross-searched and indexes those. This is the pragmatic middle ground for organizations that have 50+ repositories but regularly search across only a subset.
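
On the federated ranking question: one common way to sidestep incomparable scores is to merge by rank position rather than by raw score. The sketch below uses reciprocal rank fusion, a general-purpose technique chosen here for illustration, not something the text attributes to any particular tool. Repository and file names are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists using only positions, never raw scores."""
    fused: dict[str, float] = {}
    for ranking in ranked_lists:
        for position, chunk_id in enumerate(ranking, start=1):
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (k + position)
    # sorted() is stable, so ties keep the order in which indexes were queried.
    return sorted(fused, key=lambda chunk_id: fused[chunk_id], reverse=True)


# Hypothetical per-repository results, each already sorted best-first.
api_repo = ["api/serialize.py", "api/routes.py"]
shared_repo = ["shared/serialization.py", "shared/codec.py"]

merged = reciprocal_rank_fusion([api_repo, shared_repo])
print(merged)
# ['api/serialize.py', 'shared/serialization.py', 'api/routes.py', 'shared/codec.py']
```

When the per-repository lists do not overlap, as here, this reduces to interleaving by rank. That is a deliberate trade-off: it ignores how confident each index was in exchange for never comparing scores that were produced under different configurations.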

For managers, the key question is: "How often do our developers need to search code that lives outside the repository they are working in?" If the answer is "rarely," single-repo search is sufficient. If the answer is "multiple times per day," cross-repo search is a significant productivity lever.

---

### Governance and Configuration Management

In large organizations, code intelligence configuration becomes a governance question. Who decides the service boundaries? Who maintains the saved search patterns? Who approves changes to the indexing configuration?

Three governance models:

**Centralized governance.** A platform team or developer experience team owns the code intelligence configuration for the entire organization. They define service boundaries, maintain shared search patterns, and manage the CI/CD indexing pipeline. This works well for organizations with strong platform teams and consistent practices across teams.

**Federated governance.** Each team owns the configuration for their services. The platform team provides the infrastructure (CI/CD pipeline, cloud sync) and sets standards (naming conventions for patterns, minimum index freshness), but individual teams define their own service boundaries and patterns. This works well for organizations with autonomous teams and diverse practices.

**Organic governance.** No formal governance. Teams adopt and configure code intelligence independently. This works for small organizations (fewer than 30 developers) where coordination costs are low and practices naturally converge.

The right model depends on your organization's size and culture. The important thing is to choose explicitly rather than defaulting to organic governance and discovering, six months later, that five teams have five incompatible configurations.

---

### Scale Considerations for Managers

As the codebase grows, several operational concerns become relevant:

**Index size.** A 10,000-file codebase produces an index of roughly 300-500 MB. A 50,000-file codebase produces 1.5-2.5 GB. This needs to fit on developer machines (or be streamed from a team server). Ensure the tool supports compressed indexes and incremental updates.

**Indexing time.** Initial indexing of a 50,000-file codebase takes 2-5 minutes on a modern laptop. Incremental indexing (after a git pull or merge) takes seconds. If initial indexing takes longer than 10 minutes, adoption will suffer because the first experience is slow.

**Query latency at scale.** Latency should remain under 50 milliseconds for any codebase size. Tools that use hierarchical indexing (coarse file-level search followed by fine chunk-level search) maintain consistent latency as the codebase grows. Tools that search linearly across all chunks will degrade.

**Deduplication.** Monorepos share code through internal libraries, generated files, and utility functions. Good tools deduplicate at the chunk level — identical functions are embedded once, regardless of how many files contain them. This reduces index size by 15-25% in typical monorepos.

| Codebase size | Expected index size | Initial index time | Incremental index time | Query latency (p50) |
|--------------|--------------------|--------------------|----------------------|-------------------|
| 1,000 files | 15-30 MB | 3-5 seconds | <1 second | <5 ms |
| 10,000 files | 300-500 MB | 30-60 seconds | 2-5 seconds | 5-10 ms |
| 50,000 files | 1.5-2.5 GB | 2-5 minutes | 5-15 seconds | 10-20 ms |
| 100,000+ files | 3-5 GB | 5-15 minutes | 15-30 seconds | 15-50 ms |

*Figure 2: Expected scale characteristics for code intelligence tools.*
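
Chunk-level deduplication, mentioned above, can be sketched with a content hash: identical chunks produce the same key, so each unique chunk is embedded once while the index still remembers every file that contains it. This is an illustrative sketch using exact-match hashing; the function names and sample chunks are hypothetical, and real tools may normalize whitespace or use fuzzier matching.

```python
import hashlib


def chunk_key(chunk: str) -> str:
    """Content hash used to recognize identical chunks across files."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def deduplicate(chunks_by_file: dict[str, list[str]]) -> dict[str, dict]:
    """Map each unique chunk hash to its text and every file containing it."""
    unique: dict[str, dict] = {}
    for path, chunks in chunks_by_file.items():
        for chunk in chunks:
            entry = unique.setdefault(chunk_key(chunk), {"text": chunk, "files": []})
            entry["files"].append(path)
    return unique


# Hypothetical chunks: the same helper is copied into two services.
helper = "def to_cents(amount):\n    return round(amount * 100)\n"
chunks_by_file = {
    "services/payment/utils.py": [helper, "def charge(): ...\n"],
    "services/notification/utils.py": [helper],
}
unique = deduplicate(chunks_by_file)
print(len(unique))  # 2 unique chunks to embed instead of 3
```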

---

### Exercise

> **Try This**
>
> Create a service map of your monorepo (or largest repository). In a document or diagram:
>
> 1. List every service or major component with its directory path
> 2. Identify the owning team for each service
> 3. Map shared libraries and which services depend on them
> 4. Count the total files per service
>
> This map is the input for service-aware configuration. If you do not have a monorepo, map the repositories your team works on most frequently and identify which ones would benefit from cross-repo search.
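
Step 4 of the exercise is easy to script. The following is a small sketch, assuming you run it from the repository root and fill in your own service directories; the `count_files` helper and the sample mapping are illustrative.

```python
from pathlib import Path


def count_files(root: Path, services: dict[str, str]) -> dict[str, int]:
    """Count regular files under each service directory, relative to root."""
    counts = {}
    for name, directory in services.items():
        service_dir = root / directory
        if service_dir.is_dir():
            counts[name] = sum(1 for p in service_dir.rglob("*") if p.is_file())
        else:
            counts[name] = 0  # directory missing: worth checking the map
    return counts


if __name__ == "__main__":
    # Replace with your own services; these paths reuse the example mapping.
    services = {
        "payment": "services/payment/",
        "auth": "services/auth/",
        "shared": "libs/shared/",
    }
    for name, total in count_files(Path("."), services).items():
        print(f"{name}: {total} files")
```

A count of zero usually means the directory path in the map is wrong, which is useful to catch before the same map goes into the tool's configuration.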

---

### Key Takeaways

- Monorepos break naive search through naming collisions, boundary blur, and flat score distributions
- Service-aware configuration partitions the search space and applies context-dependent ranking
- A 10-percentage-point improvement in top-1 accuracy translates to substantially fewer wasted searches
- Service boundaries should mirror team boundaries — the search problem is organizational, not just technical
- Scale considerations (index size, indexing time, query latency) are predictable and manageable up to 100K+ files

---

## Chapter 4: CI/CD Integration and Operational Overhead

### Chapter Overview

A code intelligence tool is only as good as the data it searches. If the index is stale, the results are stale, and developers stop trusting the tool. This chapter covers how to automate index freshness through CI/CD integration, what the operational cost looks like, and how to keep the overhead near zero — building on the CI/CD approach detailed in Episode 16 of the Code Intelligence series.

---

### The Staleness Problem

An index that is three days old is a historical document, not a search tool. It does not know about the new service your teammate added Tuesday. It does not reflect the refactored validator from Wednesday's PR. It returns results that are technically correct but practically misleading — the function exists, but its signature changed, or it moved to a different module, or it was deleted entirely.

Trust erodes fast. One stale result makes a developer doubt the next accurate one. Two stale results and they stop using search and go back to grep. The entire pipeline — semantic embeddings, hybrid retrieval, cross-encoder reranking — is worthless if the underlying data is old.

For managers, the staleness problem is the single most likely adoption killer. A tool that works perfectly but serves stale data is worse than no tool at all, because it wastes time with false confidence. The developer trusts the result, acts on it, discovers it is stale, and loses trust not just in the result but in the tool.

The fix is straightforward: reindex on every push to main.

---

### The Auto-Indexing Pipeline

Auto-indexing integrates code intelligence into your existing CI/CD pipeline. On every push to main (or every merge to main, depending on your workflow), the pipeline runs an incremental index and distributes the result to the team.

A complete GitHub Actions workflow:

```yaml
name: Reindex Codebase

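on:
  push:
    branches: [main]
    paths-ignore:
      # '**' crosses directory boundaries; '*.md' would match root-level files only
      - '**.md'
      - '.github/**'
  # Allows manual runs; a push that changes only .github/** is ignored above
  workflow_dispatch: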

jobs:
  reindex:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

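      - run: pip install code-mcp

      # Assumption: the index lives in .pyckle/ (the path uploaded below).
      # Hosted runners start empty, so restore the previous index first;
      # otherwise --incremental has nothing to update against.
      - uses: actions/cache@v4
        with:
          path: .pyckle/
          key: pyckle-index-${{ github.sha }}
          restore-keys: pyckle-index-

      - run: pyckle index . --incremental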

      - uses: actions/upload-artifact@v4
        with:
          name: pyckle-index
          path: .pyckle/
          retention-days: 7
```

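This is the entire file. Save it, push it, and every merge to main that changes code triggers a fresh incremental index. Because `.github/**` is listed under `paths-ignore`, the push that adds only this workflow file will not start a run; the first index is built on the next merge that touches code. The `--incremental` flag means only files that changed since the last index are reprocessed, keeping the step fast even as the codebase grows.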

For teams using cloud sync, add one step after the index:

```yaml
      - run: pyckle sync --push
        env:
          PYCKLE_API_KEY: ${{ secrets.PYCKLE_API_KEY }}
```

With cloud sync, every developer's searches hit the fresh index automatically. No manual download. No stale results.

---

### Distribution Models

The fresh index is built in CI. Now it needs to reach developer machines. Three distribution models, each with different trade-offs:

**Model 1: Artifact download.** Developers pull the latest index using the CI platform's CLI. Simple, no additional infrastructure. The downside is that it is manual — developers have to remember to pull after each merge.

**Model 2: Cloud sync.** The CI step pushes the index to a cloud service. Developer machines pull the update automatically in the background when any search runs. Zero manual steps. The downside is cloud dependency for the sync feature (but search still works locally with whatever index is cached).

**Model 3: Git LFS.** The index is tracked as a Git LFS object and committed on each CI run. Developers get the fresh index on `git pull`. The downside is repository size management and LFS complexity.

| Distribution model | Manual steps | Additional infra | Freshness guarantee |
|-------------------|-------------|-------------------|-------------------|
| Artifact download | Pull after each merge | None | Only when pulled |
| Cloud sync | None | Cloud API account | Automatic, continuous |
| Git LFS | `git pull` | LFS storage | On every pull |

*Figure 3: Index distribution models compared.*

For teams of 10-30, cloud sync is the recommended approach. The zero-manual-step guarantee is worth the modest cloud dependency. For teams in air-gapped environments, artifact download or Git LFS is the fallback.

---

### Branch-Specific Indexing

For teams that work on long-lived feature branches, the question arises: should each branch have its own index?

On long-lived branches (two weeks or more), the codebase can diverge significantly from main. A developer on `feature/new-checkout` who searches against the main branch index will get results that do not reflect their current code. The checkout service they added does not appear. The validator they renamed still shows the old name.

Branch-specific indexing solves this by extending the CI workflow to index feature branches:

```yaml
on:
  push:
    branches:
      - main
      - 'feature/**'
```

Combined with cloud sync, each branch gets its own index. The developer's searches reflect their reality, not someone else's.

The cost is additional CI minutes. Assuming roughly one push per branch per day, a team with 5-10 active feature branches adds 5-10 indexing runs per day. At 1 billed minute each, that is 5-10 minutes of CI time per day, well within free-tier limits and negligible on paid plans.

---

### What It Costs

The operational cost of auto-indexing is minimal because incremental indexing is fast:

| Metric | Value |
|--------|-------|
| Typical incremental index time | 5-15 seconds |
| CI minimum billing unit | 1 minute |
| Pushes to main per day (small team) | 10-20 |
| Daily CI minutes consumed | 10-20 minutes |
| Monthly CI minutes consumed | 300-600 minutes |
| GitHub Actions free tier | 2,000 minutes/month |
| Cost on free tier | $0 |
| Cost on paid tier ($0.008/min) | $2.40-$4.80/month |

*Figure 4: Auto-indexing cost model for a small team.*
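
The arithmetic behind Figure 4 is simple enough to rerun with your own numbers. The sketch below reproduces the table under the same assumptions (one billed minute per run, a 30-day month, $0.008 per minute, a 2,000-minute free tier); the function name and parameter names are illustrative.

```python
def monthly_ci_cost(
    pushes_per_day: int,
    billed_minutes_per_run: int = 1,
    days_per_month: int = 30,
    price_per_minute: float = 0.008,
    free_minutes: int = 2000,
) -> dict:
    """Estimate monthly CI minutes and the cost if every minute were billed."""
    minutes = pushes_per_day * billed_minutes_per_run * days_per_month
    return {
        "minutes": minutes,
        "fits_free_tier": minutes <= free_minutes,
        "paid_cost": round(minutes * price_per_minute, 2),
    }


print(monthly_ci_cost(10))   # {'minutes': 300, 'fits_free_tier': True, 'paid_cost': 2.4}
print(monthly_ci_cost(20))   # {'minutes': 600, 'fits_free_tier': True, 'paid_cost': 4.8}
print(monthly_ci_cost(100))  # {'minutes': 3000, 'fits_free_tier': False, 'paid_cost': 24.0}
```

Note that the estimate covers the reindex job only. If other workflows already consume most of your free minutes, the `fits_free_tier` check should be run against what remains.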

For most teams, auto-indexing fits entirely within the free tier of their CI platform. Even at scale — 100 pushes per day on a large team — the cost is a rounding error in the CI bill. The cost of not indexing (stale results, lost trust, adoption failure) is harder to quantify but easy to feel.

---

### Monitoring Index Health

Auto-indexing should be set-and-forget, but operational reality means things break. CI credentials expire. The index binary changes format with a tool update. The codebase grows past an index-time threshold.

Monitor three signals:

**1. Index age.** Track the timestamp of the most recent successful index. If the index is more than 24 hours old during active development, something is wrong — either the CI job is failing or nobody is pushing to main (which is a different problem).

**2. Indexing duration.** Track how long the incremental index takes. A sudden spike (from 10 seconds to 3 minutes) indicates a large change, a full reindex trigger, or a performance regression in the tool. Set an alert at 2x the rolling average.

**3. CI job success rate.** The reindexing job should succeed on virtually every run. A failure rate above 5% means the job needs attention — dependency issues, timeout problems, or configuration drift.

These signals are straightforward to implement as CI notifications. Most teams will never see an alert because the indexing step is simple and deterministic. But the monitoring exists for the one time it matters.

---

### The Complete Setup

After implementing auto-indexing, a team's code intelligence infrastructure looks like this:

1. **Local development:** The tool watches for file changes and updates the local index in real time (or near-real time). The developer always searches against their current working state.

2. **CI/CD:** Every push to main (and optionally to feature branches) triggers an incremental index. The fresh index is distributed to the team via cloud sync, artifact download, or Git LFS.

3. **Team sync:** Developers receive fresh indexes automatically. No manual steps. No stale results.

4. **Search:** Every query hits an index that reflects the current state of the code. The trust problem is solved at the infrastructure level, not at the individual level.

This is the operational foundation that makes everything else in this guide work. Without fresh data, the best search algorithm in the world produces results that mislead.

---

### Operational Runbook

When something goes wrong with auto-indexing, the symptoms are indirect. Unless you have set up the monitoring described above, nobody gets an alert that says "index is stale." Instead, a developer searches for code that was added yesterday and gets no results, or searches for a function that was renamed and gets the old name. The developer does not know the index is stale; they conclude the tool is bad.

Here is the troubleshooting runbook for the three most common operational issues:

**Issue 1: CI job is failing silently.**

Symptom: Index has not updated in 48+ hours during active development.

Diagnosis: Check your CI platform's job history. Look for failures in the reindex workflow. Common causes: dependency installation failure (pip version mismatch, package removed from PyPI), runner configuration change (Python version no longer available), or repository secrets expired (API key for cloud sync).

Fix: Re-run the failed job manually to see the error. Fix the root cause. Set up CI notifications (Slack, email) for job failures so you catch them immediately rather than discovering them when a developer complains.

Prevention: Pin dependency versions in the CI workflow. Use a lock file. Set a calendar reminder to rotate API keys before they expire.

**Issue 2: Index is building but results are stale.**

Symptom: The CI job succeeds, but developers still see old results.

Diagnosis: The index is building but not reaching developer machines. If using artifact download, nobody is downloading. If using cloud sync, the sync step might be failing or the client-side pull might be misconfigured. If using Git LFS, the push-back step might be silently failing.

Fix: Verify the distribution path end-to-end. Build the index in CI, confirm it reaches the distribution medium (artifact, cloud, LFS), then confirm a developer machine can pull it. The gap is almost always in distribution, not in indexing.

**Issue 3: Indexing time is increasing.**

Symptom: The reindex CI step that used to take 10 seconds now takes 2 minutes.

Diagnosis: Either the codebase has grown significantly (expected, manageable) or the `--incremental` flag is not working correctly and the tool is doing a full reindex on every push. Check the tool's output for "incremental" vs. "full" indicators.

Fix: If the codebase grew, the time increase is proportional and expected. If the tool is doing unnecessary full reindexes, check the index cache — it may have been invalidated by a configuration change, a tool update, or a corrupted cache file.

---

### Multi-CI Platform Considerations

Not every team uses GitHub Actions. The auto-indexing concept is platform-independent, but the implementation details change:

**GitLab CI:** Replace the GitHub Actions workflow with a `.gitlab-ci.yml` file. The steps are identical — checkout, install, index, distribute. GitLab CI uses a different artifact syntax and different secret management, but the pipeline logic is the same.

**Jenkins:** Create a Jenkinsfile with the same stages. Jenkins requires more setup (a Python environment on the build agent) but offers more control over execution. For enterprise environments that standardize on Jenkins, this is the natural fit.

**Bitbucket Pipelines:** Similar to GitHub Actions. Create a `bitbucket-pipelines.yml` file. Bitbucket's artifact management is less sophisticated than GitHub's, so cloud sync is the preferred distribution model.

**Self-hosted CI:** If your team uses a self-hosted CI server (common in security-sensitive environments), the same pipeline applies. The advantage is that the index never leaves your infrastructure at any point — the build happens on your server, the index is distributed within your network, and no third-party CI platform sees your code.

For managers, the key insight is that auto-indexing is a handful of pipeline steps added to whatever CI system you already use. It is not a new infrastructure component; it is a step in an existing workflow.

---

### Exercise

> **Try This**
>
> Set up auto-indexing for one repository. Steps:
>
> 1. Create the CI workflow file (use the YAML above as a starting point, adapt for your CI platform if not GitHub Actions)
> 2. Push it to main and verify the job runs successfully
> 3. Check the index output — confirm it includes the expected number of files and chunks
> 4. Pull the index to your local machine and run a search to verify freshness
>
> Time yourself. The entire setup should take less than 30 minutes. If it takes longer, document where the friction was — that feedback is valuable for tool evaluation.

---

### Key Takeaways

- Index staleness is the number one adoption killer — one stale result erodes trust in every subsequent result
- Auto-indexing on every push to main is the structural fix for staleness
- The operational cost is near zero: 10-20 CI minutes per day, well within free-tier limits
- Three distribution models (artifact, cloud sync, Git LFS) serve different team needs
- Monitor index age, indexing duration, and CI job success rate to catch problems before they affect the team

---

## Chapter 5: The 90-Day Implementation Plan

### Chapter Overview

This chapter translates every concept, framework, and technique from the previous four chapters into a concrete, week-by-week implementation plan. No theory. No "it depends." A timeline with milestones, owners, and decision gates that you can hand to your team on Monday morning.

---

### Phase 1: Evaluate (Days 1-30)

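The first 30 days are about understanding your team's problem, evaluating your options, and making a decision. At the end of this phase, you will have data, a recommendation, and organizational buy-in to proceed. Note that the phase headings count calendar days, while the Day column in the tables counts working days (five per week), so the 12-week plan ends on working day 60.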

**Week 1: Baseline Measurement**

| Day | Activity | Owner | Output |
|-----|----------|-------|--------|
| 1 | Conduct context cost audit | EM + 3 developers | Spreadsheet: search time per developer |
| 2 | Classify recent queries as keyword vs. conceptual | EM | Keyword vs. conceptual query ratio |
| 3 | Identify top 10 "where does X live" questions from Slack | EM | List of common conceptual queries |
| 4-5 | Calculate current token costs (see *The Code Intelligence Buyer's Guide*) | EM | Token cost baseline spreadsheet |

At the end of week 1, you have four artifacts: a search time baseline, a query type distribution, a list of real conceptual queries, and a token cost baseline. These are the "before" measurements.

**Week 2: Tool Evaluation**

| Day | Activity | Owner | Output |
|-----|----------|-------|--------|
| 6 | Complete the build vs. buy vs. embed decision matrix | EM + tech lead | Decision scorecard |
| 7-8 | Short-list 2-3 tools based on decision | EM | Short list with rationale |
| 9 | Complete security evaluation for each short-listed tool | EM + security lead | Security evaluation memos |
| 10 | Run each tool against the 10 conceptual queries from week 1 | Champion (senior dev) | Result comparison spreadsheet |

At the end of week 2, you have a short list of tools, security evaluations for each, and empirical data on how each handles your team's actual queries. The comparison spreadsheet — same queries, different tools — is the most persuasive artifact you can produce.

**Week 3: Decision and Buy-In**

| Day | Activity | Owner | Output |
|-----|----------|-------|--------|
| 11-12 | Present findings to stakeholders (tech lead, skip-level, security) | EM | Decision meeting |
| 13 | Select tool | EM + stakeholders | Signed off decision |
| 14-15 | Set up tool for champion developer | Champion | Working installation + index |

The decision meeting should take 30 minutes. Present: the search time baseline (here is what it costs us today), the query classification (here is why grep is insufficient for 40%+ of our queries), the tool comparison (here is how the tools performed on our real queries), and the security evaluation (here is why this tool meets our requirements). Ask for approval to run a 2-week pilot.

**Week 4: Champion Preparation**

| Day | Activity | Owner | Output |
|-----|----------|-------|--------|
| 16-18 | Champion uses tool daily, builds artifacts (Chapter 1 Phase 1) | Champion | Demo query, saved patterns, before-and-after example |
| 19 | Champion creates adoption plan document (Chapter 1 exercise) | Champion | Adoption plan with pilot group names |
| 20 | Install tool on pilot group machines, provide saved patterns | Champion | Pilot group ready |

At the end of week 4 (day 20), the pilot is ready to launch. The champion has proof artifacts, the pilot group is identified, and the tool is installed.

> **Key Insight**
>
> The evaluation phase feels slow: four weeks of measurement and analysis before anyone besides the champion uses the tool. This is intentional. The measurement baseline is what makes the rest of the plan defensible. Without "before" numbers, you cannot demonstrate "after" improvements. The four weeks of evaluation pay for themselves in credibility when you present the 90-day report (day 60 in the Phase 3 table).

---

### Phase 2: Pilot (Days 31-60)

The second 30 days are about proving value with a small group and preparing for rollout.

**Weeks 5-6: Pilot Execution**

| Day | Activity | Owner | Output |
|-----|----------|-------|--------|
| 21-30 | Pilot group uses tool (3-5 developers, 2 weeks) | Pilot group | Usage data |
| 25 | Week 1 check-in with pilot group (brief, 1-on-1) | Champion | Friction log |
| 30 | Pilot debrief meeting (30 min, all pilot members) | Champion + EM | Go/no-go decision for rollout |

The pilot instruction (repeated from Chapter 1): "When you would normally grep, try the semantic search instead. If it is worse, tell me."

During the pilot, track:
- Number of queries per developer per day (from tool telemetry)
- Any reported friction or failures
- Whether the saved patterns are being used or if developers are writing ad-hoc queries

The debrief decision criteria, stated for a five-person pilot (scale the thresholds for a smaller group):
- 3+ of 5 developers using it regularly: proceed to rollout
- 2 of 5: extend pilot by one week, fix identified friction
- 0-1 of 5: reconsider tool selection or timing
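
Written as a rule, the go/no-go gate leaves no room for reinterpretation in the debrief meeting. A minimal sketch, assuming a five-person pilot and that "using it regularly" has already been judged per developer; the function name is illustrative.

```python
def pilot_decision(regular_users: int) -> str:
    """Map the number of regular users in a five-person pilot to the next step."""
    if not 0 <= regular_users <= 5:
        raise ValueError("expected a count between 0 and 5")
    if regular_users >= 3:
        return "proceed to rollout"
    if regular_users == 2:
        return "extend pilot by one week, fix identified friction"
    return "reconsider tool selection or timing"


print(pilot_decision(4))  # proceed to rollout
print(pilot_decision(2))  # extend pilot by one week, fix identified friction
```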

**Weeks 7-8: Rollout Preparation**

| Day | Activity | Owner | Output |
|-----|----------|-------|--------|
| 31-33 | Set up auto-indexing in CI/CD (Chapter 4) | Champion + DevOps | Working CI workflow |
| 34-35 | Commit saved patterns and configuration to repo | Champion | Configuration in repo |
| 36 | Update onboarding doc with one-line instruction | Champion | Updated onboarding doc |
| 37-38 | Set up measurement infrastructure (Chapter 2) | Champion + EM | Telemetry configured, baseline recorded |
| 39-40 | Prepare rollout communication | EM | Email/Slack message to team |

The rollout preparation is the infrastructure work that makes adoption self-sustaining. After these two weeks, a new developer who clones the repo gets the saved patterns, runs one command to build the index, and starts searching. No champion needed. No setup guide. No meeting.

> **Warning**
>
> Do not skip the auto-indexing setup. It is tempting to roll out without CI/CD integration and "add it later." Later never comes. Meanwhile, indexes get stale, results degrade, and developers who had a good first experience have a bad second experience. Auto-indexing is not a nice-to-have — it is the foundation that prevents adoption regression.

---

### Phase 3: Rollout and Measure (Days 61-90)

The final 30 days are about team-wide deployment and demonstrating measurable results.

**Weeks 9-10: Team Rollout**

| Day | Activity | Owner | Output |
|-----|----------|-------|--------|
| 41 | Send rollout communication to team | EM | Team notified |
| 42-45 | Support team members setting up (should be 1 command) | Champion | Team installed |
| 41-50 | First two weeks of team-wide usage | Team | Usage data accumulating |

The rollout communication should be two paragraphs:

Paragraph 1: What the tool does and why we adopted it (reference the pilot results — "three of five pilot developers reported finding code faster for conceptual queries").

Paragraph 2: How to set up (one command) and where to find saved patterns (they are already in the repo).

No meeting. No training session. No mandatory adoption. The tool is available. The patterns are in the repo. The index stays fresh. Developers who want it use it. Developers who do not, do not. The adoption curve (Chapter 1) predicts 20% immediate, 50% gradual, 30% holdout. This is normal.

**Weeks 11-12: Measurement and Reporting**

| Day | Activity | Owner | Output |
|-----|----------|-------|--------|
| 51-55 | Collect 30-day measurement data | EM | KPI dashboard |
| 56-58 | Conduct second time-to-context audit (compare to baseline) | EM + 3 developers | Before-and-after comparison |
| 59 | Run monthly developer survey (Chapter 2) | EM | Survey results |
| 60 | Compile 90-day report | EM | Report to stakeholders |

The 90-day report is the deliverable that closes the loop. It contains:

1. **Baseline vs. current:** Time-to-context, token cost, cycle time, review turnaround — whatever you measured in week 1, measure again. Present the delta.
2. **Adoption metrics:** How many developers are using the tool? What is the semantic-to-grep ratio? What does the adoption curve look like?
3. **Qualitative feedback:** The best anecdotes from the developer survey. "I searched for 'session expiration handling' and found three conflicting implementations across two services" is worth more than any number.
4. **Operational metrics:** Index freshness, CI job success rate, indexing time trend. These confirm the infrastructure is healthy.
5. **Recommendation:** Continue, expand to other teams/repos, or sunset. Based on data, not opinions.
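
The first two items of the report come down to simple arithmetic. The sketch below shows one way to compute the deltas and the semantic-to-grep ratio. The metric names and every number are placeholders rather than results from a real team, and treating the ratio as the semantic share of all searches is an assumption about how you choose to report it.

```python
def percent_change(baseline: float, current: float) -> float:
    """Percent change from baseline; negative means the metric went down."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100


def semantic_to_grep_ratio(semantic_searches: int, grep_searches: int) -> float:
    """Share of all searches that were semantic, between 0 and 1."""
    total = semantic_searches + grep_searches
    return semantic_searches / total if total else 0.0


# Placeholder numbers, not results from any real team.
baseline = {"time_to_context_min": 20.0, "review_turnaround_hr": 10.0}
current = {"time_to_context_min": 15.0, "review_turnaround_hr": 9.0}

for metric, before in baseline.items():
    delta = percent_change(before, current[metric])
    print(f"{metric}: {before} -> {current[metric]} ({delta:+.1f}%)")

print(f"semantic share of searches: {semantic_to_grep_ratio(30, 70):.0%}")
```

Reporting the raw before and after values next to the percentage keeps the report honest: a 25% improvement on a 20-minute baseline reads very differently from the same percentage on a 2-minute one.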

---

### The Complete Timeline

| Week | Phase | Key Activity | Decision Gate |
|------|-------|-------------|---------------|
| 1 | Evaluate | Baseline measurement | - |
| 2 | Evaluate | Tool evaluation + security review | Short list approved |
| 3 | Evaluate | Decision meeting + tool selection | Tool selected, pilot approved |
| 4 | Evaluate | Champion preparation | Champion artifacts ready |
| 5-6 | Pilot | 2-week pilot (3-5 developers) | Go/no-go for rollout |
| 7-8 | Pilot | Rollout preparation (CI/CD, config, docs) | Infrastructure ready |
| 9-10 | Rollout | Team-wide deployment | - |
| 11-12 | Rollout | Measurement + 90-day report | Continue / expand / sunset |

*Figure 5: 90-day implementation timeline with decision gates.*

---

### Scaling Beyond the First Team

If the 90-day report shows positive results, the natural next step is expanding to other teams or repositories. The playbook is the same — Champion, Pilot, Rollout — but the timeline compresses because the infrastructure (CI/CD, cloud sync, measurement) is already in place.

A second-team rollout typically takes 30 days:

- Week 1: Identify champion on the new team, set up the tool, build team-specific saved patterns
- Week 2: Run a pilot with 3-5 developers on the new team
- Weeks 3-4: Roll out to the full team, configure service-aware settings for their codebase

Each subsequent team is faster because the organizational muscle memory exists. The champion on team 2 can reference team 1's results. The security evaluation is already done. The CI/CD infrastructure is extended rather than rebuilt.

Within 6-12 months, a 50-person engineering organization can have code intelligence deployed across all teams with measurable productivity improvements on each.

---

### Beyond Day 90: The Long-Term Roadmap

The 90-day plan gets you to initial adoption. The six-month and twelve-month horizons extend the value:

**Months 4-6: Deepen integration.** Connect the semantic search backend to your code review tool (automated context retrieval for PR reviewers). Integrate with your incident response workflow (on-call engineers search for service owners and relevant code during incidents). Build an internal MCP server that provides code context to your team's AI assistant. Each integration multiplies the value of the index you have already built.

**Months 7-9: Expand coverage.** Index additional repositories. If you started with the main application, add the shared libraries, the infrastructure-as-code repository, and the documentation site (which often contains valuable context about why code is structured the way it is). Cross-repo search — searching across all indexed repositories from a single query — becomes available as coverage grows.

**Months 10-12: Optimize and measure.** By this point, you have enough usage data to identify which query patterns produce the best results and which produce poor results. Consider fine-tuning the embedding model on your codebase's specific vocabulary. Evaluate whether the tool's ROI justifies upgrading to a team or enterprise tier. Present the annual review to leadership with twelve months of measurement data — the strongest possible case for continued investment.

The long-term roadmap is not a commitment. It is a set of options that become available as the initial adoption matures. Each extension is justified independently by the value it adds, not by a predetermined plan.

---

### What Success Looks Like at Day 90

If the implementation went well, here is what your organization looks like at day 90:

- **50%+ of developers** use semantic search at least once per day
- **Time-to-context** has decreased by 20-40% from the week 1 baseline
- **Token costs** have decreased by 80-95% if you were previously context flooding
- **The index is always fresh** — auto-indexing runs on every push, distribution is automatic
- **New developers** find code on their first day by searching concepts, not filenames
- **Code reviewers** use semantic search to find cross-cutting concerns before approving PRs
- **The tool is invisible** — developers do not think about it any more than they think about grep. It is just how search works now
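
The first two criteria are easy to check mechanically once the second audit is done. The sketch below assumes a hypothetical `Day90Snapshot` record that you fill in by hand from your own dashboard. The field names and sample values are illustrative, and the thresholds are the 50% adoption and 20% time-to-context figures from the list above.

```python
from dataclasses import dataclass


@dataclass
class Day90Snapshot:
    """Hypothetical container for the numbers gathered in weeks 11-12."""
    team_size: int
    daily_semantic_users: int         # developers searching at least once per day
    baseline_time_to_context: float   # minutes, from the week 1 audit
    current_time_to_context: float    # minutes, from the second audit


def check_day_90(s: Day90Snapshot) -> dict[str, bool]:
    """Compare a snapshot with the adoption and time-to-context targets."""
    if s.team_size <= 0 or s.baseline_time_to_context <= 0:
        raise ValueError("team size and baseline must be positive")
    adoption = s.daily_semantic_users / s.team_size
    reduction = 1 - s.current_time_to_context / s.baseline_time_to_context
    return {
        "adoption_at_least_50_percent": adoption >= 0.5,
        "time_to_context_down_20_percent": reduction >= 0.2,
    }


# Placeholder values for illustration only.
snapshot = Day90Snapshot(team_size=12, daily_semantic_users=7,
                         baseline_time_to_context=20.0,
                         current_time_to_context=15.0)
print(check_day_90(snapshot))
```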

If the implementation did not go well, you have data about why. The baseline measurements from week 1 and the pilot debrief from week 6 identify exactly where the friction was. Maybe the tool did not perform well on your codebase. Maybe the adoption friction was too high. Maybe the team was not ready for a new tool during a high-pressure quarter. All of these are valid outcomes that the data makes visible.

Either way, you made a data-driven decision. You tried, measured, and concluded. That is how engineering organizations should evaluate tools.

---

### Common Failure Modes and Recovery

Even well-executed 90-day plans encounter setbacks. Recognizing common failure modes early lets you recover before the plan derails entirely.

**Failure mode 1: The champion leaves the team.**

This happens more often than managers expect. The champion moves to another team, goes on leave, or leaves the company during the 90 days. Without the champion, the pilot group loses their point of contact and the rollout loses its driver.

Recovery: Identify a backup champion in the pilot group during Phase 2. The developer who is most enthusiastic during the pilot — the one who sends you the "I found something cool" Slack messages — is your backup. Ensure they have the same artifacts (demo query, saved patterns, before-and-after) so they can step in without starting over.

**Failure mode 2: The pilot coincides with a crunch.**

The team enters crunch mode during the pilot period — a major release, an incident, a deadline acceleration. Developers drop everything non-essential, and a new search tool is non-essential.

Recovery: Pause the pilot. Do not cancel it — pause it. Tell the pilot group: "Focus on the release. We will resume the pilot on [date]." When you resume, the developers already have the tool installed. They just need to start using it again. The restart takes days, not weeks.

**Failure mode 3: The tool performs poorly on your codebase.**

The tool demos well on standard codebases but produces mediocre results on yours. Maybe your codebase has unusual naming conventions. Maybe it is heavily generated code. Maybe the embedding model was not trained on your primary language.

Recovery: This is why the pilot exists. If the tool is not working, identify the specific failure pattern — "it misses all Go code" or "it does not understand our naming convention of prefixing everything with pkg_" — and determine whether the failure is fixable (configuration change, model swap) or fundamental (the tool is not designed for your use case). If fundamental, evaluate the next tool on your short list. The 30 days of measurement are not wasted — they give you concrete failure examples to discuss with the next vendor.
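
Finding the specific failure pattern is easier if the pilot group logs a few queries with a note on whether the top result was right. The sketch below groups such a log by language to compute first-result relevance. The log format and every entry in it are hypothetical, and grouping by language is only one possible cut; you could group by service or query type in the same way.

```python
from collections import defaultdict

# Hypothetical pilot log: (language of the code sought, was the top result correct?)
pilot_queries = [
    ("python", True), ("python", True), ("python", True), ("python", False),
    ("go", False), ("go", False), ("go", True),
    ("typescript", True), ("typescript", True),
]


def first_result_relevance(queries):
    """Return the fraction of queries per language whose top result was correct."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for language, top_result_correct in queries:
        totals[language] += 1
        hits[language] += int(top_result_correct)
    return {language: hits[language] / totals[language] for language in totals}


TARGET = 0.70  # the 70% first-result relevance target from the glossary

for language, rate in sorted(first_result_relevance(pilot_queries).items()):
    flag = "" if rate >= TARGET else "  <- below target"
    print(f"{language}: {rate:.0%}{flag}")
```

With a breakdown like this, "the tool feels weak" becomes "Go queries miss two times out of three", which is a concrete example you can take to the vendor.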

**Failure mode 4: Adoption plateaus below 50%.**

The rollout happens, but adoption stays at 20-30% after four weeks. The immediate adopters are using it daily, but the gradual adopters have not followed.

Recovery: Diagnose the specific barrier. The three most common are:

- Developers do not know the saved patterns exist. Solve this with a one-line reminder in the team channel.
- The index is stale and developers had a bad first experience. Solve this by fixing the auto-indexing pipeline.
- The tool's results are not notably better than grep for the queries this specific team runs. This may mean the tool is not right for this team's work, and that is a valid conclusion.

---

### Exercise

> **Try This**
>
> Create a 90-day project plan in your team's project management tool (Jira, Linear, Asana, or a spreadsheet). Using the timeline above:
>
> 1. Create 12 weekly milestones
> 2. Assign an owner to each major activity
> 3. Mark the three decision gates (week 2: short list, week 6: go/no-go, week 12: continue/expand/sunset)
> 4. Set calendar reminders for the two time-to-context audits (week 1 and week 12)
>
> Share the plan with your skip-level manager and your team. This is your implementation roadmap.
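
If you track the plan in a spreadsheet, the 12 milestones can be generated rather than typed. The sketch below expands the rows of the timeline table into one CSV row per week. The start date is a hypothetical Monday, the column names are illustrative, and attaching each decision gate to the last week of its span is an assumption about when the gate is decided.

```python
import csv
import sys
from datetime import date, timedelta

# (first week, last week, phase, key activity, decision gate) from the timeline table.
TIMELINE = [
    (1, 1, "Evaluate", "Baseline measurement", ""),
    (2, 2, "Evaluate", "Tool evaluation + security review", "Short list approved"),
    (3, 3, "Evaluate", "Decision meeting + tool selection", "Tool selected, pilot approved"),
    (4, 4, "Evaluate", "Champion preparation", "Champion artifacts ready"),
    (5, 6, "Pilot", "2-week pilot (3-5 developers)", "Go/no-go for rollout"),
    (7, 8, "Pilot", "Rollout preparation (CI/CD, config, docs)", "Infrastructure ready"),
    (9, 10, "Rollout", "Team-wide deployment", ""),
    (11, 12, "Rollout", "Measurement + 90-day report", "Continue / expand / sunset"),
]


def weekly_milestones(start: date) -> list[dict[str, str]]:
    """Expand the timeline into one row per week, beginning on `start`."""
    rows = []
    for first, last, phase, activity, gate in TIMELINE:
        for week in range(first, last + 1):
            rows.append({
                "week": str(week),
                "starts": (start + timedelta(weeks=week - 1)).isoformat(),
                "phase": phase,
                "activity": activity,
                # Assumption: a gate is decided at the end of its span.
                "decision_gate": gate if week == last else "",
            })
    return rows


rows = weekly_milestones(date(2026, 4, 6))  # hypothetical Monday start date
writer = csv.DictWriter(sys.stdout, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
```

Redirect the output to a file and import it into your tracker, then add the owner column by hand, since ownership depends on who your champion and pilot group turn out to be.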

---

### Key Takeaways

- The 90-day plan has three phases: Evaluate (days 1-30), Pilot (31-60), Rollout (61-90)
- Week 1 baseline measurement is the most important activity — without "before" numbers, you cannot demonstrate "after" improvements
- Three decision gates prevent wasted effort: short list approval, pilot go/no-go, and 90-day continue/expand/sunset
- Auto-indexing setup during the pilot phase prevents the adoption regression that kills most tool rollouts
- Success at day 90 is 50%+ adoption with measurable improvements in time-to-context and token costs

---


## Conclusion

You now have a complete playbook for deploying AI code search — not as a one-time experiment, but as a durable engineering capability. That distinction matters. Most teams that fail at this don't fail because they picked the wrong tool. They fail because they treated adoption as an event rather than a system. What you've built through these chapters is the system.

Three threads run through everything covered here, and they're worth naming explicitly because they'll keep surfacing long after the 90 days are done.

The first is that trust is the actual product. The technology largely works: semantic search over large codebases is a mature capability, and the pilot exists to catch the codebases where it falls short. What isn't solved, and what you have to actively build, is the confidence of the engineers who need to rely on it. That's why the Champion-Pilot-Rollout framework isn't a nicety. It's the mechanism that converts skeptics into advocates by giving them real results before you ask for real commitment. Every phase of that framework is designed around one question: how do we let engineers form their own judgment with low stakes? You don't win adoption by convincing people. You win it by making it easy to try.

The second thread is that integration depth determines staying power. Plenty of tools get installed and quietly disappear from workflows within a month. The ones that stick are the ones woven into how the team already operates — into CI/CD, into PR review, into the monorepo tooling that runs every day whether or not anyone is paying attention. That's not just a technical decision. It's a signal to the team about whether leadership considers this important enough to build around, or just nice enough to try. Surface-level integrations get surface-level adoption. Deep integrations become infrastructure.

The third thread is that rollout quality compounds. Teams that run a clean 90-day implementation — with actual metrics, real champions, and an honest retrospective — end up with a template they can reuse. The next tool they adopt takes half the effort. The one after that, less. The teams that skip the rigor get the opposite: a graveyard of half-deployed tools and a culture that's learned to wait out the next initiative. How you do this rollout shapes how future rollouts go.

Here's what to do Monday morning: identify one engineer on your team who already complains about codebase navigation. Not someone who is enthusiastic about AI tooling in general — someone who has a specific, recurring pain point. Get them access to the search tool this week. Give them one concrete task where the tool should help. Watch what they do. Don't set expectations. Don't give a tutorial. Just observe. What they struggle with, what surprises them, what they reach for instinctively — that's your real integration spec. Everything you build around the rollout should solve for that engineer's actual experience, not the idealized workflow you drew on a whiteboard.

The reason people don't apply what they've read is almost never confusion. It's scope paralysis. The full 90-day plan looks like a project that needs a project manager, stakeholder buy-in, and a dedicated sprint. So nothing happens. The way through that is to collapse the scope to the smallest possible true first step, which is why Monday morning matters more than Q3 planning. One engineer. One task. This week. That's a pilot of one, and it's enough to start learning. You can build from there. You cannot build from a plan that never starts.

The stakes here are straightforward, and I'd rather say them plainly than bury them in motivation.

If you act on this, you'll have a team that can navigate an unfamiliar codebase in minutes instead of days. Engineers will spend more time building and less time asking each other where things live. Onboarding will stop being a six-week archaeology project. Reviews will catch more, faster. The compounding effect of that across a year — in shipping velocity, in reduced interruption cost, in the quality of architectural decisions made by people who actually understand the codebase — is real and significant. These aren't soft benefits. They show up in cycle time, in incident response, in the confidence of engineers making changes they'd previously have been afraid to touch.

If you don't act on this — or if you act on it the way most teams do, meaning you install something, call it a pilot, and never run the retrospective — you'll stay where you are. Which is fine, until it isn't. The teams investing in this now are building a structural advantage in how fast they can move. That gap doesn't close on its own. Codebases grow. Teams scale. The cost of poor navigation compounds exactly the way the benefit of good navigation does, just in the wrong direction.

The framework is here. The 90-day plan is specific enough to execute. The only variable left is whether you start.

---

# Back Matter

---

## Appendix A: Glossary

| Term | Definition |
|------|-----------|
| Adaptive threshold | A dynamic similarity score cutoff that adjusts based on the score distribution of results for a given query, preventing both too many and too few results from being returned. |
| Context flooding | The practice of sending large volumes of code (often entire files or repositories) to an LLM context window, regardless of relevance. Results in high token costs and reduced accuracy. |
| Context routing | The practice of using semantic search to identify and deliver only the most relevant code chunks to an LLM or developer, reducing token costs and improving response accuracy. |
| Context-switching cost | The cognitive overhead of interrupting deep work to perform a different task (like searching for code), including the 15-23 minutes required to regain full focus afterward. |
| First-result relevance | The percentage of queries where the top-ranked result is the code the developer actually needed. A key quality metric for search tools, with a target of 70%+ for semantic queries. |
| Hybrid retrieval | A search approach that combines keyword matching (BM25) and semantic matching (embedding similarity) to return results that satisfy both exact-match and conceptual queries. |
| Incremental indexing | Updating only the changed portions of an index rather than rebuilding the entire index from scratch. Enables fast re-indexing on every code change. |
| Knowledge drain | The organizational loss of codebase understanding when developers leave a team, take leave, or rotate to different projects. Semantic search tools mitigate this by making codebase knowledge retrievable rather than personal. |
| Local-first | An architecture where data processing (indexing, embedding, search) happens on the developer's machine rather than on remote servers. Code never leaves the local filesystem. |
| MCP (Model Context Protocol) | A protocol that allows AI assistants to connect to external context sources (like code search indexes) to retrieve relevant information during conversations. |
| Monorepo | A single repository containing multiple services, libraries, or projects. Common in large engineering organizations. Creates unique challenges for code search due to naming collisions and scale. |
| Semantic search | Search based on meaning rather than keyword matching. Uses embedding models to find code that implements a concept, even when the naming conventions differ from the search query. |
| Semantic-to-grep ratio | The proportion of a developer's searches performed using semantic search versus keyword search (grep). An increasing ratio indicates growing adoption and trust in the semantic search tool. |
| Service-aware indexing | Partitioning a code index by service or component boundaries, enabling scoped search and context-dependent ranking within monorepos. |
| Similarity score | A numerical value (typically 0 to 1) indicating how semantically similar a code chunk is to a search query. Higher scores indicate closer semantic matches. |
| Time-to-context | The elapsed time from when a developer begins searching for code to when they have sufficient understanding to proceed with their task. The primary productivity KPI for code intelligence. |
| Vocabulary gap | The mismatch between the terms a developer uses in a search query and the terms used in the code implementation. Semantic search bridges this gap; keyword search cannot. |

---

## Appendix B: Tools & Resources

| Tool / Resource | URL | Purpose |
|----------------|-----|---------|
| Pyckle (code-mcp) | pyckle.co | Local-first semantic code search with hybrid retrieval and context routing |
| GitHub Actions | github.com/features/actions | CI/CD platform for automating indexing workflows |
| GitLab CI | docs.gitlab.com/ee/ci/ | CI/CD platform for automating indexing workflows |
| Jenkins | jenkins.io | Self-hosted CI/CD for enterprise auto-indexing pipelines |

---

## Appendix C: Further Reading

- **"The SPACE of Developer Productivity: There's More to It Than You Think"** by Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler (ACM Queue, 2021). Introduces the SPACE framework for developer productivity metrics, which is relevant to Chapter 2's measurement approach.

- **"Code Search, Decoded" series** (pyckle.co/blog). 20-episode technical series covering semantic search from first principles through advanced team workflows. Episodes referenced in this guide: Episode 16 (CI/CD auto-indexing), Episode 18 (monorepo search), Episode 20 (team adoption).

- **"The DORA Metrics"** — dora.dev. Industry-standard framework for measuring software delivery performance. Complementary to the KPIs in Chapter 2.

- **"The Code Intelligence Buyer's Guide"** — David Kelly Price (Pyckle, 2026). The companion book covering the evaluation side: what code intelligence is, how to evaluate tools, the economics of context flooding vs. semantic routing, and security and compliance considerations. Start there if you have not yet decided to adopt.

---

## About the Author

David Kelly Price is the founder of Pyckle, which builds AI context optimization tools for development teams. His background is in AI/ML tooling, retrieval systems, and context routing for codebases. He holds an MBA in Finance and applies that analytical rigor to technical problems.

---

## About Pyckle

Pyckle builds local-first code intelligence tools for development teams. The core product, code-mcp, provides semantic code search, hybrid retrieval, and context routing that runs entirely on the developer's machine. Code is indexed locally, embeddings are generated locally, and search queries are processed locally. Nothing leaves the machine unless the developer explicitly opts into team sync features.

The tool integrates with AI coding assistants via MCP (Model Context Protocol), providing semantically relevant code context to LLMs without flooding the context window. For teams, Pyckle offers shared search patterns, CI/CD auto-indexing, and usage analytics — all built on a local-first architecture where source code stays on developer machines and only metadata crosses the network boundary.

---

*Rolling Out AI Code Search — Version 1.0 — March 2026*
*Published by Pyckle (pyckle.co)*

*© 2026 Pyckle. All rights reserved. This guide may be shared freely for personal and educational use. Commercial reproduction or redistribution requires written permission. Contact kellyprice@pyckle.co.*

---

## Related Blog Posts

- [Configuration Should Travel with You](https://pyckle.co/blog/configuration-should-travel-with-you.html)
- [Your Team's Knowledge Lives in Multiple Places](https://pyckle.co/blog/your-teams-knowledge-lives-in-multiple-places-and-your-ai-only-sees-one.html)
