Why the most important SEO decisions happen before you write code — architecture, the 2026 Core Web Vitals changes, AI-search visibility, and Information Gain.
A site goes live. Everyone’s proud of how it looks. And then… nothing. Weeks go by, traffic trickles in at a pace that would embarrass a brochure, and by the third month it finally clicks: search engines barely know the site exists.
Hey, my name is Roman Makuev. I lead the SEO and design team at Neon Team, and after fourteen years and more projects than I can neatly count, I ended up building my own SEO analysis tool, Algorithm, to make the work faster and more precise instead of guessing.
I’ve watched this play out more times than I’d like to admit. The frustrating part is that it’s almost never a design problem or a content problem in isolation. It’s a timing problem. Most teams treat SEO as something you bolt on after launch, when in reality the decisions that matter most — the ones that are painful and expensive to undo — all happen before anyone writes code.
When you build with SEO baked in from the start, you skip the bulk of the cleanup work that otherwise eats three to six months after launch. You also dodge the structural rewrites that quietly kill ranking potential. A site planned for search from day one indexes faster, climbs higher, and holds its position more reliably than one retrofitted in a panic. This is the approach my team takes on every build — we treat web development and SEO as one process rather than two phases, and we even write the first batch of blog content during development, scored by our own analyzer before a word goes live. (More on why that matters later — it’s not vanity, it’s defense against a domain-wide penalty most people don’t see coming.)
TL;DR
- The expensive SEO decisions (keyword strategy, architecture) happen before code. Retrofitting costs 3–6 months.
- Core Web Vitals thresholds changed in 2026 — and FID is dead. If your checklist still says “FID under 100ms,” it’s wrong.
- AI search (AI Overviews, ChatGPT, Perplexity) is now a real visibility channel. Only ~38% of AI Overview citations come from top-10 pages — down from 76% a year earlier.
- Google now scores your domain as a whole. One section of thin, rehashed content can drag down your money pages.
- The winning play isn’t more content. It’s Information Gain — bringing something to the index that wasn’t there before.
People tend to picture SEO and web development as two separate jobs handed to two separate teams: the developers build the thing, then the SEO people show up afterwards to “optimize” it. That hand-off is exactly where most ranking potential leaks away. In practice, SEO and web development are the same job viewed from two angles — every meaningful SEO decision is a build decision, and most build decisions quietly carry SEO consequences whether anyone meant them to or not.
Think about what actually gets decided in code. How the site is structured. How URLs are formed. Whether pages render in a way a crawler can read. How fast the thing loads on a mid-range phone. What markup describes the content to a machine. None of those are “add it later” tasks — they’re architecture, and architecture is the developer’s domain. By the time an SEO specialist is brought in post-launch to fix them, the cheap window has closed and you’re paying for rewrites instead of decisions. That’s why, when web development and SEO live in the same process, the work is mostly invisible: there’s no separate “SEO phase” because the right choices were already baked into how the site was built.
The point isn’t that developers must become SEO experts or vice versa. It’s that the two have to share a plan before anyone writes code — agree on the structure, the URL patterns, the rendering approach, the content the site needs to own. Get that right and SEO stops being a costly cleanup project and becomes a property of a well-built site. The rest of this guide is really just a breakdown of which of those decisions matter most, and when each one needs to be made.
It helps to stop thinking of SEO as a task and start thinking of it as a set of decisions spread across the project. There are three windows where those decisions land.
Before development. This is where your keyword strategy (the “semantic core”) and your site architecture get locked in. Change your mind later and the cost climbs steeply: you’re not editing a page, you’re re-pouring the foundation.
During development. URL patterns get encoded. You choose how pages render. Meta tags, schema, the whole technical skeleton — it all gets wired up here.
Before launch. Everything gets tested and audited. A problem caught in this window is a quick fix. The same problem caught a month after launch is an emergency, complete with re-crawling and lost ground.
Once you see it this way, SEO stops being the thing you cram in at the end. It’s threaded through the entire timeline.
| Phase | What gets decided | Cost of changing it later |
|---|---|---|
| Before development | Semantic core, site architecture | Highest — structural rewrite |
| During development | URLs, rendering, meta, schema | Medium — redirects, re-crawl |
| Before launch | QA, validation, audit | Low — quick fixes |
| After launch | Everything above, in panic mode | Emergency — lost rankings |
The semantic core is just the structured answer to one question: what is this site actually about, and what are people searching for when they need it? Here’s the process I use.
1. Market intelligence. Look at what competitors rank for, find the gaps they’ve left open, and spot seasonal patterns. Semrush, Ahrefs, AnswerThePublic and Google Trends each show you a different slice of this.
2. Collect keywords. Pull together your primary topics (around 5–15 core themes), the long-tail variations that reveal real intent (specific three-plus-word phrases), and the question-shaped queries — the “how to,” “what is,” “where to find” stuff. That last bucket matters more than ever in the AI-search era, and I’ll come back to why.
3. Cluster and tag by intent. Group keywords that mean roughly the same thing, then label each cluster by what the searcher actually wants: to learn (informational), to find a specific site (navigational), to compare (commercial), or to buy (transactional). Intent is the part most people skip, and it’s the part that makes everything downstream easier.
This is also the step where precision pays off, and where I lean on tooling rather than eyeballing it. Loose clustering — lumping “best X” with “what is X” — quietly wrecks your architecture, because you end up building one page for two different intents and it ranks for neither. We built our own clustering engine specifically to keep intent boundaries clean at scale, but the principle holds regardless of tool: a cluster is only a cluster if every keyword in it shares the same intent.
4. Map topics to the site. Every cluster should correspond to a real area of the site. This is the step that quietly dictates your URL structure and navigation, because it tells you which pages need to exist and how they relate.
5. Write it down. Build one reference sheet — topic pillar, primary keyword, search volume, difficulty, long-tail variations, target URL, intent — and share it with everyone touching the project. Designers, devs, writers. A semantic core that lives in one person’s head isn’t a strategy, it’s a liability.
The thing to resist is the urge to “figure out keywords later.” By the time development is underway, your architecture is already hardening around whatever assumptions you made. Do this first.
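To make the intent rule from step 3 concrete, here is a minimal sketch of a check that flags clusters mixing more than one intent. The trigger phrases, the `tag_intent` helper and the sample clusters are all hypothetical and chosen for illustration; this is not how any particular clustering engine works.

```python
# Hypothetical trigger phrases; a real project would tune these per market.
INTENT_RULES = [
    ("transactional", ("buy", "price", "order", "discount")),
    ("commercial", ("best", "vs", "review", "top")),
    ("informational", ("how to", "what is", "why", "guide")),
]


def tag_intent(keyword: str) -> str:
    """Return the first intent whose trigger phrase appears as whole words."""
    padded = f" {keyword.lower()} "
    for intent, triggers in INTENT_RULES:
        if any(f" {trigger} " in padded for trigger in triggers):
            return intent
    return "unlabeled"  # needs a human decision (often navigational)


def mixed_intent_clusters(clusters: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return only the clusters whose keywords do not share a single intent."""
    mixed = {}
    for name, keywords in clusters.items():
        intents = {tag_intent(keyword) for keyword in keywords}
        if len(intents) > 1:
            mixed[name] = intents
    return mixed


clusters = {
    "crm-overview": ["what is crm", "best crm for startups"],
    "crm-pricing": ["crm price", "buy crm license"],
}
print(mixed_intent_clusters(clusters))
```

Here `crm-overview` is flagged because it mixes an informational query with a commercial one, which is the “one page for two intents” problem described above.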
Your structure decides whether search engines can efficiently reach your content, which pages accumulate authority, and how quickly new pages get indexed. A few principles I won’t budge on:
Keep the hierarchy shallow and logical. Three to five primary categories, subcategories under those, individual pages under those. Aim to keep everything within three clicks of the homepage — pages three clicks deep get crawled far more thoroughly than ones buried six clicks down.
Keep URLs flat. Three levels deep, max. /resources/blog/seo-techniques is fine. /resources/guides/content/seo/techniques/best-practices is a maze.
Organize by theme, not by whim. Build pillar pages that cover a major topic comprehensively, then support each with cluster pages that go deep on subtopics. Interlink them deliberately — that’s how you signal to a search engine that you actually own a subject rather than just touching it.
Leave no page orphaned. Every page worth having should be reachable from navigation, footer, or an internal link. If nothing points to a page, search engines treat it like it doesn’t exist — and honestly, so do users.
A visual sitemap drawn before development becomes your blueprint. It’s worth the half hour.
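The same principles can be checked mechanically once the sitemap exists as data. The sketch below assumes a hypothetical `links` mapping from each page path to the paths it links to, and reports pages that cannot be reached from the homepage, pages more than three clicks away, and URLs nested more than three levels deep.

```python
from collections import deque


def audit_structure(links: dict[str, list[str]], home: str = "/",
                    max_clicks: int = 3, max_url_levels: int = 3) -> dict[str, list[str]]:
    """Flag unreachable pages, pages too far from home, and over-nested URLs."""
    depth = {home: 0}
    queue = deque([home])
    while queue:  # breadth-first search yields the minimum click depth
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    pages = set(links) | {t for targets in links.values() for t in targets}
    return {
        "orphaned": sorted(p for p in pages if p not in depth),
        "too_deep": sorted(p for p, d in depth.items() if d > max_clicks),
        "nested_urls": sorted(
            p for p in pages if len(p.strip("/").split("/")) > max_url_levels
        ),
    }


site = {
    "/": ["/services", "/resources"],
    "/services": ["/services/seo-audit"],
    "/resources": ["/resources/blog"],
    "/resources/blog": ["/resources/blog/seo-techniques"],
    "/services/seo-audit": [],
    "/resources/guides/content/seo/techniques/best-practices": [],
}
print(audit_structure(site))
```

In this sample the long guides URL is reported twice: nothing links to it, and it sits six levels deep.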
A few URL habits that pay off forever:
- Hyphens between words, not underscores (on-page-seo, never on_page_seo)
And the technical checklist that should be in place at launch:
robots.txt — Block the stuff crawlers don’t need (admin, staging, internal search results) while keeping important resources open. Point to your sitemap here. One addition for 2026 that wasn’t on anyone’s radar a couple years ago: decide deliberately how you treat AI crawlers like GPTBot, ClaudeBot, PerplexityBot and Google-Extended. Allowing them is how your content becomes eligible to be cited inside AI answers — more on that below.
sitemap.xml — List every page that matters, auto-generated through your CMS so it stays current.
Canonical tags — Self-referencing on unique pages, so you don’t accidentally split authority across duplicate URLs.
HTTPS — Non-negotiable. SSL certificate, clean HTTP→HTTPS redirects, and a Strict-Transport-Security header.
Mobile — Responsive is the floor, not a feature. Test on real devices: content reachable without zooming, tap targets at least 48px. Over half of all search traffic is mobile; a layout that “mostly works” on a phone is a layout that mostly loses.
Core Web Vitals — These are confirmed ranking factors, and several numbers people memorized a few years ago are now wrong. Here’s the current state:
| Metric | What it measures | “Good” threshold | Note |
|---|---|---|---|
| LCP (Largest Contentful Paint) | Loading speed | under 2.0s | Tightened from 2.5s in Google’s March 2026 update; 2.0–2.5s is now “needs improvement” |
| INP (Interaction to Next Paint) | Responsiveness | under 200ms | Replaced FID entirely in March 2024 — FID is dead, don’t optimize for it |
| CLS (Cumulative Layout Shift) | Visual stability | under 0.1 | Give every image, video, iframe and ad slot explicit dimensions |
The INP one trips people up most. If your checklist still references “First Input Delay under 100ms,” it’s two years out of date — Chrome dropped FID support entirely, and INP is stricter because it watches every interaction across a session, not just the first click.
One nuance worth knowing: these are measured from real users via the Chrome User Experience Report (field data), at the 75th percentile — not from a one-off lab test on your laptop. A page can score beautifully in PageSpeed Insights’ lab simulation and still fail in the field. You hit the targets through image compression, trimming CSS and JavaScript, caching, and a CDN.
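A short sketch of why the lab and field numbers can disagree. It takes the “good” thresholds from the table above and classifies a page by the 75th percentile of its field samples. The nearest-rank percentile and the sample values are assumptions for illustration; the exact aggregation used by the Chrome User Experience Report may differ.

```python
import math

# "Good" thresholds as listed in the table above
# (LCP in seconds, INP in milliseconds, CLS unitless).
GOOD = {"lcp": 2.0, "inp": 200, "cls": 0.1}


def p75(samples: list[float]) -> float:
    """75th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def passes(metric: str, samples: list[float]) -> bool:
    """True when the 75th-percentile value is at or under the threshold."""
    return p75(samples) <= GOOD[metric]


lab_lcp = 1.4  # a single fast run on a developer laptop
field_lcp = [1.2, 1.5, 1.8, 1.9, 2.4, 2.6, 3.1, 1.1]  # hypothetical real-user loads

print(lab_lcp <= GOOD["lcp"])    # the lab run looks fine
print(p75(field_lcp))            # what the field data reports
print(passes("lcp", field_lcp))  # and whether that passes
```

Half of these sample visits load in under two seconds, yet the page still fails, because a quarter of users wait 2.4 seconds or more.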
And there’s a quieter reason to care, beyond the raw ranking factor: speed moves money. In a joint Deloitte/Google study across dozens of brands, a 0.1-second improvement in load time lifted retail conversions by around 8% and travel-booking conversions by around 10%. Core Web Vitals aren’t an abstract technical exercise — every saved millisecond shows up downstream.
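Returning to the robots.txt item in the checklist above: the rules are easy to test before launch with Python’s standard-library parser. The file content, paths and domain below are hypothetical. Note that the AI crawlers get their own group with the blocked paths repeated, because a crawler that matches a named group ignores the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; paths and domain are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Disallow: /admin/
Disallow: /search
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "GPTBot", "ClaudeBot"):
    for path in ("/blog/seo-techniques", "/admin/users", "/search"):
        allowed = parser.can_fetch(agent, f"https://example.com{path}")
        print(f"{agent:10} {path:22} {'allow' if allowed else 'block'}")

print(parser.site_maps())
```

Python’s parser applies the first matching rule in a group, which is why the `Disallow` lines come before `Allow: /` here. Other parsers may resolve conflicts differently, so it is worth confirming the result with each crawler’s own documentation.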
Schema turns your plain text into structured data that search engines (and increasingly, AI models) can read without guessing. It’s what powers rich results, featured snippets, and the answers that show up in voice and AI search.
The types worth implementing on most sites include Organization, BreadcrumbList, FAQ (only where a visible FAQ exists) and Article.
Use JSON-LD. It’s clean, it lives separately from your HTML, and it’s the format Google explicitly prefers. Here’s the shape of it, so it’s concrete rather than abstract:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "SEO Starts Before the First Line of Code",
  "datePublished": "2026-06-08",
  "author": {
    "@type": "Person",
    "name": "Roman Makuev",
    "url": "https://seo-algorithm.com/"
  },
  "image": "https://example.com/cover.png"
}
</script>
```
One rule that trips people up: your schema must match what’s actually visible on the page. Don’t mark up an FAQ that isn’t there. Before launch, run everything through Google’s Rich Results Test.
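If the markup is generated from the same fields the template renders, it is much harder for schema and page to drift apart. The sketch below builds the Article markup shown above from page data and runs a crude visibility check. The function names are hypothetical, and the check is no substitute for the Rich Results Test.

```python
import json


def article_jsonld(headline: str, date_published: str, author_name: str,
                   author_url: str, image_url: str) -> str:
    """Serialize Article markup as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "image": image_url,
    }
    # json.dumps does not escape "</", so untrusted strings need extra handling.
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


def headline_is_visible(headline: str, visible_text: str) -> bool:
    """Crude check that the marked-up headline also appears in the page text."""
    return headline.casefold() in visible_text.casefold()


tag = article_jsonld(
    headline="SEO Starts Before the First Line of Code",
    date_published="2026-06-08",
    author_name="Roman Makuev",
    author_url="https://seo-algorithm.com/",
    image_url="https://example.com/cover.png",
)
page_text = "SEO Starts Before the First Line of Code. A site goes live."
print(tag)
print(headline_is_visible("SEO Starts Before the First Line of Code", page_text))
```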
Here’s what’s genuinely new and what most older “SEO before development” guides don’t mention at all. Search isn’t just ten blue links anymore. A growing share of queries get answered directly — inside Google’s AI Overviews, or inside ChatGPT, Perplexity, Gemini and Claude — without the user ever clicking through. If your content can’t be read, understood, and cited by those systems, you’re invisible in a channel that’s quietly eating a chunk of your traffic.
And here’s the data point that should reframe how you think about it. Ahrefs analyzed 863,000 queries (Oct 2025 – Feb 2026) and found that only about 38% of citations in AI Overviews come from pages ranking in the top 10, which means roughly 62% come from outside it. A year earlier, the top-10 share was 76%. (Sources: Ahrefs, Search Engine Journal.) Translation: ranking #1 no longer guarantees you a seat in the AI answer. Being cited is becoming its own discipline, and it rewards depth and source-worthiness over pure position.
The reassuring part: there’s no separate dark art here. Google itself published a guide in May 2026 confirming that strong, traditional SEO — original content, a crawlable and well-structured site, a clear user experience — is still the foundation for showing up in AI Overviews and AI Mode. It also pushed back on some of the hype, treating tactics like llms.txt files and artificially “chunking” content as unproven. So treat the genuinely new stuff as a thin layer on solid fundamentals, not a replacement.
What that layer actually involves:
Answer the question early. AI systems extract facts. If the direct answer to a page’s core question is buried in paragraph nine, it won’t get pulled. Put a clear, self-contained answer in the first hundred words or so, and phrase headings as the questions people actually ask.
Make authority verifiable. This is where E-E-A-T stops being a buzzword and becomes a technical requirement. AI systems cite authors with a real digital footprint — a LinkedIn profile, mentions in the Knowledge Graph, external references. No expert profile, no seat in the AI answer. If you describe a case study but there’s nothing online to corroborate it (no author trail, no documents, no mentions), Google’s grounding checks treat it as unverified and your trust score takes the hit.
Lean even harder on structured data. LLMs rely on schema more than traditional crawlers do — it’s how they reliably extract facts instead of inferring them. The schema work above isn’t just for rich snippets anymore; it’s how you stay legible to AI.
Offer something that isn’t a commodity. When AI can summarize the generic version of any topic in two sentences, the only content worth citing is content that says something the model couldn’t generate on its own. Which brings us to the single most important shift of all.
If you take one idea from this whole piece, take this one.
Topic authority in 2026 is awarded for added value, not volume. Google has gotten ruthless about excluding rehashes of whatever’s already in the top 10. If an article doesn’t bring new data, a fresh case, a different angle, or a connection between ideas that wasn’t already out there, it doesn’t earn authority — it can sit at zero no matter how long or well-formatted it is. The term for this is Information Gain: how much new you added to the index, not how many words you produced.
This has a hard consequence most people underestimate: Google now evaluates the quality of your domain as a whole. A blog full of thin, rehashed, AI-spun articles published “for volume” doesn’t just fail to rank itself — it drags down the trust of your entire domain, including your service and money pages. I’ve audited sites where the services pages were genuinely good but couldn’t rank because a neglected blog of 60 low-effort posts had poisoned the host-level signals. The commercial keywords simply wouldn’t climb until the dead weight was dealt with.
This is exactly why my team writes real, scored content during development rather than spinning up filler later. Before anything publishes, we run it through our own analyzer — it flags AI-pattern density, checks whether the piece actually adds information versus paraphrasing competitors, and grades it against the genre. A page that scores as a rehash never goes live, because one weak cluster can cost the whole domain. The tooling is ours (Algorithm), but the discipline is portable: measure information gain before you publish, not after you’ve tanked.
A practical way to think about angle: instead of writing the hundredth “how to choose X,” write “why you probably don’t need X” — the contrarian, experience-based take that no scraper can assemble from the existing top 10. One article with a genuine insight will out-earn a hundred pages of AI water.
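For a rough sense of what “measure information gain before you publish” can look like, here is a deliberately crude proxy: the share of three-word sequences in a draft that appear in none of the competing pages. This is an illustrative assumption, not the method used by Algorithm or by Google. It only catches paraphrase-level overlap and says nothing about whether the new material is true or useful.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def novelty_ratio(draft: str, competitors: list[str], size: int = 3) -> float:
    """Fraction of the draft's shingles that no competitor text contains."""
    draft_shingles = shingles(draft, size)
    if not draft_shingles:
        return 0.0
    seen = set().union(*(shingles(text, size) for text in competitors))
    return len(draft_shingles - seen) / len(draft_shingles)


top_10 = ["Choose a CRM by comparing features and price."]
rehash = "Choose a CRM by comparing features and price."
field_note = "We cut onboarding from nine days to two."

print(novelty_ratio(rehash, top_10))
print(novelty_ratio(field_note, top_10))
```

A score near zero is a reasonable trigger for a human to ask what the piece adds. A high score only means the wording is different, so it still needs an editor’s judgment.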

Your title tag is the one piece of on-page real estate that earns or loses the click before anyone reads a word, so I treat it as a tiny ad rather than a label. The pattern I keep coming back to is the keyword first, then the value, then a modifier, something like On-Page SEO Checklist: 10 Actionable Techniques [2026]. Keep it under sixty characters or so, put the part that makes someone curious up front, and never reuse a title anywhere else on the site. Stuffing it with keywords does the opposite of what people hope: it reads as spam and costs you the click.
Meta descriptions are a different job. They won’t move your rankings on their own, but they decide whether the listing looks worth clicking, so I write them like a one-line pitch: the benefit, a quick why, and a nudge to act, all inside roughly 155–160 characters with the main keyword sitting in there naturally rather than jammed in.
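These limits are easy to enforce in a build step. The sketch below assumes a hypothetical `pages` mapping of URL to title and description, and uses the rough limits mentioned above (60 and 160 characters) as defaults.

```python
from collections import Counter


def lint_snippets(pages: dict[str, dict[str, str]],
                  title_max: int = 60, desc_max: int = 160) -> list[str]:
    """Report over-long titles, reused titles and over-long descriptions."""
    problems = []
    title_counts = Counter(p["title"].strip().lower() for p in pages.values())
    for url, page in pages.items():
        title = page["title"].strip()
        desc = page["description"].strip()
        if len(title) > title_max:
            problems.append(f"{url}: title is {len(title)} characters (limit {title_max})")
        if title_counts[title.lower()] > 1:
            problems.append(f"{url}: title is reused on another page")
        if len(desc) > desc_max:
            problems.append(f"{url}: description is {len(desc)} characters (limit {desc_max})")
    return problems


pages = {
    "/on-page-seo": {
        "title": "On-Page SEO Checklist: 10 Actionable Techniques [2026]",
        "description": "A short, benefit-led pitch with a reason to click.",
    },
    "/on-page-seo-copy": {
        "title": "On-Page SEO Checklist: 10 Actionable Techniques [2026]",
        "description": "x" * 200,
    },
}
for problem in lint_snippets(pages):
    print(problem)
```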
Headings are where a lot of sites quietly trip themselves up. One H1 per page, carrying the primary keyword, and then H2s and H3s to break the thing into a shape that actually mirrors how the content flows. Jumping straight from an H2 to an H4 because it “looks right” confuses screen readers and crawlers alike, and it’s the kind of thing nobody notices until an audit flags it.
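Catching this before the audit does is straightforward. Here is a minimal sketch using only the standard library; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1 to h6 tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list[str]:
    """Flag a missing or repeated h1 and any skipped heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one h1, found {levels.count(1)}")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} is followed by h{current}")
    return problems


page = "<h1>On-Page SEO</h1><h2>Title tags</h2><h4>Length</h4>"
print(heading_problems(page))
```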
Then there’s internal linking, which I think is the most underused tool on this list. The model that works is pillar-and-cluster: a broad pillar page that owns the topic, surrounded by cluster pages that go deep on the narrow stuff, with every cluster pointing back up to the pillar and the pillar pointing back down to its clusters.
```text
Pillar: "Content Marketing Strategy: Complete Guide"
├── Cluster: "Content Marketing for SaaS"
├── Cluster: "Content Calendar Creation"
└── Cluster: "Measuring Content ROI"
```
That web of links is how you tell a search engine you’ve actually covered the ground instead of just brushing against it. A few things make it work in practice: anchor text that describes where the link goes (so, not “click here”), the important links sitting high on the page where they get noticed, restraint — a handful of links that mean something beats a wall of them — and the habit of pointing your strongest existing pages at anything new, so the new stuff inherits a little of that authority.
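The two-way rule is simple to verify once internal links have been crawled into a map. In this sketch the `links` mapping and the page paths are hypothetical; the function returns every missing direction as a (from, to) pair.

```python
def missing_cluster_links(pillar: str, clusters: list[str],
                          links: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where a pillar/cluster link is absent."""
    missing = []
    for cluster in clusters:
        if cluster not in links.get(pillar, set()):
            missing.append((pillar, cluster))  # pillar should point down
        if pillar not in links.get(cluster, set()):
            missing.append((cluster, pillar))  # cluster should point back up
    return missing


pillar = "/content-marketing-strategy"
clusters = ["/content-marketing-saas", "/content-calendar", "/measuring-content-roi"]
links = {
    "/content-marketing-strategy": {"/content-marketing-saas", "/content-calendar"},
    "/content-marketing-saas": {"/content-marketing-strategy"},
    "/content-calendar": set(),
    "/measuring-content-roi": {"/content-marketing-strategy"},
}
for source, target in missing_cluster_links(pillar, clusters, links):
    print(f"add a link from {source} to {target}")
```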
Once a month is plenty. I’d watch organic traffic, the average position of your target keywords, click-through rate, how many pages are actually sitting in the index, and your Core Web Vitals baselines. And there’s one more worth adding lately: whether the AI tools mention you at all. The low-tech way to check is to open ChatGPT, Gemini, Perplexity and Copilot and ask them the questions your customers ask — then see if your name comes up, and whether what they say about you is even right.
For the rest, Google Search Console and Google Analytics 4 do most of the heavy lifting and they’re free, so there’s no excuse not to have both. Add a rank tracker (Semrush, Ahrefs, Moz, whichever you already pay for) and a speed tool like PageSpeed Insights or GTmetrix so you can see the gap between your lab numbers and what real users experience.
The loop itself is simple, and the discipline is in actually running it. Look at the trends. Find the openings — a page pulling lots of impressions but few clicks usually just needs a better title and description; one stuck around position eight to ten often needs more depth, the real-information kind rather than padding; a new cluster of queries showing up is a hint to build something for it. Make your changes, then leave them alone for four to six weeks before you decide whether they worked, because search rarely moves on the timeline you’d like.
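The “find the openings” step can be turned into a first-pass filter over exported search data. The thresholds below (1,000 impressions, 1% click-through rate) are hypothetical starting points rather than recommendations, and the `PageStats` fields are an assumed export shape.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    impressions: int
    clicks: int
    avg_position: float


def triage(page: PageStats, min_impressions: int = 1000, low_ctr: float = 0.01) -> str:
    """Suggest a next action for one page based on monthly search data."""
    ctr = page.clicks / page.impressions if page.impressions else 0.0
    if page.impressions >= min_impressions and ctr < low_ctr:
        return "rewrite title and description"
    if 8 <= page.avg_position <= 10:
        return "add depth"
    return "leave alone"


report = [
    PageStats("/pricing", impressions=5000, clicks=20, avg_position=4.2),
    PageStats("/guide", impressions=800, clicks=40, avg_position=9.1),
    PageStats("/about", impressions=300, clicks=30, avg_position=2.0),
]
for page in report:
    print(page.url, "->", triage(page))
```

Whatever the filter suggests, the four-to-six-week waiting period still applies before judging the result.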
None of this is a rule, but after enough projects the shapes are pretty consistent.
| Project size | Duration | Shape of the work |
|---|---|---|
| Small (10–20 pages) | 2–4 weeks | Core + architecture → URL strategy + templates → build against checklist → pre-launch audit → launch |
| Mid-size (50–200 pages) | 6–8 weeks | Planning + stakeholder alignment → architecture + templates → active dev → pre-launch prep → launch |
| Enterprise (500+ pages) | 12+ weeks | Comprehensive planning + competitive analysis → tech specs → two dev phases (schema in the second) → pre-launch + migration planning → phased launch with monitoring |
If your budget or patience only stretches to a few things, these are the ones I’d fight for. Get your semantic core down before development starts. Build an architecture that’s logical and shallow enough for a person to navigate without thinking. Make the mobile experience genuinely good, tested on a real phone rather than a resized browser window. Hit the current Core Web Vitals thresholds, not the ones from two years ago. And never let a rehash go live — because, as I keep saying, one weak cluster can pull the whole domain down with it.
Once those are handled and you’ve got room to push further, that’s when the rest earns its place: full schema coverage (Organization, BreadcrumbList and FAQ at the very least), tidy URLs and meta tags, the AI-search layer, an author profile that actually proves your E-E-A-T, and the unglamorous habit of revisiting all of it after launch instead of declaring victory.
Most of the failures I see come from the same short list. Someone treats SEO as a thing you do after launch. The URL structure ends up six levels deep and impossible to unwind later. The mobile version “mostly works,” which is another way of saying it mostly loses. The checklist still optimizes for FID, a metric that no longer exists. The blog gets stuffed with AI-spun filler to hit a word count, and quietly drags the whole domain’s trust down with it. The articles are competent rehashes of the existing top 10 with nothing new added. And — the one that hurts most because it’s so avoidable — the site launches with no analytics or Search Console in place, so for the first crucial weeks nobody can see anything at all.
This week, just carve out two hours and sit with it. Spend an hour on the semantic core, half an hour sketching the architecture, and half an hour settling the technical approach. That’s it — the point is to make the expensive decisions on purpose rather than by accident.
Over the month, lock down the architecture and URL structure, draft your title and meta-description templates so writers aren’t reinventing them every time, and get Search Console, Analytics and your rank tracker set up before you actually need the data.
While you’re building, keep SEO inside the workflow instead of off to the side. Work through the technical checklist, test on a phone constantly, keep an eye on Core Web Vitals — and write your launch content now, scored for information gain, rather than promising yourself you’ll backfill it later. (You won’t.)
Right before launch, run a proper audit, fix anything critical, confirm the tracking is actually firing, and validate your schema. Then for the first week after you go live, watch it closely; after that, settle into the monthly rhythm and improve in small steps. If carrying all of this yourself isn’t realistic, that ongoing work is exactly what an experienced SEO team exists to handle, so you can keep your attention on the product.
Much cheaper. Fixing it on a site that’s already live usually means three to six months of cleanup plus the structural rewrites nobody enjoys — changing URLs, building redirect chains, waiting on re-crawls. The decisions you make before any code exists, like architecture and the semantic core, cost almost nothing to set and a small fortune to unpick.
No — it was retired back in March 2024 when Interaction to Next Paint took over, and Chrome has since dropped FID support entirely. What you want to hit now is INP under 200ms, LCP under 2.0s (Google tightened that in 2026), and CLS under 0.1.
Not on its own anymore. As of early 2026, only about 38% of AI Overview citations come from top-10 pages, down from 76% a year before. What earns the citation is depth on the topic, an author Google can verify, and content that adds something — not the ranking position by itself.
It can, and this catches people out. Google increasingly judges quality at the level of the whole domain, so a blog packed with thin or AI-spun posts can pull down the trust signals across the entire site and hold back your commercial pages even when those pages are perfectly good.
It’s a way of asking how much new your content brings to the index — original data, real experience, an angle nobody’s published — rather than how many words it runs to. In 2026 that’s what builds topic authority, and it’s why straight rehashes of the top 10 tend to get left out.
The sites that end up dominating search almost never got there by luck. They got there because someone decided, before a line of code was written, how the thing would be structured, which topics it would own, and how information would move through it. That kind of discipline compounds quietly — a small edge at the foundation turns into a wide gap a year down the line.
Your development stage is the most leverage over search visibility you will ever have. So use it. Plan the structure, claim your topics, check the information gain before you hit publish, and build for both the humans reading and the machines now reading on their behalf. Start before the first line of code — everything else grows out of that.