Keyword and Content Checks

A web server can return HTTP 200 OK while displaying an error message, a blank page, or completely wrong content. The status code is the server's report card for itself: "I received a request and sent a response." It says nothing about whether that response was correct, useful, or even contained words.

Keyword and content checks examine what actually came back.

The Lie of 200 OK

Many failure scenarios return 200 OK:

Application errors: Your app catches an exception and renders an error page. Users see "Database connection failed." The server sees a successful response—it sent HTML, job done.

Empty responses: A bug returns nothing. Zero bytes. The HTTP transaction completed successfully. Users see a blank white page.

Wrong content: A configuration error serves the homepage when users request checkout. Some page loaded, so the server reports success.

Partial failures: The HTML loads but JavaScript crashes, the shopping cart widget fails to render, or database-driven content appears as "Loading..." forever.

A 200 OK is the server saying "I did my job." It says nothing about whether the job was the right one.

Checking for Presence

The simplest validation: does specific text appear in the response?

Critical identifiers: Your company name, logo alt text, or unique page titles. If a misconfiguration serves Apache's default page or someone else's site entirely, this catches it.

Dynamic content markers: A product page should contain a price. A dashboard should show user data. A blog should display dates that aren't months old. These prove the dynamic parts are working.

Functionality indicators: "Add to Cart" buttons on e-commerce pages, result counts on search pages, form fields on checkout pages. Their presence proves features rendered.

The check passes only if the keyword appears. Simple, but it catches the scenarios where technically successful responses contain nothing useful.
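
A minimal presence check might look like the Python sketch below. It assumes the response body has already been fetched and decoded to a string. The function name `check_presence` and the keyword list are hypothetical examples, not part of any particular monitoring tool.

```python
def check_presence(body: str, required: list[str]) -> list[str]:
    """Return the required keywords missing from the response body.

    An empty list means the check passes.
    """
    return [keyword for keyword in required if keyword not in body]


# Hypothetical markers for an e-commerce product page:
# site identity, a functionality indicator, and a currency symbol.
PRODUCT_PAGE_KEYWORDS = ["Acme Store", "Add to Cart", "$"]

page = "<html><title>Acme Store</title><button>Add to Cart</button><span>$19.99</span></html>"
missing = check_presence(page, PRODUCT_PAGE_KEYWORDS)
print("PASS" if not missing else f"FAIL: missing {missing}")  # → PASS
```

Returning the missing keywords rather than a bare pass/fail lets an alert say which marker disappeared.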

Checking for Absence

Equally powerful: verify certain text doesn't appear.

Error messages: "500 Internal Server Error", "Database connection failed", "Exception occurred", "undefined is not a function." Many applications wrap errors in 200 OK responses.

Debug leakage: Stack traces, internal file paths, SQL queries, environment variables. If debug mode accidentally activates in production, absence checks catch it.

Incomplete deployments: "Lorem ipsum", "TODO", "[PLACEHOLDER]", test data that shouldn't reach production.

The check fails if forbidden text appears anywhere. Negative validation.
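
An absence check is the mirror image. This sketch again assumes an already-decoded body, and the `check_absence` helper and forbidden list are hypothetical. Matching is case-insensitive here, which is one answer to the case-handling question discussed under false alarms.

```python
# Hypothetical forbidden strings: error messages, debug leakage, placeholders.
FORBIDDEN = [
    "Internal Server Error",
    "Database connection failed",
    "Traceback (most recent call last)",  # start of a Python stack trace
    "Lorem ipsum",
    "[PLACEHOLDER]",
]


def check_absence(body: str, forbidden: list[str]) -> list[str]:
    """Return every forbidden string found in the body (case-insensitive).

    An empty list means the check passes.
    """
    lowered = body.lower()
    return [text for text in forbidden if text.lower() in lowered]


page = "<html><body><h1>Oops</h1><p>Database connection failed</p></body></html>"
print(check_absence(page, FORBIDDEN))  # → ['Database connection failed']
```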

Pattern Matching

Sometimes exact strings aren't enough:

Format validation: Dates should look like dates. Prices should have currency symbols. Email addresses should contain @. Regular expressions verify structure without hardcoding specific values.

Boundary conditions: Product counts should be positive. Percentages should be 0-100. Timestamps should be recent, not from 1970.

Multiple variants: A localized site might accept "Welcome" OR "Bienvenue" OR "Willkommen"—any one proves localization is working.

Regular expressions are powerful but demanding. Complex patterns have edge cases. Test them carefully.
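
The sketch below uses Python's `re` module to illustrate all three ideas: format validation (a price and a YYYY-MM-DD date), a boundary condition on percentages, and an alternation that accepts any one of several localized greetings. The patterns and helper names are hypothetical and deliberately simple. For example, the price pattern assumes a leading currency symbol and comma thousands separators.

```python
import re

# Hypothetical patterns; real pages need patterns matching their own formats.
PRICE = re.compile(r"[$€£]\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?")  # e.g. "$19.99", "€1,299.00"
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")                # e.g. "2025-01-15"
GREETING = re.compile(r"\b(?:Welcome|Bienvenue|Willkommen)\b")  # any one variant passes
PERCENT = re.compile(r"(\d+(?:\.\d+)?)%")                        # unsigned numbers only


def check_formats(body: str) -> dict[str, bool]:
    """Report whether each structural pattern matches somewhere in the body."""
    return {
        "price": PRICE.search(body) is not None,
        "date": ISO_DATE.search(body) is not None,
        "greeting": GREETING.search(body) is not None,
    }


def percentages_in_range(body: str) -> bool:
    """Boundary check: no percentage on the page may exceed 100.

    PERCENT only captures unsigned numbers, so the lower bound of 0 always holds.
    """
    return all(float(value) <= 100 for value in PERCENT.findall(body))


page = "Bienvenue! Prix: €1,299.00, mis à jour 2025-01-15. Stock: 87%"
print(check_formats(page))                     # → {'price': True, 'date': True, 'greeting': True}
print(percentages_in_range(page))              # → True
print(percentages_in_range("Progress: 140%"))  # → False
```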

Response Size as Signal

Content length reveals problems that keywords might miss:

Suspiciously small: Your homepage normally returns 50KB. Today it returns 500 bytes. Something broke, even though the status code looks fine.

Suspiciously large: Responses are suddenly 10x larger than normal. Possible causes include a hack injecting spam links, a broken template dumping debug output, or an error loop generating massive stack traces.

Define expected ranges: Homepage should be 45-55KB. Checkout should be 30-40KB. Responses outside these ranges warrant investigation even when content keywords match.

Size validation catches subtle problems without complex pattern matching.
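
A size check needs only the byte count and an expected range per page. This sketch hard-codes the example ranges above in a hypothetical `EXPECTED_SIZE` table, treating 1KB as 1,024 bytes.

```python
# Hypothetical expected ranges in bytes, from the examples above
# (homepage 45-55KB, checkout 30-40KB), with 1KB = 1024 bytes.
EXPECTED_SIZE = {
    "/": (45 * 1024, 55 * 1024),
    "/checkout": (30 * 1024, 40 * 1024),
}


def check_size(path: str, body: bytes) -> str:
    """Classify a raw response body as 'ok', 'too small', or 'too large'."""
    low, high = EXPECTED_SIZE[path]
    size = len(body)
    if size < low:
        return "too small"
    if size > high:
        return "too large"
    return "ok"


print(check_size("/", b"x" * 500))         # → too small
print(check_size("/", b"x" * 50 * 1024))   # → ok
print(check_size("/", b"x" * 500 * 1024))  # → too large
```

The sketch measures the bytes actually received rather than the decoded text, because multi-byte characters make a character count differ from the transfer size.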

Dynamic Content Challenges

Time-sensitive content: News sites and social feeds change constantly. Check that dates are recent (today, yesterday) rather than checking for specific headlines.

Personalized pages: Content varies by user. Verify personalization indicators ("Welcome back" appearing somewhere) rather than specific names.

Search results: Search for a term you know matches. Verify results appear and counts are positive. The specific results don't matter—functionality does.

Real-time data: Dashboards with live data should show data and recent timestamps. The values change; their presence and freshness indicate health.
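
For time-sensitive pages and search results, the checks look for freshness and non-zero counts rather than specific values. This sketch assumes the page renders dates as YYYY-MM-DD and result counts as "N results". Both formats and the helper names are hypothetical, and a real site would need patterns matching its own markup. `today` is passed in explicitly so the check is deterministic; in production you would pass `date.today()`.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical formats: dates rendered as YYYY-MM-DD, counts as "N results".
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
RESULT_COUNT = re.compile(r"\b(\d[\d,]*)\s+results?\b")


def newest_date(body: str) -> Optional[date]:
    """Return the most recent valid date on the page, or None if there is none."""
    found = []
    for year, month, day in DATE.findall(body):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:  # shaped like a date but not a real one, e.g. 2025-13-45
            continue
    return max(found, default=None)


def is_fresh(body: str, today: date, max_age_days: int = 1) -> bool:
    """Pass if the newest date on the page is at most max_age_days old."""
    newest = newest_date(body)
    return newest is not None and today - newest <= timedelta(days=max_age_days)


def has_results(body: str) -> bool:
    """Pass if the page reports a positive result count."""
    match = RESULT_COUNT.search(body)
    return match is not None and int(match.group(1).replace(",", "")) > 0


feed = "<li>Posted 2025-01-14</li><li>Posted 2025-01-15</li>"
print(is_fresh(feed, today=date(2025, 1, 15)))  # → True
print(is_fresh(feed, today=date(2025, 3, 1)))   # → False
print(has_results("Found 1,234 results"))       # → True
```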

API Response Validation

APIs need their own checks:

Schema validation: Required fields present, correct types, expected structure.

Empty array detection: An API claiming to return products shouldn't return [] when you have products.

Error field checking: Many APIs return 200 OK with a body like {"error": "something broke"}. Check that the error field is null or absent.

Consistency checks: If one endpoint says you have 100 products and another returns an empty list, something's wrong regardless of status codes.
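
The API checks can be combined into one validator. This sketch assumes a hypothetical endpoint returning a JSON object with an `error` field and a `products` array whose items need `id`, `name`, and `price`. The schema is illustrative only; a real deployment might describe it in a schema language such as JSON Schema instead of hand-written type checks.

```python
import json

# Hypothetical item schema: field name -> accepted type(s).
REQUIRED_FIELDS = {"id": int, "name": str, "price": (int, float)}


def validate_products(raw: str, expected_min: int = 1) -> list[str]:
    """Return problems found in a JSON product-list response; empty means pass."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Error field checking: a 200 OK can still carry {"error": "..."}.
    if data.get("error") is not None:
        problems.append(f"error field set: {data['error']!r}")

    products = data.get("products")
    if not isinstance(products, list):
        problems.append("'products' missing or not a list")
        return problems

    # Empty array detection.
    if len(products) < expected_min:
        problems.append(f"expected at least {expected_min} products, got {len(products)}")

    # Schema validation: required fields with the expected types.
    for i, item in enumerate(products):
        if not isinstance(item, dict):
            problems.append(f"products[{i}] is not an object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in item:
                problems.append(f"products[{i}] missing '{field}'")
            # bool is a subclass of int in Python, so reject it explicitly.
            elif not isinstance(item[field], expected) or isinstance(item[field], bool):
                problems.append(f"products[{i}].{field} has type {type(item[field]).__name__}")
    return problems


print(validate_products('{"error": "something broke", "products": []}'))
# → ["error field set: 'something broke'", 'expected at least 1 products, got 0']
```

For a consistency check, set `expected_min` to 1 whenever another endpoint reports a non-zero product count, so that "100 products" alongside an empty list produces a failure.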

Avoiding False Alarms

Too specific: Checking for "Posted on January 15, 2025" fails tomorrow. Check for "Posted on" to verify the element exists.

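Encoding mismatches: Decoding a UTF-8 response as Latin-1 (or the reverse) garbles non-ASCII characters, so a keyword like "Déconnexion" stops matching. Ensure your monitoring decodes responses with the charset the server declares.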

Whitespace sensitivity: Allow for extra spaces, tabs, and newlines, either by collapsing whitespace before matching or by using \s+ in patterns instead of literal spaces.

Case handling: Decide upfront whether "ERROR" and "error" should both trigger alerts.
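
One way to address whitespace, case, and encoding together is to normalize both the body and the keyword before comparing. The sketch below collapses whitespace runs, applies Unicode NFC normalization (so a precomposed "é" matches "e" plus a combining accent), and casefolds unless the check is explicitly case-sensitive. The helper names are hypothetical, and the UTF-8 fallback in `decode_body` for servers that declare no charset is an assumption, not a universal rule.

```python
import re
import unicodedata
from typing import Optional


def normalize(text: str, fold_case: bool = True) -> str:
    """NFC-normalize, collapse whitespace runs to one space, optionally casefold."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.casefold() if fold_case else text


def contains(body: str, keyword: str, case_sensitive: bool = False) -> bool:
    """Keyword match that tolerates whitespace differences and, by default, case."""
    fold = not case_sensitive
    return normalize(keyword, fold) in normalize(body, fold)


def decode_body(raw: bytes, declared_charset: Optional[str]) -> str:
    """Decode with the charset the server declared, falling back to UTF-8.

    The UTF-8 fallback is an assumption; undecodable bytes become U+FFFD.
    """
    return raw.decode(declared_charset or "utf-8", errors="replace")


raw = "Déconnexion".encode("utf-8")
print(raw.decode("latin-1"))                               # → DÃ©connexion
print(contains(decode_body(raw, "utf-8"), "déconnexion"))  # → True
print(contains("<p>Posted\n   on</p>", "posted on"))       # → True
```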

Performance Considerations

Content checks cost more than status code checks:

Downloading full responses generates more traffic
Pattern matching consumes CPU
Complex regex on large pages can be slow

A reasonable strategy: check status codes every 30 seconds, content every 5 minutes. Comprehensive validation for critical pages, lighter checks elsewhere.
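
The tiered strategy above can be expressed as a small scheduling function. The 30-second and 5-minute intervals come from the text; the `Page` type, the URLs, and the "full"/"light" labels are hypothetical placeholders for whatever checks each tier actually runs.

```python
from dataclasses import dataclass

# Intervals from the strategy above; everything else here is hypothetical.
STATUS_EVERY_S = 30
CONTENT_EVERY_S = 5 * 60


@dataclass
class Page:
    url: str
    critical: bool


def checks_due(pages: list[Page], elapsed_s: int) -> list[tuple[str, str]]:
    """Return (url, check_kind) pairs that should run at this tick.

    Status checks are cheap and run every 30 s for every page. Content checks
    download and scan the full body, so they run every 5 minutes: "full"
    validation for critical pages, "light" checks elsewhere.
    """
    due = []
    for page in pages:
        if elapsed_s % STATUS_EVERY_S == 0:
            due.append((page.url, "status"))
        if elapsed_s % CONTENT_EVERY_S == 0:
            due.append((page.url, "full" if page.critical else "light"))
    return due


PAGES = [Page("https://example.com/checkout", True), Page("https://example.com/blog", False)]
print(checks_due(PAGES, 30))
# → [('https://example.com/checkout', 'status'), ('https://example.com/blog', 'status')]
```

Over ten minutes of 30-second ticks, this runs 40 status checks but only 4 content checks for the two pages.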

Combining Approaches

Effective monitoring layers multiple techniques:

Basic availability: TCP port open, HTTP status successful
Presence checks: Critical content appears
Absence checks: Error messages don't appear
Size validation: Response within expected range
Pattern matching: Structured data validates correctly

Each layer catches failures the others miss. Status codes catch server crashes. Content checks catch application failures. Size validation catches subtle corruption. Together they verify what users actually experience—not just what the server claims to have done.
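
A sketch of the layered approach, combining the checks described above in one function. It assumes the HTTP request has already completed (so the TCP layer is implicitly covered) and that bodies are UTF-8. The function name and parameters are hypothetical. Rather than stopping at the first failed layer, it collects every content-layer failure, since each layer catches different problems; content layers are skipped only when the status code itself is bad.

```python
import re


def layered_check(
    status: int,
    body: bytes,
    *,
    required: list[str],
    forbidden: list[str],
    size_range: tuple[int, int],
    patterns: list[str],
) -> list[tuple[str, str]]:
    """Run each layer and return (layer, detail) for every failure; empty means healthy."""
    if not 200 <= status < 300:
        # Without a successful response, the content layers have nothing to check.
        return [("availability", f"HTTP {status}")]

    failures = []
    text = body.decode("utf-8", errors="replace")  # assumes UTF-8 bodies
    for keyword in required:
        if keyword not in text:
            failures.append(("presence", f"missing {keyword!r}"))
    for keyword in forbidden:
        if keyword in text:
            failures.append(("absence", f"found {keyword!r}"))
    low, high = size_range
    if not low <= len(body) <= high:
        failures.append(("size", f"{len(body)} bytes outside {low}-{high}"))
    for pattern in patterns:
        if re.search(pattern, text) is None:
            failures.append(("pattern", f"no match for {pattern!r}"))
    return failures


# An error page served with 200 OK trips several layers at once.
error_page = b"<h1>Database connection failed</h1>"
result = layered_check(
    200,
    error_page,
    required=["Add to Cart"],
    forbidden=["Database connection failed"],
    size_range=(1000, 100_000),
    patterns=[r"\$\d+\.\d{2}"],
)
print([layer for layer, _ in result])  # → ['presence', 'absence', 'size', 'pattern']
```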
