Effect OGP Integration

Partly generated by AI

Parts of this post were generated by AI. It has been reviewed by a human, but the content may not reflect the author's actual linguistic proficiency.

This document specifies the Open Graph Protocol (OGP) integration for the content pipeline.

Examples

Here's how the OGP integration works:

react-spectrum.adobe.com

React Aria Components

Craft world-class accessible components with custom styles.

github.com

GitHub - anthropics/claude-code: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo...

Purpose: Fetching OGP metadata at build time

Modern blogs are more than static pages: they are hubs that reference videos, tweets, papers, and other rich resources. A preview card for each external URL shows readers the title, description, and image of the target before they click. Fetching that metadata at render time, however, is slow and unreliable.

By fetching the metadata for every referenced external URL at build time and storing it locally, the application never makes additional network requests for OGP data. Preview cards therefore render without waiting on third-party servers, and a slow or unavailable external site cannot degrade the page.

Architecture description

1. Bird's-eye view

The system follows a layered architecture using Effect's service pattern:

LinkMetadataService: High-level API providing a simple get(url) method
MetadataCache: Caching layer with TTL-based expiration (60 days)
MetadataKvs: SQLite-based key-value storage
MetadataFetcher: HTTP client with HTML parsing capabilities

These services live only during the build. At runtime the blog simply reads the pre-fetched metadata already embedded in each post.

2. Data model

The LinkMetadata schema is defined in src/schemas/link-metadata.ts and captures essential OGP fields:

interface LinkMetadata {
  canonical?: string // Canonical URL from og:url, twitter:url, or <link rel="canonical">
  title?: string // From og:title, twitter:title, or <title>
  description?: string // From og:description, twitter:description, or meta description
  image?: string // From og:image or twitter:image (resolved to absolute URL)
  imageAlt?: string // From og:image:alt
  imageWidth?: number // From og:image:width (parsed as integer)
  imageHeight?: number // From og:image:height (parsed as integer)
  siteName?: string // From og:site_name
  ogType?: OgType // From og:type
}

Supported OgType values include:

"article", "website", "book", "profile"
"video.movie", "video.episode", "video.tv_show", "video.other"

The cache stores metadata wrapped in an envelope with creation timestamp:

interface Envelope<T> {
  createdAt: Date
  data: T
}

The OgpMetadataFromHtml transformer parses the raw HTML meta tags. For each field it prefers the Open Graph value, falls back to the Twitter Card equivalent, and then to plain HTML elements such as <title>, so pages with incomplete Open Graph markup still yield usable metadata.
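The fallback order can be illustrated with a small standalone sketch. It is not the actual OgpMetadataFromHtml code: the RawTags map (tag name to content, with the <title> text stored under "title"), the helper names, and the rule that dimensions must be positive integers are all assumptions made for illustration.

```typescript
// Hypothetical input: meta property/name -> content, plus "title" for the <title> text.
type RawTags = Readonly<Record<string, string | undefined>>;

interface LinkMetadataSketch {
  title?: string;
  description?: string;
  imageWidth?: number;
  imageHeight?: number;
}

// Return the first non-empty value among the given keys, in priority order.
function firstOf(tags: RawTags, keys: readonly string[]): string | undefined {
  for (const key of keys) {
    const value = tags[key]?.trim();
    if (value) return value;
  }
  return undefined;
}

// Accept only strings made of digits that parse to a positive integer (assumed rule).
function parseDimension(raw: string | undefined): number | undefined {
  const text = raw?.trim() ?? "";
  if (!/^\d+$/.test(text)) return undefined;
  const n = Number.parseInt(text, 10);
  return n > 0 ? n : undefined;
}

function extractMetadata(tags: RawTags): LinkMetadataSketch {
  return {
    // Open Graph first, then Twitter Card, then plain HTML.
    title: firstOf(tags, ["og:title", "twitter:title", "title"]),
    description: firstOf(tags, ["og:description", "twitter:description", "description"]),
    imageWidth: parseDimension(tags["og:image:width"]),
    imageHeight: parseDimension(tags["og:image:height"]),
  };
}
```

A page that only ships Twitter Card tags would still get a title from twitter:title, and a malformed width such as "1200px" would simply be dropped rather than failing the whole extraction.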

3. Service interfaces

LinkMetadataService (src/dev/link-metadata/layer.ts) provides a simple API:

interface LinkMetadataService {
  readonly get: (
    url: string,
  ) => Effect<Option<LinkMetadata>, KvsError | MetadataSchemaError>
}

MetadataCache (src/dev/link-metadata/cache.ts) handles persistence:

interface MetadataCache {
  get: (url: string) => Effect<Option<Envelope<LinkMetadata>>, KvsError | MetadataSchemaError>
  set: (url: string, metadata: LinkMetadata) => Effect<void, KvsError | MetadataSchemaError>
}

MetadataFetcher (src/dev/link-metadata/fetcher.ts) handles HTTP and parsing:

interface MetadataFetcher {
  fetch: (url: string) => Effect<LinkMetadata, FetchError | ParseError>
}

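The get method checks the cache first and fetches only when the entry is missing or older than the TTL.

[Diagram: caching flow of the get method]

The service returns Option<LinkMetadata>, and fetch errors never surface to the caller. If a fetch fails and a stale cache entry exists, get returns the stale data. This keeps the pipeline running even when external providers are down.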
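The caching behavior described above can be sketched with plain Promises instead of Effect. This is a simplified illustration, not the real service: the Deps interface and its function names are hypothetical, undefined stands in for Option.none, and returning nothing when a fetch fails without any cached entry is an assumption.

```typescript
interface Meta { title?: string }
interface Envelope<T> { createdAt: Date; data: T }

const TTL_MS = 60 * 24 * 60 * 60 * 1000; // 60 days, as stated for MetadataCache

// Hypothetical dependencies standing in for MetadataCache and MetadataFetcher.
interface Deps {
  readCache: (url: string) => Promise<Envelope<Meta> | undefined>;
  writeCache: (url: string, data: Meta) => Promise<void>;
  fetchMeta: (url: string) => Promise<Meta>; // rejects on network or parse failure
  now: () => Date;
}

async function getMetadata(url: string, deps: Deps): Promise<Meta | undefined> {
  const cached = await deps.readCache(url);
  const age = cached ? deps.now().getTime() - cached.createdAt.getTime() : Infinity;
  if (cached && age < TTL_MS) return cached.data; // fresh hit: no network

  let fresh: Meta;
  try {
    fresh = await deps.fetchMeta(url);
  } catch {
    // Fetch failed: serve stale data if present, otherwise nothing (assumed).
    return cached?.data;
  }
  // Storage errors are deliberately not swallowed, mirroring the KvsError channel.
  await deps.writeCache(url, fresh);
  return fresh;
}
```

Only fetch failures are absorbed here. A failing cache write still rejects, which corresponds to the KvsError and MetadataSchemaError types that remain in the signature of get.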

4. Storage layer

The storage uses a generic key-value store abstraction implemented with SQLite (src/dev/service-utils/sqlite-kvs.ts):

interface Kvs<V> {
  get: (key: string) => Effect<Option<V>, KvsError>
  set: (key: string, value: V) => Effect<void, KvsError>
  has: (key: string) => Effect<boolean, KvsError>
  clear: () => Effect<void, KvsError>
  keys: () => Effect<string[], KvsError>
}

SQLite was chosen because:

Single-file, zero-config, battle-tested
Native JSON column support for structured queries
Works identically on developer laptops, CI runners, and production
Located at data/og.sqlite (git-ignored)

The implementation uses:

@effect/sql-sqlite-node for database operations
JSON serialization for Envelope<LinkMetadata> values
Effect Schema for type-safe encoding/decoding
Single metadata table with url (TEXT) and data (JSON) columns
Automatic table creation on first use
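To make the JSON serialization step concrete, here is a minimal sketch of encoding and decoding an envelope. It is an illustration only: the real implementation uses Effect Schema and @effect/sql-sqlite-node, whereas this version uses hand-written checks, and the function names and the ISO-8601 timestamp format are assumptions.

```typescript
interface Envelope<T> { createdAt: Date; data: T }

// Serialize for the JSON column; the Date becomes an ISO-8601 string (assumed format).
function encodeEnvelope<T>(envelope: Envelope<T>): string {
  return JSON.stringify({
    createdAt: envelope.createdAt.toISOString(),
    data: envelope.data,
  });
}

// Decode defensively: a corrupt row yields undefined instead of throwing.
// Note that the shape of `data` itself is not validated in this sketch.
function decodeEnvelope<T>(json: string): Envelope<T> | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const record = parsed as { createdAt?: unknown; data?: unknown };
  if (typeof record.createdAt !== "string" || record.data === undefined) return undefined;
  const createdAt = new Date(record.createdAt);
  if (Number.isNaN(createdAt.getTime())) return undefined;
  return { createdAt, data: record.data as T };
}
```

In the real service a schema mismatch is reported as a MetadataSchemaError rather than silently returning undefined; the sketch only shows why the Date needs explicit handling when it round-trips through JSON.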

5. Fetcher implementation

The MetadataFetcher (src/dev/link-metadata/fetcher.ts) implements robust HTML metadata extraction:

HTTP handling:

Maximum 5 redirects
10-second timeout
1 MiB response size limit
Up to 5 retries with exponential backoff
Only accepts HTTP(S) URLs
Uses @effect/platform HTTP client
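The retry policy can be sketched without Effect as a plain async helper. This is an assumption-laden illustration: the base delay of 200 ms, the doubling factor, and the injectable sleep parameter are not taken from the real fetcher, which presumably relies on Effect's own retry facilities.

```typescript
// Retry a task up to maxRetries times, doubling the delay after each failure.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 200, // assumed value; the source does not state the base delay
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      // After the final retry, surface the last error to the caller.
      if (attempt >= maxRetries) throw error;
      await sleep(baseDelayMs * 2 ** attempt); // 200, 400, 800, ...
    }
  }
}
```

With five retries the task runs at most six times in total. Passing a custom sleep function makes the helper easy to test without real delays.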

HTML parsing with html-rewriter-wasm:

Extract Open Graph tags (og:*)
Fall back to Twitter Card tags (twitter:*)
Extract <title> element and meta description
Extract canonical URL from <link rel="canonical">
Resolve all relative URLs to absolute URLs using base URL
Parse numeric values (image dimensions) with proper validation

Error handling:

type FetchError =
  | { _tag: "InvalidUrl" }
  | { _tag: "NetworkError", error: unknown }
  | { _tag: "Timeout" }
  | { _tag: "TooLarge" }
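Because every variant carries a _tag, callers can branch on the error exhaustively and let the compiler flag any unhandled case. The sketch below repeats the type so that it is self-contained; the describeFetchError function and its messages are hypothetical and only show the pattern.

```typescript
type FetchError =
  | { _tag: "InvalidUrl" }
  | { _tag: "NetworkError"; error: unknown }
  | { _tag: "Timeout" }
  | { _tag: "TooLarge" };

// Exhaustive switch: adding a new variant without a case becomes a compile error,
// because the function would no longer return a string on every path.
function describeFetchError(error: FetchError): string {
  switch (error._tag) {
    case "InvalidUrl":
      return "URL is not a valid HTTP(S) URL";
    case "NetworkError":
      return `network failure: ${String(error.error)}`;
    case "Timeout":
      return "request exceeded the 10-second timeout";
    case "TooLarge":
      return "response exceeded the 1 MiB limit";
  }
}
```

A build log could then print describeFetchError(error) and continue with the fallback link instead of aborting.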

URL resolution:

Uses URL constructor for robust relative-to-absolute URL conversion
Handles edge cases like malformed URLs gracefully
Validates image URLs before including in metadata
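The resolution step can be sketched with the standard URL constructor, which accepts a base URL as its second argument and throws on malformed input. The resolveUrl helper is hypothetical, and restricting results to http and https is an assumption about what validating image URLs means here.

```typescript
// Resolve a possibly relative URL against the page URL; return undefined if unusable.
function resolveUrl(raw: string | undefined, base: string): string | undefined {
  if (!raw) return undefined;
  try {
    const url = new URL(raw, base); // throws TypeError on malformed input
    // Assumed validation rule: only http(s) URLs are kept.
    return url.protocol === "http:" || url.protocol === "https:" ? url.href : undefined;
  } catch {
    return undefined;
  }
}
```

For example, resolveUrl("/img/a.png", "https://example.com/post/1") yields "https://example.com/img/a.png", while a value such as "javascript:alert(1)" is rejected and the image is simply omitted from the metadata.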

All errors are typed, allowing the pipeline to gracefully handle failures without crashing the build.

6. Integration with Remark

The remarkLink plugin (src/dev/unified/remarkLink.ts) integrates OGP fetching into the Markdown pipeline:

Processing flow:

Collect all link directives and definition nodes during AST traversal (see examples below)
- Text directives: Create external links with styling
- Definition nodes: Insert link cards after the original definition (preserving the definition)
Parse URLs using ExternalUrlParser to classify YouTube vs generic links
Batch fetch metadata with concurrency limit of 5
Transform directives based on URL type and directive type:
- YouTube URLs: Generate embedded iframe players with proper dimensions
- Generic URLs: Create rich link cards with OGP metadata
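The batch fetch with a concurrency limit can be illustrated with a small worker-pool helper. This is a Promise-based sketch under the assumption that the real plugin uses Effect's built-in concurrency options; mapWithConcurrency is a hypothetical name.

```typescript
// Run worker over all items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<A, B>(
  items: readonly A[],
  limit: number,
  worker: (item: A) => Promise<B>,
): Promise<B[]> {
  const results = new Array<B>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all runners

  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item; safe because JS runs this synchronously
      results[index] = await worker(items[index]);
    }
  }

  // Start up to `limit` runners; each one pulls items until none are left.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}
```

Calling mapWithConcurrency(urls, 5, fetchOne) would keep at most five requests open at once. In this sketch a rejecting worker rejects the whole batch, so a per-URL fallback would have to be applied inside the worker.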

Directive examples:

::link[https://react-spectrum.adobe.com/react-aria/components.html]

::link[https://github.com/anthropics/claude-code]

Definition examples:

Check out [React Aria][react-aria] and [Claude Code][claude-code] for more
information.

[react-aria]: https://react-spectrum.adobe.com/react-aria/components.html
[claude-code]: https://github.com/anthropics/claude-code

URL classification (src/schemas/external-url.ts):

YouTube detection: Supports youtube.com/watch, youtu.be, youtube.com/embed, m.youtube.com formats
Generic fallback: All non-YouTube URLs treated as generic external sources
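A classifier along these lines might look as follows. This is a sketch, not the ExternalUrlParser in src/schemas/external-url.ts: the ExternalUrl type, the classifyUrl name, and the exact host normalization are assumptions.

```typescript
type ExternalUrl =
  | { kind: "youtube"; videoId: string }
  | { kind: "generic"; url: string };

// Classify an external URL; returns undefined for anything that is not HTTP(S).
function classifyUrl(raw: string): ExternalUrl | undefined {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return undefined;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return undefined;

  // Treat www.youtube.com and m.youtube.com like youtube.com.
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let videoId: string | null | undefined;
  if (host === "youtu.be") {
    videoId = url.pathname.split("/")[1]; // youtu.be/<id>
  } else if (host === "youtube.com" && url.pathname === "/watch") {
    videoId = url.searchParams.get("v"); // youtube.com/watch?v=<id>
  } else if (host === "youtube.com" && url.pathname.startsWith("/embed/")) {
    videoId = url.pathname.split("/")[2]; // youtube.com/embed/<id>
  }
  return videoId ? { kind: "youtube", videoId } : { kind: "generic", url: url.href };
}
```

A YouTube URL without a recognizable video ID falls through to the generic branch, so it would still get a regular link card.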

Link card generation:

YouTube URLs → Embedded iframe with width="560" height="315"
Generic URLs → Rich preview cards with title, description, image, and site name
Failed metadata fetches → Graceful fallback to simple external links
All external links get target="_blank" rel="noopener noreferrer"
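The fallback rule for generic URLs can be sketched as a function that renders either a card or a plain link. The markup, the class name, and the choice to treat a missing title as a failed fetch are hypothetical; the real plugin most likely builds syntax-tree nodes rather than HTML strings.

```typescript
interface CardMeta { title?: string; description?: string; image?: string; siteName?: string }

// Escape text for safe use in HTML content and double-quoted attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderLink(url: string, meta: CardMeta | undefined): string {
  // Every external link gets the same target and rel attributes.
  const attrs = `href="${escapeHtml(url)}" target="_blank" rel="noopener noreferrer"`;
  // No usable metadata: fall back to a simple external link.
  if (!meta?.title) return `<a ${attrs}>${escapeHtml(url)}</a>`;

  const parts = [`<strong>${escapeHtml(meta.title)}</strong>`];
  if (meta.description) parts.push(`<p>${escapeHtml(meta.description)}</p>`);
  if (meta.image) parts.push(`<img src="${escapeHtml(meta.image)}" alt="">`);
  if (meta.siteName) parts.push(`<small>${escapeHtml(meta.siteName)}</small>`);
  return `<a class="link-card" ${attrs}>${parts.join("")}</a>`;
}
```

Escaping matters here because titles and descriptions come from third-party pages and must not be able to inject markup into the post.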

Definition node processing:

Definition nodes ([label]: url) are detected during AST traversal
URLs are validated against ExternalUrlParser (YouTube or generic external sources)
Original definition nodes are preserved in the markdown output
Rich link cards are inserted as new nodes immediately after each definition
This allows both traditional reference-style links and rich previews to coexist
Processing happens in reverse order to maintain correct node indices during insertion
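The reason for reverse-order processing is easiest to see in code: inserting after index i shifts only the nodes behind it, so indices that have not been visited yet stay valid. The node shape and the linkCard type below are simplified stand-ins for the real AST nodes.

```typescript
// Simplified stand-in for a Markdown AST node.
interface MdNode { type: string; url?: string }

// Insert a card after every definition, walking backwards so that each
// insertion only moves nodes that have already been processed.
function insertCardsAfterDefinitions(children: readonly MdNode[]): MdNode[] {
  const result = [...children];
  for (let i = result.length - 1; i >= 0; i--) {
    const node = result[i];
    if (node.type === "definition" && node.url) {
      result.splice(i + 1, 0, { type: "linkCard", url: node.url });
    }
  }
  return result;
}
```

Walking forwards would require adjusting the index after every insertion, or the loop would revisit or skip nodes; the backward walk avoids that bookkeeping and leaves the original definitions in place.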

Integration points:

src/dev/post-pipeline.ts:104 - remarkLink plugin with OGP runtime
src/dev/collections-pipeline.ts:163-170 - Layer setup with SQLite and HTTP clients

The plugin runs only at build time. All metadata is embedded in the post JSON, eliminating runtime network requests.

7. Deployment & caching strategy

Local development:

Database location: data/og.sqlite (git-ignored)
Persistent across dev server restarts
60-day TTL minimizes redundant fetches
Automatic database creation on first link processing

CI/CD (GitHub Actions):

Cache database with actions/cache using data/og.sqlite as cache key
TTL ensures most builds use cached data
Reduces external API rate limit risks
Graceful fallback to stale data if external sites are down

Production (Deno Deploy):

Database copied to build output during pnpm build
Read-only access at runtime
No runtime fetching or writes
All metadata pre-embedded in post JSON files

Credits

The initial idea was drafted with Gemini 2.5 Flash and then refined with OpenAI o3. This document was updated after the initial implementation was completed.