Effect OGP Integration

Partly generated by AI

Some parts of this post were generated by AI. It has been reviewed by a human, but the content may not reflect the author's actual linguistic proficiency.

This document specifies the Open Graph Protocol (OGP) integration for the content pipeline.

Examples

Here's how the OGP integration works:

Purpose: Fetching OGP metadata at build time

Modern blogs are more than static pages: they are hubs that reference videos, tweets, papers, and other rich resources. Showing a preview card for each external URL improves the reading experience, yet fetching metadata at render time is slow and unreliable.

By fetching the metadata for all referenced external content at build time and storing it locally, the application never performs additional network requests for OGP data at runtime. Preview cards then render without waiting on third-party sites or depending on their availability.

Architecture description

1. Bird's-eye view

[Diagram: Markdown posts → Collections Pipeline (extract external links) → LinkMetadataService → MetadataCache → SQLite KVS; LinkMetadataService → MetadataFetcher → Internet; Collections Pipeline → Post JSON artefacts]

The system follows a layered architecture using Effect's service pattern:

  • LinkMetadataService: High-level API providing a simple get(url) method
  • MetadataCache: Caching layer with TTL-based expiration (60 days)
  • MetadataKvs: SQLite-based key-value storage
  • MetadataFetcher: HTTP client with HTML parsing capabilities

These services live only during the build. At runtime the blog simply reads the pre-fetched metadata already embedded in each post.

2. Data model

The LinkMetadata schema is defined in src/schemas/link-metadata.ts and captures essential OGP fields:

interface LinkMetadata {
  canonical?: string    // Canonical URL from og:url, twitter:url, or <link rel="canonical">
  title?: string        // From og:title, twitter:title, or <title>
  description?: string  // From og:description, twitter:description, or meta description
  image?: string        // From og:image or twitter:image (resolved to absolute URL)
  imageAlt?: string     // From og:image:alt
  imageWidth?: number   // From og:image:width (parsed as integer)
  imageHeight?: number  // From og:image:height (parsed as integer)
  siteName?: string     // From og:site_name
  ogType?: OgType       // From og:type
}

Supported OgType values include:

  • "article", "website", "book", "profile"
  • "video.movie", "video.episode", "video.tv_show", "video.other"

The cache stores metadata wrapped in an envelope with creation timestamp:

interface Envelope<T> {
  createdAt: Date
  data: T
}

The system uses an OgpMetadataFromHtml transformer that parses raw HTML meta tags and falls back from Open Graph to Twitter Card metadata (and then to plain HTML elements such as <title>) whenever a field is missing.
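The fallback order can be illustrated with a small sketch. It is written in plain TypeScript rather than with Effect Schema, and the names MetaTags, firstOf, and pickMetadata are hypothetical; the actual OgpMetadataFromHtml transformer may be structured quite differently.

```typescript
// Hypothetical flat view of a page's metadata, e.g. { "og:title": "..." }.
// The keys "title" and "description" stand for <title> and the meta description.
type MetaTags = Readonly<Record<string, string | undefined>>

interface BasicMetadata {
  title: string | undefined
  description: string | undefined
  image: string | undefined
}

// Return the first non-empty value among the given keys, in priority order.
function firstOf(tags: MetaTags, keys: readonly string[]): string | undefined {
  for (const key of keys) {
    const value = tags[key]?.trim()
    if (value) return value
  }
  return undefined
}

// Apply the Open Graph -> Twitter Card -> plain HTML fallback order.
function pickMetadata(tags: MetaTags): BasicMetadata {
  return {
    title: firstOf(tags, ["og:title", "twitter:title", "title"]),
    description: firstOf(tags, ["og:description", "twitter:description", "description"]),
    image: firstOf(tags, ["og:image", "twitter:image"]),
  }
}
```

A page that only provides twitter:title and a plain <title> would resolve its title from the Twitter Card tag, because that tag sits earlier in the priority list.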

3. Service interfaces

LinkMetadataService (src/dev/link-metadata/layer.ts) provides a simple API:

interface LinkMetadataService {
  readonly get: (url: string) =>
    Effect<Option<LinkMetadata>, KvsError | MetadataSchemaError>
}

MetadataCache (src/dev/link-metadata/cache.ts) handles persistence:

interface MetadataCache {
  get: (url: string) =>
    Effect<Option<Envelope<LinkMetadata>>, KvsError | MetadataSchemaError>
  set: (url: string, metadata: LinkMetadata) =>
    Effect<void, KvsError | MetadataSchemaError>
}

MetadataFetcher (src/dev/link-metadata/fetcher.ts) handles HTTP and parsing:

interface MetadataFetcher {
  fetch: (url: string) => Effect<LinkMetadata, FetchError | ParseError>
}

The get method implements smart caching:

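  1. Client → LinkMetadataService: get(url)
  2. LinkMetadataService → MetadataCache: get(url)
  3. MetadataCache → LinkMetadataService: cached data
  4. LinkMetadataService: check TTL (60 days)
  5. LinkMetadataService → MetadataFetcher: fetch(url) [if stale]
  6. MetadataFetcher → Internet: HTTP GET
  7. Internet → MetadataFetcher: HTML
  8. MetadataFetcher → LinkMetadataService: metadata
  9. LinkMetadataService → MetadataCache: set(url, metadata)
  10. LinkMetadataService → Client: Option<LinkMetadata>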

The service returns Option<LinkMetadata> and never surfaces fetch failures: FetchError and ParseError do not appear in its error type. If a fetch fails while a stale cache entry exists, it returns the stale data. This keeps the pipeline running even when external providers are down.
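The decision logic can be sketched as follows. This is a simplified, synchronous stand-in that uses plain functions instead of Effect services; the Deps shape and getMetadata name are hypothetical, undefined plays the role of Option.none, and returning nothing when there is neither a cache entry nor a successful fetch is an assumption.

```typescript
interface LinkMetadata { title?: string }
interface Envelope<T> { createdAt: Date; data: T }

const TTL_MS = 60 * 24 * 60 * 60 * 1000 // 60 days

// Hypothetical synchronous stand-ins for MetadataCache and MetadataFetcher.
interface Deps {
  cacheGet: (url: string) => Envelope<LinkMetadata> | undefined
  cacheSet: (url: string, data: LinkMetadata) => void
  fetch: (url: string) => LinkMetadata // throws on failure in this sketch
  now: () => Date
}

function isFresh(envelope: Envelope<unknown>, now: Date): boolean {
  return now.getTime() - envelope.createdAt.getTime() < TTL_MS
}

function getMetadata(url: string, deps: Deps): LinkMetadata | undefined {
  const cached = deps.cacheGet(url)
  // Fresh cache hit: no network access at all.
  if (cached !== undefined && isFresh(cached, deps.now())) return cached.data
  try {
    // Missing or stale: refetch and store the new value.
    const data = deps.fetch(url)
    deps.cacheSet(url, data)
    return data
  } catch {
    // Fetch failed: fall back to the stale entry if there is one.
    return cached?.data
  }
}
```

Passing the clock in as now() keeps the TTL check deterministic, which makes the stale and fresh branches easy to exercise in tests.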

4. Storage layer

The storage uses a generic key-value store abstraction implemented with SQLite (src/dev/service-utils/sqlite-kvs.ts):

interface Kvs<V> {
  get: (key: string) => Effect<Option<V>, KvsError>
  set: (key: string, value: V) => Effect<void, KvsError>
  has: (key: string) => Effect<boolean, KvsError>
  clear: () => Effect<void, KvsError>
  keys: () => Effect<string[], KvsError>
}

SQLite was chosen because:

  • Single-file, zero-config, battle-tested
  • Built-in JSON functions for querying structured values
  • Works identically on developer laptops, CI runners, and production
  • Located at data/og.sqlite (git-ignored)

The implementation uses:

  • @effect/sql-sqlite-node for database operations
  • JSON serialization for Envelope<LinkMetadata> values
  • Effect Schema for type-safe encoding/decoding
  • Single metadata table with url (TEXT) and data (JSON) columns
  • Automatic table creation on first use
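One detail of JSON serialization is worth illustrating: a Date does not survive a JSON round trip on its own, so createdAt has to be encoded and decoded explicitly. The sketch below does this by hand as a stand-in for the Effect Schema codec; encodeEnvelope and decodeEnvelope are hypothetical names, and the payload itself is not validated here.

```typescript
interface Envelope<T> { createdAt: Date; data: T }

// Store createdAt as an ISO-8601 string so it survives JSON serialization.
function encodeEnvelope<T>(envelope: Envelope<T>): string {
  return JSON.stringify({
    createdAt: envelope.createdAt.toISOString(),
    data: envelope.data,
  })
}

// Returns undefined for malformed input instead of throwing.
// Note: `data` is cast, not validated; a real codec would check its shape.
function decodeEnvelope<T>(json: string): Envelope<T> | undefined {
  let raw: unknown
  try {
    raw = JSON.parse(json)
  } catch {
    return undefined
  }
  if (typeof raw !== "object" || raw === null) return undefined
  const { createdAt, data } = raw as { createdAt?: unknown; data?: unknown }
  if (typeof createdAt !== "string") return undefined
  const date = new Date(createdAt)
  if (Number.isNaN(date.getTime())) return undefined
  return { createdAt: date, data: data as T }
}
```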

5. Fetcher implementation

The MetadataFetcher (src/dev/link-metadata/fetcher.ts) implements robust HTML metadata extraction:

HTTP handling:

  • Maximum 5 redirects
  • 10-second timeout
  • 1 MiB response size limit
  • Up to 5 retries with exponential backoff
  • Only accepts HTTP(S) URLs
  • Uses @effect/platform HTTP client
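As an illustration of the retry policy, the sketch below computes a delay schedule for exponential backoff. The source states the retry count but not the base delay or growth factor, so the 200 ms base and the factor of 2 are assumptions, and backoffDelays is a hypothetical helper rather than part of @effect/platform.

```typescript
// Delay before each retry: baseMs, baseMs * factor, baseMs * factor^2, ...
// The base delay and factor are assumed values, not taken from the source.
function backoffDelays(retries: number, baseMs: number, factor = 2): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseMs * factor ** attempt)
}

// Five retries starting at 200 ms.
const delays = backoffDelays(5, 200)
console.log(delays) // [ 200, 400, 800, 1600, 3200 ]
```

Under these assumed values, the waits between attempts add up to 6.2 seconds before the fetcher gives up.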

HTML parsing with html-rewriter-wasm:

  1. Extract Open Graph tags (og:*)
  2. Fall back to Twitter Card tags (twitter:*)
  3. Extract <title> element and meta description
  4. Extract canonical URL from <link rel="canonical">
  5. Resolve all relative URLs to absolute URLs using base URL
  6. Parse numeric values (image dimensions) with proper validation

Error handling:

type FetchError =
  | { _tag: "InvalidUrl" }
  | { _tag: "NetworkError", error: unknown }
  | { _tag: "Timeout" }
  | { _tag: "TooLarge" }
URL resolution:

  • Uses URL constructor for robust relative-to-absolute URL conversion
  • Handles edge cases like malformed URLs gracefully
  • Validates image URLs before including in metadata
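A minimal sketch of the resolution and validation steps, using only the standard URL constructor. The helper names resolveHttpUrl and parseDimension are hypothetical, and rejecting non-positive dimensions is an assumption about what "proper validation" means here.

```typescript
// Resolve a possibly relative URL against the page URL.
// Returns undefined for malformed input or non-HTTP(S) schemes.
function resolveHttpUrl(raw: string, base: string): string | undefined {
  try {
    const url = new URL(raw, base)
    return url.protocol === "http:" || url.protocol === "https:" ? url.href : undefined
  } catch {
    return undefined
  }
}

// Parse og:image:width / og:image:height; accept only positive integers.
function parseDimension(raw: string | undefined): number | undefined {
  const trimmed = raw?.trim()
  if (trimmed === undefined || !/^\d+$/.test(trimmed)) return undefined
  const value = Number.parseInt(trimmed, 10)
  return value > 0 ? value : undefined
}
```

For example, resolving "/img/a.png" against "https://example.com/post/1" gives "https://example.com/img/a.png", while a "javascript:" URL or a width of "12px" is dropped.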

All errors are typed, allowing the pipeline to gracefully handle failures without crashing the build.

6. Integration with Remark

The remarkLink plugin (src/dev/unified/remarkLink.ts) integrates OGP fetching into the Markdown pipeline:

Processing flow:

  1. Collect all link directives and definition nodes during AST traversal (see examples below)
    • Text directives: Create external links with styling
    • Definition nodes: Insert link cards after the original definition (preserving the definition)
  2. Parse URLs using ExternalUrlParser to classify YouTube vs generic links
  3. Batch fetch metadata with concurrency limit of 5
  4. Transform directives based on URL type and directive type:
    • YouTube URLs: Generate embedded iframe players with proper dimensions
    • Generic URLs: Create rich link cards with OGP metadata

Directive examples:

::link[https://react-spectrum.adobe.com/react-aria/components.html]
::link[https://github.com/anthropics/claude-code]

Definition examples:

Check out [React Aria][react-aria] and [Claude Code][claude-code] for more
information.
[react-aria]: https://react-spectrum.adobe.com/react-aria/components.html
[claude-code]: https://github.com/anthropics/claude-code

URL classification (src/schemas/external-url.ts):

  • YouTube detection: Supports youtube.com/watch, youtu.be, youtube.com/embed, m.youtube.com formats
  • Generic fallback: All non-YouTube URLs treated as generic external sources
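The classification can be sketched with the standard URL API. The ExternalUrl shape and classifyUrl name below are hypothetical and only cover the four YouTube formats listed above; the real ExternalUrlParser may accept more formats or validate video IDs more strictly.

```typescript
type ExternalUrl =
  | { kind: "youtube"; videoId: string }
  | { kind: "generic"; url: string }

// Classify an absolute HTTP(S) URL; returns undefined for anything else.
function classifyUrl(raw: string): ExternalUrl | undefined {
  let url: URL
  try {
    url = new URL(raw)
  } catch {
    return undefined
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return undefined

  // Treat www.youtube.com and m.youtube.com like youtube.com.
  const host = url.hostname.replace(/^(www|m)\./, "")
  const segments = url.pathname.split("/") // "/embed/ID" -> ["", "embed", "ID"]
  let videoId: string | undefined
  if (host === "youtu.be") {
    videoId = segments[1]
  } else if (host === "youtube.com" && url.pathname === "/watch") {
    videoId = url.searchParams.get("v") ?? undefined
  } else if (host === "youtube.com" && segments[1] === "embed") {
    videoId = segments[2]
  }
  return videoId ? { kind: "youtube", videoId } : { kind: "generic", url: url.href }
}
```

A YouTube URL without a recoverable video ID falls through to the generic branch, so it would still receive an ordinary link card.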

Link card generation:

  • YouTube URLs → Embedded iframe with width="560" height="315"
  • Generic URLs → Rich preview cards with title, description, image, and site name
  • Failed metadata fetches → Graceful fallback to simple external links
  • All external links get target="_blank" rel="noopener noreferrer"

Definition node processing:

  • Definition nodes ([label]: url) are detected during AST traversal
  • URLs are validated against ExternalUrlParser (YouTube or generic external sources)
  • Original definition nodes are preserved in the markdown output
  • Rich link cards are inserted as new nodes immediately after each definition
  • This allows both traditional reference-style links and rich previews to coexist
  • Processing happens in reverse order to maintain correct node indices during insertion
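The reverse-order trick in the last bullet can be shown on a flat list of sibling nodes. The node types below are simplified stand-ins for mdast nodes, and insertCards is a hypothetical name: walking from the end means each insertion only shifts nodes that have already been visited.

```typescript
// Simplified stand-ins for mdast nodes (not the real mdast types).
type MdNode =
  | { type: "definition"; url: string }
  | { type: "paragraph"; text: string }
  | { type: "card"; url: string }

// Insert a card after every definition, keeping the definition itself.
function insertCards(children: readonly MdNode[]): MdNode[] {
  const result = [...children]
  // Iterate backwards so that splicing at i + 1 never moves an unvisited node.
  for (let i = result.length - 1; i >= 0; i--) {
    const node = result[i]
    if (node?.type === "definition") {
      result.splice(i + 1, 0, { type: "card", url: node.url })
    }
  }
  return result
}
```

Iterating forwards would also work, but only with extra index bookkeeping to skip each freshly inserted card.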

Integration points:

  • src/dev/post-pipeline.ts:104 - remarkLink plugin with OGP runtime
  • src/dev/collections-pipeline.ts:163-170 - Layer setup with SQLite and HTTP clients

The plugin runs only at build time. All metadata is embedded in the post JSON, eliminating runtime network requests.

7. Deployment & caching strategy

Local development:

  • Database location: data/og.sqlite (git-ignored)
  • Persistent across dev server restarts
  • 60-day TTL minimizes redundant fetches
  • Automatic database creation on first link processing

CI/CD (GitHub Actions):

  • Cache database with actions/cache using data/og.sqlite as cache key
  • TTL ensures most builds use cached data
  • Reduces external API rate limit risks
  • Graceful fallback to stale data if external sites are down

Production (Deno Deploy):

  • Database copied to build output during pnpm build
  • Read-only access at runtime
  • No runtime fetching or writes
  • All metadata pre-embedded in post JSON files

Credits

The initial idea was drafted with Gemini 2.5 Flash and then refined with OpenAI o3. Once the initial implementation was complete, this document was updated to match it.