Skip to content

Add Markdown footnote support to from_markdown #56

Open
MattFisher wants to merge 3 commits into ma2za:main from MattFisher:add-footnote-support

@MattFisher

Parse standard Markdown footnotes (`text[^label]` references and `[^label]: definition` lines) into Substack's footnoteAnchor inline nodes and footnote blocks.

Footnotes are numbered by order of first reference and labels may be numeric or named.

Also adds Post.footnote_anchor() and Post.footnote() helpers for building footnotes manually, plus tests.
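A minimal sketch of the numbering rule described above (footnotes numbered by order of first reference, numeric or named labels). The function `number_footnotes` and its regex are hypothetical illustrations, not code from this PR. Note that it scans raw text and does not skip code spans or fences, which is exactly the gap raised in review below.

```python
import re

# Matches a footnote reference such as [^1] or [^note] (numeric or named label).
REF_RE = re.compile(r"\[\^([^\]\s]+)\]")


def number_footnotes(markdown: str) -> dict[str, int]:
    """Number footnote labels by the order of their first reference."""
    numbers: dict[str, int] = {}
    for match in REF_RE.finditer(markdown):
        at_line_start = match.start() == 0 or markdown[match.start() - 1] == "\n"
        if at_line_start and markdown.startswith(":", match.end()):
            continue  # "[^label]:" at the start of a line is a definition, not a reference
        # setdefault keeps the first number, so repeated references reuse it.
        numbers.setdefault(match.group(1), len(numbers) + 1)
    return numbers


print(number_footnotes("Named[^b] then[^a] and again[^b].\n\n[^a]: A\n[^b]: B"))
# {'b': 1, 'a': 2}
```

Because numbering follows first reference rather than definition order, `[^b]` becomes footnote 1 even though `[^a]` is defined first.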

@ma2za

ma2za commented Jun 18, 2026

Owner

I found a regression in the footnote pass: it runs at document scope before and after Markdown parsing, so footnote-like text inside code can be removed or rewritten.

Could you add regression coverage for these cases before merge?

```python
def test_footnote_definition_inside_fenced_code_stays_code():
    post = make_post()
    # A definition-shaped line inside a fence must not be extracted as a footnote.
    post.from_markdown("```\n[^1]: not a footnote\n```")
    content = body_content(post)
    assert len(content) == 1
    assert content[0]["type"] == "codeBlock"
    assert content[0]["content"][0]["text"] == "[^1]: not a footnote"


def test_footnote_reference_inside_fenced_code_stays_text():
    post = make_post()
    # A reference inside a fence stays literal even when a matching definition exists.
    post.from_markdown("```\ncode [^1]\n```\n\n[^1]: note")
    content = body_content(post)
    assert content[0]["type"] == "codeBlock"
    assert content[0]["content"][0]["text"] == "code [^1]"


def test_footnote_reference_inside_inline_code_stays_text():
    post = make_post()
    # A reference inside inline code stays literal and keeps its code mark.
    post.from_markdown("`code [^1]`\n\n[^1]: note")
    content = body_content(post)
    assert content[0]["type"] == "paragraph"
    assert content[0]["content"][0]["text"] == "code [^1]"
    assert content[0]["content"][0]["marks"] == [{"type": "code"}]
```

The first test currently fails on this branch: content[0]["type"] is "footnote" instead of "codeBlock". The fix should keep footnote extraction out of fenced code blocks and skip anchor injection for text nodes carrying an inline code mark.

Footnote definition extraction now skips fenced code blocks, and anchor
injection skips codeBlock nodes and text marked as inline code. This fixes
footnote-like text inside code being removed or rewritten. Adds regression
tests for fenced and inline code cases.
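A minimal sketch of fence-aware definition extraction as this commit describes it: lines are scanned in order, a fence toggles an "inside code" state, and `[^label]:` lines are only pulled out of the body when outside a fence. `extract_definitions` is a hypothetical name, the fence rule is a simplified reading of CommonMark (it ignores fences nested in lists or blockquotes), and only single-line definitions are handled here.

```python
import re
from typing import Dict, List, Optional, Tuple

# A definition line: "[^label]: text" at the start of a line.
DEF_RE = re.compile(r"^\[\^([^\]\s]+)\]:\s?(.*)$")
# An opening or closing code fence: 3+ backticks or tildes, indented at most 3 spaces.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def extract_definitions(markdown: str) -> Tuple[str, Dict[str, str]]:
    """Pull single-line footnote definitions out of the body, skipping fenced code."""
    body: List[str] = []
    definitions: Dict[str, str] = {}
    fence: Optional[str] = None  # opening fence marker while inside a code block
    for line in markdown.split("\n"):
        fence_match = FENCE_RE.match(line)
        if fence is None:
            if fence_match:
                fence = fence_match.group(1)  # entering a fenced code block
            else:
                def_match = DEF_RE.match(line)
                if def_match:
                    definitions[def_match.group(1)] = def_match.group(2)
                    continue  # drop the definition line from the body
        else:
            marker = fence_match.group(1) if fence_match else ""
            # A closing fence uses the same character, is at least as long,
            # and has nothing else on the line.
            if marker[:1] == fence[0] and len(marker) >= len(fence) and line.strip() == marker:
                fence = None
        body.append(line)
    return "\n".join(body), definitions
```

With the first regression input, `extract_definitions("```\n[^1]: not a footnote\n```")` returns the text unchanged and an empty dict, so the line stays in the code block.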
@MattFisher

Author

Good catch, thanks. Fixed in 698e098:

  • Footnote definition extraction now tracks fenced code blocks and skips them, so a [^1]: ... line inside a fenced code block stays in the code block.
  • Anchor injection now skips codeBlock nodes entirely and any text node carrying an inline code mark, so [^1] references inside code are left as literal text.
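The anchor-injection rule in the second bullet can be sketched as a tree walk over ProseMirror-style dicts (the `type`/`content`/`text`/`marks` shape used in the regression tests above). The function names are hypothetical, and the `footnoteAnchor` node's `attrs={"number": ...}` shape is an assumption for illustration, not Substack's confirmed schema. This sketch also leaves references without a known label as literal text, which is a design choice rather than something stated in the PR.

```python
import re
from typing import Any, Dict, List

REF_RE = re.compile(r"\[\^([^\]\s]+)\]")
Node = Dict[str, Any]


def split_text(node: Node, numbers: Dict[str, int]) -> List[Node]:
    """Split one text node around the footnote references it contains."""
    if any(mark.get("type") == "code" for mark in node.get("marks", [])):
        return [node]  # inline code: leave [^label] as literal text
    text, out, pos = node["text"], [], 0
    for match in REF_RE.finditer(text):
        number = numbers.get(match.group(1))
        if number is None:
            continue  # unknown label: keep the literal text
        if match.start() > pos:
            out.append({**node, "text": text[pos:match.start()]})  # keeps other marks
        # ASSUMPTION: illustrative attrs shape, not Substack's confirmed schema.
        out.append({"type": "footnoteAnchor", "attrs": {"number": number}})
        pos = match.end()
    if pos < len(text):
        out.append({**node, "text": text[pos:]})
    return out


def inject_anchors(node: Node, numbers: Dict[str, int]) -> Node:
    """Return a copy of node with references replaced, skipping code blocks entirely."""
    if node.get("type") == "codeBlock" or "content" not in node:
        return node
    children: List[Node] = []
    for child in node["content"]:
        if child.get("type") == "text":
            children.extend(split_text(child, numbers))
        else:
            children.append(inject_anchors(child, numbers))
    return {**node, "content": children}
```

Because the `codeBlock` check happens before recursing, nothing inside a code block is ever scanned, and the per-node mark check covers inline code.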

All three regression tests you provided are included and pass (full post suite green: 66 passed).

Footnote definitions can now contain multiple paragraphs (a blank line
followed by an indented block); previously a second paragraph leaked into
the post body and only the first was kept. Extraction preserves paragraph
breaks and Post.footnote() splits blank-line-separated content into
multiple paragraph nodes (verified that Substack accepts and renders them). Adds
regression tests.
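The splitting step this commit adds to Post.footnote() can be sketched as follows: the definition text is split on blank lines and each chunk becomes its own paragraph node inside the footnote block. `footnote_block` is a hypothetical stand-in, not the PR's Post.footnote(), and the `attrs={"number": ...}` shape is an assumption. Collapsing whitespace within each chunk is a simplification that also strips the indentation of continuation paragraphs.

```python
import re
from typing import Any, Dict


def footnote_block(number: int, text: str) -> Dict[str, Any]:
    """Build a footnote node with one paragraph per blank-line-separated chunk."""
    # A blank line may hold only whitespace; \n\s*\n also absorbs runs of blank lines.
    chunks = re.split(r"\n\s*\n", text.strip())
    paragraphs = [
        # Collapsing whitespace joins wrapped lines and drops continuation indentation.
        {"type": "paragraph", "content": [{"type": "text", "text": " ".join(chunk.split())}]}
        for chunk in chunks
        if chunk.strip()
    ]
    # ASSUMPTION: the "number" attribute is illustrative, not Substack's confirmed schema.
    return {"type": "footnote", "attrs": {"number": number}, "content": paragraphs}
```

For example, `footnote_block(1, "First paragraph\ncontinues here.\n\n    Second paragraph.")` yields two paragraph nodes rather than leaking the second paragraph into the post body.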
@MattFisher

MattFisher commented Jun 26, 2026

Author

Heads-up: as footnote handling has grown here (fenced/inline-code edge cases, multi-paragraph definitions), it is starting to feel complex, as if we're edging toward re-implementing a Markdown parser.

So I prototyped an alternative approach in #61 that uses markdown-it-py (a well-established CommonMark parser) plus its footnote plugin. It ends up significantly simpler: from_markdown() shrinks dramatically and footnote handling largely comes for free from the plugin. It does add two dependencies and makes a couple of (CommonMark-correct) behaviour changes, which I've described there.

This wasn't planned up front and is purely for your consideration, but it may be worth merging #61 instead of this one.

