fixed indexing of external posts (#2983)

This should fix several issues with indexing external posts, including
#1828.

In short, I found that indexing failed because the index builder was
receiving 'empty' documents. To fix that, I now set each document's
content to the post content retrieved from the RSS feed, or to the
text extracted from the external page.

I've tested with various blog sources and it seems to be working as
expected now.
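
As a rough illustration of the failure mode described above, the sketch below shows why a document with no content contributes nothing to a text index, and how assigning the fetched text changes that. The `Doc` struct and `build_index` method are hypothetical stand-ins, not part of Jekyll or this plugin. The sketch also assumes the index builder reads only `content`, which is a simplification.

```ruby
# Hypothetical stand-in for a Jekyll document: metadata plus body text.
Doc = Struct.new(:data, :content)

# Toy index builder (an assumption, not the real one): maps each word in a
# document's content to the titles of the documents that contain it.
def build_index(docs)
  index = Hash.new { |hash, word| hash[word] = [] }
  docs.each do |doc|
    words = doc.content.to_s.downcase.scan(/\w+/).uniq
    words.each { |word| index[word] << doc.data['title'] }
  end
  index
end

# Shape assumed to resemble what the plugin gets from a feed entry or page.
fetched = { title: 'Hello', summary: 'A short post', content: 'Indexing external posts' }

# Before the fix: only metadata is set, so the content stays empty.
before = Doc.new({ 'title' => fetched[:title], 'description' => fetched[:summary] }, '')

# After the fix: the fetched text becomes the document content.
after = Doc.new(before.data.dup, fetched[:content])

puts build_index([before]).empty?           # the empty document yields no index entries
puts build_index([after])['external'].inspect
```

In this sketch the metadata is identical in both cases. Only the presence of `content` decides whether the post can be found by a search on its text.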
Juan Carlos Niebles 2025-01-26 18:35:13 -08:00 committed by GitHub
parent 15fc779e7e
commit b50db2e713

@@ -62,6 +62,7 @@ module ExternalPosts
  doc.data['description'] = content[:summary]
  doc.data['date'] = content[:published]
  doc.data['redirect'] = url
+ doc.content = content[:content]
  site.collections['posts'].docs << doc
  end
@@ -90,8 +91,12 @@ module ExternalPosts
  parsed_html = Nokogiri::HTML(html)
  title = parsed_html.at('head title')&.text.strip || ''
- description = parsed_html.at('head meta[name="description"]')&.attr('content') || ''
- body_content = parsed_html.at('body')&.inner_html || ''
+ description = parsed_html.at('head meta[name="description"]')&.attr('content')
+ description ||= parsed_html.at('head meta[name="og:description"]')&.attr('content')
+ description ||= parsed_html.at('head meta[property="og:description"]')&.attr('content')
+ body_content = parsed_html.search('p').map { |e| e.text }
+ body_content = body_content.join() || ''
  {
  title: title,