fixed indexing of external posts (#2983)
This should fix several issues with indexing external posts, including #1828. In short, the index builder was receiving 'empty' documents. To fix that, I'm setting each document's content to the post content as retrieved from the RSS feed, or to the text extracted from the external page. I've tested with various blog sources, and it seems to be working as expected now.
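A minimal sketch of the failure mode, assuming the index builder reads each document's body (`content`) rather than its front matter (`data`). `Doc`, `index_text`, and `build_doc` are hypothetical stand-ins, not Jekyll's API. A document that carries only front matter gives the indexer nothing to work with, while copying the feed or page text into `content` does.

```ruby
# Minimal sketch, not the plugin's code. `Doc` is a hypothetical stand-in for
# Jekyll::Document: front matter lives in `data`, the body in `content`.
Doc = Struct.new(:data, :content)

# Hypothetical indexer; assumes the real index builder reads the body text.
def index_text(doc)
  doc.content.to_s
end

# Builds a document with front matter similar to the diff's context lines;
# `set_body` toggles the one line this commit adds (doc.content = content[:content]).
def build_doc(url, content, set_body:)
  doc = Doc.new({}, '')
  doc.data['description'] = content[:summary]
  doc.data['redirect'] = url
  doc.content = content[:content] if set_body
  doc
end

entry = { title: 'Hello', summary: 'A short summary', content: 'Full post text.' }
without_body = build_doc('https://example.com/hello', entry, set_body: false)
with_body = build_doc('https://example.com/hello', entry, set_body: true)

puts index_text(without_body).empty? # prints true: the indexer sees an 'empty' document
puts index_text(with_body)           # prints Full post text.
```

Only the body differs between the two documents; front matter such as `redirect` is identical in both.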
parent 15fc779e7e
commit b50db2e713
@@ -62,6 +62,7 @@ module ExternalPosts
 doc.data['description'] = content[:summary]
 doc.data['date'] = content[:published]
 doc.data['redirect'] = url
+doc.content = content[:content]
 site.collections['posts'].docs << doc
 end

@@ -90,8 +91,12 @@ module ExternalPosts
 parsed_html = Nokogiri::HTML(html)

 title = parsed_html.at('head title')&.text.strip || ''
-description = parsed_html.at('head meta[name="description"]')&.attr('content') || ''
-body_content = parsed_html.at('body')&.inner_html || ''
+description = parsed_html.at('head meta[name="description"]')&.attr('content')
+description ||= parsed_html.at('head meta[name="og:description"]')&.attr('content')
+description ||= parsed_html.at('head meta[property="og:description"]')&.attr('content')
+
+body_content = parsed_html.search('p').map { |e| e.text }
+body_content = body_content.join() || ''

 {
 title: title,
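The extraction logic in the second hunk can be sketched in plain Ruby. To keep it runnable without Nokogiri, these functions take already-extracted values (a title node, a hash of meta tags, a list of paragraph strings); those inputs and all function names are hypothetical. The description fallback order follows the diff. Three small choices differ from the commit and are assumptions, not the plugin's behavior: the title uses `&.text&.strip`, because Ruby's `&.` only guards the call it is attached to, so `&.text.strip` raises `NoMethodError` on a page without a `<title>`; paragraphs are joined with a space, because `Array#join` with no argument inserts no separator and adjacent paragraphs would run together in the index; and the description defaults to `''` when none of the three meta tags is present.

```ruby
# Minimal sketch, not the plugin's code: mirrors the hash shape returned after
# the diff, but takes pre-extracted values instead of a Nokogiri document.
TitleNode = Struct.new(:text) # hypothetical stand-in for the <title> node

# `&.` guards only the call it is attached to, so guard both calls.
def extract_title(title_node)
  title_node&.text&.strip || ''
end

# Same fallback order as the diff: name="description", then
# name="og:description", then property="og:description".
# `meta` maps [attribute, value] pairs to the tag's content attribute.
def extract_description(meta)
  description = meta[%w[name description]]
  description ||= meta[%w[name og:description]]
  description ||= meta[%w[property og:description]]
  description || '' # assumption: never hand nil to the front matter
end

# Join paragraph text with a space so words from adjacent <p> elements
# stay separate; blank paragraphs are dropped.
def extract_body(paragraphs)
  paragraphs.map(&:strip).reject(&:empty?).join(' ')
end

def build_content(title_node, meta, paragraphs)
  {
    title: extract_title(title_node),
    content: extract_body(paragraphs),
    summary: extract_description(meta)
  }
end

page = build_content(
  TitleNode.new("  My Post \n"),
  { %w[property og:description] => 'An example summary' },
  ['First paragraph.', '  ', 'Second paragraph.']
)
p page
```

Here only the `property="og:description"` tag exists, so the first two lookups return `nil` and `||=` falls through to the third.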