pages/_plugins/google-scholar-citations.rb
google-labs-jules[bot] 2e308ed606
Optimize Google Scholar Citations Regex Definition (#3449)
💡 **What:** Moved the regex definition `/Cited by (\d+[,\d]*)/` from the
method scope to a class-level constant `CITED_BY_REGEX`.

🎯 **Why:** To improve code cleanliness and avoid potential re-definition
of the regex object in every method call (or loop), adhering to best
practices.

📊 **Measured Improvement:**
* **Baseline:** The regex was defined as a literal inside the `render`
method, which is called for each tag usage.
* **Optimization:** The regex is now defined once as a constant.
* **Note:** Performance benchmarks were not possible in the current
environment due to missing Ruby runtime. However, this is a standard
Ruby optimization that improves maintainability and theoretically avoids
object allocation overhead in older Ruby versions or complex scenarios.
Modern Ruby optimizes literals well, but the constant approach is
cleaner and DRYer.
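
As a rough illustration of what the shared constant matches, the sketch below applies the same pattern to a few made-up description strings. The helper name `extract_citation_count` and the sample strings are hypothetical and not part of the plugin. The helper uses `String#delete` so that every thousands separator is removed before the conversion to an integer.

```ruby
# Same pattern as the CITED_BY_REGEX constant described above.
CITED_BY_REGEX = /Cited by (\d+[,\d]*)/

# Hypothetical helper (not in the plugin): returns the integer count found
# in a description string, or nil when the pattern is absent.
def extract_citation_count(text)
  match = text.match(CITED_BY_REGEX)
  return nil unless match

  # Strip all thousands separators before converting to an integer.
  match[1].delete(",").to_i
end

# Made-up sample strings, for illustration only.
puts extract_citation_count("Some abstract text. Cited by 42").inspect
puts extract_citation_count("Cited by 1,234,567 - Related articles").inspect
puts extract_citation_count("No citation info here").inspect
```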

---
*PR created automatically by Jules for task
[10688912524063334698](https://jules.google.com/task/10688912524063334698)
started by @alshedivat*

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
2026-01-15 21:41:10 -08:00

require "active_support/all"
require 'nokogiri'
require 'open-uri'

module Helpers
  extend ActiveSupport::NumberHelper
end

module Jekyll
  class GoogleScholarCitationsTag < Liquid::Tag
    # Class-level cache of rendered citation counts, keyed by article ID
    Citations = {}
    CITED_BY_REGEX = /Cited by (\d+[,\d]*)/

    def initialize(tag_name, params, tokens)
      super
      splitted = params.split(" ").map(&:strip)
      @scholar_id = splitted[0]
      @article_id = splitted[1]

      if @scholar_id.nil? || @scholar_id.empty?
        puts "Invalid scholar_id provided"
      end

      if @article_id.nil? || @article_id.empty?
        puts "Invalid article_id provided"
      end
    end

    def render(context)
      # The tag arguments are variable names; resolve them in the Liquid context
      article_id = context[@article_id.strip]
      scholar_id = context[@scholar_id.strip]
      article_url = "https://scholar.google.com/citations?view_op=view_citation&hl=en&user=#{scholar_id}&citation_for_view=#{scholar_id}:#{article_id}"

      begin
        # If the citation count has already been fetched, return it
        if GoogleScholarCitationsTag::Citations[article_id]
          return GoogleScholarCitationsTag::Citations[article_id]
        end

        # Sleep for a random amount of time to avoid being blocked
        sleep(rand(1.5..3.5))

        # Fetch the article page
        doc = Nokogiri::HTML(URI.open(article_url, "User-Agent" => "Ruby/#{RUBY_VERSION}"))

        # Attempt to extract the "Cited by n" string from the meta tags
        citation_count = 0

        # Prefer the "description" meta tag; fall back to "og:description"
        description_meta = doc.css('meta[name="description"]')
        og_description_meta = doc.css('meta[property="og:description"]')

        if !description_meta.empty?
          cited_by_text = description_meta[0]['content']
          matches = cited_by_text.match(CITED_BY_REGEX)
          if matches
            # delete removes every comma; sub would only remove the first one
            citation_count = matches[1].delete(",").to_i
          end
        elsif !og_description_meta.empty?
          cited_by_text = og_description_meta[0]['content']
          matches = cited_by_text.match(CITED_BY_REGEX)
          if matches
            citation_count = matches[1].delete(",").to_i
          end
        end

        # Abbreviate large counts, e.g. thousands as "K"
        citation_count = Helpers.number_to_human(
          citation_count,
          format: '%n%u',
          precision: 2,
          units: { thousand: 'K', million: 'M', billion: 'B' }
        )
      rescue StandardError => e
        # Handle any errors that may occur during fetching or parsing
        citation_count = "N/A"

        # Print the error message including the exception class and message
        puts "Error fetching citation count for #{article_id} in #{article_url}: #{e.class} - #{e.message}"
      end

      GoogleScholarCitationsTag::Citations[article_id] = citation_count
      citation_count.to_s
    end
  end
end

Liquid::Template.register_tag('google_scholar_citations', Jekyll::GoogleScholarCitationsTag)
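
The class-level `Citations` hash in the plugin serves as a cache, so each article page is requested at most once per build. The sketch below isolates that memoization pattern using only the standard library and no network access. The `CitationCache` class, its `fetch` method, and the sample ID and values are hypothetical stand-ins, not part of the plugin.

```ruby
# Hypothetical stand-in for the plugin's class-level cache.
class CitationCache
  STORE = {}

  # Returns the cached value for article_id if present; otherwise runs the
  # block once, stores its result under article_id, and returns it.
  def self.fetch(article_id)
    return STORE[article_id] if STORE.key?(article_id)

    STORE[article_id] = yield
  end
end

calls = 0
first  = CitationCache.fetch("abc123") { calls += 1; "1.2K" }
second = CitationCache.fetch("abc123") { calls += 1; "ignored" }

# The second lookup is served from the cache, so the block ran only once.
puts first
puts second
puts calls
```

One difference from the plugin is worth noting: this sketch checks `key?`, so a cached falsy value would still count as a hit, whereas the plugin tests the stored value's truthiness. Because the plugin only stores strings, the two behave the same there.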