💡 **What:** Moved the regex definition `/Cited by (\d+[,\d]*)/` from the method scope to a class-level constant `CITED_BY_REGEX`. 🎯 **Why:** To improve code cleanliness and avoid potential re-definition of the regex object in every method call (or loop), adhering to best practices. 📊 **Measured Improvement:** * **Baseline:** The regex was defined as a literal inside the `render` method, which is called for each tag usage. * **Optimization:** The regex is now defined once as a constant. * **Note:** Performance benchmarks were not possible in the current environment due to missing Ruby runtime. However, this is a standard Ruby optimization that improves maintainability and theoretically avoids object allocation overhead in older Ruby versions or complex scenarios. Modern Ruby optimizes literals well, but the constant approach is cleaner and DRYer. --- *PR created automatically by Jules for task [10688912524063334698](https://jules.google.com/task/10688912524063334698) started by @alshedivat* Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
87 lines
2.8 KiB
Ruby
87 lines
2.8 KiB
Ruby
require "active_support/all"
|
|
require 'nokogiri'
|
|
require 'open-uri'
|
|
|
|
module Helpers
|
|
extend ActiveSupport::NumberHelper
|
|
end
|
|
|
|
module Jekyll
|
|
class GoogleScholarCitationsTag < Liquid::Tag
|
|
Citations = { }
|
|
CITED_BY_REGEX = /Cited by (\d+[,\d]*)/
|
|
|
|
def initialize(tag_name, params, tokens)
|
|
super
|
|
splitted = params.split(" ").map(&:strip)
|
|
@scholar_id = splitted[0]
|
|
@article_id = splitted[1]
|
|
|
|
if @scholar_id.nil? || @scholar_id.empty?
|
|
puts "Invalid scholar_id provided"
|
|
end
|
|
|
|
if @article_id.nil? || @article_id.empty?
|
|
puts "Invalid article_id provided"
|
|
end
|
|
end
|
|
|
|
def render(context)
|
|
article_id = context[@article_id.strip]
|
|
scholar_id = context[@scholar_id.strip]
|
|
article_url = "https://scholar.google.com/citations?view_op=view_citation&hl=en&user=#{scholar_id}&citation_for_view=#{scholar_id}:#{article_id}"
|
|
|
|
begin
|
|
# If the citation count has already been fetched, return it
|
|
if GoogleScholarCitationsTag::Citations[article_id]
|
|
return GoogleScholarCitationsTag::Citations[article_id]
|
|
end
|
|
|
|
# Sleep for a random amount of time to avoid being blocked
|
|
sleep(rand(1.5..3.5))
|
|
|
|
# Fetch the article page
|
|
doc = Nokogiri::HTML(URI.open(article_url, "User-Agent" => "Ruby/#{RUBY_VERSION}"))
|
|
|
|
# Attempt to extract the "Cited by n" string from the meta tags
|
|
citation_count = 0
|
|
|
|
# Look for meta tags with "name" attribute set to "description"
|
|
description_meta = doc.css('meta[name="description"]')
|
|
og_description_meta = doc.css('meta[property="og:description"]')
|
|
|
|
if !description_meta.empty?
|
|
cited_by_text = description_meta[0]['content']
|
|
matches = cited_by_text.match(CITED_BY_REGEX)
|
|
|
|
if matches
|
|
citation_count = matches[1].sub(",", "").to_i
|
|
end
|
|
|
|
elsif !og_description_meta.empty?
|
|
cited_by_text = og_description_meta[0]['content']
|
|
matches = cited_by_text.match(CITED_BY_REGEX)
|
|
|
|
if matches
|
|
citation_count = matches[1].sub(",", "").to_i
|
|
end
|
|
end
|
|
|
|
citation_count = Helpers.number_to_human(citation_count, :format => '%n%u', :precision => 2, :units => { :thousand => 'K', :million => 'M', :billion => 'B' })
|
|
|
|
rescue Exception => e
|
|
# Handle any errors that may occur during fetching
|
|
citation_count = "N/A"
|
|
|
|
# Print the error message including the exception class and message
|
|
puts "Error fetching citation count for #{article_id} in #{article_url}: #{e.class} - #{e.message}"
|
|
end
|
|
|
|
GoogleScholarCitationsTag::Citations[article_id] = citation_count
|
|
return "#{citation_count}"
|
|
end
|
|
end
|
|
end
|
|
|
|
Liquid::Template.register_tag('google_scholar_citations', Jekyll::GoogleScholarCitationsTag)
|