pages/SEO.md
George 25b758805c
Added copilot instructions, AGENTS.md, improved README files (#3486)

Signed-off-by: George Araújo <george.gcac@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-28 22:55:05 -03:00


SEO Best Practices Guide

This guide helps you optimize your al-folio website for search engines so your research and work are discoverable.

Overview

SEO (Search Engine Optimization) makes your website discoverable on Google, Bing, and other search engines. For academics, this means:

  • Your research becomes discoverable when people search for your work
  • Your CV/bio appears in search results
  • Your publications rank higher
  • More citations and collaborations

al-folio includes SEO basics, but you can optimize further.


Basic SEO Setup

Sitemap and Robots

al-folio auto-generates a sitemap.xml and robots.txt for you. These tell search engines what pages exist.

Verify they exist:

  • Visit https://your-site.com/sitemap.xml: it should show an XML list of pages
  • Visit https://your-site.com/robots.txt: it should show instructions for search engines

If they're missing:

  1. Check _config.yml has a valid url
  2. Rebuild: bundle exec jekyll build
  3. Check _site/ directory has both files

No configuration needed: al-folio handles this automatically.
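
If you prefer to check the two files from a script rather than a browser, the following is a minimal sketch using only the Python standard library. The sample sitemap and robots.txt contents are made-up stand-ins for what your built `_site/` directory might contain, and the helper names `sitemap_urls` and `robots_allows` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


def robots_allows(robots_txt, url, agent="*"):
    """Return True if robots.txt lets the given user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Made-up sample content, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://your-site.com/</loc></url>
  <url><loc>https://your-site.com/publications/</loc></url>
</urlset>"""

SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: https://your-site.com/sitemap.xml
"""

if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE_SITEMAP):
        print(url, "allowed" if robots_allows(SAMPLE_ROBOTS, url) else "blocked")
```

To run it against your own site, read `_site/sitemap.xml` and `_site/robots.txt` after a build and pass their contents to the two functions.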


Site URL and Metadata

Ensure _config.yml has correct metadata:

title: Your Full Name or Site Title
description: > # Brief description (1-2 sentences)
  A description of your research and expertise.
  This appears in search results.
author: Your Name
keywords: machine learning, research, academia, etc.
url: https://your-domain.com
lang: en

All fields are important for SEO. Avoid leaving fields blank.
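
A quick way to catch blank fields is a small check over the parsed configuration. The sketch below assumes you have already loaded `_config.yml` into a plain dictionary with a YAML library of your choice; the function names and the list of required keys are illustrative and simply mirror the fields shown above.

```python
from urllib.parse import urlsplit

# Keys taken from the example configuration above.
REQUIRED_FIELDS = ("title", "description", "author", "keywords", "url", "lang")


def missing_fields(config):
    """Return the required keys that are absent or blank in the config dict."""
    missing = []
    for key in REQUIRED_FIELDS:
        value = config.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


def url_is_absolute(url):
    """Return True if the url has an http(s) scheme and a host name."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    # Hypothetical parsed config, for illustration only.
    config = {
        "title": "Your Full Name or Site Title",
        "description": "A description of your research and expertise.",
        "author": "Your Name",
        "keywords": "machine learning, research, academia",
        "url": "https://your-domain.com",
        "lang": "en",
    }
    print("missing:", missing_fields(config))
    print("url ok:", url_is_absolute(config["url"]))
```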


Enabling Open Graph (Social Media Previews)

What is Open Graph?

When someone shares your page on Twitter, Facebook, LinkedIn, etc., Open Graph controls what preview appears.

Without Open Graph:

  • Generic title
  • No image
  • Ugly preview

With Open Graph:

  • Your custom title
  • Your custom image (photo, diagram, etc.)
  • Custom description
  • Professional preview

Enable in al-folio

Open Graph is disabled by default. To enable:

  1. Edit _config.yml:

    serve_og_meta: true # Change from false to true
    og_image: /assets/img/og-image.png # Path to your image (1200x630px recommended)
    
  2. Create your OG image:

    • Size: 1200x630 pixels
    • Format: PNG or JPG
    • Content: Your name/logo + key info
    • Save to: assets/img/og-image.png
  3. Commit and deploy

  4. Test it:

Per-page OG images:

Add to the frontmatter of a blog post or page:

---
layout: post
title: My Research Paper
og_image: /assets/img/paper-diagram.png
---
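
To confirm that a built page actually carries Open Graph tags, you can scan its HTML for `<meta property="og:...">` elements. This is a rough sketch with the standard-library HTML parser; the sample HTML is invented, and which `og:` properties al-folio emits for a given page is an assumption you should verify against your own `_site/` output.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content"):
            self.tags[prop] = attr["content"]


def open_graph_tags(html):
    """Return a dict mapping og: property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


def missing_og(tags, required=("og:title", "og:image")):
    """Return the required og: properties that the page does not define."""
    return [name for name in required if name not in tags]


# Invented sample page head, for illustration only.
SAMPLE_HTML = """
<head>
  <meta property="og:title" content="My Research Paper" />
  <meta property="og:image" content="https://your-domain.com/assets/img/paper-diagram.png" />
  <meta name="description" content="Not an Open Graph tag" />
</head>
"""

if __name__ == "__main__":
    tags = open_graph_tags(SAMPLE_HTML)
    print(tags)
    print("missing:", missing_og(tags))
```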

Schema.org Markup

What is Schema.org?

Schema.org is structured data that tells search engines what kind of content is on your page:

  • "This is a Person" (your bio page)
  • "This is a Publication" (your paper)
  • "This is a BlogPosting" (your article)

Benefits:

  • Rich snippets in search results
  • Better knowledge graph information
  • Schema validation helps Google understand your site

Enable in al-folio

Enable in _config.yml:

serve_schema_org: true # Change from false to true

That's it! al-folio automatically marks up:

  • Author info (Person schema with name, URL, photo)
  • Blog posts (BlogPosting schema with date, title, description)
  • Publications (CreativeWork/ScholarlyArticle schema)

What Gets Marked Up

Homepage (Person):

  • Your name, photo, description
  • Links to your profiles (LinkedIn, GitHub, etc.)

Blog posts (BlogPosting):

  • Title, date, author, description
  • Content
  • Publication date and modified date

Publications (ScholarlyArticle):

  • Title, authors, abstract
  • Publication date, venue
  • URL and PDF links
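
For readers unfamiliar with structured data, the sketch below builds a Schema.org Person object as JSON-LD, the format search engines commonly read. It only illustrates the general shape of such markup: the exact properties al-folio generates may differ, and the function name, photo path, and profile link are hypothetical.

```python
import json


def person_json_ld(name, url, image, same_as):
    """Serialize a Schema.org Person description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "url": url,
        "image": image,
        "sameAs": list(same_as),  # links to profiles such as GitHub or LinkedIn
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(person_json_ld(
        name="Jane Smith",
        url="https://your-domain.com",
        image="https://your-domain.com/assets/img/photo.jpg",
        same_as=["https://github.com/janesmith"],
    ))
```

On a real page this JSON would sit inside a `<script type="application/ld+json">` element in the page head.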

Search Console Setup

Google Search Console

Google Search Console lets you monitor how your site appears in Google search results.

Setup:

  1. Go to Google Search Console
  2. Add your website:
    • Click "URL prefix"
    • Enter your site URL: https://your-domain.com
  3. Verify ownership (choose one method):
    • HTML file upload: download the file and add it to the repository root
    • HTML tag: copy the verification code from the meta tag into _config.yml, then redeploy
    • Google Analytics: if you already use Google Analytics
    • DNS record: advanced (if you own the domain)

Add to _config.yml:

google_site_verification: YOUR_VERIFICATION_CODE

(Replace YOUR_VERIFICATION_CODE with the code from Search Console.)

Monitor in Search Console:

  • Performance: which queries bring traffic, and your ranking position
  • Coverage: any indexing errors
  • Enhancements: Schema.org validation
  • Sitemaps: your sitemap status

Bing Webmaster Tools

Similar to Google Search Console but for Bing search:

  1. Go to Bing Webmaster Tools
  2. Add your site
  3. Verify ownership (Bing can import a site you have already verified in Google Search Console)
  4. Add to _config.yml:
    bing_site_verification: YOUR_BING_CODE
    

Note: Setting up Bing Webmaster Tools is optional but recommended. Check both dashboards regularly.


Publication Indexing

Google Scholar

Goal: Get your publications listed on Google Scholar so they show up in scholar search results.

Google Scholar auto-crawls:

  • Your website (if publicly accessible)
  • Your publications page if it has proper markup
  • PDFs linked from your site

To improve Scholar indexing:

  1. Ensure BibTeX has proper format:

    @article{mykey,
      title={Your Paper Title},
      author={Your Name and Co-Author},
      journal={Journal Name},
      year={2024},
      volume={1},
      pages={1--10},
      doi={10.1234/doi}
    }
    
  2. Add PDFs to BibTeX:

    % Other fields omitted. The file lives at assets/pdf/my-paper.pdf
    @article{mykey,
      pdf={my-paper.pdf}
    }
    
  3. Submit to Google Scholar (optional):

  4. Wait 3-6 months: Google Scholar takes time to index
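
Before waiting on the crawler, it can help to confirm that each entry has the basic fields. The following is a deliberately simple sketch, not a full BibTeX parser: it assumes one-level `field={value}` pairs like the examples above and will miss values that contain nested braces. The function names and the default list of required fields are illustrative.

```python
import re

# Matches simple field={value} pairs; values with nested braces are not handled.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def bibtex_fields(entry):
    """Return a dict of lower-cased field names to values for one entry."""
    return {name.lower(): value.strip() for name, value in FIELD_RE.findall(entry)}


def missing_bibtex_fields(entry, required=("title", "author", "year")):
    """Return the required fields that are absent or empty in the entry."""
    fields = bibtex_fields(entry)
    return [name for name in required if not fields.get(name)]


SAMPLE_ENTRY = """@article{mykey,
  title={Your Paper Title},
  author={Your Name and Co-Author},
  journal={Journal Name},
  year={2024},
  pdf={my-paper.pdf}
}"""

if __name__ == "__main__":
    print(bibtex_fields(SAMPLE_ENTRY))
    print("missing:", missing_bibtex_fields(SAMPLE_ENTRY))
```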


DBLP (Computer Science)

If your research is computer science related:

  1. Go to DBLP
  2. Search for yourself or your papers
  3. If missing, submit via DBLP (requires account)
  4. DBLP will verify and add your work

arXiv

If you have preprints:

  1. Go to arXiv.org
  2. Submit your preprint
  3. Once listed, search engines and Google Scholar typically index the arXiv page on their own

Add arXiv link to BibTeX:

% Other fields omitted. The arxiv field holds the arXiv ID
@article{mykey,
  arxiv={2401.12345}
}

Content Optimization

Page Titles and Descriptions

Every page needs a title and description. These show in search results.

In _config.yml:

title: Jane Smith - Computer Science Researcher
description: >
  Academic website of Jane Smith, focusing on machine learning and AI ethics.  

In page/post frontmatter:

---
layout: post
title: Novel Deep Learning Architecture for Climate Modeling
description: A new approach to improving climate model accuracy with deep learning
---

Checklist:

  • Title under 60 characters (so it doesn't get cut off)
  • Description 120-160 characters
  • Include your name in the site title
  • Include keywords naturally
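
The length limits in the checklist are easy to check mechanically. This is a minimal sketch; the function name and its default thresholds are hypothetical and simply encode the numbers from the checklist above.

```python
def check_lengths(title, description, max_title=60, desc_range=(120, 160)):
    """Return a list of human-readable problems; an empty list means both fit."""
    problems = []
    if len(title) >= max_title:
        problems.append(
            f"title is {len(title)} characters, should be under {max_title}")
    low, high = desc_range
    if not low <= len(description) <= high:
        problems.append(
            f"description is {len(description)} characters, should be {low}-{high}")
    return problems


if __name__ == "__main__":
    title = "Novel Deep Learning Architecture for Climate Modeling"
    description = ("A new approach to improving climate model accuracy "
                   "with deep learning")
    for problem in check_lengths(title, description):
        print(problem)
```

Running it on the example frontmatter above flags the description as shorter than the suggested range, which is a hint to expand it rather than a hard error.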

Heading Structure

Use proper HTML heading hierarchy for both SEO and accessibility:

# H1: Main Page Title

Use one H1 per page, usually your blog post or page title

## H2: Section Heading

### H3: Subsection

### H3: Another subsection

## H2: Another Section

Benefits:

  • Search engines understand your content structure
  • Screen readers can navigate better
  • Visitors can scan your content
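
A heading hierarchy can be linted with a few lines of Python. The sketch below looks only at `#`-style headings in raw Markdown, skips fenced code blocks, and reports a missing or duplicated H1 and any jump of more than one level. The function name is hypothetical, and if your layout renders the page title as the H1 for you, the "exactly one H1" check may not apply to the Markdown body.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+\S")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def heading_problems(markdown):
    """Return a list of hierarchy problems found in a Markdown string."""
    levels = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' inside code is a comment, not a heading
        match = HEADING_RE.match(line)
        if match:
            levels.append(len(match.group(1)))

    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"heading jumps from H{previous} to H{current}")
    return problems


if __name__ == "__main__":
    sample = "# Title\n\n## Section\n\n#### Too deep\n"
    print(heading_problems(sample))
```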

Image Optimization

For SEO:

  • Use descriptive filenames: neural-network-architecture.png (not img1.png)
  • Add alt text (also helps accessibility):
    ![Neural network showing three layers with training accuracy of 95%](assets/img/neural-network.png)
    

For performance:

  • Optimize image file size (use tools like TinyPNG)
  • Use modern formats (WebP instead of large JPGs)
  • Responsive images (different sizes for mobile vs desktop)
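
The filename and alt-text advice above can also be checked automatically. This sketch scans Markdown image syntax only (not raw HTML or Liquid includes), and its notion of a "generic" filename is a made-up heuristic; both the function name and the pattern are assumptions to adapt.

```python
import re

# Markdown image: ![alt](path "optional title")
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
# Heuristic for names like img1, image_02, fig3 (an assumption, adjust freely).
GENERIC_NAME_RE = re.compile(
    r"^(img|image|figure|fig|screenshot)?[-_]?\d*$", re.IGNORECASE)


def image_problems(markdown):
    """Return problems for images with empty alt text or generic filenames."""
    problems = []
    for alt, path in IMAGE_RE.findall(markdown):
        stem = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        if not alt.strip():
            problems.append(f"{path}: missing alt text")
        if GENERIC_NAME_RE.match(stem):
            problems.append(f"{path}: non-descriptive filename")
    return problems


if __name__ == "__main__":
    sample = (
        "![](assets/img/img1.png)\n"
        "![Neural network showing three layers](assets/img/neural-network.png)\n"
    )
    for problem in image_problems(sample):
        print(problem)
```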

Internal Linking

Link between your own pages strategically:

See my [publication on climate AI](./publications/) or my [blog post on neural networks](/blog/2024/neural-networks/).

Benefits:

  • Search engines crawl through your links
  • Users discover more of your content
  • Distributes "authority" across your site

RSS Feed for Discovery

al-folio auto-generates an RSS feed at /feed.xml.

Why RSS matters:

  • Content aggregators pick up your posts
  • Researchers can subscribe to your updates
  • Improves discoverability

Ensure your feed works:

# In _config.yml
title: Your Site
description: Your site description
url: https://your-domain.com # MUST be complete URL

Test your feed:

  • Visit https://your-site.com/feed.xml
  • Should show XML with your recent posts
  • Try subscribing in a feed reader (Feedly, etc.)
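
You can also inspect the feed from a script. The sketch below assumes the feed is in Atom format, which is what the jekyll-feed plugin typically produces; if your `/feed.xml` is RSS 2.0 instead, the element names will differ. The sample feed and the function names are invented for illustration. Relative links in the output usually mean `url` in `_config.yml` is incomplete.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def feed_entries(feed_xml):
    """Return a list of (title, link) tuples for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        entries.append((title.strip(), href))
    return entries


def relative_links(entries):
    """Return entry links that are not absolute http(s) URLs."""
    return [href for _, href in entries
            if not href.startswith(("http://", "https://"))]


# Invented sample feed, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Your Site</title>
  <entry>
    <title>Neural Networks</title>
    <link href="https://your-domain.com/blog/2024/neural-networks/" rel="alternate" />
  </entry>
  <entry>
    <title>Post with a relative link</title>
    <link href="/blog/2024/draft/" rel="alternate" />
  </entry>
</feed>"""

if __name__ == "__main__":
    entries = feed_entries(SAMPLE_FEED)
    print(entries)
    print("relative:", relative_links(entries))
```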

Performance & Mobile

Search engines favor fast, mobile-friendly sites.

Check your site:

  • Use Google PageSpeed Insights
  • Enter your site URL
  • Review recommendations
  • al-folio already optimizes for performance, but you can improve further:
    • Compress images
    • Minify CSS/JS (enabled by default)
    • Use lazy loading (already enabled)

Mobile optimization:

  • al-folio is responsive by default
  • Test on phones/tablets
  • Ensure buttons are large enough to tap
  • Check readability on small screens

SEO Checklist

Before considering your site "SEO optimized":

Basic Setup:

  • _config.yml has title, description, author, url
  • Sitemap accessible at /sitemap.xml
  • robots.txt accessible at /robots.txt
  • Mobile-friendly (test on phone)

Search Console:

  • Google Search Console linked
  • Bing Webmaster Tools linked (optional but recommended)
  • No major indexing errors
  • Sitemaps submitted

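Schema/Open Graph:

  • Open Graph enabled (serve_og_meta: true)
  • OG image created (1200x630px) and set in og_image
  • Schema.org enabled (serve_schema_org: true)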

Content:

  • Every page has unique title (under 60 chars)
  • Every page has description (120-160 chars)
  • Blog posts have proper dates
  • Images have descriptive alt text
  • Headings follow proper hierarchy

Publications:

  • BibTeX entries have proper format
  • PDFs linked from BibTeX
  • Submitted to Google Scholar (optional)
  • Indexed on DBLP or arXiv (if applicable)

Performance:

  • Site loads under 3 seconds (check PageSpeed)
  • No broken links (use Lighthouse or similar)
  • RSS feed works (check /feed.xml)


Next Steps:

  1. Enable Open Graph and Schema.org in _config.yml
  2. Set up Google Search Console and Bing Webmaster Tools
  3. Optimize your page titles and descriptions
  4. Add alt text to images and PDFs to your BibTeX
  5. Monitor Search Console regularly for indexing issues

Your research will be more discoverable with these optimizations! 🔍