Niclas Nilsson home

Scraping holidays

Calendar problems again. Maybe I’m bad at searching, but Twitter didn’t help either. In any case, I couldn’t find a decent iCal calendar with Swedish holidays and observances for 2009. Many of them missed things like Midsummer Eve, which is pretty important to the average Swede, and none of them contained what I wanted. timeanddate.com does, and it can be configured to include different levels of detail for your specific country, but they don’t provide iCal versions. I quickly became bored looking through broken calendars, and programming is fun, so it was time for some scraping.

gem install hpricot (a great HTML parser), and off we go. The script below also requires the icalendar and activesupport gems.

Replace the months with whatever they’re called in your language, and replace the URL with your configured URL at timeanddate.com (and mind the year placeholder in the URL if you want to get several years):

require 'rubygems'
require 'open-uri'
require 'hpricot'
require 'icalendar'
require 'date'
require 'active_support'
include Icalendar

cal = Calendar.new

months = %w[jan feb mar apr maj jun jul aug sep okt nov dec]

# Create a map { "jan" => 1, ... }
months = months.zip((1..12).to_a).flatten
months = Hash[*months]

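# A date cell looks like "<day> <month>": digits followed by whitespace
is_date = lambda { |line| line =~ /\d+\s.*/ }
# Extract the plain text of an Hpricot element
to_text = lambda { |e| e.to_plain_text }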

(2009..2015).each do |year|
  url = "http://timeanddate.com/calendar/custom.html?year=#{year}&country=21&lang=sv&hol=825&moon=on"

  doc = Hpricot(open url)

  # scrape the dates and descriptions
  dates, descs = (doc/"td.smtop").
    map(&to_text).
    partition(&is_date)

  # for each date and description pair...
  dates.zip(descs).each do |date, desc|
    day, month = date.split
    month = months[month]
    date = Date.new(year.to_i, month.to_i, day.to_i)

    cal.event do
      dtstart       date
      dtend         date + 1
      summary       desc
    end
  end
end

puts cal.to_ical

and run it with

ruby scraping-holidays.rb > holidays.ics
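
The parsing step in the script is easier to follow without the network in the way. Here is a minimal sketch that runs the same partition-and-zip logic over a hard-coded list of cell texts. The sample strings are made up for illustration and are only an assumption about what the td.smtop cells contain, not actual output from timeanddate.com. The sketch also anchors the date pattern at the start of the string, which is slightly stricter than the pattern in the script.

```ruby
require 'date'

# Hypothetical cell texts in the order the scraper would see them.
# The real page content may differ; these are illustrative only.
cells = ["19 jun", "Midsommarafton", "24 dec", "Julafton", "25 dec", "Juldagen"]

month_names = %w[jan feb mar apr maj jun jul aug sep okt nov dec]
# Map each month name to its number: { "jan" => 1, ..., "dec" => 12 }
months = Hash[month_names.zip(1..12)]

# A date cell starts with a day number followed by whitespace.
is_date = lambda { |line| line =~ /\A\d+\s/ }

# Split the cells into dates and descriptions, keeping their order.
dates, descs = cells.partition(&is_date)

# Pair each date with its description and build real Date objects.
holidays = dates.zip(descs).map do |date, desc|
  day, month = date.split
  [Date.new(2009, months.fetch(month), day.to_i), desc]
end

holidays.each { |date, desc| puts "#{date} #{desc}" }
```

Note that the pairing only works if the page yields exactly one description per date. If a description ever matched the date pattern, the two lists would get out of step, so it is worth comparing dates.length and descs.length when adapting this to another country.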

Here is my file with Swedish holidays 2009-2015. You import it in the same way as described in the previous post on weeks in iCal.
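
Before importing a generated file, a quick sanity check can save some clicking around in the calendar application. The sketch below counts the events in a chunk of iCalendar text by looking for the lines that open a VEVENT component, so it needs no gems. The helper name count_events is hypothetical, and holidays.ics is simply the file name used in the command above.

```ruby
# Hypothetical helper: counts the events in iCalendar text by counting
# the lines that open a VEVENT component.
def count_events(ics_text)
  ics_text.lines.count { |line| line.strip == "BEGIN:VEVENT" }
end

# Only report on the file if it has actually been generated.
if File.exist?("holidays.ics")
  puts "#{count_events(File.read("holidays.ics"))} events in holidays.ics"
end
```

A count of zero, or one far from what you expect for the years requested, usually means the page layout or the URL parameters did not match what the scraper assumes.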

I wish you nice future holidays!