BlogHer’s getting ready for the Monday launch of a new surfing guide. As one of two contributing editors in the Technology & Web domain, I’ll be covering women blogging about tech.
misbehaving.net has one of the best blogrolls around listing women tech bloggers. I figure it’s a great place to start in finding the targets of my attention. I wrote a little Python program that converts the blogroll into an OPML file. It’s conceptually easy to do that, but there were a few steps to get to a useable OPML file:
- Take the misbehaving.net blogroll, which involves a JavaScript call to a PHP BlogRolling function, and make a PHP page that gets the blogroll into the displayed HTML source rather than loading the blogroll items after the browser already has the page. That way I could grab the HTML source and feed it into my program. I can imagine a JavaScript function that goes through the DOM looking for blogroll-like items and feeds them into my Python back-end… but that’s for the future.
- Parse the HTML fragment, grab the name and URL of each blog, and then use Mark Pilgrim’s ultra-liberal feed finder to find a feed–any feed–for each blog. My program just uses the first one found. Related note: if you want to direct all your subscribers to a certain feed, like your Feedburner one, you need to take an active role in making it happen. Rachel at cre8d design tells how.
- Make an OPML file out of the results. That’s easy. OPML is so simple and yet so powerful. I think I’m going to start dreaming in outlines.
- Validate the file.
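For illustration, the parsing and OPML-writing steps above might look something like the following Python sketch. It is not the original program. It assumes the blogroll HTML is a flat run of `<a href="...">` links, that OPML version 1.1 with `text`, `htmlUrl`, and `xmlUrl` attributes is acceptable output, and that feed discovery is passed in as a hypothetical `find_feed` callable (standing in for a feed finder such as Mark Pilgrim's) that returns a feed URL or `None`. The names `BlogrollParser` and `blogroll_to_opml` are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class BlogrollParser(HTMLParser):
    """Collect (name, url) pairs from every <a href="..."> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.blogs = []      # finished (name, url) pairs
        self._href = None    # href of the link currently open, if any
        self._text = []      # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                self.blogs.append((name, self._href))
            self._href = None


def blogroll_to_opml(html, find_feed, title="Blogroll"):
    """Return an OPML string listing each blog in `html` that has a feed.

    `find_feed` is a hypothetical stand-in for the feed-discovery step:
    it takes a blog URL and returns a feed URL, or None if none is found.
    """
    parser = BlogrollParser()
    parser.feed(html)
    parser.close()

    opml = ET.Element("opml", {"version": "1.1"})  # version is an assumption
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    for name, url in parser.blogs:
        feed_url = find_feed(url)
        if feed_url is None:
            continue  # skip blogs with no discoverable feed
        ET.SubElement(body, "outline", {
            "type": "rss",
            "text": name,
            "htmlUrl": url,
            "xmlUrl": feed_url,
        })

    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    # Toy input and a dictionary lookup in place of real feed discovery.
    sample = (
        '<ul><li><a href="http://example.com/a">Blog A</a></li>'
        '<li><a href="http://example.com/b">Blog B</a></li></ul>'
    )
    known_feeds = {"http://example.com/a": "http://example.com/a/feed.xml"}
    print(blogroll_to_opml(sample, known_feeds.get))
```

Passing the feed finder in as a function keeps the sketch runnable without network access; a real run would swap the dictionary lookup for an actual discovery routine, and the result would still need to go through an OPML validator.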
My OPML file validated after a few tweaks. Woo hoo! So I should be able to just go load it into Bloglines or any other OPML-aware feed reader, right? Not so fast. I was able to get Bloglines to load some toy files but not the file containing the 100+ misbehaving blogs for which I found feeds. Instead, it made a new unnamed empty folder for each feed, leaving me to tediously check them off one by one for deletion. Happily, Nick Bradbury’s Feed Demon and Dave Winer’s OPML Editor both read my file with no complaints. And Feed Demon even exported it into a format that Bloglines was willing to read.
I’d like to make this file available as a reading list except (1) I have no idea how to make it a reading list besides sticking it at a URL and regularly updating it, (2) I have no good way of making regular updates to it, and (3) I want to know why Bloglines didn’t like it. Nick Bradbury has some ideas about where aggregators fall down in trying to import OPML so I can take a look at that. But more importantly, I’m not sure what reading list I want to make available and to what end. Do I want to publish a reading list containing every woman who’s ever written about technology? That’d be ridiculous and it’s no way to address information overload. I need to give this project a bunch more thought. I took Britt Parrott’s advice to start a new project in the middle [via Signal vs. Noise], and jumped right into the refreshing OPML waters. I knew I needed a way of following as many women tech bloggers as I could track down and this was a great place to start.
What I’m thinking is that a huge vacuum exists in the area of useful blog directories and reading lists. Top Ten Sources has a good idea in using human editors to find the best sources in various areas, but I think their business model is slightly off–they’ve chosen an organization-centric approach rather than plugging into the power of the people. More on that later. [Update: here’s my critique of Top Ten Sources.]
UPDATE: I didn’t initially make the file available because I just wanted to talk about it. But why not offer the Feed Demon exported XML file? Here it is. Go nuts. It has a Feed Demon beta feed in it just because I happened to be subscribed to it at the time I did the export. It successfully imported into Bloglines for me. I didn’t check this one for OPML validity or try to import it into anything else.
UPDATE: In response, Dave Winer says, “A reading list is more than a compendium, I think it’s got to be curated the way an art director chooses paintings for an art exhibit.” Yes indeed. This file is a directory, not a reading list. But I hope it will form the basis for my own women-tech-bloggers-of-note reading list to be published later. And it certainly gives me a lot of content to consider for BlogHer’s surfing guide.

14 Comments
You have a lot of links in this posting, so I apologize if I missed the appropriate one, but can you provide the finished OPML file? I’d like to import it into Google Reader.
You didn’t read my post wrong… I didn’t actually link to the file. I just wanted to talk about it, but why not put it out there. So I updated the blog post to include a link to the OPML file.
It went into Newsgator perfectly! That’s so awesome!!
Marshall, great! I don’t think it’s all that useful right now… it’s a huge number of feeds to suddenly start paying attention to. Still, interesting as a beginning exploration.
BlogBridge may be worth a look. It’s an aggregator (open source, free, cross platform - I work on it) with pretty comprehensive OPML and Reading List support. You can both subscribe to whole reading lists and, as of yesterday, you can take a collection of feeds you like and with a single check in a box, publish it out as OPML. So both halves of the equation. Thought you might be interested.
http://www.blogbridge.com/archives/2006/01/how_does_one_mi.php
http://www.blogbridge.com/archives/2006/01/announcing_blog.php
Thanks for the info, Pito, I’ve been meaning to check out BlogBridge. I heard it supported reading lists but didn’t know much more about it. Sounds like a good fit with the stuff I’m doing.
Nice to see the list.
Here is a link to it (live) in FOAF -
http://www.w3.org/2000/06/webdata/xslt?xslfile=http%3A%2F%2Fdannyayers.com%2Fsvn%2Fpragmatron%2Fxslt%2Fopml2blogroll.xsl&xmlfile=http%3A%2F%2Fwww.annezelenka.com%2Fmisbehaving.xml&transform=Submit
Might not paste well in these comments so here’s the trick - your OPML plus:
http://dannyayers.com/svn/pragmatron/xslt/opml2blogroll.xsl
fed through the W3C’s XSLT service:
http://www.w3.org/2001/05/xslt
The result can be directly merged with any other RDF about the people on the list, e.g. their FOAF Personal Profile docs. (Burningbird’s the blogher to ping on this).
Note that the result uses the blogger’s name as the title of the blog - suboptimal, but there isn’t a lot more that can be done with the source data. Indeed it would be possible to go a lot more sophisticated than this on the same basic information, only it’s all very ambiguous in OPML.
In fact I’ve got a variation on the XSLT about (opml2skosroll.xsl, same host/path as above) that will express category info using the SKOS vocabulary, but this doesn’t add anything to your list (it uses the “title” attribute on a parent outline element, the FeedDemon has “text” instead - I forget the specific OPML source I wrote the XSLT around).
Dreaming in outlines? Hierarchies are *so* 20th century. You probably already dream in Webs
Anne,
Cool list. Check out http://www.opmlworkstation.com. One can maintain opml/reading lists there. The next version will allow you to permit public editing of OPML files (wiki-like) - so that more people can add to your OPML list.
As your list gets longer, you can split it into multiple OPML files - each covering a category of woman bloggers (each list potentially maintained by a different person), and have an OPML file that points to them.
BTW, I found your list in OPML Search (www.opmlsearch.com) when looking up Amanda Williams
http://www.opmlsearch.com/?Search=Amanda+Williams
You can read or browse your OPML there.
Feedback on the tools welcome!
-Bela
Danny - I’ve just started thinking about FOAF and am thrilled to see how I can link this project to that. I need to drink a large cup of coffee before I can begin to comprehend the remainder of what you just said.
And yes, hierarchies are so 20th century–disco era even.
Bela - Thanks for the pointers. I will check those out. I figured there were tons of tools for manipulating and managing OPML that I didn’t know about. This has been a great way to learn about them.
Barb over at the Social Software blog has some helpful contributions to the discussion of how this could be used: http://socialsoftware.weblogsinc.com/2006/01/27/where-are-the-women-in-tech-they-live-in-an-opml-file/
Anne, FOAF, cool!
What I forgot to mention was that (if I remember correctly) the FOAF version of the feedlist should be enough to use with the Chumpalogica.
At first glance, for simple aggregation FOAF plus RDF store might look over-engineered compared to, say, using plain-XML tools. But this general setup isn’t really more complex, the data model is already defined and libraries are available off the shelf to do much of the work.
What’s more, the shared data model enables a whole lot more (my own bits of related play are around SparqlSphere).
btw, the FOAF irc channel is a friendly place.
While you’re looking in this space, you might want to check out XFML.
Thanks for making the file available, imported into Google Reader with no problems: Any feeds I was already subscribed to just had the tag ‘misbehaving’ added to them.
I’m up to about 680 subscriptions altogether, and probably need to do some trimming soon.
Wow, Michael, 680? I’m at around 150 and it’s more than I can take… but I’m still using Bloglines. I think I could manage more with a different reader. How do you like Google’s? I’m glad the file imported okay. I am still trying to decide what to do with the project, but boy is Python a great help. I’m loving it. Thanks for your evangelism.
I don’t read them all, at least, not all the time.
I find it convenient to occasionally look at the feeds labeled ‘web’, or ‘policy’, or ‘python’ (for example), just to see what’s going on in that area, and without feeling obligated to read everything that’s under that label.
Still, with 680 feeds, I need to trim a few with a low S/N ratio.