
Yahoo! Pipes Updated

Gone

Yahoo discontinued the service as of September 30, 2015.

Not Quite Ready For Prime Time

On the surface of it, it seems like a totally cool idea. Function blocks that you drag, drop and “pipe” together on a whiteboard to grab RSS or Atom feeds, web pages, or whatever; grab the pieces you want, transform them, combine them, and sort them into something new.
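Outside of Pipes, the grab, combine and sort steps described above take only a few lines of Python. This is a minimal sketch, assuming RSS 2.0 feeds with RFC 822 `pubDate` values. The two feed strings are made up and stand in for real fetched documents, and the helper names `items` and `merge_and_sort` are hypothetical. It does not reproduce Pipes' own Fetch Feed, Union and Sort modules.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, made-up RSS 2.0 documents standing in for real feeds.
FEED_A = """<rss version="2.0"><channel>
<item><title>Storm warning</title><pubDate>Mon, 12 Apr 2010 09:00:00 -0400</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<item><title>Bus detour</title><pubDate>Mon, 12 Apr 2010 11:30:00 -0400</pubDate></item>
<item><title>Road closed</title><pubDate>Sun, 11 Apr 2010 18:15:00 -0400</pubDate></item>
</channel></rss>"""


def items(feed_xml):
    """Grab step: yield (title, datetime) pairs from one RSS document."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        when = parsedate_to_datetime(item.findtext("pubDate"))
        yield title, when


def merge_and_sort(*feeds):
    """Combine step plus sort step: pool every item, newest first."""
    combined = [entry for feed in feeds for entry in items(feed)]
    return sorted(combined, key=lambda entry: entry[1], reverse=True)


for title, when in merge_and_sort(FEED_A, FEED_B):
    print(when.isoformat(), title)
```

Sorting only works because every `pubDate` parses into a real datetime. One malformed date breaks the whole sort, which is exactly the kind of problem that shows up later in this post.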

The reality, alas, is not quite there yet. Despite being online for something like five years, the Pipes system is prone to backups and clogs. The worst is the “I can’t save” problem.

This is usually accompanied by the “frozen block” problem, where you drag a block onto the whiteboard but then can’t move it or connect it. The fix for the first problem is usually just to wait; the pipe has probably saved, but the response isn’t coming back. In time it unclogs. The second is more serious, and often means backing out of the project altogether and then going back in.

The project I’ve been working on is pretty ambitious: an all-purpose “what’s the bad news” feed that combines the crime blotters of two local police departments’ web sites, two feeds from community boards, the National Weather Service’s feed for our area, a traffic report, and a transit report.

Now, grabbing RSS feeds and combining them is what Pipes does best. But the two local police departments’ web pages were another matter. They’re done up in non-standard HTML, and capturing the items I wanted took a lot of trial and error, because the people coding the pages don’t do it the same way every day. It’s probably broken right now, come to think of it. And the clogged-pipe problem can be pretty infuriating when you’re in a rapid-fire “try this, then let’s try that” mode.
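Scraping inconsistent HTML usually comes down to a parser that tolerates sloppy markup plus a rule for recognizing the items you want. The sketch below uses Python’s standard `html.parser`. It assumes, purely hypothetically, that each blotter entry sits in its own `<p>` and begins with a date such as `04/12/2010`. The real police pages may look nothing like this, and `ParagraphCollector`, `blotter_entries` and the sample page are all invented for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical rule: a blotter entry starts with a date such as 04/12/2010.
DATE_START = re.compile(r"\d{1,2}/\d{1,2}/\d{4}\b")


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p>, tolerating unclosed and mixed-case tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":  # HTMLParser lowercases tag names, so <P> arrives as "p"
            self.flush()  # an unclosed <p> ends where the next one begins
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def flush(self):
        text = " ".join("".join(self._buf).split())  # collapse runs of whitespace
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False


def blotter_entries(page):
    parser = ParagraphCollector()
    parser.feed(page)
    parser.close()
    parser.flush()  # pick up a final paragraph that was never closed
    return [p for p in parser.paragraphs if DATE_START.match(p)]


SAMPLE = "<P>04/12/2010 Car break-in on Elm St.<p>Updated daily<p>04/11/2010  Bicycle stolen</p>"
print(blotter_entries(SAMPLE))
```

When the page authors change their habits, only the recognition rule (`DATE_START` here) should need adjusting. That is the “trial and error” part.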

Speaking of nonstandard, the biggest problems I ran into came from Microsoft Office tags embedded in the text. The parser saw them, but the debugger didn’t show them.

In this example, there was a space between the quotation mark and the date. That made it impossible to sort on date, because the system expects a specific date/time format, and the leading space isn’t part of it.

Running a regexp to strip out the space got me nowhere. I kept seeing ” 2010-04-etc.” and couldn’t get rid of it. Then I ran a regexp to take out any non-word characters.
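The difference between removing “a space” and removing “any non-word character” is easy to show in Python. This is a minimal sketch, assuming ISO-style `yyyy-mm-dd` dates. The sample values, including the non-breaking space standing in for an invisible character, are hypothetical. `clean_date` is an illustrative helper, not a Pipes function.

```python
import re
from datetime import date

# One or more characters that are not letters, digits or underscores,
# anchored to the start of the string.
LEADING_JUNK = re.compile(r"^\W+")


def clean_date(raw):
    """Strip leading non-word characters, then parse a yyyy-mm-dd date."""
    cleaned = LEADING_JUNK.sub("", raw)
    return date.fromisoformat(cleaned[:10])


# Hypothetical inputs: a plain space, and a non-breaking space that
# raw.replace(" ", "") would leave in place.
samples = [" 2010-04-12", "\u00a02010-04-09", "2010-04-15"]
print(sorted(clean_date(s) for s in samples))
```

A literal-space replacement only removes U+0020. `\W` catches any leading character that is not part of a word, which is why the broader pattern worked where the narrow one did not.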

There’s that awful MS Office-style HTML. It was there all along, but the debugger refused to show it to me. Stripping out anything that looked like a tag, even with the > encoded as a character entity, worked.
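One way to strip tags even when some of their angle brackets arrive as entities is to decode the entities first and then remove anything shaped like a tag. This is a minimal Python sketch of that approach. The sample string imitating Office markup is made up, and decoding first also turns a literal `&lt;` in the text into `<`, which is acceptable here but worth knowing.

```python
import html
import re

# Anything from "<" to the next ">" counts as a tag.
TAG = re.compile(r"<[^>]*>")


def strip_tags(text):
    """Decode entities such as &gt; and &nbsp;, then remove tag-shaped runs."""
    decoded = html.unescape(text)
    return TAG.sub("", decoded).strip()  # str.strip() also drops the decoded non-breaking space


# Hypothetical Office-style markup, with one tag closed by an entity.
raw = '<o:p&gt;</o:p><span style="mso-spacerun:yes">&nbsp;</span>2010-04-12'
print(strip_tags(raw))
```

After cleanup the field is a bare date, which a date sort can handle.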

There were still other bits of weirdness the debugger wasn’t much help with. I was trying to filter so that only posts containing words like “crime,” “cop,” “arrest” and so on would appear. But I kept getting this one:

Yes, I’m sure it’s lovely. But what’s that got to do with suburban crime? After publishing the Pipe and looking at the source code of the result, I found the problem:

Aha! There it is. So, a three-letter word is probably not the best filter you can use.
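A short keyword also matches inside longer words when the filter only checks whether the text contains it. Requiring word boundaries avoids that. The sketch below compares the two approaches in Python. The titles are made up and are not the post that tripped this filter, and allowing an optional trailing “s” for plurals is an assumption.

```python
import re

KEYWORDS = ["crime", "cop", "arrest"]


def naive_match(text):
    """Plain 'contains' rule: the keyword may appear anywhere, even mid-word."""
    lowered = text.lower()
    return any(word in lowered for word in KEYWORDS)


# Whole-word rule: keyword (optionally plural) bounded by non-word characters.
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")s?\b", re.IGNORECASE
)


def word_match(text):
    return PATTERN.search(text) is not None


titles = ["Two arrests after break-in", "Photos: a weekend in Copenhagen", "Cops close Main St."]
print([t for t in titles if naive_match(t)])  # all three titles
print([t for t in titles if word_match(t)])   # the Copenhagen title drops out
```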

But again, these are easy. The interface continually hanging up on me is what made it hard. In time, I started to get close, but it took days.

Then…

Finally, a working feed.

The running version is here, along with the RSS of everything bad that might affect us here in the M/SO area.

Tom

Tom McGee has been building web sites since 1995, and blogging here since 2006. Currently a senior developer at Seton Hall University, he’s also a freelance web programmer and musician. Contact him if you have the need for a blog, web site, redesign or custom programming!

2 thoughts on “Yahoo! Pipes Updated”

  1. It does some simple things really well.

    I had a feed where the title of every post was something like “Item #233” or “Item #234.” The name of the item itself was the first sentence of the body of the post. So I was able to slice-and-dice it pretty easily so that this became this.
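The retitling trick this commenter describes can be sketched in Python. The field names `title` and `description`, the `Item #NNN` pattern and the `retitle` helper are assumptions based on the comment’s wording, not Pipes’ own operators. Sentences are assumed to end in `.`, `!` or `?` followed by whitespace.

```python
import re

PLACEHOLDER = re.compile(r"Item #\d+")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # split after terminal punctuation


def retitle(item):
    """Replace a placeholder title like 'Item #233' with the body's first sentence."""
    if PLACEHOLDER.fullmatch(item["title"].strip()):
        first = SENTENCE_END.split(item["description"].strip(), maxsplit=1)[0]
        return {**item, "title": first.rstrip(".")}
    return item


post = {"title": "Item #233", "description": "Vintage oak desk. Some scratches on top."}
print(retitle(post)["title"])
```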
