Tech Stuff

Migrating a Biggish Site With Power Tools

The Montrose Park Historic District Association is a local group here in South Orange dedicated to protecting and publicizing a beautiful neighborhood of stately 19th century mansions.

Their old web site had been designed in 2001 and was looking rather outdated. A lot of the things that it did weren’t being done anymore (merchandise sales, newsletters) and several things that they wanted to do today, it didn’t.

Let me admit, somewhat sheepishly, I underestimated how much work was going to be involved in porting all of this over. In this post, I’ll try to explain some of the “power tools” I used to take all of this stuff and successfully port it over into a new site. With the exception of Excel, they’re all free; but I imagine with a little extra research I could have discovered a way to not need even that.

The biggest piece of the site content is the 540-some posts of the “District Inventory,” descriptions of the many homes that make up the district, including maps and images for each. I batted around ideas for ways of streamlining the process so that instead of weeks of data input I could get away with hours. Here are some of the things I did, using:

  • Excel
  • A text editor with regular express support
  • SQL queries
  • Apache Open Office Calc
  • More Excel

Not that I’m a huge Microsoft fan or anything, but Excel offers one key calculation function that I couldn’t have done without. And while I was at it, the CONCATENATE function.

Here’s part of the new home page, by the way, I used the Make theme, along with the Make Plus plugin, for the format. It was really nice right out of the box.

A slider, a page builder, and more.

Post Content

An earlier volunteer helpfully entered a hundred or so addresses into the demo WordPress site that was to become the starting point for me. Great as a proof-of-concept, but still many to go.

With a stroke of incredible luck, a Word Doc was available that contained all the content. As a former Perl hacker I recognized immediately that I could work on this and make it into what I needed — to import the rest, and modify what was there already.

Here’s an example of how a basic block of content for one property looked.

470 Berkeley Avenue                                      Block 506            Lot 1
    Key
    Outbuildings:  1 stylistically similar detached carriage house (C)
470 Berkeley Avenue is a 2 1/2 story, 5 bay, rectangular plan, brick, Neoclassical-influenced, residential building.  Constructed c. 1920, the slate-clad, side gambrel roofed house is articulated by a colossal order, fluted Ionic column-supported full front porch with mutule-supported entablature and balustrade above.  Three round-arched, pilastered dormers with lancet upper sashes ornament the slate roofline.  The fenestration on the facade consists of 9/1 windows with brick lintels featuring stone keystones and sills.  The projecting enclosed portico features a segmentally arched brick surround, with a leaded fanlight and matching sidelights.  Above the portico entablature is a wrought iron balcony.  At one side of the house is a one story, set back sun porch, and at the back of the house, is a cross gambrel wing.   This Neoclassical house is located at the corner of Montrose and Berkeley Avenues, in an estate setting.

The nature of WordPress is such that, left alone, every post excerpt was going to begin with Block: xxx Lot: yyy. Not exactly a grabber. I wanted to use the Block and Lot numbers, as well as the “Outbuildings” section (which includes “Key,” “Contributing” or “Non-contributing” options), but as custom fields to be displayed in a sidebar using the Advanced Custom Fields Widget plugin.

Proceeding to copy and paste blocks like the above in bunches of eight or 10 into Win8 Code Writer app which supports regular expressions — not for the fainthearted or disorganized, but which fortunately I can do.


By the way the Win8 Code Writer app is, of course, a Windows 8 application. So it’s flaky, unreliable, prone to crashing and often just weird. Save your work often, or you’ll get something like the following.

Here is how some of the regexes looked.

\s*Key\s*\n\s*Outbuildings:\s*(.*) => \n$1\n
\s*Non-Contributing\s*\n\s*Outbuildings:\s*(.*) -> \n\n$1
\s*Contributing\s*\n\s*Outbuildings:\s*(.*) -> $1\n\n
\s*Block\s(\d*)\s*Lot\s(.*) -> \n$1\n$2

NB: the => separates the “find” from the “replace with.”

I set aside a bunch of them in a separate text file so I could better copy-and-paste them in. Particularly after the text editor crashed once. Did I mention it’s flaky? The source data was irregular in structure, sometimes with extra linefeeds or single-quotes that needed to be ferreted out and corrected. I found that a small batch of them was easier to visually scan before I imported it in. Even then, several times I had to go into the back-end and remove new categories that didn’t belong, or empty or incomplete posts.

Little by little, you can build a CSV file out of these.

post_title,Block,Lot,"Contributing Outbuildings","Key Outbuildings","Non-Contributing Outbuildings",post_content,post_category,post_type,post_thumbnail

"5 Stanley Road","704","14","1 stylistically similar detached garage (C)","","","5 Stanley Road is a 2 1/2 story, 3 bay, square plan, stuccoed, eclectic, residential building. Built c. 1910, the house is dominated by a hipped roof with matching dormers and overhanging, scroll bracket-supported eaves. Other ornamentation includes splayed lintels with keystones, a Doric column-supported portico and a matching porte-cochere.","Stanley Road","post",""

"251 Raymond Avenue","1105","13","1 stylistically similar detached garage (C)","","","251 Raymond Avenue is a 2 1/2 story, 3 bay, irregular plan, Shingle style, wood frame, residential building.  Built c. 1900, the wood shingle-clad house features an intersecting gable roof, multiple dormers and arched gabled extensions.  Diamond-paned and leaded windows, as well as massive brick chimneys add to the picturesque quality of the building.  The house was built for Caroline F. Kirby.","Raymond Avenue","post","http://mphda.org/wp-content/uploads/2014/10/251_Raymond_Ave_1000.jpg"

NB: linefeeds added for clarity.

Notice how during the import you can include custom fields and post thumbnails. Unfortunately, geotags didn’t seem possible — that would need to be dealt with later on.

Really Simple CSV Importer

I experimented with a couple of plugins for this, and I ended up with this one. It was the clearest and handled errors more gracefully (and informatively) than others.

By the way, don’t try and create CSV files with Excel. For some of the first efforts I tried copying-and-pasting batches of these into spreadsheet cells and saving them as CSV, but Microsoft’s format for this supposedly standard format is nonstandard. I know, shocking. Apache Open Office does it the right way.

Still, it was easier in the end to take the plain text and manage it with regular expression replacements.

Then I needed to deal with already-entered items

Again, I wasn’t about to go into a hundred different posts and move this information around. Instead, a SQL query to find all posts where the post_content started with “Block:”, and filtering out drafts. Once I had that list, from a SELECT ID, post_content query, I could drop it into Code Writer and have at it.

I was able to search for:

"(\d*)","Block\s(\d*)\nLot\s(\d*)\n([^:]*):\s*(.*)

And replace it with:

INSERT INTO wpprefix_postmeta (post_id,meta_key,meta_value) ($1,'Block','$2');\nINSERT INTO wpprefix_postmeta (post_id,meta_key,meta_value) ($1,'Lot','$3');\nINSERT INTO wpprefix_postmeta (post_id,meta_key,meta_value) ($1,'$4','$5');\n

And end up with something like:

INSERT INTO wpprefix_postmeta (post_id,meta_key,meta_value) VALUES (1878,'Block','1103');
INSERT INTO wpprefix_postmeta (post_id,meta_key,meta_value) VALUES (1878,'Lot','12');
INSERT INTO wpprefix_postmeta (post_id,meta_key,meta_value) VALUES (1878,'Contributing Outbuildings','1 stylistically similar detached garage (C)');

309 Grove Road is a 2 story, 3 bay, irregular plan, brick, eclectic residential building.  The c. 1920 house is articulated by a polychromatic slate-clad side gable roof with a cross gable.  The fenestration consists of multi-paned casement windows, and round-arched windows in the side porch and cross gable apes.  Bracketed vergeboards detail the roofline.

Which is then easy to run a different regular expression to change the last block to:

UPDATE wpprefix_posts SET post_content='309 Grove Road is a 2 story, 3 bay, irregular plan, brick, eclectic residential building. The c. 1920 house is articulated by a polychromatic slate-clad side gable roof with a cross gable. The fenestration consists of multi-paned casement windows, and round-arched windows in the side porch and cross gable apes. Bracketed vergeboards detail the roofline.' WHERE ID=1878 LIMIT 1;

Fortunately there were no apostrophes in this group, otherwise I would have needed to escape them.

So that gave me a big chunk of SQL queries which I could simply copy-and-paste into the PHPMyAdmin SQL tab and run.

Mapping with the WP Geo Plugin

This easy-to-use plugin works with Google Maps to let you embed maps by post and by category. I wanted to do all the streets as categories, and have the posts be organized that way. And in so doing, allow for maps like this to appear as a sidebar widget in each:

There’s also a widget, where each pin is a link to the individual post for that address

In addition to having another, specific, map in each:

A map like this appears at the end of every geo-tagged post

The problem with mapping

The idea of going into each of 500 posts and adding in an address was not appealing to me. More appealing was figuring out a way to automate it.

WP GEO adds four entries to the postmeta table, _wp_geo_title (the tag attached to the pin), _wp_geo_latitude and _wp_geo_longitude (self-explanatory), and _wp_geo_map_settings. The latter is by default an empty serialized array, ‘a:0:{}’.

Already the shape of making the data predictable is in place, and if you can do that you can have it write queries for you. But where to get longitude and latitude for a list of addresses?

Of all places, a crime blog had the solution. This awesome method worked perfectly on the first try.

After running a query to get post ID and title — which was the address! — and dropping them into columns A and B of an Excel Spreadsheet, I made column C the full address:

(C3) =CONCATENATE(B3;",";$G$1)

Yielding something like, “64 Fielding Court, South Orange, NJ 07079”, when cell $G:$1 contains “, South Orange, NJ 07079”. Then in column D use the Police Analyst function:

GoogleGeocode(C3)

To give you a comma-separated listing of latitude and longitude. These are easily split with some of Excel’s text functions (though a true “split” or “explode” would be better, no?):

=LEFT(D3;FIND(",";D3)-1)
=RIGHT(D3;LEN(D3)-FIND(",";D3))

Also stored in separate columns as fixed values, are the “keys” that we’ll use in the wpprefix_postmeta table:

_wp_geo_title
_wp_geo_latitude
_wp_geo_longitude

For columns F, G, H, I then, we’ll glue all of these together into sets of four queries:

=CONCATENATE("INSERT INTO wpprefix_postmeta (post_id, meta_key, meta_value) VALUES (";A3;",'";$B$2;"','";B3;"');")
=CONCATENATE("INSERT INTO wpprefix_postmeta (post_id, meta_key, meta_value) VALUES (";A3;",'";$E$2;"','";E3;"');")
=CONCATENATE("INSERT INTO wpprefix_postmeta (post_id, meta_key, meta_value) VALUES (";A3;",'";$F$2;"','";F3;"');")
=CONCATENATE("INSERT INTO wpprefix_postmeta(post_id, meta_key, meta_value) VALUES (";A3;",'_wp_geo_map_settings','a:0:{}' );")

Each yields something like this:

INSERT INTO wpprefix_postmeta (post_id, meta_key, meta_value) VALUES (1845,'_wp_geo_title','64 Fielding Court');

Which you can again copy-and-paste in bunches into phpmyadmin’s SQL tab. Two days work reduced to two hours.

Here’s an example of how one of the posts turned out.

Summary

Using the right set of tools with a little care in organizing your data — whether from an outside source or by querying data that exists — you can parse that existing data into queries that generate new data. Importing a big site with lots of content is still a big job, but instead of having a big group of people working on it for a week you can do it with fewer hands in less time.

Tom
Tom McGee has been building web sites since 1995, and blogging here since 2006. Currently a senior developer at Seton Hall University, he's also a freelance web programmer and musician. Contact him if you have the need for a blog, web site, redesign or custom programming!

Leave a Reply

Your email address will not be published. Required fields are marked *