Migrating from TypePad isn’t too difficult: they provide the standard Movable Type (MT) export file, which most platforms readily accept. The one big challenge in my latest migration project was getting the images out. Since the client is a domain user, their images hosted at TypePad will stop working once the domain is switched over, and there is no way to export them from the TypePad control panel.
The solution I came up with was to run PHP on the new destination (a dedicated server) and parse the export file itself, tracking when I’m inside a body section and using a regular expression to find any image tags that point to the domain being relocated. For each TypePad-hosted image found, the same directory structure was created locally, and the file was then copied from the TypePad server into the matching local directory.
It took quite a while to run, since the client had been on TypePad since 2004, but the result was wonderful: all I have to do is relocate the created directory to the webspace, and not only are all the images released from their TypePad prison, but all the existing links still work correctly. The one caveat is that the base image directory starts with a period, which is not a problem as long as you remember that Unix variants hide all files and folders whose names start with a period. Use ls -la to see what is normally hidden.
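Before the full script, here is the image-matching step in isolation, as a minimal sketch. The sample post body, the image paths, and the elsewhere.example host are made up for illustration; only the pattern itself mirrors the approach described above. Note the second argument to preg_quote(), which escapes the slashes in the domain so they don't collide with the pattern delimiter.

```php
<?php
// Hypothetical sample values, for illustration only
$domain = "http://somedomain.com";
$body_text = '<p><img src="http://somedomain.com/.a/abc123-pi" alt="one" /></p>' . "\n"
           . "<p><img class='x' src='http://somedomain.com/photos/2004/cat.jpg' /></p>\n"
           . '<p><img src="http://elsewhere.example/other.png" /></p>';

// Escape the domain for use inside a "/"-delimited pattern
$pattern = "/<img.*?src=(\"|')" . preg_quote($domain, "/") . "(.*?)(\"|')/";
preg_match_all($pattern, $body_text, $matches);

// Group 2 holds the part of each URL that follows the domain
$images = $matches[2];

foreach ($images as $image) {
    // dirname() gives the directory that would need to exist locally
    print $image . " -> " . dirname($image) . "\n";
}
```

Only the two images on the domain being moved are picked up; the third, hosted elsewhere, is ignored.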
Code is below the fold…
<?php
// Initialize variables
$count = 0;
$title = "";
$status = "";
$body = 0;
$body_text = "";
$export = "mt-export.txt";
$domain = "http://somedomain.com"; // Be sure to set this to an appropriate value

// Open import file
$fh = fopen($export, "r") or die("Cannot open export file '$export'");

// Process import file post by post
while(($buf = fgets($fh)) !== false) {
    // If we are in a post body, look for images
    if($body == 1) {
        // Do all the processing once we have the entire post body in $body_text
        if(substr($buf,0,5) == "-----") {
            // The "/" argument makes preg_quote() escape the slashes in $domain,
            // which would otherwise end the pattern early
            preg_match_all("/<img.*?src=(\"|')".preg_quote($domain, "/")."(.*?)(\"|')/",
                $body_text, $matches);
            $images = $matches[2];
            // Process all images found by the regular expression
            foreach($images as $image) {
                // $image is the part of the URL after the domain; it normally
                // starts with "/", so the local paths below are absolute
                $data = pathinfo($image);
                $path = $data['dirname'];
                // Create the directory structure, including any parent directories
                if(!is_dir($path)) {
                    mkdir($path, 0755, true);
                }
                // Copy image to local server
                $remote = $domain . $image;
                if(!copy($remote, $image)) {
                    print " - Copy failed for $image\n";
                }
            }
            // Reset body variables
            $body = 0;
            $body_text = "";
        // Add more body content to $body_text
        } else {
            $body_text .= $buf;
        }
    // Post status and look for the start of the next post body section
    } else {
        if(substr($buf,0,6) == "TITLE:") {
            preg_match("/^TITLE: (.*)/", $buf, $matches);
            $title = $matches[1];
            $count++;
        } else if(substr($buf,0,7) == "STATUS:") {
            preg_match("/^STATUS: (.*)/", $buf, $matches);
            $status = $matches[1];
            if($status == "Publish") {
                print "Processing: $title ($count)\n";
            }
        } else if(substr($buf,0,5) == "BODY:") {
            $body = 1;
        }
    }
}

// Clean up
fclose($fh);
?>
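If you would rather keep the downloaded files under a dedicated base directory, one possible variation is sketched below. This is not part of the script I ran: the mirror_path() helper, the base directory, and the sample image path are all hypothetical names chosen for illustration. It relies on mkdir()'s third argument to create intermediate directories.

```php
<?php
// Hypothetical helper: map an image's URL path onto a local base directory
// and make sure the containing directory exists.
function mirror_path($base, $image) {
    $local = rtrim($base, "/") . "/" . ltrim($image, "/");
    $dir = dirname($local);
    // Third argument true = create intermediate directories as needed
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }
    return $local;
}

// Example with a made-up base directory and image path
$base = sys_get_temp_dir() . "/typepad-images";
$local = mirror_path($base, "/.a/2004/05/photo.jpg");
if ($local === false) {
    print "Could not create directory\n";
} else {
    print "Would copy to: $local\n";
}
```

The returned path could then be used as the second argument to copy() in place of $image, and the whole base directory moved into the webspace afterwards.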
As always, this code is provided free of charge and comes without warranty or guarantee. If you do not understand what this code is doing, you had probably best not be running it.