Scraping Blog Titles

Recently I wanted to get a list of every topic published on a certain blog. I didn’t want to scrape the entire site, just the titles. It turned out to be easy to do with a couple of bash commands.

First, create a temporary directory:

% mkdir tmp
% cd tmp

Go to the blog and scroll to the last page in the index. The URL will be something like http://www.example.com/page/n, where n is the number of the page. Make a note of n; you will need it.

Now write a quick bash loop to download all of those pages, replacing n with the number you found above:

% for ((i=1;i<=n;i++)); do
%  echo $i
%  wget -O "$i.html" "http://www.example.com/page/$i"
% done

If you check the directory, you’ll now have a bunch of files from 1.html to n.html. View the source of one of the pages and find something unique that appears in every title line. For example, on this blog all title lines start with <h1 class="entry-title">, so I could use entry-title. With another quick loop I can now grab all of the titles.

% rm -f titles.txt
% for ((i=1;i<=n;i++)); do
%  echo $i
%  grep "entry-title" "$i.html" >> titles.txt
% done

Again, replace n with the number of pages you downloaded and entry-title with whatever marks your title lines.

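Lastly, convert the list to a tab-delimited file for importing into Excel. Assuming each title is a link, the three sed expressions strip everything up to the URL, turn the end of the opening <a> tag into a tab and drop the closing tags (the \t escape needs GNU sed):

% sed -e 's/.*href="//' -e 's/">/\t/' -e 's/<\/.*//' titles.txt > titles.tsv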
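To check the extraction before running it over every downloaded page, the grep and sed steps above can be wrapped in a small function and tried on a sample page. This is a sketch under assumptions: the markup <h1 class="entry-title"><a href="URL">Title</a></h1> is a guess at what the blog uses, and both page_titles_tsv and sample.html are made up for illustration. It inserts a literal tab through a shell variable so it also works with BSD sed.

```shell
# Hypothetical helper: print "URL<TAB>Title" for every entry-title line in
# the HTML files given as arguments. Assumes each title sits on one line as
# <h1 class="entry-title"><a href="URL">Title</a></h1>.
page_titles_tsv() {
    tab="$(printf '\t')"          # literal tab, portable across sed versions
    grep -h 'entry-title' "$@" |
        sed -e 's/.*href="//' \
            -e "s/\">/$tab/" \
            -e 's/<\/.*//'
}

# Demo on a made-up page instead of a real download.
cat > sample.html <<'EOF'
<html><body>
<h1 class="entry-title"><a href="http://www.example.com/first-post/">First Post</a></h1>
<p>Post body</p>
<h1 class="entry-title"><a href="http://www.example.com/second-post/">Second Post</a></h1>
</body></html>
EOF

page_titles_tsv sample.html
```

On the real pages you would run something like page_titles_tsv *.html > titles.tsv, keeping in mind that the glob sorts 10.html before 2.html.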

Using Twitter Bootstrap with standard Ruby on Rails form helpers

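I’ve been using Rails and Bootstrap for a while now. To make them play nicely together I’ve previously used simple_form, which made life easy. For a recent project I wanted to see how far I could get using the standard Rails form helpers. When a field fails validation, Rails wraps both the input and its label in a div with the class field_with_errors, so with a little custom CSS that mirrors Bootstrap 3’s error styles you can get a long way.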

div.field_with_errors {
    display: inline;
}
.field_with_errors + .help-block,
.field_with_errors .control-label,
.field_with_errors .radio,
.field_with_errors .checkbox,
.field_with_errors .radio-inline,
.field_with_errors .checkbox-inline {
    color: #a94442;
}
.field_with_errors .form-control {
    border-color: #a94442;
    -webkit-box-shadow: inset 0 1px 1px rgba(0, 0, 0, .075);
    box-shadow: inset 0 1px 1px rgba(0, 0, 0, .075);
}
.field_with_errors .form-control:focus {
    border-color: #843534;
    -webkit-box-shadow: inset 0 1px 1px rgba(0, 0, 0, .075), 0 0 6px #ce8483;
    box-shadow: inset 0 1px 1px rgba(0, 0, 0, .075), 0 0 6px #ce8483;
}
.field_with_errors + .input-group-addon {
    color: #a94442;
    background-color: #f2dede;
    border-color: #a94442;
}
.field_with_errors + .form-control-feedback {
    color: #a94442;
}

This solution hasn’t been fully tested, but for most form inputs it seems to work correctly. The only major issue is that it won’t highlight prefixed add-ons: they come before the wrapper div, and CSS sibling selectors can only match elements that come after it. If you’re using prefixed add-ons, I would recommend removing the CSS for .field_with_errors + .input-group-addon so that suffixed add-ons don’t highlight either and the two stay consistent.

Every business application is a CRM

Over the last year I’ve probably looked at over 50 business applications across a variety of categories and platforms. During that time a pattern has emerged. Every business application is a CRM.

Alright, that might be a slight exaggeration. There are certainly some applications that offer unique functionality. But the bulk of them, regardless of platform or target audience, seem to offer the same basic functionality, which looks strikingly similar to a CRM.

  1. Dashboard
  2. Customer management
  3. Calendar/Scheduling
  4. Inventory
  5. Invoicing, Payments and Customer Accounts
  6. Maps
  7. Sales automation (email and/or SMS)
  8. Reporting

Throw a bit of wrapping around this and you have a recipe for a generic application that could be targeted at any market.

Starting out with Maven

Until recently I’ve really only used Java for private projects. This allowed me to get away with selecting the right project type from Eclipse or NetBeans and manually dealing with dependencies. A few months ago that changed and I’m now building more projects using Java.

Anyone who has worked on a project with multiple developers knows that you’ll go crazy without a build script and dependency management. I’d previously used Ant for build scripts but everyone I spoke to said I should use Maven instead because it also handled dependency management.

Starting a new project with Maven was reasonably painless. At first I was a little worried that I would need to create complicated XML files and directory structures from scratch, but it turns out that Maven comes with a generator that creates the relevant directories and files for you. To start a new project you just run the following command:

mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=com.example.app -DartifactId=example

You’ll need to replace com.example.app with the namespace for your project and example with the name of the project. Maven will create a new directory with the project name and then generate the relevant directory structure and files below that.

One thing that initially confused me after running this command is that Maven displays hundreds of lines of output and asks you to enter a number. What it’s asking is which archetype (template) to use when generating the application skeleton. Something I found useful is that you can type in a string instead, and it will display only the archetypes that contain it.

For example, to build a GWT application I’d filter on gwt:

Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : gwt
Choose archetype:
1: remote -> com.dyuproject.protostuff.archetype:basic-gwt-webapp (webapp archetype using protostuff, json and gwt)
2: remote -> com.dyuproject.protostuff.archetype:simple-gwt-webapp (webapp archetype using protobuf, json and gwt)
3: remote -> net.kindleit:gae-archetype-gwt (Archetype for creating maven-gae projects that uses GWT for the view)
4: remote -> net.sf.mgp:maven-archetype-gwt (An archetype which contains a sample Maven GWT project.)
5: remote -> org.codehaus.mojo:gwt-maven-plugin (Maven plugin for the Google Web Toolkit.)
6: remote -> org.codehaus.sonar.archetypes:sonar-gwt-plugin-archetype (Maven archetype to create a Sonar plugin including GWT pages)
7: remote -> org.geomajas:geomajas-gwt-archetype (Geomajas GWT application archetype)
8: remote -> org.ops4j.pax.web.archetypes:wab-gwt-archetype (-)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): :

Once you have selected your archetype, there are a few other straightforward questions and then it generates the new project for you. You can compile your new project immediately by changing into the project directory and typing:

mvn compile

All of the dependencies are downloaded automatically. While I’ve still got a lot to learn about Maven, it was a lot easier than I was expecting.
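If you already know which archetype you want, the prompt can be skipped by naming the archetype on the command line. A minimal sketch, assuming the standard maven-archetype-quickstart archetype suits your project; new_quickstart is a hypothetical helper name, not part of Maven. The archetypeArtifactId property picks the template and -DinteractiveMode=false stops Maven from asking questions.

```shell
# Hypothetical helper: generate a project from the quickstart archetype
# without the interactive archetype prompt.
# Usage: new_quickstart <groupId> <artifactId>
new_quickstart() {
    group_id="$1"
    artifact_id="$2"
    mvn archetype:generate \
        -DarchetypeGroupId=org.apache.maven.archetypes \
        -DarchetypeArtifactId=maven-archetype-quickstart \
        -DgroupId="$group_id" \
        -DartifactId="$artifact_id" \
        -DinteractiveMode=false
}
```

Usage would be new_quickstart com.example.app example, followed by cd example and mvn compile as before.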

Cleaning the blog

I started this blog 5 years ago (January 30, 2007). Back then it was full of technical ramblings, mostly about Linux and PHP, which reflected where my life was at the time. While I still use both technologies, they are less reflective of who I am today.

Professionally I’ve moved from developer to CTO. Personally I’ve found an interest in business, startups, marketing and making money online. If you’re familiar with Rob Walling then you’ll understand the word micropreneur. For everyone else, think lifestyle business or The 4-Hour Workweek.

Earlier today I realised that, while I’ve only posted once in the last 18 months, the blog is still getting a reasonable amount of traffic to seriously out-of-date content. In fairness to those people I’ve decided to pull most of the content from this site immediately while I decide if I want to keep it going. Google should figure this out shortly and stop the traffic completely.

jQuery UI sortable

Recently I’ve been playing with a list of lists using the jQuery UI sortable component. Graphically it looks something like:

  • List 1
    • Item 1
    • Item 2
    • Item 3
  • List 2
    • Item 4
    • Item 5
    • Item 6
  • List 3
    • Item 7
    • Item 8
    • Item 9

The HTML for this is pretty simple:

<ul class="lists">
    <li>
        <h1>List 1</h1>
        <ul id="list_1" class="list">
            <li id="item_1">Item 1</li>
            <li id="item_2">Item 2</li>
            <li id="item_3">Item 3</li>
        </ul>
    </li>
    <li>
        <h1>List 2</h1>
        <ul id="list_2" class="list">
            <li id="item_4">Item 4</li>
            <li id="item_5">Item 5</li>
            <li id="item_6">Item 6</li>
        </ul>
    </li>
    <li>
        <h1>List 3</h1>
        <ul id="list_3" class="list">
            <li id="item_7">Item 7</li>
            <li id="item_8">Item 8</li>
            <li id="item_9">Item 9</li>
        </ul>
    </li>
</ul>

I want the user to be able to re-order the lists, re-order the items in each list and move items from one list to another. 18 lines of JavaScript later it handled all of this and displayed alerts showing where an item was moved to, which list it ended up in and whether the lists were re-ordered.

$(function() {
    $(".lists").sortable({   // re-order whole lists
        forcePlaceholderSize: true,
        stop: function(event, ui) {   // runs once a drag ends
            var list = ui.item.children('ul').attr('id').replace('list_', '');
            window.alert('Moving list ' + list + ' to position ' + ui.item.index());
        }
    });
    $(".list").sortable({   // re-order items, including between lists
        connectWith: ".list",
        forcePlaceholderSize: true,
        stop: function(event, ui) {   // ui.item is the dropped <li>
            var item = ui.item.attr('id').replace('item_', '');
            var list = ui.item.parent().attr('id').replace('list_', '');
            window.alert('Moving item ' + item + ' to list ' + list + ' position ' + ui.item.index());
        }
    });
});

I love jQuery and jQuery UI.

Auto Increment in SugarCRM

Earlier today I posted about creating a DateTime Picker in SugarCRM. A second problem I had was creating an auto increment field. This turned out to be slightly more difficult than the datetime picker. Again you need to edit vardefs.php, but this time you add the following to your field’s definition:

'auto_increment' => true,

You then need to manually alter the table structure so that the id field is a unique index instead of the primary key, then add your new column to the database as an auto_increment field and make it the table’s primary key. I also found that if you try to install the module on another system it will fail because it can’t create the table properly. You can work around that by creating the table first and then installing the module.
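The table change described above can be written as a single MySQL ALTER TABLE. This is a sketch, not SugarCRM’s own tooling: it assumes a MySQL backend, and the helper name auto_increment_sql as well as the table and column names in the usage line are hypothetical. The function only prints the SQL, so you can review it before piping it into the mysql client.

```shell
# Hypothetical helper: print MySQL DDL that demotes the id primary key to a
# unique index and adds an AUTO_INCREMENT column as the new primary key.
# Existing rows are numbered automatically when the column is added.
# Usage: auto_increment_sql <table> <column>
auto_increment_sql() {
    table="$1"
    column="$2"
    cat <<EOF
ALTER TABLE $table
    DROP PRIMARY KEY,
    ADD UNIQUE KEY ${table}_id_unique (id),
    ADD COLUMN $column INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY ($column);
EOF
}
```

For example, auto_increment_sql mypkg_items item_number | mysql -u user -p sugarcrm would apply it; back up the table first.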

Date/Time Picker in SugarCRM

Recently I needed to add a date/time field to a custom SugarCRM module. Sadly the module builder doesn’t support this and the documentation is pretty bad. Eventually I managed to solve my problem, and the solution is surprisingly simple.

After adding a date field to the form I edited my vardefs.php file. You’ll find the file in custom/modulebuilder/packages/package_name/modules/module_name/. Find the field and change its type to datetime.

Next you need to change your views. They’re located in custom/modulebuilder/packages/package_name/modules/module_name/metadata. In my case I wanted the edit view to show a date picker plus a time combo. I found the field and added:

'type' => 'Datetimecombo'

After that I deployed the package. My module now saved the field as a date/time, and in the edit view I could set both the date (using a date picker) and the time (using drop-down lists).

API equals dollars

Lately I’ve been looking at a number of SaaS providers covering a range of areas. It amazes me how many of them have no API or only a reporting API. If you’re thinking of building a SaaS start-up then you should be thinking about creating an API that allows your customers to do everything they can with your user interface.

Service providers with this API have an obvious advantage when it comes to migration and integration but they also have a more subtle and more important advantage. When reviewing potential providers I looked at those with an API before those without. Your SaaS may be the best but without this API are potential customers even considering you?

Unit tests

I just read a post asking whether unit tests can take too long. In it the author suggests that 5 minutes is long and asks if anyone has solved this problem. This reminded me of a discussion I had with some developers about 12 months ago on unit testing, in which my argument was simply that unit tests need to be comprehensive, not necessarily quick.

There are many projects where the unit tests take several hours to run. This shouldn’t matter during development, when you’re probably only interested in a few tests; most test tools provide a way to filter which tests are run. You only need to run the entire test suite before committing or during continuous integration.
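As one concrete case, in a Maven project using the Surefire plugin the test property selects which test classes run, and it accepts patterns such as Invoice*. A minimal sketch, assuming that setup; run_tests is a hypothetical helper name, and other test tools have their own filtering options.

```shell
# Hypothetical helper for a Maven project using the Surefire plugin.
#   run_tests             -> run the whole suite (before committing, or in CI)
#   run_tests 'Invoice*'  -> run only test classes matching the pattern
run_tests() {
    if [ "$#" -eq 0 ]; then
        mvn test
    else
        mvn test -Dtest="$1"
    fi
}
```

Quote the pattern so the shell doesn’t expand the * against files in the current directory.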

Having said all that, I can recommend using in-memory tables if your database supports them. The operations are generally a lot faster because the database doesn’t need to write to disk.