The 57 Lamest Tech Moments of 2010

By CmdrTaco, Slashdot, December 21, 2010 at 09:45AM

harrymcc writes “When it comes to strange blunders, failed dreams, pointless legal wrangling, and other embarrassments, the technology industry had an uncommonly busy 2010. I compiled a list of the most notable examples–including the lost iPhone prototype, the short life of Microsoft’s Kin, the end of Google Wave, the McAfee security meltdown, a depressingly long list of lawsuits over mobile patents, and much more.”

Firefox 4 beta 8 now available for download

By Lee Mathews, Download Squad, December 21, 2010 at 08:30AM

While the main download link hasn’t updated yet at getfirefox.com/beta, the latest beta release of Firefox 4 is ready for download from Mozilla’s FTP servers. With Beta 7 already being feature complete, Beta 8 is all about squashing bugs and shining up the browser in anticipation of its release. It was a little slow in the coming, but frankly we’re just glad to see FF4b8 surface before time ran out on 2010.

Check out the full build notes here, or click one of the following links to download:

Other languages and builds are available on the Mozilla FTP server. Full release notes can be found on the Mozilla wiki.

Firefox 4 beta 8 now available for download originally appeared on Download Squad on Tue, 21 Dec 2010 08:30:00 EST. Please see our terms for use of feeds.

YQL: Using Web Content For Non-Programmers

By Christian Heilmann, Smashing Magazine Feed, December 21, 2010 at 08:08AM

Building a beautiful design is a great experience. Seeing the design break apart when people start putting in real content, though, is painful. That’s why testing it as soon as possible with real information to see how it fares is so important. To this end, Web services provide us with a lot of information with which to fill our products. In recent years, this has been a specialist’s job, but the sheer amount of information available and the number of systems to consume it make it easier and easier to use Web services, even for people without much development experience.

On Programmable Web, you can find (to date) 2580 different application programming interfaces (or APIs). An API allows you to get access to an information provider’s data in a raw format and reformat it to suit your needs.

The Trouble With APIs

The problem with APIs is that access to them varies in simplicity, from just having to load data from a URL all the way up to having to authenticate with the server and give all kinds of information about the application you want to build before getting your first chunk of information.

Each API is based on a different idea of what information you need to provide, what format it should be in, what data it will give back and in what format. All this makes using third-party APIs in your products very time-consuming, and the pain multiplies with each one you use. If you want to get photos from Flickr and updates from Twitter and then show the geographical information in Twitter on a map, then you have quite a trek ahead.

Simplifying API Access

Yahoo uses APIs for nearly all of its products. Instead of accessing a database and displaying the information live on the screen, the front end calls an API, which in turn gets the information from the back end, which talks to databases. This gives Yahoo the benefit of being able to scale to millions of users and being able to change either the front or back end without disrupting the other.

Because the APIs have been built over 10 years, they all vary in format and the way in which you access them. This cost Yahoo too much time, which is why it built Yahoo Pipes — to ease the process.

Pipes is amazing. It is a visual way to mix and match information from the Web. However, as people used Pipes more, they ran into limitations. Versioning pipes was hard; to change the functionality of a pipe even slightly, you had to go back to the system, and it tended to slow down with very complex and large conversions. This is why Yahoo now offers a new system for needs that change a lot or get very complex.

YQL is both a service and a language (Yahoo Query Language). It makes consuming Web services and APIs dead simple, both in terms of access and format.

Retrieving Data With YQL

The easiest way to access YQL is to use the YQL console. This tool allows you to preview your YQL work and play with the system without having to know any programming at all. The interface is made up of several components:

  1. The YQL statement section is where you write your YQL query.
    YQL has a very simple syntax, and we’ll get into its details a bit later on. Now is the time to try it out. Enter your query, define the output format (XML or JSON), check whether to have diagnostics reporting, and then hit the “Test” button to see the information. There is also a permalink; click it to make sure you don’t lose your work in case you accidentally hit the “Back” button.
  2. The results section shows you the information returned from the Web service.
    You can either read it in XML or JSON format or click the “Tree view” to navigate the data in an Explorer-like interface.
  3. The REST query section gives you the URL of your YQL query.
    You can copy and paste this URL at any time to use it in a browser or program. Getting information from different sources with YQL is actually this easy.
  4. The queries section gives you access to queries that you previously entered.
    You can define query aliases for yourself (much as you would bookmark websites), get a history of the latest queries (very useful in case you mess up) and get some sample queries to get started.
  5. The data tables section lists all the Web services you can access using YQL.
    Clicking the name of a table will in most cases open a demo query in the console. If you hover over the link, you’ll get two more links — desc and src — which give you information about the parameters that the Web service allows and which show the source of the data table itself. In most cases, all you need to do is click the name. You can also filter the data table list by typing what you’re looking for.
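
The URL shown in the REST query section (item 3) is essentially the statement, URL-encoded, plus a few parameters. As a rough sketch of how that URL is put together, here is a small helper using a Flickr search statement as the example. The function name buildYqlUrl and its option names are hypothetical; the endpoint and the q, format, env and callback parameters are the ones that appear in the URLs the console produces.

```javascript
// Sketch only: buildYqlUrl and its option names are made up for illustration.
// The endpoint and parameter names mirror the URLs produced by the console.
var YQL_ENDPOINT = 'https://query.yahooapis.com/v1/public/yql';

function buildYqlUrl(statement, options) {
  options = options || {};
  var params = [
    // The YQL statement itself, URL-encoded.
    'q=' + encodeURIComponent(statement),
    // Output format; defaults to JSON in this sketch.
    'format=' + encodeURIComponent(options.format || 'json')
  ];
  if (options.env) {
    params.push('env=' + encodeURIComponent(options.env));
  }
  if (options.callback) {
    params.push('callback=' + encodeURIComponent(options.callback));
  }
  return YQL_ENDPOINT + '?' + params.join('&');
}

var url = buildYqlUrl('select * from flickr.photos.search where text="cat"', {
  env: 'store://datatables.org/alltableswithkeys',
  callback: 'myflickr'
});
console.log(url);
```

Copying the URL from the console is still the simplest route; a helper like this only becomes useful once you want to change the search term from your own script.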

Using YQL Data

By far the easiest way to use YQL data is to select JSON as the output format and define a callback function. If you do that, you can then copy and paste the URL from the console and write a very simple JavaScript to display the information in HTML. Let’s give that a go.

As a very simple example, let’s get some photos from Flickr for the search term “cat”:

select * from flickr.photos.search where text="cat"

Type that into the YQL console, and hit the “Test” button. You will get the results in XML, with a lot of information about the photos.

Instead of XML, choose JSON as the output format, and enter myflickr as the callback function name. You will get the same information as a JSON object inside a call to the function myflickr.

You can then copy the URL created in the “REST query” field.

Write a JavaScript function called myflickr with a parameter data, and copy and paste the URL as the src of another script block:

<script>
  function myflickr(data){
    // Quick check that something arrived; shows "[object Object]".
    alert(data);
  }
</script>
<script src="https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20flickr.photos.search%20where%20text%3D%22cat%22&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=myflickr"></script>

If you run this inside a browser, the URL you copied will retrieve the data from the YQL server and send it to the myflickr function as the data parameter. The data parameter is an object that contains all the returned information from YQL. To make sure you have received the right information, test whether the data.query.results property exists; then you can loop over the result set:

<script>function myflickr(data){
  // Only continue when YQL actually returned results.
  if(data.query.results){
    var photos = data.query.results.photo;
    for(var i=0,j=photos.length;i<j;i++){
      alert(photos[i].title);
    }
  }
}</script>
<script src="https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20flickr.photos.search%20where%20text%3D%22cat%22&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=myflickr"></script>

You can easily get the structure of the information and know what is loop-able by checking the tree view of the results field in the console.

Right now, all this does is display the titles of the retrieved photos as alerts, which is nothing but annoying. To display the photos in the right format, we need a bit more — but no magic either:

<div id="flickr"></div>
<script>function myflickr(data){
  if(data.query.results){
    var out = '<ul>';
    var photos = data.query.results.photo;
    for(var i=0,j=photos.length;i<j;i++){
      // Assemble the thumbnail URL from the photo's farm, server, id and secret.
      out += '<li><img src="https://farm' + photos[i].farm +
             '.static.flickr.com/' + photos[i].server + '/' + photos[i].id +
             '_' + photos[i].secret + '_s.jpg" alt="' + photos[i].title +
             '"></li>';
    }
    out += '</ul>';
    // Write the list only when there are results, so "undefined" never shows up.
    document.getElementById('flickr').innerHTML = out;
  }
}</script>
<script src="https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20flickr.photos.search%20where%20text%3D%22cat%22&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=myflickr"></script>

Put this into action and you’ll get photos of cats, live from Flickr and without having to go through any painful authentication process.

The complexity of the resulting HTML for display differs from data set to data set, but in essence the main trick remains the same: define a callback function, write it, copy and paste the URL you created in the console, test that data has been returned, and then go nuts.
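
One way to keep that pattern tidy is to separate building the HTML from writing it into the page. The following is a minimal sketch of that idea, not the only way to do it. The helper names escapeHtml and renderPhotoList are hypothetical, and the handling of a single photo arriving as one object rather than a list is an assumption about the returned data, so check the tree view in the console for your own query.

```javascript
// Sketch only: escapeHtml and renderPhotoList are illustrative helper names.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderPhotoList(data) {
  // Return an empty string when YQL sent back no results.
  if (!data || !data.query || !data.query.results) {
    return '';
  }
  var photos = data.query.results.photo;
  if (!photos) {
    return '';
  }
  // Assumption: a single match may arrive as one object instead of an array.
  if (!Array.isArray(photos)) {
    photos = [photos];
  }
  var out = '<ul>';
  for (var i = 0, j = photos.length; i < j; i++) {
    var p = photos[i];
    out += '<li><img src="https://farm' + p.farm + '.static.flickr.com/' +
           p.server + '/' + p.id + '_' + p.secret + '_s.jpg" alt="' +
           escapeHtml(p.title) + '"></li>';
  }
  return out + '</ul>';
}

// The callback named in the YQL URL; it touches the page only if one exists.
function myflickr(data) {
  var html = renderPhotoList(data);
  if (typeof document !== 'undefined') {
    document.getElementById('flickr').innerHTML = html;
  }
}
```

Escaping the title matters because photo titles are written by strangers; a title containing a quote or an angle bracket would otherwise break the markup.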

Using YQL To Reuse HTML Content

One other very powerful use of YQL is to access HTML content on the Web and filter it for reuse. This is usually called “scraping” and is a pretty painful process. YQL makes it easier because of two things: it cleans up the HTML retrieved from a website by running it through HTML Tidy, and it allows you to filter the result with XPath. As an example, let’s retrieve the list of my upcoming conferences and display it.

Go to https://icant.co.uk/ to see my upcoming speaking engagements.

You can then use Firebug in Firefox to inspect this section of the page. Simply open Firebug, click the box with the arrow icon next to the bug, and move the cursor around the page until the blue border is around the element you want to inspect.

Right-click the selection, and select “Copy XPath” from the menu.

Go to the YQL console, and type in the following:

select * from html where url="https://icant.co.uk" and xpath=''

Copy the XPath from Firebug into the query, and hit the “Test” button.

select * from html where url="https://icant.co.uk" and
xpath='//*[@id="travels"]'

As you can see, this gets the HTML of the section that we want inside some XML. The easiest way to reuse this in HTML is by requesting a format that YQL calls JSON-P-X. This will return a simple JSON object with the HTML as a string. To use this, do the following:

  1. Copy the URL from the REST field in the console.
  2. Add &format=xml&callback=travels to the end of the URL.
  3. Add this as the src to a script block, and write this terribly simple JavaScript function:

<div id="travels"></div>
<script>function travels(data){
  // JSON-P-X delivers the HTML as strings in data.results.
  if(data.results){
    var travels = document.getElementById('travels');
    travels.innerHTML = data.results[0];
  }
}</script>
<script src="https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Ficant.co.uk%22%20and%20xpath%3D'%2F%2F*%5B%40id%3D%22travels%22%5D'&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&format=xml&callback=travels"></script>

The result is an unordered list of my events on your website.
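
If the query matches nothing, data.results may be empty, so a slightly more defensive version of that callback can help. This is only a sketch: firstHtmlResult is a hypothetical helper name, and the sample object is a hand-made stand-in for what YQL passes to the callback. Only the results property is taken from the callback shown earlier.

```javascript
// Sketch only: firstHtmlResult is an illustrative helper name.
function firstHtmlResult(data) {
  // Return the first HTML string, or an empty string when nothing matched.
  if (data && Array.isArray(data.results) && data.results.length > 0) {
    return String(data.results[0]);
  }
  return '';
}

// Hand-made stand-in for a JSON-P-X response, for trying the helper out.
var fakeResponse = { results: ['<ul id="travels"><li>Example event</li></ul>'] };
console.log(firstHtmlResult(fakeResponse));
console.log(firstHtmlResult({ results: [] }) === '');
```

Inside the travels callback you would then write firstHtmlResult(data) into the element instead of reading data.results[0] directly.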

Debugging YQL Queries

Things will go wrong, and having no idea why is terribly frustrating. The good news with YQL is that you will get error messages that are actually human-readable. If something fails in the console, you will see a big box under the query telling you what the problem was.

Furthermore, you will see a diagnostics block in the data returned from YQL that tells you in detail what happened “under the hood.” If there are any problems accessing a certain service, it will show up there.

YQL Syntax

The basic syntax of YQL is very easy:

select {what} from {source} where {conditions}

You can filter your results, cut the information down only to the bits you want, paginate the results and nest queries in others. For all the details of the syntax and its nuances, check the extensive YQL documentation.
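
To make the pattern concrete, here is a small sketch that assembles statements of this shape from their parts. The helper names yqlSelect and yqlString are hypothetical and are not part of YQL. The quoting rule (double quotes by default, single quotes when the value itself contains double quotes) is an assumption modelled on the two statements used in this article, so consult the YQL documentation for the real escaping rules.

```javascript
// Sketch only: yqlSelect and yqlString are illustrative helper names.
function yqlString(value) {
  var text = String(value);
  var hasDouble = text.indexOf('"') !== -1;
  var hasSingle = text.indexOf("'") !== -1;
  if (hasDouble && hasSingle) {
    // This sketch does not attempt escaping; it simply refuses such values.
    throw new Error('value mixes single and double quotes');
  }
  var quote = hasDouble ? "'" : '"';
  return quote + text + quote;
}

function yqlSelect(what, source, conditions) {
  var statement = 'select ' + what + ' from ' + source;
  var clauses = [];
  for (var key in conditions) {
    if (Object.prototype.hasOwnProperty.call(conditions, key)) {
      clauses.push(key + '=' + yqlString(conditions[key]));
    }
  }
  if (clauses.length > 0) {
    statement += ' where ' + clauses.join(' and ');
  }
  return statement;
}

console.log(yqlSelect('*', 'flickr.photos.search', { text: 'cat' }));
console.log(yqlSelect('*', 'html', {
  url: 'https://icant.co.uk',
  xpath: '//*[@id="travels"]'
}));
```

The two calls reproduce the Flickr and HTML statements used earlier, which is a quick way to check that the helper matches what you would type into the console.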

YQL Examples

You can do quite amazing things with YQL. By nesting statements in parentheses and filtering the results, you can reach far and wide across the Web of data. Simply click the following examples to see the results as XML documents. Copy and paste them into the console to play with them.

This is just a taste of the power of YQL. Check out some of my presentations on the subject.

YQL’s Limits

YQL has a few (sensible) limits:

  • You can access the URL 10,000 times an hour; after that you will be blocked. It doesn’t matter in our case because the blocking occurs per user, and since we are using JavaScript, this affects our end users individually and not our website. If you use YQL on the back end, you should cache the results and also authenticate to the service via OAuth to be allowed more requests.
  • The language allows you to retrieve information; insert, update and delete from data sets; and limit the amount of data you get back. You can get paginated data (0 to 20, 20 to 40 and so on), and you can sort and find unique entries. What you can’t do in the YQL syntax is more complex queries, like “Get me all data sets in which the third character in the title attribute is x,” or something like that. You could, however, write a JavaScript that does this kind of transformation before YQL returns the data.
  • You can access all open data on the Web, but if a website chooses to block YQL using the robots.txt directive, you won’t be allowed to access it. The same applies to data sources that require authentication or are hosted behind a firewall.
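
The caching advice in the first point can be sketched in a few lines. This is an illustration under stated assumptions, not a production cache: createCache, fetchFn and the ten-minute lifetime are all made up for the example, and fetchFn stands in for whatever your back end uses to request a YQL URL. The demo uses a fake fetcher and a fake clock so that it runs anywhere.

```javascript
// Sketch only: createCache, fetchFn and the ten-minute lifetime are assumptions.
function createCache(fetchFn, ttlMs, now) {
  now = now || Date.now;
  var entries = Object.create(null);
  return function (url) {
    var hit = entries[url];
    // Serve the stored value while it is younger than ttlMs.
    if (hit && now() - hit.time < ttlMs) {
      return hit.value;
    }
    var value = fetchFn(url);
    entries[url] = { time: now(), value: value };
    return value;
  };
}

// Demo with a fake fetcher and a fake clock.
var calls = 0;
var clock = 0;
var cachedFetch = createCache(function (url) {
  calls += 1;
  return 'response for ' + url;
}, 10 * 60 * 1000, function () { return clock; });

cachedFetch('flickr-cats');
cachedFetch('flickr-cats');
console.log(calls); // the second call was served from the cache
```

Because the wrapper stores whatever fetchFn returns, it would also work with a function that returns a promise; the cached promise is then shared by every caller within the lifetime.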

There Is More To YQL

This article covers how to use YQL to access information. If you have an interesting data set and want it to become part of the YQL infrastructure, you can easily do that, too. We’ll cover that in the next article.

© Christian Heilmann for Smashing Magazine, 2010.

Getting Paid to Tell Lies: Mystery Shopping as a Frugal Hack

By Donna Freedman, Get Rich Slowly – Personal Finance That Makes Sense, December 21, 2010 at 06:00AM

This post is from new GRS staff writer Donna Freedman. Donna writes the Living With Less personal finance column for MSN Money, and writes about frugality and intentional living at Surviving And Thriving.

Two persistent rumors about mystery shopping:

  • It’s a scam.
  • It’s not a scam — and you can get rich doing it!

Allow me to set these rumors to rest:

  • Mystery shopping is not a scam. (Well, sometimes it is. More on that later.)
  • You won’t get rich, but you can make a little extra — plus get free stuff.
  • You should never pay for mystery shopping info.

My daughter Abby has been doing “shops” for a decade, and I’ve done them off and on for six years. We’ve gotten free steaks, hotel rooms, oil changes, booze, pet food, lodging, nights at a casino, rental cars, and molten chocolate cake.

Not that it’s all frou-frou stuff: Shops exist for things like vision exams, oil changes, vitamins, pet checkups, bone-density testing. You could get paid to drink microbrews, test-drive a sports car, visit an amusement park, or shop for groceries.

As a mystery shopper, you’re paid to be the eyes and ears of a restaurant, a specialty store, a hotel. That means legwork. You have to fill out multi-page reports (and if you do it wrong, they won’t pay you).

But the more shops you do, the better you get at it — and the more likely that shop providers will call and offer you first pick.

Don’t ever pay for it
Recently I received e-mails from two different companies asking me to mystery-shop. Both told me to “register” with my bank-account, credit-card and Social Security numbers. Riiiight.

Two legitimate sources for mystery shopping are Volition.com and an industry group called the Mystery Shopping Providers Association. Both have lists of companies that offer jobs by region.

Don’t expect to be sent to a pricey steakhouse right off the bat. You’ll have to take lower-end gigs (fast food, coffee shop) to prove yourself. In fact, you may even have to do a make-believe “sample shop” to prove you can write in the style they request.

About that style: “The waiter was really good” doesn’t mean much. What made him good? Did he offer to start you off with something to drink? Did he check back during your meal to make sure everything was OK? Did he tell you the tofu is made in-house?

A specific list of things to watch for will be provided to you. Very specific, as in “Did your glass ever become less than half-full?” or “Did the rental car agent offer you the satellite radio option?”

You may be asked to follow a script, i.e., become a good liar. For example, my daughter was told to go to a pet-food store and say she had two dogs. At the time, she didn’t own so much as a goldfish.

At times you’ll be sent to places normally out of your league. Once, Abby was sent to a pricey-chic clothing store and allowed up to $30 worth of purchases in addition to the shop payment. The only two things she could get were a keychain or a pair of $27 bikini underpants (honest!).

10 reasons that mystery shopping is a frugal hack

  1. Unemployed or underemployed? Sign up for as many restaurant shops as you can.
  2. Seeking new specs? I’ve seen vision-care shops that reimburse $100.
  3. Got kids? Take them to the water park or a pizza joint on someone else’s dime.
  4. Hotel shops are a mini-staycation. The pool, the room service (you will probably be required to order it), the novelty of having a telephone in the bathroom…OK, so I’m easily amused. You might be, too.
  5. Parents aging? Look for shops of retirement homes/assisted living facilities. If nothing else, this might give you an idea of where not to put Mom and Dad.
  6. Too broke to date? Invite that special someone to a night at the casino.
  7. Living the car-free life? Use auto-rental shops to visit a friend who lives two towns over. Go to the pet-food warehouse and buy giant sacks of kibble. Hit the warehouse club for six months’ worth of toilet paper.
  8. Want to treat a friend? Take him out for a glass of wine.
  9. Need to send money to a relative, either as a gift or repayment of a loan? Watch for shops that pay you to get money orders.
  10. Want to go downtown? Find one or two parking-garage shops. This means not just parking reimbursement but an extra $10 or $20 toward whatever you want to do. (A frugal culture hack, maybe?)

What’s in it for you?
For most restaurant shops, the fee is small or even nonexistent — you’re generally in it for the free lunch. For others, the pay range is generally $10 to $35 plus product reimbursement.

However, I’ve earned as much as $95 for a banking assignment that took several visits. Some shops have bonuses because they’re due immediately (another shopper flaked out, leaving the provider in the lurch).

Three things you’ll need:

  • A separate e-mail account. The notifications can pile up pretty quickly. Check it regularly because the popular shops go fast.
  • A watch that times in minutes and seconds.
  • A PayPal account. Some companies reimburse only electronically.

Does all this sound like a lot of work? It is and it isn’t. How many times have you raved to your pals about a great restaurant or groused about a crummy one? This time you’ll be doing it in writing. It’s like Yelp, except that you get paid (or at least fed) to do it.

If you’re strapped for cash but still love the mall, then mystery shopping gives you a little money to feed your hobby. The temptation to keep spending might be strong, however, so consider choosing assignments that don’t have anything to do with your weaknesses (sports cars, nickel slots, really cute shoes).

Nobody gets rich doing this. But it’s a way to get a little cash to do things you might want to do anyway, such as having lunch out or wandering through a nice store. Or marveling at $27 underpants.


Law Professor Explains How Even When A Site Copies An Entire Article, It May Still Be Fair Use

By Mike Masnick, Techdirt, December 21, 2010 at 01:35AM

Berkeley law professor Jason Schultz has filed an excellent amicus brief in one of the many Righthaven lawsuits, pointing out that using an entire article does not preclude fair use (pdf), and then going on to explain why the use of an entire article in this particular case (which Righthaven brought against the Center for Intercultural Organizing) was almost certainly fair use. Basically, Righthaven has taken the approach that if an entire article is being used, then there can be no fair use. However, as Schultz points out, that’s not at all what copyright law says:


A fair-use inquiry balances four statutory factors…. Righthaven, however, asks this Court to ignore those traditional factors and embrace an inflexible, one-factor test that prohibits a fair-use finding whenever an entire copyrighted work is used. That approach finds no support in the text and purposes of the Copyright Act and the cases interpreting it. Indeed, the Supreme Court, the Ninth Circuit, and this Court have all found the use of entire copyrighted works to be consistent with the fair-use doctrine. Those rulings recognize that copyright law balances two important public interests: promoting creative expression and encouraging the use of copyrighted works for socially beneficial purposes.

It is a common misconception that using an entire work means there’s no fair use defense. We’ve repeatedly pointed out cases where courts have found fair use, even if an “entire” work was being used. But, still, we get commenters all the time who argue that there’s no such thing as fair use if you use an entire work. Schultz, in his brief, highlights many more examples, including the explanations of why each case was still deemed as fair use. From there, he goes on and runs through the four factors in this particular case, and explains why it should be considered fair use as well. It will be interesting to see how the judge rules, because that could impact many other Righthaven cases as well.

Database of Private SSL Keys Published

By CmdrTaco, Slashdot, December 20, 2010 at 09:53AM

Trailrunner7 writes “A new project has produced a large and growing list of the private SSL keys that are hard-coded into many embedded devices, such as consumer home routers. The LittleBlackBox Project comprises a list of more than 2,000 private keys right now, each of which can be associated with the public key of a given router, making it a simple matter for an attacker to decrypt the traffic passing through the device. Published by a group called /dev/ttyS0, the LittleBlackBox database of private keys gives users the ability to find the key for a specific router in several different ways, including by searching for a known public key, looking up a device’s model name, manufacturer or firmware version or even giving it a network capture, from which the program will extract the device’s public certificate and then find the associated private SSL key.”

Scientists attempt to predict flu spread, give ZigBee radios to 700 high school students

By Sean Hollister, Engadget RSS Feed, December 20, 2010 at 07:43AM

This is the Crossbow TelosB wireless remote platform, and it did an important job for science in January of last year: it monitored the close proximity interactions among 788 students and staff at one US high school to track a virtual flu. After collecting over 762,000 sneeze-worthy anecdotes among the module-toting teachers and teens, Stanford researchers ran 788,000 simulations charting the path the virus might take and methods the school might try to keep it in line. Sadly, the scientists didn’t manage to come up with any easy answers, as virtual vaccination seemed to work equally well (or poorly) no matter who got the drugs. They did suggest that if only we could actually monitor individuals in real life as easily as in a study, prevention would be much easier. But who will bell the cat, when it’s so much less political to ionize?

Scientists attempt to predict flu spread, give ZigBee radios to 700 high school students originally appeared on Engadget on Mon, 20 Dec 2010 08:43:00 EDT. Please see our terms for use of feeds.

Medgadget | Source: Stanford University

Most Popular Hive Fives of 2010 [Best Of 2010]

By Jason Fitzpatrick, Lifehacker, December 17, 2010 at 08:00PM

Every week we put out a Hive Five Call for Contenders and ask you a simple question: Which is best? From DVD rippers to web hosts and everything in between, here’s a look back at the most popular Hive Fives of 2010.

Photo remixed from an original by Matt Katzenberger

The Hive Five gives us a chance to put interesting topics before the Lifehacker readership, see what’s popular, and then round up the top five contenders for a vote. Sometimes the winner seems obvious from the start, but the real value of the Hive Five isn’t finding out the absolute most popular tool around; it’s finding out about four other very solid options you may have been unaware of.

Five Best DVD-Ripping Tools

You’ve got DVDs and you’ve got media servers and portable devices that need to be fed with fresh media. DVD-ripping tools bridge the gap and help you turn your optical media into files you can enjoy without a DVD player.

Five Best Windows 7 Tweaking Applications

Windows 7 brought numerous improvements over previous incarnations of Windows, especially for those making the leap from Windows XP to Windows 7. People still love to tweak and customize their operating system, no matter how many improvements it contains. The Windows Tweaking Hive Five was one of our most popular for the entire year, testifying to the love the Lifehacker readership has for customization.

Five Best Netbook Operating Systems

Netbooks are inexpensive, popular, and prime targets for tweaking and custom operating systems. Windows, Linux, and OS X, as well as custom netbook-centric packages, all made an appearance in this popular Hive.

Five Best Email Clients

Despite the popularity of web-based email clients, more than a few desktop clients made a solid showing in the email Hive Five. If you’re looking for something beyond your web client that isn’t Outlook, it’s worth taking a peek.

Five Best Start Pages

The start page is the first thing you see when you open your browser or launch a new tab. If you’re tired of looking at your browser’s default page, the start pages Hive Five offers a variety of alternative pages.

Five Best Music Streaming Services

You like tunes? You like tunes delivered to you by the magic of the internet wherever you are? Check out the music streaming Hive Five for Lifehacker readers’ five favorite music streaming services.

Five Best Computer Diagnostic Tools

Just because computers get easier and easier to use doesn’t mean they don’t need a checkup now and then. Load up your tool bag with five great tools so that when trouble strikes (and it will) you’ll be ready to diagnose the problem.

Five Best Public BitTorrent Trackers

If you’re looking to set sail for the Isle of Sharing you’ll need some directions. Torrent trackers help you find new files and direct your client to the swarm of people out there sharing them. Check out the five best public BitTorrent trackers to find the files you’re looking for.

Five Best VPN Tools

Virtual Private Network software allows you to join together far-flung networks and the computers on them as though they were all sitting together in the same office. Whether you need to link your entire office to another office across the country or your home network to your buddy’s across town, you’ll find a tool to help in this Hive Five.

Five Best Online File Sharing Services

BitTorrent is great for sharing popular files, a dedicated server is great for sharing private files, but what about the times you just want to shoot a big file from your computer to a remote one? When you want to share files with no fuss and in private, the five solutions in the file sharing Hive have you covered.

Five Best Tools for Managing Your Multi-Monitor Setup

You’ve got monitors and lots of them. Grab some apps to maximize your bountiful screen real estate; don’t let any of those precious pixels go to waste just idling away. This Hive Five includes tools that extend your task bar, manage your wallpapers, and enhance window and monitor management.

Five Best Offline Backup Tools

If you’re not backing up your data you’re playing a dangerous game: all disks die, it’s just a matter of when. Hop into the Offline Backup Hive Five and grab an application or two to help you wrangle offline backups and ensure your data is secure.

Five Best Personal Web Hosts

Massive enterprise-level web hosting solutions are overkill for private and small-time web sites. In this particular Hive we took a look at the best personal web hosts and what made them well suited for adventures in personal web mastering.

Five Best Places to Buy Cheap Textbooks

Who wants to spend a fortune on textbooks? Nobody, which is why the Cheap Textbooks Hive Five was so popular. If you’re in school, have kids in school, or are returning to school yourself, you can save a boatload of cash by shopping on textbook web sites.

Five Best File Encryption Tools

Unless you enjoy people climbing in your windows and snatching your data up, you’d better be encrypting it. From whole disk encryption to encrypted volumes and portable files, the Encryption Tools Hive Five has everything you need to make sure your tax returns, diary, and extensive Sailor Moon collection are safe from prying eyes.

Five Best Music Discovery Services

Gone are the days of relying on a local DJ or record shop employee to introduce you to new music. If you’re in the mood for new music, make sure to check out the Music Discovery Services Hive Five to see where your fellow readers discover new artists.


Curious what Hive topics were popular last year? You can check out the most popular Hive Fives of 2009 here.

How To Build A Basic Web Crawler To Pull Information From A Website (Part 2)

By James Bruce, MakeUseOf, December 17, 2010 at 12:31PM

This is part 2 in a series I started last time about how to build a web crawler in PHP. Previously I introduced the Simple HTML DOM helper file and showed you how incredibly simple it was to grab all the links from a webpage, a common task for search engines like Google.

If you read part 1 and followed along, you’ll know I set some homework to adjust the script to grab images instead of links.


I dropped some pretty big hints, but if you didn’t get it or if you couldn’t get your code to run right, then here is the solution. I added an additional line to output the actual images themselves as well, rather than just the source address of the image.

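<?php
// Load the Simple HTML DOM helper introduced in part 1
include_once('simple_html_dom.php');
$target_url = "https://www.tokyobit.com";
$html = new simple_html_dom();
$html->load_file($target_url);
// Loop over every <img> tag found on the page
foreach($html->find('img') as $img)
{
// Print the image's source address
echo $img->src."<br />";
// Print the <img> tag itself so the browser displays the image
echo $img."<br/>";
}
?>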

This should output something like this:

[Screenshot: the image addresses and images printed by the script]

Of course, the results are far from elegant, but it does work. Notice that the script is only capable of grabbing images that are in the content of the page in the form of <img> tags – a lot of the page design elements are hard-coded into the CSS, so our script can’t grab those. Again, you can run this through my server if you wish at this URL, but to enter your own target site you’ll have to edit the code and run it on your own server as I explained in part 1. At this point, you should bear in mind that downloading images from a website puts significantly more stress on the server than simply grabbing text links, so only try the script on your own blog or mine and try not to refresh lots of times.
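To see why CSS images are missed, here is a minimal sketch that runs the same idea against a small inline page instead of a live site. It is an illustration under assumptions, not the script above: it swaps in PHP’s built-in DOMDocument class in place of the Simple HTML DOM helper so it runs without extra files, and the sample markup and the function name find_image_sources are made up for the example.

```php
<?php
// Hypothetical sample page: one image in an <img> tag, one referenced only from CSS.
$sample_html = '<html><head><style>div.banner{background:url("banner.png");}</style></head>'
             . '<body><div class="banner"></div><img src="photo.jpg" alt="A photo"></body></html>';

// Collect the src attribute of every <img> tag in an HTML string.
// Uses PHP's built-in DOM extension rather than simple_html_dom.
function find_image_sources($html_string)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html_string);
    libxml_clear_errors();

    $sources = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Only photo.jpg is listed; banner.png lives in the stylesheet, not in an <img> tag.
foreach (find_image_sources($sample_html) as $src) {
    echo $src . "\n";
}
```

Because nothing is fetched over the network, you can refresh this as often as you like without putting any load on someone else’s server.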

Let’s move on and be a little more adventurous. We’re going to build upon our original file, and instead of just grabbing all the links randomly, we’re going to make it do something more useful by getting the post content instead. We can do this quite easily because standard WordPress wraps the post content within a <div class="post"> tag, so all we need to do is grab every “div” with that class and output it – effectively stripping everything except the main content out of the original site. Here is our initial code:

<?php
include_once('simple_html_dom.php');
$target_url = "https://www.tokyobit.com";

$html = new simple_html_dom();

$html->load_file($target_url);
// Find every <div class="post"> and print it, markup included
foreach($html->find('div[class=post]') as $post)
{
echo $post."<br />";
}

?>

You can see the output by running the script from here (forgive the slowness, my site is hosted at GoDaddy and they don’t scale very well at all), but it doesn’t contain any of the original design – it is literally just the content.

Let me show you another cool feature now – the ability to delete elements of the page that we don’t like. For instance, I find the meta data quite annoying – like the date and author name – so I’ve added some more code that finds those bits (identified by various classes of div such as post-date, post-info, and meta). I’ve also added a simple CSS style-sheet to format the output a little. Daniel covered a number of great places to learn CSS online if you’re not familiar with it.

As I mentioned in part 1, even though the file contains PHP code, we can still add standard HTML or CSS to the page and the browser will understand it just fine – the PHP code is run on the server, then everything is sent to the browser, to you, as standard HTML. Anyway, here’s the whole final code:

<head>
<style type="text/css">
div.post{background-color: gray;border-radius: 10px;-moz-border-radius: 10px;padding:20px;}
img{float:left;border:0px;padding-right: 10px;padding-bottom: 10px;}
body{width:60%;font-family: verdana,tahoma,sans-serif;margin-left:20%;}
a{text-decoration:none;color:lime;}
</style>
</head>

<?php
include_once('simple_html_dom.php');

$target_url = "https://www.tokyobit.com";

$html = new simple_html_dom();

$html->load_file($target_url);
foreach($html->find('div[class=post]') as $post)
{
// Blank out each metadata block, skipping any that this post doesn't contain
foreach(array('post-date', 'post-info', 'meta') as $class)
{
$unwanted = $post->find('div[class='.$class.']', 0);
if ($unwanted) $unwanted->outertext = '';
}
echo $post."<br />";
}

?>

You can check out the results here. Pretty impressive, huh? We’ve taken the content of the original page, got rid of a few bits we didn’t want, and completely reformatted it in the style we like! And more than that, the process is now automated, so if new content were to be published, it would automatically display on our script.
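If you would like to experiment with the same find-and-strip idea without hitting a live site, the following is a rough sketch only. It assumes PHP’s built-in DOMDocument and DOMXPath classes in place of the Simple HTML DOM helper, and the sample markup and the function name extract_posts are invented for illustration. Note that the XPath test used here only matches a class attribute that is exactly "post".

```php
<?php
// Hypothetical sample markup standing in for a WordPress page.
$sample_html = '<html><body>'
    . '<div class="sidebar">Navigation links</div>'
    . '<div class="post"><div class="post-date">Dec 17</div><p>First post body.</p></div>'
    . '<div class="post"><p>Second post body.</p><div class="meta">Filed under PHP</div></div>'
    . '</body></html>';

// Return the HTML of every <div class="post">, minus any child divs
// whose class is listed in $unwanted_classes.
function extract_posts($html_string, array $unwanted_classes)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html_string);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $posts = array();
    foreach ($xpath->query('//div[@class="post"]') as $post) {
        foreach ($unwanted_classes as $class) {
            // Search inside this post only; an empty result is simply skipped.
            $matches = $xpath->query('.//div[@class="' . $class . '"]', $post);
            foreach (iterator_to_array($matches) as $node) {
                $node->parentNode->removeChild($node);
            }
        }
        // Serialize just this post back to an HTML string.
        $posts[] = $doc->saveHTML($post);
    }
    return $posts;
}

$posts = extract_posts($sample_html, array('post-date', 'post-info', 'meta'));
foreach ($posts as $post_html) {
    echo $post_html . "\n";
}
```

The sidebar never appears in the output because it is not a post, and the date and meta blocks are removed only from the posts that actually contain them.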


That’s only a fraction of the power available to you, though. You can read the full manual online here if you’d like to explore a little more of the Simple HTML DOM helper and how it greatly aids and simplifies the web crawling process. It’s a great way to take your knowledge of basic HTML up to the next dynamic level.

What could you use this for though? Well, let’s say you own lots of websites and wanted to gather all the contents onto a single site. You could copy and paste the contents every time you update each site, or you could just do it all automatically with this script. Personally, even though I may never use it, I found the script to be a useful exercise in understanding the underlying structure of modern internet documents. It also exposes how simple it is to re-use content when everything is published on a similar system using the same semantics.

What do you think? Again, do let me know in the comments if you’d like to learn some more basic web programming, as I feel like I’ve started you off on level 5 and skipped the first 4! Did you follow along and try yourself, or did you find it a little too confusing? Would you like to learn more about some of the other technologies behind the modern internet browsing experience?

If you’d prefer learning to program on the desktop side of things, Bakari covered some great beginner resources for learning Cocoa Mac OSX desktop programming at the start of the year, and our featured directory app CodeFetch is useful for any programming language. Remember, skills you develop programming in any language can be used across the board.



How to Stay Secure Online [Video]

By Adam Dachis, Lifehacker, December 17, 2010 at 10:00AM

In light of recent events, security has been a serious priority for all of us. Although there is no 100% foolproof plan, there are ways to greatly improve your online security and plan for the worst. Here are our recommendations.

The Bad News: Nobody’s Safe

During the summer of my freshman year in high school, I worked at a grocery store as a bag boy and saved up for a laptop. At the end of the summer I was finally able to buy one. Nowadays laptops are commonplace, but this was back when they were pretty rare. I loved it, and I put my life into that computer. A year later I set it up to print in the computer lab and ran into the other room for 30 seconds. When I returned, the laptop was gone. I was amongst people I trusted and gone for under a minute but, still, it didn’t matter: the laptop was gone. I thought I’d somehow get it back, but it didn’t take long to realize that wasn’t going to happen. Ultimately, though, it wasn’t the laptop I wanted back. I quickly realized all my personal information, all my secrets, were in the hands of someone I’ll never find. Someone gained the potential to know the darkest parts of my life and I’ll never know who they are. This experience taught me two things:

  1. No matter how safe you think you might be, something bad can always happen.
  2. The only way to ensure your private information always remains private and in your control is if it never leaves your own head.

The internet and reality aren’t much different in that sense. There is as much risk in the real world as there is on the web, if not more, but we’re just more accustomed to dealing with it. The online world is still very young and so we’re learning to protect ourselves as we go along. Nonetheless, as with anything, there is no surefire protection. The web is imperfect. We are imperfect. Ultimately, no site is un-hackable. A person or group with enough knowledge and determination can bring nearly any site down. That said, we can certainly try our best to protect ourselves and be prepared for worst-case scenarios.

Create Strong, Resilient Passwords

There are several ways to keep remarkably strong passwords, but every strategy has a point of weakness and a level of inconvenience that you’re going to have to accept. We’re going to go over a method that we feel is all-around the best way to go, but include a few variations along the way so you can decide what suits you best.

Create Strong, Secure Passwords that Even You Don’t Know

When it comes to our own individual online security, we put a lot of trust in our password managers. Password managers keep track of your passwords on multiple sites so you never need to remember your password when it’s time to log in. This way you can memorize your one master password and never have to worry about remembering any of the others. This is enormously convenient, but what’s more important is the added security benefits. A good password manager can help generate incomprehensible passwords, store them in its database, and decode them locally, only on your machine, when it needs to enter them into the web site. You can use a password manager to generate a unique, complex password for every site you visit. Each site will have a different password, you’ll have no idea what any of them are, and all you’ll have to do is remember the one master password you set for it.
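To give a feel for what “generating an incomprehensible password” can look like, here is a minimal sketch. It is not how LastPass or any other password manager actually works; it simply assumes PHP 7 or later, where random_int() draws from a cryptographically secure source, and the function name generate_password and the choice of symbols are made up for the example.

```php
<?php
// Build a random password by picking each character independently
// from a fixed alphabet with random_int() (PHP 7+).
function generate_password($length = 20)
{
    // Hypothetical alphabet: letters, digits, and a handful of symbols.
    $alphabet = 'abcdefghijklmnopqrstuvwxyz'
              . 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
              . '0123456789'
              . '!#$%&*+-=?@^_';
    $max_index = strlen($alphabet) - 1;

    $password = '';
    for ($i = 0; $i < $length; $i++) {
        $password .= $alphabet[random_int(0, $max_index)];
    }
    return $password;
}

// Each call produces a different password, e.g. one per site.
echo generate_password(20) . "\n";
echo generate_password(20) . "\n";
```

A real password manager also encrypts and stores the result for you, which is the part that makes a password like this practical to use.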

While there are a number of good password managers out there, like KeePass and 1Password, our favorite is LastPass. LastPass offers incredibly wide support for several operating systems, web browsers, and mobile phones. It’s also completely free, remarkably secure, and comes with many features to help you stay as protected as possible. Since you’re likely not without a few passwords at this point in your life online, LastPass can help you audit and update your passwords to make them more secure.

But what about creating a secure master password?

While all the passwords LastPass (or your other password manager) will generate will be about as strong as they can be, you want to have a strong master password as well. While your password manager can generate one for you, often times it’s going to be too hard to remember and too inconvenient to type (especially on a mobile phone). If you don’t mind the extra work for the extra security, your best bet is to have the most secure password you can have. If you want something you’re sure you won’t forget, Mozilla offers an easy way to create a strong password you’ll be able to remember:

[Video: Mozilla’s strong password public service announcement]

If you’re not in the mood for a cute strong password public service announcement, the concept goes something like this:

  • Pick a phrase you can remember with a number in it, like “A bird in the hand is worth two in the bush.”
  • Change that number (in this case, “two”) to its numerical equivalent: A bird in the hand is worth 2 in the bush
  • Condense the phrase by only using the first letter of each word: Abithiw2itb
  • Add some special characters you can remember: #Abithiw2itb!

Doing this gives you all the characteristics of a good, strong password: lowercase and capital letters, at least one number, special characters, and a combination of those things that basically makes no sense when you look at it and turns out to be longer than eight total characters.
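The steps above can be sketched in a few lines of code. This is only an illustration: the function names are made up, the first two steps (picking the phrase and writing the number as a digit) are still done by hand, and the check at the end covers only the characteristics listed above, not every way a password can be weak.

```php
<?php
// Steps 1-2, done by hand: a memorable phrase with its number written as a digit.
$phrase = 'A bird in the hand is worth 2 in the bush';

// Step 3: keep only the first character of each word (assumes plain ASCII text).
function first_letters($phrase)
{
    $result = '';
    foreach (preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $result .= $word[0];
    }
    return $result;
}

// Step 4: wrap the result in special characters you can remember.
function build_password($phrase, $prefix, $suffix)
{
    return $prefix . first_letters($phrase) . $suffix;
}

// Check for lowercase, uppercase, a digit, a special character,
// and a length of more than eight characters.
function has_strong_shape($password)
{
    return strlen($password) > 8
        && preg_match('/[a-z]/', $password) === 1
        && preg_match('/[A-Z]/', $password) === 1
        && preg_match('/[0-9]/', $password) === 1
        && preg_match('/[^a-zA-Z0-9]/', $password) === 1;
}

$password = build_password($phrase, '#', '!');
echo $password . "\n";                  // prints #Abithiw2itb!
var_dump(has_strong_shape($password));  // prints bool(true)
```

Don’t run your real phrase through a script on a shared machine; the point here is just to show that the recipe is mechanical enough to repeat reliably.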

While we recommend generating complex passwords with your password manager, you can use this same technique to create unique passwords for individual sites. You can take the password and add a suffix specific to each web site. Sticking with our example, let’s say you wanted to use this password for Lifehacker. Just add :L1feh@cker, :Lh, or whatever you’ll be able to remember to the end of the password: #Abithiw2itb!:Lh. This way you can type your complex password as you normally would and just append your abbreviation for the site you’re logging into. This method is a little easier, but it’s not impossible for someone to figure out. Ideally you’ll want to let your password manager handle your password generation for you, but if that’s just not for you then this method is a reasonable alternative.
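As a small sketch of the suffix idea, assuming a lookup table of abbreviations you choose yourself (the table contents and the function name password_for_site are hypothetical):

```php
<?php
// Hypothetical site abbreviations; choose ones you will remember.
$site_tags = array(
    'lifehacker.com' => 'Lh',
    'example.com'    => 'Ex',
);

// Append ":" plus the site's abbreviation to the base password.
function password_for_site($base_password, $site, array $site_tags)
{
    if (!isset($site_tags[$site])) {
        throw new InvalidArgumentException("No abbreviation defined for $site");
    }
    return $base_password . ':' . $site_tags[$site];
}

// The base password here is the article's example, not one to reuse.
echo password_for_site('#Abithiw2itb!', 'lifehacker.com', $site_tags) . "\n"; // prints #Abithiw2itb!:Lh
```

The weakness mentioned above is visible in the code: anyone who sees one of these passwords can guess the pattern for your other sites.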

If at any point you’re not sure about your password’s security, head on over to How Secure Is My Password? to get an approximation of how long it would take to crack using an average desktop computer. Our example (#Abithiw2itb!) would take about seven billion years, which seems pretty good. If you’re satisfied with the password you’ve derived, you’ve got your new master password. If you’re not, keep trying and checking.
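For a rough sense of where estimates like this come from, here is a back-of-the-envelope sketch. It is not the method that site uses and won’t reproduce its figure: it assumes an attacker trying every combination of the character types present, and the guess rate is a made-up number. The function names are hypothetical too.

```php
<?php
// Size of the character pool implied by the kinds of characters used.
function character_pool_size($password)
{
    $size = 0;
    if (preg_match('/[a-z]/', $password)) { $size += 26; }
    if (preg_match('/[A-Z]/', $password)) { $size += 26; }
    if (preg_match('/[0-9]/', $password)) { $size += 10; }
    // 33 = the 95 printable ASCII characters (space included) minus 62 letters and digits.
    if (preg_match('/[^a-zA-Z0-9]/', $password)) { $size += 33; }
    return $size;
}

// Years needed to try every combination of that pool at the given guess rate.
function brute_force_years($password, $guesses_per_second)
{
    $keyspace = pow(character_pool_size($password), strlen($password));
    $seconds  = $keyspace / $guesses_per_second;
    return $seconds / (365.25 * 24 * 3600);
}

// Hypothetical guess rate; real attackers vary by orders of magnitude.
$rate = 1e9;
printf("%.2e years\n", brute_force_years('#Abithiw2itb!', $rate));
printf("%.2e years\n", brute_force_years('password', $rate));
```

Treat numbers like these as an upper bound. An attacker who guesses that your password is built from a well-known phrase has far fewer combinations to try.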

Keep Your Other Information Protected

Your passwords are not the only kind of important information you don’t want floating around the internet, and chances are you have a few gadgets you wouldn’t want to fall into the wrong hands. Fortunately there are quite a few ways you can

Protecting Your Credit Cards

If you shop online, your credit card number has been entered into at least one web site. While this is unavoidable, and just about as safe as using your credit card out in the real world, the fact still remains that your number could be intercepted and used to make unauthorized purchases. One easy way around that problem is using temporary credit card numbers. While not every bank offers this service, if yours does you might want to take advantage of it. If you’re making a purchase online, especially at a site you don’t trust, you just generate a unique credit card number that will expire after its first use. This is also extremely helpful if you sign up for a trial and want to prevent automatic re-billing.

Keeping Your Mobile Technology Secure

There really isn’t any assurance your technology won’t get stolen someday. As previously mentioned, it happened to me in less than a minute. Fortunately there are a number of tools to keep your laptops and mobile phones secure from tampering, or at least initiate a remote data wipe in the event of a breach.

One of our favorite tools is Prey, which is a free tool (for up to three devices) that can help you track and (potentially) recover your stolen laptop or Android smartphone. If you’re looking for a solution for your iOS device, Apple now offers Find My iPhone for free. If you’re not using an iPhone 4, it is still possible to enable the free Find My iPhone, but it’ll take a little bit of extra work. Once you get it up and running, you’ll be able to remotely locate your iPhone, send it a message, and wipe your personal data. To get started, you can download Find My iPhone in the iTunes App Store. Despite the name, it’ll work with any iOS device (but GPS and 3G service certainly help).


That just about wraps it up for our guide to online security. With so many options out there, it’s hard to cover the entire spectrum. If you feel we’ve missed something or have some good tips, please share them in the comments. Thanks for reading, and stay safe!

You can contact Adam Dachis, the author of this post at adachis@lifehacker.com. You can also follow him on Twitter and Facebook.