View Full Version : Search by distance?
I like to use eSyndiCat to list restaurants. Users should be able to search by distance.
I have seen geo databases for sale. Is it possible to use that data for a range search? Or do you have any other ideas in providing such a feature?
cyque
10-20-2006, 05:37 PM
Jerx,
I've never implemented such a feature, but am considering doing so (in the future) for a project that I am working on.
I believe what you need is a zip code database that contains the GPS coordinates. There are algorithms out there that can generate the "straight line" distance between GPS coordinates. So, a user puts in their zip code and you calculate the closest zipcodes by GPS coordinates and then order your results by those zip codes. It is not 100% accurate because it uses "straight line" distance and not actual road mileage, but it is pretty close. It also can't tell which restaurant is closer if they both have the same zip code.
To do this properly and quickly, the script could be pretty advanced (and the database large). I would think you would almost HAVE to precalculate the distance between every single zip code in the database and store it. This way you would have to crunch numbers on thousands of zip codes on each request.
-Cyque
Vincent Wright
10-21-2006, 09:31 AM
I remember writing a script for calculating a distance between two zip codes. But it never came to my mind to cache the results, i.e. precalculate the distance between each two zip codes. Brilliant idea!
The only problem is the size of the database. I'm not sure how many zip codes are there. But assuming there are 50,000 zip codes you have to precalculate 50,000 x 50,000 distances (which itself may be a resource consuming task). This results in 2,500,000,000 distances, or 2,500,000,000 records in the database.
I'm not sure if MySQL is capable of handling such amounts of data QUICKLY.
I think the only way to know this is to test it.
Michele
10-22-2006, 05:55 PM
Wouldn't be as exact a thing as the zip code stuff, but maybe (assuming we are talking US locations) divide each state into several regions and present the results in city order. While this wouldn't be an exact science, you could provide a map of some sort of each state and create a much smaller, albeit a bit less precise system.
If you have ever looked at America's Job Bank (at least in the past) have done something like this.
Please let us know what you come up with as I have a project floating in the back of my mind that could benefit from a similar feature.
cyque
10-25-2006, 04:57 PM
Vincent,
I think later versions of MySQL can handle it, especially if you optimize the database properly. However, I have never had a db that big so I don't know for sure.
However, if you DON'T cache it, you would have to calculate all of those EVERY TIME someone requested a page to ensure that you had the closest one. You could get a little more advanced with your algorithm to speed it up. Zip code prefixes (the first two digits) are state specific, I believe. If you are trying to find things closes to an address in Alabama, you could do AL and the states surrounding it and keep "circling out" until you found the number of nearest results you desire. For instance, look at all the items in alabama, georgia, ten, florida, mississippi to find the nearest ten items. If you have ten, then you are done. If you need more, look at kentucky, louisiana, etc... This could speed it up, but would still not be as elegant as cacheing them.
I would also rather have a more time consuming task run in the background instead of doing it at run-time, slowing the interface. Also, as far as time goes, calculating all of these once would be less time consuming over a long period of time than doing partial calculation every request.
If you cache the zip code data like this:
unqueID (bigint) fromZip (string) toZip (strong) distance (long)
You then have a zip code and want to find the closest zip codes. To find the nearest ten zip codes you simply need a mysql query some like this (not tested):
SELECT * FROM distancecache WHERE ((fromZip=$fzip) OR (toZip=$fzip)) AND ((fromZip=$tzip) OR (toZip=$tzip)) ORDER BY distance ASC LIMIT 0,10
I did the from and to zip "or statements" so that you don't have to store each of these fields twice, cutting the db records in half.
There are approximately 43,000 zip codes in the US. 43K x 43K is still alot of zips. However, you only need to store the distance between each of the zip codes you have a restaurant record for. For instance, if you have 1,000 entries, you would only have to store 1K x 1K listings which comes to 1,000,000. A million is much more realistic than over 2 billion (43K squared).
Vincent Wright
10-26-2006, 09:44 AM
Greetings cyque,
I absolutely LIKE your approach. Indeed, one doesn't have to store all the distances -- only the ones that relate to listings.
This is how it can be done.
We start with an empty Distances table. When we add the first listing we do nothing. When we add the second listing we calculate the distance between the 2nd and the 1st. Third listing: calc the distance between the 3rd and the 2nd, and the 3rd and the 1st, and so on. Sounds a little bit tricky, but will store as many distances as necessary.
vBulletin® v3.7.0, Copyright ©2000-2008, Jelsoft Enterprises Ltd.