This article applies only to eSyndiCat versions Pro 1.2.x and Free 1.5.
In this article I would like to answer the following questions that eSyndiCat directory script users often ask about cron jobs:
1. What are cron jobs?
2. Why are they used in eSyndiCat?
3. How do I set up cron jobs?
So, what the heck is a cron job? Simply put, a cron job is an automated process that runs at predefined time intervals.
This is how it works: you tell the operating system which program or script to execute and when, and the operating system takes care of the rest.
This is a very simplified explanation, but if you want to learn more about how Unix-like systems perform this task, please read the Wikipedia article on cron.
The keyword "automated" in the definition above answers the second question. Let me explain what I mean.
You can set up eSyndiCat so that it performs the following tasks: checks for broken links, checks whether reciprocal links are valid, updates PageRank values, and detects expired paid submissions.
Cron jobs let you automate these tasks. Without cron jobs you would have to do all of this manually.
Cron jobs are a great time saver, and that's why they are used in eSyndiCat!
In other words, you set up cron jobs to perform all the aforementioned checks automatically. You set it up once and forget about it. It's that simple.
Well, probably not that simple, since you still have to learn how to set up cron jobs properly ...
... which I will be talking about in the rest of the article.
Well, we have implemented a new cron job package that you have to download and install.
1. Log into eSyndiCat Customer Area.
2. Go to Downloads » Mods & Patches.
3. If you are using the Pro version (1.2 or 2.0), click the "Cron Jobs Pro" link; otherwise click the "Cron Jobs Free" link. The download will start shortly.
4. If you are using the Pro version (2.1.x), you don't need to download anything. All the necessary files are included in that version's package.
The downloaded package contains the following files:
readme.txt
/cgi-bin/checksum.cgi
/cron/check.php
/cron/header.inc.php
/cron/install.php
/cron/path.php
/cron/php-bug.php
If you already have a non-empty cron/ subdirectory in your eSyndiCat folder, please remove all the files in it. Otherwise, create a new cron/ subdirectory.
Upload the following files from the package into the cron/ subdirectory:
/cron/check.php
/cron/header.inc.php
/cron/install.php
/cron/path.php
/cron/php-bug.php
Run install.php by typing a URL similar to this in your browser's address bar:
http://www.mydomain.com/dir/cron/install.php
This will run the installation script. This script executes several MySQL queries that slightly modify the database.
Replace http://www.mydomain.com/dir/ with the URL of your own eSyndiCat installation.
At this point the installation is almost done.
Yes, I said "almost" because if you want the script to update PageRank values properly you will have to follow the instructions outlined in the next section.
On the other hand, if you don't display PageRank and don't care about it just ignore the section below.
To start with let me describe the process of fetching PageRank values and explain the potential problem. After that I will show you the workaround.
So, the script is given a URL for which it should return a valid PageRank value.
First of all, it calculates a so-called checksum for the URL. The checksum is something required by Google along with a URL in order for it to return the PageRank value.
Once the checksum is calculated, the script connects to Google and passes it the checksum and the URL. In return, Google gives the PageRank value back to the script.
The problem is that if the script passes an invalid checksum, Google returns an error instead of the PageRank.
And that's exactly what happens on some servers. Some versions of PHP have a bug that prevents the script from calculating correct checksums.
This has been discussed on the GoogleCommunity forum, so you can refer to it for more details.
Now you have to find out whether your PHP version has this bug or not.
In your address bar type a URL similar to this:
http://www.mydomain.com/dir/cron/php-bug.php
If the script reports that your PHP version is ok then you can safely ignore the rest of this section and jump straight to setting up cron jobs in your CPanel.
Otherwise, please read on ...
Now that you know that your PHP version is buggy you have to do something about it.
The workaround is to install an alternative Perl CGI script for calculating checksums. eSyndiCat will use it instead of the built-in PHP function. This Perl script comes with the package you have just downloaded.
To install it follow these steps:
1. Copy /cgi-bin/checksum.cgi from the package to the /cgi-bin/ folder on your server.
2. Go to the /cgi-bin/ folder on your server and issue the following command:
chmod 755 checksum.cgi
3. Now you can check if the script is working properly. Open a URL similar to this:
http://www.mydomain.com/cgi-bin/checksum.cgi?url=www.google.com
If you see the number 63385224020 then the script is ok. Otherwise please submit a ticket to support at esyndicat dot com.
If everything went smoothly, congratulations: you have successfully applied the workaround and are ready to read the next section.
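The check from step 3 can also be run from a shell instead of a browser. The sketch below is only an illustration: the helper name checksum_ok is made up for this example and is not part of the eSyndiCat package, it assumes that checksum.cgi prints nothing but the number, and the usage lines assume curl is installed on the machine you run them from.

```shell
# Hypothetical helper (not part of the eSyndiCat package): succeeds when the
# text returned by checksum.cgi is the value expected for www.google.com.
checksum_ok() {
    expected=63385224020
    # Drop spaces and line breaks the web server may add around the number.
    actual=$(printf '%s' "$1" | tr -d '[:space:]')
    [ "$actual" = "$expected" ]
}

# Usage (replace the domain with your own):
#   response=$(curl -s "http://www.mydomain.com/cgi-bin/checksum.cgi?url=www.google.com")
#   if checksum_ok "$response"; then echo "checksum.cgi is ok"; else echo "unexpected: $response"; fi
```

If the helper reports an unexpected response, the printed text is worth including in your support ticket.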
In order to put it all together you have to complete one final step: set up a cron job in your control panel.
1. Log into CPanel. Find the Cron Jobs icon/link. Click it.
2. Click the Standard button.
3. Put a valid email address into the email field (the very first edit box on the Standard Cron Manage page).
If the cron script produces any unexpected errors, they will all be sent to this address.
4. Put a command into the "Command to run" box.
A typical command looks like this:
path_to_php -f full_path_to_script
Here path_to_php is the full path to the PHP executable. Usually this is /usr/local/bin/php, but it may differ on your server, so it is best to check with your hosting provider.
full_path_to_script is the full filesystem path to the cron/check.php file.
I have included an auxiliary script that will let you quickly find out what the full path to the cron script on your server is. Please type a URL like this:
http://www.mydomain.com/dir/cron/path.php
And you will see the full path.
5. Select a time interval. As we all know, a good picture is worth a thousand words, so below are three examples of different time intervals.
Run every 15 minutes.
Run every hour.
Run once per day.
Usually running the script every hour would suffice, but this may vary depending on the number of links in your directory: more on this in the following section.
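For readers who edit the crontab directly rather than through the Standard form, the three intervals above correspond roughly to the schedule fields printed by the sketch below. This is an illustration under assumptions: cron_line is a helper invented for this example, the script path is a placeholder (use the one reported by path.php), and the */15 step syntax is supported by the cron implementations found on most Linux hosts but not necessarily by every cron.

```shell
# Print a crontab line: five schedule fields followed by the command.
# cron_line is a made-up helper; both paths below are placeholders.
cron_line() {
    schedule=$1
    php_bin=$2
    script=$3
    printf '%s %s -f %s\n' "$schedule" "$php_bin" "$script"
}

PHP_BIN=/usr/local/bin/php
SCRIPT=/home/username/public_html/dir/cron/check.php   # hypothetical path

cron_line '*/15 * * * *' "$PHP_BIN" "$SCRIPT"   # every 15 minutes
cron_line '0 * * * *'    "$PHP_BIN" "$SCRIPT"   # every hour, on the hour
cron_line '0 0 * * *'    "$PHP_BIN" "$SCRIPT"   # once per day, at midnight
```

Only one of the three lines should end up in your crontab; pick the one that matches the interval you want.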
I've been thinking for about an hour (!) on how to describe the concept (used by the new cron package) I'm going to introduce here.
And I seem to have found the solution: I will start with an example.
So, here we go ...
... imagine you have 10,000 links and all of them should be checked within one week (7 days). By checking I mean that for each link the script must perform at least one of the following tasks (depending on the settings in your admin panel):
1. Determine if the link is broken or not.
2. Check if the reciprocal URL is valid or not.
3. Update PageRank of the link.
Imagine also that it takes 1 second to check one link (in practice it may take much longer). Thus it takes 10,000 seconds or about 3 hours for the script to check all the links.
It means that for 3 hours your server will be very busy checking the bunch of links at once, without interruptions. This will probably slow the server down to a crawl so that it will take forever for a visitor to open this or that page.
Now imagine another scenario.
The script checks only a small chunk of links every hour. Say, 100 links per hour. In this case the server will only be busy for less than 2 minutes each hour.
And still it will be able to check all the links within a week. Actually, in this example the script will be able to check up to 16,800 links:
100 links per hour × 24 hours per day × 7 days per week = 16,800 links
I think that the second approach is much better since it distributes the load over the whole week.
And this approach has been implemented in the new cron package.
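The arithmetic behind the two scenarios can be reproduced with a few lines of shell. This is just a sketch of the example above; the function names are invented for illustration, and the 1 second per link figure is the same assumption used in the text.

```shell
# Seconds the server stays busy while checking a batch of links.
busy_seconds() {   # args: number_of_links seconds_per_link
    echo $(( $1 * $2 ))
}

# Links that can be checked in one week of regular runs.
weekly_capacity() {   # args: links_per_run runs_per_day
    echo $(( $1 * $2 * 7 ))
}

busy_seconds 10000 1     # all links at once: 10000 seconds, close to 3 hours
busy_seconds 100 1       # one hourly chunk: 100 seconds, under 2 minutes
weekly_capacity 100 24   # 16800 links per week
```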
You are now ready to grasp the concept of cycles.
The major cycle is the period within which all the links in your database should be checked by the cron. In our example it is one week or 7 days. You can adjust this value in your admin panel.
The minor cycle is the time interval between two consecutive executions of the cron script. You set this interval in your CPanel as described in the previous section: 15 minutes, 1 hour, 1 day, or whatever.
Each execution of the script checks only a small, fixed number of links. In our example it is 100 links. You can also adjust this number in your admin panel.
As you might have guessed, one major cycle includes as many minor cycles as necessary to check all the links in the database.
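To make the relationship between the two cycles concrete, here is a small sketch that counts minor cycles. The function names are invented for this example, and the calculation is only the planning arithmetic implied by the text, not code taken from the cron package.

```shell
# Minor cycles (script runs) needed to cover every link once; rounds up.
runs_needed() {   # args: total_links links_per_run
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Minor cycles that fit into one major cycle.
runs_available() {   # args: major_cycle_days minor_cycle_minutes
    echo $(( $1 * 24 * 60 / $2 ))
}

runs_needed 10000 100   # 100 runs
runs_available 7 60     # 168 runs in a week of hourly executions
```

As long as the number of runs needed does not exceed the number available, every link gets checked within the major cycle.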
Along with the new concept the package introduces a couple of new settings. If you go to Admin Panel » Configuration » Cronjob Configuration you will see this in Free 1.5/Pro 1.2:
... or this in Pro 2.0:
New settings are highlighted in red.
"Number of links to check in one run" controls how many links are checked within one execution of the cron script (or within one minor cycle).
"Check interval (in days)" is the length of one major cycle.
The length of one minor cycle is set up in CPanel on the Cron Manage page.
By adjusting these three values (the number of links, the length of the minor cycle, and the length of the major cycle) you can balance the load on the server.
For the majority of directories the following setup is recommended: number of links - 100, minor cycle - 1 hour, major cycle - 7 days.
If you have a really large directory with lots of links, you will have to either increase the major cycle (say, to 30 days) or increase the number of links (say, to 200 - 300 instead of 100). Or do both.
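When tuning these values, it can help to work out the smallest number of links per run that still covers the whole directory within one major cycle. The sketch below does that; min_links_per_run is a name invented for this example, and the 50,000-link directory is a hypothetical figure used only to illustrate a larger site.

```shell
# Smallest "links per run" value that still covers the whole directory
# within one major cycle; rounds up.
min_links_per_run() {   # args: total_links major_cycle_days minor_cycle_minutes
    runs=$(( $2 * 24 * 60 / $3 ))
    echo $(( ($1 + runs - 1) / runs ))
}

min_links_per_run 10000 7 60    # 60
min_links_per_run 50000 30 60   # 70
```

Both results are below the recommended 100 links per run, so the recommended setup leaves some headroom in these two cases.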
Now that you have successfully installed the cron job script you might want to get rid of some of the [now] unnecessary files.
They are:
cron/install.php
cron/path.php
cron/php-bug.php
Just remove these files from your server and you are done.
Now that the cron script is set up you can forget about checking broken and reciprocal links, and updating PageRank -- the script will do it for you regularly and automatically.
Phew! This has been a long and, hopefully, useful article. But if you still have questions concerning cron jobs feel free to ask.
--
Best regards,
Vincent Wright