New Cron Jobs update: logging added
Vincent Wright
08-09-2006, 11:06 AM
Greetings all cron jobs beta testers,
I have added a logging facility to the new package, so that if you get errors or think the cron jobs are not working for you, I can inspect your log files and find the problem.
I have sent the updated package to the following members:
sdawkings
safatweb
snerd
BryanEx
mariancon
zamollxis
naplesdave
redeye
djbaxter
lsb
gb
billybh
taffmartin
If I missed anybody, let me know.
P.S.
As you might have noticed, we have created a separate sub-forum dedicated entirely to cron jobs. So please create a new thread each time you run into a new problem or have a specific question. The initial "New Cron Jobs" thread has now stretched to 10 pages, so it is sometimes hard to track who needs what, and I end up overlooking some issues.
snerd
08-10-2006, 09:59 AM
Just a note to let you know that the new cron is installed and working okay. The logging is working well too. I'm only using the broken links cron though, so I can't tell you anything about the recip checker.
I'm ready for the pagerank updater cron whenever you get to it. :lol:
Vincent Wright
08-14-2006, 04:54 AM
Glad to hear that the updated cron is working ok too. Hope someone will report on Recip cron performance.
As for PageRank: this seems to be a real pain in the a**. The initial problem is that on some servers PHP calculates the checksum incorrectly (a PHP bug), so the script fails to fetch the PR value. I'm now trying a Perl solution, as Perl doesn't seem to have this checksum problem, but it requires a CGI setup.
Mark Brookes
08-14-2006, 06:53 PM
Hi Vincent:
as requested ... :)
Tests of Reciprocal.php
run from browser - set to check 10 links
* No output to browser screen to indicate any progress
* study of reciprocal.txt indicates testing occurred
run using cron settings for hourly
* email received - nil content to indicate success or problems
* study of reciprocal.txt indicates testing occurred
attached = reciprocal.txt - incl. email content
Vincent Wright
08-15-2006, 04:34 AM
Hmm... I've just looked through the reciprocal log file. It says it has not found a single reciprocal link. Is that true? If it's not, could you give me just a couple of URLs (mentioned in the log file) that you are sure have valid reciprocal links?
Thanks.
Mark Brookes
08-15-2006, 08:41 AM
Hi Vincent
The cron has been running hourly overnight - :yahoo: [success]
The reciprocal.txt shows that by 1 am the checks reset to 0 each hour :yahoo: [success] - I've seen your post about ending checks after one full check, until the next period starts.
The logging files are superbly helpful in understanding what is going on :D
A couple of ideas:
* Checks Header - in my file it looks like:
Total number of links to check: 0
Cron execution time: 0.2529 seconds.
Cron started on Aug 15, 2006 07:00 AM
-------------------------------------
.
.
.
.<check results>
.
.
.
Total number of links to check: 0
Cron execution time: 0.2470 seconds.
Cron started on Aug 15, 2006 08:00 AM
-------------------------------------
I find the header "disappears" in among the check results. May I suggest a revised layout with both top & bottom dividers, such as:-
==============================
Total number of links to check: 0
Cron execution time: 0.2529 seconds.
Cron started on Aug 15, 2006 07:00 AM
==============================
.
.
.
.<check results>
.
.
.
==============================
Total number of links to check: 0
Cron execution time: 0.2470 seconds.
Cron started on Aug 15, 2006 08:00 AM
==============================
or perhaps
###########################
Total number of links to check: 0
Cron execution time: 0.2529 seconds.
Cron started on Aug 15, 2006 07:00 AM
###########################
.
.
.
.<check results>
.
.
.
###########################
Total number of links to check: 0
Cron execution time: 0.2470 seconds.
Cron started on Aug 15, 2006 08:00 AM
###########################
* Is there a maximum file size for the log files? or do they just keep growing?
Regards
Mark
Mark Brookes
08-15-2006, 08:50 AM
[quote=Vincent Wright]
Hmm... I've just looked through the reciprocal log file. It says it has not found a single reciprocal link. Is that true? If it's not, could you give me just a couple of URLs (mentioned in the log file) that you are sure have valid reciprocal links?
Thanks.
[/quote]
Hmmm, Yes,
and looking at the overnight log, ALL report a non-valid reciprocal.
Now, I know I retain a lot of spam link details (from my old links program) as an attempt to stop the spammers submitting duplicate URLs. These are "banned" for display purposes but available for duplicate checking at the time of link suggestion.
Nevertheless - I would be expecting something like 100+ valid reciprocals.
I'll look out some valid links and get back to you.
regards
Mark
Vincent Wright
08-15-2006, 08:52 AM
Thank you, Mark.
Looking forward to your response.
Vincent Wright
08-15-2006, 08:55 AM
Just noticed your post about log files.
Ok, I will format log files as you suggest with == dividers.
As for file size: it just keeps growing. Probably not a good idea... My suggestion is to probably reset it after some number of runs.
Mark Brookes
08-15-2006, 09:55 AM
[quote=Vincent Wright]
Thank you, Mark.
Looking forward to your response.
[/quote]
:kovarstvo2: ready? ..... (:) )
Hi Vincent
Two examples of valid reciprocal links that are reported as INvalid by both the cron check & the manual check.
====================================================================
Link Id 56 is in the no-reciprocal list - but appears to be a valid reciprocal.
ID:56
Wedding Jokes and Humour
http://www.weddinghumour.pwp.blueyonder.co.uk/
category: Wedding Speech Help
Edit > reveals
url = http://www.weddinghumour.pwp.blueyonder.co.uk/
reciprocal url = http://www.weddinghumour.pwp.blueyonder.co.uk/www/speech-writers.htm
Browsing to http://www.weddinghumour.pwp.blueyonder.co.uk/www/speech-writers.htm displays a page where the link to sparklingspeeches is clearly displayed
Manual check of reciprocal => Please check it manually. Reciprocal link seems to be invalid.
====================================================================
====================================================================
Link Id 113 is in the no-reciprocal list - but appears to be a valid reciprocal.
ID:113
Flash Wedding Websites by Weddingorg.com
http://www.weddingorg.com/
category: Wedding Couple's Web Sites
Edit > reveals
url = http://www.weddingorg.com/
reciprocal url = http://www.weddingorg.com/links/weddingspeechesandhumour.htm
Browsing to http://www.weddingorg.com/links/weddingspeechesandhumour.htm displays a page where the link to sparklingspeeches is clearly displayed
Manual check of reciprocal => Please check it manually. Reciprocal link seems to be invalid.
====================================================================
regards
Mark
Mark Brookes
08-15-2006, 10:02 AM
[quote=Vincent Wright]
Just noticed your post about log files.
As for file size: it just keeps growing. Probably not a good idea... My suggestion is to probably reset it after some number of runs.
[/quote]
OK, <brainstorm mode on>
How about ...
* reciprocal.txt keeps on growing until it reaches the start of the 'next' period (when it will start working through the database again)
* Then it renames the current reciprocal.txt as reciprocal-1.txt
and starts a new log reciprocal.txt.
* Then next time it starts a new period:
--- it renames the reciprocal-1.txt as reciprocal-2.txt
--- it renames the current reciprocal.txt as reciprocal-1.txt
--- and starts a new log reciprocal.txt.
<brainstorm mode - forecasts: gentle breeze only>
regards
Mark
Vincent Wright
08-15-2006, 11:21 AM
to Mark Brookes: Concerning valid recip reported as invalid
I seem to have found out why those links were reported as broken. The cause was a flawed regular expression that checked for the presence of an <a> tag with the appropriate href value but ignored the fact that the tag may carry other attributes, such as target. In our case they were target="_blank" and target="_new".
In brief: I have just fixed it and sent the updated reciprocal.php to you. Please give it a try. I recommend resetting last check dates so that the whole cycle starts over again. You can do this by executing this query:
UPDATE `dir_links` SET `last_recip_check_date` = NULL
Of course, don't forget to replace dir_ prefix with yours.
P.S.
I also changed log file formatting as you suggested in a post above.
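The kind of failure described above can be illustrated with a minimal sketch. It is written in Python for brevity, not taken from reciprocal.php, and both function names and both patterns are hypothetical: the first pattern assumes href is the only attribute of the <a> tag, the second tolerates extra attributes such as target on either side of href.

```python
import re


def naive_has_link(html: str, target_url: str) -> bool:
    """Hypothetical strict check: href must be the only attribute of <a>."""
    pattern = r'<a\s+href="' + re.escape(target_url) + r'[^"]*"\s*>'
    return re.search(pattern, html, re.IGNORECASE) is not None


def tolerant_has_link(html: str, target_url: str) -> bool:
    """Hypothetical tolerant check: other attributes may surround href.

    Matches any href value that begins with target_url, in single or
    double quotes, regardless of attribute order.
    """
    pattern = (
        r'<a\b[^>]*?\bhref\s*=\s*["\']'
        + re.escape(target_url)
        + r'[^"\']*["\'][^>]*>'
    )
    return re.search(pattern, html, re.IGNORECASE) is not None


if __name__ == "__main__":
    page = '<p><a href="http://www.example.com/" target="_blank">Example</a></p>'
    print(naive_has_link(page, "http://www.example.com/"))
    print(tolerant_has_link(page, "http://www.example.com/"))
```

A regular expression is still a fragile way to read HTML; an HTML parser would be more robust, but the sketch stays with the regex approach mentioned in the post.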
Vincent Wright
08-15-2006, 11:24 AM
[quote=Mark Brookes]
OK, <brainstorm mode on>
* reciprocal.txt keeps on growing until it reaches the start of the 'next' period (when it will start working through the database again)
[/quote]
Why didn't it occur to me? :wallbash: ;)
[quote=Mark Brookes]
* Then it renames the current reciprocal.txt as reciprocal-1.txt
and starts a new log reciprocal.txt.
* Then next time it starts a new period:
--- it renames the reciprocal-1.txt as reciprocal-2.txt
--- it renames the current reciprocal.txt as reciprocal-1.txt
--- and starts a new log reciprocal.txt.
[/quote]
I have to think on how to implement this.
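One possible shape for the rotation scheme quoted above, as a minimal sketch in Python rather than the package's PHP. The function name rotate_log and the keep parameter (number of archived generations) are assumptions for illustration; only the reciprocal.txt / reciprocal-1.txt / reciprocal-2.txt naming comes from the thread.

```python
import os


def rotate_log(log_path: str, keep: int = 2) -> None:
    """Shift log -> log-1 -> log-2 ..., keeping at most `keep` archives."""
    if keep < 1:
        # No archives wanted: just start a fresh, empty log.
        open(log_path, "w").close()
        return
    base, ext = os.path.splitext(log_path)
    # Drop the oldest archive first so nothing is overwritten by accident.
    oldest = f"{base}-{keep}{ext}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift the remaining archives up by one, highest number first.
    for n in range(keep - 1, 0, -1):
        src = f"{base}-{n}{ext}"
        if os.path.exists(src):
            os.replace(src, f"{base}-{n + 1}{ext}")
    # Archive the current log, then start a new empty one.
    if os.path.exists(log_path):
        os.replace(log_path, f"{base}-1{ext}")
    open(log_path, "w").close()
```

Calling rotate_log("reciprocal.txt") at the start of each new checking period would give the renaming order described in the quoted post; when exactly a period starts is left to the caller.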
Mark Brookes
08-15-2006, 11:39 AM
[quote=Vincent Wright]to Mark Brookes: Concerning valid recip reported as invalid
UPDATE `dir_links` SET `last_recip_check_date` = NULL
Of course, don't forget to replace dir_ prefix with yours.
[/quote]
Hi Vincent: Cron via browser is running ... meanwhile
I tried setting config > cron check interval to 0 days, thinking that this would force a re-test, but it did not ... I wonder why?
So then I came back to the forum and found your posting with the SQL.
* thanks for teaching me how to import/run these yesterday
* browsing table_links I can see them all reset to NULL.
Great Stuff ... I'll get back to you with the reciprocal.txt results.
Mark Brookes
08-15-2006, 11:48 AM
OK,
bad news I'm afraid.
cron tests done - reciprocal.txt attached. All tests = not valid.
I manually tested link ID 56 & 113. They both reported as Not Valid even though we know they are.
This looks like exactly the same result as before. Perhaps I did not upload the new reciprocal.php properly? Can you give me a line number and the code for your latest change, so I can check it against my live version?
(also emailing you)
Vincent Wright
08-15-2006, 12:44 PM
First of all, I added one more link attribute to the log: the reciprocal URL value.
Second, I have checked the updated reciprocal.txt file. It checked only 10 links, and links 56 & 113 are not among them. Could you please re-run it several times so that those links are checked as well?
Third, please log into your admin panel, go to Configuration >> General Configuration and tell me the value of "Reciprocal checking for the following URL". This is exactly the value each partner page is being checked against.
Mark Brookes
08-15-2006, 04:40 PM
Hi Vincent
More log-file ideas:
First I'd like to say the log files are SO useful!!! :applause:
here's some more ideas :)
extract from my log
=====================================
Cron started on Aug 15, 2006 16:00 PM
Cron execution time: 339.7054 seconds.
Total number of links to check: 100
=====================================
URL: http://www.toastmasters.org/
ID: 41
REPLY: Reciprocal URL is not well formed
RECIP: not valid
TIME: 0.0001 seconds to check the link.
[1] In the check header can the log document what is being checked for:
eg:
=====================================
Reciprocal Cron started on Aug 15, 2006 16:00 PM
Reciprocal URL being checked for: [ value from Admin>config>link-checks>field=Reciprocal checking for the following URL]
Cron execution time: 339.7054 seconds.
Total number of links to check: 100
=====================================
[2] in the individual check report, can the log clarify what the "URL" field is.
R-URL: http://www.toastmasters.org/
ID: 41
REPLY: Reciprocal URL is not well formed
RECIP: not valid
TIME: 0.0001 seconds to check the link.
In my reciprocal.txt almost all the URLs look like home pages, not reciprocal pages, so some confirmation that it is really the recip-URL field value that is being checked would remove doubt.
regards
Mark
Vincent Wright
08-16-2006, 06:12 AM
I have just applied these minor changes.
Mark Brookes
08-16-2006, 06:22 AM
[quote=Vincent Wright]
I have just applied these minor changes.
[/quote]
B)
I'll look forward to an updated reciprocal.php ...
I have emailed you some test results .. looks like progress :)
Vincent Wright
08-16-2006, 06:48 AM
Have just sent it.
Mark Brookes
08-16-2006, 08:08 AM
Hi Vincent
[quote=Vincent Wright]
Have just sent it.
[/quote]
Does this refer to the Header/information changes?
(Which I have received & tested. Which BTW I think are very helpful)
OR
Does this refer to the wildcard reciprocal url issue?
(I don't appear to have received this update if you did mean I should have)
regards
Mark
Vincent Wright
08-16-2006, 09:14 AM
Sorry for the confusion -- it refers to the log file header changes.
I'm still testing wildcard checking.
Mark Brookes
09-04-2006, 10:50 AM
Hello Vincent,
[1]
My log file for reciprocal.txt is now 417551 bytes big :)
Any progress on 'archiving' log files each time a new 'cycle' starts (as per messages above)?
How about: in the Admin-panel > configs > cron section asking the administrator how many generations of log files they want to keep?
[2]
The log files now include a header each time the cron runs. This is great.
If a lot of links are checked each time the cron runs, it can be quite a hunt to find the header information.
So, could we have a footer at the end of each run (i.e. the latest would be at the end of the file and so easy to find)?
** This would also have the benefit of confirming that the cron job finished properly!! If my host decided that 326 links were taking too long to check and stopped the process, then when I happened to check the log file I could see for sure that the footer had not been written and conclude that something was wrong. **
E.g. Header would be:
================================================
START
Reciprocal cron started on Sep 04, 2006 06:56 AM
Cron execution time: 538.2845 seconds.
Total number of links to check: 326
Total number of valid links: 70
Total number of non-valid links: 256
Reciprocal URL being checked for: http://www.sparklingspeeches.*
================================================
NON-VALID LINKS (256)
---------------------
etc....
[3,200 rows of report ]
Footer would be:
================================================
END
Reciprocal cron ended on Sep 04, 2006 06:56 AM
Cron execution time: 538.2845 seconds.
Total number of links to check: 326
Total number of valid links: 70
Total number of non-valid links: 256
Reciprocal URL being checked for: http://www.sparklingspeeches.*
================================================
For administrators who do not have a lot of links, perhaps they should be able to select to enable the footer reporting only if link check number is greater than ???
e.g
CRON LOG SETTINGS:
cron logging ............................................................enabled/disabled
Include Footer info. if number of links > ........................_______
Number of backup/archive log files .............................._______
what do you think?
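The footer idea above also lends itself to an automated check. The following is a minimal sketch in Python, assuming the proposed layout in which each run writes a line containing only START in its header and a line containing only END in its footer. Neither marker exists in the current log format, and the function name is hypothetical.

```python
def last_run_completed(log_text: str) -> bool:
    """Return True if the most recent START marker is followed by an END marker."""
    last_start = None
    last_end = None
    for index, line in enumerate(log_text.splitlines()):
        marker = line.strip()
        if marker == "START":
            last_start = index
        elif marker == "END":
            last_end = index
    if last_start is None:
        # No run has been logged at all.
        return False
    return last_end is not None and last_end > last_start
```

If the host kills the process part-way through a run, the last START has no END after it and the function returns False, which is the condition the proposed footer is meant to expose.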
Vincent Wright
09-06-2006, 07:04 AM
I'm not sure...
Initially I introduced log files to debug the cron jobs. In case of errors I could easily see the problem.
Probably in production version we should get rid of them.
Mark Brookes
09-06-2006, 08:19 AM
[quote=Vincent Wright]
I'm not sure...
Initially I introduced log files to debug the cron jobs. In case of errors I could easily see the problem.
Probably in production version we should get rid of them.
[/quote]
Hi Vincent,
I am surprised.
Despite your excellent and superb PDF instructions ;) I assume that some new users will run into difficulties with setup and configuration. If so, I think the log files would be very useful in sorting out the problem.
Still, you are the boss B)
Vincent Wright
09-06-2006, 08:58 AM
I'm surprised as well.
It proves once again that PEOPLE DO NOT READ MANUALS.