Translating a website: Tool for downloading hundreds of files and counting words
Thread poster: Rajan Chopra
Rajan Chopra
India
Local time: 05:51
Member (2008)
English to Hindi
Dec 5, 2010

Hi friends,

A translation agency wants me to translate a website that contains dozens of links, and each link leads to many more links and sub-links. If I download the files one by one, it will take a great deal of time. Is there any method to download all the files available on a website (Word files, PDF files, HTML pages, scanned pages, etc.) in a quick and convenient manner?

Secondly, counting the words in MS Word files is no problem, as I can go to Tools and check the word count, but is there any tool to count words in PDF files, HTML pages, etc.? Counting them manually would take a lot of time.

Thanks in advance for your valuable help.

Regards,

Chopra


 
Laurent KRAULAND (X)  Identity Verified
France
Local time: 02:21
French to German
Unprofessional way of dealing Dec 5, 2010

Hi langclinic,
there is just no way anybody (agency or direct client, for that matter) could make me download a whole website, with links, downloadable documents and the like. It is unprofessional, to say the least, and even more so given the structure you describe.

I would insist on manageable work, i.e. the agency contacts the client and asks them to send the content they actually want translated. For you, it will be insurance against doing too little or too much work.

That said, I use Anycount to count the words in a PDF file. But the PDF must be a genuine PDF (e.g. pages created in DTP software or in an office application), not a scanned file. A scanned PDF is just images put into a PDF, so you would have to count those words manually too.

Good luck!


 
Riadh Muslih (X)  Identity Verified
Local time: 17:21
Arabic to English
I concur Dec 5, 2010

Laurent KRAULAND wrote:

Hi langclinic,
there is just no way anybody (agency or direct client for that matter) could make me download a whole website, with links, downloadable documents and the like - it is just unprofessional to say the least; and even more given the structure you describe.



I fully agree with Krauland, not only on the point of professionalism (and perhaps copyright), but also because I will not do the client's work. The client must send me what he/she wants me to translate; I should not have to go fishing for it, with or without pay.


 
jyuan_us  Identity Verified
United States
Local time: 20:21
Member (2005)
English to Chinese
I think the question is still relevant and worth looking into Dec 5, 2010

Suppose you meet a direct client who doesn't have an IT department but just wants you to translate their entire website, and they don't know how to download the files either. In that case, you may have to figure out how to download all the files yourself.

 
Vadim Kadyrov  Identity Verified
Ukraine
Local time: 03:21
English to Russian
I have a piece of advice Dec 5, 2010

1. To download a site, you need Teleport Pro. You just enter the URL, and the program does not go beyond the limits you have set (it is very important not to download external pages, which may be linked from the website you need to download). Downloaded files are stored in a separate folder. Be aware that the program downloads everything (images and whatever else) and stores all these files in that one folder.

2. Fine count is a very powerful tool for counting words in HTML files, PDFs, etc. You just select the folder where your downloaded files are stored, and then select only the HTML files (to add them to the list).

3. You translate the files in TagEditor.

4. You then look through the online version of your translation to find any errors, slips of the pen, etc.
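If no dedicated counter is at hand, the counting in step 2 can be roughly approximated with Python's standard library. This is a minimal sketch, not how Fine count or Anycount work internally. It assumes the downloaded pages are saved as .htm/.html files somewhere under one folder and that a "word" is any whitespace-separated token of visible text; `count_html_words` and `count_folder` are hypothetical helper names. It skips <script> and <style> content but ignores attribute text (e.g. alt text), which a CAT tool may include, and it can split a word that is broken up by inline tags, so expect totals to differ somewhat from commercial counters.

```python
import os
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes that are not inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_html_words(html_text):
    """Count whitespace-separated words in the visible text of one page."""
    parser = VisibleTextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    # Joining chunks with a space is a simplification (see lead-in).
    return len(" ".join(parser.chunks).split())


def count_folder(folder):
    """Return {relative_path: word_count} for every .htm/.html file under folder."""
    counts = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith((".htm", ".html")):  # HTML files only
                path = os.path.join(root, name)
                # Assumes UTF-8; undecodable bytes are replaced, not fatal.
                with open(path, encoding="utf-8", errors="replace") as f:
                    counts[os.path.relpath(path, folder)] = count_html_words(f.read())
    return counts
```

For a hypothetical download folder, `sum(count_folder("downloaded_site").values())` gives a rough total, and the per-file dictionary shows which pages carry most of the text.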

That is all. I have successfully translated and localized several sites using this method. Of course, only small websites can be translated this way; with a large one, you will get lost in the piles of pages, images, etc.

All of that takes time (which means money). And frankly speaking, only fairly small sites, belonging to individuals or small companies, can be processed this way. Large companies will of course never ask a single freelancer to translate a whole website.

[Edited at 2010-12-05 06:52 GMT]


 
Laurent KRAULAND (X)  Identity Verified
France
Local time: 02:21
French to German
Obviously... Dec 5, 2010

jyuan_us wrote:

Suppose you meet a direct client, who don't have an IT department but they just want you to translate their entire website. And they don't know how to download the files either. In this case, you may have to figure out how to download all the files.


but a website does not appear ex nihilo somewhere on the Internet. Someone *must* be in possession of the original files.

It is like the plague some of us are dealing with when handling scanned PDFs - you'd be surprised how fast some clients manage to get the originals when you say that processing scanned PDFs comes at a surcharge of X%.

And how does one download Flash-generated content?


 
Christina Paiva  Identity Verified
Brazil
Local time: 21:21
Portuguese to English
PDF word count Dec 5, 2010

Hi langclinic!

Lots of suggestions on PDF word count here:

http://www.proz.com/forum/dtp_desktop_publishing/131071-tips_for_pdf_translation.html


 
Samuel Murray  Identity Verified
Netherlands
Local time: 02:21
Member (2006)
English to Afrikaans
Three sets of tools Dec 5, 2010

langclinic wrote:
Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?


Yes, you need an "offline browser". I recommend Oleg Chernavin's Web Downloader 2.2 (google for webdown.exe and look on abandonware sites).

Secondly, there is no problem in counting the words in MS Word files as I can go to Tools and ascertain the word count but is there any tool to count words in pdf files, html pages etc as counting them manually will kill a lot of time.


You can try Anycount:
http://www.anycount.com/download.html


 
Emma Goldsmith  Identity Verified
Spain
Local time: 02:21
Member (2004)
Spanish to English
forum topic Dec 5, 2010

Have you read this thread:
http://www.proz.com/forum/software_applications/132076-software_used_to_extract_html_files_from_websites.html
It looks helpful.


 
Joakim Braun  Identity Verified
Sweden
Local time: 02:21
German to Swedish
Original files Dec 5, 2010

"Someone must be in possession of the original files".

Yes, but they may be server-side scripts querying databases and contain no actual HTML at all.

(That still doesn't make it the translator's problem, of course.)

 
FarkasAndras  Identity Verified
Local time: 02:21
English to Hungarian
Some info Dec 5, 2010

As stated by various previous posters, the right way to do this is to get the files (and instructions!) from the IT guy of the company. If the site is not exclusively made up of static HTML pages, you can't possibly translate it by just trying to download the site "from outside".
If you know that it's all static HTML, it's still better to get the files from the webmaster, but it is possible to grab them from the web. The "right" tools for that are HTTrack and wget. I use wget; I believe the command to download (mirror) a site is
wget -m -np -P outputfolder -p http://www.site/address.com
-m: mirror the site; -np: no parent folders (never ascend above the starting folder); -P: specify the name of the output folder; -p: get page requisites such as images
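Both wget's -np and Teleport Pro's "limits" come down to one decision made for every link found on a downloaded page: follow it or not. A minimal Python sketch of that decision, assuming only http/https links on the starting host and inside the starting folder should be followed. `in_scope` and the example.com URLs are hypothetical illustrations; this only approximates wget's behaviour, and wget has further options that widen or narrow its scope.

```python
from urllib.parse import urljoin, urlsplit


def in_scope(start_url, link, base_url=None):
    """Return True if link, resolved against the page it was found on
    (base_url, defaulting to start_url), stays on the start host and
    inside the start URL's folder, roughly like wget's -np."""
    start = urlsplit(start_url)
    target = urlsplit(urljoin(base_url or start_url, link))
    if target.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, ftp:, etc.
    if target.netloc.lower() != start.netloc.lower():
        return False  # external site: never follow
    # Folder of the start URL: everything up to and including the last "/".
    start_dir = start.path[: start.path.rfind("/") + 1] or "/"
    return (target.path or "/").startswith(start_dir)
```

For example, with a start URL of `http://www.example.com/docs/index.html`, the link `guide/intro.html` is followed, while `../about.html` (a parent folder) and `http://other.example.org/docs/x.html` (an external site) are not.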

Word counts shouldn't be an issue with HTML. You should translate HTML in a CAT tool anyway, and your CAT tool will give you a word count.

BTW, both downloading and translating these files take a fair bit of IT knowledge; I'm not sure I would take it on myself without the client's guidance and support.

[Edited at 2010-12-05 12:22 GMT]


 
Jack Doughty  Identity Verified
United Kingdom
Local time: 01:21
Russian to English
In memoriam
Translator's Abacus Dec 5, 2010

I looked at Anycount and wondered if there was anything similar but free. I came across "Translator's Abacus" at http://www.globalrendering.com/download.html and downloaded it. I've tried it and it seems quite useful.

 
ahmadwadan.com  Identity Verified
Saudi Arabia
Local time: 03:21
English to Arabic
Webreaper & Anycount Dec 5, 2010

langclinic wrote:

Hi friends,

Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?


WebReaper 10.0 (Freeware)


Secondly, there is no problem in counting the words in MS Word files as I can go to Tools and ascertain the word count but is there any tool to count words in pdf files, html pages etc as counting them manually will kill a lot of time.


Anycount


 
Samuel Murray  Identity Verified
Netherlands
Local time: 02:21
Member (2006)
English to Afrikaans
Not free for us Dec 5, 2010

Ahmad Wadan wrote:
WebReaper 10.0 (Freeware)


Not free for us (unless you're a volunteer translator):
http://www.webreaper.net/licence.html


 