Merge into chm2pdf_branch : chm2pdf_branch : Code : chm2pdf

Reviewer	Review Type	Date Requested	Status
Reto Knaak		2012-10-07	Pending
Review via email: mp+128385@code.staging.launchpad.net

Revision history for this message

Reto Knaak (reto-knaak) wrote on 2012-10-15:

#

Hi Grishkin (Max?) !

Thank you for the files... I'm not a real programmer (just tried to fix
some issues that where avoiding me to use the script) so I don't know if I
am the right person to make the code review.
It makes now a lot of monts I didn't boot up my virtual box with ubuntu and
I have the feeling I forgot most of what I learned trying to fix the script.

Anyway, this evening I had some time and began to download the files - just
to see what's going on.

My operating system is Win7, and if I open the CHM file from windows it
won't open, probably doe to the missinf toc!
Then I tryied to import it to calibre, and there if I open the CHM
something is displayed but it the "Liberty Bay" article from the online(!)
wikipedia.
If I convert the file to some other format (mobi), I get a page with "1951
Chicago Bears season" which seems me the right output.
So I'm not sure the demo chm file has a valid output, but I agree that it's
a good idea to try to extract what's there.

I'm not familiar with the code review process, and I am asking myself if
i/we ahould open a bug under ubuntu (that is what I did with the bugs i
found previously)?

I gave a quick glance at the diff and most are differences that are not
really there (probably some spaces), and the only true differences are:
in def get_html_list(cfile) and def get_objective_urls_list(filename).

For the first one, it's the first time I see "lambda" (so again, probably
I'm not the right one to review...).
I think I understood what it's meant for but I can't say I understand how
it works.... (if first way to retrieve the html files fail, use the second
one using all files found or something similar?)

For the second one, "my local chm2pdf" is like this:

*def get_objective_urls_list(filename):
    '''
    takes the list of files inside the chm archive, with the correct urls
of each one.
    '''
    os.system('enum_chmLib '+filename+' >
"'+CHM2PDF_WORK_DIR+'/urlslist.txt"')
    flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','rU')
    urls_list=[]
    for line in flist.readlines()[3:]:
        #print 'line',line
        #This won't work if internal paths of CHM contains spaces: e.g.
/doc space/ will only become /doc
        #spline=line.split()
        #urls_list.append(spline[5])
        #this should work better:
        spline= re.sub(r".*?normal file\s*(.*?)\n$", "\\1", line)
        if spline[0]=="/":
          #print "got spline="+spline
          urls_list.append( spline)
    flist.close()
    # os.remove(CHM2PDF_WORK_DIR+'/urlslist.txt')
    return urls_list
*
Does your solution work with chm paths containing spaces? (If you need a
sample file see
https://bugs.launchpad.net/ubuntu/+source/chm2pdf/+bug/894193 )
I have the feeling (not really run any scripts this evening and forgetting
pyton) that using urls_list.append(spline[5]) will fail in case of paths
with spaces!
I have also the feeling that my solution is not really state of the art, so
maybe you can suggest something that solves both problems?

Hope to hear you soon and Kind regards from the italian part of Switzerland!

Ciao
Reto Knaak

On Sun, Oct 7, 2012 at 5:23 PM, Grishkin <email address hidden> wrote:

> Grishkin has propose...

Hi Grishkin (Max?) !

Thank you for the files... I'm not a real programmer (just tried to fix
some issues that where avoiding me to use the script) so I don't know if I
am the right person to make the code review.
It makes now a lot of monts I didn't boot up my virtual box with ubuntu and
I have the feeling I forgot most of what I learned trying to fix the script.

Anyway, this evening I had some time and began to download the files - just
to see what's going on.

My operating system is Win7, and if I open the CHM file from windows it
won't open, probably doe to the missinf toc!
Then I tryied to import it to calibre, and there if I open the CHM
something is displayed but it the "Liberty Bay" article from the online(!)
wikipedia.
If I convert the file to some other format (mobi), I get a page with "1951
Chicago Bears season" which seems me the right output.
So I'm not sure the demo chm file has a valid output, but I agree that it's
a good idea to try to extract what's there.

I'm not familiar with the code review process, and I am asking myself if
i/we ahould open a bug under ubuntu (that is what I did with the bugs i
found previously)?

I gave a quick glance at the diff and most are differences that are not
really there (probably some spaces), and the only true differences are:
in def get_html_list(cfile) and def get_objective_urls_list(filename).

For the first one, it's the first time I see  "lambda" (so again, probably
I'm not the right one to review...).
I think I understood what it's meant for but I can't say I understand how
it works.... (if first way to retrieve the html files fail, use the second
one using all files found or something similar?)

For the second one, "my local chm2pdf" is like this:

*def get_objective_urls_list(filename):
    '''
    takes the list of files inside the chm archive, with the correct urls
of each one.
    '''
    os.system('enum_chmLib '+filename+' >
"'+CHM2PDF_WORK_DIR+'/urlslist.txt"')
    flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','rU')
    urls_list=[]
    for line in flist.readlines()[3:]:
        #print 'line',line
        #This won't work if internal paths of CHM contains spaces: e.g.
/doc space/ will only become /doc
        #spline=line.split()
        #urls_list.append(spline[5])
        #this should work better:
        spline= re.sub(r".*?normal file\s*(.*?)\n$", "\\1", line)
        if spline[0]=="/":
          #print "got spline="+spline
          urls_list.append( spline)
    flist.close()
    # os.remove(CHM2PDF_WORK_DIR+'/urlslist.txt')
    return urls_list
*
Does your solution work with chm paths containing spaces? (If you need a
sample file see
https://bugs.launchpad.net/ubuntu/+source/chm2pdf/+bug/894193 )
I have the feeling (not really run any scripts this evening and forgetting
pyton) that using  urls_list.append(spline[5]) will fail in case of paths
with spaces!
I have also the feeling that my solution is not really state of the art, so
maybe you can suggest something that solves both problems?

Hope to hear you soon and Kind regards from the italian part of Switzerland!

Ciao
Reto Knaak

On Sun, Oct 7, 2012 at 5:23 PM, Grishkin <MGrishkin@gmail.com> wrote:

> Grishkin has proposed merging lp:~grishkin/chm2pdf/chm2pdf_branch into
> lp:~reto-knaak/chm2pdf/chm2pdf_branch.
>
> Requested reviews:
>   Reto Knaak (reto-knaak)
>
> For more details, see:
> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
>
> Mainly fix from
> https://groups.google.com/d/topic/chm2pdf/SeOGMcMFsBw/discussion.
> --
> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
> You are requested to review the proposed merge of
> lp:~grishkin/chm2pdf/chm2pdf_branch into
> lp:~reto-knaak/chm2pdf/chm2pdf_branch.
>

Reply

lp://staging/~grishkin/chm2pdf/chm2pdf_branch updated on 2012-10-19

14. By Grishkin <grishkin@mint> on 2012-10-19: Merged Reto's pathes, fixed problems with spaces

Revision history for this message

Max Grishkin (grishkin) wrote on 2012-10-19:

#

Download full text (6.2 KiB)

Hello, Reto, and thanks for constructive response!

I had been watching activity in chm2pdf google groups and on Launchpad for
a while and understand that you are just an ordinary user of chm2pdf, not a
maintainer or author of software. But I see that chm2pdf was published
quite a lot of time ago and until now there were no any bugfixes or
improvements, so the project may be considered abandoned. Some time ago you
were most active in project discussion and you own a branch of it on LP.
When I found this branch, I've decided to upload my chm2pdf version on
Launchpad too, just to make it public. And my point is that our branches
should be synchronized and both should have the latest chm2pdf version with
all available fixes.

Concerning that chm file from Google Group topic. It is certainly very
dirty and someone may argue that chm2pdf should not process such files
correctly. But it was created just to demonstrate the type of files, on
which chm2pdf failed before, and now it generates pdf's with them. So I've
just put two completely random Wikipedia articles into one chm file, they
even do not link to each other, that's why some software shows one page and
other software completely another page. Anyway, the resulting pdf contains
both pages.

I've downloaded your patch for spaces in names, but it appeared that I did
not understand to which version apply it - the one from distribution, from
code.google.com or from you branch? I've tried to merge it to different
versions by hand, but the resulting files still failed on chm file from
your Demo_CMH.zip. So I've just merged your patch to my branch and fixed
the rest so that conversion started to work for me - mainly added quotes
around filenames when needed. Seems that no-table-of-conents and
spaces-in-filenames fixes perfectly work together! I did not try to solve
problem with '%20' symbols, but I'll think about it shortly and this does
not seem a difficult problem.

Reto, please also note, that since you reply through @code.launchpad.net,
you reply will be publicly available at
https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385.
That's definitely not a problem, just pointing this out in case you have
not noticed.
-----
Best regards,
Grishkin Maxim

2012/10/16 Reto Knaak <email address hidden>:
> Hi Grishkin (Max?) !
>
> Thank you for the files... I'm not a real programmer (just tried to fix
> some issues that where avoiding me to use the script) so I don't know if I
> am the right person to make the code review.
> It makes now a lot of monts I didn't boot up my virtual box with ubuntu
and
> I have the feeling I forgot most of what I learned trying to fix the
script.
>
> Anyway, this evening I had some time and began to download the files -
just
> to see what's going on.
>
> My operating system is Win7, and if I open the CHM file from windows it
> won't open, probably doe to the missinf toc!
> Then I tryied to import it to calibre, and there if I open the CHM
> something is displayed but it the "Liberty Bay" article from the online(!)
> wikipedia.
> If I convert the file to some other format (mobi), I get a page with "1951
> Chicago Bears season" which seems me the right outpu...

Hello, Reto, and thanks for constructive response!

I had been watching activity in chm2pdf google groups and on Launchpad for
a while and understand that you are just an ordinary user of chm2pdf, not a
maintainer or author of software. But I see that chm2pdf was published
quite a lot of time ago and until now there were no any bugfixes or
improvements, so the project may be considered abandoned. Some time ago you
were most active in project discussion and you own a branch of it on LP.
When I found this branch, I've decided to upload my chm2pdf version on
Launchpad too, just to make it public. And my point is that our branches
should be synchronized and both should have the latest chm2pdf version with
all available fixes.

Concerning that chm file from Google Group topic. It is certainly very
dirty and someone may argue that chm2pdf should not process such files
correctly. But it was created just to demonstrate the type of files, on
which chm2pdf failed before, and now it generates pdf's with them. So I've
just put two completely random Wikipedia articles into one chm file, they
even do not link to each other, that's why some software shows one page and
other software completely another page. Anyway, the resulting pdf contains
both pages.

I've downloaded your patch for spaces in names, but it appeared that I did
not understand to which version apply it - the one from distribution, from
code.google.com or from you branch? I've tried to merge it to different
versions by hand, but the resulting files still failed on chm file from
your Demo_CMH.zip. So I've just merged your patch to my branch and fixed
the rest so that conversion started to work for me - mainly added quotes
around filenames when needed. Seems that no-table-of-conents and
spaces-in-filenames fixes perfectly work together! I did not try to solve
problem with '%20' symbols, but I'll think about it shortly and this does
not seem a difficult problem.

Reto, please also note, that since you reply through @code.launchpad.net,
you reply will be publicly available at
https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385.
That's definitely not a problem, just pointing this out in case you have
not noticed.
-----
Best regards,
Grishkin Maxim

2012/10/16 Reto Knaak <reto.knaak@gmail.com>:
> Hi Grishkin (Max?) !
>
> Thank you for the files... I'm not a real programmer (just tried to fix
> some issues that where avoiding me to use the script) so I don't know if I
> am the right person to make the code review.
> It makes now a lot of monts I didn't boot up my virtual box with ubuntu
and
> I have the feeling I forgot most of what I learned trying to fix the
script.
>
> Anyway, this evening I had some time and began to download the files -
just
> to see what's going on.
>
> My operating system is Win7, and if I open the CHM file from windows it
> won't open, probably doe to the missinf toc!
> Then I tryied to import it to calibre, and there if I open the CHM
> something is displayed but it the "Liberty Bay" article from the online(!)
> wikipedia.
> If I convert the file to some other format (mobi), I get a page with "1951
> Chicago Bears season" which seems me the right output.
> So I'm not sure the demo chm file has a valid output, but I agree that
it's
> a good idea to try to extract what's there.
>
> I'm not familiar with the code review process, and I am asking myself if
> i/we ahould open a bug under ubuntu (that is what I did with the bugs i
> found previously)?
>
> I gave a quick glance at the diff and most are differences that are not
> really there (probably some spaces), and the only true differences are:
> in def get_html_list(cfile) and def get_objective_urls_list(filename).
>
> For the first one, it's the first time I see  "lambda" (so again, probably
> I'm not the right one to review...).
> I think I understood what it's meant for but I can't say I understand how
> it works.... (if first way to retrieve the html files fail, use the second
> one using all files found or something similar?)
>
> For the second one, "my local chm2pdf" is like this:
>
> *def get_objective_urls_list(filename):
>     '''
>     takes the list of files inside the chm archive, with the correct urls
> of each one.
>     '''
>     os.system('enum_chmLib '+filename+' >
> "'+CHM2PDF_WORK_DIR+'/urlslist.txt"')
>     flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','rU')
>     urls_list=[]
>     for line in flist.readlines()[3:]:
>         #print 'line',line
>         #This won't work if internal paths of CHM contains spaces: e.g.
> /doc space/ will only become /doc
>         #spline=line.split()
>         #urls_list.append(spline[5])
>         #this should work better:
>         spline= re.sub(r".*?normal file\s*(.*?)\n$", "\\1", line)
>         if spline[0]=="/":
>           #print "got spline="+spline
>           urls_list.append( spline)
>     flist.close()
>     # os.remove(CHM2PDF_WORK_DIR+'/urlslist.txt')
>     return urls_list
> *
> Does your solution work with chm paths containing spaces? (If you need a
> sample file see
> https://bugs.launchpad.net/ubuntu/+source/chm2pdf/+bug/894193 )
> I have the feeling (not really run any scripts this evening and forgetting
> pyton) that using  urls_list.append(spline[5]) will fail in case of paths
> with spaces!
> I have also the feeling that my solution is not really state of the art,
so
> maybe you can suggest something that solves both problems?
>
> Hope to hear you soon and Kind regards from the italian part of
Switzerland!
>
> Ciao
> Reto Knaak
>
>
> On Sun, Oct 7, 2012 at 5:23 PM, Grishkin <MGrishkin@gmail.com> wrote:
>
>> Grishkin has proposed merging lp:~grishkin/chm2pdf/chm2pdf_branch into
>> lp:~reto-knaak/chm2pdf/chm2pdf_branch.
>>
>> Requested reviews:
>>   Reto Knaak (reto-knaak)
>>
>> For more details, see:
>> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
>>
>> Mainly fix from
>> https://groups.google.com/d/topic/chm2pdf/SeOGMcMFsBw/discussion.
>> --
>> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
>> You are requested to review the proposed merge of
>> lp:~grishkin/chm2pdf/chm2pdf_branch into
>> lp:~reto-knaak/chm2pdf/chm2pdf_branch.
>>
>
> --
> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
> You are the owner of lp:~grishkin/chm2pdf/chm2pdf_branch.

Reply

Revision history for this message

Reto Knaak (reto-knaak) wrote on 2012-10-23:

#

Hi Max!

You're welcome and thank you!
It's true that the project is abandoned... for me a pity because it does a
wonderful job for me!
Thank you for pointing out to me my replies are public, this is ok for me
(I guessed it).

I think it's a good idea to share your patch, and i suggest also to add the
bug officially (with a link to the google code bug page).
I also agree to merge the branches, but I'll need some help...

In the mean time I updated my ubuntu system and tried your chm eample file,
and as expected an error occured....

I started from the ubuntu version, and made my brach using this sequence:

patch chm2pdf < ../patches/chm2pdf_check_soup.diff
patch chm2pdf < ../patches/chm2pdf_no_javascript.diff
patch chm2pdf < ../patches/chm2pdf_multiple_page_problem.diff
patch chm2pdf < ../patches/chm2pdf_color_removed.diff
patch chm2pdf < ../patches/chm2pdf_links_case_insensitive.diff
patch chm2pdf < ../patches/chm2pdf_images_case_insensitive.diff
patch chm2pdf < ../patches/chm2pdf_specialchars.diff

Probably you need to appy all the patches.

Maybe it's also a good idea if you give a critical look at my patches,
chm2pdf is my first and only experience with python (and linux) and some of
my solutions may not be too clean...

Kind regards
Reto

Reply

lp://staging/~grishkin/chm2pdf/chm2pdf_branch updated on 2012-11-04

15. By Grishkin <grishkin@mint> on 2012-10-27: merged several other Reto's patches
16. By Grishkin <grishkin@mint> on 2012-11-04: Fixed processing of links with spaces

Revision history for this message

Max Grishkin (grishkin) wrote on 2012-11-04:

#

Hi Reto!
I've reviewed your branch at
https://code.launchpad.net/~reto-knaak/chm2pdf/chm2pdf_branch and it seems
that it does not contain any of patches you mentioned, just changes in
changelog file. So I've merged these patches to my branch manually. I did
not test them deeply still and I am not especially confident in correctness
of merge of the latter two patches.
Seems that I've found the solution to the problem you've described at
https://bugs.launchpad.net/ubuntu/+source/chm2pdf/+bug/894193 - check the
latest comment there.
To merge the branches you'll have to install bzr somewhere on your local PC
and download your branch (if it is not already present there) by running:
bzr branch lp:~reto-knaak/chm2pdf/chm2pdf_branch
After that you merge branches by running:
bzr merge lp:~grishkin/chm2pdf/chm2pdf_branch
And upload the results back to launchpad:
bzr push lp:~reto-knaak/chm2pdf/chm2pdf_branch

Please feel free to contact me if you have any questions or if something is
not working.

-----
Best regards,
Grishkin Maxim

2012/10/24 Reto Knaak <email address hidden>

> Hi Max!
>
> You're welcome and thank you!
> It's true that the project is abandoned... for me a pity because it does a
> wonderful job for me!
> Thank you for pointing out to me my replies are public, this is ok for me
> (I guessed it).
>
> I think it's a good idea to share your patch, and i suggest also to add the
> bug officially (with a link to the google code bug page).
> I also agree to merge the branches, but I'll need some help...
>
> In the mean time I updated my ubuntu system and tried your chm eample file,
> and as expected an error occured....
>
> I started from the ubuntu version, and made my brach using this sequence:
>
> patch chm2pdf < ../patches/chm2pdf_check_soup.diff
> patch chm2pdf < ../patches/chm2pdf_no_javascript.diff
> patch chm2pdf < ../patches/chm2pdf_multiple_page_problem.diff
> patch chm2pdf < ../patches/chm2pdf_color_removed.diff
> patch chm2pdf < ../patches/chm2pdf_links_case_insensitive.diff
> patch chm2pdf < ../patches/chm2pdf_images_case_insensitive.diff
> patch chm2pdf < ../patches/chm2pdf_specialchars.diff
>
> Probably you need to appy all the patches.
>
> Maybe it's also a good idea if you give a critical look at my patches,
> chm2pdf is my first and only experience with python (and linux) and some of
> my solutions may not be too clean...
>
> Kind regards
> Reto
>
> --
> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385<https://code.launchpad.net/%7Egrishkin/chm2pdf/chm2pdf_branch/+merge/128385>
> You are the owner of lp:~grishkin/chm2pdf/chm2pdf_branch.
>

Hi Reto!
I've reviewed your branch at
https://code.launchpad.net/~reto-knaak/chm2pdf/chm2pdf_branch and it seems
that it does not contain any of patches you mentioned, just changes in
changelog file. So I've merged these patches to my branch manually. I did
not test them deeply still and I am not especially confident in correctness
of merge of the latter two patches.
Seems that I've found the solution to the problem you've described at
https://bugs.launchpad.net/ubuntu/+source/chm2pdf/+bug/894193 - check the
latest comment there.
To merge the branches you'll have to install bzr somewhere on your local PC
and download your branch (if it is not already present there) by running:
bzr branch lp:~reto-knaak/chm2pdf/chm2pdf_branch
After that you merge branches by running:
bzr merge lp:~grishkin/chm2pdf/chm2pdf_branch
And upload the results back to launchpad:
bzr push lp:~reto-knaak/chm2pdf/chm2pdf_branch

Please feel free to contact me if you have any questions or if something is
not working.

-----
Best regards,
Grishkin Maxim

2012/10/24 Reto Knaak <reto.knaak@gmail.com>

> Hi Max!
>
> You're welcome and thank you!
> It's true that the project is abandoned... for me a pity because it does a
> wonderful job for me!
> Thank you for pointing out to me my replies are public, this is ok for me
> (I guessed it).
>
> I think it's a good idea to share your patch, and i suggest also to add the
> bug officially (with a link to the google code bug page).
> I also agree to merge the branches, but I'll need some help...
>
> In the mean time I updated my ubuntu system and tried your chm eample file,
> and  as expected an error occured....
>
> I started from the ubuntu version, and made my brach using this sequence:
>
> patch chm2pdf < ../patches/chm2pdf_check_soup.diff
> patch chm2pdf < ../patches/chm2pdf_no_javascript.diff
> patch chm2pdf < ../patches/chm2pdf_multiple_page_problem.diff
> patch chm2pdf < ../patches/chm2pdf_color_removed.diff
> patch chm2pdf < ../patches/chm2pdf_links_case_insensitive.diff
> patch chm2pdf < ../patches/chm2pdf_images_case_insensitive.diff
> patch chm2pdf < ../patches/chm2pdf_specialchars.diff
>
> Probably you need to appy all the patches.
>
> Maybe it's also a good idea if you give a critical look at my patches,
> chm2pdf is my first and only experience with python (and linux) and some of
> my solutions may not be too clean...
>
> Kind regards
> Reto
>
> --
> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385<https://code.launchpad.net/%7Egrishkin/chm2pdf/chm2pdf_branch/+merge/128385>
> You are the owner of lp:~grishkin/chm2pdf/chm2pdf_branch.
>

Reply

Revision history for this message

Reto Knaak (reto-knaak) wrote on 2012-11-04:

#

Hi Grishkin!

Nice to hear you, and thank you for your work!
It's possbile that I made some mistakes, I also saw the patches are not
there but assumed (as they are listed in /debian/patches/series) that they
would be applied at installation.

I started my virtualbox and executed the steps you described.
Unfortunately, the merge step gives me
bzr: ERROR: Not a branch: "/home/reto/".
So I'm stuck again...
Neverless, I checked your code against mine, and there are a few points to
discuss.

- in your version you don't use temporary directories (like the original
script version on google.code); in the ubuntu/devian version temporary
directories where iserted for security reasons (and against the will of the
developers with the result that they left the project). As this is a branch
on ubuntu, probably it would be good to stay with the temporary diretories
solution.
Personally I like the --dontextract because it's useful in debugging and
this is broken with temporary directories...

-Thank you for the suggestion for solving the %20 issue.
My suggestion is to use your solution only if --BeatifulSoup is used, and
if not, stay with current solution (It's only a minor problem for me).

- I'm not sure if line 522 should be commented, as it's now solved with
526:
522: page = re.sub('(?i)"'+match_string, '"'+replace_string, page)
526: page = re.sub(r'(?i)("|"[^\/"].*?\/)'+match_string,
'"'+replace_string, page)

-what is the difference between os.mkdir and os.makedirs? Am I right that
os.makedirs is safer to be used?

I still hope that someone will take charge of maintaining the chm2pdf
project, so that our shared efforts are not lost....
Kind regards!

Reto

Reply

chm2pdf

Merge lp://staging/~grishkin/chm2pdf/chm2pdf_branch into lp://staging/~reto-knaak/chm2pdf/chm2pdf_branch

Commit message

Description of the change

Unmerged revisions

Preview Diff

Subscribers