Convert PDF to TEXT

How I converted a PDF to text on Ubuntu.



RECOGNIZE TEXT ON LINUX

 

  1. The OCRmyPDF tool seems to suit my needs. Let's try it: $ sudo apt install ocrmypdf

  2. List the available languages: $ tesseract --list-langs

  3. Install the one you need: $ sudo apt install tesseract-ocr-spa

  4. $ ocrmypdf -v -l 'spa' old.pdf new.pdf // Got errors and no output

  5. I needed the -f (--force-ocr) option. $ ocrmypdf -v -f -l spa old.pdf new.pdf // Replace spa with the language of your PDF

  6. I got PIL.Image.DecompressionBombError: Image size (1115186111 pixels) exceeds limit of 256,000,000 pixels, could be decompression bomb DOS attack.

  7. Try raising the limit with --max-image-mpixels 1300: $ ocrmypdf -v -f -l spa --max-image-mpixels 1300 old.pdf new.pdf
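
The value 1300 in step 7 can be sanity-checked with a little shell arithmetic. This is only a sketch, assuming that one megapixel means 1,000,000 pixels; the variable names are mine and are not part of OCRmyPDF.

```shell
# Pixel count copied from the DecompressionBombError message in step 6.
pixels=1115186111

# Round up to whole megapixels (assumption: 1 megapixel = 1,000,000 pixels).
mpixels=$(( (pixels + 999999) / 1000000 ))

# Smallest whole-megapixel value that covers the image.
echo "$mpixels"
```

For this PDF the script prints 1116, so 1300 leaves some headroom.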

 

TO EDIT PDF

  1. Install LibreOffice Draw: $ sudo apt install libreoffice-draw // Started working after a reboot

  2. $ sudo apt install libreoffice-gnome libreoffice -y // For GNOME; -y means YES to any prompt

  3. Try to install Scribus: $ sudo apt install scribus // I don't have a needed dependency

  4. $ sudo apt install inkscape // Suitable only for editing a one-page PDF

  5. Try PDF-Shuffler for editing a multipage PDF. $ sudo apt install pdfshuffler // It is a good idea to use pdfshuffler from the CLI

     

 

GRAPHICAL TESSERACT

1. Install gImageReader from snap.

2. Got an error: No tesseract languages are available for use. Recognition will not work.

2.1. Try $ sudo apt install tesseract-ocr-rus // Helped
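
Before installing a language package, it can help to check whether Tesseract already sees that language. Below is a minimal sketch, assuming the tesseract command-line tool is on the PATH; has_lang is a hypothetical helper of mine, and rus is just the language from the step above.

```shell
# has_lang LANG: succeed if LANG appears as a whole line on standard input.
# (Hypothetical helper, not part of Tesseract.)
has_lang() {
    grep -qx -- "$1"
}

# Only run the check when the tesseract CLI is actually installed.
if command -v tesseract >/dev/null 2>&1; then
    # Older versions print the list to stderr, so merge both streams.
    if tesseract --list-langs 2>&1 | has_lang rus; then
        echo "rus is installed"
    else
        echo "rus is missing; try: sudo apt install tesseract-ocr-rus"
    fi
fi
```

Replace rus with the language code you need, for example spa.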