CONVERT A PDF TO A PLAIN TEXT FILE

Do you need to convert a PDF to plain text? Perhaps you need to extract content that can be easily parsed for data mining or processed efficiently by large language models. Or perhaps you need the content from your PDF in a smaller file that can be loaded and processed quickly.

With the new pdf2txt command, you can now convert your PDF to plain text format. This feature is a part of StataNow™.

If you want to convert a Word document to plain text, try the new docx2txt command.


Below, we use the putpdf suite of commands to create a PDF with a table of descriptive statistics and a table of regression results. We use data from the Second National Health and Nutrition Examination Survey (NHANES II) (McDowell et al. 1981) to analyze blood pressure, weight, and body mass index. We run the following commands to create our PDF:
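
A minimal sketch of such a do-file might look like the following. The dataset name (nhanes2l, loaded with webuse), the variables bpsystol, weight, and bmi, the table names, and the title text are all assumptions made for illustration; only the output filename bpreport.pdf comes from the text.

```stata
* Sketch only: dataset, variable names, table names, and title are assumed
webuse nhanes2l, clear

* Start a new PDF document in memory and add a centered title
putpdf begin
putpdf paragraph, halign(center)
putpdf text ("Blood pressure report"), bold

* Table 1: descriptive statistics, exported from a saved tabstat matrix
tabstat bpsystol weight bmi, statistics(mean sd min max) save
matrix stats = r(StatTotal)'
putpdf table tbl1 = matrix(stats), rownames colnames nformat(%9.2f)

* Table 2: regression results, exported from the estimation table
regress bpsystol weight bmi
putpdf table tbl2 = etable

* Write the document to disk
putpdf save bpreport.pdf, replace
```

Here tabstat with the save option leaves the statistics in r(StatTotal), which is transposed so that each variable occupies one row of the table.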

And now we convert bpreport.pdf to a plain text file by typing
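
A sketch of that command, assuming pdf2txt takes the source file followed by saving() and replace options, as Stata's docx2pdf and html2docx converters do. The output filename bpreport.txt is also an assumption; [RPT] pdf2txt has the authoritative syntax.

```stata
* Assumed syntax and output filename; verify against [RPT] pdf2txt
pdf2txt bpreport.pdf, saving(bpreport.txt) replace
```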

© Copyright 1996–2026 StataCorp LLC. All rights reserved.

Here is our plain text file:

REFERENCE

McDowell, A., A. Engel, J. T. Massey, and K. Maurer. 1981. “Plan and operation of the Second National Health and Nutrition Examination Survey, 1976–1980.” In Vital and Health Statistics, ser. 1, no. 15. Hyattsville, MD: National Center for Health Statistics.

Read more about pdf2txt in [RPT] pdf2txt in the Stata Reporting Reference Manual.