
This extension enables your logic to extract data from PDF documents.


Opens an input as a PDF document

input | toPdf (options)
  • options: Object to provide set of options
    • textExtractionMethod: When not specified, the default is instructions
      • instructions: Extract text as the raw PDF instructions
      • words: Extract text by using the instructions to build words
  • input: Path as string, loaded bytes or Stream
  • returns: a PDF object
pdf = file| toPdf ({textExtractionMethod:'words'})


Finds the list of page indices that match a pattern

pdf | pdfIndexOf (pattern, skipCount)
  • pdf: PDF document object
  • pattern: the pattern to match against the pages
  • skipCount: number of pages to skip (default is 0)
  • returns: a list of page indices that match the pattern, e.g. [17,19,34]
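
To make the return value concrete, here is a minimal Python sketch of the idea behind pdfIndexOf. It is not the extension's implementation. It assumes each page is already available as plain text, that the pattern is a regular expression (as it is for pdfSplit), and that skipped pages are ignored while the returned indices still refer to positions in the full document. The name index_of and the sample pages are hypothetical.

```python
import re

def index_of(page_texts, pattern, skip_count=0):
    """Return 0-based indices of pages whose text matches the pattern.

    Assumption: the first skip_count pages are ignored, and indices
    are positions in the full document, not in the remaining pages.
    """
    regex = re.compile(pattern)
    return [
        i for i, text in enumerate(page_texts)
        if i >= skip_count and regex.search(text)
    ]

# Hypothetical page texts standing in for an opened PDF document.
pages = ["Cover", "INVOICE Page 1 of 2", "Page 2 of 2", "INVOICE Page 1 of 1"]
print(index_of(pages, r"INVOICE"))     # [1, 3]
print(index_of(pages, r"INVOICE", 2))  # [3]
```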


Extracts one or more pages from a PDF document

pdf | pdfExtract (pages, outputFile)
  • pages: array of page indices (0-based)
  • outputFile: path of the file to extract to
  • returns: full file name of the output file


Splits a PDF document based on a matching pattern

pdf | pdfSplit (pattern, outputFilePattern, skipCount)
  • pattern: a regular expression pattern
  • outputFilePattern: Allows dynamically building a set of file paths for the split documents.
    • Single string value: {$GroupName} can be used in the path to reference the value matched by a named group of the pattern
    • Array of strings: Must match the number of documents that are being split
  • skipCount: number of pages to skip (default is 0)
  • returns: full file paths of the split documents.
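
One plausible reading of "splitting based on a matching pattern" is that every page matching the pattern starts a new document, and the pages after it belong to that document until the next match. The Python sketch below illustrates that reading only. The function name split_ranges, the sample pages, and the choice to drop pages that come before the first match are all assumptions, not documented behaviour of pdfSplit.

```python
import re

def split_ranges(page_texts, pattern, skip_count=0):
    """Group page indices into documents; each matching page starts a new one.

    Assumption: pages before the first match (and skipped pages) are dropped.
    """
    regex = re.compile(pattern)
    groups = []
    for i, text in enumerate(page_texts):
        if i < skip_count:
            continue
        if regex.search(text):
            groups.append([i])       # a match opens a new document
        elif groups:
            groups[-1].append(i)     # otherwise extend the current document
    return groups

# Hypothetical page texts standing in for an opened PDF document.
pages = ["Cover", "Page 1 of 2", "Page 2 of 2", "Page 1 of 1"]
print(split_ranges(pages, r"Page\s1\sof\s\d+"))  # [[1, 2], [3]]
```

Under this reading, the array form of outputFilePattern would need one path per group, which is two paths for the sample above.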

Example: Splitting a document into 3 items that are stored at the provided locations

list = pdf|pdfSplit('(ATTACHMENT\\sTO\\sTAX)(\\n*\\s*INVOICE)(\\n*\\s*Page\\s1\\sof\\s(\\d*))(\\n*\\s*.*)(\\n*\\s*.*)(\\n*\\s*.*)(\\n*\\s*Matter\\n*\\s*(?<matter>\\d*))', ['D:/1.pdf', 'D:/2.pdf', 'D:/3.pdf'])

Example: Splitting a document with dynamic file names based on the pattern

list = pdf|pdfSplit('(ATTACHMENT\\sTO\\sTAX)(\\n*\\s*INVOICE)(\\n*\\s*Page\\s1\\sof\\s(\\d*))(\\n*\\s*.*)(\\n*\\s*.*)(\\n*\\s*.*)(\\n*\\s*Matter\\n*\\s*(?<matter>\\d*))','Matter-{$matter}.pdf')
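
The {$matter} placeholder in the example above refers to the named group (?<matter>\d*) in the pattern. As a rough illustration of how such a placeholder could be resolved, the Python sketch below substitutes each {$name} with the value of the named group of the same name. The helper expand_output_name, the shortened pattern, and the sample text are hypothetical, and Python writes named groups as (?P<name>...) rather than (?<name>...).

```python
import re

def expand_output_name(template, match):
    """Replace each {$name} in the template with the named group's value."""
    return re.sub(r"\{\$(\w+)\}", lambda m: match.group(m.group(1)), template)

# Shortened, hypothetical version of the article's pattern and page text.
match = re.search(r"Matter\s*(?P<matter>\d+)", "TAX INVOICE\nMatter 48213")
print(expand_output_name("Matter-{$matter}.pdf", match))  # Matter-48213.pdf
```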


Updated on June 24, 2021
