{"id":504,"date":"2021-11-10T10:23:12","date_gmt":"2021-11-10T10:23:12","guid":{"rendered":"http:\/\/18.141.20.153\/?p=504"},"modified":"2025-08-21T09:43:53","modified_gmt":"2025-08-21T09:43:53","slug":"how-to-use-aws-textract-to-extract-data-from-any-image-pdf","status":"publish","type":"post","link":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/","title":{"rendered":"How to use Amazon Textract to extract data from any Image &#038; PDF?"},"content":{"rendered":"<span class=\"rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\">6<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh3.googleusercontent.com\/0vCDCbh7zPa4sxjpaRaNNK_chUdIiY_D6TW_23_DzMOe_ASMCCL7gsxKuvzy67pi9PkzzPpfSGDKFH5CIpwbpVoCebeWVm4FZh7Jl1ZgTl9zx7ya5yvpgWir7UnAAcypAHB3TudE\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">Amazon Textract is a highly scalable machine learning service that automatically extracts printed text, handwriting, and other data from scanned documents.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">Amazon Textract goes beyond simple optical character recognition (OCR): in addition to plain text, it can extract data from tables and forms in images and scanned documents.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">Many businesses and government organizations still extract data from scanned documents, such as PDFs, tables, and forms, through manual data entry, which is slow, expensive, and error-prone. Others rely on simple business process automation (BPA), which provides fully automated workflows or semi-automated processes across various domains. 
These processes require manual configuration, which must be updated every time a form changes. Textract uses machine learning to process many types of documents, accurately extracting text, forms, and tables without operator intervention or custom code.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">Amazon Textract offers more than a typical optical character recognition (OCR) system. It can extract information such as names, birthdates, and social security numbers from images and PDF files stored in <a href=\"https:\/\/www.workfall.com\/learning\/blog\/how-to-fetch-contents-of-json-files-stored-in-amazon-s3-using-express-js-and-aws-sdk\/\">S3<\/a> buckets.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh6.googleusercontent.com\/z1NgEa3TcpXOSta6CG99G55dY1hp--NA-6AyyogAx4qLaVWR4vl8auw_poNzbTYbei6CkoVbYocgMLofoYOsooX5l01asKm8QjrRpI9mj-M6KLCUr7WyqYPcK9yLIcbn-qeT5SBR\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\"><strong>&#8220;Amazon Textract is built on the same highly scalable, proven deep-learning technology that Amazon&#8217;s computer vision scientists use to analyze billions of photos and movies every day. It can be used without any prior knowledge of machine learning.&#8221;<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-justify\"><strong>Let\u2019s explore Amazon Textract!<\/strong><\/p>\n\n\n\n<h2>In this exercise, we will use the following AWS services:<\/h2>\n\n\n\n<ol><li>Amazon Simple Storage Service (S3)<\/li><li>AWS Identity and Access Management (IAM)<\/li><li>AWS Lambda<\/li><li>Amazon Textract<\/li><\/ol>\n\n\n\n<p class=\"has-text-align-justify\">We will demonstrate one major use case of Amazon Textract, using AWS Lambda with Python:<\/p>\n\n\n\n<h2>Extracting Text from an S3 Bucket Image (Hands-On)<\/h2>\n\n\n\n<p><strong>Adding 
boto3<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-justify\">To use Amazon Textract from Python, a recent version of the boto3 package is required. We will download this package and upload it as an AWS Lambda \u201cLayer\u201d.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">Run the following command in a shell. It installs boto3 into a local <code>python<\/code> folder; zip that folder afterwards, since Lambda expects Python layer packages under a top-level <code>python<\/code> directory.<\/p>\n\n\n\n<p><strong><code>pip install boto3 -t python\/<\/code><\/strong><\/p>\n\n\n\n<p class=\"has-text-align-justify\">Now let\u2019s create a boto3 layer. Go to AWS Lambda -&gt; Layers and click \u201cCreate Layer\u201d.<\/p>\n\n\n\n<p>Give the layer a name, select the latest Python version, and upload the zip file as shown below.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh6.googleusercontent.com\/lQywiiSAVHvb0kyhIUR2jNwUhTlJjXk-GiccJa2nf9l97gHlUiXNjTh7XdbXo_F572riLdz9SmD_VXC9ZT6zqViyO3eI1yWm15RNkzbZkg8vs2Ll3oZgoUpYwsdKDHlIQb0k6jmp\" alt=\"\"\/><\/figure>\n\n\n\n<p><strong>So, let\u2019s start extracting text!<\/strong><\/p>\n\n\n\n<ol><li><strong>Extracting Text from the image stored in the S3 bucket<\/strong><\/li><\/ol>\n\n\n\n<p class=\"has-text-align-justify\">We are going to create a Lambda function that is triggered whenever an image is uploaded to the S3 bucket.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh3.googleusercontent.com\/0vCDCbh7zPa4sxjpaRaNNK_chUdIiY_D6TW_23_DzMOe_ASMCCL7gsxKuvzy67pi9PkzzPpfSGDKFH5CIpwbpVoCebeWVm4FZh7Jl1ZgTl9zx7ya5yvpgWir7UnAAcypAHB3TudE\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p><strong>1.1 Creating the S3 Bucket<\/strong><\/p>\n\n\n\n<p>In the AWS console, go to the Amazon S3 page and click \u201cCreate bucket\u201d.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">Enter a bucket name and choose the same Region that the Lambda function will use. In the Set permissions section, set the permissions as shown in the image below, then create the bucket.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img 
src=\"https:\/\/lh5.googleusercontent.com\/LRPY2TUQf7r0I1_S_JMLuCurbY8f1haMKGyNL7GyzYih13gDO15d3zTVmT9fRwvc4MGodZQUIYlMVOdnNPsaD0tyAitIl0xBoLP5n-4kKCmcqrIWn4kuPRDIJXDZqp_H4RsI9XRR\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p><strong>1.2 Creating The S3 Lambda Trigger<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-justify\">Now we&#8217;ll write a Lambda function that will be called whenever a new image is uploaded to the bucket we created.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">Click &#8220;Create function&#8221; on the AWS Lambda service page.<\/p>\n\n\n\n<ol><li>Select \u201cUse a blueprint\u201d, search for the \u201cs3-get-object-python\u201d template, and click \u201cConfigure\u201d, as shown in the image below:<\/li><\/ol>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh4.googleusercontent.com\/zIPQcwLP7EqtaBp8AF69bbsJnteoLL0iWBBLc1iNDV3L_E0vRytSOhxLZs3yMjw7kXWsHndU2xRuc4vtNR6zLYXyWL_JMuKplplX-Lh0C02VhevOoMZ3vedK8OE-qrNYeZPG84nh\" alt=\"\"\/><\/figure>\n\n\n\n<ol start=\"2\"><li>Enter a \u201cFunction name\u201d and a \u201cRole name\u201d, and for \u201cBucket name\u201d select the S3 bucket created in the previous step. Add a \u201cSuffix\u201d to restrict the trigger to PNG images. Fill out the rest of the settings as shown in the image below. Leave the Lambda function code unchanged for now; we will replace it later. 
Click the \u201cCreate function\u201d button.<\/li><\/ol>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh6.googleusercontent.com\/vJ9iqBFsYnKC6_wNWSZIRRHuvKT5NnydBuYzDCKLE0JtvqYQzEB7tEofIoVXsqc7PLIfpu4qIDpGBx8NBbVheH0fr_tiIKUvsay5xcYb5IAHyzSoMenPDLbm3bTs86LJrjhIBgEM\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh4.googleusercontent.com\/TgA0WGmwwyUzCIlMSGR5uREh_6HHkRQCEzw1gzmmAnaqP6v9CZTNyDfbAsUivUYk_JEiU6kGQyHvQeRqGSJcUF5DeaJXTZp2Z7X8eaqadmo6TVpM5Rc51Ckqtikjga_Zs0adBIq3\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">Replace the existing code in the Function Code area with code that sends the uploaded image to Amazon Textract and writes the response to the S3 bucket as a text file with the same name.<\/p>\n\n\n\n<p><strong>1.3&nbsp; Attaching Permission Policies to Lambda<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-justify\">In the Lambda configuration, go to the Permissions tab and select the &#8220;getTextFromImageRole&#8221; role, as shown in the image below:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh6.googleusercontent.com\/jbiGBXh8bBzLvkEdaJ0njPCRK_oAiJMzdW6pFA3pw51tiHDGi-FB-LlO_YU2Od5kp4g1S78ZlgjIL12g0tn31DDtDXGh5YA0bZUToM2fq_UK2qxXbPBjLcDXq5Sh0kyZml1m8feL\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p>This opens the \u201cgetTextFromImageRole\u201d configuration page, as shown below.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh6.googleusercontent.com\/xUOGlHs6QwqMOmMyGor_1v4_2rT9wKynBMmLiLrFQGNp0IsxBN_xRPO_uPbCAVPWroRblk2F6lpSUn0fwtOiRMuUWksnRWfMDolOFFfUGAea44F62abn5Pbn4Vcq-kKgyu9JviZ1\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">Click \u201cAttach policy\u201d, select the \u201cAmazonTextractFullAccess\u201d policy, and click \u201cAttach policy\u201d again, as shown in the image below.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img 
src=\"https:\/\/lh5.googleusercontent.com\/kpM_GEtIL3oCE8adC4mkIjz0LeafxriPjTyxYFn_2w-sWsMRY425ESerH_wjPKd6IDuPg3HZh_vwsQpLuXhy570UrZx6JVFA5jZeJW_kAwi-10LWpdhNUQ3U-38uKdzv9hCdFc4D\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p>This gives the Lambda function permission to access the Amazon Textract service, as shown in the following image:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh4.googleusercontent.com\/LXC9BvgvHld8q-jKTxDfH5yOVkGi9V0ErLow9sanMPJN5g3GQMNG1UrIzgmvbGI2LYCE0PFknt84gxxe1lIY6RPYwzmS2VkwyUe3wg-Jt7UT6d3Bgqsg3trXMzFUpqu4jP5nef2Z\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p><strong>1.4 Adding Custom \u201cboto3-layer\u201d to Lambda<\/strong><\/p>\n\n\n\n<p>Click \u201cLayers\u201d in the Lambda designer and then click \u201cAdd a layer\u201d, as shown in the image below:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh5.googleusercontent.com\/lOyhNV0nlYGx2juffcQp6wF3UlPOEHHU3h7vhwBNHL9eVoiUlqZxZbKEduzB5sZbF3d_mZ4CuNyMOKVoCyK4vk9sMBEyQboI07993NfuJBUYBGg-LOvLpLb67SkUEhC4drGGndJ-\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">The Add Layer screen appears, as shown in the following image. Select Custom Layer, choose the \u201cworkfall-boto3-layer\u201d that was created earlier, and click the Add button.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh3.googleusercontent.com\/PxgND2Iq3CBU7mZGr5-9wDdrCFmM3ffrx5sNBS_o4YzzpSk40_OBT3y2tt7xjHEel7eOshxffl9kKizWOcW2JDOP-rJxFESPywbjy6q4yQxsx_jPSS7uSHvz70_GW5MFFkjrCxO1\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p><strong>1.5<\/strong> <strong>Testing The S3 Lambda Trigger<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-justify\">Before uploading files to the S3 bucket, let\u2019s test our Lambda function. Click \u201cTest\u201d. At this point the function returns an error, as shown in the following image. 
This error indicates that access was denied on the GetObject operation.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh6.googleusercontent.com\/PcCPu78Tnx_gXo6PrwUqBXwYhMXaHTBpqXqyBoXuPi27BZpXAFowt-wWfVNWLMQgrNUpA7kNikico2dZa15ijmGstTxfQHdin4WFC1jKhYW0yJwUSFMscRwvBVJv7un4phh7c_MQ\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">Let\u2019s fix this error. Go to the Permissions tab of the Lambda function and click the role named <a href=\"https:\/\/ap-southeast-1.console.aws.amazon.com\/iam\/home#\/roles\/getTextFromImageRole?section=permissions\">getTextFromImageRole<\/a>. On the next screen, click \u201cAdd inline policy\u201d to add a policy that lets the Lambda function read objects from the bucket, as shown in the following image:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh4.googleusercontent.com\/WJWLpn01uUAc-tDVy42VTMhTt-Kw-KtMqzrf5N_QMfeE_IKNN7N9eRQrU_U9els3ZalSKmeN3L4S3VlxIsDNeUt0VzaHrJxtda5Gc_n0h0hpVVtnIr2ulAeo-ifkHRhjJ2nReyrf\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">This takes you to the next screen, as shown in the image below. Give the policy a name and click the \u201cCreate Policy\u201d button.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh5.googleusercontent.com\/LNeeCcDPCPyFmq1Qz5Om5tp2wUuiNozzgcjcMAP1gm_hurlxDbq8p5gOupbVKyI1bAOu8EvhaaRVyuMCKPCBW_m5Aqdp3gNPAWDzQGypdBnXlYb5P8EM1ez4OKBZBLV8VuUZXS_P\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">Go to the S3 bucket \u201cworkfallbucket\u201d created in the previous step and upload a PNG image that contains some text. 
I uploaded the following image, which shows the Workfall Partner Onboarding process.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh6.googleusercontent.com\/pEhANvHENold3da_RxCXOXQCdQt7O9B3OgMBefJnaiwSc7fQB8AvSgm_COX9q_2UttjQMxNqp4eDbfUFP_OdqhT8g1feYzpDSC9cMmG9UTzu7y1Q05XFQ3nr6hBKmnQVbZ10Zuu2\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">A few seconds after the image is uploaded, a text file containing the extracted text should appear in the same bucket under the same name, as shown in the following image:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/lh4.googleusercontent.com\/ET71ZIJaKTGKb0ginyaySC0UnvF9VXz6U6VNVa_IWHdSzqffF_AxB8kuzA9_-6tKF7S2GNlvWwvLZFhchkOeNjNgP_sznIvnVKNcaF3BT3utQZJzpe_jD-4N-yA9apuDx9KfjIFi\" alt=\"Amazon Textract\"\/><\/figure>\n\n\n\n<h2>Conclusion<\/h2>\n\n\n\n<p class=\"has-text-align-justify\">In this blog, we learned how to use Amazon Textract to extract data from images and PDFs. We will discuss more use cases of Amazon Textract in our upcoming blogs. Stay tuned for updates on our upcoming blogs on AWS and related technologies.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">Meanwhile \u2026<\/p>\n\n\n\n<p class=\"has-text-align-justify\"><strong>Keep Exploring -&gt; Keep Learning -&gt; Keep Mastering<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-justify\">This blog is part of our effort towards building a knowledgeable and kick-ass tech community. At <a href=\"https:\/\/www.workfall.com\/\">Workfall<\/a>, we strive to provide the best tech and pay opportunities to AWS-certified talent. 
If you\u2019re looking to work with global clients and build kick-ass products while making big bucks doing so, give it a shot at <a href=\"https:\/\/www.workfall.com\/partner\/\">workfall.com\/partner<\/a> today.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><span class=\"rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\">6<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span> Amazon Textract is a highly scalable machine learning service that collects printed text, handwriting, and other information from scanned documents automatically.&nbsp;&nbsp;&nbsp; Using Amazon Textract, you can easily extract text and data from images and any scanned documents that go beyond simple optical character recognition (OCR) to extract data from tables and forms.&nbsp; Many businesses and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":508,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"spay_email":""},"categories":[2],"tags":[3,158,89,159,57,141,157,6],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to use Amazon Textract to extract data from any Image &amp; PDF? 
- The Workfall Blog<\/title>\n<meta name=\"description\" content=\"Automatically collect printed text and handwriting from scanned documents with Amazon Textract, a scalable machine learning service.\u00a0\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to use Amazon Textract to extract data from any Image &amp; PDF? - The Workfall Blog\" \/>\n<meta property=\"og:description\" content=\"Automatically collect printed text and handwriting from scanned documents with Amazon Textract, a scalable machine learning service.\u00a0\" \/>\n<meta property=\"og:url\" content=\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/\" \/>\n<meta property=\"og:site_name\" content=\"The Workfall Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/facebook.com\/workfall\" \/>\n<meta property=\"article:published_time\" content=\"2021-11-10T10:23:12+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-21T09:43:53+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@workfall\" \/>\n<meta name=\"twitter:site\" content=\"@workfall\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Workfall\" \/>\n\t<meta 
name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#organization\",\"name\":\"Workfall - Hire #Kickass Coders On Demand\",\"url\":\"https:\/\/18.141.20.153\/learning\/blog\/\",\"sameAs\":[\"https:\/\/www.instagram.com\/workfall\/\",\"https:\/\/www.linkedin.com\/company\/workfall\/\",\"https:\/\/facebook.com\/workfall\",\"https:\/\/twitter.com\/workfall\"],\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/i1.wp.com\/18.141.20.153\/learning\/blog\/wp-content\/uploads\/2021\/10\/cropped-WF_logo.png?fit=400%2C400\",\"contentUrl\":\"https:\/\/i1.wp.com\/18.141.20.153\/learning\/blog\/wp-content\/uploads\/2021\/10\/cropped-WF_logo.png?fit=400%2C400\",\"width\":400,\"height\":400,\"caption\":\"Workfall - Hire #Kickass Coders On Demand\"},\"image\":{\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#website\",\"url\":\"https:\/\/18.141.20.153\/learning\/blog\/\",\"name\":\"The Workfall Blog\",\"description\":\"#Tech #Remote #Jobs\",\"publisher\":{\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/18.141.20.153\/learning\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#primaryimage\",\"url\":\"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png\",\"contentUrl\":\"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png\",\"width\":1200,\"height\":628,\"caption\":\"How to use AWS Textract to extract data from any Image & PDF\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#webpage\",\"url\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/\",\"name\":\"How to use Amazon Textract to extract data from any Image & PDF? - The Workfall Blog\",\"isPartOf\":{\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#primaryimage\"},\"datePublished\":\"2021-11-10T10:23:12+00:00\",\"dateModified\":\"2025-08-21T09:43:53+00:00\",\"description\":\"Automatically collect printed text and handwriting from scanned documents with Amazon Textract, a scalable machine learning 
service.\u00a0\",\"breadcrumb\":{\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/18.141.20.153\/learning\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to use Amazon Textract to extract data from any Image &#038; PDF?\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#webpage\"},\"author\":{\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/person\/cab8236044692bc5b27606b13167794a\"},\"headline\":\"How to use Amazon Textract to extract data from any Image &#038; 
PDF?\",\"datePublished\":\"2021-11-10T10:23:12+00:00\",\"dateModified\":\"2025-08-21T09:43:53+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#webpage\"},\"wordCount\":1055,\"publisher\":{\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png\",\"keywords\":[\"AWS\",\"characterrecognition\",\"data\",\"file\",\"lambda\",\"s3\",\"textract\",\"workfall\"],\"articleSection\":[\"AWS Cloud Computing\"],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/person\/cab8236044692bc5b27606b13167794a\",\"name\":\"Workfall\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2023\/09\/avatar_user_1_1693914404-96x96.png\",\"contentUrl\":\"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2023\/09\/avatar_user_1_1693914404-96x96.png\",\"caption\":\"Workfall\"},\"sameAs\":[\"https:\/\/www.workfall.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to use Amazon Textract to extract data from any Image & PDF? 
- The Workfall Blog","description":"Automatically collect printed text and handwriting from scanned documents with Amazon Textract, a scalable machine learning service.\u00a0","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/","og_locale":"en_US","og_type":"article","og_title":"How to use Amazon Textract to extract data from any Image & PDF? - The Workfall Blog","og_description":"Automatically collect printed text and handwriting from scanned documents with Amazon Textract, a scalable machine learning service.\u00a0","og_url":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/","og_site_name":"The Workfall Blog","article_publisher":"https:\/\/facebook.com\/workfall","article_published_time":"2021-11-10T10:23:12+00:00","article_modified_time":"2025-08-21T09:43:53+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_creator":"@workfall","twitter_site":"@workfall","twitter_misc":{"Written by":"Workfall","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/18.141.20.153\/learning\/blog\/#organization","name":"Workfall - Hire #Kickass Coders On Demand","url":"https:\/\/18.141.20.153\/learning\/blog\/","sameAs":["https:\/\/www.instagram.com\/workfall\/","https:\/\/www.linkedin.com\/company\/workfall\/","https:\/\/facebook.com\/workfall","https:\/\/twitter.com\/workfall"],"logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/i1.wp.com\/18.141.20.153\/learning\/blog\/wp-content\/uploads\/2021\/10\/cropped-WF_logo.png?fit=400%2C400","contentUrl":"https:\/\/i1.wp.com\/18.141.20.153\/learning\/blog\/wp-content\/uploads\/2021\/10\/cropped-WF_logo.png?fit=400%2C400","width":400,"height":400,"caption":"Workfall - Hire #Kickass Coders On Demand"},"image":{"@id":"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/logo\/image\/"}},{"@type":"WebSite","@id":"https:\/\/18.141.20.153\/learning\/blog\/#website","url":"https:\/\/18.141.20.153\/learning\/blog\/","name":"The Workfall Blog","description":"#Tech #Remote #Jobs","publisher":{"@id":"https:\/\/18.141.20.153\/learning\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/18.141.20.153\/learning\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#primaryimage","url":"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png","contentUrl":"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png","width":1200,"height":628,"caption":"How to use AWS Textract to extract data from any Image & 
PDF"},{"@type":"WebPage","@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#webpage","url":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/","name":"How to use Amazon Textract to extract data from any Image & PDF? - The Workfall Blog","isPartOf":{"@id":"https:\/\/18.141.20.153\/learning\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#primaryimage"},"datePublished":"2021-11-10T10:23:12+00:00","dateModified":"2025-08-21T09:43:53+00:00","description":"Automatically collect printed text and handwriting from scanned documents with Amazon Textract, a scalable machine learning service.\u00a0","breadcrumb":{"@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/18.141.20.153\/learning\/blog\/"},{"@type":"ListItem","position":2,"name":"How to use Amazon Textract to extract data from any Image &#038; PDF?"}]},{"@type":"Article","@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#article","isPartOf":{"@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#webpage"},"author":{"@id":"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/person\/cab8236044692bc5b27606b13167794a"},"headline":"How to use Amazon Textract to extract 
data from any Image &#038; PDF?","datePublished":"2021-11-10T10:23:12+00:00","dateModified":"2025-08-21T09:43:53+00:00","mainEntityOfPage":{"@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#webpage"},"wordCount":1055,"publisher":{"@id":"https:\/\/18.141.20.153\/learning\/blog\/#organization"},"image":{"@id":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-use-aws-textract-to-extract-data-from-any-image-pdf\/#primaryimage"},"thumbnailUrl":"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png","keywords":["AWS","characterrecognition","data","file","lambda","s3","textract","workfall"],"articleSection":["AWS Cloud Computing"],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/person\/cab8236044692bc5b27606b13167794a","name":"Workfall","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/18.141.20.153\/learning\/blog\/#\/schema\/person\/image\/","url":"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2023\/09\/avatar_user_1_1693914404-96x96.png","contentUrl":"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2023\/09\/avatar_user_1_1693914404-96x96.png","caption":"Workfall"},"sameAs":["https:\/\/www.workfall.com"]}]}},"jetpack_featured_media_url":"https:\/\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Textract.png","jetpack-related-posts":[{"id":486,"url":"https:\/\/learning.workfall.com\/learning\/blog\/how-can-we-use-amazon-comprehend-with-aws-lambda-and-amazon-lex-for-sentiment-analysispart-1\/","url_meta":{"origin":504,"position":0},"title":"How can we use Amazon Comprehend with AWS Lambda and Amazon Lex for Sentiment Analysis (Part 1)?","date":"November 10, 2021","format":false,"excerpt":"When we are asking Siri - \u201cHey Siri, where is the nearest grocery store?\u201d or telling Alexa - \u201cAlexa, can you play my 
workout music?\u201d, we are actually talking to machines! These virtual assistants are examples of machines that understand human languages and respond!! Sounds interesting right? Now you must\u2026","rel":"","context":"In &quot;AWS Cloud Computing&quot;","img":{"alt_text":"Natural Language Processing using Amazon Comprehend","src":"https:\/\/i0.wp.com\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/comprehend1.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":541,"url":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-build-a-serverless-event-driven-workflow-with-aws-glue-and-amazon-eventbridgepart-1\/","url_meta":{"origin":504,"position":1},"title":"How to build a serverless event-driven workflow with AWS Glue and Amazon EventBridge(Part 1)?","date":"November 10, 2021","format":false,"excerpt":"Have you ever wondered how huge IT companies construct their ETL pipelines for production? Are you curious about how TBs and ZBs of data are effortlessly captured and rapidly processed to a database or other storage for data scientists and analysts to use? The answer is the serverless data integration\u2026","rel":"","context":"In &quot;AWS Cloud Computing&quot;","img":{"alt_text":"AWS Glue","src":"https:\/\/i1.wp.com\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/Glue.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1498,"url":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-etl-api-data-to-aws-s3-bucket-using-apache-airflow\/","url_meta":{"origin":504,"position":2},"title":"How to ETL API data to AWS S3 Bucket using Apache Airflow?","date":"November 1, 2022","format":false,"excerpt":"2.5 quintillion bytes of data are produced every day with 90% of it generated solely in the last 2 years (Source: Forbes). Data is pulled, cleaned, transfigured & then presented for analytical purposes & put to use in thousands of applications to fulfill consumer needs & more. 
While generating insights\u2026","rel":"","context":"In &quot;AWS Cloud Computing&quot;","img":{"alt_text":"How to ETL API data to AWS S3 Bucket using Apache Airflow?","src":"https:\/\/i0.wp.com\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2022\/11\/Cover-Images_Part2-2.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":851,"url":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-create-redshift-clusters-and-query-data\/","url_meta":{"origin":504,"position":3},"title":"Create Redshift Clusters And Query Data","date":"January 4, 2022","format":false,"excerpt":"Data has become such a crucial asset to businesses in today's environment. Almost every significant company has created a data warehouse for reporting and analytics. Utilizing information from a range of sources most data warehousing systems are difficult to set up, cost millions of dollars in initial software and hardware\u2026","rel":"","context":"In &quot;AWS Cloud Computing&quot;","img":{"alt_text":"Amazon Redshift Clusters and Query Data - Workfall","src":"https:\/\/i0.wp.com\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2022\/01\/Cover-Images_Part2.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":647,"url":"https:\/\/learning.workfall.com\/learning\/blog\/store-query-and-index-json-data-using-aws-documentdb\/","url_meta":{"origin":504,"position":4},"title":"How to store, query, and index JSON data using AWS DocumentDB?","date":"November 11, 2021","format":false,"excerpt":"NoSQL databases are a great choice for many modern applications such as mobile, web, and gaming that require flexible, scalable, high-performance, and highly functional databases to provide great user experiences. 
In this blog, we will discuss AWS DocumentDB followed by the implementation to connect to the Amazon DocumentDB cluster from\u2026","rel":"","context":"In &quot;AWS Cloud Computing&quot;","img":{"alt_text":"AWS DocumentDB and JSON Data - Workfall","src":"https:\/\/i1.wp.com\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/11\/DocumentDB01.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":789,"url":"https:\/\/learning.workfall.com\/learning\/blog\/how-to-add-privacy-to-your-transcriptions-with-amazon-transcribe-to-hide-all-the-confidential-information-from-the-audio-fetched-text-part1\/","url_meta":{"origin":504,"position":5},"title":"How to add privacy to your transcriptions with Amazon Transcribe to hide all the confidential information from the audio-fetched text? (Part1)","date":"December 2, 2021","format":false,"excerpt":"One of the most difficult challenges in computer science is teaching a computer to understand human language. Although in today's technological world, speech-to-text looks to be a simple procedure, it takes a number of language models and algorithms to attain near-perfect accuracy Automatic speech recognition (ASR) and machine translation (MT)\u2026","rel":"","context":"In &quot;AWS Cloud Computing&quot;","img":{"alt_text":"Amazon Transcribe for Transcriptions Privacy - 
Workfall","src":"https:\/\/i2.wp.com\/learning.workfall.com\/learning\/blog\/wp-content\/uploads\/2021\/12\/CoverImages_1200x628px.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/posts\/504"}],"collection":[{"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/comments?post=504"}],"version-history":[{"count":8,"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/posts\/504\/revisions"}],"predecessor-version":[{"id":2528,"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/posts\/504\/revisions\/2528"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/media\/508"}],"wp:attachment":[{"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/media?parent=504"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/categories?post=504"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/learning.workfall.com\/learning\/blog\/wp-json\/wp\/v2\/tags?post=504"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}