simplusoft.net

another .NET open source community

iulia

Search content in PDF files stored in SQL Server


If you have .pdf files stored in SQL Server and you want to search their content, follow these steps:

1. Suppose you already have a table in SQL Server with a column "FileContent" of type varbinary(MAX).
Check that the column really has this type. Add a column "Extension" of type nvarchar(50) that stores the
exact extension of each file; in our particular case the value is ".pdf". SQL Server uses this column to
pick the right filter for each document when you create the full-text index on the column "FileContent".
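
Step 1 can be sketched in T-SQL as follows. This is only a minimal sketch: it assumes the table is called tblDocs (the name used later in this post) and that every row already stored in it is a PDF, so adjust the UPDATE if your table holds other file types.

```sql
-- Check the type of FileContent; for varbinary(MAX) the reported length is -1
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'tblDocs'
  AND COLUMN_NAME = 'FileContent';
GO

-- Add the column that stores the file extension
ALTER TABLE tblDocs ADD Extension nvarchar(50) NULL;
GO

-- Assumption: all existing rows are PDF files
UPDATE tblDocs
SET Extension = N'.pdf'
WHERE Extension IS NULL;
GO
```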

2. Enable full-text indexing on your SQL Server database


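 -- Enable full-text indexing for the current database
 EXEC sp_fulltext_database 'enable'
 go

 -- Load the filters and word breakers registered with the operating system
 EXEC sp_fulltext_service 'load_os_resources', 1
 go

 -- Do not require the loaded filters to be signed
 EXEC sp_fulltext_service 'verify_signature', 0
 go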
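
To confirm that the settings from step 2 took effect, you could read them back with the built-in property functions. This is a small sketch that only reads values and changes nothing; it must be run in the database that holds your documents.

```sql
-- 1 means the full-text component is installed on this instance
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS IsFullTextInstalled;

-- 1 means full-text indexing is enabled for the current database
SELECT DATABASEPROPERTYEX(DB_NAME(), 'IsFulltextEnabled') AS IsFulltextEnabled;

-- Expected after step 2: LoadOSResources = 1, VerifySignature = 0
SELECT FULLTEXTSERVICEPROPERTY('LoadOSResources') AS LoadOSResources,
       FULLTEXTSERVICEPROPERTY('VerifySignature') AS VerifySignature;
```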


3. Create the full text catalog
 create fulltext catalog CatalogTest
 go
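
If you are not sure whether the catalog was created, a quick look at the catalog view should tell you. The sketch below assumes the catalog name CatalogTest used above.

```sql
-- One row is returned if the catalog exists in the current database
SELECT name, is_default
FROM sys.fulltext_catalogs
WHERE name = N'CatalogTest';
```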


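4. Create the full-text index on the column "FileContent"


 -- TYPE COLUMN tells SQL Server which filter to use for each row; ON names the catalog
 CREATE FULLTEXT INDEX ON tblDocs (FileContent TYPE COLUMN Extension)
  KEY INDEX PK_extra_Docs
  ON CatalogTest;
 go

where "tblDocs" is the table that has the columns "FileContent" and "Extension", "PK_extra_Docs" is the name of its unique key index, and "CatalogTest" is the catalog created in step 3.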
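
Creating the index starts a population in the background, so searches may return nothing for a while. A possible way to watch the progress is sketched below; it assumes the table name tblDocs from this post.

```sql
-- 0 = idle, 1 = full population in progress (other values mean other population states)
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'tblDocs'), 'TableFulltextPopulateStatus') AS PopulateStatus;

-- Rows processed so far, and rows that could not be indexed
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'tblDocs'), 'TableFulltextDocsProcessed') AS DocsProcessed,
       OBJECTPROPERTYEX(OBJECT_ID(N'tblDocs'), 'TableFulltextFailCount')     AS FailCount;
```

A fail count that matches the number of PDF rows usually suggests that no filter for ".pdf" is available yet.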

5. Now you will want to test the full-text index, supposing that you already have some .pdf files
inserted into the database. But it will not work yet. To be able to search in your PDF files, you have
to install the Adobe PDF Filter on the SQL Server machine. You can get the Adobe PDF Filter from here: AdobePdfFilter.zip
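
After installing the filter, you can check whether SQL Server sees it and then index the documents again. This is a sketch under the assumption that the table is tblDocs and already has a full-text index; a restart of the SQL Server service may be needed before the filter shows up in the list.

```sql
-- A row for '.pdf' means a PDF filter is registered with full-text search
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';
GO

-- Documents indexed before the filter was installed have no searchable text,
-- so start a full population to process them again
ALTER FULLTEXT INDEX ON tblDocs START FULL POPULATION;
GO
```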

6. Now you can test the search with the sample query below

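-- CONTAINS expects a full-text search condition, so a multi-word phrase
-- has to be wrapped in double quotes
declare @SearchTags nvarchar(1000)

set @SearchTags = '"this is a text to search in pdf content"'

-- When @SearchTags is empty the filter is meant to be skipped;
-- otherwise only documents that contain the phrase are returned
select *
from tblDocs
where
 (@SearchTags = ''
  or (@SearchTags <> ''
    and CONTAINS(FileContent, @SearchTags))
 )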
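
If the search text comes straight from a user and is just a few plain words, FREETEXT may be easier to use than CONTAINS, because it takes ordinary text rather than a search condition. The sketch below assumes the same table and column as above; the variable name @SearchText is only for illustration.

```sql
DECLARE @SearchText nvarchar(1000);
SET @SearchText = N'text to search in pdf content';

-- FREETEXT matches on the meaning of the words, so no quotes or AND/OR operators are needed
SELECT *
FROM tblDocs
WHERE FREETEXT(FileContent, @SearchText);
```

FREETEXT is less strict than a quoted phrase in CONTAINS, so expect it to return more rows.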
