TSG has several clients using Documentum as a repository and a custom front end application for consumption of the records or renditions of records. In most cases there is a publishing mechanism in place such as SCS (Site Caching Services) or TSG’s OpenMigrate PUMA (see the CIS Case Study for more details). While a typical Documentum application (e.g., Webtop) provides a “one stop shop” for authors and approvers, the interface can be challenging when “consumers” are just looking for quick search and retrieval. Serving consumers from a published cache provides improved performance, business continuity, and the ability to add documents from other systems. One potential risk of using a cache of documents and metadata for search and retrieval is the integrity of the data. Publishing techniques are designed to accurately cache records; however, there are uncontrollable circumstances that may result in a mismatch between the cache and the repository. Continue reading ‘Documentum to Portal Consistency Checker – Proof of Concept’
Archive for the 'R&D' Category
Tags: Consistency, data integrity, Documentum, OpenMigrate, portal, PUMA, Webtop
Documentum Transformation Services (DTS) – Alternative Approaches with Adobe LiveCycle and OpenOffice
Published April 8, 2010
Adobe, Alfresco, Documentum, DTS, LiveCycle, Open Source, R&D, Tech Tip
Since the very first Momentum (1996 in a very windy Miami), the Documentum user community has pushed for a more reliable means to convert documents, mostly Microsoft Office documents, into PDF. Back then, during a wrap-up luncheon, the feedback on AutoRender (a previous incarnation of DTS) was anything but positive. The main complaints, similar to some heard today, included:
- Having to monitor/reboot the AutoRender Server throughout the day
- Unreliable PDF transformation, including:
  - Unsupported document types
  - Font replacement
  - Broken links
At the time, Documentum threw some engineering effort into AutoRender to address some of the shortcomings. One of the changes was to have AutoRender reboot itself (not really a fix, but it did reduce the need for manual monitoring). As with other Documentum products, TSG is occasionally asked for alternatives. This post will address some of the tools we use in non-Documentum environments that could easily be adapted to the PDF rendition needs of Documentum.
For a couple of our non-Documentum customers, we have leveraged the Adobe LiveCycle component PDF Generator. We have been very impressed with its reliability and functionality. Considering Adobe created the best known implementation of Portable Document Format, it makes sense to rely on Adobe technology to convert your native content.
The last post discussed the results of an HPI Lucene Search test compared to a Webtop FAST Search as part of a proof of concept for a client looking to provide a consumer interface. As we have often mentioned on this forum, we continually see clients looking for a better search interface than Webtop, as well as some content cached outside of Documentum for business continuity, performance, and licensing.
One accurate comment raised by the post was that our comparison of HPI/Lucene against a Webtop/FAST search wasn’t really comparing apples to apples as the Webtop search was running against Documentum with security, while the Lucene search was not. While the client’s goals were to show the benefits of the cached repository and Lucene against Documentum, many Documentum users would like to know how Lucene would perform directly against a Documentum repository (as with upcoming DSS).
For this post, we will discuss TSG’s strategy and initial proof of concept results in leveraging Lucene for a Documentum full text search engine.
As mentioned in a previous article, many clients are moving away from FAST in preparation for the eventual release of Documentum Search Services (DSS), which leverages the open source product Apache Lucene and is slated for release in June. This post will share the results from one client that executed a proof of concept test to compare the two search engines.
Proof of Concept Approach – As we have mentioned before, many clients have decided to implement an external cache outside of Documentum to address business continuity, performance and licensing issues. For a large pharmaceutical client, TSG was tasked with performing a proof of concept on 156,000 documents in an external data source indexed by Lucene. The proof of concept would compare the search results and response times of FAST within Documentum (Webtop) against Lucene (HPI) outside of Documentum. The proof of concept additionally evaluated leveraging Lucene for metadata storage rather than storing the metadata in another database such as Oracle.
POC Findings – Lucene/HPI and the external repository were found to be considerably quicker than the existing FAST/Webtop implementation on most queries.
| Results returned | FAST/Webtop | Lucene/HPI |
| 1200 results | 90 seconds | 3 seconds |
| 8 results | 5 seconds | 3 seconds |
| 10 results | 8 seconds | 4 seconds |
| 76 results | 10 seconds | 5 seconds |
| 5100 results | 72 seconds | 5 seconds |
| 65 results | 6 seconds | 3 seconds |
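To put the timings in perspective, the ratios can be worked out directly. The short sketch below assumes the first timing column is FAST/Webtop and the second is Lucene/HPI (an assumption, based on the finding that Lucene/HPI was the quicker of the two); the names `TIMINGS` and `speedups` are ours, for illustration only.

```python
# Timings transcribed from the table: (results returned, slower time, faster time), in seconds.
# Assumption: the slower column is FAST/Webtop and the faster column is Lucene/HPI.
TIMINGS = [
    (1200, 90, 3),
    (8, 5, 3),
    (10, 8, 4),
    (76, 10, 5),
    (5100, 72, 5),
    (65, 6, 3),
]


def speedups(rows):
    """Return (result_count, fast_seconds / lucene_seconds) for each query."""
    return [(count, fast / lucene) for count, fast, lucene in rows]


if __name__ == "__main__":
    for count, ratio in speedups(TIMINGS):
        print(f"{count:>5} results: Lucene/HPI about {ratio:.1f}x quicker")
```

Under that reading, the gap is widest on the two largest result sets (30x and 14.4x) and closer to 2x on the smaller ones.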
A simple configuration of the Lucene index did a better job of returning a complete search result set than the standard FAST/Webtop configuration. The additional results were documents containing logical derivatives of the search words. For example, a search for “exception report” could return “exceptions report” or “exception reports”. The proof of concept data set also included German documents, and Lucene demonstrated multilingual stemming capability.
Key Stats – Lucene
- 156,000 Documents – 31.6 Gigabytes
- Total Index Space – 521 MB
- Total Index Build Time – 10 hours – The client was very interested in the time it took to index the content and metadata in Lucene because they had experienced lengthy indexing times with FAST in their 5.3 upgrade. The Lucene build time was tracked as part of the proof of concept; however, the corresponding FAST data from the 5.3 upgrade is no longer available.
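A quick back-of-the-envelope calculation puts these numbers in context. The sketch below is ours, not part of the proof of concept; it assumes 1 GB = 1024 MB, since the post does not say which convention was used.

```python
# Figures taken from the key stats above.
DOCUMENTS = 156_000
CONTENT_GB = 31.6
INDEX_MB = 521
BUILD_HOURS = 10

# Assumption: binary units (1 GB = 1024 MB); the source does not specify.
MB_PER_GB = 1024


def index_ratio(index_mb, content_gb, mb_per_gb=MB_PER_GB):
    """Index size as a fraction of the content size."""
    return index_mb / (content_gb * mb_per_gb)


def docs_per_second(documents, hours):
    """Average indexing throughput over the whole build."""
    return documents / (hours * 3600)


if __name__ == "__main__":
    print(f"Index is {index_ratio(INDEX_MB, CONTENT_GB):.1%} of the content size")
    print(f"Average throughput: {docs_per_second(DOCUMENTS, BUILD_HOURS):.1f} documents/second")
```

In other words, the index is roughly 1.6% of the content size, and the build averaged a little over four documents per second (15,600 per hour).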
FAST and Lucene – Full Text Syntax Differences
- FAST
  - “One Two” – will return documents with the exact phrase “One Two” in the document
  - One Two – will return documents with the words One OR Two in the document
  - One+Two – will return documents with the words One OR Two in the document
  - One and Two – will return documents with the words One AND Two in the document
- Lucene – Based on the Proof of Concept’s configuration
  - “One Two” – will return documents with the exact phrase “One Two” in the document
  - One Two – will return documents with the words One AND Two in the document
  - One OR Two – will return documents with the words One OR Two in the document
  - One and Two – will return documents with the words One AND Two in the document
  - One+Two – will return documents with the exact phrase “One Two” in the document
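The differences listed above can be modeled in a few lines to make them easier to compare side by side. This is only a toy model of the behaviors described in this post, not the real FAST or Lucene query parsers; the function names and sample documents are hypothetical, and the handling of an explicit OR under FAST is our assumption because the post does not describe it.

```python
# Hypothetical sample documents used only to exercise the model.
DOCS = {
    "a": "one two three",
    "b": "two one",
    "c": "one only",
    "d": "nothing relevant",
}


def has_phrase(tokens, words):
    """True if `words` appear in `tokens` as a consecutive run."""
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def matches(text, query, engine):
    """Apply the syntax rules listed in the post for engine 'fast' or 'lucene'."""
    if engine not in ("fast", "lucene"):
        raise ValueError("engine must be 'fast' or 'lucene'")
    tokens = text.lower().split()
    q = query.strip()
    # "One Two": exact phrase in both engines.
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return has_phrase(tokens, q[1:-1].lower().split())
    # One+Two: OR in FAST, exact phrase in the proof of concept's Lucene configuration.
    if "+" in q:
        words = [w.lower() for w in q.split("+")]
        if engine == "lucene":
            return has_phrase(tokens, words)
        return any(w in tokens for w in words)
    parts = q.split()
    words = [p.lower() for p in parts if p not in ("OR", "and", "AND")]
    # One OR Two: listed for Lucene only; treating FAST the same way is an assumption.
    if "OR" in parts:
        return any(w in tokens for w in words)
    # One and Two: AND in both engines.
    if "and" in parts or "AND" in parts:
        return all(w in tokens for w in words)
    # One Two: the default operator is OR in FAST and AND in this Lucene configuration.
    if engine == "lucene":
        return all(w in tokens for w in words)
    return any(w in tokens for w in words)


def search(query, engine, docs=DOCS):
    """Return the sorted ids of documents matching the query."""
    return sorted(doc_id for doc_id, text in docs.items() if matches(text, query, engine))


if __name__ == "__main__":
    for query in ('"One Two"', "One Two", "One+Two", "One and Two"):
        print(f"{query!r}: FAST={search(query, 'fast')} Lucene={search(query, 'lucene')}")
```

The practical consequence for users switching engines is that a bare two-word query narrows the result set under the Lucene configuration but widens it under FAST.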
Overall the client was very satisfied with the findings and is moving forward with the solution. The flexibility of Lucene to index both the metadata and full-text values allowed the client to avoid adding an additional Oracle database to their external cache for attribute storage. The client also liked the simpler, more intuitive search interface of HPI compared to the Webtop interface.
In addition to leveraging Lucene for searching an external cache, we are also working to leverage Lucene for internal Documentum/Webtop search.
If you have any questions or would like more detailed information, please contact us or comment below:
Another advantage of GWT is that it allows debugging in a hosted mode browser, so most changes in the client side code can be viewed by simply refreshing the browser. Several plug-ins are available which allow GWT development in different development environments including Eclipse.
Here is a screenshot of our GWT Annotation tool interface. Be sure to check back often as we will be releasing our annotation demo shortly.
With the upgrade to D6.5, many of our clients are reconsidering their annotation choices. This blog post will address some of the annotation product choices based on our experience, as well as our internal development efforts on our Free Viewer Tool that is based on a thin client with Adobe Flex and support for viewing and basic annotation capabilities.
Definition – this entry uses “annotation” to mean a mark-up “layer” on top of the document. Redline changes (like Word track changes) are embedded in the Word file and are not the focus of this entry.
Thick Client or Thin Client
One of the first decision points when choosing an annotation tool is between a thick or thin client. Early annotation tools required a client side component for client/server capabilities. With browser-based annotation tools, annotations might rely on either a client side plug-in or an applet. For Documentum, client components are required for Brava (applet), Annodocs and Documentum Annotation Services (Adobe Acrobat). Snowbound offers both versions that don’t require a client component and versions with an applet-based approach. Our Free Viewer only requires Adobe Flash to be installed on the client. With a thin client approach, the image (not the entire file) is sent to the client. This can be a substantial performance improvement when viewing large files. The thin client approach also provides additional security, since the file itself is never passed to the client.
TSG Thoughts – We usually recommend the thin client to improve performance and security while reducing IT support costs, particularly when extending the application to outside third parties.
Native Document Annotations or PDF-only
One approach would be to allow the mark-up layer to view on top of any type of file format. Snowbound and Brava both support this type of annotation. Another approach would be to turn everything into PDF and only allow mark-ups on top of the PDF. This approach is required by Adobe and Annodocs although supported by Brava and Snowbound as well.
TSG Thoughts – Many of our clients have had difficulty with the native document approach, through no fault of the vendor, because of the constantly evolving and backward compatible native file formats. For our Free Viewer, we are only supporting PDF or TIFF.
With all annotation tools, the number of graphic options (circle, arrow, highlight, underscore, ...) can confuse the user and blur the line between annotations and redlines. Also, one major user complaint is that annotations can be buried on subsequent pages, where users have to flip through the document to find them. Annotation tools should highlight/bookmark annotations when viewing the document so that the user does not have to flip through every page looking for them.
TSG Thoughts – We lean toward simple annotations for basic markup to reduce training costs and markup/review time.
It is important to understand that every annotation tool typically stores its annotations in a proprietary format, making it difficult to change annotation tools. When changing annotation tools, the existing annotations must be deleted or reformatted.
TSG Thoughts – For our Free Viewer, we have targeted Adobe’s new XFDF for mark-up to be compatible with Adobe as well as Documentum Annotation Services.
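To give a feel for what an XFDF mark-up layer looks like, here is a minimal sketch that writes one sticky-note annotation using Python's standard library. It is not the Free Viewer's code. The element and attribute names (`annots`, `text`, `contents`, `f`, `page`, `rect`, `title`) reflect our reading of the XFDF format and should be checked against the specification; the function name `build_xfdf` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# XFDF namespace as we understand it from the format's specification.
XFDF_NS = "http://ns.adobe.com/xfdf/"


def build_xfdf(pdf_href, annotations):
    """Serialize a list of simple annotations as an XFDF string.

    Each annotation is a dict with hypothetical keys:
    type (e.g. "text" or "square"), page, rect (llx, lly, urx, ury), author, comment.
    """
    ET.register_namespace("", XFDF_NS)  # emit XFDF as the default namespace
    root = ET.Element(f"{{{XFDF_NS}}}xfdf")
    annots = ET.SubElement(root, f"{{{XFDF_NS}}}annots")
    for a in annotations:
        element = ET.SubElement(
            annots,
            f"{{{XFDF_NS}}}{a['type']}",
            {
                "page": str(a["page"]),  # zero-based page index
                "rect": ",".join(str(v) for v in a["rect"]),
                "title": a["author"],  # the author is carried in the title attribute
            },
        )
        contents = ET.SubElement(element, f"{{{XFDF_NS}}}contents")
        contents.text = a["comment"]
    # Reference to the PDF that the annotations apply to.
    ET.SubElement(root, f"{{{XFDF_NS}}}f", {"href": pdf_href})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    note = {
        "type": "text",
        "page": 0,
        "rect": (100, 700, 120, 720),
        "author": "reviewer",
        "comment": "Please check this figure.",
    }
    print(build_xfdf("sample.pdf", [note]))
```

Because the annotations live in a separate XML document that only references the PDF, they can be stored and exchanged independently of the file being marked up.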
Tags: Migration, Upgrade
For many Documentum customers, deciding how to upgrade a Documentum system often boils down to whether to upgrade a clone of the environment or to leave the environment where it is and upgrade it in place on the existing hardware. This year, I worked with a client on a project to explore the differences between upgrading several Documentum systems in place versus migrating the documents straight to a new 6.5 installation. Many of the in-place upgrade complexities were due to the older database and OS.
- Oracle needed to go from 9i to 10.2.03 as well as be converted to UTF-8
- The Unix OS needed a significant upgrade, including the rack supporting the virtual partitions
- The Documentum Content Server required several upgrade steps. It needed to go from 5.2.5 (some 5.2) to 5.2.5 SP5, then 5.3 SP6, and finally to 6.5. I then did a separate upgrade to 6.5 SP2.
There were several project goals that could only be achieved with a migration strategy.
- Combine Repositories on Windows installation and move to a single UNIX installation
- Reorganize object model by flattening object hierarchy
- Undo custom folder configurations created many years ago
The technical complexities of upgrading in place from 5.2.5, and the need to merge Documentum repositories, led the client to pick a migration approach for the upgrade.
Based on TSG’s upgrade experience with this client and others, we created an upgrade planning guide.
The planning guide is available here.
Please let me know your thoughts below.
Tags: ECM, HVS, Migration
Documentum High Volume Server (HVS) is a new product designed to cut database space usage in Documentum 6.5 by a third or even up to one half depending on the type of content. Given the significantly reduced database size, overall performance should increase. This year TSG evaluated HVS for a client as part of a Documentum Upgrade. (See other thoughts in our Documentum Upgrade Planning Guide )
HVS – When to use it
Basically, HVS was developed to efficiently store non-changing static or immutable content and meta-data. A good example is scanning/imaging but COLD and other content/meta-data that will never change makes sense as well. Content stored using HVS should not need to be versioned, rendered, annotated or changed. Otherwise, HVS converts the object from a light weight object back to a normal Documentum object and the benefits of HVS are lost. Examples of content that are ideal for HVS include reports, invoices, check images, documents archived for historic purposes and reference, and emails.
HVS – How it works
HVS reduces the size of the database by sharing security and common meta-data amongst a set of lightweight objects. HVS can also partition the database to increase the rate content can be stored and retrieved. There are some limitations placed on the content to achieve these benefits. First, security is applied broadly to a lightweight object type. This results in all documents of a lightweight type being available to all users that can access the type even though a user may only need access to a portion of the documents. In other words, HVS cannot support the normal object-level ACL security and accordingly security may need to be built into the application layer. The other limitation, as already mentioned, is that documents cannot be versioned or changed.
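The trade-off described above can be pictured with a small conceptual model. The sketch below is not Documentum's object model or database schema; the class names, attribute names, and counts are hypothetical and chosen only to show how sharing security and common meta-data reduces what is stored per object, and why security ends up applying to the whole set.

```python
from dataclasses import dataclass, field


@dataclass
class SharedParent:
    """Holds the security and common meta-data shared by a set of lightweight objects."""
    acl_name: str
    common: dict


@dataclass
class LightweightObject:
    """Stores only its own attributes plus a reference to the shared parent."""
    object_id: str
    parent: SharedParent
    own: dict = field(default_factory=dict)

    def attribute(self, name):
        # Own attributes first, then fall back to the shared meta-data.
        if name in self.own:
            return self.own[name]
        return self.parent.common[name]

    def can_read(self, user_acls):
        # Security comes from the parent, so it is all-or-nothing for the whole set.
        return self.parent.acl_name in user_acls


def stored_values_shared(objects):
    """Count stored values when common meta-data and the ACL are stored once per parent."""
    parents = {id(o.parent): o.parent for o in objects}
    shared = sum(len(p.common) + 1 for p in parents.values())  # +1 for the ACL
    return shared + sum(len(o.own) for o in objects)


def stored_values_unshared(objects):
    """Count stored values if every object carried its own copy of everything."""
    return sum(len(o.own) + len(o.parent.common) + 1 for o in objects)


if __name__ == "__main__":
    parent = SharedParent("invoice_readers", {"company": "ACME", "doc_type": "invoice", "year": 2009})
    invoices = [
        LightweightObject(f"inv-{i}", parent, {"invoice_number": i, "amount": 100 + i})
        for i in range(1000)
    ]
    print("shared:  ", stored_values_shared(invoices))
    print("unshared:", stored_values_unshared(invoices))
    print("readable by invoice_readers:", invoices[0].can_read({"invoice_readers"}))
```

The counts are made-up illustrations, not measurements of HVS. The point is structural: any user who can read one object in the set can read them all, which is why finer-grained security may need to move into the application layer.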
If you need to make large volumes of content available in near real time, the rapid ingestion feature of HVS may be of interest. Using special HVS DFC functions, applications can load raw database tables that contain the meta-data information for your lightweight object types. This is very different from typical DFC applications, which work strictly through the Documentum object layer. To use rapid ingestion, a custom program is necessary (Documentum does not currently have any tools that support this, including Captiva). The DBA will also need to partition the database tables. The partitioning allows the data to be loaded into “offline” Documentum tables. The tables are then swapped with empty placeholder tables, making the newly loaded documents available while the Content Server stays up and running.
With a partitioned database, other new tricks are available in the HVS DFC to scope searches to particular database partitions. This can be handy if the system is very large and the user community is having unacceptable metadata search performance times.
WHERE TO GO NEXT
When considering HVS, users should keep the following points in mind:
- Cost of HVS (will vary by installation)
- Performance Benefits versus normal database tuning
- Ingestion program development as this would be custom HVS DFC calls
In relation to the ingestion process, TSG has added HVS support to OpenMigrate to help clients ingest new content as well as move existing content to HVS. One benefit of this approach is that one tool can be used for ongoing ingestion of new content while also supporting movement of existing content within the docbase (e.g., archived items).
With our client, the proof of concept went well, but the client had not realized that HVS required additional cost and licensing. In evaluating the benefits versus the cost, the benefits did not outweigh the cost and the additional database and Documentum support requirements, and the client did not move forward with HVS.
TSG recently had the privilege of test driving and customizing Documentum’s new collaboration solution, CenterStage Pro. CenterStage Pro is a next-generation collaboration tool and features a sleek new interface that does away with Documentum’s traditional WDK front ends.
We updated CenterStage with two very similar actions. The first action that we added launched the Active Wizard in the same window, automatically logging the user in. The user then had access to the entire Active Wizard, and when they completed their work, the “Return” link in the Active Wizard returned them to CenterStage, automatically logging them in and returning them to their last location. The second action that we added implemented all of the first action, with the addition of the ability to route a specific document using an Active Wizard form.
The overall similarities with WDK in terms of the multiple files should be an advantage to anyone familiar with WDK. However, while WDK splits up the different pieces (e.g., NLS properties, action XML, etc.) into individual files for each action, CenterStage tends to group all of the string definitions into one file, all of the action definitions into one file, and so on. This centralizes the work quite a bit, as you are not creating a suite of new files for each additional action.
Since CenterStage is still version 1.0, it has a limited amount of customization capabilities. Documentum has stated that an official customization SDK should be available with version 1.5.
Be sure to check out a video of our customizations in TSG’s LearningZone.