As you may recall from my previous post, Knowledge Network (KN) is about unlocking tacit knowledge in the enterprise (versus explicit knowledge). Glen Anderson, Group Product Manager, presented an “under the hood” view of KN, following on the heels of John Hand’s overview presentation yesterday (ref).
The first topic discussed was client profile creation, which is accomplished in four phases: select information (i.e., the data sources to analyze, primarily Outlook folders), run analysis, review profile, and publish profile. Client profile creation follows the Notification-Control-Consent (NCC) privacy model I relayed yesterday, which is also shown graphically here:
The initial analysis phase (#2 within the overall client profile creation process) also consists of four phases as follows:
- Read each email. Essentially only the first paragraph of the email body is parsed by KN; attachments are not scanned.
- Capture key “interaction data” in a local Access database.
- Read in contacts from Outlook and IM clients.
- Sync colleagues from the SharePoint Portal Server 2007 profile.
2. Contact resolution
- Lookup contacts against the Global Address List (GAL) (MAPI).
- Internal or external?
- Is the contact a distribution list? If so, discard.
- Capture key properties.
- Aggregate counts.
- Check thresholds.
- Calculate strength.
- “Exclusion lists”
- Special rules
- Limits applied
- Organization name mapping for external contacts
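To make the contact resolution, counting, and filtering steps above more concrete, here is a minimal Python sketch. It is my own illustration, not KN's implementation: the session did not disclose the actual thresholds or strength formula, so `min_count`, `limit`, the "share of all interactions" strength measure, and every name in the code are assumptions, and a plain dictionary stands in for the GAL lookup.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    address: str
    is_internal: bool            # True if found in the directory (GAL stand-in)
    is_distribution_list: bool   # distribution lists get discarded


def resolve_contacts(addresses, directory):
    """Look up each address in the directory and drop distribution lists."""
    resolved = []
    for address in addresses:
        entry = directory.get(address)
        if entry is None:
            # Not in the directory: treat the contact as external.
            contact = Contact(address, False, False)
        else:
            contact = Contact(address, True, entry["is_dl"])
        if not contact.is_distribution_list:
            resolved.append(contact)
    return resolved


def score_contacts(interactions, directory, min_count=2,
                   exclusions=frozenset(), limit=10):
    """Aggregate counts, apply a threshold and exclusions, return top contacts.

    Strength here is simply a contact's share of all interactions with
    resolved contacts (an assumed formula, chosen only for illustration).
    """
    counts = Counter(address.lower() for address in interactions)
    contacts = resolve_contacts(counts, directory)
    total = sum(counts[c.address] for c in contacts)
    scored = []
    for c in contacts:
        n = counts[c.address]
        if n < min_count or c.address in exclusions:
            continue  # below threshold or on the exclusion list
        scored.append((c.address, n / total))
    # Strongest first; ties broken alphabetically so output is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


# Hypothetical sample data: one internal user, one distribution list,
# and two external contacts.
directory = {
    "alice@corp.example": {"is_dl": False},
    "all-staff@corp.example": {"is_dl": True},
}
interactions = (
    ["alice@corp.example"] * 3
    + ["all-staff@corp.example"] * 5
    + ["bob@partner.example"] * 2
    + ["carol@other.example"]
)
result = score_contacts(interactions, directory)
```

With this sample data the distribution list is discarded despite having the most messages, and the contact with a single interaction falls below the assumed threshold, leaving only the first two individual contacts in `result`.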
KN analysis is captured locally in the KNClient.log and MDB files located within %UserProfile%\Local Settings\Application Data\Microsoft\Knowledge Network. There are two Access databases on the local client machine: one holds the raw data (i.e., parts of emails), and the other reflects your actual profile choices.
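If you want to see what the client has written to that folder, a small script along these lines should do it. This is only a sketch: the helper names are mine, and I am assuming nothing about the MDB file names beyond their extension.

```python
import os
from pathlib import Path

# Folder layout as described in the session, relative to %UserProfile%.
KN_SUBDIR = Path("Local Settings", "Application Data",
                 "Microsoft", "Knowledge Network")


def kn_data_dir(user_profile=None):
    """Build the KN data folder path under the given (or current) profile."""
    base = user_profile or os.environ.get("USERPROFILE", "")
    return Path(base) / KN_SUBDIR


def list_kn_files(data_dir):
    """Return the names of .log and .mdb files in the folder, sorted."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []  # KN client not installed, or analysis never run
    return sorted(
        p.name for p in data_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".log", ".mdb"}
    )


if __name__ == "__main__":
    for name in list_kn_files(kn_data_dir()):
        print(name)
```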
After talking about client profile creation, Glen went on to address some of the top questions customers have raised about KN.
Why client-side mining?
- Privacy – Nothing leaves the user’s machine until they “publish.”
- Access to information – PSTs, future data sources–I’d certainly like to understand what some of these future data sources might be.
- Distribute the processing.
Why not mine sources other than email?
- Email is by far the richest and most pervasive source today.
- Calculating strength across different data sources adds complexity–I’d like to understand the nature of this complexity in more detail.
How long does the analysis process take?
- Depends on a number of factors (e.g. the amount of email and number of unique contacts; disk performance, RAM, and CPU; and user activity, since the analysis process runs at low priority).
- Microsoft has measured anywhere from 5 minutes (3K emails) to 12 hours (120K emails); further optimization is planned prior to release.
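As a quick back-of-the-envelope check on those two measurements, the arithmetic below compares the implied processing rates. Keep in mind these are just two data points quoted in the session, and the machines and mailboxes behind them may differ, so I would not read this as a scaling law.

```python
def emails_per_minute(email_count, minutes):
    """Average processing rate implied by one measurement."""
    return email_count / minutes


small = emails_per_minute(3_000, 5)            # 600 emails per minute
large = emails_per_minute(120_000, 12 * 60)    # roughly 167 emails per minute
slowdown = small / large                       # roughly 3.6x slower per email

print(f"{small:.0f} vs {large:.0f} emails/min, slowdown {slowdown:.1f}x")
```

In other words, the larger mailbox was processed at a noticeably lower per-email rate, which is consistent with the other factors Glen listed (unique contacts, hardware, and user activity) mattering alongside raw email volume.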
The rest of Glen’s presentation covered SharePoint server integration (e.g. mention of a KN Profile Management Web Service for publishing client-side, Access-based data to the server-side KN store), privacy and anonymous brokering (i.e. a process of connecting seekers and targets anonymously in a manner reminiscent of referrals on LinkedIn), deployment and administration, and extensibility.
On the last subject of extensibility, there is a managed API on the client-side to access the KN database. On the server-side, there are at least two web services: one that exposes a full-fledged query language for expertise/social network people search and another to retrieve (read) and augment (write) profile data (e.g. bootstrap the profile or add custom keywords and contacts). It will be interesting to examine the WSDL of these web services (e.g. via the KN SDK) to understand how to potentially introduce other data sources into the KN system beyond email (e.g. content subscriptions via feeds, authoring trend data from content repositories, discussion thread contributions in collaboration stores, etc.).
Today’s KN session finished with a fair amount of Q&A, indicating clear interest in the product.
For more KN information, stay tuned here. I also recommend visiting the KN team blog.