This documentation is outdated. For the latest version of this software, please go to the Yoshikoder Homepage. (The latest version has built-in help pages).
Consequently this page will disappear soon...
Will Lowe, September 2007
Yoshikoder is a simple multilingual content analysis engine. Below is a short tutorial to get you going. After the tutorial is a set of notes on particular aspects of the program. Finally there is a frequently asked questions list.
[This documentation is unfinished]
When Yoshikoder opens, you should see a three-panel window. On Windows it looks like this:
The top left panel headed 'Dictionary' will contain whatever content dictionary you are currently working with. The name of the dictionary appears as the root node of a tree next to a blue book icon in the panel. Yoshikoder starts by presenting an empty dictionary called 'Untitled' that contains no categories.
The top right panel is the document panel, and contains the document you are currently working with. Yoshikoder starts without a document, so the label says 'No Document'. When a document is loaded this label will reflect your document's name.
The bottom panel holds concordances, sometimes called keywords-in-context.
Sometimes it is useful to make one of these panels, usually the concordance panel, larger. All the divisions between panels are movable. Rearrange them to suit your data by dragging the separators.
First we'll make a new dictionary. On the menu bar click on File-New-Dictionary, and enter a new name e.g. 'Foo'.
A dictionary is not much use without categories. Now click on File-New-Category to add a content category. You will be presented with a dialog box that looks like this:
You must choose a name for the new category and type it into the upper box at the prompt. It is also possible, though not required, to associate this new category with a numerical score. The purpose of the score is to place the category on a numerical scale. Every piece of text that falls into this category will then be associated with this number. We shall see more of this feature later when we come to the reporting functions. For the moment, you might leave the score empty.
After you press 'OK', the dictionary panel changes to show your new category as a leaf node. Feel free to add some more categories to the dictionary. You can add categories to the dictionary root node, or to existing categories to make sub-categories, and sub-sub-categories.
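The nesting of categories described above can be pictured as a tree. What follows is a minimal sketch in Python, not Yoshikoder's own code: the Category class, its fields, and the outline helper are hypothetical names chosen for illustration, and the category names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    """A content category (hypothetical model, not Yoshikoder's classes)."""
    name: str
    score: Optional[float] = None  # the optional numerical score
    patterns: List[str] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)

    def add(self, child: "Category") -> "Category":
        """Attach a sub-category and return it so it can be extended."""
        self.children.append(child)
        return child


def outline(node: Category, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one line per category."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# The root node stands for the dictionary itself.
foo = Category("Foo")
emotions = foo.add(Category("Emotions"))
emotions.add(Category("Positive Emotions"))
emotions.add(Category("Negative Emotions"))
print("\n".join(outline(foo)))
```

The printed outline mirrors what the dictionary panel shows: the dictionary name at the root, categories beneath it, and sub-categories indented one level further.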
Constantly pulling down menus to create new categories can be tiresome, so you can use two other methods. Right-clicking (on Windows) or control-clicking (on Mac OS X) whilst inside the dictionary panel will launch a popup menu you can use instead of the main menu. Alternatively you can use the toolbar buttons located just below the menu bar. Pressing the first button will launch the new category dialog.
Categories in your dictionary represent the conceptual structure of your content domain, but they are not yet connected to text. To connect a category to some text, we need to define patterns. To create a pattern, select a category node (not the root node because that is the dictionary itself), and click on File-New-Pattern. Alternatively, you can simply press the second button on the toolbar. You will be presented with a dialog that looks like this:
Now type in a word you think indicates the presence in text of the concept named by your category. For example, if your category was called 'Positive Emotions' then 'love' might be a suitable pattern. Feel free to add more patterns to any of the categories in the dictionary.
Sometimes Yoshikoder will complain that it 'cannot compile' what you have typed into the pattern dialog. This is because it is expecting a 'regular expression', rather than just a simple string. Regular expressions form a pattern-matching language that allows you to specify more than one word or phrase in a single pattern, rather like the Windows 'wildcard' characters, except much more powerful. Regular expressions are well worth taking the time to learn about, and you can read about how to use them below. For now, to avoid any 'cannot compile' problems, you can just avoid putting punctuation in your patterns.
You can create patterns in uppercase or lowercase and get the same matches in text: Yoshikoder will treat "Content", "CONTENT", and "content" as matching all the same words in a document.
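To illustrate both points, here is a small sketch using Python's re module as a stand-in. Yoshikoder's own regular expression syntax may differ in detail, and compile_pattern is a hypothetical helper, not part of the program. The sketch shows why stray punctuation can fail to compile and how one pattern can match every capitalisation of a word.

```python
import re


def compile_pattern(text):
    """Compile a pattern case-insensitively; return None if it is invalid."""
    try:
        return re.compile(text, re.IGNORECASE)
    except re.error:
        return None


# An unbalanced parenthesis is not a valid regular expression.
broken = compile_pattern("love(")

# Escaping the punctuation turns it into a literal string to match.
literal = compile_pattern(re.escape("love("))

# One pattern matches every capitalisation of the same word.
content = compile_pattern("content")
hits = content.findall("Content, CONTENT and content")
print(broken, hits)
```

Here broken is None because the pattern could not be compiled, whereas the escaped version compiles and matches the literal text.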
Now that there's some structure to your dictionary, we can load a document and analyse it according to your new category system.
From the menu bar click on Document-Open Document and pick a document from your files. This document must be a text file. Yoshikoder cannot read proprietary document formats such as Microsoft Word. However most software, including MS Word, can save your document as text.
After you've chosen a file an encoding window will appear so you can make sure it is in the right form to be worked with. Below is an encoding dialog for a document containing a short poem in Russian.
On the left you see the first section of the chosen document. On the right are settings for the document encoding and a choice of fonts to display it.
For English-language documents, it will seldom matter which encoding you choose from the list on the right. However, if you are dealing with other languages you may at first see nonsense characters in the preview. This is because Yoshikoder is set to expect the wrong document encoding. You can correct this by clicking on the name of the correct encoding, working through several possible encodings until the text looks right. Then press 'OK'. Yoshikoder will now expect future documents to be in your chosen encoding.
The font list is present because not all fonts can display all characters. For example, not many fonts can represent all of Simplified Chinese, and some even have trouble with German and Russian.
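The effect of picking the wrong encoding can be reproduced outside Yoshikoder. The sketch below is an illustration in Python, not a description of how Yoshikoder works internally: it decodes the same bytes with several candidate encodings, much as you would click through the list. The sample word, the shortlist of encodings, and the preview helper are all assumptions made for the example.

```python
# Bytes of a short Russian word, as they might sit in a KOI8-R text file.
raw = "привет".encode("koi8_r")

# Hypothetical shortlist of candidate encodings to try, in order.
candidates = ["utf-8", "latin-1", "cp1251", "koi8_r"]


def preview(data, encoding):
    """Decode data with one encoding; return None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for name in candidates:
    print(name, "->", preview(raw, name))
```

Some wrong encodings reject the bytes outright, while others decode them without complaint into nonsense characters. Only the encoding the file was actually saved in gives back readable text, which is why inspecting the preview by eye is the practical test.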
There is a longer discussion of document encodings in the sections below. For now we shall assume that you are successfully viewing your document in the document window.
Now that you have a dictionary and document, you can see where your patterns occur. Select a pattern and click on View-Highlight. You can also press the magnifying glass button on the toolbar. Every place where the pattern matches a piece of text in the document will be highlighted in yellow.
If you select a category to highlight, all the patterns that appear underneath that category node in the dictionary panel will be highlighted, including those patterns in subcategories.
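Highlighting a category amounts to gathering every pattern at or below that node and finding where each one matches. A minimal sketch of that idea in Python follows. The nested-dict layout and the names all_patterns and match_spans are hypothetical, and the sketch matches plain substrings, which is a simplification that may not reflect how Yoshikoder delimits words.

```python
import re

# Hypothetical nested dictionary: each category holds patterns and subcategories.
dictionary = {
    "name": "Emotions",
    "patterns": [],
    "children": [
        {"name": "Positive", "patterns": ["love", "joy"], "children": []},
        {"name": "Negative", "patterns": ["hate"], "children": []},
    ],
}


def all_patterns(category):
    """Collect the patterns of a category and of every category beneath it."""
    found = list(category["patterns"])
    for child in category["children"]:
        found.extend(all_patterns(child))
    return found


def match_spans(category, text):
    """Return sorted (start, end) character spans that would be highlighted."""
    spans = []
    for pattern in all_patterns(category):
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append(m.span())
    return sorted(spans)


text = "Love and hate, joy and love."
spans = match_spans(dictionary, text)
print(spans)
```

Calling match_spans on the 'Positive' subcategory alone would return only the spans for 'love' and 'joy', just as selecting a lower node in the dictionary panel narrows what is highlighted.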
It is often useful when constructing a dictionary to be able to see the local context of particular patterns. This is often called keyword-in-context. The list of patterns and their contexts is called a concordance. To examine a concordance for a pattern, select that pattern and click on View-Concordance. You can also press the rightmost button on the toolbar.
The concordance panel will then fill up with all the contexts of that pattern. Yoshikoder looks a fixed number of characters to either side of the pattern to create a concordance. You can change this number in the preferences panel by clicking View-Preferences.
If you select a category and create a concordance, you will get a concordance for all the patterns beneath that category, including those in subcategories.
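The idea of a fixed character window on either side of each match can be sketched in a few lines of Python. This is an illustration only, not Yoshikoder's implementation: the concordance function, its window parameter, and the sample sentence are assumptions made for the example.

```python
import re


def concordance(pattern, text, window=20):
    """Return (left, match, right) triples with `window` characters per side."""
    rows = []
    for m in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, m.start() - window):m.start()]
        right = text[m.end():m.end() + window]
        rows.append((left, m.group(), right))
    return rows


text = "We love low taxes. They love high spending."
rows = concordance("love", text, window=8)
for left, hit, right in rows:
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>8} [{hit}] {right}")
```

A concordance for a whole category would simply repeat this for each pattern beneath the selected node and combine the rows. A larger window shows more context per line at the cost of a wider display.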
Below is an example of Yoshikoder's concordance and highlighting capabilities. The dictionary is a translation of the dictionary used in Laver M. and J. Garry (2000) American Journal of Political Science 44.3, 619-34; the document is the 1997 manifesto for the British Conservative Party, available from the Manifesto Coding Project, and from http://www.politics.tcd.ie/kbenoit/wordscores/