CAFCA employs the methods of group compatibility and component compatibility to run a cladistic analysis of a data matrix. The data matrix may be binary, multi-state, or mixed binary/multi-state. Missing values are allowed and should be indicated by a negative integer or a question mark. Multi-state characters may be expressed in binary form.
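As a rough illustration of what expressing a multi-state character in binary form can look like, the following Python sketch expands one multi-state column into one binary column per state. The function names (`expand_multistate`, `is_missing`) and the one-column-per-state layout are assumptions made for this example; they are not taken from CAFCA, whose own coding conventions may differ.

```python
MISSING = -1  # the text allows a negative integer or "?" for missing values


def is_missing(value):
    """Return True for "?" or any negative integer (hypothetical helper)."""
    return value == "?" or (isinstance(value, int) and value < 0)


def expand_multistate(column, states=None):
    """Expand one multi-state column into one binary column per state.

    Returns (rows, states). rows[i][j] is 1 if taxon i shows states[j],
    0 if it shows another state, and MISSING if the entry is missing.
    This is an illustrative, unordered coding, not CAFCA's own routine.
    """
    if states is None:
        states = sorted({v for v in column if not is_missing(v)})
    rows = []
    for value in column:
        if is_missing(value):
            rows.append([MISSING] * len(states))
        else:
            rows.append([1 if value == s else 0 for s in states])
    return rows, states


# One character with states 0, 1, 2 scored for five taxa, one entry missing.
rows, states = expand_multistate([0, 1, 2, "?", 1])
for row in rows:
    print(row)
```

The block of columns produced for one character has to be kept together in any later analysis, since the columns are not independent of each other.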
CAFCA uses a so-called column partitioning vector to indicate which columns belong together as a block. Each block of binary states represents one multi-state character, thus avoiding the errors that are introduced when each state of a multi-state character is treated as a separate nominal variable. Polymorphism can be indicated, but only in the binary expression of multi-state characters. Characters may be polarized by indicating the (putative) ancestral state (zero), polarized and ordered by means of a (partial) additive binary coding, kept neutral (no polarity, no order), or, for multi-state characters only, polarized and linearly ordered upon request. CAFCA has no options for step matrices. All taxa may belong to the ingroup, or one or more outgroups may be included, either indicated (interactively) by the user or deduced from the data matrix (a full zero row). The data matrix must be supplied as an ASCII file, be present in the OutputFile system, or be entered with CAFCA's built-in editor.
The group- and component-compatibility method is based on the idea that each cladogram has components as its building blocks, i.e. sets of terminal taxa corresponding to the nodes in a cladogram. Any two components stand in one of four possible relations: exclusion, inclusion, overlap, or replication. Components are compatible when they include, exclude, or replicate each other; overlapping components are incompatible. All components (taxon subsets) of a cladogram are mutually compatible. Components of a cladogram can be seen as nodes of a graph, connected by lines depicting the compatibility relation. Thus the cladogram corresponds to a clique, a maximally connected [sub]graph in which every pair of nodes is joined.
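The four relations between components can be made concrete with a small Python sketch that treats each component as a set of taxon names. The function names `relation` and `compatible` are chosen for this illustration only and are not part of CAFCA.

```python
def relation(a, b):
    """Classify the relation between two components (sets of taxa)."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "replication"   # identical taxon sets
    if a < b or b < a:
        return "inclusion"     # one set is a proper subset of the other
    if a.isdisjoint(b):
        return "exclusion"     # no taxa in common
    return "overlap"           # some, but not all, taxa shared


def compatible(a, b):
    """Two components are compatible unless they overlap."""
    return relation(a, b) != "overlap"


print(relation({"A", "B"}, {"A", "B", "C"}))
print(relation({"A", "B"}, {"C", "D"}))
print(relation({"A", "B"}, {"B", "C"}))
print(compatible({"A", "B"}, {"B", "C"}))
```

Only the overlap case is rejected: two groups that share some taxa while each also holds taxa the other lacks cannot both be nodes of the same cladogram.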
CAFCA starts by extracting components from character data. Components can be defined in terms of character states in several ways. They may be defined by unique character states, by unique combinations of character states (none of the separate states needs to be unique, but the combination is), by using all possible transformation series (additive binary codings) in multi-state characters, by three-taxon statement permutations, or by polythetic sets of character states. Components thus defined are seen as nodes in a graph, connected when they are compatible. The graph is searched for maximally connected subgraphs (cliques) by a branch-and-bound algorithm. Each clique corresponds to a cladogram. The characters from the data matrix are optimized on each of the cladograms found (parsimony mapping), and the most parsimonious cladograms are selected as the most likely representation of the cladistic relationships of the taxa involved.
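The first two steps of this procedure can be sketched in Python under simplifying assumptions: the matrix is binary with 0 as the ancestral state, components are taken only from single columns, and maximal cliques are enumerated with a plain Bron-Kerbosch recursion rather than the branch-and-bound search mentioned above. All function names are hypothetical, and the parsimony mapping step is left out.

```python
def components_from_columns(matrix, taxa):
    """Collect the taxon sets that share state 1 in a column (0 = ancestral).

    Only groups of at least two taxa are kept (an assumption of this sketch).
    """
    comps = set()
    for j in range(len(matrix[0])):
        members = frozenset(t for t, row in zip(taxa, matrix) if row[j] == 1)
        if len(members) > 1:
            comps.add(members)
    return comps


def compatible(a, b):
    """Compatible means inclusion, replication, or exclusion (no overlap)."""
    return a <= b or b <= a or a.isdisjoint(b)


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            yield frozenset(current)
            return
        for node in list(candidates):
            neighbours = adjacency[node]
            yield from expand(current | {node},
                              candidates & neighbours,
                              excluded & neighbours)
            candidates = candidates - {node}
            excluded = excluded | {node}
    yield from expand(set(), set(adjacency), set())


def candidate_cladograms(matrix, taxa):
    """Return each maximal set of mutually compatible components."""
    comps = components_from_columns(matrix, taxa)
    adjacency = {c: {d for d in comps if d != c and compatible(c, d)}
                 for c in comps}
    return set(maximal_cliques(adjacency))


taxa = ["A", "B", "C", "D"]
matrix = [
    [1, 1, 0, 0],  # A
    [1, 1, 1, 0],  # B
    [0, 1, 1, 1],  # C
    [0, 0, 0, 1],  # D
]
for clique in candidate_cladograms(matrix, taxa):
    print(sorted(sorted(component) for component in clique))
```

In this toy matrix the four columns yield the components {A,B}, {A,B,C}, {B,C}, and {C,D}. Each clique returned is one set of mutually compatible components, i.e. one candidate cladogram that would then be scored by mapping the characters onto it.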
Note that component compatibility is, in general, not the same as character compatibility. In component compatibility, groups of taxa may be based on partial agreement of characters, something that is forbidden in character compatibility. As a consequence, more, better resolved, and more parsimonious cladograms can be found by the component-compatibility method. In general, CAFCA can be used to analyse any pattern generated by historically associated lineages, be it genes (characters) in taxa (taxa as areas for genes), parasites on hosts (hosts as areas for parasites), or taxa and their areas of endemism or biotas. In all these instances the input data are generated from a cladogram describing the cladogenetic relationships of the genes, taxa, or parasites involved, and a binary matrix describing the distributions of these entities over their associates, i.e., taxa, areas, and hosts, respectively. The method employed in all these cases, i.e., component compatibility, is identical to the one described above for character data.