Frequently asked questions
- General questions
- Other tools
Silvestris is a project of free/libre/open-source libraries and web applications related to computer-assisted translation (CAT) and translation memories (TM).
We want to use the European Union Public License (EUPL) as much as possible, currently version 1.1 (see also our comments about EUPL 1.2). Unfortunately, the EUPL has some incompatibilities with other licenses, and of course with those of commercial software, which we do not want to exclude from the clients we support. For this reason, some of our projects use other licenses, and some are closed source because they are linked to non-public APIs. But they remain available free of charge.
You may consider that our project is not totally free of charge, since there is a paid hosting service. But this service is absolutely not mandatory: if you can install a server yourself, you do not need it. If, on the other hand, it is a real help to you, don't forget that all of this has a cost and that this service is our only way of financing it.
Silvestris, in Latin, is an adjective related to wood (silva). It can also mean wild or savage. Originally the plan was to create a free web CAT tool, which could be understood as a wild CAT, hence the reference to Felis silvestris, the wild version of your familiar cat. When we finally decided to go step by step, creating several smaller tools, we kept the word silvestris to designate the whole project, and for the subprojects we use Latin names from the official international taxonomy of animals. For example, Cyclotis refers to the African forest elephant (Loxodonta cyclotis), which is not the best-known species of elephant but seemed to reflect the spirit of a short-term translation memory well.
First, you can open an account on this web site and participate in the forum. We know perfectly well that this was not possible until very recently: the reason is that this web site received a lot of spammer accounts (more than 100 a day!), so we had to temporarily switch off registration until we could find a solution. And this temporary situation finally lasted a long time. We have now installed an anti-spam plugin for Drupal, which reduces the problem but does not completely remove it. As a consequence, we adopt the following rule: please register only when you plan to use the forum, because if you register and post nothing, your account will be included in the list of spamming accounts, which are regularly deleted.
For technical contributions, you may also register on the forge, which has never been closed to registration (it is not affected by the spamming problem). However, the forge does not always send e-mail notifications, so if you receive no answer, consider using the forum as well.
The two projects are maintained by the same person, but not under the same conditions. DGT-OmegaT is a project of the European Commission, containing what the Commission decides to put in, while Silvestris is a personal project, developed at home. The fact that Silvestris uses the EUPL has nothing to do with the link between the two projects: it is simply the license I prefer.
Another link between the two projects is that what DGT-OmegaT calls TeamBase is nothing more than a customized version of Silvestris Cyclotis. Except for the name, most of what you can read on the DGT-OmegaT web site about TeamBase is perfectly valid for Cyclotis, so instead of repeating the same things in two places, we invite you to read both web sites. Also, DGT-OmegaT already includes the Cyclotis plugin, while OmegaT would need some adaptations (the code for them is already available). It is probably a good idea to test Cyclotis with DGT-OmegaT before we open an RFE (request for enhancement) for OmegaT to get the patches included in the original version as well.
The Cyclotis database is optimized to reply fast, for both reading and writing, but only as long as the table contents remain small. In our experience after years of use, performance starts to decrease seriously beyond around 10 000 segments and becomes quite unacceptable beyond around 40 000. If you don't delete your memories, you will certainly reuse them in other projects, they will keep growing, and you will eventually complain about performance. For long-term memories, we would still use PostgreSQL, but with a different table schema organization. And it would be better to use it in batch mode, that is, to send and receive many segments in one step rather than trying to answer in real time.
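To illustrate what "batch mode" means here, the following minimal sketch stores a whole list of segments in one transaction with a single `executemany` call, and retrieves many translations with a single query, instead of one round-trip per segment. It uses Python's built-in `sqlite3` module purely as a stand-in for PostgreSQL; the `segments` table and the helper names are hypothetical and do not reflect the actual Cyclotis or Elefas schema.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

Segment = Tuple[str, str]  # (source, target)


def create_memory(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema: one row per translation unit
    conn.execute(
        "CREATE TABLE IF NOT EXISTS segments (source TEXT NOT NULL, target TEXT NOT NULL)"
    )


def store_batch(conn: sqlite3.Connection, segments: Iterable[Segment]) -> int:
    """Insert many segments in a single transaction (batch mode)."""
    rows = list(segments)
    with conn:  # commits once at the end, or rolls back on error
        conn.executemany("INSERT INTO segments (source, target) VALUES (?, ?)", rows)
    return len(rows)


def fetch_batch(conn: sqlite3.Connection, sources: Iterable[str]) -> Dict[str, str]:
    """Retrieve translations for many source segments in one query."""
    wanted = list(sources)
    if not wanted:
        return {}
    # One placeholder per value; very large batches would need to be split,
    # since SQLite limits the number of parameters per statement
    placeholders = ", ".join("?" for _ in wanted)
    cur = conn.execute(
        f"SELECT source, target FROM segments WHERE source IN ({placeholders})", wanted
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_memory(conn)
    store_batch(conn, [("Hello", "Bonjour"), ("Thank you", "Merci")])
    print(fetch_batch(conn, ["Hello", "Goodbye"]))  # {'Hello': 'Bonjour'}
```

The design point is simply that the cost of a connection round-trip and a commit is paid once per batch rather than once per segment.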
Using PostgreSQL's EXPLAIN command, we discovered that indexes were used by SELECT queries only when there were more than approximately 10 000 segments. However, indexes are updated on every write, meaning that UPDATE and INSERT queries become slower while SELECT queries get no faster. That is one of the reasons why we think that Cyclotis should remain a short-term database (see previous answer): for a long-term database we would use different optimizations, and indexing is one of them, since (in the case of PostgreSQL) it is only efficient on big databases.
Another reason why it is difficult to use indexes in Cyclotis is that it makes extensive use of table inheritance. This kind of inheritance does not apply to indexes: you must create indexes on the child tables that actually contain data, and when a table has many children, the database cannot do better than exploring each child table and possibly the index dedicated to it. This is another reason why indexes are globally inefficient for Cyclotis, while for Elefas, which uses a different schema organization, things are really different.
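Because, with PostgreSQL table inheritance, an index created on a parent table does not cover its child tables, each child that actually holds data needs its own index. The following sketch only generates the corresponding `CREATE INDEX` statements, one per child table; the table names, the `source` column and the index naming scheme are hypothetical, while `gist_trgm_ops` and `gin_trgm_ops` are the operator classes provided by the `pg_trgm` extension.

```python
from typing import List

# Trigram operator classes shipped with the pg_trgm extension
TRGM_OPCLASS = {"gist": "gist_trgm_ops", "gin": "gin_trgm_ops"}


def child_index_ddl(children: List[str], column: str = "source", method: str = "gist") -> List[str]:
    """Build one trigram index statement per child table (indexes are not inherited).

    Assumes plain, already-safe identifiers: no quoting or escaping is done here.
    """
    if method not in TRGM_OPCLASS:
        raise ValueError(f"unsupported index method: {method}")
    opclass = TRGM_OPCLASS[method]
    return [
        f"CREATE INDEX {child}_{column}_trgm_idx ON {child} USING {method} ({column} {opclass});"
        for child in children
    ]


if __name__ == "__main__":
    for statement in child_index_ddl(["memory_en_fr", "memory_en_de"]):
        print(statement)
```

Every new child table would need the same treatment, which is part of the maintenance cost described above. Passing `method="gin"` produces the GIN variant instead.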
Fuzzy matches in Cyclotis make use of the PostgreSQL extension "pg_trgm", for which I do not know any equivalent in MySQL or other open-source servers. If you want to port this service to another server, either you find an equivalent or you will have to limit its capabilities, for example by using only exact searches or by retrieving all results. Glossary search uses the tsearch2 module and is not portable for the same reason.
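For anyone evaluating a port, the following pure-Python sketch approximates the similarity measure described in the `pg_trgm` documentation: the text is lowercased and split into words on non-alphanumeric characters, each word is padded with two leading spaces and one trailing space, and the similarity is the number of shared trigrams divided by the number of distinct trigrams of both strings. It is not the extension's actual code, which differs in implementation details; the 0.3 cut-off mirrors pg_trgm's default similarity threshold, and `fuzzy_search` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Set, Tuple


def trigrams(text: str) -> Set[str]:
    """Return the set of trigrams of text, in the style of pg_trgm."""
    result: Set[str] = set()
    # Words are runs of letters or digits; everything else separates words
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


def fuzzy_search(query: str, segments: Iterable[str], threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Hypothetical helper: segments scoring above the threshold, best first."""
    scored = [(seg, similarity(query, seg)) for seg in segments]
    matches = [pair for pair in scored if pair[1] > threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    memory = ["Open the file", "Close the window", "Open the files"]
    print(fuzzy_search("Open the file", memory))
    # [('Open the file', 1.0), ('Open the files', 0.8125)]
```

Scanning every segment like this is exactly the "retrieve all results" fallback mentioned above: it works anywhere, but without an index it gets slower as the memory grows.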
We did successfully run the OmegaT project mode on other databases, since this mode uses only exact search, which is standard SQL. So we know that, at least for this service, it is possible. But we do not provide a plugin for it, because it would only be valid for this mode, and we may add other optimizations in the future that would make portability impossible.
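The portability argument rests on the fact that an exact match is a plain equality test, which any SQL database can evaluate. The sketch below runs such a lookup on SQLite through Python's standard `sqlite3` module; the `memory` table and the `exact_match` helper are hypothetical, and the same parameterized query should work with other DB-API drivers, although their placeholder style may differ (for example `%s` with psycopg2).

```python
import sqlite3
from typing import Optional


def exact_match(conn: sqlite3.Connection, source: str) -> Optional[str]:
    """Return a stored translation of source, or None (exact search only)."""
    # Plain equality in standard SQL: no pg_trgm or other extension required
    row = conn.execute("SELECT target FROM memory WHERE source = ?", (source,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memory (source TEXT NOT NULL, target TEXT NOT NULL)")
    conn.execute("INSERT INTO memory VALUES (?, ?)", ("Save", "Enregistrer"))
    print(exact_match(conn, "Save"))     # Enregistrer
    print(exact_match(conn, "Save as"))  # None
```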
Most computer-assisted translation tools are proprietary and don't offer a plugin API, especially for translation memories or direct access to the project or editor (while they usually do for machine translation). This is probably because most of them also promote their own server solution for translation memories. However, if you think that you are able to implement such a plugin for your favorite CAT tool, please tell us, and we will be happy to give you all the information about Cyclotis (such as our server protocol) needed to do so. You can find a short documentation here, but it is far from complete.
These libraries are intended to be used in the most common CAT tools, and most CAT GUIs are written either in Java or in .NET: the idea is to make it possible to write plugins that use the Ruby classes rather than rewriting them in the target programming language. It may be slower, but it will ensure perfect compatibility between all platforms, even if they change over time.
Except perhaps Culter, most of them should be considered highly experimental for the moment. Only Cyclotis is really mature.
When you create a database schema, you generally add indexes: this sounds perfectly natural. But in the case of Cyclotis, to our great surprise, we discovered that most of the time SELECT queries did not use the indexes (they do not appear in the EXPLAIN output), even when they exist, until the database grows beyond a certain size. So, in this case, indexes have a cost during UPDATE without any gain during SELECT.
Most translation projects stay below this size. So we consider that a database which is created for one translation and destroyed immediately afterwards has no reason to be indexed, unless we finally receive an explanation of why the indexes were not used...
Note: indexing a Cyclotis memory is possible, but not enabled by default. On the SaaS server, the rule is that memories are not indexed at creation, but they are automatically indexed when you renew them (because when you renew a memory, we expect it to be big enough by then to benefit from the indexes).
Another difference is that Cyclotis makes use of GiST indexes, while Elefas uses GIN indexes. According to the PostgreSQL documentation, GiST indexes are better suited to frequently updated data, while GIN indexes are better suited to more static data, and this is precisely the biggest difference between Cyclotis and Elefas!
Even with indexes, a big database can perform poorly if hundreds of users are connected at the same time. For that reason, we found a compromise: the big database, which contains everything, should only be accessed in batch mode; data are then extracted to a Lucene index, which can handle a high number of users (since it is local and read-only, the "connection" does not even consume any resources).
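The principle of extracting data once, in batch, into a local read-only index can be sketched without Lucene. The toy inverted index below maps each lowercased word to the segments containing it and answers a query by intersecting those sets. It only stands in for Lucene to show the idea (built once from an export, then queried locally without any database connection); the class and method names are hypothetical, and a real Lucene index adds analyzers, relevance scoring and on-disk storage.

```python
import re
from types import MappingProxyType
from typing import Dict, FrozenSet, Iterable, List, Mapping, Set


def _words(text: str) -> List[str]:
    # Same simple tokenization as elsewhere: lowercase runs of letters or digits
    return re.findall(r"[^\W_]+", text.lower())


class ReadOnlySegmentIndex:
    """Toy inverted index, built once from a batch export and then only read."""

    def __init__(self, segments: Iterable[str]) -> None:
        self._segments: List[str] = list(segments)
        postings: Dict[str, Set[int]] = {}
        for seg_id, text in enumerate(self._segments):
            for word in _words(text):
                postings.setdefault(word, set()).add(seg_id)
        # Freeze the postings so the index cannot be modified after the build
        self._postings: Mapping[str, FrozenSet[int]] = MappingProxyType(
            {word: frozenset(ids) for word, ids in postings.items()}
        )

    def search(self, query: str) -> List[str]:
        """Return segments containing every word of the query, in export order."""
        words = _words(query)
        if not words:
            return []
        ids = set(self._postings.get(words[0], frozenset()))
        for word in words[1:]:
            ids &= self._postings.get(word, frozenset())
        return [self._segments[i] for i in sorted(ids)]


if __name__ == "__main__":
    index = ReadOnlySegmentIndex(["Open the file", "Close the file", "Open a window"])
    print(index.search("open file"))  # ['Open the file']
```

Since the index never changes after it is built, any number of readers can query it concurrently without the locking or connection costs of the central database.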