Commit Graph

29 Commits

Author SHA1 Message Date
Evgeniy A. Dushistov
b77c0e793a replace deprecated g_pattern_match_string function 2022-06-24 21:34:47 +03:00
Evgeniy A. Dushistov
ebaa6f2136 clang-format for stardict_lib.cpp 2022-06-24 21:34:47 +03:00
Aleksa Sarai
4a9b1dae3d stardict_lib: remove dead poGet{Current,Next,Pre}Word iterators
They aren't used at all by sdcv, and thus aren't tested. This makes
adapting the core lookup algorithms complicated: these methods depend on
them, but because they aren't tested there's no real way of knowing
whether a change has broken them.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-11-14 22:38:26 +03:00
Aleksa Sarai
6d385221d0 lookup: return all matching entries found during lookup
Previously, we would just return the first entry we found that matched
the requested word. This causes issues with dictionaries that have lots
of entries which can be found using the same search string. In these
cases, the user got a completely arbitrary word returned to them rather
than the full set.

While this may seem strange, this is incredibly commonplace in Japanese
and likely several other languages. In Japanese:

 * When written using kanji, the same string of characters could refer
   to more than one word, each of which may have a completely different
   meaning.
   Examples include 潜る (くぐる、もぐる) and 辛い (からい、つらい).

 * When written in kana, the same string of characters can also refer to
   more than one word which is written using completely different kanji,
   and has a completely different meaning. Examples include きく
   (聞く、効く、菊) and たつ (立つ、建つ、絶つ).

In both cases, these are different words in every sense of the word, and
have separate headwords for each in the dictionary. Thus in order to be
completely useful for such dictionaries, sdcv needs to be able to return
every matching word in the dictionary.

The solution is conceptually simple -- return a set containing the
indices rather than just a single index. Since every list we search is
sorted (to allow binary searching), once we find one match we can just
walk backwards and forwards from the match point to find the entire
block of matching terms and add them to the set, in time linear in the
number of matches. A std::set is used so that we don't return duplicate
results needlessly.

In practice this solution was a bit more complicated because .oft cache
files require a bit more fiddling, and also the ->lookup methods are
used by some callers to find the next entry if no entry was found. But
on the whole it's not too drastic a change from the previous setup.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-11-14 22:38:26 +03:00
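The lookup strategy described in 6d385221d0 (binary-search a sorted list, then walk outward from the hit to collect every adjacent match into a std::set) can be sketched in C++ as follows. This is a minimal, hypothetical illustration rather than sdcv's code: the function name lookup_all, the use of a plain std::vector<std::string> as the index, and ordinary std::string ordering are all assumptions (sdcv compares words with its own comparator and reads entries from index and cache files).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: binary-search a sorted list for *any* entry equal to
// `key`, then widen outward to collect the whole contiguous block of matches.
// Assumes `words` is sorted with plain std::string ordering; sdcv's real
// comparator is different.
std::set<std::size_t> lookup_all(const std::vector<std::string> &words,
                                 const std::string &key)
{
    std::set<std::size_t> found;
    std::size_t lo = 0, hi = words.size(); // half-open search range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        int cmp = words[mid].compare(key);
        if (cmp < 0) {
            lo = mid + 1;
        } else if (cmp > 0) {
            hi = mid;
        } else {
            // One match found. Since the list is sorted, all equal entries
            // are adjacent: walk backwards, then forwards, to find the block.
            std::size_t first = mid;
            while (first > 0 && words[first - 1] == key)
                --first;
            std::size_t last = mid;
            while (last + 1 < words.size() && words[last + 1] == key)
                ++last;
            // Cost of this step is linear in the number of matches.
            for (std::size_t i = first; i <= last; ++i)
                found.insert(i);
            break;
        }
    }
    return found; // empty if nothing matched
}
```

With a sorted index {"あう", "きく", "きく", "きく", "たつ"}, lookup_all(index, "きく") returns {1, 2, 3}, so all three homographs reach the caller instead of whichever one the search happened to land on. std::equal_range would locate the same block with two binary searches; the sketch instead mirrors the commit's description of finding one match and walking in both directions.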
Jeff Doozan
994c1c7ae6 Use mapfile directly instead of buffer 2020-12-21 17:10:37 -05:00
Jeff Doozan
d38f8f13c9 Synonyms: Use MapFile 2020-12-21 08:53:29 -05:00
Jeff Doozan
cc7bcb8b73 Fix crash if dictionary has no synonyms 2020-12-19 18:37:15 -05:00
Jeff Doozan
8e9f72ae57 Synonyms lookup: return correct offset 2020-12-19 18:01:21 -05:00
Jeff Doozan
88af1a077c Use binary search for synonyms, fixes #31 2020-12-19 15:10:39 -05:00
Evgeniy A. Dushistov
824764ab50 handle possibly invalid data: origin_data == nullptr 2020-08-14 12:46:42 +03:00
Evgeniy A. Dushistov
431a5774ba fix warning 2020-08-14 12:37:21 +03:00
Evgeniy A. Dushistov
7facbe215e refactoring: run clang-format against code 2020-08-14 12:36:02 +03:00
Michal Čihař
0f83f0aa0b Store integer magic in cache file
This makes it possible to detect a cache created on a machine with a
different endianness, so that such caches are not loaded.

Fixes #36

Signed-off-by: Michal Čihař <michal@cihar.com>
2017-11-14 16:39:57 +01:00
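The reasoning behind 0f83f0aa0b is that a text magic string reads the same on every machine, while a multi-byte integer written in native byte order comes back byte-swapped on a host of the opposite endianness. The sketch below shows such a check under stated assumptions: the constant kCacheMagic, the CacheCheck result type, and the helper names are hypothetical, and sdcv's real cache layout and magic value may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical magic value; any constant whose byte-swapped form differs
// from itself works. sdcv's actual constant may differ.
constexpr std::uint32_t kCacheMagic = 0x01020304u;

enum class CacheCheck { Ok, WrongEndianness, Corrupt };

// Append the magic to a cache buffer in the host's native byte order.
void write_magic(std::vector<unsigned char> &buf)
{
    unsigned char bytes[sizeof kCacheMagic];
    std::memcpy(bytes, &kCacheMagic, sizeof kCacheMagic);
    buf.insert(buf.end(), bytes, bytes + sizeof kCacheMagic);
}

// Portable 32-bit byte swap.
std::uint32_t swap32(std::uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

// Read the magic back in native byte order and classify the cache.
CacheCheck check_magic(const std::vector<unsigned char> &buf)
{
    if (buf.size() < sizeof(std::uint32_t))
        return CacheCheck::Corrupt;
    std::uint32_t stored;
    std::memcpy(&stored, buf.data(), sizeof stored);
    if (stored == kCacheMagic)
        return CacheCheck::Ok;
    if (stored == swap32(kCacheMagic))
        return CacheCheck::WrongEndianness; // written on an opposite-endian host
    return CacheCheck::Corrupt;
}
```

Reversing the four stored bytes simulates a cache produced on an opposite-endian machine: check_magic reports it as WrongEndianness, and a loader would then rebuild the cache instead of trusting integers stored in the foreign byte order.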
Evgeniy A. Dushistov
8f16ceae59 refactoring: apply clang-format rules 2017-08-09 07:46:27 +03:00
Peter
e85927e562 Add -e for exact searches (no fuzzy matches).
Only exact matches (or synonyms) are returned for simple searches.
2017-07-28 11:39:34 +02:00
Evgeniy Dushistov
25dd4c8264 Merge pull request #22 from ecraven/json
Add json output, fix #6
2017-07-26 23:55:59 +03:00
Peter
3105823e8b Add option --json-output (-j)
If given -j, format the output of -l and of searches as JSON.
2017-07-26 22:07:23 +02:00
Evgeniy A. Dushistov
214fbbf91e fix portability issue in PR #20, plus simplify code 2017-07-07 00:19:50 +03:00
Evgeniy Dushistov
f510300f59 Merge pull request #20 from ecraven/master
Add support for .syn synonym files.
2017-07-06 23:57:19 +03:00
Peter
4b52181898 Add support for .syn synonym files.
Fixes #8.
2017-07-06 19:46:15 +02:00
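Commit 4b52181898 adds .syn support without describing the file layout. As defined in StarDict's file format notes, a .syn file is a sequence of records, each a '\0'-terminated UTF-8 synonym followed by a 32-bit unsigned index into the .idx file, stored in network (big-endian) byte order. The parser below is a hypothetical sketch of reading that layout from memory; SynEntry and parse_syn are illustrative names, not sdcv's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct SynEntry {
    std::string synonym;      // alternative spelling or synonym (UTF-8)
    std::uint32_t word_index; // index of the original word in the .idx file
};

// Parse an in-memory .syn file: repeated records of a '\0'-terminated word
// followed by a 4-byte big-endian index. Returns false if a record is
// truncated (missing terminator or fewer than 4 index bytes).
bool parse_syn(const std::vector<unsigned char> &data, std::vector<SynEntry> &out)
{
    std::size_t pos = 0;
    while (pos < data.size()) {
        std::size_t end = pos;
        while (end < data.size() && data[end] != '\0')
            ++end;
        if (end == data.size() || data.size() - (end + 1) < 4)
            return false;
        SynEntry e;
        e.synonym.assign(reinterpret_cast<const char *>(data.data() + pos), end - pos);
        const unsigned char *p = data.data() + end + 1;
        // Decode network byte order explicitly, independent of host endianness.
        e.word_index = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                       (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
        out.push_back(e);
        pos = end + 1 + 4; // skip terminator and index
    }
    return true;
}
```

The records are expected to be sorted in the same order as the .idx headwords, which is what allows the binary search over synonyms introduced in 88af1a077c.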
Evgeniy A. Dushistov
72a15b70a7 simplify parsing of integers in ifo file 2017-07-06 13:11:02 +03:00
Evgeniy A. Dushistov
b2ced870ab fix potential undefined behaviour
fix #19
2017-07-04 22:33:14 +03:00
Evgeniy A. Dushistov
97b13e6702 remove unused code 2017-04-22 20:52:18 +03:00
Evgeniy Dushistov
c78d59de5f fixes for last commit 2014-10-24 18:03:30 +00:00
Evgeniy Dushistov
4f80442ece fix build with clang's scan-build 2013-07-10 10:27:20 +00:00
Evgeniy Dushistov
8298a578b0 check fread calls 2013-07-07 23:29:09 +00:00
Evgeniy Dushistov
5f8d2cb174 remove unused code, use glib wrappers where possible 2013-07-07 20:12:03 +00:00
Evgeniy Dushistov
d05de97521 remove file module, move code to utils 2013-07-07 19:52:02 +00:00
Evgeniy Dushistov
e39f7eed9a Simplify file structure 2013-07-07 19:48:44 +00:00