Mathematical and Physical Journal
for High Schools
Issued by the MATFUND Foundation
Already signed up?
New to KöMaL?

Problem I. 210. (March 2009)

I. 210. Linear search is a simple but slow algorithm to find the first occurrence of a given pattern in a text: the first character of the pattern is compared with successive characters of the text, that is, the pattern is shifted from the left to the right by 1 position until we find a complete match.

However, when the size of the alphabet is large compared to the length of the pattern, the following modification can speed up the algorithm considerably: the pattern is again shifted from left to right, but when a mismatch is found at a certain position, the pattern may be shifted to the right by more than one character, depending on the first character of the text following the pattern.

This is justified by observing that if the first character of the text just after the pattern is not contained in the pattern at all, then it is enough to continue pattern matching from the right neighbour of that character. If, on the other hand, the first character of the text just after the pattern is contained in the pattern, then the length of the right shift can be chosen to be the distance between these two occurrences.

In our examples, ``Szöveg'' is ``text'', ``minta'' is ``pattern'', ``különbség'' is ``mismatch'', while ``eltolás'' is ``shift''.

The execution can be made even faster, because the length of the shift can be computed in advance for each character of the text.

Write a program that determines the first occurrence of a pattern in a given text. The pattern (being at most 255 characters) is entered from the keyboard, while the text is found in a file specified as the command line argument. The sentence containing the first occurrence should be displayed on the screen. The beginning of this sentence is defined as the first non-space character following the nearest end of sentence punctuation mark (``.'', ``!'', ``?'' or beginning of the file) preceding the pattern, while the end of this sentence should be the first end of sentence punctuation mark or end-of-file mark succeeding the pattern. Upper and lowercase letters should be treated identically.

The source code of your program (i210.pas, i210.cpp, ...) and a short documentation (i210.txt, i210.pdf, ...) should be submitted, also describing your solution and specifying the name of the developer environment to use for compiling.

(10 pont)

Deadline expired on April 15, 2009.


Sorry, the solution is available only in Hungarian. Google translation

Megoldásokról

Viszonylag kevés maximális pontszám sikerült. Ennek oka, hogy sokan nem a megadott algoritmus alapján próbálták a megoldást elkészíteni. Többen nem vették figyelembe, hogy végrehajtást jelentősen gyorsítja, ha a minta eltolását minden előforduló karakterre előre kiszámítjuk. A tesztelést nagy méretű szövegállományon végeztük, hogy a hatékonyság különbségek futási időben is megjelenjenek. Erre Thomas Mann: József és testvérei című könyvét választottuk. (Ez letölthető a Magyar Elektronikus Könyvtárból.)

Minta megoldás

Englert Péter 12. osztályos (Zalaegerszeg, Zrínyi Miklós Gimnázium) megoldását közöljük. ( I210.pas)


Statistics:

10 students sent a solution.
10 points:Englert Péter, Kővágó Zoltán.
9 points:Uray Marcell János.
7 points:1 student.
6 points:2 students.
5 points:1 student.
3 points:2 students.
2 points:1 student.

Problems in Information Technology of KöMaL, March 2009