Mathematical and Physical Journal
for High Schools
Issued by the MATFUND Foundation

Problem S. 9. (May 2005)

S. 9. Nowadays DNA samples can easily be downloaded from the Internet. Such a sample is a string of the letters A, C, T and G, which abbreviate the four nucleotide bases adenine, cytosine, thymine and guanine.

Your task is to locate a given sequence (a "gene") in a given sample of letters A, C, T and G. The sample and the sequence are given in two text files. The first line of each file contains the number of letters, and the sample (or the sequence) itself follows on the subsequent lines. For the sake of readability, the lines are wrapped so that each has at most 100 characters.
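The file layout described above could be read as in the following minimal sketch. It is not an official solution: the function name read_sequence is hypothetical, and the check that the declared count matches the actual number of letters is an assumption about how strictly the first line should be enforced.

```python
def read_sequence(path):
    """Read a sample or sequence file.

    The first line holds the number of letters; the remaining lines hold
    the letters A, C, T, G, wrapped to at most 100 characters per line.
    """
    with open(path, "r", encoding="ascii") as f:
        declared = int(f.readline().strip())
        # Join the wrapped lines back into one string, dropping line breaks.
        letters = "".join(line.strip() for line in f)
    # Assumption: a mismatch with the declared count is treated as an error.
    if len(letters) != declared:
        raise ValueError(f"expected {declared} letters, found {len(letters)}")
    return letters
```

Joining the lines removes the wrapping, so a match that spans a line break in the file is not missed later on.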

Your program gets the necessary file names from the command line: first the name of the sample file, then the name of the file containing the desired sequence. If the sample does not contain the sequence, your program should print 0 to the standard output; otherwise it should print i, where i is the position at which the first occurrence of the sequence in the sample begins. (Positions are numbered from 1.)

It can be assumed that the sample has at most 50 million characters, while the sequence consists of at most 1 million characters.
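With inputs of this size, comparing the sequence against every position of the sample may be too slow in the worst case, so a linear-time search is a natural choice. The sketch below uses the Knuth-Morris-Pratt algorithm, which is one possible approach and not necessarily the one expected by the problem setters. The function name first_occurrence is hypothetical, and treating an empty sequence as "not found" is an assumption, since the problem statement does not cover that case.

```python
def first_occurrence(sample, sequence):
    """Return the 1-based position of the first occurrence of sequence
    in sample, or 0 if there is none (Knuth-Morris-Pratt search)."""
    m = len(sequence)
    if m == 0 or m > len(sample):
        return 0  # assumption: an empty sequence counts as not found

    # fail[i] = length of the longest proper prefix of sequence[:i + 1]
    # that is also a suffix of it.
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and sequence[i] != sequence[k]:
            k = fail[k - 1]
        if sequence[i] == sequence[k]:
            k += 1
        fail[i] = k

    # Scan the sample once; the position in the sample never moves back.
    k = 0
    for j, ch in enumerate(sample):
        while k > 0 and ch != sequence[k]:
            k = fail[k - 1]
        if ch == sequence[k]:
            k += 1
        if k == m:
            # The match ends at 0-based index j, so it starts at j - m + 1;
            # adding 1 converts to the 1-based numbering of the problem.
            return j - m + 2
    return 0


print(first_occurrence("ACGTACGT", "GTA"))
```

The running time is proportional to the combined length of the sample and the sequence. For a non-empty sequence, Python's built-in sample.find(sequence) + 1 yields the same value, because find returns -1 when there is no match.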

(10 points)

Deadline expired on June 15, 2005.


Statistics:

7 students sent a solution.
10 points: Engedy Balázs, Treszkai László.
9 points: Deák 666 Áron, Vincze János.
8 points: 2 students.
0 points: 1 student.

Problems in Information Technology of KöMaL, May 2005