KöMaL - Középiskolai Matematikai és Fizikai Lapok


Problem S. 9. (May 2005)

S. 9. Nowadays one can easily download DNA samples from the Internet. Such a sample is a sequence of the letters A, C, G and T, the abbreviations of the four nucleotide bases adenine, cytosine, guanine and thymine.

Your task is to locate a given sequence (a "gene") in a given sample of the letters A, C, G and T. The sample and the sequence are given in two text files. The first row of each file contains the number of letters; the sample (or the sequence) itself follows, consisting of the letters A, C, G and T. For the sake of readability, the lines are wrapped to have at most 100 characters.
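The file layout described above can be parsed in a few lines. The following is only a minimal sketch in Python; the function name `parse_sequence` and the decision to raise `ValueError` on malformed input are assumptions made for illustration, not part of the problem statement.

```python
def parse_sequence(text):
    """Parse the contents of a sample or sequence file.

    Assumed layout (from the problem statement): the first row holds the
    number of letters, the remaining rows hold the letters themselves,
    wrapped to at most 100 characters per line.
    """
    lines = text.splitlines()
    declared = int(lines[0].strip())
    # Join the wrapped rows back into one string, dropping line endings.
    letters = "".join(line.strip() for line in lines[1:])
    # These checks are an assumption; the task does not require validation.
    if len(letters) != declared:
        raise ValueError("letter count does not match the first row")
    if set(letters) - set("ACGT"):
        raise ValueError("unexpected character in the sequence")
    return letters
```

For example, a file whose contents are `8`, `ACGT`, `ACGT` on three rows would be read back as the single string `ACGTACGT`.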

Your program gets the file names from the command line: first the name of the sample file, then the name of the file containing the desired sequence. It should print 0 to the standard output if the sample does not contain the sequence; otherwise it should print i, where the first occurrence of the sequence in the sample begins at the i-th position. (Positions are numbered from 1.)
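The required input and output behaviour can be sketched as follows. This is an illustration under stated assumptions, not a reference solution: the helper names `load_letters`, `first_position` and `main` are hypothetical, and the search simply relies on Python's built-in `str.find`, which returns -1 when nothing is found, so adding 1 gives exactly the 0 or 1-based position the task asks for.

```python
import sys


def load_letters(path):
    """Read a sample or sequence file, skipping the count on the first row."""
    with open(path, encoding="ascii") as handle:
        handle.readline()  # the first row holds the number of letters
        return "".join(line.strip() for line in handle)


def first_position(sample, gene):
    """Return the 1-based position of the first occurrence, or 0 if absent."""
    # str.find yields a 0-based index, or -1 when the gene does not occur.
    return sample.find(gene) + 1


def main(argv):
    # argv[1]: sample file name, argv[2]: sequence file name (task order).
    sample = load_letters(argv[1])
    gene = load_letters(argv[2])
    print(first_position(sample, gene))
```

A script would call `main(sys.argv)` as its last statement. With a sample `ACGTACGT` and a sequence `GTA`, the printed answer would be 3, since the match starts at the third letter.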

It can be assumed that the sample has at most 50 million characters, while the sequence consists of at most 1 million characters.
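With these bounds, comparing the sequence against every starting position letter by letter could in the worst case take on the order of 50 million times 1 million steps, so a linear-time method is a natural choice. The sketch below uses the Knuth-Morris-Pratt algorithm as one such method; it is not claimed to be the approach used by the contestants, the function names are hypothetical, and the treatment of an empty sequence is an assumption. In Python the built-in substring search is usually much faster in practice, so this code is meant to illustrate the idea rather than to be the quickest option.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next letter fits.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_first_position(sample, pattern):
    """Return the 1-based position of the first occurrence, or 0 if absent."""
    if not pattern:
        return 1  # assumption: an empty sequence matches at position 1
    fail = build_failure(pattern)
    k = 0  # number of pattern letters currently matched
    for i, letter in enumerate(sample):
        while k > 0 and letter != pattern[k]:
            k = fail[k - 1]
        if letter == pattern[k]:
            k += 1
        if k == len(pattern):
            # The match ends at 0-based index i, so it starts at i - k + 1.
            return i - k + 2
    return 0
```

Each letter of the sample is examined once and the matched length `k` can only fall as often as it has risen, so the running time is proportional to the combined length of the sample and the sequence.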

(10 points)

Deadline expired on 15 June 2005.


7 students sent a solution.
10 points: Engedy Balázs, Treszkai László.
9 points: Deák 666 Áron, Vincze János.
8 points: 2 students.
0 points: 1 student.
