LOCUS       HPV59        7896 bp    DNA             VRL       10-OCT-1994
DEFINITION  Human papilloma virus type 59, complete viral genome.
ACCESSION   X77858
SOURCE      Human papillomavirus type 59.
  ORGANISM  Human papillomavirus type 59
            Viridae; ds-DNA nonenveloped viruses; Papovaviridae;
            Papillomavirus.
REFERENCE   1  (bases 1 to 7896)
  AUTHORS   Rho,J., Roy-Burman,A., Kim,H., De Villiers,E.M., Matsukura,T.
            and Choe,J.
  TITLE     Nucleotide sequence and phylogenetic classification of human
            papillomavirus type 59
  JOURNAL   Virology 203, 158-161 (1994)
REFERENCE   2  (bases 1 to 7896)
  AUTHORS   Choe,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-FEB-1994) to the EMBL/GenBank/DDBJ databases.
            J. Choe, C/O Hajo Delius, DKFZ - Abt. ATV, Im Neuenheimer Feld
            506, 69120 Heidelberg, FRG
COMMENT     HPV-59 was isolated and cloned from a vulvar intraepithelial
            neoplasia (Matsukura et al, unpublished data), and also
            detected in a lesion of the lip. Subcloned fragments of the
            genome were sequenced. The genome is similar in organization
            to other HPV genomes. HPV-59 contains an intact E5 ORF. There
            is an additional small ORF contained inside of the E1 ORF.
            Similarity and phylogenetic analysis (of the L1 and L2 ORFs)
            group HPV-59 most closely with HPV-18, HPV-45 and HPV-39. The
            HPV-59 LCR does not contain a consensus GRE binding site. E2
            binding sites found in the LCR of related types are conserved;
            additional E2 binding sites are found within the L1 and L2
            ORFs. Cys-X-X-Cys motifs are found in E6 and E7, as in other
            types; an additional such motif is found in E5. A putative
            tissue specific motif in L2 has been noted:
            T-T-P-A-V/I-L/I-D/N-V/I; an extended motif uniquely associated
            with the high risk group containing HPV-59 is:
            T-T/D-P-A-V-L-D-D-I-T-P. A 149 bp segment in the E2 orf of
            HPV-59 (starting at nt 2970 of the complete sequence) is
            dissimilar from all other papillomaviruses in an otherwise
            well-conserved region.
            The reverse complement of this segment, however, is extremely
            similar to related papillomaviruses, suggesting that the
            segment may have resulted from either an error in assembling
            the sequence or an actual inversion in the cloned genome. In
            light of this, it is interesting to note that the first six
            bases at the 5' end of the segment ("CTGCAG") are exactly
            repeated at the 3' end, and that these two sub-sequences are
            palindromic. If the inversion was present in the genome of the
            original isolate, it is likely that the resultant E2 protein
            was defective, perhaps being a factor in the induction of
            cancer, since disruption of E2 appears to be a factor in
            oncogenesis in other types. It should be noted that the
            inversion does not result in a premature stop in the E2
            protein.
            NCBI gi: 557236
FEATURES             Location/Qualifiers
     source          1..7896
                     /organism="Human papillomavirus type 59"
     CDS             55..537
                     /gene="ORF putative E6"
                     /note="NCBI gi: 557237"
                     /codon_start=1
                     /translation="MARFEDPTQRPYKLPDLSTTLNIPLHDIRINCVFCKGELQEREV
                     FEFAFNDLFIVYRDCTPYAACLKCISFYARVRELRYYRDSVYGETLEAETKTPLHELL
                     IRCYRCLKPLCPTDKLKHITEKRRFHNIAGIYTGQCRGCRTRARHLRQQRQARSETLV
                     "
     CDS             542..865
                     /gene="ORF putative E7"
                     /note="NCBI gi: 557238"
                     /codon_start=1
                     /translation="MHGPKATLCDIVLDLEPQNYEEVDLVCYEQLPDSDSENEKDEPD
                     GVNHPLLLARRAEPQRHNIVCVCCKCNNQLQLVVETSQDGLRALQQLFMDTLSFVCPL
                     CAANQ"
     CDS             872..2806
                     /gene="ORF putative E1"
                     /note="NCBI gi: 557239"
                     /codon_start=1
                     /translation="MADSEGTDGEGTGCNGWFFVQAIVDKKTGDKISDDEDENATDTG
                     SDLVDFIDDTTTICVQAERETAQALFNVQEAQRDAREMHVLKRKFGCSIENSSEKAAA
                     GKKAKSPLQEISVNVNHPKVKRRLITVPDSGYGYSEVEMLETQVTVENTGNGDSNGSV
                     CSDSQIDCSDSSNMDVENIVPTSPTNQLLQLLHSKNKKAAMYAKFKELYGLSFQDLVR
                     TFKSDRTTCSDWVTAIFGVNPTVAEGFKTLIQPYVLYAHIQCLDCAWGVVILALLRYK
                     CGKNRITVAKGLSTLLHVPDTCMLIEPPKLRSGVAALYWYRTGMSNISEVIGETPEWI
                     QRLTIIQHGVDDSVFDLSEMIQWAFDNDLTDESDIAYEYALIADSNSNAAAFLKSNCQ
                     AKYLKDCAVMCRHYKRAQKRQMSMSQWIKWRCDKIEEGGDWKPIVQFLRYQGVEFITF
                     LCALKDFLKGTPKRNCIVLCGPANTGKSYFGMSLLHFLQGTVISHVNSNSHFWLEPLT
                     DRKLAMLDDATDSCWTYFDTYMRNALDGNPISVDRKHRHLVQIKCPPMLITSNTNPVT
                     DNRWPYLNSRLMVFKFPNKLPFDKNRNPVYTINDRNWKCFFERTWCRLDLNEEEEDAD
                     SDGHPFAAFKCVTGSNIRTL"
     CDS             2736..3848
                     /gene="ORF putative E2"
                     /note="NCBI gi: 557240"
                     /codon_start=1
                     /translation="MQTVMDTLSQRLSVLQDQILEHYENDSKDINEHINYWKLVRMEN
                     VILFAARENNIHTLNHQVVPTFLVSKNKACEAIELQSNRTSTVMPCFLKHFLGAVCHS
                     SWHVSCIVHCSFLNSVCAKLSNAICSKENTMHYTSWTFIYYVNDVGQWCKTTGNVDFW
                     GLYYKVEEEQVYYVKFIHDAKKYGTTDKWEVHYNGKVIDCYDSMCSTSDEQVSTAGSS
                     EQLSYPSATPPEATYLGPQTWNRQTKTGKRPRQCGYTQHPQSTSVSVDYCDNPVVRLH
                     PGNNPRRHIPCSNTTPIIHLKGDKNGLKCLRYRLRKVHWLFENISSTWHWTGNRGSAK
                     TGILTLTYTSETQRNEFLDTVKIPNSVQIHVGYMSV"
     CDS             3268..3615
                     /gene="ORF putative E4"
                     /note="NCBI gi: 557241"
                     /codon_start=1
                     /translation="MMPKNMGLQTSGKCIIMARLLIVMTLCAVPVTSKYPLLDLLSNY
                     HTPPQRPPKPRTWAPKRGTVRRRLESDQDSVDTHSTLSLPACQWTTVTTQSSVCIQAT
                     TRDGTSLAVTLRL"
     CDS             3908..4129
                     /gene="ORF putative E5"
                     /note="NCBI gi: 557242"
                     /codon_start=1
                     /translation="MITLVFVCCVCVCLCVCCNVPLLQSVYMCAYTWLLVFVYIVVIT
                     SSYECFLLYILFFIIPLLLLYAHAILSIQ"
     CDS             4231..5625
                     /gene="ORF putative L2"
                     /note="NCBI gi: 557243"
                     /codon_start=1
                     /translation="MVSHRAARRKRASATDLYKTCKQAGTCPSDVINKVEGTTLADKI
                     LQWTSLGIFLGGLGIGTGSGTGGRTGYIPLGGRTNTIVDVSPAKPPVVIEPVGPTDPS
                     IVTLVEDSSVITSGAPAPTFTGTSGFEISTSSTTTPAVLDITPTSSVQISSSSFINPA
                     FTDPSVIEVPQTGEISGNILISTPTSGAHGYEEIPMQTFATEGTGLEPISSTPNPTVR
                     RVAGPRLYSRANQQVRVSNADFLTRPSTFVTYDNPAYDPIDTTLTFDPSSEVPDPDFM
                     DIVRLHRPALTSRRSTVRFSRLGQRATMFTRSGKQIGARVHFYHDISPIPHAEDIELQ
                     PLVSSQAATDDIYDIYADITDEAPTSTANTAFTIPKSSFQSLSLTRSASSTFSNVTVP
                     LATAWDVPVNTGPDIVLPNTNIVEPTYSTTPFTTIQSINIEGTNYFLWPIYYFLPRKR
                     KRVPYFFTDGSMAF"
     CDS             5606..7132
                     /gene="ORF putative L1"
                     /note="NCBI gi: 557244"
                     /codon_start=1
                     /translation="MALWRSSDNKVYLPPPSVAKVVSTDEYVTRTSIFYHAGSSRLLT
                     VGHPYFKVPKGGNGRQDVPKVSAYQYRVFRVKLPDPNKFGLPDNTVYDPNSQRLVWAC
                     VGVEIGRGQPLGVGLSGHPLYNKLDDTENSHVASAVDTKDTRDNVSVDYKQTQLCIIG
                     CVPAIGEHWTKGTACKPTTVVQGDCPPLELINTPIEDGDMVDTGYGAMDFKLLQDNKS
                     EVPLDICQSICKYPDYLQMSADAYGDSMFFCLRREQVFARHFWNRSGTMGDQLPESLY
                     IKGTDIRANPGSYLYSPSPSGSVVTSDSQLFNKPYWLHKAQGLNNGICWHNQLFLTVV
DTTRSTNLSVCASTTSSIPNVYTPTSFKEYARHVEEFDLQFIFQLCKITLTTEVMSYI HNMNTTILEDWNFGVTPPPTASLVDTYRFVQSAAVTCQKDTAPPVKQDPYDKLKFWPV DLKERFSADLDQFPLGRKFLLQLGARPKPTIGPRKRAAPAPTSTPSPKRVKRRKSSRK " BASE COUNT 2473 a 1457 c 1594 g 2372 t ORIGIN 1 gttaagaccg aaaacggtgc atataaaggt agttgaaaag aaaagggcaa cggcatggca 61 cgctttgagg atcctacaca acgaccatac aaactgcctg atttgagcac aacattgaat 121 attcctctgc atgatattcg catcaattgt gtgttttgca aaggggaact gcaagaaaga 181 gaggtatttg aatttgcttt taatgactta tttatagtgt atagagactg tacaccgtat 241 gcagcgtgtc tgaaatgcat ttcattttat gcaagagtaa gagaattaag atattataga 301 gattccgtgt atggagaaac attagaggct gaaaccaaga caccgttaca tgagctgctg 361 atacgctgtt atagatgcct aaaacctcta tgtccaacag ataaattaaa gcatataact 421 gaaaaaagaa gattccataa tatagctgga atatatacag gacagtgtcg tgggtgtcgg 481 acccgagcaa gacacctaag acagcaacga caagcgcgta gtgaaacact ggtgtaaaac 541 aatgcatgga ccaaaagcaa cactttgtga cattgtttta gatttggaac cacaaaatta 601 tgaggaagtt gaccttgtgt gctacgagca attacctgac tccgactccg agaatgaaaa 661 agatgaacca gatggagtta atcatccttt gctactagct agacgagctg aaccacagcg 721 tcacaacatt gtgtgtgtgt gttgtaagtg taataatcaa cttcagctag tagtagaaac 781 ctcgcaagac ggattgcgag ccttacagca gctgtttatg gacacactat cctttgtgtg 841 tcctttgtgt gcagcaaacc agtaacctgc aatggccgat tcggaaggta cagatgggga 901 agggacgggg tgcaatggat ggttttttgt gcaggcaata gtagataaaa aaacaggtga 961 caaaatttca gatgacgagg atgaaaatgc aacagataca ggttcagact tggtagattt 1021 tattgatgat accacaacaa tttgtgtaca ggcagagcgc gagacagcac aggccttgtt 1081 taatgtgcag gaagcccaaa gggatgcacg ggaaatgcat gttttaaaac gaaagtttgg 1141 gtgcagtata gaaaacagta gtgagaaagc ggcggcagga aaaaaagcta agtcaccatt 1201 acaagaaata tcagtaaatg ttaaccaccc aaaagtaaaa agaaggttaa taacagtgcc 1261 agacagcggc tatggctatt ctgaagtgga aatgctcgag actcaggtaa ccgtggagaa 1321 tactggaaat ggggatagca atggcagtgt ttgtagcgac agtcaaatag actgtagcga 1381 cagcagtaac atggatgttg aaaacatagt tccaacatcc cccactaatc aattgttaca 1441 gttattacat agcaaaaata agaaagcagc tatgtatgca aaatttaaag aattgtatgg 1501 gttatcattt 
caagatttgg ttaggacatt taaaagtgac agaactacct gtagcgattg 1561 ggtaaccgcc atttttggtg ttaatccaac tgtagcagaa ggatttaaaa cattaataca 1621 accctatgtg ctatatgcac atatacaatg cttagattgt gcatggggag tagtaatatt 1681 agcattatta agatataaat gtggaaaaaa tagaataaca gttgcaaaag gacttagcac 1741 attactacat gtaccagata cgtgcatgtt aattgaacca cccaaattgc gtagtggtgt 1801 tgcagcacta tattggtaca gaacaggaat gtccaatatt agtgaagtta taggggaaac 1861 gcccgaatgg atacaaagac taacaattat acaacatgga gttgatgata gcgtgtttga 1921 cctgtcagaa atgatacaat gggcgtttga taatgaccta acagatgaaa gtgatattgc 1981 atatgaatat gcattaatag cagatagtaa tagtaacgcc gctgcatttt taaaaagcaa 2041 ctgccaggca aaatacctaa aagattgtgc agttatgtgt aggcattata aaagagcaca 2101 aaaaagacaa atgagtatgt cacagtggat aaaatggaga tgtgataaaa tagaagaggg 2161 gggagattgg aaacccatag tacaattttt aagatatcaa ggagtagaat ttataacgtt 2221 tttatgtgca ttaaaagatt ttttaaaagg taccccaaaa agaaattgca ttgtgctgtg 2281 tgggccagca aatacaggca agtcatactt tggaatgagc ctgctacatt ttttacaagg 2341 aactgtaatt tcacatgtaa attcaaatag tcacttttgg ctagaacctt taacagatcg 2401 taaattagct atgctagacg atgcaacaga tagttgttgg acatattttg atacatatat 2461 gcgaaatgct ttggatggca atcctataag tgtagataga aagcataggc acctagtaca 2521 aattaaatgt ccaccaatgc ttattacatc aaatacaaat ccagttacag ataacaggtg 2581 gccatattta aatagcagat taatggtatt taaatttcca aacaaattgc catttgacaa 2641 aaatagaaat ccagtatata caattaatga cagaaactgg aaatgttttt ttgaaaggac 2701 gtggtgcaga ttagatttga acgaggaaga ggaagatgca gacagtgatg gacacccttt 2761 cgcagcgttt aagtgtgtta caggatcaaa tattagaaca ttatgaaaac gatagtaaag 2821 acattaatga acacataaac tattggaaac tggtgcgtat ggaaaatgta attttatttg 2881 cagcaagaga gaacaatata catacattaa accaccaggt ggtgccaacg tttttggtgt 2941 ctaaaaacaa ggcatgtgaa gctattgaac tgcagtcaaa ccgtacttcc actgtaatgc 3001 cctgtttttt aaaacatttt ttaggtgctg tttgccatag ttcttggcat gtttcttgca 3061 ttgtccattg ctcattttta aactcagttt gtgccaaact ctctaacgcc atctgcagca 3121 aggaaaacac aatgcattac acaagctgga catttatata ttatgtaaat gatgtaggac 3181 agtggtgtaa aaccacagga 
aatgtggact tttggggact atattataaa gtggaagagg 3241 aacaggtgta ctatgtaaaa tttatacatg atgccaaaaa atatgggact acagacaagt 3301 gggaagtgca ttataatggc aaggttattg attgttatga ctctatgtgc agtaccagtg 3361 acgagcaagt atccactgct ggatcttctg agcaactatc atacccctcc gcaacgcccc 3421 ccgaagccac gtacttgggc ccccaaacgt ggaaccgtca gacgaagact ggaaagcgac 3481 caagacagtg tggatacaca cagcaccctc agtctaccag cgtgtcagtg gactactgtg 3541 acaacccagt cgtccgtttg catccaggca acaacccgcg acggcacatc ccttgcagta 3601 acactacgcc tataatacac ttaaaaggtg acaaaaatgg ccttaagtgt ttaaggtata 3661 gattaagaaa agtacactgg ttatttgaaa atatttcctc tacctggcat tggacaggaa 3721 acagaggatc agccaaaaca ggcattttaa cattaacata tacaagcgaa acacaacgca 3781 atgaattttt agatactgta aaaattccta atagtgtaca aatacatgtt gggtatatga 3841 gtgtgtaatg gttgttatgc aaatgtaaca caagccaata ctgctgctat attgtatagc 3901 tgaggaaatg ataacccttg tatttgtgtg ttgtgtttgt gtttgcttgt gtgtgtgttg 3961 caatgtcccg cttctgcaat ctgtctatat gtgtgcatat acatggttac tagtatttgt 4021 gtatattgtg gttatcacct cctcatatga gtgtttttta ctatatatat tgttttttat 4081 aattccactg ttactactat atgcccatgc aatactgtcc atacaataat tgctgtatat 4141 tgtaaattac attgcactgt attgtacagt atattttaaa cacattatta tttttgttag 4201 gtgttggttt tgttacattt ataataaaac atggtttccc atcgtgctgc tcgtcgtaaa 4261 cgtgcctcag caacagactt atataaaact tgcaagcagg caggtacatg cccttctgat 4321 gttattaata aagttgaagg tacaacttta gctgataaaa tattgcagtg gaccagccta 4381 ggaatatttt taggtggact aggtattggt actggatctg gtaccggtgg cagaacaggg 4441 tacatacctt taggggggcg tacaaacact atagtagatg tatcgcctgc taaaccacca 4501 gtagttattg aacctgttgg acctacagat ccatctatag ttacattagt tgaggattct 4561 agtgttataa catctggagc ccctgcccca acatttacag gtacttcagg atttgaaata 4621 tctacctcta gtacaacaac accagctgtt ttggatataa ccccaacctc ttctgttcaa 4681 attagtagct ctagttttat aaatcctgca tttacagacc cttctgtcat tgaggttccc 4741 caaacaggtg aaatttctgg taatatatta attagtaccc ctacctctgg tgcacatggc 4801 tatgaagaaa ttccaatgca aacgtttgct acggaaggta ctggtttgga acccattagc 4861 agtaccccca atccaacagt acgtcgtgtg 
gctggaccta gattgtacag tagggctaat 4921 caacaagttc gggtgtctaa cgctgacttt ttaacacgtc catccacatt tgttacatat 4981 gataaccctg cttatgatcc aattgatact acattaactt ttgacccctc atcagaggtt 5041 ccagacccgg actttatgga tatagttcgt ttgcataggc ctgcattaac atccagacgc 5101 agcactgtaa ggtttagtag gctaggacaa cgggcaacca tgtttacccg tagtggtaaa 5161 caaattgggg cccgtgtaca tttttatcat gatataagcc ctataccaca tgctgaagat 5221 attgaattgc aacctcttgt ttcttcccag gctgctactg atgatatata tgatatatat 5281 gcagatatta cagatgaagc acctactagt actgccaaca ctgcatttac aattcctaaa 5341 tcttcttttc aaagtttgtc attaacacgg tcggcatcta gcaccttttc aaatgtaact 5401 gttcctttgg ctactgcctg ggatgttcct gtaaatacag gacccgatat agttttacct 5461 aatactaata ttgttgaacc cacttattct actacaccct ttaccaccat acagtctatt 5521 aatatagaag gcacaaatta ttttttatgg cctatatatt attttttacc tcgtaaacgt 5581 aaacgtgttc cctatttttt tacagatggc tctatggcgt tctagtgaca acaaggtgta 5641 tctacctcca ccttcggtag ctaaggttgt cagcactgat gagtatgtca cccgtaccag 5701 tattttctac cacgcaggca gttccagact tcttacagtt ggacatccat attttaaagt 5761 acctaaaggt ggtaatggta gacaggatgt tcctaaggtg tctgcatatc aatacagagt 5821 atttagggtt aagttacctg atcccaataa atttggcctt ccagataaca cagtatatga 5881 tcctaactct caacgcttgg tctgggcctg tgtaggtgtt gaaatcggtc ggggccaacc 5941 tttaggggta ggactcagtg gtcatccatt atataataaa ttggatgaca ctgaaaactc 6001 tcatgtagca tctgctgttg ataccaaaga tacacgtgat aatgtatctg tggattataa 6061 acaaactcag ctgtgtatta ttggctgtgt acctgccatt ggagaacact ggacaaaggg 6121 cactgcttgt aagcctacta ctgtggttca gggcgattgt cctccactag aattaataaa 6181 tacaccaatt gaagatggtg atatggtaga cacaggatat ggggctatgg actttaaatt 6241 gttgcaggat aacaaaagtg aagtaccatt ggatatttgt cagtctattt gtaaatatcc 6301 tgattattta caaatgtcag cagatgctta tggagacagt atgttttttt gtttaaggcg 6361 agaacaggtt tttgccagac atttttggaa tagatctggt actatgggtg atcaacttcc 6421 tgaatcacta tatattaaag gtactgacat acgtgccaac ccaggcagtt atttatattc 6481 cccttcccca agtgggtctg tggttacttc tgattcacaa ttatttaata aaccatattg 6541 gctgcacaag gctcagggtt taaacaatgg tatatgttgg 
cacaatcaat tgtttttaac 6601 agttgtagat actactcgca gcaccaatct ttctgtgtgt gcttctacta cttcttctat 6661 tcctaatgta tacacaccta ccagttttaa agaatatgcc agacatgtgg aggaatttga 6721 tttgcagttt atatttcaac tgtgtaaaat aacattaact acagaggtaa tgtcatacat 6781 tcataatatg aataccacta ttttggagga ttggaatttt ggtgttacac cacctcctac 6841 tgctagttta gttgacacat accgttttgt tcaatctgct gctgtaactt gtcaaaagga 6901 caccgcaccg ccagttaaac aggaccctta tgacaaacta aagttttggc ctgtagatct 6961 taaggaaagg ttttctgcag atcttgatca gtttcctttg ggacgtaaat ttttattgca 7021 attaggagct agacctaagc ccactatagg cccacgcaaa cgtgcagcgc ctgcccctac 7081 ctctacccca tcaccaaaac gtgttaagcg tcgcaagtct tccagaaaat agtgttgttt 7141 gttatgtgtt tgtatgtgtg catgttgtat gttttgtatt gtttgcctgt ttgtatgttg 7201 tgtatatgta catgtttgtt tgtctgctgt atgtgtgtat ttgtttttgt acataataaa 7261 gtatgcatga cagtttcatg tgtggttgca cccaatgagt aaggtactgt ccctttattg 7321 tttctttgtc cttattacac attattacac attgccctac ttacataggt gtgtttgttc 7381 cttcattttg tcctgaatgt ccagttttgc atttgcacat tatatggcgt ccattttatc 7441 ctttaaatcc tccattttgc tgtgcaaccg ttttcggtta ccttggttta accttacctt 7501 tttgaacaat taatctgttt aaacatcagc aaaacagtta atccccatct tgtttcctcc 7561 tacacgccta gactactaac acaacttaca aacgccaaat agttagtcat catcctgtcc 7621 aggtgcactc taacaatact tgcataactt tggtggcgcc cttgttaata aaacagcttt 7681 taggcacata ttttcactgt ttttactact ttaattgcat aattggcttg caaaactact 7741 gtgcaatcca agaatgtgtc tataatttat tgtaaaaaac atgactaagg tttttgtcat 7801 tgttaagcaa ccgaaaaagg tcgggcaagt acatgcacac tttctactta ttacttttta 7861 caatcatagt aataaaaaag ggtgtaaccg aaaacg
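
The COMMENT field of this record describes a 149 bp segment of the E2 ORF (starting at nt 2970) whose reverse complement resembles related papillomaviruses, and notes that the hexamer CTGCAG at both ends of the segment is palindromic. The Python sketch below illustrates that reasoning on a short made-up sequence, not on the HPV-59 genome itself. The function names (reverse_complement, is_reverse_palindrome, invert_segment) and the toy sequence are assumptions introduced here for illustration; they are not part of the record or of any library.

```python
# Minimal sketch, assuming plain a/c/g/t input with no ambiguity codes.
# All names below are hypothetical helpers, not part of the GenBank record.

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_reverse_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.lower() == reverse_complement(seq).lower()


def invert_segment(seq: str, start: int, length: int) -> str:
    """Replace seq[start .. start+length-1] (1-based, as in the record's
    coordinates) with its reverse complement, leaving the flanks untouched."""
    i = start - 1          # convert 1-based start to a 0-based index
    j = i + length         # exclusive end index
    return seq[:i] + reverse_complement(seq[i:j]) + seq[j:]


# Toy example (made up): a segment flanked by the same palindromic hexamer.
toy = "ttt" + "ctgcag" + "aacgt" + "ctgcag" + "ggg"
inverted = invert_segment(toy, 4, 17)

if __name__ == "__main__":
    # Because the flanking hexamer is its own reverse complement, the
    # segment still begins and ends with ctgcag after inversion.
    print(is_reverse_palindrome("CTGCAG"))
    print(toy)
    print(inverted)
```

To repeat the check on this record, one could pass the concatenated ORIGIN sequence with start=2970 and length=149; the result would still need to be compared against related papillomavirus sequences, which this sketch does not do.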