Machine learning mechanisms for computer virus diagnosis (Cơ chế máy học chẩn đoán virus máy tính)
Summary of "Machine learning mechanisms for computer virus diagnosis": ... conclusion of the inference process.

3.2. Partitioning the computer virus diagnosis problem

Based on the identifying characteristics of the data classes, the computer virus diagnosis problem is divided into sub-problems that use learning techniques from ...
... class B (Boot record), using instruction-based learning.
- Problem 4: diagnose class E (Executable files) using case-based learning.
- Problem 5: diagnose class A (stand-Alone programs) using inductive learning.

Each problem uses the sample-virus database (DB) specific to its class: $S = \{S_A, S_B, S_C, S_D, S_E\}$, where $S_A$, $S_B$, $S_C$, $S_D$ and $S_E$ are the sample-virus databases of the respective classes. aObject, bObject, cObject, dObject and eObject are the data points in the diagnosis space of each problem, in that order.

3.3. The computer virus diagnosis problems

3.3.1. Problem 1: diagnosing C-class viruses

Class C viruses infect an object by inserting script statements into it or by creating new ones. Let:

$T = \{a_i, c \mid i = 32, \dots, 127;\ c \in \mathbb{N}\}$ be the object under diagnosis,
$V = \{b_j, m \mid j = 32, \dots, 127;\ m \in \mathbb{N}\}$ be the infecting object (the virus),

where $a_i$ is the character set of $T$, $c$ is the size (number of characters) of $T$, $b_j$ is the character set of the virus $V$, $m$ is the size of $V$, and $\mathbb{N}$ is the set of positive integers. $T$ is infected by $V$ if and only if $V \subseteq T$.

Let $S_C = \{V_1, V_2, \dots, V_n\}$ be the class C database. For each object $T$ under diagnosis, determine:
• Case 1: $T \supset V_i$ for some $i \in \{1, \dots, n\}$. Conclude that $T$ is infected by virus $V_i$ (that is, $T = T_0 \cup V_i$):
- Determine $T_0 = C_T(V_i) = T \setminus V_i$, where $C_T(V_i)$ is the complement of $V_i$ in $T$.
- Remove the virus: $V_i \leftarrow \emptyset$.
• Case 2: $T = V_i$ for some $i \in \{1, \dots, n\}$. Conclude that the object $T$ is the worm $V_i$. Because a worm has no host ($T_0 = \emptyset$), simply perform $V_i \leftarrow \emptyset$.

In essence, C-class diagnosis is rote learning. Virus knowledge is supplied by experts in the form <<Data sample, Virus assertion>>. The algorithm is simple, with complexity O(n) proportional to the data size and to the number of virus samples in $S_C$. However, it cannot give a positive verdict when a new virus appears. Because text viruses have a limited instruction set and are not widespread, rote learning is a suitable choice at present. In the future, once the number of text viruses is large enough, it can be replaced by probabilistic learning models for text data such as Naive Bayes.

3.3.2. Problem 2: diagnosing D-class viruses

D-class is the class of macro viruses, which use the VBA (Visual Basic for Applications) instruction set to spread in the MS Office environment [12]. Unlike ordinary macros, which are executed by the Run command, macro viruses execute themselves through trigger procedures (such as AutoExec). Only documents that use macros are at risk of containing a macro virus (Figure 2).

In the similarity-discovery learning model, the recognition functions R have the form

$(X_1 = V_1) \land \dots \land (X_k = V_k)$

where each $X_j$ is a variable and $V_j$ is a possible value of that variable, a disjunction of possible values, or a set of such values. A function R is TRUE for the object under diagnosis dObject when the values of dObject's variables are among those values. Otherwise the function returns FALSE.

In a diagnosis space of N objects, when a function R recognizes more than one object, the subset of values it recognizes is said to be recognized by R. Conversely, given a subset of objects, we can create a recognition function generated by this subset by taking the disjunction of the values of their variables [13].

In the space $S_D$, the system builds the functions R for each object dObject. If R recognizes $V_j$ (corresponding to the leaf node "Macro virus"), it concludes that dObject is infected by a known virus:

$R: (X_1 = \text{true}) \land (X_2 = \text{true}) \land (X_3 = \text{true}) \land (X_4 = \text{true}) \land (X_{4+i} = \text{true}),\ \forall i = 1..n.$

Otherwise, it can conclude that dObject is infected by a new kind of macro virus. Figures 3a and 3b describe the rules for recognizing old and new macro viruses by learning from analogy.

D-class diagnosis can recognize up to 98% of unknown macros (the 2% of failures are caused by user passwords). However, this technique cannot detect viruses inserted among user-created macros. The proposed solution is a rule-tuning component, exposed as options that control the state of the propositions "dObject has no user-created macros" and "Agree to delete macros."

HOÀNG KIẾM, TRƯƠNG MINH NHẬT QUANG

Figure 2. Classification of MS Office documents and the functions R that recognize macro viruses

3.3.3. Problem 3: diagnosing B-class viruses

Class B contains the boot viruses, which infect the boot records on the first sector of the disk layout. B-class diagnosis is solved by behaviour analysis [14] as follows:
• Organize two databases containing the known boot viruses and the common clean boot records of the operating systems.
• Provide two domain theories that define the behaviour of a boot virus and of a clean boot record. For example:
  Bootvirus ← GetMemSize, DecMemSize, SetMemSize, SetMemVi, MovViCode
  GetMemSize ← ReadMem, GetValue
  DecMemSize ← SetNewSize, WriteMem(...)
• Load bObject into the search space, a binary tree whose root node specifies the instruction entry point. Branches represent sequential instructions. Child nodes are branch and jump instructions. Leaf nodes are stopping points. Loop instructions are handled as sequential in-out instructions on a local subtree (Figure 4).
• Apply the search algorithm and collect the behaviours of bObject into a task list:
- If the list fully reflects the descriptions of the first domain theory, report that bObject is infected, disinfect it, report the result, and end the process.
- If the list reflects the descriptions of the second domain theory, conclude that bObject is safe.
- Otherwise, bObject is in an abnormal state (new virus, damaged sector, unknown format, ...).
• At the end of the process, update the object's information in the corresponding database.

Compared with the neural network model [7], diagnosing boot viruses by instruction-based learning is fast (comparable to the time needed to boot from an empty floppy disk) and more accurate (it recognizes 96% of unknown boot viruses) [15]. Its drawback is that it is complex to implement [16].

Figure 4. Binary instruction search tree

3.3.4. Problem 4: diagnosing E-class viruses

Class E contains the viruses that attach their code to executable files [17]. MAV solves this problem with the AMKBD model (Association Model of Knowledge Base and Database) [18]. Using a database (holding information about the objects under diagnosis) and a knowledge base (holding the set of virus-recognition rules), the inference mechanism for diagnosing class E viruses works as follows:
- For an unknown data set, check for known diseases and record the information in the "medical record" database.
- Once the information exists, regularly monitor the community for "epidemiological hygiene".
- When an unknown individual appears, check the object to limit infection from outside.
- During a virus outbreak, it is enough to check each individual for a new disease.
- When a new disease is detected, restore the individual's state from the medical record database.

To protect the system in real time, MAV uses a multi-agent mechanism with two agents. The Autoprotect Agent runs resident in the background to intercept events arising on objects. The Scanning Agent runs in the foreground and is responsible for scanning the data set. Both agents share the same inference engine and communicate by message passing [19].

Under ideal conditions, this method can detect up to 99% of unknown file viruses. However, the warnings raised by AMKBD can confuse inexperienced users.

3.3.5. Problem 5: diagnosing A-class viruses

Class A contains trojan horses and worms such as germs, droppers, injectors, rootkits, intruders, zombies and so on. Recognizing malicious code (malware) is an open problem for today's anti-virus programs [20]. The task is to check whether an object M is malicious code. If it is not, the system must predict whether M is likely to belong to some virus group and what its malicious-code ratio is. Let wRate ∈ (0, 1] be the malicious-code ratio of M, and let λ ∈ [0, 1] be a given safety-threshold constant.

First, split database A into groups f according to the parent-child order of the V-tree data structure [21]. Then, applying the TF-IDF principle [22], represent M as a term-frequency vector F(M) in the vector space model, where each component F(M, w) is the number of times the word w occurs in M. Next, represent each virus in database A as a term-frequency vector $d_i = (w_{i1}, w_{i2}, \dots, w_{iv})$, and map these vectors into a two-dimensional word-document matrix.
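As a rough illustration of the term-frequency representation described above, the Python sketch below counts F(M, w) for each sample and arranges the counts as a word-document matrix. It is not the authors' implementation: the sample names, the tokens and both function names are hypothetical, and the way MAV splits a program into "words" is not specified in this text.

```python
from collections import Counter


def term_frequencies(tokens):
    """Return F(M, w): how many times each word w occurs in the token list."""
    return Counter(tokens)


def word_document_matrix(samples):
    """Build a word-document matrix from {sample name: token list}.

    Each row holds the term counts of one sample; each column is one
    distinct word, taken in sorted order.
    """
    vocabulary = sorted({w for tokens in samples.values() for w in tokens})
    matrix = {}
    for name, tokens in samples.items():
        counts = term_frequencies(tokens)
        # A Counter returns 0 for words the sample does not contain.
        matrix[name] = [counts[w] for w in vocabulary]
    return vocabulary, matrix


# Hypothetical "tokenised" virus samples (made-up tokens, for illustration only).
samples = {
    "v1": ["open", "write", "exec", "write"],
    "v2": ["open", "connect", "send"],
}
vocabulary, matrix = word_document_matrix(samples)
```

Storing rows as plain lists keeps the example dependency-free; a real database with many samples would more likely use a sparse representation.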
Each row of the matrix corresponds to the sample data of one virus after it has been converted to words, and each column corresponds to one distinct word. The goal is to determine the weight W(f, w) within each set f, in order to compute the similarity measure between M and the sets f with the formula:

$$\mathrm{SIM}(M, f) = \frac{\sum_{w \in M} F(M, w)\, W(f, w)}{\min\left(\sum_{w \in M} F(M, w),\ \sum_{w \in M} W(f, w)\right)}$$

The quantities used to compute SIM are defined in Table 1. After choosing the f with the highest SIM, compute the malicious-code ratio of M against the samples in f:

$\mathrm{wRate}_i(M, v_i) = FF(v_i, w)$ for every $v_i$, the $i$-th sample in the set f.

Table 1. Quantities computed under the TF-IDF principle

Finally, choose the sample with the largest $\mathrm{wRate}_i$. If:
- wRate = 1, conclude that M is malicious code.
- wRate ≥ λ, predict that M contains (wRate × 100)% malicious code.

4. EXPERIMENTAL RESULTS

4.1. Testing the execution speed of MAV

Besides MAV, the anti-virus programs (AVs) tested were Norton Anti-virus (NAV), Kaspersky Lab (KL) and Grisoft Anti-virus (AVG). The data set X contains 36178 files. The procedure was:
- Measure the average time of each AV's VirusFix tools (which scan for a single virus).
- Measure the average running time of each complete AV (with a known number of viruses).
- Compute the average scanning speed of each AV under standard conditions (SC).

For each anti-virus tested, let:
- $V_c$ be the number of records in the virus database.
- $T_0$ be the time (seconds) to scan the whole set X when $V_c = 1$.
- $T$ be the time (seconds) to scan the whole set X when $V_c > 1$.
- $T_1$ be the average time (seconds) to diagnose one virus on the set X: $T_1 = T / V_c$.
- $T_2$ be the average time (seconds) to diagnose one database record: $T_2 = (T - T_0) / (V_c - 1)$.
- $V_e$ be the number of records in the virus database under SC.
- $C_e$ be the data volume (KB) under SC.
- $T_e$ be the diagnosis time (seconds) under SC: $T_e = T + (V_e - V_c) \times T_2$.
- $S_e$ be the speed (KB/second) measured under SC: $S_e = C_e / T_e$.

The standard conditions set $V_e$ = 2,000 and $C_e$ = 10,000,000 KB. The experimental results are shown in Table 2 and Figure 5.

Table 2. Speed test results of the AVs under standard conditions

Anti-virus   T0 (s)   T1 (s)   T2 (s)   T (s)   Te (s)     Se (KB/s)
MAV          195      0.498    0.1987   324     592.245    16884.9
NAV          196      0.699    0.5748   1095    1345.038   7434.734
AVG          337      3.897    2.6259   1025    5586.188   1790.129
KL           390      5.918    2.7704   728     5928.041   1686.898

Figure 5. Speed comparison of the tested AVs under standard conditions

4.2. Testing the virus recognition effectiveness of MAV

In this experiment, the other AVs were NAV, VirusScan (McAfee) and Bit Defender. The observation space consisted of 35178 data files and 1000 virus samples. MAV and BitDef detected 957 and 959 viruses, while NAV and Scan detected 907 and 906 (Table 3). The prediction rate of an AV is the ratio of its number of predictive alerts to the difference between the number of test viruses and the number of exact detections:

Proactivedetection = Proaction / (Viruses − Detections)

Table 3. Recognition test results of the anti-virus programs

AV       Viruses in DB   Version    Alerts   Exact   Missed   Predicted   Prediction rate (%)
NAV      72020           9.05.15    907      889     93       18          16.22
Scan     N/A             4.0.4682   906      877     94       29          23.57
BitDef   253993          7.05450    959      925     41       34          45.33
MAV      890             N/A        957      483     43       474         91.68

Table 4. MAV's effectiveness at predicting unknown viruses as a function of the coefficient λ

λ (%)   Predicted   Prediction rate (%)   False alarms   False-alarm rate (%)
100     474         91.68                 0              0
98      476         92.07                 0              0
96      480         92.84                 0              0
95      482         93.23                 0              0
93      488         94.39                 0              0
90      495         95.74                 0              0
89      495         95.74                 1              0.003
87      496         95.94                 2              0.006
84      496         95.94                 6              0.017
81      496         95.94                 9              0.025
79      497         96.13                 10             0.028
75      497         96.13                 13             0.036

Lowering λ improves MAV's prediction rate but also raises the risk of false alarms (Table 4). The results show that, even with a modest database, MAV detects about as many viruses as products whose databases hold far more virus records, with a prediction rate for new viruses above 91%. At λ = 0.9 the rate is 95.74% with no false alarms, which is where MAV predicts unknown viruses most effectively.

5. CONCLUSIONS AND FUTURE WORK

Viewing the interplay between anti-virus software and computer viruses as a battle of wits between anti-virus experts and hackers, we applied basic principles of artificial intelligence to build a computer virus defence system based on machine learning. Following a "divide and conquer" strategy, the virus recognition problem is solved piece by piece through learning problems that range from simple to complex. In each problem, the learning model is chosen to match the characteristics of the class and the real-world infection situation. The experimental results show that the machine learning approach is well suited to computer virus recognition.

Next, we will apply fuzzy theory to improve the prediction rate by learning accumulated values of the constant λ. Building on these initial results, we will continue to study ways of inheriting knowledge from other anti-virus systems, with the goal of developing MAV into a system that integrates expert knowledge in the field of intelligent computer virus recognition.

REFERENCES

[1] E.H. Spafford, Computer viruses as artificial life, Journal of Artificial Life, 1994.
[2] M. Bordera, "The Computer Virus War. Is The Legal System Fighting or Surrendering?" Computer and Law, University of Buffalo School of Law, 1997.
[3] Peter Szor, The Art of Computer Virus Research and Defense, Addison Wesley Professional, ISBN 0-321-30454-3, February 03, 2005.
[4] R.W. Lo, K.N. Levitt, R.A. Olsson, MCF: a malicious code filter, Computer & Security 14 (6) (1995) 541–566.
[5] Jeffrey O. Kephart and William C. Arnold, Automatic extraction of computer virus signatures, Proceedings of the 4th Virus Bulletin Conference, Jersey - England, October 1994 (178–184).
[6] Eugene H. Spafford, "The Internet worm program: an analysis. Technical Report CSD-TR-823," Department of Computer Science, Purdue University, 1998.
[7] Gerald Tesauro, Jeffrey O. Kephart, Gregory B. Sorkin, Neural networks for computer virus recognition, IEEE Expert 11 (4) (August 1996) 5–6.
[8] William Arnold, Gerald Tesauro, Automatically generated Win32 heuristic virus detection, Proceedings of the 2000 International Virus Bulletin Conference, Orlando-USA, September 2000.
[9] Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Salvatore J. Stolfo, Data mining methods for detection of new malicious executables, Proceedings of IEEE Symposium on Security and Privacy, Oakland, CA, May 2001.
[10] Hoàng Kiếm, Đỗ Văn Nhơn, Đỗ Phúc, Giáo trình các hệ cơ sở tri thức, NXB ĐHQG Tp. Hồ Chí Minh, 2002.
[11] Hoang Kiem, Nguyen Thanh Thuy, Truong Minh Nhat Quang, A machine learning approach to anti-virus system, Joint Workshop of Vietnamese Society of AI, SIGKBS-JSAI, ICS-IPSJ and IEICE-SIGAI on Active Mining, Hanoi-VN, 4-7 Dec. 2004, (61–65).
[12] Vesselin Bontchev, Solving the VBA upconversion problem, Virus Bulletin Conference, Oxfordshire, England, 2000.
[13] Nguyễn Đình Thúc, Trí tuệ nhân tạo - Máy học, NXB Lao động Xã hội, 2002.
[14] Nguyễn Thanh Thủy, Trương Minh Nhật Quang, Các cơ chế chẩn đoán virus tin học thông minh dựa trên tri thức, Tạp chí Tin học và Điều khiển 14 (2) (1998) 45–52.
[15] Nguyen Thanh Thuy, Truong Minh Nhat Quang, A global solution to anti-virus systems, The Proceedings of the 1st International Conference on Advanced Communication Technology, Muju-Korea, 10-12 February 1999 (374–377).
[16] Nguyễn Thanh Thủy, Trương Minh Nhật Quang, Máy ảo, công cụ hỗ trợ chẩn đoán và diệt virus tin học thông minh, Tạp chí Tin học và Điều khiển 16 (2) (2000) 37–40.
[17] M. Pietrek, Windows 95 System Programming Secrets, IDG Books, 1995.
[18] Truong Minh Nhat Quang, Hoang Van Kiem, Nguyen Thanh Thuy, Association model of knowledge base and database in machine learning anti-virus system, The Proceedings of the WMSCI 2006 Conference, Florida-USA, July 2006 (277–282).
[19] Truong Minh Nhat Quang, Hoang Trong Nghia, A multi-agent mechanism in machine learning approach to anti-virus system, The 2nd Symposium on Agents and Multi-Agent Systems, KES-AMSTA'08, Korea. Springer LNAI, Vol. 4953, (743–752).
[20] Ian Waller, Controlled worm replication - 'Internet-In-A-Box', Virus Bulletin Conference, Oxfordshire, England, 2000.
[21] Trương Minh Nhật Quang, Hoàng Kiếm, Nguyễn Thanh Thủy, Ứng dụng Máy học và Hệ chuyên gia trong phân loại và nhận dạng virus máy tính, Tạp chí Công nghệ Thông tin và Truyền thông (19) (2-2008) 93–101.
[22] J.A. Black, N. Ranjan, Automated event extraction from email, "Final Report of CS224N/Ling237 Course in Stanford".

Received 15 October 2007; revised 14 January 2008.
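The standard-condition formulas of Section 4.1 can be checked against the MAV row of Table 2. The sketch below is illustrative only: the function name is hypothetical, and the value Vc = 650 is an assumption inferred from T1 = T/Vc (324 / 0.498 ≈ 650), because Vc itself is not listed in the table.

```python
def standard_condition_speed(t, t2, vc, ve=2000, ce=10_000_000):
    """Extrapolate a measured scan to the standard conditions of Section 4.1.

    t  : time (s) to scan the data set with vc records in the virus database
    t2 : average time (s) per database record
    ve : number of database records under standard conditions
    ce : data volume (KB) under standard conditions
    Returns (Te, Se): diagnosis time (s) and speed (KB/s).
    """
    te = t + (ve - vc) * t2  # Te = T + (Ve - Vc) * T2
    se = ce / te             # Se = Ce / Te
    return te, se


# MAV row of Table 2: T = 324 s, T2 = 0.1987 s; Vc = 650 is an inferred assumption.
te, se = standard_condition_speed(t=324, t2=0.1987, vc=650)
print(f"Te = {te:.3f} s, Se = {se:.1f} KB/s")
```

With these inputs the result agrees with the tabulated Te and Se for MAV to the precision shown in Table 2, which suggests the table was produced with the rounded T2 value.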