Malware Analysis (1/2)securesw.dankook.ac.kr/ISS20-1/STinCS_06_2020_malware... · 2020-03-25 · Malware Detection Anti-virus software (= Virus scanner) typically employ a variety

Special Topics in Computer Security

Malware Analysis (1/2)

조성제 (Cho, Seong-je)

Spring, 2020

Computer Security & OS Lab.

Dankook University

References

A general definition of malware, S. Kramer and J. C. Bradfield, J Comput Virol (2010) 6:105–114

Malware Incident Response - Static Analysis, CIS 6395, Incident Response Technologies, Fall 2016, Dr. Cliff Zou, UCF

Practical Malware Analysis, Kris Kendall and Chad McMillan, Mandiant (Intelligent Information Security, Black Hat

COEN 252 Computer Forensics, Investigating Hacker Tools

CS155: Computer and Network Security (Stanford Univ.)

Introduction to Malware, Murat Kantarcioglu, UT Dallas

Wikipedia

Many slides come from the references above. Please do not replicate, distribute, upload, or post these lecture notes.


Contents

Malware Detection

● Signature-based, Fingerprint-based

● Behavior-based

● Heuristic-based

Statistical Structures: Fingerprinting Malware for Classification and Analysis

Malware Analysis

● Static Analysis

● Dynamic Analysis


Computer Virus

Program that inserts itself into one or more files and performs some action

● Insertion phase is inserting itself into file

● Execution phase is performing some (possibly null) action

Pseudocode

beginvirus:
    if spread-condition then begin
        for some set of target files do begin
            if target is not infected then begin
                determine where to place virus instructions
                copy instructions from beginvirus to endvirus into target
                alter target to execute added instructions
            end;
        end;
    end;
    perform some action(s)
    goto beginning of infected program
endvirus:


Malware Detection

Anti-virus software (= Virus scanner) typically employs a variety of methods to detect malware programs

● Signature-based scanning

− Fingerprinting method

− A virus signature is the fingerprint of a virus. It is a set of unique data, or bits of code, that allow it to be identified. (source: Computer Hope)

− A virus signature is a continuous sequence of bytes that is common for a certain malware sample. (source: Kaspersky.com)

− Anti-virus software uses a virus signature to find a virus in a computer file system, allowing it to detect, quarantine, and remove the virus.

− Anti-virus software uses the virus signature to scan for the presence of malicious code.

● Heuristic-based detection

● Behavioral detection


Quarantine: to isolate; isolation

Fingerprint

A fingerprinting algorithm is a procedure that maps an arbitrarily large data item (such as a computer file) to a much shorter bit string, its fingerprint

● Fingerprint uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes.

● This fingerprint may be used for data deduplication purposes.

● This is also referred to as file fingerprinting, data fingerprinting, or structured data fingerprinting.

File fingerprinting

● File fingerprints are unique identifiers for their corresponding data and files

● Fingerprinting does not always work for certain file types, including documents that are encrypted or password protected, images and videos, and data in which the text does not perfectly match a predefined document fingerprint.



Fingerprint functions

Cryptographic hash functions generally can serve as high-quality fingerprint functions, are subject to intense scrutiny from cryptanalysts, and have the advantage that they are believed to be safe against malicious attacks.


Source: Wikipedia and https://hackaday.com/2015/11/10/your-unhashable-fingerprints-secure-nothing/

• Hash value

• One-way

• Collision


File Fingerprinting

As a first step, fingerprint the files you are examining so you will know if they change during analysis

Use md5deep, md5sum, etc.

When you have completed your analysis, or at various points during it, re-check the fingerprints

● This verifies (checks) the contents of the files

If nothing has changed, the output is hello.c: OK

If the contents no longer match the hashes recorded in md5sum_hello_files.txt, the output will be hello.c: FAILED
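The record-then-verify workflow above can be sketched in Python with the standard hashlib module standing in for md5sum. The function names fingerprint and verify are illustrative assumptions, not tools named in these slides.

```python
import hashlib
from pathlib import Path


def fingerprint(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to handle large samples."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, recorded_digest, algorithm="md5"):
    """Mimic `md5sum -c`: report OK if the file still matches, else FAILED."""
    status = "OK" if fingerprint(path, algorithm) == recorded_digest else "FAILED"
    return f"{Path(path).name}: {status}"
```

Record fingerprint("hello.c") before analysis and call verify("hello.c", recorded) afterwards; passing algorithm="sha256" is a reasonable alternative when MD5 collisions are a concern.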

Virus Scan

Multiple viruses may have the same virus signature, which allows antivirus programs to detect multiple viruses when looking for a single virus signature.

● Because of this sharing of the same virus signature between multiple viruses, antivirus programs can sometimes detect an unknown virus.

● New viruses have a virus signature that is not used by other viruses, but new "strains" of a known virus sometimes use the same virus signature as earlier strains.

● Source: What is a Virus Signature? - Computer Hope

Virus Scan

● Always scan new malware with an up-to-date virus scanner

● If the code is not sensitive, consider submitting to https://www.virustotal.com/

Strain: a type or variety (of an animal, plant, disease, etc.)



Malware Defenses (Detection & Analysis)

Distinguish between data and instructions

Limit objects accessible to processes

Inhibit sharing

Detect altering of files

Detect actions beyond specifications

Analyze statistical characteristics

https://personal.utdallas.edu/~muratk/courses/dbsec09s_files/Malware-Intro.pdf



Statistical Structures: Fingerprinting Malware for Classification and Analysis

Daniel Bilar

Wellesley College (Wellesley, MA)

Colby College (Waterville, ME)

Proceedings of Black Hat Federal 2006 (2006).


Why Structural Fingerprinting?

Goal: Identifying and classifying malware

Problem: For any single fingerprint, a balance between over-fitting (type II error) and under-fitting (type I error) is hard to achieve.

● Type I error: the rejection of a true null hypothesis (FP)

● Type II error: the non-rejection of a false null hypothesis (FN)

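Note: the four counts below take goodware as the positive class, as in the first definition on the next slide; under the more common malware-positive convention, FP and FN (and TP and TN) swap.

FN: The number of benign software cases incorrectly detected as malware

FP: The number of malware cases misclassified as benign software

TP: The number of goodware cases correctly classified

TN: The number of malware cases correctly classified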

Approach: View binaries simultaneously from different structural perspectives and perform statistical analysis on these ‘structural fingerprints’


Different Definitions: TP, TN, FP, FN

One definition

● True Positive (TP): Number of correctly identified goodware applications.

● True Negative (TN): Number of correctly identified malware applications.

● False Positive (FP): Number of malware applications wrongly identified as goodware.

● False Negative (FN): Number of goodware applications wrongly identified as malware.

Source: “Permission-Based Android Malware Detection”

The other definition

● TP: The number of malware samples correctly classified

− Number of dirty files classified as dirty

● TN: The number of benign data correctly classified

− Number of clean files classified as clean

● FP: The number of benign samples classified as malicious

− Number of clean files classified as dirty

− The number of benign apps that are incorrectly classified as malware

● FN: The number of malware samples classified as benign

− Number of dirty files classified as clean

Source: “Selecting Features to Classify Malware”

“PUMA: Permission Usage to detect Malware in Android”
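A small Python sketch shows how the same scanner verdicts yield different TP/TN/FP/FN counts depending on which class is called "positive". The sample labels and the function name confusion_counts are made up for illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP, FN for binary labels, given which label is 'positive'."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


# Hypothetical ground truth and scanner verdicts for seven files.
actual = ["malware", "malware", "malware", "benign", "benign", "benign", "benign"]
predicted = ["malware", "benign", "benign", "benign", "benign", "benign", "malware"]

malware_positive = confusion_counts(actual, predicted, positive="malware")
goodware_positive = confusion_counts(actual, predicted, positive="benign")
```

With malware as the positive class, the two missed malware files are false negatives; with goodware as the positive class, the same two files become false positives. Always state the convention when reporting these numbers.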


Different Perspectives

Idea: Multiple perspectives may increase likelihood of correct identification and classification


Static + Dynamic

Fingerprint: Opcode frequency distribution

Synopsis: Statically disassemble the binary, tabulate the opcode frequencies and construct a statistical fingerprint with a subset of said opcodes.

Goal: Compare opcode fingerprint across non-malicious software and malware classes for quick identification and classification purposes.

Main result: ‘Rare’ opcodes explain more data variation than common ones
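The tabulation step can be sketched as follows, assuming the disassembler output has already been reduced to a list of mnemonics. The listing and the tracked subset are hypothetical, and this is not the modified InstructionCounter plugin used in the study.

```python
from collections import Counter


def opcode_fingerprint(mnemonics, tracked):
    """Return the relative frequency of each tracked opcode in a disassembly."""
    counts = Counter(m.lower() for m in mnemonics)
    total = sum(counts.values())
    return {op: (counts[op] / total if total else 0.0) for op in tracked}


# Hypothetical mnemonics, as a disassembler might list them for a tiny function.
listing = ["push", "mov", "sub", "mov", "call", "mov", "pop", "retn"]
profile = opcode_fingerprint(listing, tracked=["mov", "push", "call", "int"])
```

The resulting dictionary is the "fingerprint": one frequency per tracked opcode, comparable across binaries of different sizes because it is normalized by the total opcode count.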


Example of Disassembled Code

Source: https://www.fireeye.com/blog/threat-research/2017/12/recognizing-and-avoiding-disassembled-junk.html


Goodware: Opcode Distribution

Procedure:

1. Inventoried PEs (EXE, DLL, etc.) on an XP box with Advanced Disk Catalog

2. Chose random EXE samples with MS Excel and Index your Files

3. Ran IDA with modified InstructionCounter plugin on sample PEs

4. Augmented IDA output files with PEiD results (compiler) and general ‘functionality class’ (e.g. file utility, IDE, network utility, etc.)

5. Wrote Java parser for raw data files and fed JAMA’ed matrix into Excel for analysis

Inventory: to make a list of …

JAMA is a basic linear algebra package for Java. It provides user-level classes for manipulating real, dense matrices.


Malware: Opcode Distribution

Procedure:

1. Booted VMPlayer with XP image

2. Inventoried PEs from Chris Ries malware collection with Advanced Disk Catalog

3. Fixed 7 classes (e.g. virus, rootkit, etc.), chose random PE samples with MS Excel and Index your Files

4. Ran IDA with modified InstructionCounter plugin on sample PEs

5. Augmented IDA output files with PEiD results (compiler, packer) and ‘class’

6. Wrote Java parser for raw data files and fed JAMA’ed matrix into Excel for analysis

VMPlayer: VMware Workstation Player

The authors did joint work with Chris Ries


Aggregate (Goodware): Opcode Breakdown

20 EXEs

(size-blocked random samples from 538 inventoried EXEs)

~1,520,000 opcodes read

192 out of 398 possible opcodes found

72 opcodes in pie chart account for >99.8%

14 opcodes labelled account for ~90%

Top 5 opcodes account for ~64%


Aggregate (Malware): Opcode Breakdown

67 PEs

(class-blocked random samples from 250 inventoried PEs)

~665,000 opcodes read

141 out of 398 possible opcodes found (two undocumented)

60 opcodes in pie chart account for >99.8%

14 opcodes labelled account for >92%

Top 5 opcodes account for ~65%


Top 14 Opcodes: Frequency

RK = Rootkit

Comparison Opcode Frequencies

Perform distribution tests for top 14 opcodes on 7 classes of malware:

● Rootkit (kernel + user)

● Virus and Worms

● Trojan and Tools

● Bots

Investigate: Which, if any, opcode frequency is significantly different for malware?


Top 14 Opcode Testing (z-scores)

Tests suggest opcode frequencies are roughly 1/3 the same, 1/3 lower, and 1/3 higher vs. goodware
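The slides do not say which test produced the z-scores. As one plausible reading, the sketch below uses a pooled two-proportion z-test to label a malware opcode frequency as higher, lower, or the same relative to goodware. The 1.96 cut-off (about the 5% level) and both function names are assumptions.

```python
import math


def two_proportion_z(count_a, total_a, count_b, total_b):
    """z-score for the difference between two opcode proportions (pooled variance)."""
    p_a, p_b = count_a / total_a, count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / std_err


def verdict(z, critical=1.96):
    """Label sample A's frequency relative to sample B at roughly the 5% level."""
    if z > critical:
        return "higher"
    if z < -critical:
        return "lower"
    return "same"
```

Here sample A would be the malware class and sample B the goodware set, with the counts taken from the opcode tabulation for one opcode at a time.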


Top 14 Opcodes Results Interpretation


Rare 14 Opcodes (parts per million)


Rare 14 Opcode Testing (z-scores)

Tests suggest opcode frequencies are roughly 1/10 lower, 1/5 higher, and 7/10 the same vs. goodware


Rare 14 Opcodes: Interpretation


Summary: Opcode Distribution

Compare opcode fingerprints against various software classes for quick identification and classification

Malware opcode frequency distribution seems to deviate significantly from non-malicious software

‘Rare’ opcodes explain more frequency variation than common ones


Opcodes: Further directions

Acquire more samples and software class differentiation

Investigate sophisticated tests for stronger control of false discovery rate and type I error

● Type I error (FP)

− FP: The number of malware cases misclassified as benign software (taking goodware as the positive class)

Study n-way association with more factors (compiler, type of opcodes, size)

Go beyond isolated opcodes to semantic ‘nuggets’ (size-wise between isolated opcodes and basic blocks)

Investigate equivalent opcode substitution effects

Nugget: a small but valuable idea or fact (= snippet)


Fingerprint: Win 32 API calls

Synopsis: Observe and record Win32 API calls made by malicious code during execution, then compare them to calls made by other malicious code to find similarities

− Dynamic analysis

Goal: Classify malware quickly into a family (set of variants make up a family)

Main result: Simple model yields > 80% correct classification; call vectors seem robust towards different packers


Win 32 API call: System overview

Data Collection: Run malicious code, recording Win32 API calls it makes

Vector Builder: Build count vector from collected API call data and store in database

Comparison: Compare the vector to all other vectors in the database to see if it is related to any of them


Win 32 API Call: Data Collection

Malware runs for short period of time on VMWare machine, can interact with fake network

API calls recorded by logger, passed on to Relayer

Relayer forwards logs to file, console


Win 32 API Call: Call Recording

Malicious process is started in suspended state

DLL is injected into process’s address space

When DLL’s DllMain() function is executed, it hooks the Win32 API function

Hook records the call’s time and arguments, calls the target, records the return value, and then returns the target’s return value to the calling function.
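The record, call, record, return sequence of a hook can be illustrated with a Python wrapper. This is only an analogy: the real logger injects a DLL and hooks Win32 functions, whereas create_file below is a hypothetical stand-in that touches nothing on disk.

```python
import functools
import time

call_log = []


def hook(target):
    """Wrap `target` so each call's time, arguments, and return value are recorded."""
    @functools.wraps(target)
    def wrapper(*args, **kwargs):
        started = time.time()                # record the call's time
        result = target(*args, **kwargs)     # call the real target
        call_log.append({"name": target.__name__, "time": started,
                         "args": args, "kwargs": kwargs, "return": result})
        return result                        # hand the target's return value back
    return wrapper


def create_file(name, mode="w"):
    """Hypothetical stand-in for an API function; it only returns a fake handle."""
    return f"handle:{name}:{mode}"


create_file = hook(create_file)  # install the hook in place of the original
```

Callers keep using create_file as before and receive the same return value; the only visible difference is the growing call_log.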


[Figure: function call before hooking vs. function call after hooking]

Win 32 API call: Call Vector

Each column of the vector represents a hooked function and holds the number of times it was called

1200+ different functions recorded during execution

For each malware specimen, vector values recorded to database


Win 32 API call: Comparison

Compute the cosine similarity measure csm between the vector and each vector in the database

If csm(vector, most similar vector in the database) > threshold, the vector is classified as a member of the family of the most similar vector

Otherwise, the vector is classified as a member of family "no-variants-yet"
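The comparison rule can be sketched in Python. Cosine similarity is the dot product of two vectors divided by the product of their lengths. The threshold of 0.8 and the database layout (a list of family/vector pairs) are assumptions; the slides give neither.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two call-count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(vector, database, threshold=0.8):
    """Return the family of the most similar stored vector, or 'no-variants-yet'."""
    best_family, best_score = "no-variants-yet", 0.0
    for family, stored in database:
        score = cosine_similarity(vector, stored)
        if score > best_score:
            best_family, best_score = family, score
    return best_family if best_score > threshold else "no-variants-yet"
```

Because cosine similarity ignores vector length, a specimen that makes the same calls in the same proportions but runs twice as long still matches its family.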

Win 32 API call: Results


Win 32 API call: Packers

Wide variety of different packers used within same families

Dynamic Win 32 API call fingerprint seems robust towards packers

8 Netsky variants in sample, 7 identified


Summary: Win 32 API calls

Allows researchers and analysts to quickly identify variants reasonably well, without manual analysis

Simple model yields > 80% correct classification

Resolved discrepancies between some AV scanners

Dynamic API call vectors seem robust towards different packers


API call : Further directions

Acquire more malware samples for better variant classification

Explore resiliency to obfuscation techniques (substitutions of Win 32 API calls, call spamming)

− Call spamming == Obfuscated API calls

Investigate patterns of ‘call bundles’ instead of just isolated calls for richer identification

Replace VSM with finite state automaton that captures rich set of call relations

● VSM: presumably Vector Space Model, given the count vectors and cosine similarity used above (rather than Virtual State Machine or Value Stream Mapping)


Fingerprint: PDG measures

Program Dependence Graph

● Control dependence

● Data dependence

Synopsis: Represent binaries as a System Dependence Graph, extract graph features to construct ‘graph-structural’ fingerprints for particular software classes

Goal: Compare ‘graph structure’ fingerprint of unknown binaries across non-malicious software and malware classes for identification, classification and prediction purposes

Main result: Work in progress


Program Dependence Graph

A PDG models intra-procedural dependences:

Data Dependence:

Program statements compute data that are used by other statements.

Control Dependence:

Arises from the ordered flow of control in a program.
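Data dependence can be made concrete for straight-line code with a short Python sketch that links each use of a variable to its most recent definition. The statement encoding is hypothetical, and control dependence, which needs a control-flow graph, is not covered here.

```python
def data_dependences(statements):
    """Return (definer, user, variable) edges for straight-line code.

    Each statement is (label, defined_variable, used_variables).
    """
    last_definition = {}
    edges = []
    for label, defined, used in statements:
        for variable in used:
            if variable in last_definition:
                edges.append((last_definition[variable], label, variable))
        if defined is not None:
            last_definition[defined] = label
    return edges


# Hypothetical three-statement program: s1: a = 1; s2: b = a + 2; s3: print(a, b)
program = [("s1", "a", []), ("s2", "b", ["a"]), ("s3", None, ["a", "b"])]
edges = data_dependences(program)
```

Each edge says that the second statement reads a value computed by the first, which is the data-dependence half of a PDG.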


Picture from J. Stafford (Colorado, Boulder)

Fingerprint: PDG measures

For more info., please visit

● https://blackhat.com/presentations/bh-usa-06/BH-US-06-Bilar.pdf

● http://muhaz.org/program-slicing-theory-and-practice-tibor-gyimthy.html



Malware Analysis

Static feature (?)

Dynamic feature (?)

for machine learning

Practical Malware Analysis, K. Kendall & C. McMillan, Mandiant


Program Analysis

Given an executable, how do we find out what it does?

● Try to find the program online.

− Analyze source code to find clues.

− Search for the name of the program.

● Perform (source) code review.

● Execute the program in a sandbox.

− Some programs can break out of a sandbox / jail.

☞ Program Compilation

● Stripping: Removes all human-readable symbols from object code.

− Combats reverse engineering.

● Packing with UPX, etc.

− Compresses code (achieves ratios of 20%~40%)


Cheat Sheet for Analyzing Malicious Software

General Approach

1. Set up a controlled, isolated laboratory in which to examine the malware specimen.

2. Perform behavioral analysis to examine the specimen’s interactions with its environment.

3. Perform static code analysis to further understand the specimen’s inner-workings.

4. Perform dynamic code analysis to understand the more difficult aspects of the code.

5. If necessary, unpack the specimen.

6. Repeat steps 2, 3, and 4 (order may vary) until sufficient analysis objectives are met.

7. Document findings and clean-up the laboratory for future analysis.

https://zeltser.com/reverse-malware-cheat-sheet/

Specimen: a sample or example; a sample taken for medical testing



Static vs. Dynamic Analysis

Static analysis

● Code is Not executed

● Static reverse engineering

− Viewers or Editors for executables: Binary viewer, PEiD, Hex Workshop, …

• Hex Workshop: Hex editor, Sector editor, Base converter and Hex calculator for Windows

− Disassembling, Decompiling

● Autopsy or Dissection of “Dead” Code

Dynamic analysis

● Observing and controlling running (“live”) code

● Dynamic reverse engineering

− Debugging

− Emulator (or VM)

● Ant farm

The fastest path to the Best answers will usually involve a combination of Both


Static Analysis

● Determine the type of executable.

− ELF file in Unix

− PE file (Exe-type) in Windows

− DEX file in Android

● Symbol Extraction:

− Use a program like strings to find symbols left in object code.

− Names give hints on program.

− Will not work for stripped files.
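Determining the type of executable usually starts with the file's first bytes. The Python sketch below checks a few well-known magic numbers; the table is deliberately small, and the function names are illustrative. An "MZ" prefix only indicates a DOS/PE-style header, so a real tool would go on to parse the PE header itself.

```python
MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF executable (Unix/Linux)"),
    (b"MZ", "DOS/PE executable (Windows)"),
    (b"dex\n", "DEX file (Android)"),
    (b"PK\x03\x04", "ZIP archive (possibly an APK)"),
    (b"#!", "script with an interpreter line"),
]


def identify(header):
    """Guess a file type from its first bytes; returns 'unknown' if nothing matches."""
    for magic, description in MAGIC_NUMBERS:
        if header.startswith(magic):
            return description
    return "unknown"


def identify_file(path):
    """Read only the first 8 bytes of the file, never executing it."""
    with open(path, "rb") as handle:
        return identify(handle.read(8))
```

Because the file is only read, never run, this check is safe to perform on the analysis host itself.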

Static analysis is Safer

● Since we aren’t actually running malicious code, we don’t have to worry (as much) about creating a safe environment.

● If possible, perform static analysis in a different OS than your malware targets

− Analyst can reduce Risk using Platform Diversity

− IDA Pro for OS X (?)


What is Static Analysis?

Analysis of malware performed without actually executing the rogue code

Analysis can be performed on any platform because you do not intend to run the malware, which may be platform-specific (e.g., a Win32 executable)

Some questions to be answered include:

◦ What type of file is this? (batch file, shell script, Windows executable, Android DEX, Linux ELF, JavaScript, etc.)

◦ What does it do?

◦ Does it spread itself via physical media or network resources?

◦ Does it steal, alter, or delete information?

Rogue: (adjective) living apart from the herd (and therefore possibly dangerous); (noun) a swindler, criminal (= rascal), villain



General Procedures of Static analysis

Determine the type of file you are examining, its internal structures (sections and headers)

Review the ASCII and Unicode strings contained within the binary file

Submit the code to an antivirus program or an online scanner such as https://www.virustotal.com; signature analysis may help determine the name and functionality of the malware

Perform additional online research to determine the malware’s purpose and capabilities


• string_ids: the strings the code contains (addresses only)

• type_ids: class and method type container (addresses only)

• proto_ids: class and method parameter and return info (addresses only)

• Method, Class, ...: the actual method, class, field, and data containers

Source: Inc0gnito 2015 Android DEX Analysis Technique, 김남준


Disassembly

Disassembler:

● Decodes binary machine code into a readable assembly language text

Automated disassemblers can take machine code and “reverse” it to a slightly higher-level representation

Many tools can disassemble x86 code

● Objdump, Python w/ libdisassemble, IDA Pro

● ILDasm (Microsoft .Net IL disassembler)

But IDA Pro is what most analysts use

Manual examination of disassembly is somewhat painstaking, slow, and can be hard

● Keep your goals in mind and don’t get bogged down

Bog: to sink or get stuck in a mire; to stall. Bog down: to become deadlocked; to be brought to a standstill.



Reverse Engineering Android: Disassembly

Unzip APK & disassemble classes.dex


Source: Reverse Engineering Android: Disassembly & Code Injection, Thanasis Petsas, SYSSEC-Project.eu,

● http://www.syssec-project.eu/m/page-media/158/syssec-summer-school-Android-Code-Injection.pdf

smali/baksmali is an assembler/disassembler for the dex format used by Dalvik.

● Baksmali takes a dex file and produces human-readable assembly, and smali takes the human-readable assembly and produces a dex file.

Apktool is a more general tool for unpacking and repacking an APK.

● It uses smali/baksmali under the hood to assemble/disassemble the dex file.

● It unpacks the binary resources and binary XML files back into the standard textual format.
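Since an APK is a ZIP archive, the "unzip" half of the workflow can be sketched with Python's standard zipfile module. The function name extract_dex is an assumption, and the disassembly itself is still left to baksmali or Apktool.

```python
import zipfile


def extract_dex(apk_path, output_dir):
    """Extract every classes*.dex entry from an APK (a ZIP archive) into output_dir."""
    extracted = []
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            if name.startswith("classes") and name.endswith(".dex"):
                extracted.append(apk.extract(name, output_dir))
    return extracted
```

The returned paths can then be handed to a DEX disassembler; matching classes*.dex also picks up classes2.dex and later files in multi-dex apps.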


Decompile, Dump

Decompilers

● Attempt to produce a high-level language source-code-like representation from a binary.

● Never completely possible because

− The compiler removes some information,

− The compiler optimizes the code.

Executable-dumping

● Dumpbin (MS)

● PEView

● PEBrowse Professional


PEiD (PE iDentifier)

PEiD detects most common packers, cryptors and compilers for PE files.

It can currently detect more than 470 different signatures in PE files.

It seems that the official website (www.peid.info) has been discontinued.

● Hence, the tool is no longer available from the official website, but it is still hosted on other sites.

Source: https://www.aldeid.com/wiki/PEiD, https://tuts4you.com/e107_plugins/download/download.php?view.398



Signature-based malware scanning

● Extremely low false positive (FP) rate

− Probability of mistaking a goodware program for a malware program is very low.

● Less proactive than desired

Most signatures used in existing signature-based malware scanners are hash signatures, each of which is the hash of a malware file.

● The number of malware samples covered by each hash signature is low – typically one.

One possible solution is to replace hash signatures with string signatures, each of which corresponds to a short, contiguous byte sequence from a malware binary.

● Thus, each string signature can cover many malware files.

● Hancock is an automatic string signature generation system

− It generates high-quality string signatures with minimal FPs and maximal malware coverage.
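The difference in coverage between hash signatures and string signatures can be shown with a toy Python example. The byte sequences are made up, SHA-256 is an assumed choice of hash, and this is not how Hancock generates its signatures.

```python
import hashlib


def matches_hash_signature(sample, known_hashes):
    """Hash signature: the whole file must be byte-for-byte identical to a known sample."""
    return hashlib.sha256(sample).hexdigest() in known_hashes


def matches_string_signature(sample, byte_signatures):
    """String signature: any file containing the byte sequence matches."""
    return any(signature in sample for signature in byte_signatures)


# Hypothetical 'malware' bytes and a variant that differs only in its padding.
original = b"\x90\x90" + b"\x66\xb8\x37\x13\x66\xbb\xef\xbe" + b"\x00"
variant = b"\xcc" + b"\x66\xb8\x37\x13\x66\xbb\xef\xbe" + b"\x00\x00"

known_hashes = {hashlib.sha256(original).hexdigest()}
byte_signatures = [b"\x66\xb8\x37\x13\x66\xbb\xef\xbe"]
```

The hash signature catches only the original file, while the string signature catches both the original and the variant. The price is a higher risk of false positives if the chosen byte sequence also occurs in goodware.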


Source: “Automatic Generation of String Signatures for Malware Detection”, Sep. 2009

Signature-based malware scanning

Good signature on Aug. 2008

● First, it uses 16-bit registers, which is quite rare in goodware.

● Second, it has 8 constants with different, unusual values.


Source: “Automatic Generation of String Signatures for Malware Detection”, Sep. 2009

Strings

Sometimes things are easy

First look at the obvious – strings


Utilities: strings, BinText, Hex Workshop, IDA Pro

● BinText:

− a file text scanner / extractor that helps find character strings buried in binary files

− It finds ASCII, Unicode and Resource strings in a file
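What such utilities do can be approximated in a few lines of Python: scan the raw bytes for runs of printable ASCII, and for the same characters stored as UTF-16LE (the usual meaning of "Unicode strings" on Windows). The minimum length of 4 follows the default of the Unix strings command; the function name is illustrative.

```python
import re


def extract_strings(data, min_length=4):
    """Return printable ASCII runs and UTF-16LE runs of at least min_length characters."""
    ascii_pattern = rb"[\x20-\x7e]{%d,}" % min_length
    utf16_pattern = rb"(?:[\x20-\x7e]\x00){%d,}" % min_length
    found = [m.decode("ascii") for m in re.findall(ascii_pattern, data)]
    found += [m.decode("utf-16-le") for m in re.findall(utf16_pattern, data)]
    return found
```

Calling extract_strings(open(path, "rb").read()) on a sample lists candidate URLs, file names, and messages without executing anything.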

Source: https://www.howtoforge.com/linux-strings-command/


Strings

Be careful about drawing conclusions

There is nothing stopping the attacker from planting strings meant to deceive the analyst

However, strings are a good first step and can sometimes even provide attribution

No Strings may be Attached

● Point-and-click “packers” make it easy for intruders to obfuscate the contents of binary tools

Point-and-click: usable with a mouse


Conducting Web Research

Look at unique strings, email addresses, network info

● But! The intruder/author could be watching for you

Search the web

● Be careful … Google cache != Anonymous

● You might find other victims, or a complete analysis

● Don’t forget newsgroups

It helps if you know Chinese (or Russian, or Spanish)

● https://www.google.com/language_tools?hl=en



Dynamic Analysis


Dynamic Program Analysis

Run the program and see what it is doing.

Requires security mechanisms:

● Dedicated machine.

● Not connected to the internet.

● Or: Virtual machine.

− However: Code can recognize whether it is running in VMWare.

• E.g. by the internal MAC addresses, …
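To see why the MAC address gives a virtual machine away, note that the first three octets of a MAC identify the vendor. The Python sketch below checks an address against vendor prefixes commonly attributed to VMware; the prefix list is supplied here as an assumption, is not from the slides, and may be incomplete.

```python
VMWARE_OUI_PREFIXES = {"00:05:69", "00:0c:29", "00:1c:14", "00:50:56"}


def looks_like_vmware(mac_address):
    """True if the MAC's first three octets are a VMware-assigned vendor prefix."""
    normalized = mac_address.lower().replace("-", ":")
    return normalized[:8] in VMWARE_OUI_PREFIXES
```

An analyst can use the same check on the analysis VM's adapter and, if it matches, change the virtual adapter's MAC address before running a sample that might look for it.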

Transport malware on a non-writable CD / DVD


Dynamic Analysis

Static analysis will reveal some immediate information

Exhaustive static analysis could theoretically answer any question, but it is slow and hard

Usually you care more about “what” malware is doing than “how” it is being accomplished

Dynamic analysis is conducted by observing and manipulating malware as it runs.

● Analyst needs to intercept communication of program.

− Need to generate a fake network in a safe environment.


Safe Environment

Nice, safe analytical environment wasn’t that important during static analysis

As soon as you run an unknown piece of code on your system, nothing that’s writable can be trusted

In general, we will need to run the program many times

● Snapshots make life easier

− Snapshot:

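• A saved copy of the current state of memory, including all of its contents such as memory bytes, hardware registers, and status indicators

• A collection of computer files and directories as they existed and were preserved at some point in the past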

− A VMware snapshot is a copy of the virtual machine's disk file (VMDK) at a given point in time.

• Snapshots provide a change log for the virtual disk and are used to restore a VM to a particular point in time when a failure or system error occurs.


Creating a Safe Environment

Do not run malware on your computer!

Old and Busted

● Shove several PCs in a room on an isolated network, create disk images, re-image a target machine to return to pristine state

The (not so) new hotness

● Use virtualization to make things fast and safe

● VMWare (Workstation, Server [free])

● Parallels (cheap)

● Microsoft Virtual PC (free)

● Xen (free)

Bust: to break, to damage; busted: caught (doing something wrong)

Shove: to push roughly; to put something somewhere carelessly

Pristine: like new, very clean (= immaculate), unspoiled


Avoidance techniques of malware

• Emulator detection

• Anti-debugging


It is easier to perform analysis if you allow the malware to “call home” …

However:

● The attacker might change his behavior

● By allowing malware to connect to a controlling server, you may be entering a real-time battle with an actual human for control of your analysis (virtual) machine

● Your IP might become the target for additional attacks

● You may end up attacking other people

End up: to eventually find oneself (in some situation)



Therefore, we usually do not allow malware to touch the real network

● Use the host-only networking feature of your virtualization platform

● Establish real services (DNS, Web, etc.) on your host OS or other virtual machines

● Use netcat to create listening ports and interact with text-based clients

− netcat ( = nc)

• a computer networking utility for reading from and writing to network connections using TCP or UDP.

• It is a feature-rich network debugging and investigation tool.

● Build custom controlling servers as required (usually in a high-level scripting language)
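A custom controlling server can start out as little more than a logging listener. The Python sketch below accepts one connection on the loopback interface, sends a greeting, and records the client's first message. The banner text and the function name are hypothetical; a real stand-in would mimic whatever protocol the specimen expects.

```python
import socket
import threading


def start_fake_server(banner=b"220 ready\r\n", host="127.0.0.1", port=0):
    """Listen locally, greet one client, and log the first data it sends.

    Returns (port, received, thread); `received` is filled in by the thread.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))          # port 0 lets the OS pick a free port
    listener.listen(1)
    bound_port = listener.getsockname()[1]
    received = []

    def serve_one():
        connection, _address = listener.accept()
        with connection:
            connection.sendall(banner)
            received.append(connection.recv(4096))
        listener.close()

    thread = threading.Thread(target=serve_one, daemon=True)
    thread.start()
    return bound_port, received, thread
```

On a host-only network, bind to the host-only interface's address instead of 127.0.0.1 so that the specimen inside the VM can reach the listener.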

Source: https://security.stackexchange.com/questions/205802/netcat-reverseshell-hanging-after-connection & Wikipedia


Virtualization Considerations

Using a virtual machine helps, but …

Set up the “victim” with no network or host-only networking

Your virtualization software is not perfect

Malicious code can detect that it is running in a virtual machine

A 0-day worm that can exploit a listening service on your host OS will escape the sandbox

● Even if you are using host-only networking!


Dynamic Program Analysis

strace, systrace:

● Run the program, but keep track of the system calls that it makes, along with their parameters.

− More relevant calls (Unix):

• open

• read

• write

• unlink

• lstat

• socket

• close

− strace has an option that traces all network-related calls (-e trace=network).

Use fport, netstat, … to determine ports opened by the program.
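Once a trace has been captured, the calls of interest can be tallied with a short Python script. The sample lines below are made up in strace's usual name(args) = result layout, and the regular expression is a simplification that ignores unfinished or resumed calls.

```python
import re
from collections import Counter

SYSCALL_LINE = re.compile(r"^(?:\d+\s+)?(\w+)\((.*)\)\s+=\s+(-?\d+|\?)")
WATCHED = {"open", "read", "write", "unlink", "lstat", "socket", "close"}


def summarize(trace_lines):
    """Count watched system calls in strace-style output lines."""
    counts = Counter()
    for line in trace_lines:
        match = SYSCALL_LINE.match(line.strip())
        if match and match.group(1) in WATCHED:
            counts[match.group(1)] += 1
    return counts


# Hypothetical trace excerpt written in strace's `name(args) = result` layout.
sample_trace = [
    'open("/etc/passwd", O_RDONLY) = 3',
    'read(3, "root:x:0:0"..., 4096) = 1024',
    'close(3) = 0',
    'socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 4',
    'unlink("/tmp/.dropper") = -1 ENOENT (No such file or directory)',
    'getpid() = 4242',
]
summary = summarize(sample_trace)
```

On current Linux systems the trace will often show openat rather than open, so the WATCHED set should be adjusted to the platform being analyzed.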


System Monitoring

What we are after

● Registry Activity

● File Activity

● Process Activity

● Network Traffic

The tools

● SysInternals Process Monitor

− It records information about File system, Registry, and Process/Thread activity

● Wireshark

● + a whole bunch of other stuff


Anti-malware evasion techniques

Malware writers can use anti-reversing techniques.

● Eliminate symbolic information.

● Encrypt code.

● Code obfuscation.

− Make HLL constructs difficult to understand.

● Anti-debugger Methods:

− Use the IsDebuggerPresent API to protect against user-level debuggers.

− Use the NtQuerySystemInformation API to determine if a kernel debugger is attached to the system.

− Set a trap flag and check whether it is still there.

• A debugger would “swallow” it.

− Put in bogus bytes over which the code jumps.

• Does not work for all disassemblers.


Summary

Malware Detection

● Anti-malware software = Malware scanner

● Signature(fingerprint)-based detection, Behavioral detection, …

Statistical Structures: Fingerprinting Malware for Classification and Analysis

● Opcode frequency distribution

● API call vector

● Graph structural properties: PDG

Malware Analysis

● Static analysis: Disassemble, Decompile, Hex edit, Binary viewer, …

− Executable file format: PE, ELF, DEX, …

● Dynamic analysis: Debugging on a secure VM

Anti-malware evasion techniques

