Project 7: Faculty2

November 16th, 2009 by zss93
  • predicted number of hours to complete: 10
  • actual number of hours to complete: 50

Faculty2

November 1st, 2009 by zss93
  • predicted number of hours to complete: 10
  • actual number of hours to complete: 20

Faculty1

October 18th, 2009 by zss93
  • predicted number of hours to complete: 3
  • actual number of hours to complete: 8

Exploring Python’s map()

October 12th, 2009 by zss93

As Dr. Glenn showed us the other day, map() can outperform other methods when each method is run 100 times to build a list of the square roots of 10,000 values. The other methods that competed against map() to build this list were (taken from the class page):

import math

s = 10000   # number of values whose square roots are computed

def for_function () :
    l = []
    for v in xrange(s) :
        l.append(math.sqrt(v))
    return l

def list_comprehension_function () :
    return [math.sqrt(v) for v in xrange(s)]

def map_function () :
    return map(math.sqrt, xrange(s))

def generator_function () :
    return list((math.sqrt(v) for v in xrange(s)))
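The functions above are Python 2 (xrange, and a map() that returns a list). Below is a rough Python 3 sketch of the same benchmark. It assumes that timeit is an acceptable stand-in for the class's Performance.py harness, which isn't shown here; run_benchmark and REPEAT are hypothetical names. Under Python 3, map() returns a lazy iterator, so list() is needed to make it do the same work as the others. Absolute times will differ from the numbers in this post, and the ordering may not match on newer interpreters either.

```python
import math
import sys
import timeit

s = 10000      # number of values, as in the post
REPEAT = 100   # how many times each function is called per measurement

def for_function():
    result = []
    for v in range(s):
        result.append(math.sqrt(v))
    return result

def list_comprehension_function():
    return [math.sqrt(v) for v in range(s)]

def map_function():
    # Python 3's map() is lazy; list() forces every sqrt to be computed.
    return list(map(math.sqrt, range(s)))

def generator_function():
    return list(math.sqrt(v) for v in range(s))

FUNCTIONS = [for_function, list_comprehension_function, map_function, generator_function]

def run_benchmark(number=REPEAT):
    """Return {function name: total seconds for `number` calls}."""
    return {f.__name__: timeit.timeit(f, number=number) for f in FUNCTIONS}

if __name__ == "__main__":
    print(sys.version)
    for name, seconds in run_benchmark().items():
        print(name)
        print(seconds)
        print()
```

All four functions build identical lists, so any difference in the timings comes from how the loop is executed, not from what it computes.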

Here are the actual results of running these functions:

"""
Performance.py
2.6.2 (r262:71600, Jul 28 2009, 14:05:43)
[GCC 4.2.2]

for_function
0.82852602005

list_comprehension_function
0.580744028091

map_function
0.353476047516

generator_function
0.658768892288

Done.
"""

I wanted to know why map was faster, so I explored that a little bit. The map() built-in is faster because it is implemented in C: the loop that walks over the input and calls the function runs as native machine code, instead of as Python bytecode that the interpreter has to decode and execute one instruction at a time. Because math.sqrt is also a C function, map() never has to run any bytecode for an individual element. The other three methods specified above build the equivalent list, but their loop bodies are executed as interpreted bytecode on every iteration, so they will usually run slower.
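One way to see this difference is to look at the bytecode with the standard dis module. The sketch below is written for Python 3 (the post used 2.6), and the opnames helper is hypothetical. FOR_ITER is the opcode the interpreter executes once per loop iteration. It shows up in the list comprehension's bytecode, but not in the map version's, whose bytecode is just a few loads and calls; the looping happens inside map() itself. Exact opcode names vary between Python versions.

```python
import dis
import math
import types

s = 10000

def list_comprehension_function():
    return [math.sqrt(v) for v in range(s)]

def map_function():
    return list(map(math.sqrt, range(s)))

def opnames(func):
    """Collect the opcode names of func's bytecode, including nested code objects.

    Before Python 3.12 a comprehension is compiled into its own nested code
    object, so we also walk co_consts to reach it."""
    names = []
    pending = [func.__code__]
    while pending:
        code = pending.pop()
        names.extend(ins.opname for ins in dis.get_instructions(code))
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return names

if __name__ == "__main__":
    print("list comprehension loops in bytecode:", "FOR_ITER" in opnames(list_comprehension_function))  # True
    print("map loops in bytecode:", "FOR_ITER" in opnames(map_function))                                # False
    dis.dis(map_function)
```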

On Python’s performance wiki page (Loops section), I found some advice about performance and loops:

If the body of your loop is simple, the interpreter overhead of the for loop itself can be a substantial amount of the overhead. This is where the map function is handy. You can think of map as a for moved into C code. The only restriction is that the “loop body” of map must be a function call.

After reading the paragraph above, I decided to write a small function that passes map a lambda instead of math.sqrt itself:

def map_lambda_function () :
    return map(lambda x:math.sqrt(x), xrange(s))

I ran this against the other functions and got these results:

Performance.py
2.6.2 (r262:71600, Jul 28 2009, 14:05:43)
[GCC 4.2.2]

for_function
1.85488390923

list_comprehension_function
1.26931381226

map_function
1.26931381226

map_lambda_function
2.04882907867

generator_function
1.48617100716

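As shown above, passing map a lambda (rather than the built-in math.sqrt itself) makes it much slower; in this case it was the slowest of all the methods. The lambda is an ordinary Python function, so for every element map has to start a Python-level function call and run the lambda’s bytecode, which then calls math.sqrt. That is more work per element than the list comprehension’s inline loop body, and it cancels out the advantage of having the loop in C. I hope this will be beneficial to whoever reads it. Peace.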
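One way to observe the extra calls is with a profiling hook. The sketch below is Python 3; count_python_calls is a hypothetical helper, and s is reduced to 1000 to keep it quick. A hook installed with sys.setprofile receives a "call" event each time a Python-level function starts running. Built-ins such as math.sqrt never produce that event. The map(math.sqrt, ...) version therefore adds no per-element "call" events, while the lambda version adds one for every element.

```python
import math
import sys

s = 1000

def map_function():
    return list(map(math.sqrt, range(s)))

def map_lambda_function():
    return list(map(lambda x: math.sqrt(x), range(s)))

def count_python_calls(func):
    """Run func() under a profiling hook and count 'call' events, i.e. how many
    times a Python-level function (one with its own bytecode) starts running."""
    calls = 0

    def hook(frame, event, arg):
        nonlocal calls
        if event == "call":
            calls += 1

    sys.setprofile(hook)
    try:
        func()
    finally:
        sys.setprofile(None)
    return calls

if __name__ == "__main__":
    extra = count_python_calls(map_lambda_function) - count_python_calls(map_function)
    print(extra)   # 1000: one extra Python-level call per element
```

Replacing the lambda with an equivalent function defined with def gives the same count, since both are Python functions. What costs time is the Python-level call, not the lambda syntax.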

Database Design - Keys

October 7th, 2009 by zss93
From http://www.tomjewett.com/dbdesign/dbdesign.php?page=keys.php:
  • A super key is any set of attributes whose values, taken together, uniquely identify each row of a table.
  • A primary key is the specific super key (set of attributes) that we pick to serve as the unique identifier for rows of this table.
  • A candidate key is a minimal super key.
  • When designing a table, you need to design it so that it has at least one candidate key. If you can’t reach a state where this is true, then your design is most likely flawed.
  • Generally, pick one of the candidate keys to be the primary key. A super key that is not minimal carries extra attributes, so using it as the primary key would not be space efficient.
  • A surrogate primary key is a single, small attribute (usually a number) that doesn’t have any descriptive value.
  • A substitute key is a single, small attribute that has at least some descriptive value.
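The super key and candidate key definitions above can be checked mechanically against a set of rows. Below is a minimal Python sketch. The helper names and the student rows are hypothetical. A check like this only shows that a set of attributes happens to be unique in *this* data, whereas a real key is a design decision that must hold for every row the table could ever contain.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if the values of `attrs`, taken together, differ in every row."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """True if `attrs` is a super key and no smaller subset of it is one.

    Any superset of a super key is also a super key, so it is enough to
    check the subsets that drop exactly one attribute."""
    if not is_super_key(rows, attrs):
        return False
    return not any(is_super_key(rows, sub)
                   for sub in combinations(attrs, len(attrs) - 1))

# Hypothetical sample rows for a student table.
rows = [
    {"eid": "ab123", "first": "Ann", "last": "Lee"},
    {"eid": "cd456", "first": "Bob", "last": "Lee"},
    {"eid": "ef789", "first": "Ann", "last": "Kim"},
]

print(is_super_key(rows, ("eid", "first")))       # True
print(is_candidate_key(rows, ("eid", "first")))   # False: eid alone is enough
print(is_candidate_key(rows, ("eid",)))           # True
print(is_candidate_key(rows, ("first", "last")))  # True, but only for this data
```

The ("first", "last") result shows why keys are chosen at design time. The pair is unique here only because no two sample students share a full name, which is exactly the kind of accident a surrogate key like eid avoids relying on.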

Project 4: MatLab

October 4th, 2009 by zss93
  • predicted number of hours to complete: 5
  • actual number of hours to complete: 7
  • TestMatLab.out

Python and Security

October 4th, 2009 by zss93

Introduction:

You might be thinking: the security of a program depends on the programmer’s ability to write secure code. This statement is absolutely true and should not be underestimated. Still, I believe that a rich set of security tools and certain design features of a programming language make it easier for programmers to secure their code. This article focuses on some interesting Python design features and tools (built-in functionality and the standard library) and looks at how they affect the security of Python code in general.

A brief history of security in Python:

  • In 1995, a module called rexec (for restricted execution) was introduced. Its purpose was to enforce a security policy when running Python scripts. It allowed code to be executed safely in isolation (sandboxing) and provided ways to control the global namespace.
  • In 2003, when Python 2.3 was released, this module was disabled (it was removed entirely in Python 3.0), and no built-in module covering the same functionality has replaced it since.

The problems with the current model:

  • Unsafe code execution is possible: without a module enforcing a security policy, malicious code can be executed on a machine running a bare Python interpreter. Fundamentally, the interpreter has no way to differentiate between different sets of code running in it (Brett Cannon and Eric Wohlstadter). Imagine a set of code U (Unsafe) that you just downloaded off the Internet: it contains a lot of useful code, but also 5 lines that modify some resources on your file system in a harmful way. There is no way, from within the interpreter, to restrict U’s reach to those resources. This could be devastating. Furthermore, the interpreter allows importing compiled Python code (more on this below).
  • Importing compiled Python code is possible (and dangerous): Let’s examine the previous point for a second. Importing code in compiled form means you don’t necessarily have readable source to check before executing it. Note: the set U above could have been compiled! Because the Python interpreter cannot verify the soundness of imported bytecode, malformed bytecode could crash the interpreter, which could be used for a denial-of-service (DoS) attack (Brett Cannon and Eric Wohlstadter).
  • No private namespace: Unlike Java, Python cannot restrict one object’s access to another object’s private attributes. There is only a primitive, pseudo-private naming convention: an attribute named with two leading underscores, for example __BANK_ACCOUNT_BALANCE, is renamed (name-mangled) to include its class name. This is insufficient as a tool for protecting an attribute. Importing a module with import <your-module> gets you all of its attributes; the mangled ones just need a little trick to access (see ClassVariables.py). Since anyone can use this import directive with the trick, the __<VAR> notation is useless when an effectively secure programming model is needed in Python.
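Here is a small Python 3 sketch of the pseudo-private point. The Account class is hypothetical; only the attribute name comes from the example above. Inside a class body, Python rewrites __BANK_ACCOUNT_BALANCE to _Account__BANK_ACCOUNT_BALANCE. That rename hides the written name from outside code, but the mangled name is an ordinary attribute that anyone can read or overwrite.

```python
class Account:
    def __init__(self, balance):
        # Written inside the class body, so Python stores it under the
        # mangled name _Account__BANK_ACCOUNT_BALANCE.
        self.__BANK_ACCOUNT_BALANCE = balance

    def balance(self):
        return self.__BANK_ACCOUNT_BALANCE

acct = Account(100)

# Outside the class, the name as written is not found...
print(hasattr(acct, "__BANK_ACCOUNT_BALANCE"))   # False
# ...but the mangled name is a normal attribute anyone can see and change.
print(vars(acct))                                # {'_Account__BANK_ACCOUNT_BALANCE': 100}
acct._Account__BANK_ACCOUNT_BALANCE = 1_000_000
print(acct.balance())                            # 1000000
```

Name mangling exists to avoid accidental name clashes in subclasses, not to enforce access control, which is why it offers no protection against deliberate access.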

Programmer tools and libraries to enhance security:

  • Preventing SQL injections: Python has a rich set of libraries for interfacing with many popular databases. Although many Python programmers prefer not to write SQL directly, these libraries provide features that help make sure user input is safe to use in an SQL query. In particular, parameterized queries send user input to the database separately from the SQL text, so the input cannot change the structure of the query. Even better, Python has libraries that support ORM (object-relational mapping), which generate the SQL for you.
  • Buffer Overflows: Pure Python code never manipulates raw memory. The interpreter allocates and sizes objects on the heap itself, and sequence indexing is bounds-checked (an out-of-range index raises IndexError instead of touching neighboring memory). So in a sound environment, the possibility of buffer overflows is minimal. The question that you might be thinking about now is: is Python buffer-overflow proof? Simply put, no. I found an elegant answer to this question in a forum post; it can be formulated as a proof by contradiction:
      "The Python interpreter is written in C. Python extension modules are written in C (or something similar). If you find an unprotected buffer in this C code, you can possibly overflow this buffer. This can be used for nasty things like corrupting the stack and injecting malicious code. There is a reason why the Python sandbox (rexec and Bastion modules) was disabled in Python 2.3."
  • Garbage Collection: Memory management in Python is automatic, and that helps. In a security context, automatic garbage collection is superior to manual memory management. Managing allocation and deallocation manually can result in logical errors as well as runtime ones (for example use-after-free, double frees, or buffer overflows). Depending on where and when they occur, these errors could be a threat to your application.
  • Crypto libraries: There are a number of libraries available to Python programmers for hashing and related security needs. The standard library’s hashlib module supports MD5, SHA-1, and others (see the Python documentation). There are also other open-source tools outside the standard library, such as PyCrypto, that can be used to implement many security solutions.
  • Open source: Python’s source code is published for anyone to read and modify. Having a large community of volunteer programmers reviewing it greatly increases the chance that bugs, including security bugs, are found and fixed.
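To illustrate the SQL injection point, here is a minimal Python 3 sketch using the standard library’s sqlite3 module with an in-memory database. The table, rows, and malicious input are all hypothetical. When the input is pasted into the SQL text, it rewrites the WHERE clause and matches every row. When it is passed through the ? placeholder, the database treats it as a plain string value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 100), ("bob", 50)])

user_input = "nobody' OR '1'='1"   # hypothetical attacker-controlled value

# UNSAFE: string formatting splices the input into the SQL text, producing
#   SELECT name FROM users WHERE name = 'nobody' OR '1'='1'
unsafe_sql = "SELECT name FROM users WHERE name = '%s'" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the ? placeholder sends the input as data, never as SQL.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(len(unsafe_rows))   # 2: every user leaked
print(safe_rows)          # []: no user is literally named that
conn.close()
```

Placeholder syntax differs between database modules (sqlite3 uses ?, while other drivers may use %s or named parameters), so check the paramstyle of the driver you use.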

Sources:

  1. http://people.cs.ubc.ca/~drifty/papers/python_security.pdf (Very interesting!)
  2. http://us.pycon.org/common/talkdata/PyCon2007/062/PyCon_2007.pdf
  3. http://docs.python.org/howto/webservers.html?highlight=mysql
  4. http://docs.python.org/c-api/memory.html
  5. http://www.pubbs.net/python/200908/1069/
  6. http://docs.python.org/library/crypto.html
  7. http://www.cs.utexas.edu/users/downing/examples/python/ClassVariables.py.html

Project 3: Voting

September 27th, 2009 by zss93

  • predicted number of hours to complete: 3
  • actual number of hours to complete: 5
Voting.out:

http://blogs.utexas.edu/zss93/project-3-votingout

TestVoting.out:

http://blogs.utexas.edu/zss93/project-3-testvotingout

Primes Project: Summation of Four Primes

September 13th, 2009 by zss93

Description:

For any given number that is less than or equal to ten million, express that number as a sum of four primes (that is, find four primes that add up to it).
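Here is one common way to approach this problem, sketched in Python 3.8+; it is not necessarily the approach used in the project. Any n below 8 is impossible, since the smallest sum of four primes is 2 + 2 + 2 + 2. For larger n, take 2 + 2 off an even n or 2 + 3 off an odd n, which leaves an even remainder m ≥ 4. Then split m into two primes, following Goldbach’s conjecture. The conjecture is unproven in general, but it has been checked by computer far beyond ten million. The names sieve and four_primes are hypothetical.

```python
import math

def sieve(limit):
    """Sieve of Eratosthenes: is_prime[k] is 1 if k is prime, else 0 (0 <= k <= limit)."""
    is_prime = bytearray([1]) * (limit + 1)
    for k in range(min(2, limit + 1)):
        is_prime[k] = 0                      # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ... with one slice assignment.
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return is_prime

def four_primes(n, is_prime):
    """Return four primes summing to n, or None if impossible (n < 8).

    `is_prime` must come from sieve(limit) with limit >= n."""
    if n < 8:
        return None
    # Take off two small primes so the remainder m is even and >= 4.
    if n % 2 == 0:
        first, m = (2, 2), n - 4
    else:
        first, m = (2, 3), n - 5
    # Goldbach split of the even remainder: m = p + (m - p).
    for p in range(2, m // 2 + 1):
        if is_prime[p] and is_prime[m - p]:
            return first + (p, m - p)
    return None

if __name__ == "__main__":
    table = sieve(100)
    print(four_primes(24, table))   # (2, 2, 3, 17)
    print(four_primes(7, table))    # None
```

For the real limit, sieve(10_000_000) builds a bytearray of about 10 MB once, and it can be reused for every query. The Goldbach loop typically finds a pair after checking only a few small candidates for p.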

Misc:

Predicted number of hours to complete: 2.

Actual number of hours to complete: 2.

Output:

Primes.out: http://blogs.utexas.edu/zss93/project-2-primesout/

TestPrimes.out: http://blogs.utexas.edu/zss93/project-2-testprimesout/

XP Installed Chapter Eleven Summary

September 9th, 2009 by zss93

Programming:

  • In XP, programming is carried out using pair programming. Partners can change several times a day.
  • Pair programming involves two programmers sharing one computer.
  • After completing each task, the programmers need to test it and then integrate it into the release code.
  • Integrate often, and commit the source for each task together with your tests for it. Refer to Continuous Integration for more information.
  • Collective Code Ownership: every programmer owns all of the code and has the ability to improve any of it at any time.
  • Every XP team needs to have a Coding Standard and adhere to it as much as possible.
  • Simple Design: a simple design
      • Runs all the tests and passes 100% of them.
      • Contains no duplicate code.
      • Expresses and implements every idea that you need to express and implement.
      • Has the minimum number of classes and methods.
  • Refactoring is the process by which programmers improve the implementation of the code while keeping the features it already has.